Labour History Index Project (November 2000)

 

This paper was originally written by the International Institute of Social History (IISH) for the IALHI Automation Group meeting in Milan, November 2000. It tries to establish criteria and procedures for building a Web-based index of archival (and possibly other) holdings in all IALHI member institutions -- here provisionally called the Labour History Index (LHI). What follows is largely based on work done for the European Union Archive Network (EUAN) -- in which the IISH participates together with the Scottish National Archives, the Swedish Riksarkivet and the Italian Ufficio centrale per i beni archivistici -- and on a paper written jointly by the IISH and the Dutch Archival School for a project intended to create an archival index of the Netherlands.

Preliminary Remarks

Three assumptions underlie the argument:

  • Management Considerations
    Although we build the system for historians and others interested in labour history, its limits will be set by the institutions that are providing the data. Clearly, we should try to attract as many participants as possible. Yet some IALHI members -- and especially the most experienced ones -- have already invested heavily in improving access to their collections through the Internet and are not going to make major adaptations to their existing systems just for the fun of it. Others, who are still at the planning stage, will not want to rely entirely on an international effort that may not suit local needs. The LHI will have to accommodate those considerations.
  • Cooperative Model
    Consequently, the LHI should be conceived as a central point of access to autonomous local systems. The participants remain the owners of the data they submit to the LHI, which in turn will refer back to their Websites.
  • Future Developments
    Although we are first of all addressing the problems associated with building an index of archival collections, we should keep an eye on future developments in two directions. First, even from an archival point of view the LHI is likely to be a first step: several IALHI members are already offering detailed finding aids on their Websites, and demand for some sort of central access to such data will grow. Second, most member institutions hold significant library and audiovisual collections whose presence would considerably enrich the LHI, even though summary descriptions of special collections -- as distinct from individual titles listed in a library catalogue -- are (still?) rare. Our system should not a priori exclude expansion in those areas.

The following characteristics would seem to result from these premises:

  • The LHI is a search engine for end-users, not a management tool.
  • The LHI mirrors existing practice in the participating institutions as closely as possible.
  • Any standardization of the data is done at the central level, and only if it can be done automatically.
  • The data submitted are limited in scope and structure.
  • The LHI combines tolerance for incomplete or defective data with frequent updating.
  • Search results lead back to the Websites of the participating institutions.
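The third characteristic, automatic standardization at the central level, can be illustrated with dates. What follows is a minimal sketch in Python, assuming that participants submit dates as free text such as '1890-1945' or 'c. 1921'. The function name normalize_dates and the accepted patterns are illustrative assumptions, not part of any agreed LHI specification. Whatever cannot be converted automatically is simply reported as unrecognized, so that the original value can be kept.

```python
import re
from typing import Optional

# A four-digit year with an optional 'circa' prefix, optionally followed by
# a separator and a second year. The accepted patterns are assumptions.
_YEARS = re.compile(
    r"^\s*(?:c\.|ca\.|circa)?\s*(\d{4})(?:\s*(?:-|/|to)\s*(\d{4}))?\s*$",
    re.IGNORECASE,
)


def normalize_dates(raw: str) -> Optional[str]:
    """Return an ISO 8601 year or year interval, or None if not recognized.

    The 'circa' qualifier is dropped here; a real system would probably
    keep the original string alongside the normalized value.
    """
    match = _YEARS.match(raw)
    if match is None:
        return None
    start, end = match.group(1), match.group(2)
    if end is None:
        return start
    if int(end) < int(start):
        # An inverted range cannot be repaired automatically.
        return None
    return f"{start}/{end}"
```

A value such as 'undated' yields None and would be stored and displayed exactly as supplied, in line with the tolerance for incomplete or defective data.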

Data Model

We can further detail this outline. If standardization takes place at the central level it seems reasonable to base data modelling on ISAD(G), the General International Standard Archival Description established by the International Council on Archives. This is a comprehensive model, however, which needs to be reduced in order to make it practicable for the LHI. The EUAN project mentioned above considers the following elements to be both necessary and sufficient for the identification of an archival collection:

Element No | Name of Element | Short Label for User Display | Notes

Identity Statement Area
3.1.1 | Reference code(s) | Reference | Includes: country code (use ISO 3166); repository code; fonds number
3.1.2 | Title | Title | For a corporate body, give name only; for a family or individual, use: Papers of ...
3.1.3 | Dates of creation of the material in the unit of description | Dates | Use ISO 8601 internally; display according to local preference
3.1.4 | Level of description | | Note that this is the level of description of this finding aid, therefore fonds is the default
3.1.5 | Extent of the unit of description | Extent | Use linear shelf meterage for conventional records; use units for non-conventional records (e.g. microfilm)

Context Area
3.2.1 | Name of creator | Record Creator | Use highest level of rules available (e.g. national, institutional)
3.2.2 | Administrative / Biographical history | Administrative history or Biographical history as appropriate | Should contain information about the record creator, not about the records. It should be held separately from the description, though a subset may be held with the finding aid itself. Personal and corporate name access points should follow 3.2.1

Content and Structure Area
3.3.1 | Scope and content / Abstract | Abstract | Should contain information about the records, not about the record creator. Personal and corporate name access points should follow 3.2.1

Conditions of Access and Use Area
3.4.1 | Conditions governing access | Access Conditions |
3.4.3 | Language / scripts of material | Language | Use ISO format internally; display according to local preference
3.4.5 | Finding aids | Finding Aids | Language (use ISO codes) should be included if different from the language in 3.4.3

Description Control Area
3.7.1 | Archivist's Note | Do not display | Give name of creator of this finding aid
3.7.2 | Rules or Conventions | Do not display | Give information on protocols used in compiling this finding aid
3.7.3 | Date(s) of descriptions | Do not display | Give date of creation of this finding aid

This model can be used for the LHI, provided we take the number of elements to be an upper limit and consider the standardization recommendations in the 'Notes' column (those referring to standards of the International Organization for Standardization, etc.) as just that -- recommendations.

For clarity's sake, note that the final model will have to incorporate a number of other elements generated by the system itself, such as a record ID, input and modification dates, etc.
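To make the reduced model more concrete, it can be written down as a single record type. The following Python sketch is only an illustration under stated assumptions: the class name LHIRecord, all field names and the choice of plain strings are hypothetical. Only the element numbers and the default level 'fonds' come from the table above. Every descriptive element is optional, reflecting the idea that the list is an upper limit.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional

# Elements generated by the system rather than supplied by participants.
SYSTEM_FIELDS = {"record_id", "input_date", "modification_date"}


@dataclass
class LHIRecord:
    # System-generated elements (names are assumptions).
    record_id: str
    input_date: date
    modification_date: Optional[date] = None
    # Descriptive elements, numbered as in ISAD(G).
    reference: Optional[str] = None          # 3.1.1
    title: Optional[str] = None              # 3.1.2
    dates: Optional[str] = None              # 3.1.3
    level: str = "fonds"                     # 3.1.4, fonds is the default
    extent: Optional[str] = None             # 3.1.5
    creator: Optional[str] = None            # 3.2.1
    history: Optional[str] = None            # 3.2.2
    abstract: Optional[str] = None           # 3.3.1
    access_conditions: Optional[str] = None  # 3.4.1
    language: Optional[str] = None           # 3.4.3
    finding_aids: Optional[str] = None       # 3.4.5
    archivists_note: Optional[str] = None    # 3.7.1, not displayed
    rules: Optional[str] = None              # 3.7.2, not displayed
    description_date: Optional[str] = None   # 3.7.3, not displayed

    def supplied_elements(self) -> List[str]:
        """Names of the descriptive elements that carry a value."""
        return [
            f.name
            for f in fields(self)
            if f.name not in SYSTEM_FIELDS and getattr(self, f.name)
        ]
```

A record holding nothing but a title and a creator is a valid instance of this type; supplied_elements() then reports those two elements plus the default level.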

 

Existing Practice

At the moment 12 IALHI member institutions present collection-level information on their archival holdings on the Web. Argentina will follow in the course of 2001. In addition, some Spanish and Swiss members have supplied archival descriptions to their respective national systems, while the Hoover Institution has catalogued its archival holdings in the RLIN system of the Research Libraries Group. Other members may have similar data available though they do not yet present them on the Web. In general, it should be noted that the data are offered in very diverse ways and that it is often difficult to guess how they are stored.

For comparison, it may be interesting to note that the IISH tested 82 archival Websites in the Netherlands on the presence or absence of eight of the EUAN elements. 'Title' (3.1.2), 'dates' (3.1.3) and 'record creator' (3.2.1) proved to be virtually omnipresent. In addition, over half the sites listed 'finding aids' (3.4.5) and 'extent' (3.1.5). Almost a third gave some indication of 'access conditions' (3.4.1), but 'administrative or biographical history' (3.2.2) and 'abstract' (3.3.1) were rare (less than 10 per cent). From these results we concluded that an archival index of the Netherlands was feasible. The above IALHI Websites do not present a fundamentally different picture; the same conclusion will probably apply.

We also studied a number of international sites using Terry Abraham's Repositories of Primary Sources (http://www.uidaho.edu/special-collections/Other.Repositories.html) at the University of Idaho as a starting point:

Directories of Repositories

Archival Indexes

Detailed Finding Aids

Looking at the directories of repositories -- in fact more than the two sites mentioned, since many others offer similar information -- suggests that it makes sense to add this type of information. Brief records on IALHI's members will often usefully supplement their Websites, which are sometimes located on servers they do not own. In actual practice, this would amount to incorporating the IALHI Directory in the LHI.

We looked at the detailed finding aids mainly to identify problems that might arise in the future, once we decide to go beyond collection-level descriptions. Clearly, this investigation could not go very deep, as the underlying data structures are invisible. Yet the 'logic' of those sites would seem to indicate that they are largely compatible with the EUAN model.

The archival indexes mentioned vary widely. Next to a simple but meaningful Swiss example we find the more than 500,000 extensive MARC-AMC records in the RLIN database, sometimes directly linked to local finding aids; next to the Australian register of non-governmental records we find the database of Rhineland-Westphalia, which often includes information at the series and sub-series level. The Australian index offers a detailed overview of its data model (http://www.nla.gov.au/raam/raamdb.html), which differs little from EUAN. Except in the US, the threshold for participation is very low; in Australia, 'title' and 'creator' suffice. Except for the (relatively old) Swiss example, which is presented as a list, all systems offer a database interface.

The Westphalian example differs from all others in that it has a hierarchical structure used for handling the series and sub-series data. As it is unlikely that many IALHI members will have such data on offer, and since we are aiming for a maximum number of participants, it is plausible to assume that the LHI will more or less resemble the Australian model.

Network

In the LHI system local participants supply structured archival descriptions at the collection level, preferably from existing databases, to a central server. This server checks the incoming data against certain criteria of consistency, but allows input of incomplete records.
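How a central check might combine consistency criteria with tolerance for incomplete records can be sketched as follows. This is an illustration in Python, not a specification: the element names, the function check_record and in particular the minimum threshold of 'title' plus 'creator' (borrowed from the Australian example) are assumptions. The point is that missing optional elements produce warnings rather than rejection.

```python
from typing import Dict, List, Tuple

# Hypothetical names for the descriptive elements of the reduced model.
KNOWN_ELEMENTS = {
    "reference", "title", "dates", "level", "extent", "creator",
    "history", "abstract", "access", "language", "finding_aids",
}

# Assumed minimum for acceptance, following the Australian threshold.
MINIMUM = ("title", "creator")


def check_record(record: Dict[str, str]) -> Tuple[bool, List[str]]:
    """Return (accepted, warnings) for one incoming description."""
    # Only elements with a non-blank value count as supplied.
    supplied = {name for name, value in record.items() if value and value.strip()}
    warnings = [
        f"unknown element ignored: {name}"
        for name in sorted(supplied - KNOWN_ELEMENTS)
    ]
    missing = [name for name in MINIMUM if name not in supplied]
    if missing:
        # Below the minimum threshold the record is refused.
        return False, warnings + [f"missing required element: {n}" for n in missing]
    # Incomplete records are accepted; gaps are merely reported.
    warnings += [
        f"element not supplied: {name}"
        for name in sorted(KNOWN_ELEMENTS - supplied)
    ]
    return True, warnings
```

The warnings could be returned to the participating institution as a report, leaving it to local staff to decide whether and when to complete the data.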

From a local perspective, the system will be all the more effective if the data can be supplied in different formats. Consequently, the central server should be able to identify which data elements are being transferred from the local site and whether these are new or updated data. Data transfer can take the form of uploading from the local to the central level or having a central robot visit the local sites at regular intervals.
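One possible way for the central server to decide whether an incoming description is new or updated is to compare a fingerprint of its content with the fingerprint stored at the previous transfer. The Python sketch below assumes that each record can be identified by a stable key, such as its reference code, and that records arrive as simple name/value pairs; the function names and the key format are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def transferred_elements(record: Dict[str, str]) -> List[str]:
    """List the data elements that actually carry a value in this transfer."""
    return sorted(name for name, value in record.items() if value)


def fingerprint(record: Dict[str, str]) -> str:
    """Hash a canonical form of the record, independent of element order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(store: Dict[str, str], key: str, record: Dict[str, str]) -> str:
    """Return 'new', 'updated' or 'unchanged' and remember the fingerprint.

    `store` maps record keys to the fingerprint seen at the last transfer.
    """
    current = fingerprint(record)
    previous = store.get(key)
    store[key] = current
    if previous is None:
        return "new"
    return "unchanged" if previous == current else "updated"
```

The same comparison works whether the data are uploaded by the participant or collected by a central robot, since it depends only on the content received.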

For obvious reasons, the LHI should be linked to the IALHI Website. The IISH is prepared to take on this task, provided sufficient financial means can be found for installing and maintaining the system.

Search Facilities

We are interested in attracting not just a maximum number of participants, but also the largest possible number of users. The LHI search interface should therefore conform to what has become almost a standard on the Web:

  • Differentiation between 'simple search' and 'advanced search' capabilities. In the first case, the user enters one or more search terms; in the second, a series of separate data-elements can be combined.
  • Inside the 'simple search' option, differentiation between 'keyword' and 'browse' options. In the first case, all words in all records are searched; in the second, the user can choose from an alphabetical list of titles and/or record creators.

In all cases, the search results will lead users to the Webservers of the participating institutions (or to their email addresses if they so require).
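The three search modes described above can be sketched in a few lines of Python. This is a deliberately naive illustration that works on an in-memory list; a real system would use a database or a full-text index. The function names and the 'url' element, which stands for the link back to the participating institution, are assumptions.

```python
from typing import Dict, List

Record = Dict[str, str]


def keyword_search(records: List[Record], terms: str) -> List[Record]:
    """Simple search: every term must occur somewhere in the record."""
    wanted = [term.lower() for term in terms.split()]
    hits = []
    for record in records:
        # Search all elements except the link back to the institution.
        text = " ".join(v for k, v in record.items() if k != "url").lower()
        if all(term in text for term in wanted):
            hits.append(record)
    return hits


def browse(records: List[Record], element: str = "title") -> List[str]:
    """Browse: an alphabetical list of titles or record creators."""
    values = {record[element] for record in records if record.get(element)}
    return sorted(values, key=str.lower)


def advanced_search(records: List[Record], **criteria: str) -> List[Record]:
    """Advanced search: combine conditions on separate data elements."""
    return [
        record
        for record in records
        if all(
            value.lower() in record.get(name, "").lower()
            for name, value in criteria.items()
        )
    ]
```

Each hit still carries its 'url' element, so that the result display can send the user on to the Website of the institution holding the collection.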