GeneaBloggers

Friday, 20 September 2013

Where are the Standards for Historical Data?



We all accept the need for standards – standards of measurement, electrical and mechanical components, information representation, etc. How many people have noticed, though, that there are no standards for historical data? I will explore this with you, and even give a couple of concrete examples of omissions, before considering why historians might be ignored.

A quick computer search for “historical data” in the context of computers and standards leads you into transaction processing systems (TPS) and the records associated with past transactions, which isn’t what we want.

The normally very helpful Cyndi’s List presents a very meagre list, including FHISO (which we’ll come to again shortly), GenContent, historical-data.org, and GenTech.

A glance at the available List of ISO Standards confirms that existing international standards relate to industry, technology (incl. IT), science, consumer products, foodstuffs, and documentation. There are none for history or historical data. The closest example to an historical standard there appears to be ISO 3166-3 which describes codes for old country names. However, this is only those countries that have been deleted since the main ISO 3166-1 country-code standard was first published in 1974. Anything older than that is not included. A look at ANSI’s Web site reveals a page on the history of standards but not standards for history.

You might be about to point out the MARC (MAchine-Readable Cataloging) standards. However, this set of digital formats, developed at the US Library of Congress during the 1960s, is for the description of items catalogued by libraries. Similarly, the METS standard (Metadata Encoding and Transmission Standard) relates to the encoding of descriptive, administrative, and structural metadata for objects within a digital library. These standards are both inward-facing and relate to the cataloguing and organisation within archives and digital libraries. The Open Archives Initiative is in a similar vein as it relates to interoperability between those archives and digital libraries.

OK, so what exactly am I looking for? Well, international standards for the unambiguous representation and exchange of data relating to historical entities by software agents. Not just for items held in a repository.

In all honesty, the only standard along these lines that I’m aware of is the Unicode character set standard. Version 2.0, which was released in 1996, included a multi-word mechanism, otherwise known as surrogate pairs, to remove the restriction of 16-bit character codes (i.e. the limitation of 64k characters). This allowed it to represent characters from historical languages, including Egyptian Hieroglyphs, and this is all now part of ISO/IEC 10646.
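For anyone curious how the surrogate-pair mechanism works, here is a minimal Python sketch of the arithmetic. The function name is my own for illustration and is not part of any library. The hieroglyph used, U+13000, is the first code point in the Egyptian Hieroglyphs block.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the 16-bit range need a pair")
    offset = code_point - 0x10000          # 20 bits remain after removing the base
    high = 0xD800 + (offset >> 10)         # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits go into the low surrogate
    return high, low


hieroglyph = "\U00013000"  # EGYPTIAN HIEROGLYPH A001
high, low = to_surrogate_pair(ord(hieroglyph))
print(f"U+{ord(hieroglyph):04X} -> {high:04X} {low:04X}")  # U+13000 -> D80C DC00
```

The result can be checked against Python's own encoder, since `hieroglyph.encode("utf-16-be")` yields the same four bytes.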

Let me briefly describe two example voids that could do with some help in this direction:

Place types

When we exchange place references, we want to know what type of place they are (see A Place for Everything). ISO 3166-1 only defines codes for present-day countries, so how would we describe America before it was the United States? It would be wrong to apply the modern US tag as they’re not synonymous. Also, ISO 3166-2 defines codes for the names of the principal present-day subdivisions of the countries in ISO 3166-1 (e.g. provinces or states). This does not include old subdivisions such as Shires in the UK although they are still historically relevant. There is a similar standard to ISO 3166-2 developed independently by the European Union and called the Nomenclature of Units for Territorial Statistics (NUTS). This has the same issues with historical entities.
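As a rough illustration of the problem, the Python sketch below attaches a place type and a period of validity to a place reference, and only offers the ISO 3166-1 code when the year in question falls inside that period. Everything here is hypothetical: the class, the field names, and the "country" label are not drawn from any existing standard, and using 1776 as the start date for the United States is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlaceReference:
    """A hypothetical place reference with a type and a period of validity."""
    name: str
    place_type: str                    # illustrative label, not a standard vocabulary
    valid_from: Optional[int] = None   # first year the entity existed, if known
    valid_to: Optional[int] = None     # last year it existed, None if still current
    iso_3166_1: Optional[str] = None   # modern country code, where one exists

    def applies_in(self, year: int) -> bool:
        """True if the entity existed in the given year."""
        started = self.valid_from is None or year >= self.valid_from
        not_ended = self.valid_to is None or year <= self.valid_to
        return started and not_ended


def iso_code_for(place: PlaceReference, year: int) -> Optional[str]:
    """Return the modern code only when it is valid for that year."""
    return place.iso_3166_1 if place.applies_in(year) else None


usa = PlaceReference("United States", "country", valid_from=1776, iso_3166_1="US")
print(iso_code_for(usa, 1900))  # US
print(iso_code_for(usa, 1700))  # None
```

For an event in 1700 the sketch returns no code at all, which is exactly the gap: there is nothing standard to put in its place.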

Calendars

The ISO 8601 standard was first published by ISO in 1988 and concerns the exchange of dates and times from the Gregorian calendar. It does not support any other calendar system. There are many other calendar systems, though, and it would be wrong to assume that every date in every calendar has a unique and unambiguous representation in the Gregorian system. It’s an issue of preserving the integrity of the evidence, and not being forced to mangle it in order to suit some modern standard. In fact, this particular case is even more important because several of those alternative calendars are used to this day – they’re not all obsolete and archaic. There’s therefore an additional cultural dimension to this.
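One way to respect the evidence is to store the date exactly as recorded, tagged with its calendar, and to derive a Gregorian equivalent only when a comparison is needed. The Python sketch below does this for the Julian calendar using the well-known Julian Day Number arithmetic. The `RecordedDate` class and its calendar labels are my own assumptions, not part of any standard. It also ignores complications such as the Old Style year beginning on 25 March, and it covers only two calendars.

```python
from dataclasses import dataclass
from datetime import date

# Offset between a Julian Day Number and Python's proleptic Gregorian ordinal
# (date.fromordinal(1) is 0001-01-01, whose Julian Day Number is 1721426).
JDN_TO_ORDINAL = 1721425


@dataclass(frozen=True)
class RecordedDate:
    """A date exactly as written in the source, tagged with its calendar."""
    calendar: str  # "julian" or "gregorian"; hypothetical labels
    year: int
    month: int
    day: int

    def julian_day_number(self) -> int:
        """Day count that is independent of the calendar used."""
        a = (14 - self.month) // 12
        y = self.year + 4800 - a
        m = self.month + 12 * a - 3
        days = self.day + (153 * m + 2) // 5 + 365 * y + y // 4
        if self.calendar == "julian":
            return days - 32083
        if self.calendar == "gregorian":
            return days - y // 100 + y // 400 - 32045
        raise NotImplementedError(f"no conversion for calendar {self.calendar!r}")

    def as_gregorian(self) -> date:
        """Derived Gregorian equivalent; the recorded values stay untouched.

        Limited to the range of datetime.date (years 1 to 9999).
        """
        return date.fromordinal(self.julian_day_number() - JDN_TO_ORDINAL)


recorded = RecordedDate("julian", 1752, 9, 3)
print(recorded.as_gregorian())  # 1752-09-14
```

The example date reflects the British changeover of 1752, when the Julian and Gregorian calendars were eleven days apart. Calendars without such a simple mapping would raise an error here rather than be silently coerced, which is the behaviour I would want from any real standard too.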


FHISO (Family History Information Standards Organisation) was created with the goal of looking after (developing, maintaining, collaborating on) digital standards that affect genealogy. They are careful to treat the terms genealogy and family history in an equal way, although I would prefer the wider term of micro-history, especially as they have been involved in discussions with groups from that wider sphere. However, the even bigger sphere of generic history is outside of their remit. So does anyone look after that area for historians? Is there any collaboration for our common good?

There is an International Classification for Standards (ICS) with categories into which new standards can be placed. Not surprisingly, there are none for historical data. This would mean any new standard relating to genealogy, for instance, would have to be placed in a general catch-all category (e.g. 35.240.99). This might be acceptable if genealogy were an isolated case but I’ve already suggested that it is part of a much bigger category – one deserving of its own designation.

The focus on modern technology and business requirements implicitly assumes that historical data is no longer exchanged in a live fashion. It is simply “dead data” consigned to some archive or library. Clearly this is not the case and our genealogical pursuits are a prime example.

Schemes such as historical-data.org, which employ microdata to attach semantics to historical data in HTML pages, are another example. The Semantic Web will involve historical and genealogical data but it must be reliant on international standards to represent historical entities.

We can’t change historical data to fit new standards – relevant standards must represent what we know about history, and as it was. They must also acknowledge that history isn’t as tangible as modern information since evidence is both finite and disjointed, and often supplemented by subjective conclusions. This should make sense to historians and genealogists, and so they have an obligation to educate and collaborate with those who would make standards for us.