Monday, 3 March 2014

Supporting a Proof Standard

Should our computer software support a single standard of genealogical proof, and if so then should it be part of the data that we share?

The question of whether a genealogical data model should support a single standard of proof came up recently on the RootsDev developer group, and the consensus was ‘No’. This may come as a shock to some people but bear with me while we examine the context more deeply.

The Genealogical Proof Standard (GPS) is held in high regard amongst genealogists, not just as a standard of professionalism but also as evidence of their credibility in academic circles. It would therefore seem reasonable to expect software to support it, and this has certainly been recommended before[1]. The crux of the issue is not that software shouldn’t support it, but that it should not involve a mandatory and standardised representation in a data model designed for sharing.

Although I’m not aware of any alternatives to GPS in the field of genealogy, other fields do have their own “standards” of proof. This obviously includes mathematics, but other investigative fields, too, such as CSI, archaeology, and heir hunting. Whether this is significant to us is debatable, but what is clearly significant is that the application of the GPS is not prescribed by the standard itself. What I mean by that is that the process by which a user addresses the requirements of the GPS (the five elements), and how a product keeps track of the user’s evidence and its applicability, would be done differently by different products. An entry-level product may take a very literal approach which could result in it appearing more laborious, but a sophisticated product could make it more palatable through better insight and visualisation. This has to be the prerogative of the product designer, and any attempt to impose a straitjacket through standardisation would be the death of that standard.

Indeed, a product may not be concerned with the GPS at all if the context is inappropriate, or if its commercial viability would be in doubt. While we can recommend the GPS, and encourage it through training and qualifications, a heavyweight product may not have the mass-market appeal to make it viable.

I briefly covered this topic in some of my early thoughts on standardisation in Musings on Standardisation, but my lack of clarity there caused at least one serious rift between a colleague and me. The difference, you see, is that those thoughts, and those in the recent RootsDev thread, are talking about the data model used to exchange data, and not about the software products that we use. I’ve just explained why the designers and vendors of software must be given free rein to innovate as they choose. The data that those products share may or may not include the meta-data associated with a particular research process.

Now this is going to raise a whole bunch of questions and potential responses so let me just recap on things. Our core data will involve both evidence and conclusions, and so it must indicate how they relate to each other — independently of the research process used by the creator. Any data used to support a specific research process is technically meta-data because it’s an aid to the research rather than a product of the research. It doesn’t matter whether we consider our data to be lineage-linked, evidence-based, event-based, or all of the above (as I do).

So what impact does this have on a standardised data model used for exchange? In computer programming, there is a pragma concept which involves additional statements being associated with the computer code. These statements are designed purely to direct a software program how to process the computer code. They are not part of the code’s language, or its grammar, and those pragma statements are specific to different programs. This situation is actually quite similar to the genealogical case since the core data is independent of such meta-data, being quite valid without it, and that meta-data would be specific to a given product. If the two contributions are distinct then we have the potential benefits of both worlds. We can share our core data (including lineage, events, places, sources, transcriptions, attachments, etc) with anyone, and optionally share our additional research status with someone using the same product.
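To make that separation a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `Record` class, the `export_for` function, the product name "ProductA", and the sample values are all invented for the example and do not come from any real data model or product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One item of core data plus optional product-specific meta-data."""
    core: dict
    # Keyed by a hypothetical product identifier. The core data never
    # depends on anything stored here, so it stays valid without it.
    annotations: dict = field(default_factory=dict)


def export_for(record: Record, product: Optional[str] = None) -> dict:
    """Return the core data, plus only the meta-data that the
    recipient's product (if any) would understand."""
    shared = {"core": dict(record.core)}
    if product is not None and product in record.annotations:
        shared["annotations"] = {product: record.annotations[product]}
    return shared


# Invented sample data, purely for illustration.
birth = Record(
    core={"event": "Birth", "person": "John Smith", "date": "1851-03-30"},
    annotations={"ProductA": {"research_status": "conflicting evidence unresolved"}},
)

for_anyone = export_for(birth)                   # core data only
for_same_product = export_for(birth, "ProductA")  # core plus research status
for_other_product = export_for(birth, "ProductB")  # core data only
```

The point of the design is that the recipient never needs the annotations in order to make sense of the core: a different product simply receives less, not something broken.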

So is this possible? In fact, it is quite easy since the XML data syntax routinely uses XML namespaces to differentiate different grammars in the same data contribution. Although I tried to give an introduction to the benefits of this feature in Digital Freedom, it’s still quite technical. You might imagine it as a foreign-language annotation that someone has added to someone else’s text in order to explain the context of selected words and phrases. You could share the combined work with someone, or filter out the foreign annotation and leave the original work. Although this namespace concept is mostly associated with XML, it is actually something that could be employed with any data syntax, if necessary, including a GEDCOM-like one.
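As a rough illustration of the filtering idea, the following Python sketch uses the standard library's `xml.etree.ElementTree`. The two namespace URIs, the element names (`Dataset`, `Person`, `Name`, `ProofStatus`), and the `strip_namespace` helper are all hypothetical, chosen for the example rather than taken from any real genealogical format.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: one for the shared core grammar,
# one for a single product's research meta-data.
CORE_NS = "http://example.org/core"
VENDOR_NS = "http://example.org/product-a"

DOC = f"""<Dataset xmlns="{CORE_NS}" xmlns:pa="{VENDOR_NS}">
  <Person Key="p1">
    <Name>John Smith</Name>
    <pa:ProofStatus>exhaustive search incomplete</pa:ProofStatus>
  </Person>
</Dataset>"""


def strip_namespace(root, namespace):
    """Remove, in place, every element that belongs to the given namespace."""
    prefix = "{" + namespace + "}"
    # Snapshot the elements first so that removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.startswith(prefix):
                parent.remove(child)
    return root


root = ET.fromstring(DOC)
strip_namespace(root, VENDOR_NS)

# Only elements from the core grammar are left.
remaining = [element.tag for element in root.iter()]
```

This sketch only removes whole elements; a fuller version would also need to deal with attributes that carry the product's namespace.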

[1] Mark Tucker, “10 Things Genealogy Software Should Do”, Family History Technology Workshop (Provo, UT: Brigham Young University, 2008), (accessed 26 Dec 2015).