Wednesday, 15 November 2017

Thither FHISO



I want to temporarily break my blogging hiatus to summarise the progress of everyone’s genealogical friend: FHISO.

There can’t seriously be anyone in the community who hasn’t heard of FHISO (Family History Information Standards Organisation), but how many of you might have written it off? If so then look again! I want to raise awareness of its recent substantial progress, and to challenge pundits to evaluate its relevance.


History

To put this article in context, let’s just wind things back to 2010. Pat Richley-Erickson (alias DearMYRTLE), Greg Lamberson, and Russ Worthington had become so fed up with the problems of sharing basic genealogical data that they created the BetterGEDCOM wiki, with the goal of producing an internationally applicable standard for the sharing and long-term storage of genealogical data.

Although this wiki garnered huge support (134 members within its first year, contributing over 3,000 pages and 8,500 discussion posts), no actual standard emerged. The reasons for this were manifold: there was no real structure or assigned responsibilities in the membership, the goal was too poorly defined (what genealogical scope? what level of backwards compatibility?), and there was no technical strategy (what technologies? what file formats?). As a result, discussions, valuable as they were, became mired in minutiae, no consensus was reached, and nothing was formally written up.

Early in 2012, a small group of BetterGEDCOM members formed FHISO with the goal of overcoming these failings. They spent considerable effort designing an organisation that would accommodate a large and diverse number of contributors, and in planning for consensus-building and digital organisation.

In April 2012, it received a grant from genealogist Megan Smolenyak to help get the organisation started, and during the remainder of 2012 it pulled off an incredible coup by getting industry support from the following high-profile Founding Members (in chronological order of announcement):


During the summer of 2012, there were a number of blog-posts related to GEDCOM-X and to FHISO, including those of Louis Kessler (Whither GEDCOM-X?, 7 Jun 2012), Randy Seaver (Whither FHISO and GEDCOM X? Observations and Commentary, 18 Jul 2012), Tamura Jones (FHISO and GEDCOM X, 18 Jul 2012), and Pat Richley-Erickson (Whose sandbox is it anyway?, 19 Jul 2012). Randy’s subsequent Follow-Up Friday (20 Jul 2012) provided a more complete summary.

These posts were mostly concerned with proprietary versus community standards. GEDCOM-X was new and people believed that it would be a competing de facto standard — exacerbated by the fact that FamilySearch weren’t in the list of Founding Members, above. This turned out to be ill-founded paranoia (more on this in a moment), but even FHISO was defensive and quietly concerned, as can be inferred from GeneJ’s contribution to Randy’s summary.

During March 2013, in order to try and keep membership attention in the absence of technical work, FHISO began its Call For Papers initiative. The idea was that people could send in their ideas and proposals in preparation for more consensus-based work. Although quite a few papers were submitted, and on a wide range of topics, the number of distinct submitters was small — possibly an unrecognised warning to FHISO that few people would find the time or inclination if the effort was onerous.

The investigations into software tools for the burgeoning organisation showed that good ones were too expensive for FHISO and the cheap (or free) ones didn’t deliver what was needed, and this work dragged on for too long. Over the years, Board members have had to reach deep into their personal pockets to help move the organisation to the point where it would be fair for members to subscribe for another year (original memberships have been continually extended, for free, since August 2014).

During 2013–14, there were a number of team changes, and it would be a fair criticism to say that FHISO dropped the ball during these changes. There was little visible activity to people outside of FHISO (as explained by Tamura: Genealogy 2013: events & trends, 31 Dec 2013) and so it was to be expected that the community would lose interest in it.

During 2014, FHISO finally established its TSC (Technical Standing Committee), and so began the real technical work. This included the creation of the TSC-Public mailing list, and the creation of several exploratory groups, each of which had its own mailing list. However, the mailing lists demonstrated the same issues that were previously experienced in BetterGEDCOM: topics digressed and discussions meandered without formal conclusions. Such discussions must necessarily resort to a bewildering and ever-evolving technical vocabulary, and software people generally find it hard to explain their concepts in familiar terms without losing technical accuracy. It was truly amazing, therefore, that some well-known non-software genealogists participated, and I genuinely take my hat off to those who succeeded in balancing the discussions with real-world genealogical issues.

Quandary

When the flurry of posts on these mailing lists began to fizzle out, and the exploratory groups all floundered, FHISO took a deep breath and a long look at the reality of standards development. If it was going to achieve its goals then it needed to better understand why other initiatives had failed, and it clearly needed to adopt a quite different approach.

It became evident that the hardest part of standards development was not the technical side but the commercial and/or political side. Creating a new data representation has some technical challenges, but it’s doable; there were several examples out there, ranging from the old GEDCOM to more recent data models, file formats, database schemas, and APIs, all coming from a range of commercial products and private research projects. But despite its age and deprecated status, GEDCOM was still the most widely used way of exchanging data. There were those who believed that the industry could stay with GEDCOM, and that its problems and equivocality could be fixed in a new version. Then there were those who believed that this would restrict the evolution of genealogy, and that we must leapfrog lineage-linked data to include non-person subjects of history, or to integrate real research-based narrative. In reality, neither of these viewpoints was entirely correct, … or incorrect.

If a new data representation were to be produced — even just an updated GEDCOM — then it would be unlikely that the industry would immediately embrace it for the simple fact that commercial stakeholders would have a financial commitment to their current internal data models, and to their supported modes of import/export. Companies and their products would have evolved along with their internal data model; any new data model with larger scope — no matter how powerful or modern — would have no impact if it required companies to abandon their existing products and to start again. For instance, taking advantage of the powerful analogy between persons and places as historical subjects would not help companies whose import/export was entirely via GEDCOM, or whose internal data models had not recognised the analogy. Data models such as GEDCOM-X were designed around the specific requirements of the parent organisation, and not as future-proofed models to be shared by all genealogical software.

Another issue was that there were many stakeholders out there (including every user who simply wanted to exchange data without error or loss), but fewer people prepared to openly contribute on mailing lists, and fewer still who had the time and skills to produce formal written material. FHISO’s impressive Founding Members seemed content to sit on the sidelines, and there was little (if any) engagement with them following the original announcements.

This was a tough problem. There was no doubt that the industry needed not just an open standard but an evolutionary path: one that would permit ‘software genealogy’ to mature, and to become part of the modern digital world. However, there was clearly some apathy to doing the heavy lifting, and there could be later resistance to anything too radical.

If ever the term catch-22 found its true mark then it was in the field of software genealogy.

The first thing to be done was to modify the organisational structure to one more appropriate to the reach of genealogical standards — FHISO was not a general-purpose ISO or ANSI. FHISO already had the concept of an Extended Organisational Period (EOP), now embodied in article 24 of its by-laws, during which the membership would be populated (including the Founding Members), new officers appointed and roles filled, and the TSC established. The EOP also allowed the Board to amend the by-laws as necessary, and without the need for an Annual General Meeting (AGM). It may have been envisaged that the EOP would only last for maybe six months, but it was still in effect at this time of change.

From a technical perspective, FHISO needed a focus: something concrete that could be debated, cited, and built upon. It would therefore be necessary for a core of dedicated people to form a Technical Project Team (as allowed by the TSC Charter) to establish a technical strategy and to publish a selection of draft component standards in order to kick-start subsequent work.

These processes could all be done within the remit of the EOP, but FHISO would have to keep the membership (and the public) informed; certain organisations would not acknowledge any third-party standard unless it was developed through a proper transparent process by an incorporated organisation. FHISO does publish regular Board minutes and TSC minutes, but it would only be allowed to publish draft standards for comment during this period (not official standards) since there would be no voting mechanism. Until this work had reached an acceptable level, and elections and AGMs could be resumed, it was deemed inappropriate to require existing members to pay for each year.

During this phase, FHISO produced a technical strategy paper that expanded on these points, and a policy document on the preferred nature of software vocabularies. The vocabularies document was updated during Feb/Mar 2016 to incorporate public feedback.

Wheel Hubs

Part of FHISO’s technical strategy was to focus on what might be called component standards: standards related to specific parts of genealogical data (e.g. personal names, citation elements, place references, dates), and allow these to be integrated into existing data models. This would not preclude the future publishing of a single FHISO data model that embraced all of these components, but for the shorter term it would allow existing data models to incorporate them more quickly, while minimising the impact on their core software. The basis for this was that, within certain limits, it should be possible to have distinct proprietary data models cooperating if they shared a common currency.

FHISO had previously used the analogy of car design to explain this strategy; rather than standardise what cars we can drive, there would be benefit in standardising the parts from which all cars are made. Well, there’s a real instance of this that can be cited: all the cars around the world (with a few exotic examples that can be ignored) share the same wheel hub sizes. There is a standard set of accepted sizes, and they’re all measured in Imperial units — even in countries that use the metric system. This means that the same range of tyres can be used for all our cars, no matter which model or where it was manufactured.

The plan, therefore, was to work on a number of these component standards, each of which would include details of how it should be integrated into existing data models (the so-called “bindings”). But there was a problem here: GEDCOM-X could never supplant GEDCOM because there were probably millions of GEDCOM files still out there, and software products that were tied to the GEDCOM model. These problems for FamilySearch effectively mirrored those of FHISO’s standardisation effort, but for the component standards to work, a version of GEDCOM was required that could be taken forwards. If those companies that were bound to GEDCOM were not to be left behind then there had to be a new version of it, one for which bindings could be defined for FHISO’s new component standards. The two initial data models of interest, therefore, would be GEDCOM and GEDCOM-X.

The following diagram illustrates how these component standards would be assimilated by the various data models, including a supported GEDCOM continuation (shown here as “ELF”).

Figure 1 – FHISO Component Standards.

FHISO ELF

GEDCOM hasn’t been updated in decades, and there are acknowledged weaknesses and ambiguities in its specification. Furthermore, the name is still the property of FamilySearch.

FHISO would, therefore, define a fully compatible format called Extended Legacy Format, or ELF for short. ELF v1.0 would be compatible with GEDCOM 5.5(.1), such that ELF could be loaded by a GEDCOM processor, and vice versa. This means not only that stakeholders could declare support for ELF v1.0 without too much effort, but also that there would be no reasonable excuse for not declaring support.

Of all FHISO’s draft standards, ELF is probably the most important since it presents a future for GEDCOM data and software, a future that would support enhanced movement of data both between compliant products and between differing proprietary data models.

Figure 2 – FHISO ELF.[1]

As well as being a supported and more tightly specified version of GEDCOM, ELF would include an extension mechanism that would be employed in later versions to embrace the FHISO component standards, and to accommodate any third-party extensions through the use of proper namespaces.

During the preparation of the first ELF draft, FHISO engaged with members of the German group GEDCOM-L, which represents over twenty genealogical programs in Germany. Their goal since 2009 has been to reach agreement on the interpretation of the GEDCOM 5.5.1 specification, and to extend it to include a number of “user-defined tags”. For instance, high among their priorities was support for the German Rufname, or “appellation name”, which is an everyday form of personal name.

FHISO intends to utilise the knowledge and experience of the GEDCOM-L group in making a better GEDCOM.
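To make the idea of “user-defined tags” a little more concrete, here is a minimal sketch of reading GEDCOM-style lines and picking out the tags that begin with an underscore, which is how GEDCOM 5.5.1 distinguishes user-defined tags from standard ones. The sample record, the `_RUFNAME` tag as shown, and all function and class names are assumptions made for illustration; this is not taken from any FHISO or GEDCOM-L document, and it ignores details such as character sets and continuation lines.

```python
import re
from typing import List, NamedTuple, Optional

# A GEDCOM 5.5.1 line is: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]

    @property
    def is_user_defined(self) -> bool:
        # User-defined tags are those that start with an underscore.
        return self.tag.startswith("_")


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM-style line into its parts."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM-style line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


def user_defined_tags(text: str) -> List[str]:
    """Return the distinct user-defined tags found in the text, sorted."""
    lines = (parse_line(raw) for raw in text.splitlines() if raw.strip())
    return sorted({line.tag for line in lines if line.is_user_defined})


# Invented sample record; the placement of _RUFNAME is illustrative only.
SAMPLE = """0 @I1@ INDI
1 NAME Johann Friedrich Wilhelm /Schmidt/
2 _RUFNAME Wilhelm
1 SEX M"""

if __name__ == "__main__":
    print(user_defined_tags(SAMPLE))
```

A program that does not recognise a given underscore tag can still read the rest of the record, which is why agreeing on the meaning of such tags matters for loss-free exchange.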

Milestones and Signposts

Industry contacts had identified a citation-element vocabulary (a representation of the discrete elements of data within citations) as filling an important niche in today’s standards, and so this became the focus of the first component standard.

In 2016, the Technical Project Team began to draft possible standards text, releasing an early draft micro-format for a citation-element ‘creator name’ for comment in the spring. During June 2017, FHISO was able to publish a number of high-quality draft standards for public comment, and during September it incorporated public feedback from its TSC-Public mailing list.

This milestone puts FHISO ahead of all previous standards initiatives! The level of detail and accuracy in these drafts, combined with the choice of technologies, establishes a future-proof model that could take genealogical data as far as is needed, and so it sets the bar for all future FHISO work.

The next phase will involve releasing draft citation-element bindings for both GEDCOM-X and GEDCOM. Already released is a draft bindings document for RDFa. Rather than being a genealogical data model, RDFa defines a set of attribute-level extensions to HTML. This is especially interesting as it allows pre-formatted citations to have their embedded elements marked up. The industry norm is to first define the individual citation elements as discrete items, and then rely on some citation template system to build them into a formatted citation. Traditional genealogists, and anyone who prefers to hand-craft their own citations (including me), should welcome this inverted alternative as it recognises the power and flexibility of citations as sentences rather than formulae.
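As a rough illustration of this inverted approach, the sketch below takes a hand-written citation sentence whose elements have been marked up with the RDFa `property` attribute, and recovers the discrete elements from it. The citation itself, the vocabulary URL, and the property names (`creatorName`, `title`, `publicationDate`) are all invented for the example and are not taken from the FHISO drafts; the parser is also deliberately simple and is no substitute for a real RDFa processor.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class CitationElementParser(HTMLParser):
    """Collect (property, text) pairs from elements carrying a 'property' attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, str]] = []
        # Stack of currently open elements: (tag, property or None, text parts).
        self._open: List[Tuple[str, Optional[str], List[str]]] = []

    def handle_starttag(self, tag, attrs):
        self._open.append((tag, dict(attrs).get("property"), []))

    def handle_data(self, data):
        # Text belongs to every open element that carries a property.
        for _, prop, parts in self._open:
            if prop is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag not in [open_tag for open_tag, _, _ in self._open]:
            return  # stray end tag; ignore it
        while self._open:
            open_tag, prop, parts = self._open.pop()
            if prop is not None:
                self.elements.append((prop, "".join(parts).strip()))
            if open_tag == tag:
                break


def extract_citation_elements(html: str) -> List[Tuple[str, str]]:
    parser = CitationElementParser()
    parser.feed(html)
    parser.close()
    return parser.elements


# Invented citation; vocabulary URL and property names are hypothetical.
CITATION_HTML = (
    '<p vocab="https://example.org/citation-elements/">'
    '<span property="creatorName">Jane Doe</span>, '
    '<i property="title">A History of Example Parish</i> '
    '(<span property="publicationDate">1901</span>).'
    "</p>"
)

if __name__ == "__main__":
    for name, value in extract_citation_elements(CITATION_HTML):
        print(name, "=", value)
```

The point of the design is that the sentence stays exactly as its author wrote it, punctuation and all, while software can still find the creator, title, and date inside it.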

Affiliation

Although currently acting chairman on the FHISO Board, I write this article as someone who has always believed that standardisation absolutely must happen in our field. It borders on hypocrisy that users are expected to collaborate on unified trees, and to play fair with each other, when the large organisations have been unable to set a precedent with their data sharing.




[1] Original base image used with kind permission of SuperColoring (http://www.supercoloring.com/drawing-tutorials/how-to-draw-a-christmas-elf : accessed 26 Jun 2017).