I want to temporarily break my blogging hiatus to summarise
the progress of everyone’s genealogical friend: FHISO.
There can’t seriously be anyone in the community who hasn’t
heard of FHISO (Family History Information
Standards Organisation), but how many of you might have written it off? If so
then look again! I want to raise awareness of its recent substantial progress,
and to challenge pundits to evaluate its relevance.
To put this article in context, let’s just wind things back
to 2010. Pat Richley-Erickson (alias DearMYRTLE), Greg Lamberson, and Russ
Worthington had become so fed up with the problems of sharing basic
genealogical data that they created the BetterGEDCOM wiki: its goal being to
produce an internationally applicable standard for the sharing and long-term
storage of genealogical data.
Although this wiki garnered huge support (134 members
within its first year, contributing over 3,000 pages and 8,500 discussion posts),
no actual standard emerged. The reasons for this were manifold: there was no
real structure or assigned responsibilities in the membership, the goal was too
poorly defined (what genealogical scope? what level of backwards
compatibility?), and there was no technical strategy (what technologies? what
file formats?). As a result, discussions — valuable as they were — became mired
in minutiae, no consensus was reached, and nothing was formally written up.
Early in 2012, a small group of BetterGEDCOM members formed
FHISO with the goal of overcoming these failings. They spent considerable
effort designing an organisation that would accommodate a large and diverse
number of contributors, and in planning for consensus-building and digital
organisation.
In April 2012, FHISO received
a grant from genealogist Megan
Smolenyak to help get the organisation started, and during the remainder of
2012 it pulled off an incredible coup by getting industry support from the
following high-profile Founding Members (in chronological order of announcement):
- Ancestry.com (May 2012)
- RootsMagic (July 2012)
- WikiTree (August 2012)
- ourFamily•ology (September 2012)
- Calico Pie — UK (September 2012)
- Coret Genealogie — Netherlands (October 2012)
- Federation of Genealogical Societies (FGS) (October 2012)
- Federation of Family History Societies (FFHS) — UK (December 2012)
- brightsolid (owner of findmypast) — UK (December 2012)
- Mocavo (January 2013)
- Eneclann — Ireland (May 2013)
During the summer of 2012, there were a number of blog-posts
related to GEDCOM-X and to FHISO, including those of Louis Kessler (Whither GEDCOM-X?, 7 Jun
2012), Randy Seaver (Whither
FHISO and GEDCOM X? Observations and Commentary, 18 Jul 2012), Tamura Jones
(FHISO and GEDCOM
X, 18 Jul 2012), and Pat Richley-Erickson (Whose
sandbox is it anyway?, 19 Jul 2012). Randy’s subsequent Follow-Up
Friday (20 Jul 2012) provided a more complete summary.
These posts were mostly concerned with proprietary versus
community standards. GEDCOM-X was new and people believed that it would be a
competing de facto standard —
exacerbated by the fact that FamilySearch weren’t in the list of Founding
Members, above. This turned out to be ill-founded paranoia (more on this in a
moment), but even FHISO was defensive and quietly concerned, as can be inferred
from GeneJ’s contribution to Randy’s summary.
During March 2013, in order to try and keep membership
attention in the absence of technical work, FHISO began its Call For Papers initiative. The idea
was that people could send in their ideas and proposals in preparation for more
consensus-based work. Although quite a few papers were submitted, and on a wide
range of topics, the number of distinct submitters was small — possibly an
unrecognised warning to FHISO that few people would find the time or
inclination if the effort was onerous.
The investigations into software tools for the burgeoning
organisation showed that good ones were too expensive for FHISO and the cheap
(or free) ones didn’t deliver what was needed, and this work dragged on for too
long. Over the years, Board members have had to reach deep into their personal
pockets to help move the organisation to the point where it would be fair for
members to subscribe for another year (original memberships have been
continually extended, for free, since August 2014).
During 2013–14, there were a number of team changes, and it
would be a fair criticism to say that FHISO dropped
the ball during these changes. There was little activity visible to people
outside of FHISO (as explained by Tamura: Genealogy 2013: events
& trends, 31 Dec 2013) and so it was to be expected that the community
would lose interest in it.
During 2014, FHISO finally established its TSC (Technical
Standing Committee), and so began the real technical work. This included the
creation of the TSC-Public mailing list,
and the creation of several exploratory groups, each of which had its own
mailing list. However, the mailing lists demonstrated the same issues that were
previously experienced in BetterGEDCOM: topics digressed and discussions
meandered without formal conclusions. Such discussions must necessarily resort
to a bewildering and ever-evolving technical vocabulary, and software people
generally find it hard to explain their concepts in familiar terms without
losing technical accuracy. It was truly amazing, therefore, that some
well-known non-software genealogists participated, and I genuinely take my hat
off to those that succeeded in balancing the discussions with real-world
genealogical issues.
When the flurry of posts on these mailing lists began to
fizzle out, and the exploratory groups all floundered, FHISO took a deep breath
and a long look at the reality of standards development. If it was going to
achieve its goals then it needed to better understand why other initiatives had
failed, and it clearly needed to adopt a quite different approach.
It became evident that the hardest part of standards
development was not the technical side but the commercial and/or political
side. Creating a new data representation has some technical challenges, but
it’s doable; there were several examples out there, ranging from the old GEDCOM
to more recent data models, file formats, database schemas, and APIs, all coming
from a range of commercial products and private research projects. But despite its
age and deprecated status, GEDCOM was still the most widely used way of
exchanging data. There were those who believed that the industry could stay
with GEDCOM, and that its problems and equivocality could be fixed in a new
version. Then there were those who believed that this would restrict the
evolution of genealogy, and that we must leapfrog lineage-linked data to include
non-person subjects of history, or to integrate real research-based narrative.
In reality, neither of these viewpoints was entirely correct, … or incorrect.
If a new data representation were to be produced — even just
an updated GEDCOM — then it would be unlikely that the industry would immediately
embrace it for the simple fact that commercial stakeholders would have a
financial commitment to their current internal data models, and to their
supported modes of import/export. Companies and their products would have
evolved along with their internal data model; any new data model with larger
scope — no matter how powerful or modern — would have no impact if it required
companies to abandon their existing products and to start again. For instance,
taking advantage of the powerful analogy between persons and places as
historical subjects would not help companies whose import/export was entirely
via GEDCOM, or whose internal data models had not recognised the analogy. Data
models such as GEDCOM-X were designed around the specific requirements of the
parent organisation, and not as future-proofed models to be shared by all
genealogical software.
Another issue was that there were many stakeholders out
there (including every user who simply wanted to exchange data without error or
loss), but fewer people prepared to openly contribute on mailing lists, and
fewer still who had the time and skills to produce formal written material.
FHISO’s impressive Founding Members seemed content to sit on the sidelines, and
there was little (if any) engagement with them following the original
announcements.
This was a tough problem. There was no doubt that the
industry needed not just an open standard but an evolutionary path: one that
would permit ‘software genealogy’ to mature, and to become part of the modern
digital world. However, there was clearly some apathy towards doing the heavy lifting, and there could be later
resistance to anything too radical.
If ever the term
catch-22 found its true mark then it was in the field of software genealogy.
The first thing to be done was to modify the organisational
structure to one more appropriate to the reach of genealogical standards —
FHISO was not a general-purpose ISO or ANSI. FHISO already had the concept of
an Extended Organisational Period (EOP), now embodied in article 24 of its by-laws, during which the membership would
be populated (including the Founding Members), new officers appointed and roles
filled, and the TSC established. The EOP also allowed the Board to amend the
by-laws as necessary, and without the need for an Annual General Meeting (AGM).
It may have been envisaged that the EOP would last only about six months,
but it was still in effect at this time of change.
From a technical perspective, FHISO needed a focus:
something concrete that could be debated, cited, and built upon. It would
therefore be necessary for a core of dedicated people to form a Technical
Project Team (as allowed by the TSC Charter) to establish a technical strategy and
to publish a selection of draft component standards in order to kick-start
subsequent work.
These processes could all be done within the remit of the
EOP, but FHISO would have to keep the membership (and the public) informed; certain
organisations would not acknowledge any third-party standard unless it was
developed through a proper transparent process by an incorporated organisation.
FHISO does publish regular Board minutes
and TSC minutes, but it would only
be allowed to publish draft standards for comment during this period (not
official standards) since there would be no voting mechanism. Until this work
had reached an acceptable level, and elections and AGMs could be resumed, it
was deemed inappropriate to require existing members to pay for each year.
During this phase, FHISO produced a technical strategy paper that
expanded on these points, and a policy document on the preferred nature of
software vocabularies. The vocabularies document was updated during Feb/Mar
2016 to incorporate public feedback.
Part of FHISO’s technical strategy was to focus on what
might be called component standards:
standards related to specific parts of genealogical data (e.g. personal names,
citation elements, place references, dates), and allow these to be integrated
into existing data models. This would not preclude the future publishing of a
single FHISO data model that embraced all of these components, but for the
shorter term it would allow existing data models to incorporate them more
quickly, while minimising the impact on their core software. The basis for this
was that, within certain limits, it should be possible to have distinct
proprietary data models cooperating if they shared a common currency.
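To make the "common currency" idea a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `PersonalName` component, the two toy products, and the binding functions are names I have invented for illustration, and none of them comes from a FHISO draft. The only borrowed convention is the GEDCOM habit of wrapping the surname in slashes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonalName:
    """Hypothetical shared component: the 'common currency'."""
    given: str
    surname: str


# Hypothetical product A keeps a name as one GEDCOM-style string,
# e.g. "John /Smith/" (surname delimited by slashes).
def a_to_component(value: str) -> PersonalName:
    given, _, rest = value.partition("/")
    return PersonalName(given.strip(), rest.rstrip("/").strip())


def component_to_a(name: PersonalName) -> str:
    return f"{name.given} /{name.surname}/"


# Hypothetical product B keeps a name as a dictionary of parts.
def b_to_component(record: dict) -> PersonalName:
    return PersonalName(record["first"], record["last"])


def component_to_b(name: PersonalName) -> dict:
    return {"first": name.given, "last": name.surname}


def a_to_b(value: str) -> dict:
    """Move a name from product A to product B via the shared component."""
    return component_to_b(a_to_component(value))
```

Neither product has to change its internal storage; each only needs a binding to and from the shared component, so adding a third product would mean one new pair of bindings rather than a converter for every pair of products.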
FHISO had previously used the analogy of car design to
explain this strategy; rather than standardise what cars we can drive, there would
be benefit in standardising the parts from which all cars are made. Well,
there’s a real instance of this that can be cited: all the cars around the
world (with a few exotic exceptions that can be ignored) share the same wheel hub
sizes. There is a standard set of accepted sizes, and they’re all measured in
Imperial units — even in countries that use the metric system. This means that
the same range of tyres can be used for all our cars, no matter which model or
where it was manufactured.
The plan, therefore, was to work on a number of these
component standards, each of which would include details of how it should be
integrated into existing data models — the so-called “bindings”. But there was a
problem here: GEDCOM-X could never supplant GEDCOM because there were probably
millions of GEDCOM files still out there, and software products that were tied
to the GEDCOM model. These problems for FamilySearch effectively mirrored those
of FHISO’s standardisation effort, but for the component standards to work, FHISO needed a version of GEDCOM
that could be taken forwards. If those companies that were bound to GEDCOM were
not to be left behind then there had to be a new version of it, one for which
bindings could be defined for FHISO’s new component standards. The two initial data
models of interest, therefore, would be GEDCOM and GEDCOM-X.
The following diagram illustrates how these component
standards would be assimilated by the various data models, including a
supported GEDCOM continuation (shown here as “ELF”).
Figure 1 – FHISO Component Standards.
FHISO ELF
GEDCOM hasn’t been updated in decades, and there are
acknowledged weaknesses and ambiguities in its specification. Furthermore, the
name is still the property of FamilySearch.
FHISO would, therefore, define a fully compatible format
called Extended Legacy Format, or ELF for short. ELF v1.0 would be compatible
with GEDCOM 5.5(.1), such that ELF could be loaded by a GEDCOM processor, and vice versa. This means not only that
stakeholders could declare support for ELF v1.0 without too much effort, but also
that there would be no reasonable excuse for not declaring support.
Of all FHISO’s draft standards, ELF is probably the most
important since it presents a future for GEDCOM data and software, a future
that would support enhanced movement of data both between compliant
products and between differing proprietary data models.
Figure 2 – FHISO ELF.[1]
As well as being a supported and more tightly specified
version of GEDCOM, ELF would include an extension mechanism that would be
employed in later versions to embrace the FHISO component standards, and any third-party
extensions by using proper namespaces.
During the preparation of the first ELF draft, FHISO engaged
with members of the German group GEDCOM-L,
which represents over twenty genealogical programs in Germany. Their goal since
2009 has been to reach agreement on the interpretation of the GEDCOM 5.5.1
specification, and to extend it to include a number of “user-defined tags”. For
instance, high among their priorities was support for the German Rufname, or
“appellation name”, which is an everyday form of personal name.
FHISO intends to utilise the knowledge and experience of the
GEDCOM-L group in making a better GEDCOM.
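As a rough illustration of how user-defined tags and namespaces might fit together, the Python sketch below reads GEDCOM-style lines (level, optional cross-reference, tag, optional value) and resolves each tag to a namespaced identifier. Please treat it as an assumption-laden toy: how ELF actually declares namespaces is not described in this article, both IRIs are placeholders, and the tag spelling `_RUFNAME` is used purely as an example. The one firm convention relied upon is that GEDCOM 5.5.1 user-defined tags begin with an underscore.

```python
import re
from typing import NamedTuple, Optional

# level, optional @xref@, tag, optional value
LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")

# Hypothetical namespace table; both IRIs are placeholders.
DEFAULT_NS = "https://example.org/base/"
EXTENSIONS = {
    "_RUFNAME": "https://example.org/vendor-a/rufname",
}


class Line(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(text: str) -> Line:
    """Split one GEDCOM-style line into its parts."""
    match = LINE.match(text)
    if match is None:
        raise ValueError(f"not a GEDCOM-style line: {text!r}")
    level, xref, tag, value = match.groups()
    return Line(int(level), xref, tag, value)


def resolve_tag(tag: str) -> str:
    """Map a tag to a namespaced identifier.

    Underscore tags must be declared in EXTENSIONS (an undeclared one
    raises KeyError); all other tags fall into the default namespace.
    """
    if tag.startswith("_"):
        return EXTENSIONS[tag]
    return DEFAULT_NS + tag
```

For example, `parse_line("2 _RUFNAME Hans")` yields a tag of `_RUFNAME`, which `resolve_tag` turns into the placeholder IRI above. The point is that two products using the same underscore tag for different purposes would no longer collide once each tag is tied to its own namespace.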
Industry contacts had identified a citation-element
vocabulary — a representation of the discrete elements of data within citations
— as filling an important niche in today's standards, and so this became the
focus of the first component standard.
In 2016, the Technical Project Team began to draft possible
standards text, releasing an early draft micro-format for a citation-element ‘creator name’ for comment in the
spring. During June 2017, FHISO was able to publish a number of high-quality
draft standards for public comment,
and during September it incorporated public feedback from its TSC-Public
mailing list.
This milestone puts
FHISO ahead of all previous standards initiatives! The level of detail and
accuracy in these drafts, combined with the choice of technologies, establishes
a future-proof model that could take genealogical data as far as is needed, and
so it sets the bar for all future FHISO work.
The next phase will involve releasing draft citation-element
bindings for both GEDCOM-X and GEDCOM. Already released is a draft bindings document for RDFa.
Rather than being a genealogical data model, RDFa defines a set of
attribute-level extensions to HTML. This is especially interesting as it allows
pre-formatted citations to have their embedded elements marked up. The industry
norm is to first define the individual citation elements as discrete items, and
then rely on some citation template system to build them into a formatted
citation. Traditional genealogists, and anyone who prefers to hand-craft their
own citations (including me), should welcome this inverted alternative as it
recognises the power and flexibility of citations as sentences rather than
formulae.
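For the technically curious, here is a small Python sketch of that inverted approach: a hand-written citation stays a readable sentence, and the elements are recovered from the mark-up. The `vocab` and `property` attributes are genuine RDFa, but the vocabulary IRI and the property names (`creatorName`, `title`, `publicationDate`) are placeholders of mine rather than terms from the FHISO drafts, and this is a toy extractor rather than a conforming RDFa processor.

```python
from html.parser import HTMLParser

# A hand-crafted citation with its elements marked up in place.
# The vocabulary IRI and property names are hypothetical.
CITATION_HTML = (
    '<p vocab="https://example.org/citation-elements/">'
    '<span property="creatorName">Jane Doe</span>, '
    '<cite property="title">A Family History</cite> '
    '(<span property="publicationDate">1998</span>).'
    '</p>'
)


class CitationElements(HTMLParser):
    """Collect the text found inside elements carrying a property attribute."""

    def __init__(self):
        super().__init__()
        self.elements = {}
        self._stack = []  # property name (or None) for each open tag

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit the text to the innermost enclosing property, if any.
        for prop in reversed(self._stack):
            if prop:
                self.elements[prop] = self.elements.get(prop, "") + data
                break


def extract_elements(html: str) -> dict:
    parser = CitationElements()
    parser.feed(html)
    parser.close()
    return parser.elements
```

Calling `extract_elements(CITATION_HTML)` gives back the three marked-up elements, while a browser would still display the citation exactly as it was written. The simple tag stack assumes every element is explicitly closed, so void elements such as `<br>` would need extra handling.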
Although currently acting
chairman on the FHISO Board, I write this article as someone who has always
believed that standardisation absolutely
must happen in our field. It borders on hypocrisy that users are expected
to collaborate on unified trees, and to play fair with each other, when the
large organisations have been unable to set a precedent with their data
sharing.
[Backup copy of the BetterGEDCOM wiki has recently been restored at: BetterGEDCOM Archive. The original site has been locked by wikispaces.com]
[1] Original base image
used with kind permission of SuperColoring
(http://www.supercoloring.com/drawing-tutorials/how-to-draw-a-christmas-elf
: accessed 26 Jun 2017).