GeneaBloggers

Wednesday, 20 November 2013

The Future Representation of the Past


I want to explore the way we represent the past in our computerised data, and to question whether this is good enough. What do we really want to represent? Are we constrained by technology or by convention?

Asked who I am, people would probably say ‘Oh yeah, he’s the one that writes those technical blog posts that no one understands’. Some might even associate me with STEMMA®. However, there’s a rationale and a message behind my work so I want to pull it all together for this special post. Hopefully, people will then understand me a little better, even if the technical stuff is still over the rainbow somewhere.

Family trees are a very limited form of data that I’ve sometimes described as ‘genealogy in its literal sense’. They describe the lineage of a number of related people. Although they typically include the dates of vital events, they do not try to create a history from their data. Many people new to genealogy assume that they have to create a family tree because they do not realise that there’s anything more[1]. A huge amount of marketing talks specifically about “family trees”.

Most genealogists realise that they need to capture the history of a family in order to create a picture of their lives, and maybe to understand how they, themselves, came into being. Although genealogy as a discipline is widely considered to incorporate such history, the term family history is sometimes preferred in order to emphasise the nature of that pursuit.

However, restricting ourselves to family history alone is rather artificial. The history of the places they lived in, of the occupations they worked in, of their neighbouring families, and even of world events, will have had an impact on their lives. Also, they will undoubtedly have played a part in some of those events themselves. Although you may not use the term micro-history yourself, that is how such a study would be described[2].

This is all very well and experienced genealogists may be nodding in agreement (hopefully). When a genealogist performs their research, they will try to assimilate all relevant data and write it up according to their professional standards. My short study on Bendigo’s Ring[3] was partially designed as a case in point since it is specifically, and unusually, about a place rather than a person or a family, but the research principles are exactly the same. A traditional research report would have no trouble in representing this case, but what about our software products? That’s a different story.


Unfortunately, our software is mostly preoccupied with the representation of a family tree. Even when it allows you to enter historical notes, the framework used is still that of a family tree. If you want to generate a timeline then this has to be inferred from your tree-based data because it was never entered using a timeline paradigm.

I realised, very early on, that none of the products I’d looked at would be useful to me. I wanted to be able to record micro-history, not just family history, and certainly not just a family tree. It shouldn’t matter whether I wanted to record information about a family, or a person (related or not), or a place, or a surname, or specific events – I wanted to be able to record all such data in a structured way for the computer. In other words, I wanted something much richer than a simple narrative report. As a result, I began an R&D project back in 2011, later to be named STEMMA[4], to define a culturally-neutral computer representation for micro-history (if not generic history) and implement the associated software.

That project is still ongoing but it has already demonstrated that this goal is achievable; it’s not unrealistic at all. Although its data model is still being refined, I intend to use it to represent my own data. This rather puts me out on a limb but I have no choice in the current climate. In the meantime, I have used this blog to try to raise people’s awareness of what could be possible if we took a step back. The STEMMA Web site makes the current specification, my research notes, various downloads, and a number of example case studies freely available for anyone with a coding background.

So do I expect the STEMMA research to affect the software market? I’m rather pessimistic about this. Ideally, an organisation such as FHISO could try to build the underlying concepts into a more powerful, and standardised, data model, but even then software vendors are unlikely to take up the challenge of moving towards a micro-history approach[5]. There is no demonstrated market for this enhanced scope so why would they take the risk? I feel the situation is possibly chicken-and-egg since there’s no precedent upon which people’s expectations can be lifted.

So what would I consider to be the essential elements of a data model that would be more aligned with a representation of micro-history?

  • Events are an essential element. These must be top-level entities, be shared (i.e. allow multiple people to be associated with each Event), and have sufficient internal structure to be able to model real-life events (i.e. durations and hierarchical arrangement)[6] [7].
  • Structured Narrative. Rather than plain-text notes, the model must make copious use of rich-text with semantic mark-up (i.e. inline meta-data). This must cater both for new narrative and for transcriptions of evidence. It must allow references to persons, places, events, and dates to be clearly marked, and linked to other data entities when relevant. It must also support citations and general reference notes, and support transcription anomalies such as marginalia, uncertain characters, original emphasis, and interlinear/intralinear notes[8] [9].
  • Place Support. The model must treat Persons and Places on an equal footing[10], and support a hierarchical organisation of Place entities (not just an issue of how to name them)[11].
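
For readers with a coding background, the three elements above can be pictured with a minimal Python sketch. This is not the STEMMA schema: every class, field, and name below is a hypothetical chosen purely for illustration, and the sample data is invented. It only shows Events as shared top-level entities with durations and sub-events, Places arranged in a hierarchy, and narrative text whose marked spans link back to other entities.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    """A Place entity that may sit inside a parent Place (hierarchy)."""
    name: str
    parent: Optional[Place] = None

    def path(self) -> list[str]:
        """Names from this place up to the top of the hierarchy."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


@dataclass
class Person:
    name: str


@dataclass
class Event:
    """A top-level, shareable Event with a duration and optional sub-events."""
    title: str
    start: str                     # date held as a simple string in this sketch
    end: Optional[str] = None      # None means no separate end date is recorded
    place: Optional[Place] = None
    participants: list[Person] = field(default_factory=list)
    sub_events: list[Event] = field(default_factory=list)


@dataclass
class Mention:
    """A marked span of narrative text linked to a Person, Place, or Event."""
    text: str
    target: object


@dataclass
class Narrative:
    """Rich text held as a sequence of plain strings and linked Mentions."""
    parts: list  # each item is either a str or a Mention

    def plain_text(self) -> str:
        return "".join(p.text if isinstance(p, Mention) else p for p in self.parts)

    def targets(self) -> list:
        return [p.target for p in self.parts if isinstance(p, Mention)]


# Invented sample data (assumption: none of these names come from real records).
county = Place("Example County")
town = Place("Example Town", parent=county)
alice, bob = Person("Alice Example"), Person("Bob Example")

# One Event shared by two people, with a sub-event nested inside it.
wedding = Event("Wedding", start="1900-06-01", end="1900-06-02",
                place=town, participants=[alice, bob])
wedding.sub_events.append(Event("Ceremony", start="1900-06-01", place=town))

# Narrative whose marked spans link to the Person and Place entities.
story = Narrative([Mention("Alice", alice), " married ", Mention("Bob", bob),
                   " in ", Mention("Example Town", town), "."])
```

Because the Event stands alone rather than hanging off one individual, a timeline could be produced by sorting Events directly instead of inferring them from a tree, and the same Place entity can be shared by Events, narrative, and people alike.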

If I could only make one final blog post then this would be a close candidate as it defines me as well as my work.



[1] See “OK, I have a Family Tree. Now What?”, Blogger.com, Parallax View, 5 Oct 2013 (http://parallax-viewpoint.blogspot.com/2013/10/ok-i-have-family-tree-now-what.html).
[2] See “Micro-history for Genealogists”, Blogger.com, Parallax View, 30 Oct 2013 (http://parallax-viewpoint.blogspot.com/2013/10/micro-history-for-genealogists.html).
[3] See “Where is Bendigo's Ring?”, Blogger.com, Parallax View, 15 Nov 2013 (http://parallax-viewpoint.blogspot.com/2013/11/where-is-bendigos-ring.html).
[4] STEMMA (Source Text for Event and Ménage MApping) R&D Project, Family History Data (http://www.parallaxview.co/familyhistorydata).
[5] See “Commercial Realities of Data Standards”, Blogger.com, Parallax View, 26 Aug 2013 (http://parallax-viewpoint.blogspot.com/2013/08/are-we-modelling-data-or-commerce.html).
[6] See “Eventful Genealogy”, Blogger.com, Parallax View, 3 Nov 2013 (http://parallax-viewpoint.blogspot.com/2013/11/eventful-genealogy.html).
[7] See “Eventful Genealogy - Part II”, Blogger.com, Parallax View, 6 Nov 2013 (http://parallax-viewpoint.blogspot.com/2013/11/eventful-genealogy-part-ii.html).
[8] See “Semantic Tagging of Historical Data”, Blogger.com, Parallax View, 5 Sep 2013 (http://parallax-viewpoint.blogspot.com/2013/09/semantic-tagging-of-historical-data.html).
[9] Tony Proctor, “A Story of Olde”, Family History Data (www.parallaxview.co/familyhistorydata/downloads/StructuredNarrative.pdf).
[11] See “A Place for Everything”, Blogger.com, Parallax View, 19 Aug 2013 (http://parallax-viewpoint.blogspot.com/2013/08/a-place-for-everything.html).