Sunday, 24 November 2013

Evidence and Where to Stick It


… so to speak. Do you attach census pages to people in your tree as evidence of a birth date? If so then there is a good chance that you are currently attaching items to the wrong entities in your data. Read about the pitfalls that we all face when associating evidence, why we often do this incorrectly, and what the future holds for us.

If you find one of your ancestors in a census page, do you cite that page as evidence of where they lived, or their date of birth, or their place of birth? It may yield such evidence but then what do you attach the census information to? Many people would add a citation in the details of that person, and also attach any image of the census page directly to that person, but this is demonstrably wrong. Although this sort of evidence can be gleaned from a census record, the actual record wasn’t generated as a proof of any of those items. In effect, you would be confusing the relevance of extracted information (e.g. something about a given person) with the nature of the source itself (i.e. what the record was originally intended for). This may be a subtle point but it has profound implications when modelling the real-life data relationships.


This issue is as much about data organisation as about philosophy so let’s just take a moment to look at some simple practical problems resulting from this common approach.

The chances are that the same census page includes other family members and relatives so do you duplicate this operation for every member? You cannot always attach a scan to some type of family record since the members may be more distant or loosely-connected relatives, or they may even be unrelated until some later marriage.

We’re not just talking about census pages either. Consider a marriage certificate. You might be attaching details (citation or scan) to both the bride and the groom, but what about their fathers? Both of these would be mentioned on many certificates so you might be able to glean evidence of their names and occupations too. Do you also transcribe the marriage date and record it separately in the timeline for each of these individuals? If you had initially misread the date because it was so faint then does that mean you have proliferated the error? There’s always the very real risk, too, that you may have picked the wrong marriage and need to undo those associations.

The same issue applies to a birth certificate since it probably contains the parents’ names (whether married or not) and the father’s occupation. You may be surprised but this issue even applies to photographs. For instance, I have a group photograph of my grandparents and their family that was printed in a Nottingham newspaper in the 1950s. Do I attach that image to every one of those people in my data, together with details on where and when it was taken, plus the newspaper citation, plus the newspaper caption that went with it? Your choice of software makes a big difference in how serious an issue this is to you, but there is a better way.

[…come on, Tony, get to the point…]

OK, I think the astute readers have already guessed where I’m taking this, especially since I have set the stage by writing about the importance of events in recent blog posts[1][2]. The thing is that the vast majority of our evidence – if not all of it – relates to events; things that happened in a particular place at a particular time. The people involved in all of the cases illustrated here are sharing certain events (i.e. a census, a marriage, a birth, and a family group outing). The record (or document, or artefact) details are therefore best associated with the Event entity in the data, and the relevant Person entities linked to the Event with their respective roles.

Multi-Person Events were described in more detail in my previous posts, but where does that leave the information extracted from one of these event-orientated data sources? For instance, if source details are now associated with an Event then how do you associate the items of extracted information (i.e. Properties[3]) with each of the Persons sharing that Event?


This figure illustrates a marriage event using similar symbols to those of my previous posts. We can see that the bride and groom are both connected to the Event entity, as are the couple’s fathers, and each is distinguished by their respective Role[4]. The source details (citation, image, etc.) are associated only with the Event, but the Properties (the items of extracted information that are relevant to each of those Persons) are associated with the individual Event-to-Person connections, not with the Persons or the Event themselves. This is important since an Event may have more than one Person connected to it, and each Person will have more than one Event connected to it.

This natural factoring of the data results in less duplication and redundancy, but at no loss of information. What we’re avoiding is dumping the same source details on every associated Person simply because that source yields some evidence about them. When several Persons are sharing the same Event, different Properties may be derived for each of them but the source information as a whole describes the Event.
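For readers who think more easily in code, here is a minimal Python sketch of that factoring. It is not STEMMA syntax, and every class name, field name, person, date, and citation in it is a hypothetical chosen purely for illustration. The point it tries to show is that the source hangs off the Event once, while the Role and the extracted Properties live on each Event-to-Person connection.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    citation: str  # a scan or image reference could sit here too


@dataclass
class Person:
    name: str  # note: no sources attached directly to the Person


@dataclass
class Participation:
    """One Event-to-Person connection: the Role plus the extracted Properties."""
    person: Person
    role: str
    properties: dict = field(default_factory=dict)


@dataclass
class Event:
    kind: str
    date: str
    sources: list = field(default_factory=list)
    participants: list = field(default_factory=list)

    def link(self, person, role, **properties):
        """Connect a Person to this Event with a Role and their own Properties."""
        self.participants.append(Participation(person, role, dict(properties)))


# Invented people and details, for illustration only.
bride = Person("Mary Smith")
groom = Person("John Jones")
bride_father = Person("William Smith")
groom_father = Person("Thomas Jones")

# The certificate is recorded once, against the Event.
marriage = Event(
    kind="Marriage",
    date="1885-06-01",
    sources=[Source("Marriage certificate (hypothetical citation)")],
)

# Each connection carries only what the source says about that Person.
marriage.link(bride, "Bride", age="22")
marriage.link(groom, "Groom", age="24", occupation="Miner")
marriage.link(bride_father, "Bride.Father", occupation="Farmer")
marriage.link(groom_father, "Groom.Father", occupation="Blacksmith")
```

If the date had been misread, or the wrong marriage picked, there would be a single Event to correct or unlink rather than four separate Person records.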

For any code-junkies, an example of how this is represented in STEMMA® can be found at Single Source Events, and a further example involving multiple sources for the same event at Multi-Source Events.

Of course, not all software can actually do this since it requires support for shared Events. You’re probably thinking, though, ‘what if I have conflicting Properties such as a date-of-birth?’ We all know that we may get conflicting Properties from different sources, but I haven’t changed that by describing this approach. The final set of conclusion Properties that you associate with each Person will be the result of assessing the aggregated evidence for them, that is, the evidence Properties from each of the Events in their timeline. This has always been the case since no one has multiple dates of birth! All I’ve done is re-factor the sources and the evidence.
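As a rough sketch of what ‘aggregating the evidence’ might look like, the following Python gathers every evidence Property recorded for one Person across their Events, keeping track of which Event each value came from. The data, the tuple layout, and the function name `evidence_for` are all assumptions of mine, and the sketch deliberately stops short of picking a winner: deciding on the conclusion Property remains the researcher’s job.

```python
from collections import defaultdict

# Each tuple is one Event-to-Person connection:
# (event label, person, role, evidence Properties). All values are invented.
links = [
    ("Census 1881", "Mary Smith", "Daughter", {"birth_year": "1863"}),
    ("Census 1891", "Mary Smith", "Wife", {"birth_year": "1862"}),
    ("Marriage 1885", "Mary Smith", "Bride", {"birth_year": "1863"}),
    ("Marriage 1885", "John Jones", "Groom", {"birth_year": "1861"}),
]


def evidence_for(person, links):
    """Collect each Property's values for one person, paired with the Event
    that supplied them, so that conflicts stay visible."""
    gathered = defaultdict(list)
    for event, who, _role, props in links:
        if who == person:
            for name, value in props.items():
                gathered[name].append((value, event))
    return dict(gathered)


evidence = evidence_for("Mary Smith", links)
for value, event in evidence["birth_year"]:
    print(f"birth_year {value} (from {event})")
```

Here the two census entries disagree by a year, and the conflict is laid out alongside the Events that produced it rather than being hidden inside the Person record.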

So what’s the advantage of this? Well, apart from avoiding unnecessary redundancy, the scheme is modelling the true nature of the data relationships. This is what I meant by the issue being partly philosophical above. Perhaps more important, though, is that your data is then organised according to the natural timeline. If you want to present a timeline, either in a report or on your screen, then it doesn’t have to be forced, and it can accommodate the lives of multiple people when necessary.
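To illustrate the timeline point, here is a small Python sketch in which a timeline simply falls out of the shared Events. The dictionary layout, the function name `timeline`, and the people are hypothetical, and I have assumed ISO-style dates so that plain string sorting gives chronological order.

```python
# Shared Events, each listing the people involved and their Roles.
# The people and their details are invented for illustration.
events = [
    {"kind": "Marriage", "date": "1885-06-01",
     "people": {"Mary Smith": "Bride", "John Jones": "Groom"}},
    {"kind": "Census", "date": "1881-04-03",
     "people": {"Mary Smith": "Daughter"}},
    {"kind": "Census", "date": "1891-04-05",
     "people": {"Mary Smith": "Wife", "John Jones": "Head"}},
]


def timeline(events, *people):
    """Return, in date order, the Events that involve any of the named people."""
    wanted = set(people)
    chosen = [e for e in events if wanted.intersection(e["people"])]
    return sorted(chosen, key=lambda e: e["date"])


# A combined timeline for two people needs no special handling.
for event in timeline(events, "Mary Smith", "John Jones"):
    print(event["date"], event["kind"], event["people"])
```

Because each Event is stored once, a combined timeline for a couple shows their marriage a single time, with both Roles, instead of once per person.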

Unfortunately, the future looks a little bleak for this. There are many people who do not adopt this approach, either because their software cannot handle it or because it’s not the way that they were taught, and their data will never change. Even if their software improves or changes, and even if some new data standard emerges that better models real life, their data cannot be re-factored automatically to become better organised. It’s set in stone.

By far the biggest issue, though, is the use of so-called collaborative online family trees. Even when these trees accommodate sources, and their contributors actually enter them, the contributors have no choice but to associate those sources directly with Person entities because that’s all the model offers. Such trees are not representing event-based history, and so their misplaced sources will be inherited by anyone copying from them. This does not bode well for anyone wanting to adopt an event-based approach to family history, or even micro-history. Genealogy’s preoccupation with mere family trees will continually pollute the waters.



[1] See “Eventful Genealogy”, Blogger.com, Parallax View, 3 Nov 2013 (http://parallax-viewpoint.blogspot.com/2013/11/eventful-genealogy.html).
[2] See “Eventful Genealogy - Part II”, Blogger.com, Parallax View, 6 Nov 2013 (http://parallax-viewpoint.blogspot.com/2013/11/eventful-genealogy-part-ii.html).
[3] ‘Properties’ is the terminology adopted by STEMMA for items of extracted and summarised information such as a date-of-birth (see http://parallaxview.co/stemma/home/document-structure/person/properties). I feel strongly that the word ‘facts’ is misleading since the possibility of something being factual depends on the nature of the associated source. I also do not like the software term of ‘PFACT’, which stands for property, fact, attribute, characteristic, or trait.
[4] The Roles might be something like Bride, Groom, Bride.Father, and Groom.Father in this case. I will discuss how extensible roles can easily be accommodated in a future post.
