Wednesday, 6 November 2013

Eventful Genealogy – Part II

In the first part of this post (Eventful Genealogy), I explained how the undervalued Event[1] entity was essential for binding our historical information in order to create a coherent description of the past. I also stressed that this entity should be able to model the structure of real-life events rather than merely being a software convenience. I now want to present more advantages following from my suggestion.

The suggestion involved giving Events an optional duration, thus allowing them to model protracted events in addition to simple events, and allowing them to be linked hierarchically in order to represent the internal structure of complex events. All of these Event types could be shared, meaning that zero-or-more arbitrary Persons could be linked to them.

Let’s briefly look at the anatomy of one of these Events:

This shows that each Event has a bounding start and end date. It can have any number of subordinate (child) Events linked to it which fall within those bounds. It has a single associated Place, but any number of Persons can be connected to the Event allowing it to be shared rather than duplicated.

Either of the bounding dates could be implied by child Events. For instance, the start date could be equated to the start date of one child Event, and the end date could be equated to the end date of another child Event. That nicely avoids any duplication of date values.
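To make that anatomy concrete, here is a minimal Python sketch of such an Event. It is not STEMMA's actual representation: the class and field names are my own illustrative assumptions, as is the rule that a missing bound is taken from the earliest (or latest) child, and that an Event with no end date and no children is treated as instantaneous.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Event:
    """Hypothetical Event entity: optional bounds, one Place, shared Persons."""
    name: str
    start: Optional[date] = None
    end: Optional[date] = None
    place: Optional[str] = None                        # single associated Place
    persons: list[str] = field(default_factory=list)   # zero or more Persons
    children: list[Event] = field(default_factory=list)

    def effective_start(self) -> Optional[date]:
        """Own start date if given, else the earliest start among the children."""
        if self.start is not None:
            return self.start
        starts = [s for c in self.children
                  if (s := c.effective_start()) is not None]
        return min(starts) if starts else None

    def effective_end(self) -> Optional[date]:
        """Own end date if given, else the latest end among the children.

        Assumption: with no end date and no children, the Event is treated
        as a simple (instantaneous) one, so its end equals its start.
        """
        if self.end is not None:
            return self.end
        ends = [e for c in self.children
                if (e := c.effective_end()) is not None]
        return max(ends) if ends else self.start


# A protracted Event whose bounds are implied entirely by its children.
voyage = Event(
    "Emigration voyage",
    persons=["Person A", "Person B"],
    children=[
        Event("Departure", start=date(1850, 3, 1), place="Liverpool"),
        Event("Arrival", start=date(1850, 4, 12), place="New York"),
    ],
)
voyage_start = voyage.effective_start()
voyage_end = voyage.effective_end()
```

Because the parent stores no dates of its own, correcting the date of the departure or arrival automatically corrects the bounds of the voyage.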

Unfortunately, we all know that exact dates may not be readily available. Maybe we have approximate ones, or maybe the evidence could not be deciphered with confidence, or maybe there was no available evidence at all. Most systems (including STEMMA®) have a way of representing approximate, or “fuzzy”, dates, but what if you have no date value at all? What do you enter then? Do you just guess?

Well, it’s nice to have exact dates but they’re not always essential in order to create a coherent picture of the past. Sometimes, we can say with confidence that one event preceded or followed another, irrespective of whether we know exactly when they occurred. This could still allow us to paint a timeline if we had a way of representing this knowledge. Furthermore, it could also help to put finer limits on those fuzzy dates as more information becomes available.

One obvious example is that any date of death must follow a date of birth, although these dates could be identical for a stillborn if one appears in your data. More practical cases involve a baptism following a birth, or a burial/cremation following a death. If the only early evidence we have for someone is a baptism entry then it doesn’t mean that there was no birth event. We can still create a corresponding Event entity and relate it to the baptism one using what STEMMA calls a relational constraint. This is basically just a link between the two Events indicating which came first using the following options:

  • AfterEvent. Current event > specified event
  • BeforeEvent. Current event < specified event
  • FromEvent. Current event >= specified event
  • UntilEvent. Current event <= specified event
  • AtEvent. Current event = specified event
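The five options map directly onto ordinary comparison operators, which the following Python sketch illustrates. The constraint names come from the list above, but the function name and the choice to compare one date per Event are my own assumptions; a fuller version would have to decide which bound of a protracted Event takes part in the comparison.

```python
import operator
from datetime import date
from typing import Optional

# Constraint names as listed above, mapped to the comparison they express.
COMPARATORS = {
    "AfterEvent": operator.gt,   # current >  specified
    "BeforeEvent": operator.lt,  # current <  specified
    "FromEvent": operator.ge,    # current >= specified
    "UntilEvent": operator.le,   # current <= specified
    "AtEvent": operator.eq,      # current =  specified
}


def check_constraint(kind: str,
                     current: Optional[date],
                     specified: Optional[date]) -> Optional[bool]:
    """Return True/False when both dates are known, or None when the
    constraint cannot be checked yet because a date is missing."""
    if current is None or specified is None:
        return None
    return COMPARATORS[kind](current, specified)


death = date(1851, 6, 7)
burial = date(1851, 6, 10)

burial_ok = check_constraint("AfterEvent", burial, death)          # consistent
burial_as_death = check_constraint("AfterEvent", burial, burial)   # violated
birth_unknown = check_constraint("FromEvent", date(1820, 5, 14), None)
```

Returning `None` rather than `False` for a missing date matters: the constraint is still recorded and can be checked as soon as the missing date turns up.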

The scenario of a burial following a death is one with a particular significance since I have seen many cases where a family tree has listed someone’s burial date, from a parish register, as their date of death. Acknowledging that these are distinct events, with a particular order, could have helped to make those trees more accurate.

So how does this look from a software perspective? Is it all just too complicated to implement? Well, the problem of having discrete, interconnected entities, each with their own duration, and wanting to put limits on unknown dates as well as validating the overall system, has been solved before – decades ago! In fact, it is almost exactly the problem that a PERT chart addresses in project management[2].
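As a rough illustration of that PERT-style reasoning, the Python sketch below orders Events from their "precedes" links alone and then makes a forward and a backward pass to bound the unknown dates. It uses the standard library's `graphlib.TopologicalSorter`; the function name, the data layout, and the treatment of every link as an inclusive (>= / <=) bound are simplifying assumptions of mine rather than anything STEMMA prescribes.

```python
from datetime import date
from graphlib import TopologicalSorter


def date_bounds(known, precedes):
    """known: {event name: date}; precedes: [(earlier, later), ...].

    Returns (order, lower, upper), where lower/upper hold the tightest
    bound derivable for each event, or None when nothing constrains it.
    """
    preds = {name: set() for name in known}
    for earlier, later in precedes:
        preds.setdefault(earlier, set())
        preds.setdefault(later, set()).add(earlier)
    succs = {name: set() for name in preds}
    for name, ps in preds.items():
        for p in ps:
            succs[p].add(name)

    # Raises graphlib.CycleError if the links contradict each other.
    order = list(TopologicalSorter(preds).static_order())

    lower = {}
    for name in order:  # forward pass: no earlier than any predecessor
        if name in known:
            lower[name] = known[name]
        else:
            cands = [lower[p] for p in preds[name] if lower[p] is not None]
            lower[name] = max(cands) if cands else None

    upper = {}
    for name in reversed(order):  # backward pass: no later than any successor
        if name in known:
            upper[name] = known[name]
        else:
            cands = [upper[s] for s in succs[name] if upper[s] is not None]
            upper[name] = min(cands) if cands else None
    return order, lower, upper


known = {"baptism": date(1820, 5, 14), "burial": date(1891, 2, 3)}
precedes = [("birth", "baptism"), ("baptism", "marriage"),
            ("marriage", "death"), ("death", "burial")]
order, lower, upper = date_bounds(known, precedes)
```

With only a baptism and a burial dated, the birth is bounded above by the baptism, and both the marriage and the death fall between the baptism and the burial. A set of links that loops back on itself is rejected outright, which is the validation half of the problem.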

Something that STEMMA is currently investigating is a type of Event constraint that puts a particular distance (or rather duration) between two Events. In England and Wales, for instance, the civil registration of a vital event was expected to be made within three months of the event itself. Hence, if you merely have the registration date from the GRO index (say from FreeBMD) then you can still put useful constraints on when the event happened. Another case is if you know someone married their partner but you cannot locate an actual marriage record. You can estimate that it was at least a certain length of time after their birth event – say 18 years, depending upon the local laws on marriageable age at that time. Neither of these cases needs to specify an exact separation, since even an approximate one provides useful information that can help when creating or validating a proof argument.
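A duration constraint of this kind amounts to simple calendar arithmetic. The Python sketch below is only an illustration: the helper names are hypothetical, the three-month and 18-year figures are the example values from the paragraph above, and the resulting windows should be read as soft estimates rather than hard limits.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Move a date forwards (positive) or backwards (negative) by whole
    calendar months, clamping the day to the length of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def window_from_registration(registered: date, max_delay_months: int = 3):
    """Estimated (earliest, latest) dates for an event registered on the
    given date, assuming registration within max_delay_months of it."""
    return shift_months(registered, -max_delay_months), registered


def earliest_marriage(birth: date, min_age_years: int = 18) -> date:
    """Estimated earliest marriage date, assuming a minimum age at marriage."""
    return shift_months(birth, 12 * min_age_years)


event_window = window_from_registration(date(1880, 5, 31))
marriage_from = earliest_marriage(date(1850, 1, 15))
```

Here a registration on 31 May 1880 suggests an event between 29 February 1880 and 31 May 1880, and a birth on 15 January 1850 suggests a marriage no earlier than 15 January 1868. Either window can then be fed into the ordering constraints as an approximate bound.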

In summary, certain Events can be ordered even when their dates are unknown, and relying on dates alone could leave related Events as apparently independent of each other. Adding relational constraints provides additional information that cannot be represented numerically.

[1] As before, I’m also using capitalisation here to distinguish software entities from the everyday usage of such words.
[2] PERT chart from CS Odessa’s ConceptDraw Office samples under Creative Commons licensing. Last accessed: 5 November 2013.