Monday 18 August 2014

Time-dependent Attributes

The subject of time-dependent attributes is not one that I’ve seen discussed very often. I want to illustrate how easy they are to handle when the underlying data model is event-orientated.

To most readers interested in this issue, the subject will be interpreted in the context of personal attributes: those relating to a person. Although in STEMMA® it applies equally to the attributes of a place, or of a group (including families, organisations, regiments), I will focus on personal attributes in order to keep things relevant to family historians.

By time-dependent, I mean those attributes that may change over time, and for which we are likely to see progressive variation across different sources. Obvious examples would be height and weight, although such items are rarely recorded in our data. A more familiar example might be a person’s residential address. Time-dependent attributes contrast with fixed ones, usually attributable to our birth, such as our biological parentage, birth sex, date of birth, and place of birth.

The only in-depth presentation on this subject that I’m aware of is the paper submitted to FHISO by Richard Smith: Expressing time-dependent personal attributes. This paper comments that the GEDCOM way of handling them, by attaching a DATE tag to an item, doesn’t cope with all possibilities, including a personal name and someone’s effective sex. Unfortunately, because GEDCOM is primarily a lineage-linked representation, the temporal nature of an attribute has to be added as an afterthought. Its relationships to the other attributes for that person, to the date or event that it pertains to, to the attributes of other people associated with the same event, and to the sources supporting that event are uncoordinated at best.

The proposal in the aforementioned FHISO paper works for a representation involving people and their lineage (e.g. GEDCOM), and for any representation where people must have attributes directly associated with them, but it would be unnecessary if there were a natural representation of time and events. The core concept that is missing, and which would automatically unify those uncoordinated items, is the Event entity: a representation of a moment in time, or a span of time, for which source information exists.

Let’s pick a really obvious case to illustrate the event-based approach. A person’s age is time-dependent, and we won’t see the same value in all the sources we consult. We would never think of associating all these variant ages directly with a Person entity, and the same approach should apply to other time-dependent attributes. In the case of age, we usually calculate an associated date-of-birth from the contextual date of the information and record that instead. The ability to do this calculation is specific to this particular attribute, and it could not be applied to, say, a residential address, occupation, or military rank. It’s also a conclusion derived from information. We’ll continue to take the simplistic view of the age attribute, though, in order to explore the general case.
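The age calculation mentioned above can be sketched in a few lines of Python. This is only an illustration, assuming the recorded age means completed years on the date the information relates to; the function names are my own and are not part of STEMMA or GEDCOM. Note that the result is a range of possible birth dates rather than a single date, which underlines that it is a conclusion and not the recorded information itself.

```python
from datetime import date, timedelta
from typing import Tuple


def _years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` earlier (hypothetical helper)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return d.replace(year=d.year - years, day=28)


def birth_range(age: int, on: date) -> Tuple[date, date]:
    """Earliest and latest birth dates consistent with a recorded age.

    Assumes `age` is the number of completed years on the date `on`.
    """
    latest = _years_before(on, age)
    earliest = _years_before(on, age + 1) + timedelta(days=1)
    return earliest, latest


# Someone recorded as 34 on the night of the 1851 census (30 March 1851)
earliest, latest = birth_range(34, date(1851, 3, 30))
print(earliest.isoformat(), "to", latest.isoformat())
```

The same trick has no equivalent for an address, an occupation, or a rank, which is why those need the more general treatment described next.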

Every source of information has a temporal context: a date, or range of dates, to which it pertains. That source information therefore supports the concept of an event, and can be represented by an equivalent Event entity to which the source is connected. If the source information mentions specific persons (or places, or groups) then their associated entities can be connected to that Event. Any attributes (or what STEMMA calls Properties) given in the information are then associated with the corresponding connections between the subject entities and the Event.

Using this approach, those attributes are now associated with the Person (or other subject entity), the relevant date(s), the relevant place, the relevant sources, and any other Persons (or subject entities) mentioned in the same source information.
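To make the shape of this concrete, here is a minimal sketch in Python. It is not STEMMA syntax or its actual schema; the class names, field names, and the sample people and values are all invented for illustration. The point it tries to show is that a Property lives on the connection between a Person and an Event, so the date, place, sources, and co-mentioned persons all come along with it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Person:
    key: str
    name: str


@dataclass
class Connection:
    """Link between a Person and an Event; Properties are stored here."""
    person: Person
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    title: str
    when: str                      # ISO date, so plain string sorting works
    place: str
    sources: List[str] = field(default_factory=list)
    connections: List[Connection] = field(default_factory=list)


def timeline(events: List[Event], person: Person, prop: str) -> List[Tuple[str, str, List[str]]]:
    """Collect (date, value, sources) for one Property of one Person, in date order."""
    rows = [
        (e.when, c.properties[prop], e.sources)
        for e in events
        for c in e.connections
        if c.person is person and prop in c.properties
    ]
    return sorted(rows, key=lambda row: row[0])


def others_present(event: Event, person: Person) -> List[str]:
    """Names of the other Persons connected to the same Event."""
    return [c.person.name for c in event.connections if c.person is not person]


# Invented sample data: the recorded ages deliberately fail to advance evenly.
john = Person("p1", "John Example")
mary = Person("p2", "Mary Example")
events = [
    Event("Census 1861", "1861-04-07", "Place B", ["1861 census return"],
          [Connection(john, {"age": "42", "occupation": "Miner"})]),
    Event("Census 1851", "1851-03-30", "Place A", ["1851 census return"],
          [Connection(john, {"age": "34", "occupation": "Labourer"}),
           Connection(mary, {"age": "30"})]),
    Event("Census 1871", "1871-04-02", "Place B", ["1871 census return"],
          [Connection(john, {"age": "54"})]),
]

for when, value, sources in timeline(events, john, "age"):
    print(when, value, sources)
```

Nothing is ever attached to the Person directly, so there is no need for a per-attribute date: asking for any Property over time is just a walk over that Person's Event connections.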

Someone’s recorded age should be monotonically increasing with time but that’s not what we see. We will encounter values that do not progress smoothly, and may even appear contradictory by remaining static or running backwards. The importance of this is that what is written cannot be treated as fact, and may need some clarification, correction, or other annotation. The general Property mechanism in STEMMA allows such annotation, as well as adding conclusions about their interpretation such as the identification of a place. This is discussed further in Evidence and Where to Stick It, and in Is That a Fact? (which also mentions some coding examples).

One last consideration: the aforementioned FHISO paper introduces a complication in the form of:

Consider a letter written in 1821 that says “In 1799 I was living in Shrewsbury where my father was a schoolmaster”.

This indirect documentation of attributes may be a complication for GEDCOM but not for a representation that fully embraces events. I have already defined an Event entity as a representation of a moment in time, or a span of time, for which source information exists, and this particular source simply has information supporting multiple events to different degrees. This is no different from, say, a military enlistment record from 1899 that mentions a marriage occurring in 1870. The source directly supports the enlistment event but also supports the marriage event to a lesser extent. The same applies to a census. The information, as given to an enumerator on census night, directly supports the census event, but a recorded age doesn’t support a birth event to the same degree. In other words, a source as a whole may contain information supporting multiple events to different degrees.
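One way to picture this is to put the degree of support on the link between a source and an event, rather than on either end. The sketch below is an assumption of mine and not how STEMMA encodes it: the two-level "direct"/"indirect" scale and all of the names are hypothetical, chosen only to mirror the examples above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Support(Enum):
    DIRECT = "direct"       # the source was created for, or at, this event
    INDIRECT = "indirect"   # the source merely mentions this event


@dataclass(frozen=True)
class Citation:
    """Link from one source to one event, carrying the degree of support."""
    source: str
    event: str
    when: str
    support: Support


citations = [
    Citation("Enlistment record", "Enlistment", "1899", Support.DIRECT),
    Citation("Enlistment record", "Marriage", "1870", Support.INDIRECT),
    Citation("Letter", "Writing of the letter", "1821", Support.DIRECT),
    Citation("Letter", "Residence in Shrewsbury", "1799", Support.INDIRECT),
]


def events_supported_by(source: str, links: List[Citation]) -> Dict[str, Support]:
    """Map each event a source supports to the degree of that support."""
    return {c.event: c.support for c in links if c.source == source}


for event, support in events_supported_by("Letter", citations).items():
    print(event, support.value)
```

A real scheme would probably want something richer than two levels, but even this much is enough to let the 1821 letter contribute to a 1799 event without pretending it was written then.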
