Friday, 3 January 2014

Role of the Role

Most people have come across the concept of a Role in the context of a census return. However, Roles are an essential component of any shared (i.e. multi-person) event. How much consideration and flexibility do we give them though? How do Roles differ from Relationships?

Think of a Role in your census data and you will probably think of terms like Head, Wife, Son, Lodger, etc. Although there is a fairly common set of terms that represent relationships, occupations, and circumstances, there is no fixed or standardised set. Enumerators sometimes had to elaborate or invent terms to describe an unusual situation. For instance, in the household of George Binch in the 1911 census of England & Wales[1], the explicit terms ‘Daughter Of Father’, ‘Son Of Father’, ‘Son Of Mother’, and ‘Son Of Daughter’ were all used in the ‘Relationship to Head of Family’ column.

The intention of this census column was obvious but there were cases where the reason for a person being in the census household wasn’t ideally expressed as a relationship relative to its Head, of which there should have been just one. You could argue that a visitor was actually a visitor to the household rather than to the Head. I have also seen cases of “Wife” being implicitly relative to a “Boarder”, and “LodgerHead” as an explicit way of setting a new reference point for subsequent relationships. In effect, that column was overloaded with both person-to-person relationships (not all of which were relative to the same person) and event-specific roles.[2]

Enumerators are known to have made mistakes, too, so it’s important to have a way of recording both what we found and our interpretation of it. This topic was covered in more detail in my previous post at Is That a Fact?, and a STEMMA® example that deals with erroneous roles and relationships can be found at Census Roles. That particular worked example involves the household of a Samuel Brady [Bradley][3] where two separate women were given the relationship-to-head of Wife. It is clear that the enumerator was leaving implicit the fact these were relative to different men on that census page; one to the head of the household and one to a boarder. I apologise in advance but I need to present a small amount of code here to show how this situation is handled in STEMMA. The following two lines represent the way the Relationship Property (aka Relationship “fact”) is encoded for the two women:

<Property Name='Relationship' Key='pSamuelBradley' Value='Wife'>

<Property Name='Relationship' Key='pJohnBradley' Value='Wife'>

In other words: ‘(Samuel Bradley).Wife’ and ‘(John Bradley).Wife’, respectively. Notice that they both capture what was recorded (i.e. “Wife” in both cases), and this may also include any uncertain characters or explanatory notes as in the full worked example. They supplement this, though, with a separate field for its interpretation. At first sight, it may appear that this is just expressing a simple relationship between two explicit people, but there’s a little more to it than that.

Before I explain what I mean, let me just expand the scope from census events to general multi-person events, as suggested previously at Eventful Genealogy. One or more Roles are essential to specify the part that each person played in an event. Usually, this means they were physically present and alive during the event, but not always. Exceptions include: a deceased person during a funeral, or a person whose census entry was struck out because they were away, or someone talked about or mentioned during an event. The thing all participants actually have in common is one or more references to them within the sources supporting that event. The exceptional cases just listed would be distinguished by a separate Property called Status which could be ‘Deceased’ or ‘Absent’, in addition to more common cases like ‘Single’ or ‘Widow’.

I mentioned above that the census field may represent different things, such as a relationship (e.g. Son), an occupation (e.g. Servant), or a circumstance (e.g. Boarder). In general, a Role will have a well-defined interpretation for the current event type, whereas a Relationship must be relative to another person. That reference point may be identified by name (e.g. John’s brother) or by a Role, such as Head in the census, or Bride/Groom at a wedding, or Mother during a birth.
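To make the two kinds of reference point concrete, here is a small Python sketch. It is not STEMMA syntax: the `Relationship` class, the `reference_person` function, and the role-to-person lookup table are all hypothetical names of my own, used only to show that a reference given by Role must first be resolved to a person within the current event.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Relationship:
    """Hypothetical model: a term plus ONE kind of reference point."""
    term: str                          # e.g. "Wife"
    person_key: Optional[str] = None   # reference point named directly
    role: Optional[str] = None         # or identified via an event Role


def reference_person(rel: Relationship, roles: Dict[str, str]) -> str:
    """Return the key of the person the relationship is relative to."""
    if rel.person_key is not None:
        return rel.person_key
    if rel.role is not None:
        # Roles are event-specific, so the lookup table belongs to one event.
        return roles[rel.role]
    raise ValueError("relationship has no reference point")


# Assumed Role -> person key table for a single census event.
census_roles = {"Head": "pSamuelBradley"}

by_role = Relationship("Wife", role="Head")
by_name = Relationship("Wife", person_key="pJohnBradley")
```

Here `reference_person(by_role, census_roles)` resolves through the Head Role, while `reference_person(by_name, census_roles)` returns the named person directly.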

Relationships may be applied to each other, too, resulting in a chain of more than two terms. In the George Binch household, cited above, the enumerator’s use of ‘Son Of Daughter’ would actually involve three terms, as follows:

<Property Name='Relationship' Key='pGeorgeBinch' Value='Daughter.Son'>
    Son Of Daughter
</Property>

This would be read as ‘(George Binch).Daughter.Son’ since the first value-term applies to the identified person whereas each subsequent value-term applies to the preceding one.
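For anyone who prefers to see that reading rule spelled out, the following Python sketch applies it to a period-separated value. The function names `read_chain` and `describe` are my own inventions for illustration, not anything defined by STEMMA.

```python
from typing import List


def split_terms(value: str) -> List[str]:
    """Split a period-separated value such as 'Daughter.Son' into terms."""
    return [t.strip() for t in value.split(".") if t.strip()]


def read_chain(person: str, value: str) -> str:
    """Render the '(Person).Term.Term' reading used in this post."""
    return ".".join([f"({person})"] + split_terms(value))


def describe(person: str, value: str) -> str:
    """Unwind the chain into plain English.

    The first term applies to the identified person; each later term
    applies to the result of the preceding one.
    """
    phrase = person
    for term in split_terms(value):
        phrase = f"{term.lower()} of {phrase}"
    return phrase
```

With these, `read_chain("George Binch", "Daughter.Son")` gives `(George Binch).Daughter.Son`, and `describe("George Binch", "Daughter.Son")` gives `son of daughter of George Binch`, which matches the enumerator's original wording.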

In addition to event Roles, and direct Relationships, this diagram also depicts an indirect Relationship type. The difference is that there is at least one implied person who is missing, and so one Relationship is indirectly related to its preceding one. The example I’ve chosen is a familiar one to genealogists, and corresponds to a newspaper death notice where, say, a grandchild is quoted. In this diagram, the missing person is depicted as a male because the surname of the grandchild, and that of the deceased, may indicate a paternal descent. However, that’s not always the case. If only the given name was quoted then we have no clue as to her immediate lineage. Even knowing the surname, though, it is not the same as recording a value of ‘(Deceased).Son.Daughter’ if the source provides no evidence for the son.

If there are any heroes out there who managed to read through my earlier post, Digital Freedom, on a single cup of coffee then you may be thinking about custom Roles and Relationships. These are essential if we’re using multi-person Event entities to represent arbitrary events in our family histories. In order to illustrate this without using too much code, let me describe a contrived example: A newspaper report of a wedding describes how the neighbour (Alison) of the sister of the best-man (John Smith) fainted during the ceremony (a common occurrence?). Now STEMMA declares its Relationship Properties to be of a data-type it calls EnumList (see Extended Properties), which basically just means that it is a period-separated list of Relationship terms, each of which can be a custom one. For this example, we need a custom Relationship name of Neighbour.

<Dataset Name='Example' xmlns:r=''>

    <Event Key='eWedding'>
        <SourceLnk Key='sWedding'>
            <PersonLnk Key='pJohnSmith'>
                <Property Name='Name'> John Smith </Property>
                <Property Name='Role'> BestMan </Property>
                <Property Name='Name'> Alison </Property>
                <Property Name='Relationship' Key='pJohnSmith'> Sister.r:Neighbour </Property>

Let me try and put this into plain English now. The owner of the domain name has registered a private relationship term called Neighbour. The <SourceLnk> element of the Event is assembling the Property values for the referenced persons from the associated source, together with their properties such as their roles and relationships. John’s role is simple enough (BestMan) but Alison’s relationship to him is a little more complicated. It is effectively (John Smith).Sister.Neighbour, and this is necessary because there was no explicit mention of John’s Sister.

Before you ask, the way these Roles and Relationships are described on your screen, or in your report, would be different from these programmatic terms, e.g. “next-door neighbour” or “neighbor” (US spelling). Also, the “r:” prefix simply tells the software who defined those programmatic terms, and prevents any clashes with someone else’s terms, so no end-user would see them.
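As a rough sketch of how software might treat the prefix and the display text, the Python below splits each term into its prefix and name, then looks up a label for the reader's locale. The `LABELS` table, the locale codes, and the function names are assumptions of mine purely for illustration; STEMMA does not prescribe any of them.

```python
from typing import List, Tuple


def split_term(term: str) -> Tuple[str, str]:
    """Split 'r:Neighbour' into ('r', 'Neighbour'); no prefix gives ''."""
    prefix, sep, name = term.partition(":")
    return (prefix, name) if sep else ("", term)


# Hypothetical display labels keyed by (prefix, programmatic term, locale).
LABELS = {
    ("", "Sister", "en-GB"): "sister",
    ("", "Sister", "en-US"): "sister",
    ("r", "Neighbour", "en-GB"): "neighbour",
    ("r", "Neighbour", "en-US"): "neighbor",
}


def display_chain(value: str, locale: str) -> List[str]:
    """Map each programmatic term in a period-separated list to a label.

    Falls back to the bare term name when no label is registered.
    """
    labels = []
    for term in value.split("."):
        prefix, name = split_term(term.strip())
        labels.append(LABELS.get((prefix, name, locale), name))
    return labels
```

So `display_chain("Sister.r:Neighbour", "en-US")` yields `["sister", "neighbor"]`, and the prefix never reaches the end-user.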

Furthermore, this example relies on the fact that John Smith has his own Person entity (pJohnSmith) and so the normalised Relationship Property of Alison can be represented relative to him. In a more analytical phase, it might be the case that none of the subject references have had a final identification with respective subject entities, and that the relationship isn’t ready to be normalised. That is a situation where the Source entity (sWedding) would be used as it allows a profile to be built for each and every referenced subject, and for the relationships to be described in plain language rather than normalised equivalents.

What I’ve presented here is a scheme for describing the relationship of one person to another person, or to the underlying event, using a normalised vocabulary. The vocabulary itself can be easily extended such that the scheme can be used to model any real-life event in our own family history (i.e. no straitjacket). The normalised vocabulary allows a software product to depict those relationships rather than simply showing some piece of text extracted from a supporting source. Although the vocabulary is locale-neutral (since its terms are programmatic), remember that the majority of non-biological relationships are culturally dependent and so if you’re modelling worldwide event types then this capability is extremely important. For instance, consider all the parts of the world where the concept of a Godfather has no significance in any of their events.

** Post updated on 22 Nov 2015 to align with the changes in STEMMA V4.0 **

[1] "1911 Census for England and Wales", database, FindMyPast ( : accessed 18 Dec 2013), household of George Binch (age 49); citing RG 14/20397 RD429, SD2, ED6, SN7; The National Archives of the UK (TNA).
[2] STEMMA V3 adopted these distinct terms, and corresponding Property names, to avoid the confusion and overloading in its previous versions.
[3] "1861 England, Wales & Scotland Census", database, FindMyPast ( : accessed 18 Dec 2013), household of Samuel Brady [Bradley] (age 30); citing RG 09/2560, folio 23, page 6; TNA.