Wednesday, 19 November 2014

Genealogical Inheritance

If you think this is about bequests, wills, estate planning, or probate then you’d be wrong. I’m afraid this is about software inheritance and how it simplifies the creation of one genealogical entity (e.g. a Citation or an Event) from a similar one. Some amount of code is inevitable as this is really intended for a software-orientated audience, but I will try to explain what is happening and what the advantages are.

Anyone with knowledge of Object Orientated Programming (OOP) will already be familiar with software inheritance. A programming concept called a ‘class’ is used to describe some real-world entity (e.g. an employee), including the data associated with it (e.g. name, salary) and the operations that can be performed on it (e.g. promotion). A ‘derived class’ can then be created from such a generic ‘base class’ in order to describe a more-specialised entity (e.g. a salesperson, or an engineer). In this small illustration, that would allow all the common aspects of an employee to be programmed once and automatically shared by all the various employee types; the derived classes embracing any extra data or operations associated with specific cases.

Schematic of structural inheritance
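For readers who prefer to see this in code, here is a minimal Python sketch of the employee illustration. The class names, the ‘region’ field, and the figures are invented purely for the example and have nothing to do with STEMMA itself.

```python
class Employee:
    """Base class: data and operations common to every employee."""

    def __init__(self, name: str, salary: float) -> None:
        self.name = name
        self.salary = salary

    def promote(self, rise: float) -> None:
        # A promotion is modelled here simply as a salary increase.
        self.salary += rise


class Salesperson(Employee):
    """Derived class: adds data that only applies to sales staff."""

    def __init__(self, name: str, salary: float, region: str) -> None:
        super().__init__(name, salary)  # reuse the common set-up
        self.region = region            # extra data for this specialisation


rep = Salesperson("Ann", 30000.0, "North")
rep.promote(1500.0)  # inherited from Employee, not re-implemented
print(rep.name, rep.salary, rep.region)
```

The promote operation is written once in the base class and is available, unchanged, to every derived class.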

STEMMA software, for instance, has a base class representing a generic subject entity corresponding to some subject mentioned in historical sources, such as a person. That encapsulates all the common aspects such as name handling (see Game of the Name) and their relationship to events and sources (see Time-Dependent Attributes). STEMMA also has derived classes that extend that base class in order to represent specific subject entities, such as a Person, Animal, Place, or Group; each of which has some slightly different requirements, including the representation of their respective hierarchies.

What I want to present in this article, though, is the inheritance mechanism provided in the STEMMA data model itself rather than in the associated software. This came about because many of my data files were created by hand in the early days, and I wanted a means to avoid duplication and to enable the re-use of entities. Little did I know how much I would come to rely on this feature.

This inheritance mechanism is applicable to each of the entity types: Event, Citation, and Resource. However, there is an additional parameterisation mechanism applicable to the latter two that works in conjunction with inheritance.

Inheritance

Let me pick a very simple example to kick this off. Say we’re about to create an Event entity for an English household in the 1901 census. We’ll need the census date for this (which many of us would have to look up), but we’ll very likely have further households to document from that same census. Wouldn’t it be nice to enter the date and description just once? The code below creates a base Event entity representing the day of that census. This merely contains the event type and sub-type, and the specific date. The ‘Abstract’ attribute imposes certain restrictions to ensure that it constitutes a sound basis for inheritance. A second Event then inherits the details in order to describe the census event in a particular household.

<Event Key=’eCensus1901’ Abstract=’1’>

<Type> Survey </Type><SubType> Census </SubType>

<When Value=’1901-03-31’/>

</Event>


<Event Key=’eCensus1901ManningGrove’>

<BaseEventLnk Key=’eCensus1901’/>

<PlaceLnk Key=’wManningGrove’/>

</Event>


Now, you may be thinking that a good software product would know about the various census events and enter the date, place, or other details, for you. That’s true but the product can never know all of the events in your ancestors’ lives, and the more micro-historical your focus then the more esoteric your required event types will be. What I was doing by hand could be implemented inside a product as a custom-Event builder, but the bigger difference is that this dependency wasn’t simply an aid to data entry; the dependency was modelled in the data file, and any change to the base entity (such as adding narrative) would be reflected in all dependent entities.
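To make the point about the dependency living in the data more concrete, here is a small Python sketch. It is only an assumption about how software might resolve a base link when an entity is read, not a description of the actual STEMMA software. The dictionary layout, the ‘Base’ key, and the narrative text are hypothetical, and only a single level of inheritance is handled.

```python
# Hypothetical in-memory form of the two census Event entities shown earlier.
events = {
    "eCensus1901": {
        "Abstract": True,
        "Type": "Survey",
        "SubType": "Census",
        "When": "1901-03-31",
    },
    "eCensus1901ManningGrove": {
        "Base": "eCensus1901",        # stands in for BaseEventLnk
        "PlaceLnk": "wManningGrove",
    },
}


def resolve(key: str) -> dict:
    """Return the effective fields of an entity: base fields, then its own."""
    entity = events[key]
    fields = {}
    base_key = entity.get("Base")
    if base_key is not None:
        base = events[base_key]
        if not base.get("Abstract"):
            raise ValueError(f"{base_key} is not abstract")
        fields.update(base)           # inherit everything from the base
    fields.update(entity)             # the entity's own fields take priority
    # Bookkeeping items describe the entity itself and are not inherited.
    fields.pop("Abstract", None)
    fields.pop("Base", None)
    return fields


before = resolve("eCensus1901ManningGrove")
# Add some (hypothetical) narrative to the base entity only.
events["eCensus1901"]["Text"] = "Census night, 31 March 1901."
after = resolve("eCensus1901ManningGrove")
print(before)
print(after)
```

Because the derived entity stores only a link and its own differences, the narrative added to the base appears in the derived entity the next time it is resolved, without the derived entity being edited.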

A previous post, Rock Family Trees, showed an example that built up a custom Event entity to use as a base for representing musical events. This effectively encapsulated the use of custom types to describe musical events and, more specifically, changes in band membership.

<Dataset Name=’RockFamilyTrees’

xmlns:et=’http://familyofrock.com/event-type’

xmlns:est=’http://familyofrock.com/event-subtype’>


<Event Key=’eMusicalBand’ Abstract=’1’>

<Type> et:Musical </Type>

<SubType> est:BandMembership</SubType>

</Event>


<Event Key=’eDannyJoined’>

<PlaceLnk Key=’wBrixton’/>

<When Value=’1968-08’/>

<BaseEventLnk Key=’eMusicalBand’/>

</Event>


This same mechanism may be used for Resource entities describing data files, physical artefacts, or both. For instance, the following base entity might describe a collection of original photographs that also happens to have been digitised.

<Resource Key=’rElizPhotos’ Abstract=’1’>

<Title> Elizabeth’s Photographic Collection </Title>

<URL ContentType=’image/jpeg’> file:Eliz-Photos/*.jpg </URL>

<Type Artefact=’1’> Photograph </Type>

<DataControl>

<Permission> Elizabeth gave permission to

share with family in 2008 </Permission>

</DataControl>

<Text>

Collection received from Elizabeth Smith on

<DateRef Value=’2008-06-09’/>

</Text>

</Resource>


A simple entity representing one specific digitised photograph from the collection might appear as:

<Resource Key=’rPhotoSmithFamily’>

<BaseResourceLnk Key=’rElizPhotos’/>

<URL> file:Eliz-Photos/SmithFamily1952.jpg </URL>

</Resource>


This inherits quite a bit from the base entity, including a permissions notice that software would display when any type of sharing is attempted. Note that if that notice were modified in any way then it would automatically affect all the derived entities that depend on it.

However, the following section will indicate how this example can be improved upon.

Parameterisation

Whereas a Resource entity uniquely identifies a data file through its URL string, a Citation entity requires both a URI string and a set of parameter values to uniquely identify an information source.

A Citation entity uses parameters to represent individual citation-elements, as described in Cite Seeing, and the following example uses them to describe a published book:

<Citation Key=’cOldNottm’ Abstract=’1’>

<Title>Old Nottingham Notes</Title>

<URI> http://stemma.parallaxview.co/source-type/book/ </URI>

<Params>

<Param Name=’Author’>James Granger</Param>

<Param Name=’Title’>OLD NOTTINGHAM : Its Streets, People, etc</Param>

<Param Name=’Publisher’>Nottingham Daily Express Office</Param>

<Param Name=’Date’>1902</Param>

<Param Name=’Page’ ItemList=’1’/>

</Params>

</Citation>


The URI implies a given set of named and typed parameters that are relevant to this source type. This base Citation provides parameter information about the book as a whole, but not the specific page(s) — note that selected parameters, such as this one, may specify a series of values. That page information might be provided in a new Citation entity that inherits from the base one as follows:

<Citation Key=’cHandleysHospital’>

<BaseCitationLnk Key=’cOldNottm’/>

<Params>

<Param Name=’Page’>94</Param>

</Params>

</Citation>


Alternatively, the page information could be provided when the base entity is referenced; say in some narrative. This effectively creates an unnamed, transient Citation entity through inheritance:

<CitationRef Key=’cOldNottm’>

<Param Name=’Page’>94</Param>

</CitationRef>


Parameterisation is available in both Citation and Resource entities, and the values may be inherited from a base entity, declared explicitly in the body of an entity, or applied to a link from one entity instance to another. All of these schemes can be used together.
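One way to picture the three sources of parameter values working together is as layers that are searched in order. The Python sketch below uses the book example; the precedence shown (link first, then the entity body, then the base) is my assumption for illustration, and the page value 95 is hypothetical.

```python
from collections import ChainMap

# Values supplied by the abstract base Citation (the book as a whole).
base_values = {
    "Author": "James Granger",
    "Title": "OLD NOTTINGHAM : Its Streets, People, etc",
    "Publisher": "Nottingham Daily Express Office",
    "Date": "1902",
}
# Values declared in the body of the derived Citation.
entity_values = {"Page": "94"}
# Values applied on a link or reference to the entity (none here).
link_values = {}

# ChainMap looks in the first mapping first, so nearer layers win.
effective = ChainMap(link_values, entity_values, base_values)
print(effective["Page"], effective["Date"])

# A reference that supplies its own (hypothetical) page overrides the entity.
via_link = ChainMap({"Page": "95"}, entity_values, base_values)
print(via_link["Page"])
```

Nothing is copied between the layers, so a correction to the base values would be seen through every derived view.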

The parameters may also be substituted into selected items by using ${param-name} markers. For Citation entities, this is available in the citation-title, the format-string, the values of parameters themselves (e.g. within a Params element), and narrative elements. For Resource entities, it is available in the resource-title, URL, parameter values, and narrative elements.

The next example shows a simple parameterised Resource for accessing individual photographs from a given folder. The base Resource defines the names and types of the parameters, and derived entities or entity references can specify the corresponding parameter values.

<Resource Key=’rPhotos’ Abstract=’1’>

<Title>Family photograph:${PhotoName}</Title>

<URL ContentType=’image/jpeg’>file:myphotos/family/${PhotoName}.jpg </URL>

<Params>

<Param Name=’PhotoName’ Type=’Text’/>

</Params>

</Resource>


<ResourceLnk Key=’rPhotos’>

<Param Name=’PhotoName’>Tony</Param>

</ResourceLnk>
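The ${param-name} marker happens to match the placeholder syntax of Python’s standard string.Template class, which makes the substitution for the photograph Resource above easy to sketch. This is only an illustration of the idea and not how any STEMMA software is necessarily written.

```python
from string import Template

# Patterns taken from the base Resource, and the value from the ResourceLnk.
title_pattern = Template("Family photograph:${PhotoName}")
url_pattern = Template("file:myphotos/family/${PhotoName}.jpg")
params = {"PhotoName": "Tony"}

title = title_pattern.substitute(params)
url = url_pattern.substitute(params)
print(title)
print(url)
```

Note that substitute raises a KeyError if a marker has no corresponding value, which is a reasonable way of detecting a parameter that was declared but never supplied.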


This last, more-involved example will illustrate how the inheritance and parameterisation mechanisms can be used in conjunction with both Citation and Resource entities in order to handle online images. It uses a shorthand source citation for a general census page of England & Wales for a particular year, e.g. [RG13/3178/51/12]. While not recommended, this catalogue-reference example makes an illustration easier to read.

<Resource Key=’rCensusImage’ Abstract=’1’>

<Title>1851-1901 Census Images of England and Wales</Title>

<URL>http://www.census.com/image?series=${Series}&amp;piece=${Piece}&amp;folio=${Folio}&amp;page=${Page}</URL>

<Params>

<Param Name=’Series’ Type=’Text’/>

<Param Name=’Piece’ Type=’Integer’/>

<Param Name=’Folio’ Type=’Integer’/>

<Param Name=’Page’ Type=’Integer’/>

</Params>

</Resource>


<Citation Key=’cCensus1901’ Abstract=’1’>

<Title> 1901 Census of England and Wales </Title>

<DisplayFormat Mode=’RefNote’>

<Text Language=’eng’>

[<Subs><i>${Series}/${Piece}/${Folio}/${Page}</i></Subs>]

</Text>

</DisplayFormat>

<URI> http://stemma.parallaxview.co/source-type/census-eng-wales </URI>

<Params>

<Param Name=’Series’>RG13</Param>

<Param Name=’Piece’ Type=’Integer’/>

<Param Name=’Folio’ Type=’Integer’/>

<Param Name=’Page’ Type=’Integer’/>

</Params>

</Citation>


<Source Key=’sCensus1901ManningGrove’>

<Title> 1901 Census for Manning Grove</Title>

<Frame>

<CitationLnk Key=’cCensus1901’>

<Param Name=’Piece’>3178</Param>

<Param Name=’Folio’>51</Param>

<Param Name=’Page’>12</Param>

</CitationLnk>

<ResourceLnk Key=’rCensusImage’>

<Param Name=’Series’>RG13</Param>

<Param Name=’Piece’>3178</Param>

<Param Name=’Folio’>51</Param>

<Param Name=’Page’>12</Param>

</ResourceLnk>

</Frame>

</Source>


Now there’s a lot going on here. The Source entity ‘sCensus1901ManningGrove’ represents a specific page in the 1901 census of England & Wales. It nominates an associated Citation and a specific Resource for the census image, which inherit a number of items from the base Citation entity (‘cCensus1901’) and the base Resource entity (‘rCensusImage’), respectively. For the Citation, this includes the source-type URI, format-string, and parameter names & types. For the Resource, it includes a URL for accessing the associated page images via a hypothetical website.

An important point regarding the application of parameter substitution is that it always occurs after the inheritance process has completed. Hence, the following distinct stages may occur:

  1. Inheritance of fields from the base entity.
  2. Overriding (in memory) with explicit fields from the derived entity.
  3. Creation of a transient unnamed entity from the parameter settings in a *Lnk/*Ref element.
  4. Substitution of current parameter values, in the source-order of their substitution markers.

In the last example, stage two isn’t employed: the CitationLnk and ResourceLnk elements specify parameter values that create unnamed Citation and Resource entities in memory, directly from their base entities.
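The four stages can be sketched in Python as follows, using the census example. The dictionary layout, the ‘Format’ field name, and the instantiate function are hypothetical simplifications of my own, and the standard string.Template class stands in for the real substitution mechanism.

```python
from string import Template

# Simplified, hypothetical forms of the two abstract base entities.
base_citation = {
    "Title": "1901 Census of England and Wales",
    "Format": "[${Series}/${Piece}/${Folio}/${Page}]",
    "Params": {"Series": "RG13"},
}
base_resource = {
    "Title": "1851-1901 Census Images of England and Wales",
    "URL": ("http://www.census.com/image?series=${Series}"
            "&piece=${Piece}&folio=${Folio}&page=${Page}"),
    "Params": {},
}


def instantiate(base, derived=None, link_params=None):
    # Stage 1: inherit every field from the base entity.
    entity = {**base, "Params": dict(base["Params"])}
    # Stage 2: override with explicit fields from a derived entity, if any.
    for name, value in (derived or {}).items():
        if name == "Params":
            entity["Params"].update(value)
        else:
            entity[name] = value
    # Stage 3: form the transient entity from the link's parameter settings.
    entity["Params"].update(link_params or {})
    # Stage 4: only now substitute parameter values into the text fields.
    return {
        name: Template(value).substitute(entity["Params"])
        if isinstance(value, str) else value
        for name, value in entity.items()
    }


page = {"Piece": "3178", "Folio": "51", "Page": "12"}
citation = instantiate(base_citation, link_params=page)
resource = instantiate(base_resource, link_params={"Series": "RG13", **page})
print(citation["Format"])
print(resource["URL"])
```

Leaving substitution until last means that a marker in an inherited field picks up values supplied by the derived entity or the link, which is what allows a single base entity to serve every page of the census.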

Conclusion

Most of the STEMMA entity types have their own concept of a structural hierarchy, e.g. lineage for Persons and Animals, geographical/administrative hierarchies for Places and Groups, source provenance for Citations, and hierarchical Events. An inheritance hierarchy, though, is fundamentally different in that it allows sharing of data between related entities. As stated above, this is more than just a mechanism of convenience for automatically adding required data to a new entity; the dependency is represented in the data and so any change to the base entity will automatically affect all the derived entities.

Although a derivation can only be made from an abstract entity, the mechanism can be multi-level, i.e. new abstract entities can themselves be derived from prior ones. This can be used, for instance, to add parameters for the citation of a specialised source type based on a more generic one. Some real examples may be found in www.familyhistorydata.parallaxview.co/downloads/JessonLesson.xml.
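As a rough illustration of multi-level derivation, the following Python sketch walks a chain of base links and accumulates the parameters along the way. The entity keys, the ‘Base’ field, and the checks for non-abstract bases and circular chains are all assumptions made for the example rather than features taken from the STEMMA specification.

```python
# Hypothetical three-level chain: generic census -> 1901 census -> one page.
citations = {
    "cCensus": {
        "Abstract": True,
        "Params": {"Piece": None, "Folio": None, "Page": None},
    },
    "cCensus1901": {
        "Abstract": True,
        "Base": "cCensus",
        "Params": {"Series": "RG13"},
    },
    "cManningGrove": {
        "Base": "cCensus1901",
        "Params": {"Piece": "3178", "Folio": "51", "Page": "12"},
    },
}


def effective_params(key, seen=()):
    """Collect parameters from the most generic base down to this entity."""
    if key in seen:
        raise ValueError(f"circular inheritance at {key}")
    entity = citations[key]
    params = {}
    base_key = entity.get("Base")
    if base_key is not None:
        if not citations[base_key].get("Abstract"):
            raise ValueError(f"{base_key} is not abstract")
        # Resolve the base first so that nearer levels can override it.
        params.update(effective_params(base_key, seen + (key,)))
    params.update(entity.get("Params", {}))
    return params


print(effective_params("cManningGrove"))
```

Each level only states what it adds or changes, so the generic entity can be extended for another census year without touching the entities already derived from it.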


** Post updated on 19 Apr 2017 to align with the changes in STEMMA V4.1 **