Wednesday, 11 September 2013

Genealogical Persona Non Grata

You may have heard the term Persona (pl. Personae) being used in a genealogical context, especially by people with a software background. What is a persona, though? Do you ignore it as a software aberration? Does it have any value?

A persona is the term used to describe the reference to some person from one specific source. There are no conclusions in a persona — only information — and so it is sometimes inaccurately referred to as an “evidence person” in order to distinguish it from the traditional “conclusion person” that we might have in a family tree. Hence, a persona is not equated with any actual person. The use of the term evidence is inaccurate here as evidence is an intangible mental construct, unlike source information. It is what we think certain source information means.[1]

In principle — I know we all work in different ways — it is possible to take a number of similar personae and group them together in order to form one or more conclusion persons that we can identify with actual people; more on this process in a moment, though. Some advocates describe a multi-tier process where this grouping occurs at different levels. For instance, personae that are obviously similar being grouped first, and then those groups being tentatively grouped themselves based on less obvious criteria. It’s interesting to note that these persona groups are not personae themselves since they are the result of some inference and conclusion, and ideally need some justification.
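To make the multi-tier idea concrete, here is a minimal Python sketch. The class names `Persona` and `PersonaGroup`, their fields, and the justification strings are my own assumptions for illustration and are not taken from any particular data model. The sample values reuse the William Elliott references that appear later in this post.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Persona:
    """One person reference from one source: information only, no conclusions."""
    source: str                 # hypothetical field: key of the single source
    name: str                   # name exactly as recorded in that source
    age: Optional[int] = None   # age as recorded, if any


@dataclass
class PersonaGroup:
    """A grouping of personae and/or lower-tier groups.

    A group is the result of inference, so it is not itself a persona
    and it carries a justification.
    """
    members: List[Union[Persona, "PersonaGroup"]]
    justification: str

    def personae(self) -> List[Persona]:
        """Flatten every tier down to the underlying personae."""
        found: List[Persona] = []
        for member in self.members:
            if isinstance(member, Persona):
                found.append(member)
            else:
                found.extend(member.personae())
        return found


census_1851 = Persona("sCensusElliott1851", "William Elliott", 10)
census_1861 = Persona("sCensusElliott1861", "William Elliott", 20)
marriage_1862 = Persona("sMarriageElliott1862", "William Elliott", 21)
newspaper_1870 = Persona("sEveningPost", "Wm. Elliott", 29)

# Tier 1: the obviously similar references are grouped first.
tier1 = PersonaGroup(
    [census_1851, census_1861],
    "Illustrative: same name, birthplace and father in consecutive censuses",
)

# Tier 2: a tentative grouping based on less obvious criteria.
tier2 = PersonaGroup(
    [tier1, marriage_1862, newspaper_1870],
    "Illustrative and tentative: compatible name and age only",
)

print(len(tier2.personae()))
```

The point of the design is that each tier keeps its own justification, so a tentative upper-tier grouping can be undone without disturbing the firmer grouping beneath it.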

 

This brief outline embodies the generally accepted nature of a persona. However, things start to go awry from here and opinions begin to differ. The persona is a much-debated concept in the Evidence & Conclusion model of genealogy, and many threads on the subject can be found on the BetterGEDCOM wiki, such as “Do we need persona?”.

As a representation of the reference to a person from a single source, there cannot be much debate over the concept. However, in practice, that information is usually distilled down to a number of named properties, as in the illustration above. I’m using the STEMMA® terminology of properties here rather than “facts” or “PFACTs”, etc. STEMMA defines a Property as extracted and summarised source information, and acknowledges that Properties require the same support for uncertain characters, uncertain interpretations, and other anomalies as transcriptions do. Properties are valuable as a window onto the supporting information, but they do not replace the raw information since they are only a digested form of it. Replacing it would lose the contextual parts of the information, such as what the event was, who else was there, what parts they played, and how reliable the source itself is.

The persona concept itself can be traced to a 1959 paper entitled Automatic Linkage of Vital Records[2]. Indeed, there are still those who believe that one of the primary uses of personae is in their automated combination by software. This might yield a first-pass result when many records are involved but I would be very concerned about accepting that result without putting in the real analysis expected of genealogical research. However, this is straying into the field of how software might utilise personae rather than their expressive power.

The origin of the term itself is uncertain but at the meeting that kicked off the GenTech model, in 1994, Tom Wetmore gave a talk entitled "Structured Flexibility in Genealogical Data" in which he stressed the need to record evidence data, and where he used the term persona in that context. The concept of persona exists in several data models, including GenTech and more recently GEDCOM-X.

So, is there any merit in representing personae in our data? STEMMA records Property values for a person reference, such as their name, age, and occupation, but it also wants to retain the source context of that information — the where-and-when. It does this by subdividing its Event entities into a number of SourceLnk elements, each of which is supported by a distinct source. Those SourceLnk elements may contain multiple PersonLnk elements corresponding to person references in that source and these are, therefore, similar to personae.


<Person Key='pWilliamElliott'>
    <Eventlet>
        <!-- Private event (no other persons involved) -->
        <When Value='1870-11-17'/>
        <SourceLnk Key='sEveningPost'>
            <PersonLnk>
                <Property Name='Name'>
                Wm. Elliott
                </Property>
                <Property Name='Age'> 29 </Property>
            </PersonLnk>
        </SourceLnk>
    </Eventlet>
</Person>

<!-- Multi-person events -->

<Event Key='eCensusElliott1851'>
    <SourceLnk Key='sCensusElliott1851'>
        <PersonLnk Key='pWilliamElliott'>
            <Property Name='Name'>
            William Elliott </Property>
            <Property Name='Age'> 10 </Property>
            <Property Name='Occupation'>
            Scholar </Property>
            <Property Name='BirthPlace' Key='wUttoxeter'>
            Staffordshire Uttoxeter </Property>
            <Property Name='Relationship'
            Key='pTimothyElliott'> Son </Property>
            <Property Name='Status'/>
        </PersonLnk>
    </SourceLnk>
</Event>

<Event Key='eCensusElliott1861'>
    <SourceLnk Key='sCensusElliott1861'>
        <PersonLnk Key='pWilliamElliott'>
            <Property Name='Name'>
            William Elliott </Property>
            <Property Name='Age'> 20 </Property>
            <Property Name='Occupation'>
            Labourer </Property>
            <Property Name='BirthPlace' Key='wUttoxeter'>
            Staffordshire Uttoxeter </Property>
            <Property Name='Relationship'
            Key='pTimothyElliott'> Son </Property>
            <Property Name='Status'>
            Unmarried </Property>
        </PersonLnk>
    </SourceLnk>
</Event>

<Event Key='eMarriageElliott1862'>
    <SourceLnk Key='sMarriageElliott1862'>
        <PersonLnk Key='pWilliamElliott'>
            <Property Name='Name'>
            William Elliott </Property>
            <Property Name='Age'> 21 </Property>
            <Property Name='Occupation'>
            Hammersman </Property>
            <Property Name='ResidencePlace'
            Key='wVictoriaStreet'> Victoria Street Derby
            </Property>
            <Property Name='Role'> Groom </Property>
            <Property Name='Status'>
            Unmarried </Property>
        </PersonLnk>
    </SourceLnk>
</Event>


The PersonLnk elements representing the subject references are assembled from the discrete Property values derived from the supporting source. When Properties describe relationships for the subjects, those relationships can also be represented; they may be inter-person relationships (such as wife-of or wife-of-brother-of), membership of some group, or relationships to referenced places. Putting this information into Events allows it to be presented by time (i.e. a timeline), by geography, or both. Property values for the Event itself, such as its dates or place, may also be specified in the SourceLnk element as Event properties.
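As a rough illustration of how software might read this structure, the following Python sketch collects the persona-like sets of Property values for each person key, keeping the Event and source they came from. It uses only the standard library. The `<Dataset>` wrapper element and the function name `collect_references` are assumptions made so that the fragment parses as a single document; they are not taken from the STEMMA specification. The sample is an abridged copy of the Events shown above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged copy of the sample Events; <Dataset> is a hypothetical wrapper.
SAMPLE = """
<Dataset>
  <Event Key='eCensusElliott1851'>
    <SourceLnk Key='sCensusElliott1851'>
      <PersonLnk Key='pWilliamElliott'>
        <Property Name='Name'> William Elliott </Property>
        <Property Name='Age'> 10 </Property>
      </PersonLnk>
    </SourceLnk>
  </Event>
  <Event Key='eMarriageElliott1862'>
    <SourceLnk Key='sMarriageElliott1862'>
      <PersonLnk Key='pWilliamElliott'>
        <Property Name='Name'> William Elliott </Property>
        <Property Name='Age'> 21 </Property>
        <Property Name='Role'> Groom </Property>
      </PersonLnk>
    </SourceLnk>
  </Event>
</Dataset>
"""


def collect_references(xml_text):
    """Map each person key to a list of (event key, source key, properties)."""
    root = ET.fromstring(xml_text)
    references = defaultdict(list)
    for event in root.iter("Event"):
        for source_lnk in event.iter("SourceLnk"):
            for person_lnk in source_lnk.iter("PersonLnk"):
                # Property text is trimmed; an empty element yields "".
                props = {
                    prop.get("Name"): (prop.text or "").strip()
                    for prop in person_lnk.iter("Property")
                }
                references[person_lnk.get("Key")].append(
                    (event.get("Key"), source_lnk.get("Key"), props)
                )
    return dict(references)


for event_key, source_key, props in collect_references(SAMPLE)["pWilliamElliott"]:
    print(event_key, source_key, props)
```

Each tuple retains its source key, so the where-and-when of every Property value stays attached to it rather than being merged into a single record for the person.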

OK, so why don’t I describe these sets of Property values as personae and use them as such? For a start, the interpretation and summarisation of these items constitutes a level of inference, and so they are one level removed from the persona concept. STEMMA also generalised the concept so that there are equivalents for all of its subject references, including places, groups, and animals. Furthermore, as of STEMMA V4.0, there is a much closer concept that has true value for research and analysis purposes. Its Source entity allows references to subjects (such as persons), and to dates and other important details or phrases, to be marked, collected, and built into a network for a graphic analyser. This allows those references to be analysed in terms of other context from the source information, and for similar references — in either a single source or across multiple sources — to be assembled into multi-tier persona-like entities.

In summary, I believe the concept of personae has merit in micro-history data, but without the contextual information that surrounded those references in their respective sources, they cannot be used for research purposes. Similarly, STEMMA’s sets of Property values are merely an extracted and summarised form of information from a source and are not designed for deep analysis. By contrast, its Source entity embraces references to more subjects than merely persons, and to any information that the researcher feels will be important to their historical analysis. This does not mandate a given research methodology (avoiding such mandates is a basic premise of STEMMA), but it does provide support for a genuine approach to handling complex evidence.


** Post updated on 22 Nov 2015 to align with the changes in STEMMA V4.0 **


[1] “QuickLesson 13: Classes of Evidence—Direct, Indirect & Negative“, Evidence Explained: Historical Analysis, Citation & Source Usage (https://www.evidenceexplained.com/content/quicklesson-13-classes-evidence%E2%80%94direct-indirect-negative : accessed 10 Sep 2014).
[2] H. B. Newcombe, J. M. Kennedy, S. J. Axford, and A. P. James, “Automatic Linkage of Vital Records”, Science, Vol. 130, No. 3381 (16 Oct 1959): pp. 954–959.
