… so to speak. Do you attach census pages to people in your
tree as evidence of a birth date? If so, there is a good chance that you
are currently attaching items to the wrong entities in your data. Read about
the pitfalls that we all face when associating evidence, why we often do this
incorrectly, and what the future holds for us.
If you find one of your ancestors in a census page, do you cite
that page as evidence of where they lived, or their date of birth, or their place
of birth? It may yield such evidence but then what do you attach the census
information to? Many people would add a citation in the details of that person,
and also attach any image of the census page directly to that person, but this
is demonstrably wrong. Although this sort of evidence can be gleaned from a
census record, the actual record wasn’t generated as a proof of any of those
items. In effect, you would be confusing the relevance of extracted information
(e.g. something about a given person) with the nature of the source itself
(i.e. what the record was originally intended for). This may be a subtle point
but it has profound implications when modelling the real-life data
relationships.
This issue is as much about data organisation as about philosophy,
so let’s take a moment to look at some simple practical problems resulting
from this common approach.
The chances are that the same census page includes other
family members and relatives so do you duplicate this operation for every member?
You cannot always attach a scan to some type of family record since the members
may be more distant or loosely-connected relatives, or they may even be
unrelated until some later marriage.
We’re not just talking about census pages either. Consider a
marriage certificate. You might be attaching details (citation or scan) to both
the bride and the groom, but what about their fathers? Both of these would be
mentioned on many certificates so you might be able to glean evidence of their
names and occupations too. Do you also transcribe the marriage date and record
it separately in the timeline for each of these individuals? If you had initially
misread the date because it was so faint, does that mean you have propagated
the error to every one of them? There’s always the very real risk, too, that you may have picked the
wrong marriage and need to undo those associations.
The same issue applies to a birth certificate since it
probably contains the parents’ names (whether married or not) and the father’s
occupation. You may be surprised but this issue even applies to photographs. For
instance, I have a group photograph of my grandparents and their family that
was printed in a Nottingham newspaper in the 1950s. Do I attach that image to
every one of those people in my data, together with details on where and when
it was taken, plus the newspaper citation, plus the newspaper caption that went
with it? Your choice of software makes a big difference in how serious an issue
this is to you, but there is a better way.
[…come on, Tony, get to the point…]
OK, I think the astute readers have already guessed where
I’m taking this, especially since I have set the stage by writing about the
importance of events in recent blog posts[1][2].
The thing is that the vast majority of our evidence – if not all of it –
relates to events; things that happened in a particular place at a particular
time. The people involved in all of the cases illustrated here are sharing
certain events (i.e. a census, a marriage, a birth, and a family group outing).
The record (or document, or artefact) details are therefore best associated
with the Event entity in the data, and the relevant Person entities linked to
the Event with their respective roles.
Multi-Person Events were described in more detail in my
previous posts, but where does that leave the information extracted from one of
these event-orientated data sources? For instance, if source details are now
associated with an Event then how do you associate the items of extracted information (i.e. Properties[3])
with each of the Persons sharing that Event?
This figure illustrates a marriage event using symbols similar
to those in my previous posts. We can see that the bride and groom are both
connected to the Event entity, as are the pair’s fathers, and they would
each be distinguished by their respective Role[4].
The source details (citation, image, etc.) are associated only with the Event,
but the Properties – the items of extracted information that are relevant to each
of those Persons – are associated with the individual Event-to-Person connections,
not specifically the Persons or the Event. This is important since an Event may
have more than one Person connected to it, and each Person will have more than
one Event connected to it.
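For readers who prefer code to diagrams, the arrangement just described can be sketched in a few lines of Python. This is only a minimal illustration and not STEMMA syntax: the class names (Source, Person, Participation, Event), the link helper, and all of the people and values are assumptions invented for the example. The point to notice is that the source sits on the Event alone, while each Person's Properties sit on their own connection to that Event.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    """Citation and optional scan; attached to an Event, never to a Person."""
    citation: str
    image: Optional[str] = None


@dataclass
class Person:
    name: str


@dataclass
class Participation:
    """One Event-to-Person connection: holds the Role and the extracted Properties."""
    person: Person
    role: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    kind: str
    sources: List[Source] = field(default_factory=list)
    participants: List[Participation] = field(default_factory=list)

    def link(self, person: Person, role: str, **properties: str) -> Participation:
        """Connect a Person to this Event in a given Role, with their own Properties."""
        connection = Participation(person, role, dict(properties))
        self.participants.append(connection)
        return connection


# Invented people and values, for illustration only.
marriage = Event("Marriage", sources=[Source("Marriage certificate (placeholder citation)")])
marriage.link(Person("Ann Brown"), "Bride", Age="22")
marriage.link(Person("John Smith"), "Groom", Age="24")
marriage.link(Person("William Brown"), "Bride.Father", Occupation="Miner")
marriage.link(Person("George Smith"), "Groom.Father", Occupation="Framework knitter")

# The certificate is recorded once, however many Persons share the Event.
print(len(marriage.sources), len(marriage.participants))
```

If the certificate later turns out to belong to the wrong couple, there is a single Event to correct or detach rather than four separate copies of the citation.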
This natural factoring of the data results in less
duplication and redundancy, but at no loss of information. What we’re avoiding
is dumping the same source details on every associated Person simply because
that source yields some evidence about them. When several Persons are sharing
the same Event, different Properties may be derived for each of them but the
source information as a whole describes the Event.
For any code-junkies, an example of how this is represented
in STEMMA® can be found at Single Source Events, and a further example involving
multiple sources for the same event at Multi-Source Events.
Of course, not all software can actually do this since it
requires support for shared Events. You’re probably thinking, though, ‘what if
I have conflicting Properties such as a date-of-birth?’ We all know that we
may get conflicting Properties from different sources, but I haven’t changed
that by describing this approach. The final set of conclusion Properties that
you associate with each Person will be the result of assessing the aggregated
evidence for them – the evidence Properties from each of the Events in their
timeline. This has always been the case since no one has multiple dates of
birth! All I’ve done is re-factor the sources and the evidence.
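As a rough sketch of what "assessing the aggregated evidence" might look like in practice, the Python below gathers one Person's evidence Properties from several Events and flags the ones where the sources disagree. The flat tuple layout, the function names evidence_for and conflicts, and every name and value are hypothetical; the code only surfaces the conflict and leaves the conclusion to the researcher.

```python
from collections import defaultdict

# Each tuple is one evidence Property taken from an Event-to-Person connection:
# (event label, person, property name, value). All values are invented.
evidence = [
    ("Census 1881", "Mary Example", "BirthDate", "about 1851"),
    ("Marriage 1872", "Mary Example", "BirthDate", "about 1850"),
    ("Census 1881", "Mary Example", "BirthPlace", "Nottingham"),
    ("Census 1881", "Thomas Example", "BirthDate", "about 1848"),
]


def evidence_for(person, records):
    """Group a person's evidence Properties by name, keeping the originating Event."""
    grouped = defaultdict(list)
    for event, who, name, value in records:
        if who == person:
            grouped[name].append((value, event))
    return dict(grouped)


def conflicts(grouped):
    """Return the property names for which the sources give differing values."""
    return sorted(
        name for name, items in grouped.items()
        if len({value for value, _ in items}) > 1
    )


mary = evidence_for("Mary Example", evidence)
print(conflicts(mary))  # ['BirthDate']
```

Because each value keeps a pointer back to its Event, the source behind any disputed Property is always one step away.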
So what’s the advantage of this? Well, apart from avoiding
unnecessary redundancy, the scheme is modelling the true nature of the data
relationships. This is what I meant above by the issue being partly
philosophical. Perhaps more important, though, is that your data is then organised
according to the natural timeline. If you want to present a timeline, either in
a report or on your screen, then it doesn’t have to be forced, and it can
accommodate the lives of multiple people when necessary.
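To illustrate how little work a timeline needs once the data is organised by Event, here is a small Python sketch, assuming each Event carries a date and a list of (person, role) pairs. The Event layout, the timeline function, and the sample people and dates are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Event:
    when: date
    kind: str
    roles: Tuple[Tuple[str, str], ...]  # (person, role) pairs sharing this Event


# Invented sample data, deliberately out of date order.
events = [
    Event(date(1881, 4, 3), "Census", (("Mary", "Wife"), ("Thomas", "Head"))),
    Event(date(1872, 6, 1), "Marriage", (("Mary", "Bride"), ("Thomas", "Groom"))),
    Event(date(1850, 2, 10), "Birth", (("Mary", "Child"),)),
]


def timeline(events, *people):
    """Return, in date order, the Events involving any of the named people."""
    wanted = set(people)
    rows = []
    for ev in sorted(events, key=lambda e: e.when):
        involved = [(person, role) for person, role in ev.roles if person in wanted]
        if involved:
            rows.append((ev.when, ev.kind, involved))
    return rows


for when, kind, involved in timeline(events, "Mary", "Thomas"):
    print(when.isoformat(), kind, involved)
```

A shared Event appears once in the combined timeline, with both people listed against it, rather than once per person.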
Unfortunately, the future looks a little bleak for this.
There are many people who do not adopt this approach, either because their
software cannot handle it or because it’s not the way that they were taught,
and their data will never change. Even if their software improves or changes,
and even if some new data standard emerges that better models real life, their
data cannot be re-factored automatically to become better organised. It’s set
in stone.
By far the biggest issue, though, is the use of so-called
collaborative online family trees. When these models actually accommodate
sources, and when their contributors actually enter them, then they have no
choice but to associate them directly with Person entities because that’s all
they have. They’re not representing event-based history and so their misplaced
sources will be inherited by anyone copying from them. This does not bode well
for anyone wanting to adopt an event-based approach to family history, or even
micro-history. Genealogy’s preoccupation with mere family trees will
continually pollute the waters.
[1] See “Eventful Genealogy”, Blogger.com, Parallax View, 3 Nov 2013 (http://parallax-viewpoint.blogspot.com/2013/11/eventful-genealogy.html).
[2] See “Eventful Genealogy - Part II”, Blogger.com, Parallax View, 6 Nov 2013 (http://parallax-viewpoint.blogspot.com/2013/11/eventful-genealogy-part-ii.html).
[3] ‘Properties’ is the terminology adopted by STEMMA for items of extracted and summarised information such as a
date-of-birth (see http://parallaxview.co/stemma/home/document-structure/person/properties).
I feel strongly that the word ‘facts’ is misleading since the possibility of
something being factual depends on the nature of the associated source. I also
do not like the software term of ‘PFACT’, which stands for property, fact,
attribute, characteristic, or trait.
[4] The Roles might be something like Bride, Groom, Bride.Father, and Groom.Father in
this case. I will discuss how extensible roles can easily be accommodated in a
future post.