Following my previous post, Collaboration With Tears, I suggested that some element of isolationism is inherent in genealogy. I also considered what criteria would make collaboration practical. I now want to describe a novel approach to collaboration.
Collaboration is important, not just for users but for content providers too. It is often seen as a natural progression from records-based content. It allows new users to get involved more quickly and provides a social connection. At the time of writing, not all sites provide collaborative tools (findmypast, for instance, does not), but that is the way of the future for these sites. However, there are many ways that collaboration can occur.
Current tools focus on the creation of a shared picture of our biological
lineage, aka a global family tree, but future tools may support full family
history. There are also a number of other subjects under the micro-history umbrella
that are currently dealt with by specialist sites, including One-Name Studies and One-Place Studies. I
recently commented on a new UK site called myhomespast, which allows people to create a pictorial history of the homes they've lived in. Its creators do not currently view it in terms of micro-history or social networking, but the scope for collaborating on your old streets, and for finding your old neighbours, is huge.
At the end of my previous post, I hinted that focusing on a
unified family tree is deeply flawed. The problem is that the unit of
collaboration – the ‘person’ – is a conclusion that has been formed from
available evidence. Some things can never be determined, though, because there’s simply no surviving evidence. In other cases, the evidence may be scant and circumstantial, which means the resulting conclusions are necessarily more subjective. Establishing that my Person-A is the same as your Person-A therefore becomes more complex. It’s not even a matter of is-it-the-same or is-it-different, since some subset of the person’s details may be substantially different.
A number of people have suggested using a different unit;
one based on evidence rather than conclusions. The growing concept of a Persona has been the most common of these suggestions. Late last year, I imagined a model that I would dearly love to have used. However, because it wasn’t focused on a family tree, nothing like it existed. The more I considered it, the more I realised how easily it could be implemented. A large content provider could build it, but so could a small team of independent developers. As a long-standing developer, I went as far as writing myself a specification and even prototyping some code. However, I no longer have enough time to launch a new start-up. I know from experience that start-ups need more than a good product and talented people. They also need the right mix of personalities. They’re hard!
The essence of the concept was a tool for identifying
people and relationships in a mass of evidence, as opposed to working top-down on
a unified family tree. My own family history research incorporates far more
than a mere tree and I was happy to continue work on that with some limited
isolation. At the time, though, I was looking for a particular person in one of
the census returns of England & Wales. They were pretty elusive and I was
sure they’d changed their name. I was attempting to identify all their family
members, in-laws, neighbours, etc., in order to locate them. It seemed like a
task where collaboration would really have helped.
Let’s pause briefly to examine what is essentially different about this approach. Rather than working with subjective conclusions, it would work with something tangible, something whose existence is beyond contention: an entry on a census page. The entry can be referenced unambiguously, however the person in it is identified, and even if the name is not readable. That allows collaborators to work together on identifying the person and their relationships to other persons. Imagine being able to draw a link between two entries on a census page, or between two entries on side-by-side pages. You could then attach a relationship type, comments, etc.
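To make the idea of linking entries concrete, here is a minimal Python sketch. It assumes each census entry is identified by its reference string, in the class/piece/folio/page/line form described further down. It also assumes a link between two entries is stored once, whichever order they were picked in. The class and method names (`Link`, `LinkStore`, `link`, `add_details`) are hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A link between two census entries, with per-user details attached."""
    link_id: int
    p1: str  # reference of one entry (the two are kept in sorted order)
    p2: str  # reference of the other entry
    # Each detail is (user, from_ref, relationship type, comment).
    details: list = field(default_factory=list)


class LinkStore:
    """Hypothetical in-memory store holding at most one link per pair of entries."""

    def __init__(self) -> None:
        self._links = {}  # frozenset({ref_a, ref_b}) -> Link
        self._next_id = 1

    def link(self, a: str, b: str) -> Link:
        """Return the existing link between a and b, creating it if needed."""
        if a == b:
            raise ValueError("an entry cannot be linked to itself")
        key = frozenset((a, b))  # order-independent: (a, b) and (b, a) match
        if key not in self._links:
            p1, p2 = sorted((a, b))
            self._links[key] = Link(self._next_id, p1, p2)
            self._next_id += 1
        return self._links[key]

    def add_details(self, a: str, b: str, user: str, rel_type: str,
                    comment: str = "") -> Link:
        """Attach one user's relationship type and comment to the a-b link."""
        lnk = self.link(a, b)
        # Record 'a' so the direction in which the relationship reads is kept.
        lnk.details.append((user, a, rel_type, comment))
        return lnk
```

Keying on a `frozenset` means the order in which a user selects the two entries does not matter. It is one simple way to honour the rule, given in the technical notes, that only one link exists between any two persons.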
Yes, most census returns do have a Role field. Even when it is correct, though, it is intra-household, and sometimes even more localised, as in the STEMMA® Census Roles example. This collaborative model would be extremely useful in identifying those “strays” (people found in unexpected places on census night). It would also help with people whose names are misspelled or uncertain, and with those who had deliberately tried to obfuscate their true identity. This is far more than simply adding an alternative name, birth year, or place of birth, as currently supported on ancestry.com.
OK, so why did I pick on the census of England & Wales rather than, say, the civil registrations of vital (BMD) events or parish registers? The answer was a wish to ring-fence the specification. I wanted to keep it as simple as possible, and as independent of the large data collections as possible. Each page of this census has a unique reference code of the form class/piece/folio/page. Add the position of a particular line relative to the start of its page, and you have a convenient way to address each and every entry. Other sources of evidence, particularly those for vital events, have no standardised reference codes. That makes it quite hard to ensure that any given source doesn’t materialise in multiple independent forms. This would be less of a problem for a content provider than for an independent team.
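As a minimal sketch of that addressing scheme, the following Python parses and formats such references. It assumes every component after the class is a plain integer. Real references may contain irregularities that this sketch does not handle. The `CensusRef` name and its methods are hypothetical.

```python
from typing import NamedTuple


class CensusRef(NamedTuple):
    """Reference to one census entry: class/piece/folio/page/line."""
    class_: str  # record class, e.g. "RG11"
    piece: int
    folio: int
    page: int
    line: int    # counted from the start of the page

    @classmethod
    def parse(cls, text: str) -> "CensusRef":
        """Parse 'RG11/3342/3/60/6'-style text (assumes numeric parts)."""
        parts = text.strip().split("/")
        if len(parts) != 5:
            raise ValueError(f"expected class/piece/folio/page/line, got {text!r}")
        # int() also raises ValueError for a non-numeric component.
        piece, folio, page, line = (int(p) for p in parts[1:])
        return cls(parts[0], piece, folio, page, line)

    def page_ref(self) -> str:
        """Reference of the page that contains this entry."""
        return f"{self.class_}/{self.piece}/{self.folio}/{self.page}"

    def __str__(self) -> str:
        return f"{self.page_ref()}/{self.line}"
```

Because `CensusRef` is a tuple, references compare field by field. Sorting a list of them therefore orders entries numerically by piece, folio, page and line, rather than as text.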
So what about the images of the census pages? This is the
one item that the model really needs from an external source, and the answer is
delegation. The scheme I toyed with
was to summon the images onto the screen using a customisable URL that could be
sent to whichever content provider you were subscribed to, thus abstracting
that source. The idea is a little like the way FamilySearch delegates to its partner site findmypast for such a census image. In
their case, the URL has a private format and appears to use some internal image
identification, although a mapping from the public reference code obviously
exists somewhere.
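The customisable-URL idea can be sketched in Python with the standard library's `string.Template`. The template here stands for a hypothetical per-user setting. The `example.com` addresses are placeholders, not real image services, and `image_url` is a hypothetical helper. Where a provider uses private internal image IDs, as in the FamilySearch/findmypast case, a simple template would not be enough. A lookup from the public reference would also be needed.

```python
from string import Template

# Hypothetical per-user setting: the URL pattern of whichever image provider
# the user subscribes to. example.com is a placeholder, not a real service.
USER_TEMPLATE = "https://images.example.com/census/$cls/$piece/$folio/$page"


def image_url(template: str, page_ref: str) -> str:
    """Build a page-image URL from a class/piece/folio/page reference."""
    parts = page_ref.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected class/piece/folio/page, got {page_ref!r}")
    cls, piece, folio, page = parts
    # substitute() raises KeyError if the template uses an unknown placeholder.
    return Template(template).substitute(cls=cls, piece=piece, folio=folio, page=page)
```

A different provider would only need a different template. An example is `https://viewer.example.org/view?ref=${cls}_${piece}_${folio}_${page}`, where the braces are required because `_` would otherwise be read as part of the placeholder name.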
After creating a link between two census persons, I imagined
being able to add a relationship type (e.g. Father-of) and some justifying
text. I also imagined another user adding a different relationship to the same
link, and then people being able to compare the cases we were making and either up-vote or down-vote our
interpretations.
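A minimal Python sketch of how competing interpretations on a single link might be ranked by votes follows. It applies the one-vote-per-user and no-self-voting rules from the technical notes. Ignoring (rather than rejecting) such votes, and ranking by simple net score, are choices made for this sketch. A real system might weight votes or show up- and down-votes separately. The function name and data shapes are hypothetical.

```python
def rank_interpretations(details, votes):
    """Rank competing details on one link by their net vote.

    details: {detail_id: (author, relationship_type)}
    votes:   iterable of (detail_id, voter, vote), with vote +1 or -1
    Self-votes and repeat votes by the same voter are ignored.
    """
    score = {d: 0 for d in details}
    seen = set()
    for detail_id, voter, vote in votes:
        if vote not in (1, -1):
            raise ValueError(f"vote must be +1 or -1, got {vote}")
        author, _ = details[detail_id]  # KeyError for an unknown detail_id
        if voter == author or (detail_id, voter) in seen:
            continue
        seen.add((detail_id, voter))
        score[detail_id] += vote
    ranked = sorted(details, key=lambda d: score[d], reverse=True)
    return ranked, score
```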
The following illustration concerns two Nottingham families
in the 1881 census of England & Wales. I am identifying the adult members
of each household using their natural keys of class/piece/folio/page/line:
John Knowles RG11/3342/3/60/6
Eliza Knowles RG11/3342/3/60/7
Eliza Barker RG11/3342/3/60/10
John Webber RG11/3358/139/16/2
Elizabeth A. Webber RG11/3358/139/16/3
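The natural keys carry the line number last. Stripping it off therefore reveals which entries share a page, without consulting any external data. A small Python sketch using only the keys listed above follows. The `group_by_page` helper is hypothetical and not part of the prototype.

```python
from collections import defaultdict

# The adult entries listed above, keyed by name for readability.
ENTRIES = {
    "John Knowles": "RG11/3342/3/60/6",
    "Eliza Knowles": "RG11/3342/3/60/7",
    "Eliza Barker": "RG11/3342/3/60/10",
    "John Webber": "RG11/3358/139/16/2",
    "Elizabeth A. Webber": "RG11/3358/139/16/3",
}


def group_by_page(entries):
    """Map each class/piece/folio/page reference to its entries, in line order."""
    pages = defaultdict(list)
    for name, ref in entries.items():
        page_ref, _, line = ref.rpartition("/")  # split off the line number
        pages[page_ref].append((int(line), name))
    return {page: [name for _, name in sorted(rows)] for page, rows in pages.items()}
```

Applied to the five keys, this places the Knowles and Barker entries together on page RG11/3342/3/60. The two Webber entries fall on RG11/3358/139/16, in a different piece.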
The links, voting, and associated notes in this illustration revolve around the identification of Eliza Knowles. Two links give justifications for differing identifications, but one receives a down-vote with an explanation of why it must be wrong.
So, if you’ve got this far and you’re still awake, then you’re probably thinking ‘OK, that sounds like a nice idea, but isn’t it a distraction from building your family tree?’ Apart from it being a genuinely
workable method of collaboration, and one which could be implemented
independently, or by a content provider, or even by a partnership of content
providers (there’s a thought!), it also has hidden potential. During my
prototyping, it didn’t take long before I realised that I could turn my data
inside-out. That is, use the links that describe direct biological
relationships to generate a family tree for all or part of my data. I
remember thinking ‘Heh! Wow!’ because that family tree would also include any
information on non-biological relationships, any per-user comments, implicit
citations to each census, and any explicit citations for other sources provided
by the users. More interestingly, it would also support alternative depictions
of the tree that could be rated against each other. In other words, a very rich
tapestry!
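Turning the data inside-out can be sketched in Python as follows. Each link is assumed to be a (from, type, to) triple over census-entry references within a single census. The biological links are assumed to be already normalised to read parent-to-child and free of loops. The `BIOLOGICAL` set and both functions are hypothetical. Non-biological links are simply carried along as annotations.

```python
from collections import defaultdict

# Assumed, already-normalised biological types that read "parent TYPE child".
BIOLOGICAL = {"Father-of", "Mother-of"}


def build_tree(links):
    """Turn (from_ref, type, to_ref) links into a parent -> children map.

    Non-biological links are kept as notes on the 'from' person.
    Assumes the biological links already form a DAG (no loops).
    """
    children = defaultdict(list)
    notes = defaultdict(list)
    has_parent = set()
    for src, rel, dst in links:
        if rel in BIOLOGICAL:
            children[src].append(dst)
            has_parent.add(dst)
        else:
            notes[src].append(f"{rel} {dst}")
    roots = sorted(set(children) - has_parent)  # parents with no recorded parent
    return children, notes, roots


def render(children, notes, person, depth=0):
    """Return an indented outline of person and their descendants."""
    line = "  " * depth + person
    if person in notes:
        line += "  [" + "; ".join(notes[person]) + "]"
    lines = [line]
    for child in sorted(children.get(person, [])):
        lines.extend(render(children, notes, child, depth + 1))
    return lines
```

Consider four links: A Father-of C, B Mother-of C, C Father-of D, and A Husband-of B. They give roots A and B. Rendering A yields A (annotated with Husband-of B), followed by C and then D, each indented one level further.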
Technical Notes
I wouldn’t bother reading further unless you’re particularly
interested in the technical details of the prototype. This section is just a
summary of implementation details that might answer some burning questions.
- A link between any two given persons is unique in the database. Users can add their own details to a link, such as a relationship type, but the link itself is stored only once, with its own unique ID.
- A set of biological relationship types must be predefined and used for validation purposes. This can be supplemented by non-biological ones such as adoptive parents or step-siblings.
- Each person (i.e. entry on a census page) has its own ‘natural key’ that happens to be unique. Users can add details such as notes to a person.
- Some relationship types, such as Father-of and Son-of, are inverses of each other (the same link read in opposite directions). These would be normalised to a preferred direction to reduce duplication.
- Biological relationship types would have to be checked, say overnight, to ensure that they constitute a Directed Acyclic Graph (i.e. no loops).
- A user can vote only once on any given set of link details or person details, and cannot vote on details they own.
- The birth name cannot always be identified from a census, so it and the name as written each need a distinct field in the database tables.
- When incorporating multiple census returns (i.e. from different years), the links can no longer be simple person-to-person ones. A given person may not have been identified in every relevant census return. Their per-census identities effectively form a set and would be handled by different tables.
- The notes were plain text in the prototype. Mark-up similar to STEMMA’s structured narrative would be ideal since it could represent citations to other sources, URLs, attachments such as images, and references to other persons on a census page.
- The prototype did not consider lifecycle events, such as notifying users when voting changes on their link/person details, or notifying a voting user when the associated details have been revised.
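The normalisation and loop-check items above could be sketched in Python as follows. The `CANONICAL` table and both functions are hypothetical. Son-of, Daughter-of and Child-of do not say which parent is meant, so they are mapped to a generic Parent-of. The DAG check uses Kahn's topological-sort algorithm, which is one standard way to detect loops, not necessarily what an overnight job would use. Non-biological types are skipped.

```python
from collections import defaultdict, deque

# Hypothetical normalisation table: type -> (preferred type, swap endpoints?).
CANONICAL = {
    "Father-of": ("Father-of", False),
    "Mother-of": ("Mother-of", False),
    "Parent-of": ("Parent-of", False),
    "Son-of": ("Parent-of", True),
    "Daughter-of": ("Parent-of", True),
    "Child-of": ("Parent-of", True),
}


def normalise(src, rel, dst):
    """Rewrite a biological link so that it always reads parent -> child."""
    preferred, swap = CANONICAL[rel]
    return (dst, preferred, src) if swap else (src, preferred, dst)


def cycle_members(links):
    """Kahn's algorithm on parent -> child edges.

    Returns the people who could not be ordered, i.e. those on a loop or
    descended from one; an empty set means the links form a DAG.
    """
    edges = defaultdict(set)
    indegree = defaultdict(int)
    people = set()
    for src, rel, dst in links:
        if rel not in CANONICAL:
            continue  # non-biological links play no part in this check
        parent, _, child = normalise(src, rel, dst)
        people.update((parent, child))
        if child not in edges[parent]:  # ignore duplicate edges
            edges[parent].add(child)
            indegree[child] += 1
    queue = deque(p for p in people if indegree[p] == 0)
    while queue:
        person = queue.popleft()
        for child in edges[person]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return {p for p in people if indegree[p] > 0}
```

Because duplicates are dropped after normalisation, the same relationship entered from each end counts as one edge. For example, A Father-of B and B Child-of A both become A to B.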
Database Schema
This is another technical section. It presents a partial database schema to explain the entity relationships more clearly. The columns use the following key for their properties:

| Key | Meaning |
|---|---|
| Auto | Auto-generated ID |
| PK | Primary key |
| FK | Foreign key |
| NK | Natural key |
| U[n] | Uniqueness constraint on one or more columns |
| Null | Column is nullable |
LINK

Defines person-to-person links in the same census. Only one link exists for any two persons.

| Column | Properties | Description |
|---|---|---|
| LinkId | Auto,PK | ID for this link |
| P1Id | NK,U1 | Key for first person |
| P2Id | NK,U1 | Key for second person |
LINK_DETAILS

Provides per-user details (e.g. type, notes) on any given link. Each user can only add one set of details per link.

| Column | Properties | Description |
|---|---|---|
| LinkDetId | Auto,PK | ID for these link details |
| LinkId | FK,U1 | Reference to a specific link |
| UserId | FK,U1 | User adding details |
| Notes | FK,Null | ID for any textual notes |
| Type | | Link type (relationship) |
PERSON_DETAILS

Provides per-user details (e.g. notes) on any given person. Each user can only add one set of details per person.

| Column | Properties | Description |
|---|---|---|
| PerDetId | Auto,PK | ID for these person details |
| PId | NK,U1 | Key for person |
| UserId | FK,U1 | User adding details |
| Notes | FK | ID for textual notes |
LINK_VOTE

Voting on user details for a specific link. Each user can vote only once on any given set of link details, and cannot vote on their own.

| Column | Properties | Description |
|---|---|---|
| LinkDetId | FK,U1 | ID for the link details being voted on |
| UserId | FK,U1 | User doing the voting |
| Vote | | +1/-1 vote |
| Notes | FK,Null | ID for any textual notes |
PERSON_VOTE

Voting on user details for a specific person. Each user can vote only once on any given set of person details, and cannot vote on their own.

| Column | Properties | Description |
|---|---|---|
| PerDetId | FK,U1 | ID for the person details being voted on |
| UserId | FK,U1 | User doing the voting |
| Vote | | +1/-1 vote |
| Notes | FK,Null | ID for any textual notes |
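As a minimal sketch, part of this schema (LINK, LINK_DETAILS and LINK_VOTE) is expressed below as SQLite DDL, driven from Python's built-in `sqlite3` module. Several details are assumptions made for illustration. These include the snake_case names and the TEXT type for natural keys. They also include the CHECK forcing `p1_id < p2_id`, so a pair of persons can only be stored one way round, and the trigger that blocks votes on one's own details. The users and notes tables are omitted, so UserId and Notes are plain integer columns here. PERSON_DETAILS and PERSON_VOTE would follow the same pattern.

```python
import sqlite3

SCHEMA = """
CREATE TABLE link (
    link_id INTEGER PRIMARY KEY AUTOINCREMENT,
    p1_id   TEXT NOT NULL,          -- natural key of the first census entry
    p2_id   TEXT NOT NULL,          -- natural key of the second census entry
    CHECK (p1_id < p2_id),          -- canonical order: one row per pair
    UNIQUE (p1_id, p2_id)
);
CREATE TABLE link_details (
    link_det_id INTEGER PRIMARY KEY AUTOINCREMENT,
    link_id     INTEGER NOT NULL REFERENCES link(link_id),
    user_id     INTEGER NOT NULL,   -- users table omitted in this sketch
    notes       INTEGER,            -- nullable; notes table omitted
    type        TEXT NOT NULL,      -- relationship type, e.g. 'Father-of'
    UNIQUE (link_id, user_id)       -- one set of details per user per link
);
CREATE TABLE link_vote (
    link_det_id INTEGER NOT NULL REFERENCES link_details(link_det_id),
    user_id     INTEGER NOT NULL,
    vote        INTEGER NOT NULL CHECK (vote IN (-1, 1)),
    notes       INTEGER,
    UNIQUE (link_det_id, user_id)   -- one vote per user per set of details
);
-- Users may not vote on their own link details.
CREATE TRIGGER no_self_vote BEFORE INSERT ON link_vote
WHEN NEW.user_id = (SELECT user_id FROM link_details
                    WHERE link_det_id = NEW.link_det_id)
BEGIN
    SELECT RAISE(ABORT, 'cannot vote on your own link details');
END;
"""


def create_db(path=":memory:"):
    """Open a database with foreign keys enforced and the schema created."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The `p1_id < p2_id` ordering is a plain text comparison. It carries no genealogical meaning and only ensures that each pair has a single canonical form.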