Has genealogy found itself in a rut? Part
I of this article looked at the major contributors to modern genealogy so I
now want to examine the repercussions of their efforts and question whether,
collectively, they’ve resulted in good or bad for us as users.
Back in Internet
Genealogy - is this progress?, Janet Few mentioned that many
end-users do not venture beyond the brief details that have been transcribed
for them, often doing no more than saving any images to their computer. As well
as wasting any un-transcribed content, this would also mean that they have not
verified the transcribed portions. Since only enough is transcribed to support
a database index, valuable information may be missed. How many researchers
look at the neighbours in a census, or travelling companions in a passenger
list? But not all online sources have accompanying images, and so end-users are
then expected to blindly accept someone else’s limited transcription.
A converse to this situation occurs with certain newspaper
archives. Although they will have used OCR to generate text from the image of a
whole article, as opposed to merely selected items, they do not allow the end-user
to save that fully searchable and editable text; instead, they expect them simply
to save the image or transcribe it themselves. In other words, rather than relying
on someone’s transcription, end-users are then forced to rely on the image and
make their own transcriptions (if any).
Full and accurate transcription takes effort but it is also very
valuable, so why are there so few tools that support it, or standards for
its representation? What we have here is a failure of software and commercial
genealogy to appreciate the importance of a transcription, and its relationship
to an image copy of the original. Particularly in the newspaper case, there is
no excuse whatsoever for not providing access to the already transcribed text. But
even if you have a full transcript of a document, your online trees offer no
way to keep that and the associated image together as a single item associated
with your source reference. Where are users expected to include an analysis of such
a source?
Finally, it may be difficult-to-impossible to determine the
provenance of the digital information (image or transcript). Although this is
gradually changing, that change has been the result of pressure from traditional
genealogy, and generally from research-orientated genealogists who need that
information in order to make a considered argument. Maybe this was initially
considered too complicated for the host’s intended market, or maybe they just
didn’t understand genealogical research to that extent.
There are adequate technologies for recording provenance,
and other information, in the image itself, in meta-data that would be
invisible to the end-user but not to the software. What is missing is an agreed
standard from software genealogy and commitment within commercial genealogy.
What we have is zilch! A tentative proposal was produced within FHISO but there was no feedback or discussion over
it.
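As an illustration of how little is needed technically, here is a minimal sketch that stores provenance inside a PNG image as tEXt chunks, which the PNG format already provides for textual meta-data. It is not an agreed genealogical standard, and it is not the FHISO proposal; the 1x1 stand-in image, the field values, and the function names are all assumptions made for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def tiny_png() -> bytes:
    """A 1x1 greyscale PNG standing in for a scanned record image."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk in a PNG byte string."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC

def add_provenance(png: bytes, fields: dict) -> bytes:
    """Insert one tEXt chunk per field just before the final IEND chunk."""
    iend_start = len(png) - 12  # assumes the file ends with an empty IEND chunk
    if png[iend_start + 4:iend_start + 8] != b"IEND":
        raise ValueError("PNG does not end with an IEND chunk")
    extra = b"".join(
        make_chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
        for key, value in fields.items()
    )
    return png[:iend_start] + extra + png[iend_start:]

def read_provenance(png: bytes) -> dict:
    """Collect every tEXt keyword/value pair, as software (not the viewer) would."""
    found = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
    return found

# Hypothetical provenance values, for illustration only.
tagged = add_provenance(tiny_png(), {
    "Source": "Example Record Office, parish register, hypothetical reference",
    "Comment": "Digitised from microfilm; image supplied by a hypothetical host",
})
print(read_provenance(tagged))
```

An image viewer ignores these chunks when displaying the picture, but any software that reads the file can recover them, which is the behaviour described above.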
Software is all about organisation
and visualisation of your data, and
these are quite different to each other. In effect, the organisation helps make
data accessible, but the
visualisation is designed to convey information
to the end-user. One organisational schema will typically support multiple
visualisations, dependent upon the perspective being studied, the regional
settings and preferences of the user, and the sophistication of the product.
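The distinction can be sketched in a few lines of code. This is only an illustrative model, not the schema of any real product: the `Person` record, the sample names, and both rendering functions are invented for the example. The point is that one flat organisation of records feeds two different visualisations.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    parents: list = field(default_factory=list)  # keys of parent records

# Organisation: a flat store keyed by identifier, with no tree shape imposed.
PEOPLE = {
    "p1": Person("Ann Example", ["p2", "p3"]),
    "p2": Person("John Example", ["p4"]),
    "p3": Person("Mary Sample"),
    "p4": Person("Thomas Example"),
}

def pedigree_lines(key, people, depth=0):
    """Visualisation 1: the ancestors of one person as indented lines."""
    person = people[key]
    lines = ["  " * depth + person.name]
    for parent_key in person.parents:
        lines.extend(pedigree_lines(parent_key, people, depth + 1))
    return lines

def descendant_lines(key, people, depth=0):
    """Visualisation 2: the descendants of one person, from the same records."""
    lines = ["  " * depth + people[key].name]
    for child_key, child in people.items():
        if key in child.parents:
            lines.extend(descendant_lines(child_key, people, depth + 1))
    return lines

print("\n".join(pedigree_lines("p1", PEOPLE)))
print("\n".join(descendant_lines("p4", PEOPLE)))
```

Neither view is stored anywhere; each is derived on demand, so adding a third view (a timeline, say) would not require reorganising the data.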
A family tree is essentially just one way of visualising
lineage — another being a pedigree chart — but the simplistic model used to
attract the mass market has made it into an organisational structure upon which
anything and everything is expected to be hung. This effectively treats a tree
like a wardrobe full of hangers, and results in an inappropriate organisation
of the data.
But what is meant by “your family tree”? A search for that
phrase suggests there are hundreds of thousands of references, combined with
verbs such as: find, discover, climb, Google, trace, flesh-out, grow, build, create,
and research. Do we all have just one family tree? Obviously not, unless you
believe that only male ancestors carrying your own surname are significant in
your history; however, even if we accept that we have multiple family trees
then where do we put step relations, half siblings, adopted or foster parents?
Where do we put incidental people who may have been so important? Where do we
record non-vital historical events in all those lives?
Surprisingly, many casual genealogists do look only at their
“surname tree” and commercial genealogy eagerly accommodates that, but what
happens if you want to store the lineage associated with every paternal and
maternal branch of each generation? How many distinct trees are we limited to?
What happens, too, when you want to attach a document or
photograph involving multiple people, maybe from distinct trees or even from none
of your trees? If you’re not forced to duplicate it then are you allowed to see
all the persons it is attached to? What about shared events where multiple
persons were involved, such as a wedding, or even a census? Merely attaching
the same item to multiple people leaves no room for a representation of the
event and its history. Such requirements should be common sense, but the
unswerving adherence to using trees as an organisational concept means that they’re
hard to accommodate.
Figure 1 – Documents or images relating to people on
different trees.
As an aside, notice that the linkages shown in this diagram
are bidirectional. Yes, it is possible for specific parts of an image to be
connected to entries in a tree — or some other organisational framework — a
little like being tagged on a
social-networking site. As mentioned already, the technology is there but the standards
and commitment are not.
Maybe the biggest criticism of online trees is that they’re
conclusion-based; they usually represent someone’s conclusions based on absent
evidence. Support for source citations began to appear when it became obvious
that unsubstantiated conclusions were propagating like a virus, but then
citations are insufficient unless the evidence is direct and non-conflicting. Where
is the incentive for writing any justification? Is it incompatible with the
mass-market objective of commercial genealogy?
Trees are generally about discrete data: names, vital-event
dates and places, and biological relationships. Combine that with the
point-and-click ease with which search results can be added to a tree and there’s
no room for justifying an operation. If the expected name isn’t visible in the
search results then it is a brick wall for many people, and similarly if there are
too many close alternatives. That simple model offered by the commercial sites
presumes that the discrete data you require are all in their records,
somewhere, and that you’ll just be assembling them into your tree. From that
perspective, a citation serves only to say where a datum came from, and not why
you believe it to be relevant or correct. In effect, online trees bypass huge
parts of the research process. A source-based approach would fix that, and yet
still allow the tree as a visualisation of the underlying data.
Ask any group of people what they think collaboration means and the majority will mention unified online
trees or exchanging GEDCOM files. They’re not wrong but there is more: there
are forums, groups, message boards, and wikis devoted to helping people and
that is also a form of collaboration. Although collaboration existed before
Internet genealogy there are many more possibilities now, but do they really
help?
Let’s try to break down the types of collaboration into some
broad categories.
- Operational: helping others with questions about the how, why, or where. This is one of the biggest uses of the groups and forums.
- Research: working with others on a given research topic, such as a family or a surname. Working with other family members qualifies, but so too would one-name and one-place groups, as would any crowd-sourcing initiative. I will also add unified trees to this category.
- Publication: effectively sharing work that we’ve already done with others who may be interested. Examples include blogs, dedicated Web sites, and user-owned trees.
It’s interesting to compare which of these are currently supported
within commercial genealogy. Most sites provide public user-owned trees and
some provide unified trees. Not all offer a community area for operational
collaboration; only FamilySearch offers an area for “memories” that can be
linked to entries in trees; and Findmypast have no offerings that fit these
categories. But who supports collaborative research? Even collaborative
publication is not supported well.
In Part I, I mentioned that sites could allow their patrons
to upload images or documents under a Creative Commons licence, and that this
would help researchers elsewhere to make use of them with appropriate
attribution. My emphasis on “elsewhere” is to indicate that those sites invariably have an
insular approach to collaboration. They are not really concerned with
non-subscribers, but it’s a two-way street and they stand to benefit from a
little more vision.
Let me present an example. I recently published an article
entitled A Sad
Career in which I researched the short life of a girl who was not related
to me at all. I therefore do not have a tree for her family, but I still want
to share all my research with descendants of that family. Try as I did, the
best I could do was to add a link to a tree on Ancestry, but I could not
contact the owner there or elsewhere. It was about this time that I suggested
people like me could volunteer meta-data for their articles (or Web pages) that
would allow these sites to make them freely searchable (see Blogs
as Genealogical Sources). Making written research articles available as
another type of source means that these sites could retain their existing focus
on family trees, and would not have to embark on a major enhancement to support
narrative locally.
As that article states: this proposal should be a win-win
for all concerned; however, commercial genealogy has not even acknowledged it.
The worrying aspect to this is that if, as these commercial sites claim, they
have genealogists on their teams then they would be well aware of the written
suggestion and subsequent discussions. Alternatively, they must be in an ivory
tower.
There are a couple of software tools that currently
acknowledge evidence and its respective sources, but I am not aware of any that
allow you to work upwards from the information in sources without a precisely
predefined goal. I have discussed support for source-based genealogy at Our
Days of Future Passed — Part III, and a particular way of working at
Source
Mining. The difference is that the source-based approach dissects a source to
identify information to be associated with several persons (rather than
searching for specific details for a given person), and source-mining assembles
the history of a person, family, place, etc., from all the information that can
be found for it across multiple sources. All the same issues of handling source references,
conflicts, and interpretation apply but the goal is much more general than
finding, say, a birth or marriage. Under Digitised Sources, above, I suggested
that a scaled-down version of this approach would also apply when building
family trees, but an inappropriate data organisation might render the approach
difficult-to-impossible to implement.
I recently had to make contributions to online trees at Ancestry
and FamilySearch and was shocked to find how hard it was to do something as
simple as take a given census page and associate all the details of a household
with the respective tree entries. I had to switch from person to person and continually
repeat myself, but when looking at it from the perspective of the census event
for that household then it is so much more natural — the citation is the same;
the event, place and date are the same; the people are usually part of the same
family; proof arguments relating to identification of the family, analysis of
errors, etc., are the same. So, as well as the existing ergonomics being very
poor, the obvious place to hold a written source analysis is stolen away.
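To make the census case concrete, here is a small sketch of an event-centred record. It is an assumption-laden illustration rather than the data model of Ancestry, FamilySearch, or STEMMA: the `Event` class, its field names, the person keys, and the household details are all hypothetical. The citation, date, place, and written analysis are stored once on the event, and each person is linked to it with their own recorded details.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str
    date: str
    place: str
    citation: str       # one citation shared by the whole household
    analysis: str = ""  # the written source analysis lives with the event
    roles: dict = field(default_factory=dict)  # person key -> details as recorded

def events_for(person_key, events):
    """Return every event in which the given person took part."""
    return [event for event in events if person_key in event.roles]

# A hypothetical household entered once, not once per person.
census = Event(
    kind="census",
    date="1881-04-03",
    place="Hypothetical Street, Example Town",
    citation="1881 census of England, hypothetical piece and folio reference",
    analysis="Surname misspelt by the enumerator; household identified by address.",
    roles={
        "p1": {"role": "head", "age": 40},
        "p2": {"role": "wife", "age": 38},
        "p3": {"role": "daughter", "age": 9},
    },
)

for key in census.roles:
    found = events_for(key, [census])
    print(key, found[0].citation)
```

Each person's timeline is then a query over events rather than a copy of the data, so correcting the citation or extending the analysis is done in one place.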
But what else has been missed by our tools? In addition to
source-based genealogy, that series of posts beginning at Our
Days of Future Passed — Part I identified
the following alternative approaches that have been tried within software
genealogy: arboreal (tree) genealogy, event-based genealogy, and narrative
genealogy. One of the main thrusts of that series was that these should not
have to be exclusive alternatives. As already explained, the many possible
visualisations, including the user interactions with them, must be supported by
a separate organisational framework, which in the STEMMA case is a single
all-embracing one.
Another thrust was
that of generalising the subjects of our research from people to include
places, groups, and animals, including an orthogonal treatment of events,
evidence, sources, narrative, and hierarchical relationships. This level of
generalisation was the reason why I prefer the term micro-history to either family
history or genealogy.
There are many variations in the way that software tools can
be written, with each having its own benefits, as recently detailed by Tamura
Jones in a series of posts related to genealogical software choices. In Do
Genealogists Really Need a Database?, I justified my elimination of a
relational database on the basis that:
- they use indexes that are inefficient and not a good fit for historical data,
- they limit sharing because each product has its own schema,
- they risk data loss as disk-based linkages may become corrupt,
- they force an extra stage before data can be accessed or viewed,
- they reduce the longevity of the data as the database is tied to the longevity of a specific product.
As that same article concludes, having an all-embracing
representation in a file that doesn’t require a database gives a special degree
of freedom that genuinely helps with sharing. These genealogical contributions,
or “bundles” as I refer to them internally, may be likened to Word, Excel, and
PowerPoint files in the Microsoft Office paradigm (so termed not because Microsoft
invented it but because Office is a ubiquitous example of it). The essential
elements of this paradigm are that files can be directly exchanged and immediately
visualised by recipients, and that other products may use an API
to access the associated data for alternative visualisations.
Figure 2 – Visualising Excel data.
If an Excel spreadsheet were received in an email then you
could immediately click on it to see its information in a registered viewer.
This would be the Excel tool in this case, but note that an Adobe PDF file could
use a dedicated read-only viewer that is freely available; you only need a
licence if you want to create such files yourself. Alternatively, software developers
can write other tools that use the publicly documented API to present
information in different ways, and possibly to supplement suites of proprietary
software.
Figure 3 – Visualising STEMMA data.
These failures by software genealogy may be linked to the
prevailing notion of genealogy as family trees maintained in databases. A
particular concern of mine is that this model restricts what parts of our
history we can leave for the future, and how long its digital representation
will be accessible. Consider that a JPG image file is backed by an
international standard that is publicly accessible, whereas a genealogical
database generally has a proprietary schema implemented in a proprietary
database engine (I can list several such engines that are no longer available).
In Part I, I suggested that the fundamental medium of
narrative is poorly supported by our tools, and that it is either squeezed into
some internal (to a tree) plain-text field, or relegated to some external tool
such as a word-processor, blog, or wiki. I know of no public tool that supports
the integration of narrative — whether for research write-ups, stories and
memories, proof arguments, or transcription — with trees, timelines, geography,
and other forms of genealogical visualisation. Again, I am making the
distinction between organisation and visualisation so your first reading of
this paragraph may be misleading.
The concept of genealogical narrative has been tainted by
the laughable claims of some products that they can generate narrative from
discrete data in trees, and I want to distance myself from that. As an example,
the ProGen manual for professional genealogy implies that software combined
with narrative must mean template-generated robot-speak,[1]
and this perception will spread as long as such claims exist.
There may be a more subtle issue with narrative which
relates to its usage on the Web. In Our
Days of Future Passed — Part II, I started distinguishing narrative essay from narrative report, and rather more
recently noted that ProGen (p.354) also used the term narrative report in the same fashion, albeit with negative
connotations. A forum post from June 2016, entitled Hereinafter
Unsure, began as a simple question about potential confusion with the use
of hereinafter in a citation, but
quickly degenerated to a circular debate about writing for the Web and the use
of this same term. The term is widely used outside of genealogy, but I’d
adopted it to describe the format of my own research articles. These articles
are certainly not research reports,
which are far more rigorous and intended to describe research undertaken for a
client. Neither are they historical
accounts, which would be wholly about lives or events, or even case studies, which might be more
appropriate in an academic journal. My usage reflects the fact that they
embrace both the research journey and the uncovered history in a single
narrative form, together with identification of sources and analysis of
evidence. It soon became clear that there was a deep difference of opinion over
whether a readable account of research, combined with evidence analysis
and citations, was a bad combination, and over whether the latter belong in
scholarly journals and books rather than in narrative shared on the Web or with
family. My belief is that the Web is the primary mechanism for sharing our
narrative, and that such material will be found by corresponding search
operations rather than reading journals. If true then the proper handling of
sources and evidence makes the difference between more throwaway genealogical
claims and something of value that can be used and cited by others. The sad
element of this is that online narrative would provide the missing venue for
the full use of the good methods taught within traditional genealogy, the very
ones that are now being poorly applied to online trees.
So is there a general feeling that the majority of
genealogists are not up to the challenge of a written account? I sincerely hope not.
In fact, I believe that the majority would not only be capable, they would
welcome the encouragement from both commercial genealogy and traditional
genealogy. It is true that there are some so-called genealogy police who can deliver heavy-handed criticism, and they will
effectively discourage these people. But remember two things: experts who know
this craft inside-out did not arrive by parachute out of thin air — it took
them time to achieve it — and these people will not be writing for academic
publications. I personally don’t care if such work isn’t grammatically perfect,
or if not every citation is perfectly crafted and punctuated, as long as all the
details and reasoning are captured. People can always learn to do it better,
but presuming it to be the preserve of academics is a self-fulfilling prophecy.
The price we’ve paid for having access to online records was
giving genealogy a mass-market appeal — meaning ease and simplicity — but the
momentum associated with that market has changed the face of genealogy,
probably forever. It’s now very difficult to take a step back and look at what
might have been achieved with more vision and in the absence of this legacy.
There are many genealogists who have only ever known
computer-based genealogy — using the Internet and/or local applications — and
their perception will therefore have been determined by the currently available
sites and products. A case in point is that many of these users do not know how
to handle conflicting information from different sources, such as ages in
different census returns — not because it’s too complicated but because their
tools offer them the wrong orientation. If their family tree only accommodates
conclusions then it’s hardly surprising if they resort to creating multiple
birth events.
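A source-orientated tool could instead keep each stated age as a separate piece of evidence and derive the conclusion from them. The following is a rough sketch under my own assumptions: the `AgeEvidence` class, the function names, and the sample ages are invented, and the only rule applied is that an age recorded in a census implies one of two birth years, depending on whether the birthday fell before or after census night.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgeEvidence:
    source: str       # citation for where the age was recorded
    census_year: int
    stated_age: int

def implied_birth_years(evidence):
    """Map each source to the (earliest, latest) birth year its age implies."""
    return {
        item.source: (item.census_year - item.stated_age - 1,
                      item.census_year - item.stated_age)
        for item in evidence
    }

def birth_year_range(evidence):
    """Return the overlap of all implied ranges, or None if the sources conflict."""
    ranges = list(implied_birth_years(evidence).values())
    low = max(r[0] for r in ranges)
    high = min(r[1] for r in ranges)
    return (low, high) if low <= high else None

# Hypothetical ages for one person across three census returns.
evidence = [
    AgeEvidence("1851 census (hypothetical reference)", 1851, 12),
    AgeEvidence("1861 census (hypothetical reference)", 1861, 21),
    AgeEvidence("1871 census (hypothetical reference)", 1871, 33),
]

print(birth_year_range(evidence[:2]))  # the first two sources agree
print(birth_year_range(evidence))      # adding the third exposes a conflict
print(implied_birth_years(evidence))   # every claim is kept, with its source
```

When the ranges do not overlap, nothing is discarded and no second birth event is created; the conflict is simply visible and waiting for a written explanation.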
At some point, the software industry was always going to
look at genealogy as a market for new applications, but what would prospective software
developers currently see? Family trees. It is hardly surprising that their data
models, databases, and the products themselves, all try to deliver variations
on this theme.
In an episode of Mondays
with Myrt on 10 Aug 2015
(timestamp 1:14:20), I suggested that genealogy was serving the interests of
the software industry, rather than software serving the interests of
genealogists. Had genealogy not turned into a big enough market then the
software industry would have selected some other field of endeavour to focus
its talents on. Unfortunately, it’s still rare to find professional software
designers who are also research-orientated genealogists, and I believe this is
hindering further progress.
These endeavours (software and history/genealogy) would
almost certainly have been exclusive career choices since they would each have required
a wealth of experience to become truly proficient, and each would have required
a quite different academic background. On top of this, both have their own
jargon, but with much room for confusion and ambiguity when participants
interact.
The main problem is one of education, but not the
traditional forms such as how to use a given Web site, or how to use a given
product, or even attaining qualifications such as BCG certification; there is a
lack of reciprocating knowledge, and vision, in both software genealogy and
traditional genealogy. Software genealogy certainly has a naïve and
overly-simple view of its goal, and so it needs input as part of that
education. I have mentioned a perception within traditional genealogy, though, of
software as necessarily conclusion-based, and possibly even a mistrust of its
capabilities or reliability. It’s a poor analogy but one that everyone can
relate to: no one would now prepare a document using pen-and-paper before entering
it into a word-processor. A word-processor is a complicated program, but it has
evolved to the point where almost everyone can use it to some level of productivity,
often with no documentation.
Traditional genealogy acknowledges the power and importance
of narrative, for all its many uses such as the handling of evidence and
inference, but it does not sell this notion within the digital world. As a
result, the requirement has not been picked up by software genealogy or by
commercial genealogy. This one deficiency, alone, can be linked to poorly
researched trees, a dearth of reasoning and justification on these trees, and a
disrespect of genealogy by academic historians. It is welcome, therefore, to
see kindex — whom I met at RootsTech 2016 —
progressing with their mark-up for narrative; and also to hear of FamilySearch reinventing
itself and emphasising stories and memories over trees — although I hope
they can also see a place for fully researched, reasoned, and sourced
narrative.
So where does this Gordian knot leave me?
Well, I admit that I struggle to find common ground. Software genealogy is too
concerned with something different to what I'm doing; traditional genealogy has
taught me much, but it appears uninterested in its future within a digital
world; commercial genealogy provides me with valuable data, but I fear it
is not interested in what I really want to achieve. Someone please tell me
that I'm not alone!
[1] Elizabeth
Shown Mills, ed., Professional Genealogy: A Manual for Researchers, Writers,
Editors, Lecturers, and Librarians (Baltimore: Genealogical Publishing Co.,
2001), p.355; book hereinafter cited as ProGen.