Given that the
majority of genealogists are currently working on "their tree", it
may be worth just taking a moment to understand what that means, and also what
we think it means.

We may take it for
granted that a family tree is a straightforward goal, and that its
visualisation is equally straightforward. If so, then we are in for a
surprise.

# Graph Theory

Mathematically, the
concept of a tree is defined as part of graph theory, so let's just identify a
few useful and accurate terms.

- Vertices (or nodes) are the items being connected in the graph. Think of them as the persons in your family tree.

- Edges (or links) are the connections between the vertices.

- Path is a sequence of edges that joins a sequence of vertices.

- Directed edge is one that has a specific direction. Graphs are usually directed or undirected according to the nature of all their edges.

- Tree is an undirected graph in which any two vertices are connected by exactly one path. In other words, there is always a unique path to get from one vertex to any other.

- Forest is an undirected graph in which any two vertices are connected by at most one path. In other words, the graph may have disjoint tree segments.

- Layered graph drawing is a representation (not a graph type) in which the vertices of a directed graph are drawn in horizontal rows or layers to represent some common attribute (e.g. families or generations in a family tree).

- Acyclic means that a graph has no directed cycles. In other words, there is no path that will loop you back to where you were.

- Semi-directed cycle (or semi-cycle) is where the vertices of a loop are connected by directed edges but do not form a cycle. For instance, if three vertices, A,B,C, are connected by A→B, B→C, A→C then it would constitute a semi-directed cycle (C→A would have completed a directed cycle).

- DAG is a directed acyclic graph. Such graphs are frequently used for temporal ordering (i.e. events, including lineage ones) because of the unidirectional nature of time.
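To make these definitions concrete, here is a minimal Python sketch that tests a graph for the tree, forest and DAG properties. It is not taken from any genealogy tool, and the function names are hypothetical. The graph is given as a vertex set plus a list of (from, to) edges. The tree and forest checks ignore edge direction, as the definitions above require. The DAG check uses Kahn's topological-sort algorithm.

```python
from collections import defaultdict


def count_components(vertices, edges):
    """Count connected components, ignoring edge direction.
    Assumes every edge endpoint is in `vertices`."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), 0
    for start in vertices:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(adjacency[v] - seen)
    return components


def is_forest(vertices, edges):
    """Undirected and acyclic: every component is a tree, so |E| == |V| - components."""
    return len(edges) == len(vertices) - count_components(vertices, edges)


def is_tree(vertices, edges):
    """A forest with a single component, so exactly one path joins any two vertices."""
    return is_forest(vertices, edges) and count_components(vertices, edges) == 1


def is_dag(vertices, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every vertex can be
    removed in topological order (no vertex is left stuck on a directed cycle)."""
    in_degree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
        in_degree[b] += 1
    ready = [v for v in vertices if in_degree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors[v]:
            in_degree[w] -= 1
            if in_degree[w] == 0:
                ready.append(w)
    return removed == len(vertices)


# The semi-directed cycle from the definitions: A→B, B→C, A→C.
V = {"A", "B", "C"}
semi_cycle = [("A", "B"), ("B", "C"), ("A", "C")]
print(is_dag(V, semi_cycle))   # True: no directed cycle
print(is_tree(V, semi_cycle))  # False: two undirected paths join A and C
```

Adding C→A instead of A→C would close a directed cycle, and `is_dag` would then return False.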

Note that mathematically,
the use of the term 'tree' is not simply a comment on a visualisation
looking like the branches (or the roots) of a real tree.

# Basic Family Trees

Although we expect a
family tree to be displayed top-down, with biological parents pointing
to their children, we cannot guarantee that the underlying data has a specific representation
for the physical union between two people. Trying to equate that biological
element with marriage is far too naive for real lineage. The GEDCOM data format
is well known for having a "family" concept that embraces two
spouses (tellingly termed the husband and wife) and their associated
children, but the fallacy is clear for all to see: a generic family is extremely hard to define,
and the implied "nuclear family" is an idealised concept. Worse
still, it uses the social concept of a family when it is the biological
concept of a union that is meaningful for a "lineage-linked format".
The format is also known to have had interpretational difficulties with the
notion of a family, and has tried various ways to include adopted children,
thus straying from a pure lineage-based linkage. In fact, all we can guarantee
is that each person has just one progenitive father and one progenitive mother,[1] even if they're
unknown.

Probably the simplest
family tree is one where we show direct ancestors, known as an ancestry chart
or pedigree chart. Because each vertex connects to at most two vertices on the level
above, the chart also constitutes a binary tree.
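Because a pedigree chart is a binary tree, its positions can be numbered arithmetically. One well-established scheme, not described above, is Ahnentafel (Sosa-Stradonitz) numbering: the subject is 1, and the father and mother of person n are 2n and 2n + 1. A minimal sketch, with hypothetical function names:

```python
def father(n: int) -> int:
    """Ahnentafel number of the father of person n."""
    return 2 * n


def mother(n: int) -> int:
    """Ahnentafel number of the mother of person n."""
    return 2 * n + 1


def child(n: int) -> int:
    """Ahnentafel number of the descendant through whom n is reached (n >= 2)."""
    return n // 2


def generation(n: int) -> int:
    """Generations above the subject (the subject, n = 1, is generation 0)."""
    return n.bit_length() - 1


def describe(n: int) -> str:
    """Spell out the route from the subject to ancestor n.
    After the leading 1 bit, each 0 bit means 'father' and each 1 bit 'mother'."""
    steps = ["father" if bit == "0" else "mother" for bit in bin(n)[3:]]
    return "'s ".join(steps) if steps else "self"


for n in (1, 2, 3, 5, 6):
    print(n, generation(n), describe(n))
```

For example, `describe(5)` returns `father's mother`, the paternal grandmother. With pedigree collapse, the same person would receive more than one number.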

Figure 1 - Binary pedigree chart.

But note that this
representation (generated here by the SVG
Family-Tree Generator, but not uncommon) has a single upward edge
connecting to a bound pair of parent vertices. This is useful because it provides
a handle for selecting details of the parents' specific union, and it helps with
the visualisation (particularly in cases of step- or half-siblings), whereas simply
having two independent edges pointing to each person's parent vertices would
rapidly become hard to follow.
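The single upward edge can be modelled by giving each person one reference to the union they were born from, with the union holding both partners and the children. This is a minimal sketch under that assumption. The `Person` and `Partnership` classes are hypothetical and are not taken from the SVG Family-Tree Generator or from GEDCOM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality: two people with the same name stay distinct
class Person:
    name: str
    # The single upward edge: the union this person was born from (None if unknown).
    birth_union: Optional["Partnership"] = None


@dataclass(eq=False)
class Partnership:
    """A bound pair of parent vertices; either partner may be None if unknown."""
    partner1: Optional[Person]
    partner2: Optional[Person]
    children: list = field(default_factory=list)

    def add_child(self, child: Person) -> None:
        # Link both ways: union -> child (downward) and child -> union (upward).
        child.birth_union = self
        self.children.append(child)


def parents(person: Person) -> tuple:
    """Follow the single upward edge to the pair of parents."""
    union = person.birth_union
    return (union.partner1, union.partner2) if union else (None, None)


def full_siblings(person: Person) -> list:
    """Other children of the same union."""
    union = person.birth_union
    return [c for c in union.children if c is not person] if union else []


tom, ann = Person("Tom"), Person("Ann")
couple = Partnership(tom, ann)
jim, sue = Person("Jim"), Person("Sue")
couple.add_child(jim)
couple.add_child(sue)
print([p.name for p in parents(sue)])        # ['Tom', 'Ann']
print([s.name for s in full_siblings(sue)])  # ['Jim']
```

The union object is the "handle" mentioned above: details such as a marriage date would naturally live on `Partnership` rather than on either partner.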

The converse of this
illustration, usually called a descendancy chart (and sometimes incorrectly
referred to as a descent-type pedigree chart, since pedigree is about blood-line
ancestors), is where we show the children of a common ancestor and their spouse(s),
and then the children of the children, etc.
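A descendancy chart is essentially a depth-first walk downwards from the common ancestor. The sketch below assumes a hypothetical mapping from each person to their children and, for brevity, leaves out spouses.

```python
def descendancy_lines(ancestor, children, depth=0):
    """Depth-first walk from a common ancestor, indenting each generation."""
    lines = ["    " * depth + ancestor]
    for child in children.get(ancestor, []):
        lines.extend(descendancy_lines(child, children, depth + 1))
    return lines


# Hypothetical data: person -> list of children (spouses omitted)
CHILDREN = {"Arthur": ["Bert", "Cora"], "Bert": ["Dan", "Edith"], "Cora": ["Fred"]}
print("\n".join(descendancy_lines("Arthur", CHILDREN)))
```

If the data contained pedigree collapse, a descendant reachable through two lines would simply be printed twice.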

Figure 2 - Simple descendancy chart.

# Complications

The first thing to
note is that Fig. 1 and Fig. 2 represent extreme cases. Suppose that we were interested
in our direct ancestors, but also their siblings and the children of their
siblings. For instance:

Figure 3 - Chart showing ancestors, their siblings, and
their children.

This small
illustration works, but in general it would not be possible to display such
relationships without lines all crossing over each other. Whether you want to
do this depends on whether "your tree" is primarily for people
carrying your surname, starting from some root ancestor. This rather sexist
approach is still quite common, despite the fact that surnames do not carry our
genetics.

An important issue is
pedigree collapse, where an edge crosses over to other branches to create a
semi-directed cycle. The following illustration is of a first-cousin marriage.

Figure 4 - First-cousin marriage.

The fact that there
is, now, no unique path to get from these married cousins to their grandparents
means that the chart is technically not a tree, although it is still a DAG.
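One way to detect pedigree collapse in the data is to count the distinct upward paths from a person to each ancestor: in a true tree every count is 1. The sketch below assumes a hypothetical child-to-(father, mother) mapping for a first-cousin marriage and looks upwards from the cousins' child. The memoised recursion is safe because the graph, while no longer a tree, is still a DAG.

```python
from collections import Counter


def ancestor_path_counts(person, parents, memo=None):
    """Return a Counter mapping each ancestor to the number of distinct
    directed (upward) paths from `person` to that ancestor."""
    if memo is None:
        memo = {}
    if person in memo:
        return memo[person]
    counts = Counter()
    for parent in parents.get(person, ()):
        if parent is None:  # unknown parent
            continue
        counts[parent] += 1  # the direct edge person -> parent
        counts.update(ancestor_path_counts(parent, parents, memo))  # paths via parent
    memo[person] = counts
    return counts


# Hypothetical first-cousin marriage: child -> (father, mother)
PARENTS = {
    "Father1": ("Grandfather", "Grandmother"),
    "Mother2": ("Grandfather", "Grandmother"),  # Father1 and Mother2 are siblings
    "Cousin1": ("Father1", "Spouse1"),
    "Cousin2": ("Spouse2", "Mother2"),
    "Child": ("Cousin1", "Cousin2"),  # the married cousins' child
}
counts = ancestor_path_counts("Child", PARENTS)
collapsed = sorted(a for a, n in counts.items() if n > 1)
print(collapsed)  # ['Grandfather', 'Grandmother']
```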

We've mentioned that
non-biological parents would create problems if placed on a chart depicting
lineage, but why is that? Well, such parents are not mutually exclusive with biological
parents, and it's not uncommon for someone to have had foster parents and
adoptive parents in addition to their biological parents. They're still part of
the family history, irrespective of any personal preference to the contrary,
but they need specialised tools for their visualisation.

Sequential marriages
are relatively common, and related to these are half-siblings, step-siblings, non-marital
unions, and non-paternity events (NPEs). The following illustration depicts a
man who was married twice, having a son with the first wife and a daughter with
the second. At some point, he had also had a non-marital union with a woman
resulting in an illegitimate daughter (note that the green circle is changed,
here, to reflect this status). Also, the man's second wife was previously
married and had an associated son, plus a daughter that was the result of an
NPE (note the dashed line reflecting this).
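Given biological parents and a set of marriages, these sibling relationships can be classified mechanically. This is a minimal sketch with hypothetical names, loosely following the situation just described. It treats step-siblings as people with no shared parent whose parents married each other, which is an assumption rather than a formal definition.

```python
def sibling_relation(a, b, parents, marriages):
    """Classify two people as 'full', 'half' or 'step' siblings, else 'unrelated',
    from their biological parents and the set of marriages."""
    pa = {p for p in parents.get(a, ()) if p is not None}
    pb = {p for p in parents.get(b, ()) if p is not None}
    shared = pa & pb
    if len(shared) == 2:
        return "full"
    if len(shared) == 1:
        return "half"
    # No shared parent, but a parent of one married a parent of the other.
    if any(frozenset((x, y)) in marriages for x in pa for y in pb):
        return "step"
    return "unrelated"


# Hypothetical data: child -> (biological father, mother)
PARENTS = {
    "Son1": ("Man", "Wife1"),
    "Daughter2": ("Man", "Wife2"),
    "Daughter3": ("Man", "Partner"),     # non-marital union
    "Son4": ("PrevHusband", "Wife2"),
    "Daughter5": ("OtherMan", "Wife2"),  # NPE: not the husband's biological child
}
MARRIAGES = {frozenset(m) for m in [("Man", "Wife1"), ("Man", "Wife2"), ("PrevHusband", "Wife2")]}

print(sibling_relation("Son1", "Daughter2", PARENTS, MARRIAGES))  # half
print(sibling_relation("Son1", "Son4", PARENTS, MARRIAGES))       # step
```

Because the classification uses biological parents, the NPE daughter comes out as a half-sister of her mother's son, even though the records might present them as full siblings.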

Figure 5 - Half-siblings, step-siblings, sequential
marriages, non-marital unions, and NPEs.

Finally, suppose we
have cause to include people who are not related by blood or by marriage. I
suppose this could include the families of adoptive parents or guardians, but a
bigger example would be anyone performing a one-name or one-place study.

Figure 6 - Disjoint trees.

Note that this
illustration, which shows a neighbouring family, is technically called a forest
as it consists of disjoint trees.

# Databases

So, the connections
between people are manifold in number and type, and the naive picture of
everyone forming a single tree from some root ancestors (or possibly even Adam
and Eve) is entirely unrealistic. Storing these connections in the data is not
a problem, in principle, although there is no universal standard, and the formats
we do have are unlikely to define unambiguous ways of handling all the scenarios
that we've highlighted. The real problem is in their visualisation!

What many people do
not notice is that their genealogy software, be it desktop or online, usually
presents just a workable section of the stored data at once. If you had 10,000+
people in your tree then it would look rather like a crocheted football field
if presented all at once, but to present just a few generations around some
person of interest — especially during maintenance of that tree — is much more
useful, and easier. Such software allows you to navigate from one person of
interest to another, and so will continue to support a naive impression of your
"family tree".

Of course, we'll
continue to use the term "family tree", and we'll continue to think
of it as looking like a real tree, with branches and roots, but if you could
assimilate the underlying data as a computer would then you would realise how
different and complex it really is.
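The "workable section" that genealogy software presents around a person of interest can be sketched as a breadth-first search that walks parent and child links in both directions, stopping a fixed number of hops away. This is an assumption about how such software might do it, not a description of any particular product. Note that hops are not generations: a sibling is two hops away, via a shared parent.

```python
from collections import deque


def neighbourhood(focus, parents, radius):
    """Return {person: hops} for everyone within `radius` parent/child hops
    of `focus`, treating the lineage links as undirected."""
    adjacency = {}
    for child, pair in parents.items():
        for parent in pair:
            if parent is None:
                continue
            adjacency.setdefault(child, set()).add(parent)
            adjacency.setdefault(parent, set()).add(child)
    distance = {focus: 0}
    queue = deque([focus])
    while queue:
        person = queue.popleft()
        if distance[person] == radius:
            continue  # don't expand beyond the requested radius
        for other in adjacency.get(person, ()):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance


# Hypothetical data: child -> (father, mother)
PARENTS = {"Me": ("Dad", "Mum"), "Sis": ("Dad", "Mum"), "Dad": ("Grandad", "Gran")}
print(sorted(neighbourhood("Me", PARENTS, 1)))  # ['Dad', 'Me', 'Mum']
```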

[1] Actually, technology is capable of engineering children with DNA
from three or more “parents” (see uk-government-ivf-dna-three-people).
