What is the FAN Principle, and when would we use it? What
generalisations of it can be employed? How does it relate to analytical methods
from outside of genealogy? How do the different methods relate to our intended
goals?
There is some confusion over what the FAN acronym stands for.
I previously thought that it was Family,
Associates, and Neighbours,[1] and that it
later became Friends, Associates, and
Neighbours,[2] possibly
because it’s so easy to lapse into “Friends, Romans, and countrymen …”.[3] The
historical difference between friends (or enemies) and weaker associates could
be a subjective one, and so not as useful to us in directing our research;
however, knowing the extent of a person’s family might also be part of the goal
to which this technique is applied in the first place. More recently, the terms
have been merged as Friends & Family,
Associates, and Neighbours, with the acronym being unofficially extended to
FFAN.
The sources I gave for the two variants of FAN, above, were
from the same year and the same author (Elizabeth Shown Mills) and so didn’t
support my initial impression; I had to get more details. According to
Elizabeth, she started using the term FAN
Principle in her Advanced Research Methodology (ARM) track at the Institute
of Genealogy and Historical Research (IGHR), Samford University, during the
early 1990s. That ARM track commenced in 1986, but its inspiration came the
previous year when a defeatist response in the APG quarterly newsletter
prompted her to analyse her own successes and failures. At that time, she
emphasised neighbours and associates since she didn’t believe that family needed emphasising. Also, whole-family genealogy, as opposed to direct-line genealogy, was comparatively
uncommon, and what there was mostly amounted to just following males with the
same surname. Her technique was much wider than this, and required individuals
to be placed in their respective community
context in order to find new sources of relevant information. After several
years of teaching this, she hit upon the notion that “Every ancestor had their FAN Club: Friends, Associates, and Neighbours.” With the explosion of
Internet genealogy during the mid-2000s, whole-family
research (not the same as simple name gathering) had become positively rare,
and that’s when she started referring to Friends
& Family, Associates, and Neighbours.
The FAN Principle
is a research method for studying individuals in the context of their FAN Club in order to widen the search
for relevant information.[4] It is employed when we have no documents that
give us direct evidence about a person’s identity, origin, and/or parentage.
Although commercial genealogy may suggest that you can “build your tree”
directly from their records, we all know that this rarely works; it’s not long
before we can’t find someone, or we can’t identify someone (usually a woman),
or there are alternatives that are too close for us to distinguish. You’re then
in the world of inferential genealogy
where you have to study the scant information available and make an argument for
what the truth might have been — the better the argument, the more reliable the
conclusion.
There’s a concise example of the FAN Principle provided in QuickLesson
11. It describes a Mary Smith who married a James Boyd in 1853, and
shows how correlating various sources relating to her FAN Club allowed her family to be identified.
In this case, because there were no sources directly
identifying Mary, it begins by looking at the most obvious person in her FAN Club: her husband, and then looking
at his FAN Club. This is an accepted
technique for identifying women during those times as they would be conspicuous
by their absence in most records.
Figure 1 – Targeted research to identify Mary Smith via
her husband.
The above diagram illustrates that different sources relate
to different sets of associates of
the target person: James Boyd. For simplicity, it doesn’t show all the associates in this example, or all the
sources; the following table includes all the sources.
|                  | Marriage | Road order | Land registry | 1850 census |
|------------------|----------|------------|---------------|-------------|
| Boyd family      |          |            |               |             |
| John             |          |            | ●             | ●           |
| James            | ●        | ●          | ●             | ●           |
| Mary             | ●        |            | ●             |             |
| Andrew           |          | ●          | ●             |             |
| Franklin         |          | ●          | ●             | ●           |
| Smith neighbours |          |            |               |             |
| William          |          |            |               | ●           |
| Jane             |          |            |               | ●           |
| Samuel           |          | ●          | ●             | ●           |
| William          |          | ●          | ●             | ●           |
| Joseph           |          | ●          | ●             | ●           |
| Thomas           |          |            |               | ●           |
| Mary             |          |            |               | ●           |
Table 1 – Relationship of associates and sources for Boyd/Smith example.
By correlating the information provided by those sources,
and by considering their contexts (dates, ages, occupation, etc.), an
argument is made for the wife of James Boyd being Mary C. Smith, daughter of
the neighbouring William and Jane Smith.
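The correlation step can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the reasoning in QuickLesson 11: the data is transcribed from Table 1, while the labels (such as "Smith/William (1)"), the function names, and the idea of ranking by a simple count of shared sources are my own assumptions. The real argument rests on the context of each record rather than on counts.

```python
# Who appears in which source, transcribed from Table 1.
# Labels such as "Smith/William (1)" are invented here to tell apart people
# who share a forename; the numbering simply follows the table order.
SOURCES = {
    "Marriage": {"Boyd/James", "Boyd/Mary"},
    "Road order": {
        "Boyd/James", "Boyd/Andrew", "Boyd/Franklin",
        "Smith/Samuel", "Smith/William (2)", "Smith/Joseph",
    },
    "Land registry": {
        "Boyd/John", "Boyd/James", "Boyd/Mary", "Boyd/Andrew", "Boyd/Franklin",
        "Smith/Samuel", "Smith/William (2)", "Smith/Joseph",
    },
    "1850 census": {
        "Boyd/John", "Boyd/James", "Boyd/Franklin",
        "Smith/William (1)", "Smith/Jane", "Smith/Samuel",
        "Smith/William (2)", "Smith/Joseph", "Smith/Thomas", "Smith/Mary",
    },
}


def shared_sources(target, sources):
    """Map each associate to the sources in which they appear with the target."""
    shared = {}
    for source_name, people in sources.items():
        if target in people:
            for person in sorted(people - {target}):
                shared.setdefault(person, []).append(source_name)
    return shared


def rank_associates(target, sources):
    """List associates, those sharing the most sources with the target first."""
    shared = shared_sources(target, sources)
    return sorted(shared, key=lambda person: (-len(shared[person]), person))
```

Calling `rank_associates("Boyd/James", SOURCES)` gives a rough order in which to examine the associates; it says nothing about what the relationship actually was.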
The FAN principle
is also described by the terms Cluster
Research[5]
or Cluster Genealogy[6],
but not the term Cluster Analysis. Cluster Analysis is a
long-standing research method that has been applied to many different fields. I
will take a brief tour of it in order to determine what, if any, relationship
exists to the FAN Principle.
Cluster analysis
separates data (or objects) into groups that are meaningful, useful, or both.
Methods of cluster analysis fall into
two broad types: ones where the clustering is evident in the data itself
(empirical) and ones where we deliberately categorise data according to some
shared property (categorical).
With empirical cluster
analysis, data is typically plotted in some data space (e.g. pressure against temperature) or geographical space and the distribution
of points examined for clusters. Although it’s usually easy to distinguish
clusters visually, there are many different algorithms for locating them and
establishing their boundaries. It would then be necessary to explain the number
or shape of the clusters, or even their very existence.
Figure 2 – Empirical cluster analysis: clusters are evident
in the data and require explanation.
One example of this method might be when analysing the
geographical distribution of some disease or ailment. A genealogical example
might be when looking at the distribution and movements of a family.
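As a rough illustration of how such clusters might be located, here is a minimal sketch of one of the simplest approaches: points are grouped whenever they lie within a chosen distance of another point already in the group (a threshold-based, single-linkage style of grouping). It is a stand-in for the many algorithms mentioned above, and the function name, the `max_gap` parameter, and the sample coordinates are all invented for the example.

```python
from math import dist


def threshold_clusters(points, max_gap):
    """Group point indices so that each point is within max_gap of
    at least one other point in the same cluster."""
    clusters = []
    unvisited = set(range(len(points)))
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            # Pull in every unvisited point close enough to the current one.
            near = {i for i in unvisited
                    if dist(points[current], points[i]) <= max_gap}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Made-up (x, y) positions, e.g. places where a surname was recorded.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
groups = threshold_clusters(sample, max_gap=1.5)
```

The choice of `max_gap` decides how many clusters appear, which is one reason why the number and shape of the clusters still need explaining afterwards.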
With categorical cluster
analysis, data (or objects) are separated into groups that reflect some
common attribute or property. This may be an abstraction to support statistics
or summarisation, or a precursor to some other type of analysis. It is this
type, therefore, to which the FAN
Principle is related; the groups of family,
friends, neighbours, and other associates,
are the clusters into which we have separated the general associates of a target person.
Figure 3 – Categorical cluster analysis: separation in
preparation for some other study.
Because the conceptual clusters have been predefined in this
method, we might expect to see alternative results if we change them. In
fact, in both types of analysis, clusters may be strictly
partitional (each object in only one cluster), hierarchical (an object in a child
cluster is also in the containing parent cluster), or overlapping (an object may
be in multiple, non-exclusive clusters).
Figure 4 – Overlapping, non-exclusive clusters.
The FAN Club clusters
are overlapping rather than hierarchical. For instance, not all family are
neighbours, and not all neighbours are family.
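The difference between partitional and overlapping clusters is easy to see if each cluster is held as a set. The sketch below assumes a made-up FAN Club with placeholder names; the cluster contents and the helper names are purely illustrative.

```python
# A made-up FAN Club: each cluster is a set of placeholder names.
fan_club = {
    "family": {"Ann", "Ben", "Cal"},
    "neighbours": {"Ben", "Cal", "Dot"},
    "associates": {"Dot", "Eli"},
}


def memberships(person, clusters):
    """Return the names of every cluster that contains the person."""
    return sorted(name for name, members in clusters.items() if person in members)


def is_partition(clusters):
    """True only if nobody belongs to more than one cluster."""
    everyone = set().union(*clusters.values())
    return sum(len(members) for members in clusters.values()) == len(everyone)


# People who are both family and neighbours of the target.
family_and_neighbours = fan_club["family"] & fan_club["neighbours"]
```

Here `is_partition(fan_club)` is false because Ben, Cal, and Dot each sit in two clusters, which is exactly the overlapping case.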
There exists a specific cluster variant that deserves a
mention: graph-based. In this
variant, objects in a cluster are connected to other members of the same
cluster by some relationship type, and not connected to members of other
clusters. This is different to the clusters of objects that share a common
property. At a stretch, it might be possible to describe a family cluster in this way as all its members are connected by some
sort of family relationship; however, it is not particularly relevant to the FAN Club as associations there are
specifically relative to the target person rather than to other members.
Let’s take a deeper look at the nature of the FAN Club. We have already mentioned that
its clusters are overlapping, or non-exclusive, and cited the example that not
all family are neighbours and vice versa.
In fact, the clusters are simply the target’s associates grouped for priority of investigation. The FAN QuickSheet contains a diagram illustrating the concept of targeted research, where concentric
circles represent clusters of associates,
ordered according to the strength of their connection with the target, which
might be searched from the innermost outwards.
- Target person
- Known relatives and in-laws
- Others of same surname
- Associates and neighbours of target
- Associates of those associates
First, notice that these circles are not literal
interpretations of the FAN acronym; it would be a limiting folly to treat the
acronym as some simple prescription. Even this diagram is just a guide, though,
and we might conceive of more circles according to the nature of the particular
problem and some knowledge of the potential sources. For instance, people of
the same surname who are also neighbours are more likely to be family members (known
or unknown) and so potentially represent a stronger connection than ones
elsewhere. I once subdivided neighbours to prioritise ones in the same
occupation, based on the assumption that they may have worked in the same place
as the target.
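The idea of searching from the innermost circle outwards amounts to a simple priority ordering. The following sketch assumes the five circles listed above; the candidate names and the `search_order` helper are hypothetical, and extra circles (such as neighbours in the same occupation) could be slotted into the list wherever a particular problem demands.

```python
# The circles from the targeted-research list, innermost first.
CIRCLES = [
    "Target person",
    "Known relatives and in-laws",
    "Others of same surname",
    "Associates and neighbours of target",
    "Associates of those associates",
]


def search_order(candidates, circles=CIRCLES):
    """Sort (name, circle) pairs from the innermost circle outwards."""
    rank = {circle: position for position, circle in enumerate(circles)}
    return sorted(candidates, key=lambda candidate: rank[candidate[1]])


# Hypothetical people, each already assigned to a circle.
candidates = [
    ("Neighbour N", "Associates and neighbours of target"),
    ("Cousin C", "Known relatives and in-laws"),
    ("Namesake S", "Others of same surname"),
]
ordered = search_order(candidates)
```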
So are all these associates
just acquaintances? Not really; the use of the word also embraces cases of a general
connection, or association, between those persons and the target. In essence,
these associations are the properties shared by the members of each cluster. The
associates might be acquaintances … or
they might be family members who had never met, or persons of the same surname but
different family, etc. A favourite of mine, also mentioned in the FAN QuickSheet, is the list of persons
interred in the same or neighbouring burial plots because it often throws up
surprising further connections.
The targeted-research list, above, illustrates another
technique: that of looking at indirect associates,
or associates-of-associates in this case. Looking at the FAN Club of an associate may
be necessary in order to understand the nature of a connection, or to
investigate further connections; this will begin to form a network rather than
a set of connections anchored on the target person. The worked example of Mary
Smith uses this technique as it looks at the FAN Club of her husband. We can envisage many variations of this,
such as family-of-neighbours or family-of-family (including in-laws), but where
should we stop? Well, there’s no shortage of scope but an iterative approach,
customised for each case, would be more practical than trying to enumerate
every possible direct and indirect associate
at the outset. The strength of an indirect association depends on the product
of the strengths of the individual direct associations, and so it can fall off
very quickly.
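To see how quickly that fall-off happens, suppose (purely as an assumption for illustration) that each direct association is given a strength between 0 and 1. The numbers below are invented; only the multiplication reflects the point being made.

```python
from math import prod


def indirect_strength(link_strengths):
    """Strength of a chain of associations, taken as the product of
    the strengths of its individual direct links (each assumed 0..1)."""
    return prod(link_strengths)


# Invented strengths: three links of 0.5 each.
one_step = indirect_strength([0.5])
two_steps = indirect_strength([0.5, 0.5])
three_steps = indirect_strength([0.5, 0.5, 0.5])
```

With every link at 0.5, two steps leave a quarter of the strength and three steps leave an eighth, which is why an iterative approach is more practical than enumerating every indirect associate in advance.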
A recent case of mine involved the identity of a George
Kirk, for whom there was no visible birth/baptism record. By identifying his
father (Joseph Kirk), from his second marriage certificate, and so finding his
mother (Elizabeth² Hutchinson),
and then looking at her parents (Joshua & Elizabeth¹), and then identifying all her siblings, it was
possible to show that George was actually Elizabeth²’s own son, but baptised as the very youngest son of
Joshua & Elizabeth¹ before
she got married. What I’d done was to deliberately look at the family-of-family
of George.
Whether we’re looking at direct or indirect associates, looking at related FAN Clubs means that we have intersecting
clusters.
Figure 5 – Intersection of FAN Clubs for direct and
indirect connections.
Those intersecting clusters represent the fact that there
may be some shared associates. This
will be far more likely for a direct associate
(i.e. the FAN Club of someone in the
target’s FAN Club) than for an
indirect associate (e.g. the FAN Club of someone who has an associate in common with the target’s FAN Club).
All this means, of course, is that those lives are
interlocked, and the history of one will affect and be affected by the history
of others. Putting it another way: you
cannot research an individual in isolation!
It is usually said that whole-family
research is a prerequisite for cluster
research; however, I will suggest that family
reconstitution is a more fundamental notion because it applies to arbitrary
families rather than specifically your own. The term is defined in one
dictionary as follows: “The technique of compiling family trees for as many
people as possible in a chosen area of study, e.g. a parish, so as to obtain
detailed demographic data on matters such as age at marriage, or expectation of
life”.[7]
While this is a fair definition, I take issue with the emphasis on
demographics, particularly from a family-history dictionary.
The concept is fundamental, therefore, because it underpins
several distinct genealogical pursuits:
- Whole-family genealogy. While I cannot find a strict definition, it can be described in terms of its differences from direct-line genealogy where only direct ancestors (maternal and paternal) are researched. Whole-family genealogy means that the siblings of every ancestor are also researched, and possibly their descendants too. Either of these may be constrained by surname, such that only direct ancestors with your surname are considered (possibly for establishing a particular pedigree), or only descendants of some single progenitor who carry your surname (usually for a so-called “your family tree”). Whole-family genealogy also means looking at the offspring of any multiple marriages, and also the marriages of the women in each generation.
- One-name studies. Studying everyone of a given surname, including its variants. This might be worldwide, or it might be constrained by place and/or time period.
- One-place studies. Studying the whole population of a given community, such as a village or hamlet.
Family reconstitution
is essential for any of these pursuits because it is the first step in
establishing the structure of some community; without that, you could not
investigate the associations of a family with other families or individuals.
So let’s cast the net even further: what about micro-history? Well, all of the above
pursuits are variations of micro-history,
but my own use of this term would also include historical subjects other than
persons, such as places, groups (e.g. regiments, companies, classes), and
animals.
Genealogy is almost always about persons rather than places,
or any other historical subject, but the same method would be applicable in all
cases. For instance, we could analyse places in a similar manner to persons,
and establish the identity of a place reference through an examination of its associations.
This would force a difference between treating a place as some property for clustering
persons and treating it as an independent entity with its own identity and associations.
In reality, establishing the identity of a given person may first require the
identity of some place to be established, and so it is artificial to think of
these cases as fundamentally separate.
Having found items of relevant information in the extra sources
from the community context, it is
then time to make an argument for the identity of some person, or for the biological
relationships within some family. We’re now out of cluster analysis and into another long-standing method: Link Analysis.
The Wikipedia page on the subject actually gives a pretty poor summary of link analysis; it gives the impression
that it is all about large-scale software processing of connections found in
bulk data. While this may be the current usage, it is a method that predates
the computer age, and it was originally used as a way of visualising
connections in logical deduction — see the introduction to Our
Days of Future Passed — Part III.
Figure 6 – Use of Link Analysis for analysing and
correlating source information.
In other words, the application of genealogy software to
this method would primarily be about visualisation, and helping us keep track
of sources, information, and specific references. An Internet search for images
relating to “Link Analysis Software” gives many ideas for visualisation, but
they all share the same fault: they are node-heavy and assume that most of the
information is related to the objects (nodes) rather than to the links (edges).
In a genealogical context, each link would have to embrace any quoted
information (or links to associated transcriptions), the relevant source, and
our analysis or deduction (in narrative form), whereas the objects would mostly
represent person references (as opposed to identified people who lived).
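A link-heavy model of this kind might look something like the sketch below. It is only one possible shape for the data, under my own assumptions: the class and field names are invented, and the sample text is placeholder wording rather than a transcription of any real record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonReference:
    """A mention of a person in one source, not yet an identified individual."""
    name_as_written: str
    source: str


@dataclass(frozen=True)
class Link:
    """A proposed connection between two references, carrying the evidence."""
    first: PersonReference
    second: PersonReference
    quoted_information: str  # or a pointer to a transcription
    source: str
    analysis: str  # our reasoning, in narrative form


def links_involving(reference, links):
    """Return every link that has the given reference at either end."""
    return [link for link in links if reference in (link.first, link.second)]


# Placeholder content only; nothing here is quoted from a real record.
groom = PersonReference("James Boyd", "Marriage")
householder = PersonReference("James Boyd", "1850 census")
same_man = Link(
    first=groom,
    second=householder,
    quoted_information="(relevant extract would go here)",
    source="1850 census",
    analysis="(narrative argument for these being the same person)",
)
```

Note how little the nodes hold compared with the link: the name as written and its source, while the quoted information, the source, and the narrative analysis all travel with the connection.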
So where would the transition arise between clusters and
links? We have already mentioned that cluster
research, and the FAN Principle,
employ cluster analysis in terms of
categorising persons for targeted research. That research would find relevant
information that could be used to create an argument for establishing someone’s
identity, but it is highly unlikely that any single item will be enough to
achieve this. Correlating and comparing those items is where link analysis would be employed,
irrespective of whether this was done in your head, with a pencil and paper, or
with some new software tool.
My original intention with this analysis was to look at the
essence of the FAN Principle, and so
understand how it is applied to address specific research problems; to compare
this research method with certain ones outside of genealogy; and to understand
the relationship between these methods and the various genealogical (or
historical) goals that we may aspire to. Out of this deeper understanding of
the overall landscape should come the ability to develop software that might
better help in those pursuits — and particularly in their visualisations — as
opposed to simply interpreting the FAN acronym literally.
What I didn’t anticipate was the level to which cluster research applies in all veins of
genealogy. Just as historical context is
essential for the study of historical events, so community context is essential in the study of individuals. It
is probably one of the most fundamental concepts in any historical research,
and it lies at the heart of many of its pursuits, in addition to its
application to solving difficult identity problems. What a shame, then, that
modern Internet genealogy encourages people to deal only with direct answers to
simple questions; more Charlie foxtrot
than cluster research!
Inferential genealogy
should be a concept applicable to everyone’s research, but it has sadly become
associated with the professional or the academic. I have already heaped much of
the blame for this on commercial genealogy’s simplistic model (see Reaping
What We Sow — Part I and Reaping
What We Sow — Part II) but can anything be done to counter it? Some
inferential cases may be very complicated and involve extensive associations in
order to make an argument, but not all will be so complicated as to be out of
the reach of the more ordinary genealogist. What is missing is a set of
powerful tools for visualising the associations, and supporting our inferences
by accepting written narrative at appropriate places. Of course, it would have
to be a benefit rather than a chore, and the easier it was to use, the
greater that benefit would be.
[1] Elizabeth Shown
Mills, QuickSheet: The Historical Biographer's Guide to Cluster Research
(The FAN Principle) (Baltimore: Genealogical Publishing Co., 2012);
hereinafter cited as FAN QuickSheet.
[2] Elizabeth Shown
Mills, “QuickLesson 11: Identity Problems & the FAN Principle”, Evidence
Explained: Historical Analysis, Citation & Source Usage (https://www.evidenceexplained.com/content/quicklesson-11-identity-problems-fan-principle
: posted 26 Aug 2012, accessed 24 Oct 2016); hereinafter cited as QuickLesson 11.
[4] In genealogy, that
is. The term “Fan principle” may also be a reference to the “Fan dipole”
antenna design for optimised bandwidth, or the “Fan-Line Principle” that
employs three fan lines in stock market predictions.
[6] “Cluster genealogy”, Wikipedia (https://en.wikipedia.org/wiki/Cluster_genealogy : accessed 4 Nov 2016).
[7] David Hey, The Oxford Dictionary of
Local and Family History (Oxford University Press,
1997), s.v. “family reconstitution”.