Thursday, 29 January 2015

Warm Fuzzy Dates

No, not that sort of date! Calendar dates are a crucial part of historical research — including genealogy — but how well do we understand them? Is there more to their representation than a mere distinction between accurate and approximate?

A calendar is simply a mechanism by which a given culture records the passing of the days. I will try to restrict this article to the Gregorian calendar that we use every day, although the basic principles can be applied to any calendar.

The Gregorian calendar has a selection of units that may be used in conjunction to express a given date, as illustrated below:

Structure of Gregorian date units, and the associated ISO numeric patterns

The pattern shown underneath each form is how it should be represented numerically according to the ISO 8601 standard, and the yearly-quarters pattern is shown in brackets since the ISO standard doesn’t currently address that form (see Is the ISO Date Standard Bad?).

Most genealogical dates try to describe a given day. Providing the actual time of an event is quite rare, but references to larger units are not so rare. When mentioning “last week”, or “the sixties”, or “19th Century”, then the implication is that the whole of that period is being referenced; not merely one particular day somewhere within it. Each of those ISO patterns may be truncated to express a date representing some of those cases, such as yyyy-mm or just yyyy. The proposed yyyy-Qq representation already describes a period greater than one day (i.e. three months), and it would have a very good use for certain record types. The GRO indexes of civil registrations for vital events in England & Wales are compiled on a quarterly basis, and that means that no finer-grained representation would be appropriate when citing the date of their entries. STEMMA refers to this concept as the granularity of the date reference, and it roughly corresponds to the GEDCOM concept of a date-period.
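As a rough illustration of granularity, the following sketch expands a truncated date string into the first and last day of the period it covers. The function name `period_bounds` and the handling of the proposed yyyy-Qq form are assumptions made for this example; it is not the algorithm used by STEMMA or defined by ISO 8601.

```python
import calendar
import re
from datetime import date

# Accepts yyyy, yyyy-mm, yyyy-mm-dd, and the proposed (non-ISO) yyyy-Qq form.
_PATTERN = re.compile(r"(\d{4})(?:-(?:Q([1-4])|(\d{2})(?:-(\d{2}))?))?")

def period_bounds(text):
    """Return (first_day, last_day) of the period that a truncated date covers."""
    m = _PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported date form: {text!r}")
    year = int(m.group(1))
    quarter, month, day = m.group(2), m.group(3), m.group(4)
    if quarter:
        first_month = 3 * (int(quarter) - 1) + 1
        last_month = first_month + 2
    elif month:
        first_month = last_month = int(month)
    else:
        first_month, last_month = 1, 12
    if day:
        exact = date(year, first_month, int(day))
        return exact, exact
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, first_month, 1), date(year, last_month, last_day)
```

A quarterly index entry such as 1837-Q3 would therefore cover 1 July to 30 September 1837, and nothing finer is implied.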

This is a subtle semantic difference from an approximate date, but it is the latter that we’re more familiar with. We commonly have a day-based date that we believe falls between some upper and lower limits — one of which could be unknown in the general case (i.e. including before or after some threshold). STEMMA refers to this concept as imprecision, and it roughly corresponds to the GEDCOM concept of a date-range.

In fact, imprecision also applies to dates with a granularity greater than one day, and the first table at Date Margins shows how a ±margin is interpreted in conjunction with different granularities by STEMMA. The following diagram uses lumen and penlumen[1] to visually illustrate how equality is interpreted as ‘having some overlap’, whilst the degree of the overlap may be used to rank date matches.
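The idea of equality as 'having some overlap' can be sketched as follows. Each date is assumed to have been reduced to an inclusive (start, end) pair of days, and the score used for ranking (overlapping days divided by the length of the shorter range) is a hypothetical measure chosen for illustration, not one taken from the STEMMA specification.

```python
from datetime import date

def overlap_days(a, b):
    """Number of days shared by two inclusive (start, end) ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, (end - start).days + 1)

def matches(a, b):
    """Two imprecise dates are treated as equal if they overlap at all."""
    return overlap_days(a, b) > 0

def match_score(a, b):
    """Rank a match from 0.0 to 1.0 relative to the shorter of the two ranges."""
    shorter = min((a[1] - a[0]).days, (b[1] - b[0]).days) + 1
    return overlap_days(a, b) / shorter
```

Under this sketch, the ranges 1876–1880 and 1880–1882 match because they share the whole of 1880, but they would rank below two ranges that coincide exactly.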

Another concept that is used with less-than-known dates is uncertainty. The difference between uncertainty and imprecision concerns how sure you are of a date value or of a date range. For instance, saying “I think he was born in 1878” would be a case of uncertainty whereas saying “He was born during 1876–1880” would be a case of imprecision. STEMMA doesn’t address this concept in the date notation, but it can attach an attribute of Surety=certainty% to the datum. By contrast, the US Library of Congress Extended Date Time Format (EDTF) contains specific syntax for representing each of these cases. It uses a suffix of ‘~’ (tilde) to indicate imprecision and ‘?’ to indicate uncertainty; both of which may be combined. For instance:

  • 2000-06?     Possibly June 2000, but not definitely.
  • 1974~        Approximately the year 1974.
  • 1974?~       Approximately 1974 but even that is uncertain.
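A minimal sketch of reading these suffixes is shown below. It follows only the syntax quoted in the examples above, the function name `split_edtf_suffix` is hypothetical, and it makes no attempt to validate the date part or to handle the Level-2 forms.

```python
def split_edtf_suffix(text):
    """Split a Level-1 style value into (date_part, approximate, uncertain)."""
    core = text.rstrip("?~")          # strip any trailing qualifier characters
    suffix = text[len(core):]         # whatever was stripped: '', '?', '~', or '?~'
    return core, "~" in suffix, "?" in suffix
```

For example, `split_edtf_suffix("1974?~")` yields the date part `"1974"` flagged as both approximate and uncertain.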

These are examples of their Level-1 specification, but in Level-2 these suffixes may be applied to the individual parts of a date.

  • 2004?-06-11              Uncertain year (month & day known).
  • 2004-06~-11              Year and month are approx. (day known).
  • 2004-(06)?-11            Uncertain month (year and day known).
  • 2004-06-(11)~            Day is approximate (year & month known).
  • 2004-(06)?~               Month is approx. and uncertain (year known).
  • 2004-(06-11)?            Month and day uncertain (year known).

The EDTF has comprehensive mechanisms for handling partial dates, but I believe their mechanism for handling uncertainty in the digits of a date (e.g. 19uu-12-uu) is actually misplaced as part of its specification. This should not be date-specific and is encroaching on the bigger requirement of a standard representation for uncertain characters during transcription.

One area of confusion is that although there are distinct reasons for the different notational schemes, the schemes themselves are sometimes indistinct. For instance, there’s the humanly-readable notation which generally uses a c./ca. prefix (for circa, meaning “about”) for approximate dates, and an en-dash for date ranges (e.g. 1852–1855). It may also use word prefixes such as before or after. Looking at the GEDCOM support for dates shows that some of this humanly-readable notation has crept into an essentially computer-readable notation. Its date-range term uses prefix operators of AFT and BEF, and an infix operator of BET. Its date-approximated term uses prefix operators of ABT, CAL, and EST. The primary example of a computer-readable notation would be the ISO 8601 standard. Although it may be acceptable to employ the ISO notation in a document, some style guides indicate that the truncated numeric forms may be ambiguous to the reader. For instance, 1910-11 with a hyphen would be an ISO representation of November 1910, but 1910–11 (using an en-dash) would be a date range of 1910 to 1911. This ambiguity would not arise if the schemes were used in their appropriate contexts.
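To make the GEDCOM distinction concrete, here is a small sketch that classifies a date string by its leading operator. The operator keywords are those named above (with AND as the companion of BET); the function name and the returned tuple shape are assumptions for this example, and the date operands themselves are left unparsed.

```python
RANGE_PREFIXES = {"BEF", "AFT"}
APPROX_PREFIXES = {"ABT", "CAL", "EST"}

def classify_gedcom_date(text):
    """Return (kind, operator, operands) for a GEDCOM-style date string."""
    words = text.split()
    if not words:
        raise ValueError("empty date")
    head = words[0].upper()
    if head == "BET":
        upper_words = [w.upper() for w in words]
        if "AND" not in upper_words[1:]:
            raise ValueError("BET requires a matching AND")
        i = upper_words.index("AND", 1)
        return "range", "BET", (" ".join(words[1:i]), " ".join(words[i + 1:]))
    if head in RANGE_PREFIXES:
        return "range", head, (" ".join(words[1:]),)
    if head in APPROX_PREFIXES:
        return "approximated", head, (" ".join(words[1:]),)
    return "exact", None, (text.strip(),)
```

Keeping the operator separate from the operands is what lets software treat "BET 1852 AND 1855" as a range rather than as display text.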

Even in the context of computer-readable notation, there are distinct goals that separate the different schemes. The EDTF notation is an expressive representation, designed to capture the full details of incomplete, approximate, or uncertain dates, and may therefore be more applicable to transcription. The W3CDTF format — which is a restricted subset of the ISO standard, employing the format yyyy-mm-dd — is a comparative representation. By that, I mean that any two dates in that representation are comparable, and all such dates would form a totally ordered set in mathematical terms. The ability to compare dates efficiently is essential for timelines and for date searches, and the general ability (for any data-type) underpins many types of software index, such as the B-tree. It’s worth noting that the different numeric ISO forms, highlighted above, are individually comparative but not together. For instance, the yyyy-mm-dd form cannot be directly sorted with the yyyy-Www form, and this was one of the driving forces for STEMMA implementing its own computer-readable notation; one that ensured all granularities were inclusively comparative (see Date Value).
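The comparative property is easy to demonstrate with plain string sorting. In this sketch the sample dates are invented for illustration: the yyyy-mm-dd strings sort into chronological order, whereas mixing in a yyyy-Www string does not, because the letter W sorts after every digit.

```python
from datetime import date

# Same form throughout: lexical order equals chronological order.
day_forms = ["1910-11-05", "1881-04-03", "1910-02-28"]
sorted_days = sorted(day_forms)

# Mixed forms: the week-based string lands last, although ISO week 2
# of 1910 actually begins in January.
mixed_forms = ["1910-11-05", "1910-W02", "1910-02-28"]
sorted_mixed = sorted(mixed_forms)

# Monday of ISO week 2 in 1910, for comparison with the dates above.
week_start = date.fromisocalendar(1910, 2, 1)
```

Here `sorted_mixed` places "1910-W02" after "1910-11-05", even though `week_start` falls before 28 February 1910.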

The ability to compare dates is also a requirement when both imprecision and granularity are present together. Rather than encoding imprecision in the date string, STEMMA uses its date notation to separately describe the start and end of the associated date range. This avoids encumbering the core notation while making it easy to implement comparisons in terms of the range end-points. The second table at Date Comparisons shows how STEMMA interprets comparison operators such as less-than-or-equal in this situation.

Indicating that a date falls before, after, or between other dates is called a temporal constraint. These obviously have their uses when implementing the concept of imprecision, but they are less appropriate between dates that both have some real-world significance. If you roughly knew, for instance, the dates of someone’s birth and baptism, then it would be inappropriate to express a temporal constraint to indicate that the latter is greater than the former. It’s inappropriate because the underlying semantics would have been lost. What is needed is an event constraint which indicates that their baptism follows their birth, and this topic was briefly discussed back in Eventful Genealogy – Part II. More recently, the topic of representing the birth order of a family’s children was discussed on the FHISO TSC-public mailing list at Birth Order. In the situation where their birth dates were unknown, it was suggested that a Family record could implicitly order them. This may be true but a proper event constraint is a much more general concept, and one purposely designed to express those semantics. It could even be applied between twins when their birth dates are identical but their birth order was known to be otherwise.
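One way to picture event constraints is as a small graph of 'follows' relationships that needs no dates at all. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the event identifiers are hypothetical, and this is an illustration of the general idea rather than STEMMA's representation.

```python
from graphlib import CycleError, TopologicalSorter

# Each event maps to the set of events that must precede it.
# The twins share a birth date, yet their birth order is still expressible.
follows = {
    "baptism_twin_a": {"birth_twin_a"},
    "birth_twin_b": {"birth_twin_a"},
    "baptism_twin_b": {"birth_twin_b"},
}

def consistent_order(constraints):
    """Return one ordering of events that satisfies every constraint.

    Raises graphlib.CycleError if the constraints contradict each other.
    """
    return list(TopologicalSorter(constraints).static_order())
```

Because the constraints are between events rather than date values, the semantics survive even if the estimated dates are later revised.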

If we want to take an extreme view of imprecision then we have to discuss the concept of probability distributions. Simply saying that something occurred during 1881–1885 doesn’t indicate whether 1881 is more or less likely than 1883 (i.e. mid-range); it simply describes a flat distribution of the likelihood. I believe that in most cases like this one, we could indicate one date that would be the statistical mode (i.e. the most common or likely value) of the distribution, but specifying and utilising distribution curves would be impractical in my opinion.

An interesting take on this may be found in recent research undertaken in Verona, Italy, to look at supporting fuzzy dates on their SITAVR information system.[2] Their research considers basic aspects of fuzzy dates, calendars, fuzzy temporal constraint networks (FTCN), and probability distributions. Those distributions are of a trapezoidal nature, and so require only four defining values rather than a full curve. Although the report may be very academic, it’s worth reading since the justification is the real-world archaeological data in their SITAVR system; much of which is subjective, estimated, or imprecise.
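For readers unfamiliar with trapezoidal distributions, the following sketch shows the generic membership function from fuzzy-set theory, defined by four values a ≤ b ≤ c ≤ d. It is not necessarily the exact formulation used in the Verona report, and representing dates as plain numbers (here, years) is a simplification for the example.

```python
def trapezoid_membership(x, a, b, c, d):
    """Degree (0.0 to 1.0) to which x belongs to the trapezoid (a, b, c, d).

    The value is 0 outside [a, d], 1 on the plateau [b, c], and rises or
    falls linearly on the two slopes in between.
    """
    if not a <= b <= c <= d:
        raise ValueError("expected a <= b <= c <= d")
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge
```

With the values (1880, 1881, 1885, 1886), any year from 1881 to 1885 is fully plausible, while a point midway through 1880 scores 0.5.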

In conclusion, there are distinct reasons for the different date notations, and we should keep them in focus so that we don’t confuse them:

  • Computer-readable. These notations may record details of transcription issues (e.g. uncertain characters) or the uncertainty of a claimed date. I would contend that these are both general requirements that should apply to any datum — including numbers and text — and not just dates. For a decipherable date, they will also represent details of granularity and imprecision, both of which must be represented in a way that facilitates efficient comparison, sorting, and searching.
  • Humanly-readable. The traditional notations we use in written works rarely go into great detail regarding the possibilities or the levels of surety. To produce a humanly-readable version of a computer notation, one alternative might be to generate the nearest traditional form and use a footnote, or an interactive pop-up or right-click equivalent, to supplement it with the greater detail.

The jury may be out as regards the level of detail required in our notations, and whether imprecision should consider variable likelihoods (i.e. some type of probability distribution). However, in constructing such a notation, we must remain sure of whether it’s designed for humans or for computer software, and whether the issues being addressed are specific to dates or are a general consideration for any type of datum.

[1] Coined from Latin paene (“almost”) and lumen (“light”). Analogous to umbra and penumbra for shadow.
[2] Alberto Belussi and Sara Migliorini, "Modeling Time in Archaeological Data: the Verona Case Study", report to Dipartimento di Informatica Università degli Studi di Verona, Apr 2014, Verona University ( : accessed 29 Jan 2015).

Saturday, 17 January 2015

Hierarchical Sources

Some interrelated topics to be discussed in this article: What is a hierarchical source? How does it relate to a hierarchical arrangement and to provenance? Is one hierarchy enough? Does it affect our citations? Does it affect digital organisation?

You may be thinking that these are unrelated topics but let’s just begin with a question commonly posed in genealogical forums and mailing lists: ‘how do I organise my media files?’. This usually translates into ‘how do I name my files?’ or ‘how do I arrange the corresponding folder hierarchy?’, and an example may be found at: How should I Organise My Digital Documents?.

Most people have at least tried to organise their digital artefacts (i.e. document files and media files) by surname, and then realised how impractical that is. For instance, should a marriage certificate be organised by the groom’s or the bride’s surname? Whose surname do you use for a group photograph that includes several generations of relatives and in-laws? What do you do when you have inherited a photograph of ‘woman holding a baby’ but you haven’t yet formed a positive identification?

Back in May 2013, Sarah Ashley presented her solution at Organizing Your Genealogical Documents. This was to use a source-based scheme where the documents were each assigned sequential 4-digit identifiers while being scanned. Different computer folders were then used to store the different categories of material, such as vital events, newspaper articles, photographs, census pages, etc., and the individual files were named using the corresponding identifiers. In the following June, Louis Kessler presented an improved version of this at Source Based Document Organization where he suggested a hierarchical organisation, and also the use of the GEDCOM REFN tag (defined as: “A description or number used to identify an item for filing, storage, or other reference purposes”) for linking to the relevant documents.

These are good schemes but neither one suggests how those flat or hierarchical identifiers should be allocated, nor whether (when material was copied from some external source) there should be any relationship to external cataloguing of the original or an online version. Also, how should someone deal with physical, rather than digital, artefacts in a collection donated by or inherited from another family member?

The answer to these issues can be found in archival science. Archivists have been doing this for years, and they have international standards and a well-established vocabulary. In particular, provenance is a core principle of archival science, and it has two fundamental concepts: respect des fonds — basically grouping records according to their creator, or fonds — and original order — basically maintaining the same record order as that of their creator. In effect, I’m suggesting that we should manage our physical and digital artefacts as a micro-archive.

In May 2013, Sue Adams produced an excellent description of this approach on her Family Folklore blog at Provenance of a Personal Collection – Archival Accession, Arrangement and Description. She explained that archival arrangement places all the items at positions in a hierarchy reflecting their provenance, usage, and physical structure. An archival description would then provide information that served to identify, manage, locate and explain the archival materials at each level.[1] This definition is important because the resulting catalogue should identify information about an item rather than information within an item, and so not express any analysis or conclusions — more on this later.

The International Standard Archival Description (ISAD) defines a model for the levels in a hierarchical arrangement, and this includes fonds, sub-fonds, series, sub-series, files, and items[2]; items being the lowest level. A fonds (silent ‘d’ and ‘s’) is a term for a grouping of documents that have been naturally accumulated by an individual, family, or organisation as a result of their normal activities or work. This replaces much of the usage of the older term collection which is now reserved for groupings that have been assembled rather than created. The difference is effectively whether the grouping relates to a common provenance rather than a common characteristic.

Following Sue’s lead, I’ll select an example using a reference to a page in the 1901 census of England[3], as held by The National Archives of the UK (TNA): piece 3191, folio 125, page 19. Their guide to citing their documents and catalogues presents the following general document-reference formats:[4]

dept-code  series / piece
dept-code  series / piece / item

The census folio and page number are actually internal identifiers for those items, and so are relevant to a citation for some piece of information but not to the cataloguing of the associated source. The result might be something like:

RG 13/3191, f.125, p.19

Note that their recommendations involve the folio abbreviations f./ff. rather than the fo./fos. ones that some readers may be familiar with.
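Holding the elements of such a reference separately makes it trivial to generate the formatted string. The sketch below reproduces the example shown above; the function name and parameter names are assumptions for illustration, and it covers only the single-folio, single-page case.

```python
def tna_census_reference(dept_code, series, piece, folio=None, page=None):
    """Format a reference such as 'RG 13/3191, f.125, p.19'.

    The dept-code/series/piece portion identifies the catalogued source;
    folio and page are internal locators added only when citing information.
    """
    reference = f"{dept_code} {series}/{piece}"
    if folio is not None:
        reference += f", f.{folio}"
    if page is not None:
        reference += f", p.{page}"
    return reference
```

Omitting the folio and page leaves just the catalogue reference, which mirrors the distinction drawn above between cataloguing a source and citing something within it.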

Evidence Explained covers the use of citations for multi-levelled archival arrangements, and remarks that “Your citation should follow the practice of the archive whose material you are using”.[5] However, it also warns that this may lead to conflicting styles when dealing with international sources, such as whether elements should be sequenced large-to-small or small-to-large.[6]

It has been suggested, more than once, that digital images should contain elements of meta-data that detail their provenance, and that this would greatly help when people have downloaded otherwise-untraceable images from online sources. The ubiquitous copying and downloading of digital images makes it nigh on impossible to know where they first came from, and who they should be attributed to. There is nothing technically impossible about this. For instance, the XMP meta-data design applies to several image and document formats, and without hindering applications that read them. It uses namespaces to make it applicable to any number of distinct meta-data sets, and it even has an international standard: ISO 16684-1:2012. Unfortunately, XMP is still a registered trademark of Adobe Systems Inc., and this has probably limited its take-up. One of the safer (i.e. more portable) alternatives is something called sidecar files, where the meta-data is held separately from the associated data by using a second file with a related name.
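The sidecar idea can be sketched in a few lines. The `.meta.json` suffix, the use of JSON, and the function names are all assumptions made for this example; real XMP sidecars use their own format and naming conventions.

```python
import json
from pathlib import Path

def sidecar_path(artefact):
    """Derive the sidecar name by appending a suffix to the full file name."""
    path = Path(artefact)
    return path.with_name(path.name + ".meta.json")

def write_sidecar(artefact, metadata):
    """Store metadata beside the artefact without touching the artefact itself."""
    target = sidecar_path(artefact)
    target.write_text(json.dumps(metadata, indent=2, ensure_ascii=False),
                      encoding="utf-8")
    return target

def read_sidecar(artefact):
    """Load the metadata previously stored for the artefact."""
    return json.loads(sidecar_path(artefact).read_text(encoding="utf-8"))
```

Appending to the complete file name, rather than replacing the extension, avoids a clash when a scan and a transcript share the same stem.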

There is a good case for using something like XMP in this image-copying scenario, but it becomes less useful for your micro-archive because (a) it will likely contain physical as well as digital artefacts, and (b) the meta-data will be applicable to different units in the hierarchy, and not simply to the lowest-level items. Where a specific arrangement of the artefacts has been created then it is more common to use a meta-data database. However, STEMMA’s approach is to use its own file format to create machine-readable archival descriptions. Its files are plain-text, and its Resource entities may be used to describe each of the levels of a hierarchical arrangement. By incorporating its inheritance mechanism, this allows the description, provenance, access control, and any amount of meta-data to be provided for each unit in a single text file that can be loaded and referenced by other STEMMA files.

STEMMA has two main entity types relevant to this discussion:

  • Resource — a representation of a digital or physical artefact, or a combination of these such as when you have a scan of an original letter in your possession, or a digital photograph of a set of medals.
  • Citation — despite its name, this is a generalised reference to sources or information held elsewhere. It includes the location of information within a source, as well as the location of a source itself, and even allows for the representation of attribution.

Both of these entities share a mechanism of parameterisation. This means that they can both define a number of named parameters, each having its own specific data-type. These can be used in a similar way to the REFN tag, mentioned above, but with significant advantages. That GEDCOM tag allows only for a single amorphous code; a code that cannot be decomposed or reverse-engineered. Having the elements of a hierarchical reference in separate parameters allows them to be used in a more powerful fashion. For instance, returning to Sue’s example, the initial levels of her arrangement might be described using the following entities:

<Resource Name='rRWC' Abstract='1'>
<Title>Raymond Walter Coulson (1922-1997) collection</Title>
<Param Name='Lev1'>CWC</Param>
<Param Name='File'/>
<Param Name='Folder1'>collections/${Lev1}</Param>

Papers, photographs, correspondence, memorabilia and probate documents of Raymond Walter Coulson of 322 Aston Hall Road, Aston, Birmingham, who died intestate on 24 May 1997.

<Resource Name='rRWC_Probate' Abstract='1'>
<Title>Probate file</Title>
<BaseResourceLnk Name='rRWC'/>
<Param Name='Lev2' Type='Integer'>1</Param>
<Param Name='Folder2'>${Folder1}/${Lev2}</Param>

Compiled by [my dad], administrator for the estate of Raymond Walter Coulson, between May 1997 and January 1998.

These effectively construct a hierarchy of Resource entities describing the various units in that arrangement. An individual file, such as a marriage certificate, could then be specified through its file name and the Resource entity representing that archival unit. Yes, the folder names could have been hard-coded, and the Resource entities crafted independently of each other, but using the inheritance mechanism introduced in Genealogical Inheritance makes it more flexible and maintainable. Each Resource inherits an accumulated set of parameters from the higher levels.
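The effect of inheriting and substituting parameters can be imitated with a short sketch. It assumes each level is reduced to a plain dictionary of parameter values, uses Python's `string.Template` for the ${name} references, and is only an approximation of the behaviour described here, not STEMMA's actual resolution rules.

```python
from string import Template

def resolve_params(chain):
    """Merge parameter sets from the top level down, then expand ${name} references.

    `chain` is a list of dictionaries ordered from the base Resource to the
    most derived one; later levels override earlier ones.
    """
    merged = {}
    for level in chain:
        merged.update(level)
    resolved = {}
    for name, raw in merged.items():
        value = str(raw)
        # Repeat until nothing changes; the bound guards against circular references.
        for _ in range(len(merged) + 1):
            expanded = Template(value).safe_substitute(merged)
            if expanded == value:
                break
            value = expanded
        resolved[name] = value
    return resolved

# Values taken from the two Resource entities shown above.
collection = {"Lev1": "CWC", "File": "", "Folder1": "collections/${Lev1}"}
probate = {"Lev2": 1, "Folder2": "${Folder1}/${Lev2}"}
resolved = resolve_params([collection, probate])
```

With these values, `resolved["Folder2"]` becomes `collections/CWC/1`, so changing `Lev1` at the top level would relocate every unit beneath it.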

The parameter mechanism is a general-purpose tool, and may be used to add specific items of meta-data that you want to separate out of the associated archival description. One of the developers independently writing software around the STEMMA specification recently presented me with a related question. He was transferring photographic slides to a digital organisation, and wanted to know how to deal with dates written on each slide frame. Since parameters can be defined freely, I pointed out that a Resource parameter could be defined for this purpose with a specific data-type of ‘Date’. In a Citation entity, the parameters may be used to define citation elements; those discrete values that would be later formatted into a traditional reference-note citation.

Since both Resource and Citation entities share this parameterisation mechanism then it is also possible to pass parameters from one to another. Imagine, for instance, that the citation for the aforementioned census page had parameters for the piece, folio, and page. If you had a local image copy of it then it could be located using the same parameter values, either substituted into a file name or a folder hierarchy. They could even be used to interrogate a Web site in order to summon the census image on demand (see ‘rCensusImage’ example at Resource).[7]

We’ve mentioned the hierarchy of a source inherent in its archival arrangement, but are there any other examples of a source hierarchy? Well, the chain of data provenance when we cite a source (that is, the relationship between records and the individuals or organisations that have created, maintained, reproduced, transcribed, indexed, or otherwise modified them) also constitutes a hierarchy. When we cite a derivative source, such as an online edition, or some database, then we usually cite the source of the source in a secondary fashion.[8] Provenance also applies to specific information as well as to a source or source data. A common example is when we’re citing an author who is citing other works; ones they have consulted but which we haven’t. We may feel that the provenance of the information is important to our case, but we cannot directly cite what we haven’t consulted. This scenario is covered in some detail by Evidence Explained[9], but consider the case where an author hasn’t cited their source, but we believe we have identified an earlier version of their claim or statement. This may be very important to our case, especially if there are subtle differences, but a simple comment in a reference note may be insufficient to encompass our justification and reasoning.

An important point here is that these forms of hierarchy are facets of the real world, and not some subjective notion that software might decide to support or ignore. This issue was recently discussed on the FHISO TSC-Public mailing list starting at Filing Sources. STEMMA’s Citation entity was endowed with two types of hierarchical linkage: ParentCitationLnk, in order to model provenance (see Cite Seeing), and BaseCitationLnk, in order to model the structure of groupings and the structure within a given source (see Genealogical Inheritance).

So, both local materials and consulted materials held elsewhere, including any associated digital images of them, can be represented using some combination of Resource and Citation entities. The core genealogical data is where we would analyse those materials and form our conclusions, and that will necessarily require links or citations to those materials. However, materials should never be catalogued according to such conclusions since they may change. If you’re cataloguing a photograph of ‘woman holding a baby’, or a painting of ‘a cracked vase with daisies’, then it must be independent of opinions or conclusions. Even their archival description must only record what we know about the materials rather than something we’ve determined from their contents. Our core genealogical data will also need to reference these materials from multiple points, and in different ways — something that renders a simple name-based arrangement redundant.

This article is placing great emphasis on both our sources and the local artefacts in our own micro-archives, but why? Isn’t one arrangement as good as another? Why do we need to be concerned with provenance, or with the arrangement used by some archive? The answer to this would be obvious to an archivist, or to an historian, but less so to most genealogists. The problem is that the majority of genealogy — and especially where it involves online family trees — is people-centric. The pursuit very often boils down to that of searching for a person’s name, or the vital events of a named person, and since the results will mostly come from online data — data that is deliberately keyed on personal names — then it has some consequences:

  • The source of the information is an afterthought. Although some Web sites allow a researcher to tag data with links to their relevant online content, that is merely an electronic bookmark (in the form of a URL) and not a real citation.
  • Even when a researcher references a source, it is only in the context of a citation. The belief that the data is the answer, as opposed to the source contains a clue, means that any reasoning for the making of a considered argument is being short-circuited.

Contrast this with the way someone might approach historical research, where individual sources are assimilated and relevant items analysed and correlated with information from other sources. That style of research begins with a source rather than with a name. The origin, nature, and quality of the source are then very important factors during its analysis.

This same point was recently raised by Jan Murphy on the FHISO TSC-Public mailing list at Format for Raw Source Content, and I’ll leave you with her own words, which are designed to keep the software mindset focused on the real world:

I hate to keep arguing this point over and over again, but we are looking at documents and other source material.  We are not looking at people.  We are looking at sources, most of which (but not all) contain names. 

A lot of beginning researchers, including many of the people in the Genealogy Do-Over group, struggle to learn how to cite their sources, and why? Because if you work in a people-centric system the sources are always an afterthought.

** Post updated on 19 Apr 2017 to align with the changes in STEMMA V4.1 **

[1] CBPS - Sub-Committee on Descriptive Standards, "ISAD(G): General International Standard Archival Description - Second edition", International Council on Archives (ICA) ( : accessed 16 Jan 2015); attached document CBPS_2000_Guidelines_ISAD(G)_Second-edition_EN.pdf; glossary, p.10, s.v. “archival description”.
[2] “Model of the levels of arrangement of a fonds”, ISAD(G), appendix A-1, p.36.
[3] Whereas these TNA census references apply to England & Wales, they do not apply to Scotland. Scotland has its own system, and this has caused issues for sites such as findmypast that try to provide a UK-wide search form. The Ancestry equivalent only solicits criteria such as piece/folio/page when specifically searching, say, the census of England, but findmypast currently solicits them in all UK cases, whether relevant or not. See Chris Paton’s views on this at FindmyPast - Scottish censuses.
[4] “Citing documents in The National Archives”, The National Archives of the UK (TNA) ( : accessed 16 Jan 2015).
[5] Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace, 2nd ed. (Baltimore, Maryland: Genealogical Pub. Co., 2009), pp.116–119.
[6] E. S. Mills, sec.3.3 “International Differences”.
[7] The idea of a reliable, non-internal URL for summoning the image of a particular census page is a nice idea that could help when sharing data with friends and relatives, or between researcher and clients, without the paranoia associated with T&Cs or copyright. Although the recipient would need a subscription to the site, the idea could be adopted by other providers to create a sort of genealogical "Open URL" variation. It would be quite easy for them to offer because they already have form-fill functionality that achieves the same type of lookup. However, the idea is strangely ignored.
[8] E. S. Mills, p.180 under “Citing the Source of a Source”.
[9] E. S. Mills, sec.2.21 “Citing the Source of a Source”.

Tuesday, 6 January 2015

A Life Out of Balance

A small bit of research recently uncovered a truly unexpected result; something worthy of a soap-opera finale. It is said that Lady Justice is blind, and hence impartial, but sometimes she is blind in a different sense.

Back in Harsh Times, I introduced an historical character of Nottingham, Henry Pearson, who had somehow confused the concepts of a police record and a world record. Just following him in the local newspapers revealed that he had been convicted almost 90 times, and that he probably saw more of the Nottingham prison than of his own family:

Figure 1 - Nottingham House of Correction, c1895.[1]

Such was his reputation that the newspaper reports used phrases such as “one of the most notorious characters known to Nottingham”, “a disgrace to the community”, and “the laziest vagabond in Nottingham”.

It’s hard to avoid his convictions due to their sheer number, and even Ancestry lists a couple of them:

  • 14 Apr 1887 at the Easter General Quarter Sessions: Offence of “Larceny Simple — Prior Convictions”. Sentenced to “3 Cal. Months — hard labour”.[2]
  • 17 Oct 1887 at the General Quarter Sessions, Shire Hall. Offence of “Larceny Simple — before convicted of felony”. Sentenced to “3 months”.[3] This corresponds with his arrest for stealing a quantity of cotton waste from the Great Northern Railway Company, as reported in the table of newspaper reports, linked above.

The newspapers continued to yield small reports that I’d previously overlooked, such as Henry being charged with being drunk and disorderly, and with assaulting P.C. (Police Constable) Stevens at 12:30am in St. Michael’s Street. He was sentenced to one month in prison for this.[4]

At the time of writing, the Nottinghamshire Archives was closed for major work to extend it. However, the staff undertook some limited research to find how much of Henry’s criminal record still survives. It seems that the Borough Quarter Sessions books from 1899 onwards have not survived, and the surviving ones are very bulky unindexed volumes. However, the following details of summary convictions at Petty Sessions were noted:

  • Easter Sessions 1896,[5] 24 Mar 1896: drunk and disorderly.
  • Midsummer Sessions 1896,[6] 24 Apr 1896: drunk and disorderly.
  • Michaelmas (Autumn) Sessions 1898,[7] 22 Jul 1898: obstructing the highway.
  • Epiphany (January) Sessions 1899,[8] 22 Dec 1898: drunk and disorderly.

Of the eight volumes spanning 1872–1899, this search included only the 1898–1899 volume, and as far as p.142 in the 1896–1898 volume (about ¼ of the way through).

As these offences were filed under the Petty Sessions, it was decided to search those registers instead, as they would contain more details, and some were actually indexed by the name of the defendant. These registers, however, do not commence until 1887, and there are two main sets of them: 120 for court no. 1, and 122 for court no. 2. From 1907, there is another set of 4 registers for a court no. 3.

The above 1896 references were checked to establish if these tallied, and this revealed the following details:

24 Mar 1896 [court no. 1][9]
Name: Henry Pearson
Offence: drunk and disorderly
Sentence: 15/- [shillings] or 10 days H[ard] L[abour]

24 Apr 1896 [court no. 2][10]
Name: Henry Pearson
Offence: drunk and disorderly
Sentence: 15/- [shillings] or 14 days H[ard] L[abour]

Having established the expected correlation, the intention was to use the indexes to proceed through each of the registers, but it was quickly found that some were only partially indexed and some were not indexed at all. Clearly, it would take a massive research exercise to uncover details of Henry’s every conviction. The goal of this research was mainly to determine the availability of further information, and so a full search was not undertaken. One last search was performed at the beginning of the registers for each court.

14 Feb 1887 [court no. 1][11]
Name: Henry Pearson
Offence: stealing beef
Sentence: remanded until 21st inst

21 Feb 1887 [court no. 1][12]
Name: Henry Pearson
Offence: stealing beef
Sentence: sent to No.2 court

This referral to court no. 2 on 21 Feb 1887 could not be found on this or the subsequent few days, perhaps because this case was deferred.

7 Jan 1887 [court no. 2][13]
Name: Henry Pearson
Offence: allowing dog at large unmuzzled
Sentence: no appearance, 2/6 P[oor] B[ox]

21 Jan 1887 [court no. 2][14]
Name: Henry Pearson
Offence: allowing dog at large unmuzzled
Sentence: Withdrawn

Most of Henry’s convictions were for drunk & disorderly, and for fighting or assault. He certainly had a problem with authority and several of the assaults involved members of the police force. The first case reported in the newspaper occurred when he was about 18 or 19, but he already had previous convictions by then. Some of his convictions, though, possibly suggest that he was deliberately targeted by the police. Charges such as “sliding on the causeway” (3 Feb 1879) and “found sleeping in an outhouse in a yard” (24 Feb 1888) seem to have gone beyond what was necessary for the keeping of law and order.

In amongst the cases of drink and violence can be seen a sad and derelict home-life. My previous blog-post showed that although Henry did not marry Rebecca Belshaw until about 1900, the newspaper report of 6 Sep 1889 suggested that she had already given birth to several illegitimate children, and also that he had been keeping company with her for some time before that. The following table summarises the birth dates of Rebecca’s children based on census and civil-registration data.

  • Rebecca Belshaw
  • Frank Belshaw
  • Laura Belshaw
  • Rose Belshaw: cannot identify birth registration.
  • Ellen Belshaw: died 1891 aged 8 months.
  • Henry Belshaw
  • Kate Belshaw
  • Annie Belshaw: died 1897 aged 1 month.
  • William Pearson
  • Lily Pearson: 21 Mar 1905, at 14 Holland’s Yard.

That same newspaper report explained that Henry had come home, struck Rebecca, and kicked her, inflicting a number of bruises and a black eye.[15] She stated that she had only received 9d (9 pence) from him in the last fortnight. Henry was sentenced to prison for 2 months.

Digging a little deeper found an earlier reference to Henry assaulting Rebecca Bellshaw [sic], and giving her a black eye, on 24 Jul 1880. He was fined 15s (15 shillings) or 14 days for this.[16] Hence, he was with Rebecca from the birth of her first child, and was most likely the father of all of her children, even though they did not marry for a further 20 years, and even though all the Belshaw children were listed as step-children in the 1911 census. It’s worth mentioning that Henry and Rebecca did not appear to be avid church-goers. With the exception of Annie Belshaw, whose baptism on 31 Dec 1896 at Nottingham St Catherine was probably because she was a sickly child[17], I can find no evidence of any baptisms, or even of their marriage having occurred in a church.

In June of 1907, Henry was prosecuted by the NSPCC (National Society for the Prevention of Cruelty to Children) at the Nottingham Summons Court.[18] Henry, living at 14 Holland’s Yard, Kelly Street, Sneinton, was charged with having neglected his three children, Kate (12), William (5), and Lily (1¾), in such a manner as to cause them unnecessary suffering and injury to their health, on and before May 21st. He was described by the prosecution as “a thoroughly bad lot, being a thief, an habitual drunkard, and a man who has never seen to do a stroke of honest work”. It was stated that he often turned his wife and children out of the house without food or money while he ate a good meal, and this was corroborated by the NSPCC Inspector. Praise was given to Rebecca for doing the best she could to look after the children. In view of the seriousness of the case, Henry was sent to prison for three months.

In January of 1908, Henry — still at Holland’s Yard — was again prosecuted by the NSPCC at the Summons Court.[19] The prosecution said this was the worst case heard in the court for a long time. When accused of being idle, Henry openly threatened the prosecution:

“Pearson was able to earn good money when he chose to work, but he never did any, and had it not been for the charity of neighbours, and the fact that Mrs. Pearson was an excellent mother, the children would have been reduced to starvation. Mr. Lucas [prosecuting for the NSPCC] referred to defendant as an idle scamp, whereupon Pearson exclaimed, ‘You’re a liar, calling me an idle scamp. I’d give you something if I’d got you out side.’”

It was suggested that following his previous three-month sentence he had immediately started drinking again, and had been in an almost continual state of insobriety since. He was sentenced to a further six months in prison.

In March of 1909, Henry was yet again prosecuted by the NSPCC.[20] The prosecution admitted that the previous three-month and six-month sentences had had no effect at all. Evidence was presented that his three young children (he apparently “had five altogether”) had been persistently neglected, kept in a dirty condition, and not supplied with sufficient food. Henry was sentenced to a further six months with hard labour.

The NSPCC were obviously very concerned for the welfare of the children as they were back in court in November 1913.[21] The same Mr. C. E. W. Lucas prosecuted for the NSPCC as on the previous three occasions. Henry — now at 35 Stanhope Street — was charged with neglecting his two young children aged 11 [William] and seven [Lily] respectively. Rebecca and her “crippled daughter” (unnamed) toiled long hours doing lace work for around 6s per week, and she stated that this was not enough to live on. The school attendance officer explained that the children had received 1,230 breakfasts and 1,599 dinners from the education authority at some cost to the city. In view of his previous 87 convictions, he was sent for trial to the Quarter Sessions.

Something must have happened at this point because the newspaper reports of his court appearances just stopped. Finally, at the age of 55 (the newspaper said 62), might he have become a reformed man? The next mention I could find was associated with the death of his wife on 21 Mar 1933 — some 20 years later — at 6 Camden Street, aged 74, of ‘Heart failure. Fatty degeneration. High blood pressure. Kidney disease’.[22] Henry was present at her death.

Moving forwards another couple of years to 1935, though, reveals a very interesting newspaper report of a Henry Pearson of about the right age (76) being arrested for drunkenness.[23] This poor old man could barely stand, partly through intoxication after being “out with the boys” and partly due to his frail legs. Both the magistrate and the newspaper report treated the whole incident in a light-hearted fashion and I include a full transcription of the report to illustrate the tone:

“I was not drunk—it’s my legs,” said 76-year-old Henry Pearson, of Elford-rise, when evidence was given before the Nottingham magistrates to-day that he was lying helplessly drunk in Goose-gate at 4.15 on Saturday afternoon.

P.c. Collins stated that after making his discovery he lifted Henry, only to find that he could not stand.

P.c. West corroborated as to defendant’s condition when he was brought to the police station.

Then came Henry’s explanation in a duologue with the chairman (Ald. Sir Albert Atkey). He is frail and rather deaf, and he was invited to stand close to the witness box.

“I am 76,” he pointed out. “I had been with some friends and was the youngest of the four. If I’ve done wrong I want punishing, but it’s my legs that fail me.”

Sir Albert: You say it was your legs? — I’m certain it was. They do fail me.

What are you doing for it?—Nothing, it’s old age creeping on, I expect.

Later he assured the bench that he would give his word, as a man, that nothing of the kind would occur again.

Sir Albert: Look after your legs, will you?

Henry continued to talk of the time he had spent with “the boys,” and finally confessed: “Well, I must have got a bit over the mark.”


It was stated that it was his first appearance, and Sir Albert remarked: “It would be a pity to have a mark against you at 76, wouldn’t it?”

“So it would, sir” agreed Henry, with great heartiness.

“All right, we’ll let you go this time,” replied the chairman, with a twinkle in his eye.

And Henry went—that is, as quickly as those legs of his would carry him!

Surely this frail old man, joking with the magistrate about his time spent with his even-older drinking companions, could not be the habitual drunkard and lazy vagabond of over 20 years earlier. Surely it could not be the man who was always fighting — sometimes with the police — and who would leave his family destitute and hungry while he ate his meals.

The closing statement that this was his first appearance initially persuaded me that it couldn’t be the same person, but it would be all too easy to miss a good story by making such a rash assumption.

Moving forwards in time a little more finds the death of the notorious Henry Pearson on 24 Apr 1938 at the City Hospital, aged 79, of ‘Cardiac failure. Myocardial degeneration. Chronic bronchitis. Senility’.[24] I know this is the right Henry since the informant was his daughter, Kate Belshaw. Crucially, his normal address was given as 91 Elford Rise, off Windmill Lane, and that matches the address of the 76-year-old Henry appearing in court in 1935.

Elford Rise had about 120 households, so the chances of finding another Henry Pearson of the same advanced age on that same road were very slim. However, just to be sure, I checked for the deaths of all Henry Pearsons of a vaguely similar year of birth (1855–1860) in the same county, and they had all died well before 1935, except for two distant outliers: one aged 81 in Newark (1938) and one aged 85 in Mansfield (1940), neither of whom could be linked to Nottingham.

So, it would appear that the court system had completely forgotten about the old Henry Pearson, and that the magistrate had fallen for an amusing tale told by a frail 76-year-old who had simply been having a good time with his friends. They had not made the connection to his real past. I am sure that he really did leave the court “as quickly as those legs of his would carry him”, and I can almost hear his wheezing chuckle as he realised he had finally got one over on the courts, and he probably gave one last gesticulation as he left the building. I bet he got great mileage out of the story with his drinking companions that evening.

[1] Nottingham House of Correction, St John’s Street, c1895. Also known as St John’s Prison as it was built on land formerly occupied by a hospital dedicated to St. John the Baptist. Picture by A. W. Bird. Displayed by permission of Image Ref: NTGM017628.
[2] "England & Wales, Criminal Registers, 1791-1892", database, Ancestry (accessed 4 Jan 2015), entry for "Henry Pearson" of "Nottingham" on 14 Apr 1887; citing HO 27, Piece: 207, Page: 340, Entry: 6, The National Archives of the UK (TNA).
[3] "England & Wales, Criminal Registers, 1791-1892", entry for "Henry Pearson" of "Nottingham" on 17 Oct 1887; citing HO 27, Piece: 207, Page: 350, Entry: 2, TNA.
[4] “To-Days Police News”, Nottingham Evening Post (12 Jan 1885): p.3.
[5] Nottingham Borough Quarter Sessions Record Book 1896–1898, transcription by Notts. Archives, document ref: CA 3292, entry for "Henry Pearson", 24 Mar 1896, p.30; Notts. Archives.
[6] CA 3292, entry for "Henry Pearson", 24 Apr 1896, p.58.
[7] Nottingham Borough Quarter Sessions Record Book 1898–1899, transcription by Notts. Archives, document ref: CA 3293, entry for "Henry Pearson", 22 Jul 1898, p.49.
[8] CA 3293, entry for "Henry Pearson", 22 Dec 1898, p.154.
[9] Nottingham Borough Petty Sessions Record Book including March 1896 [Court no.1], transcription by Notts. Archives, document ref: C/PS/CA/1/38, entry for "Henry Pearson", 24 Mar 1896; volume is indexed by defendant’s name.
[10] Nottingham Borough Petty Sessions Record Book including April 1896 [Court no.1], document ref: C/PS/CA/1/39, entry for "Henry Pearson", 24 Apr 1896; volume is indexed by defendant’s name.
[11] Nottingham Borough Petty Sessions Record Book commencing January 1887 [Court no.1], transcription by Notts. Archives, document ref: C/PS/CA/1/1, entry for "Henry Pearson", 14 Feb 1887.
[12] C/PS/CA/1/1, entry for "Henry Pearson", 21 Feb 1887.
[13] Nottingham Borough Petty Sessions Record Book commencing January 1887 [Court no.2], transcription by Notts. Archives, document ref: C/PS/CA/2/1, entry for "Henry Pearson", 7 Jan 1887.
[14] C/PS/CA/2/1, entry for "Henry Pearson", 21 Jan 1887.
[15] “Cruelty to a Woman”, Nottingham Evening Post (6 Sep 1889): p.3.
[16] “Police Intelligence”, Nottingham Evening Post (30 Jul 1880): p.2.
[17] Nottinghamshire Family History Society (NottsFHS), Parish Register Baptism Transcriptions, CD-ROM, database (Nottingham, 1 Jan 2013), database version 6.0, entry for Annie Belshaw, 31 Dec 1896. NottsFHS, Parish Register Burial Transcriptions, CD-ROM, database (Nottingham, 1 Jan 2013), database version 6.0, entry for Annie Belshaw, 5 Jan 1897.
[18] “Shocking Neglect of Children”, Nottingham Evening Post (6 Jun 1907): p.5.
[19] “A Disgrace to the Community”, Nottingham Evening Post (21 Jan 1908): p.5.
[20] “A Worthless Vagabond”, Nottingham Evening Post (15 Mar 1909): p.7.
[21] “A Worthless Vagabond”, Nottingham Evening Post (31 Dec 1913): p.7.
[22] England, death certificate for Rebecca Pearson; citing 7b/292/269, registered Nottingham 1933/Jun [Q2]; General Register Office (GRO), Southport.
[23] “OUT WITH THE BOYS — AT 76!”, Nottingham Evening Post (13 May 1935): p.7.
[24] England, death certificate for Henry Pearson; citing 7b/321/298, registered Nottingham 1938/Jun [Q2]; GRO.