The digitisation of historical documents is truly a great
benefit to genealogists and to historians. Without it, we would have to
travel a lot more — demonstrating more than a little commitment to our chosen
field — and spend longer trying to find that essential bit of information for
our quest. But are there instances where it works against us?
Newspaper archives are one of the most incredible of these
resources, and I use several of them. The British Newspaper
Archive (BNA) is a partnership between the British Library and Findmypast to
digitise up to 40 million newspaper pages from the British Library's vast
collection. When it was launched in 2011, I chose to access the resource via my
existing Findmypast subscription rather than take out an independent
subscription to the BNA, but my resulting love-hate relationship with their
interface has been smouldering ever since.
The other day, I needed to search for references to
a “Miss Jesson” (as she was always known in print) in Nottinghamshire
newspapers from the period 1850–1899. This was because Mary Jesson was an ancestor
who worked as a costume maker in the theatre, and the Nottingham Theatre Royal celebrated
its 150th anniversary on 25 Sep 2015. Although I’d written about her
before, in A Rich French Actor, my goal on this occasion was to see if I could
place her at the theatre when it first opened, on 25 Sep 1865.
I entered the criteria and it came up with 10 hits,
including 3 false positives from the 1850s — referring to a different person — and
5 hits from the period 1881–1882 that I already knew about. This left the
following two hits from 1865 and 1894:
THIS DAY'S RACING. “… We wUI only 'add that it is
excellent throughout— in fact, superlatively so. The dresses reflect very great
credit on Miss Jesson, and Miss Collier is to be congratulated on most of the
dances she has arranged. The music has been neatly and appropriately …”
27 July 1894 - Nottingham
Evening Post - Nottingham, Nottinghamshire
NOTTS. MICHAELMAS
QUARTER.SESSIONS. “… arranged
by J. Ketltno. The Lime Lights by W. Marriott. The Gas arrangements by W.
Watchorn. The magnificent dresses by Miss Jesson. The whole produced under the
immediate and personal superintendence of Mr. Thomas W. Charles. Doors open at
7; to commence …”
20 October 1865 - Nottinghamshire
Guardian - Nottingham, Nottinghamshire
These were really interesting because: (a) 1865 was the date
that the Nottingham theatre opened, and (b) by 1894 she had moved to a theatre
in London. Hence, these two hits had the prospect of telling me something that
I didn’t know, and maybe even overturning what I thought I did know. I was full
of excitement!
However, I looked at these editions, and looked again, but
could I find those transcribed extracts? … Could I heck! You see, those titles
were there but the extracts were not. Problem
1: Findmypast has never highlighted the hits when viewing the associated pages
and that can mean having to read a whole broadsheet page of newspaper print,
and occasionally more, searching with a fine toothcomb for the words. I admit
to having abandoned a number of previous searches during my research, simply
because I could not find the alleged references. They have been aware of the
problem ever since they hosted the BNA data, but how a major enterprise like that
could be conceived without providing for this feature in the design escapes me;
other archives provide it.
Eventually, I found that both of these newspaper editions
contained a reference to a “Mr. Jesson”, but still nothing remotely close to
the extracts. Problem 2: the BNA
search engine insists on relaxing your search criteria, automatically including
many similar words for you, but provides no way of overriding that. If you want
to be very specific and find only the specified word(s) then it is
impossible. In contrast, we would achieve this fundamental requirement in
Google by placing something in quotation marks.
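The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the BNA or Findmypast search engine actually works: the function names are hypothetical, and the "relaxed" search is imitated by comparing the first six letters of each word, which is enough to reproduce the kind of unwanted matches described here.

```python
import re


def tokens(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def exact_phrase_match(text, phrase):
    """True only if the words of `phrase` occur consecutively and in order."""
    words, wanted = tokens(text), tokens(phrase)
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


def relaxed_match(text, query, prefix_len=6):
    """Hypothetical relaxed search: a hit if ANY query word shares its
    first `prefix_len` letters with some word in the text."""
    stems = {w[:prefix_len] for w in tokens(text)}
    return any(q[:prefix_len] in stems for q in tokens(query))


# The first line is taken from the 1865 extract; the second is a made-up
# line standing in for a page that mentions only a "Mr. Jesson".
wanted_page = "The magnificent dresses by Miss Jesson."
other_page = "The Gas arrangements by Mr. Jesson."

print(exact_phrase_match(wanted_page, "Miss Jesson"))  # True
print(exact_phrase_match(other_page, "Miss Jesson"))   # False
print(relaxed_match(other_page, "Miss Jesson"))        # True (an unwanted hit)
```

With a quoted-phrase option of this kind, the page mentioning only “Mr. Jesson” would never have been offered as a hit for “Miss Jesson”.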
This particular search engine is generally very weak and has
not been well designed. There are no Boolean facilities (specifying OR/AND
between words or phrases) and no way of eliminating particular words — problem 3. The latter deficiency is
particularly annoying if the search engine includes many words that you didn’t
ask for. When Findmypast asked their users for feedback on the changes that
they had made last year, there were several requests related to the newspaper searches.
Unfortunately, some of the interpretations of that feedback seemed to be quite
obtuse. For instance, on 30 Jan 2014, one such user request read:
Newspaper Searches - exclude
unwanted records
Able to exclude unwanted records
on Newspaper searches.
This looks quite straightforward to anyone who has used a
modern search engine, but the response was:
We’re a little stuck here with
the idea of what is unwanted and what isn’t. The difficulty is knowing this
automatically. We’re unable as a result to offer this kind of service.
The user’s suggestion was declined and this bizarre response
was still visible at the time of writing. How could anyone imagine that a user
was requesting the “automatic” exclusion of unwanted records?
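What the user was presumably asking for is exclusion that they control themselves, alongside the usual AND/OR choices. A minimal Python sketch of that idea follows; the function and parameter names (`filter_hits`, `all_of`, `any_of`, `none_of`) are hypothetical, the sample hits are made up apart from the Miss Jesson line, and nothing here reflects Findmypast's actual code.

```python
import re


def words_in(text):
    """Return the set of lower-cased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def filter_hits(hits, all_of=(), any_of=(), none_of=()):
    """Keep hits containing every word in `all_of` (AND), at least one word
    in `any_of` if given (OR), and no word in `none_of` (exclusion)."""
    kept = []
    for hit in hits:
        present = words_in(hit)
        if not all(w.lower() in present for w in all_of):
            continue
        if any_of and not any(w.lower() in present for w in any_of):
            continue
        if any(w.lower() in present for w in none_of):
            continue
        kept.append(hit)
    return kept


hits = [
    "Mr. Jesson attended the sessions",        # made-up unwanted hit
    "The magnificent dresses by Miss Jesson",  # from the 1865 extract
    "Miss Collier arranged the dances",        # made-up, no Jesson at all
]

# The user, not the software, decides that "Mr" is unwanted.
print(filter_hits(hits, all_of=["Jesson"], none_of=["Mr"]))
# ['The magnificent dresses by Miss Jesson']
```

Nothing needs to be known “automatically”: the unwanted words are simply part of the criteria that the user types in.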
I tried to help at this point and on 11 Nov 2014 I posted a lengthy
analysis of some 8 separate user requests, suggesting that they were
variations of a smaller number of common themes, and relating them to
demonstrable problems. That post was inaccessible at the time of writing due to
a “This UserVoice subdomain is currently available!” error.
So where did the above extracts come from? This was not easy
to determine because the user has virtually no control over the search
process: including too much of the transcribed text returned no hits, and
including too little returned too many. For instance, searching for just the
phrase "personal superintendence" and the word “theatre”, both from
the 1865 extract, resulted in 18 full pages of hits, and some of these included
words that I didn’t want such as “person”, “personally”, and “superintended”.
To compound this, the default ‘search by relevance’ does
anything but this — problem 4. A
case I had presented to Findmypast in 2014 was still evident when I repeated it
for this article: searching for Elizabeth Bond in Nottinghamshire newspapers
gives 14 hits, but some of these include intermediate words such as “Sarah”, “New”,
“Mary”, and “Woolley of”. In particular, the hit for “Elizabeth Woolley of Bond”
(Nottinghamshire Guardian, 10 Sep
1857) is presented before one of “Elizabeth Bond” (Nottinghamshire Guardian, 2 Nov 1866).
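One plausible reading of “relevance” for a name search is that hits where the two names are adjacent should come first, followed by hits with more and more words in between. The Python sketch below ranks hits in that way. It is an assumption about what a user would expect, not a description of the ranking that Findmypast uses, and the function names are hypothetical.

```python
import re


def gap_between(text, first, second):
    """Smallest number of words separating `first` from a later `second`,
    or None if `second` never follows `first` in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    first, second = first.lower(), second.lower()
    best = None
    last_first = None  # position of the most recent occurrence of `first`
    for i, word in enumerate(words):
        if word == first:
            last_first = i
        elif word == second and last_first is not None:
            gap = i - last_first - 1
            if best is None or gap < best:
                best = gap
    return best


def rank_by_proximity(hits, first, second):
    """Drop hits without both names in order; sort the rest by gap size."""
    scored = [(gap_between(h, first, second), h) for h in hits]
    scored = [(gap, h) for gap, h in scored if gap is not None]
    return [h for gap, h in sorted(scored, key=lambda pair: pair[0])]


hits = ["Elizabeth Woolley of Bond", "Elizabeth Sarah Bond", "Elizabeth Bond"]
print(rank_by_proximity(hits, "Elizabeth", "Bond"))
# ['Elizabeth Bond', 'Elizabeth Sarah Bond', 'Elizabeth Woolley of Bond']
```

Under this ordering, the 1866 “Elizabeth Bond” hit would be listed ahead of the 1857 “Elizabeth Woolley of Bond” one.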
I eventually determined that the 1894 extract was actually
from “CHRISTMAS IN NOTTINGHAM” (Nottinghamshire
Guardian, 30 Dec 1881), although the transcribed extract was shown as “…
Unknown …” in that specific case. Although I couldn’t find the 1865 extract
manually, it did match one of the other original 10 hits: “AMUSEMENTS, THEATRE
ROYAL, NOTTINGHAM” (Nottingham Evening
Post, 31 Dec 1881). In other words, both of these hits were red herrings
and I had spent some considerable time chasing them.
So isn’t this just a case of mis-indexing? Although such
errors are rare, they do happen occasionally. Well, no — I have a recollection
of reporting the 1865 case a couple of years ago but I have no proof. Problem 5: Findmypast provide no
trackable call number that can be revisited to check on the progress of a
software bug, transcription error, indexing error, etc. By now, it’s probably
hard for me to disguise who the subject was in my previous article: Customer
Service. Also, what’s the probability of such an indexing error occurring
twice in a single search? Isn’t it indicative of a systemic error?
What I do have proof of is that I reported a similar error on
30 Apr 2015 because I kept a copy of my text in that case. I was searching for
references to the name Frank Whiley during 1900–1949, while researching for Like
Father, Like Son, and it had yielded the following hit:
COUNTY COUNCIL AND
POLICE OFFICER SUED. “…the Rev. J. W. Busby, said afterwards, "
I shall never forget to-day's experience." Those who died in the fire were
Mr Frank Whiley (52), an unemployed labourer, of Henry Street, Sneinton, Notts;
his wife Rose (49); and his two daughters, Lily (14) and …”
17 February 1923 - Gloucester
Journal - Gloucester, Gloucestershire
There is an article in that newspaper with the given title,
but not that transcribed extract of a funeral following a tragic fire in
Nottingham. That text did appear, almost word-for-word, in a number of national
newspapers on 30 Dec 1937, but not in the Gloucester
Journal, and certainly not in 1923!
In order to round off this outpouring of frustration, I
decided to check which national newspapers did include this same, or similar, text.
I searched for the phrase: "Busby said Afterwards", with no other
filters, and the results were shocking! Eliminating two false positives from 1931
left the following:
CROWDS IN TEARS.
“… unable to restrain their tears, almost drowned with sobs the voices of the
two* clergymen. One of them. Rev. J. W. Busby, said afterwards. " I shall
never forget today's experience." Ten girls, friends of the family,
carried Florrie's coffin and acted …”
30 December 1937 - Western
Morning News - Plymouth, Devon
SOBBING MOURNERS
INTERRUPT FUNERAL SERVICE. “… unable to restrain their tears, almost
drowned with sobs the voices of the two clergymen. One of them, the Rev. J. W.
Busby, said afterwards: " I shall never forget to-day's experience."
Those who died in the fire were Mr Prank Whiley (52) an unemployed labourer …”
30 December 1937 - Western
Daily Press - Bristol, Bristol
BOROUGH PETTY
SESSIONS. “… unable to restrain their tears, almost drowned with
sobs the voices of the two clergy men. One of them, the Rev. J. W. Busby, said
afterwards, " I shall never forget to-day's experience." Those who
died in the fire were Mr Frank Whiley (52), an unemployed …”
09 June 1888 - Northampton
Mercury - Northampton, Northamptonshire
CROWDS AT FUNERAL OF
FIRE VICTIMS. “… Unknown …”
30 December 1937 - Aberdeen
Journal - Aberdeen, Aberdeenshire, Scotland
The hit from the Aberdeen
Journal, for some unexplained reason, didn’t show a transcribed extract.
However, it was a true hit and the relevant extract did show up by using some
slightly modified criteria just a few minutes afterwards.
The interesting hit is the 1888 one from the Northampton Mercury, which is another case
of an error. The fire wasn’t in 1888, and that “Borough Petty Sessions” article
did not contain the alleged extract. While the case I reported last year did
not show, a new one that I had not seen before did show.
I can’t believe that I am the only victim here. I thought
that I was going to write about just two recent cases that, by some incredible fluke,
had appeared in the same search. To realise that I had reported at least one
similar case before, and then to encounter yet another case while writing this
article, has left me with a complete loss of confidence in this resource. It
cannot currently be described as fit-for-purpose with this litany of indexing
errors and the weakness of its search engine.
I often record negative searches, or likely-looking hits
that I have eliminated, but it looks like I now need to record all the indexing
errors in order to avoid wasting my time. How many of my abandoned searches might
have fallen into this category without me knowing?