Saturday, 10 October 2015

How Not To Design a Database Search



In this article, I want to examine the user interface (UI) to a recently launched database, and to analyse just how much thought really went into providing it. Is this an area where database providers can make an important contribution, or is a simple set of search fields and some SQL tables enough for our needs?

Figure 1 - Design something useful.[1]

The host of this database is Findmypast — again — but my goal is not to berate them; I want to dissect this clinically, and objectively, to see how a bit of forethought could make the difference between a tick-in-the-box and a genuinely useful genealogical resource.

The resource is the “England & Wales, Electoral Registers 1832-1932”, and I’ve been waiting a long time for this to come online as there is a wealth of information in the associated records. The data is made available in conjunction with the British Library, and I am hoping that the digitisation of these records will not stop at 1932 as they would become more and more useful as they approach the present day. I am assuming that privacy will not be an obstacle as Findmypast already host “UK Electoral Registers 2002-2014”, and Ancestry host “London, England, Electoral Registers, 1832-1965” along with some incomplete regional variants.

Electoral Registers are annual lists of people who were eligible to vote and these usually included their residential address, although the right to vote was primarily linked to property ownership until the Representation of the People Act 1918. The way that regions were divided up for voting purposes in Britain was, and still is, a little complicated, but the page at Electoral Divisions of the UK may be of some help.

Eligibility to vote varied greatly between the Boroughs until the Great Reform Act of 1832. As well as streamlining the criteria, that Act allowed more men to vote, but still only one million of the seven million adult males in England and Wales could vote. This number was doubled by the Second Reform Act of 1867, but even the Third Reform Act of 1884 left 1 in 3 adult males, and all females, without the vote in England and Wales.

Although women could vote in local elections from 1869, they wanted equal eligibility to vote in Parliamentary elections. Suffragist groups were established throughout the country to campaign and lobby the government for equal eligibility, and these were eventually brought together as the National Union of Women’s Suffrage Societies (NUWSS) in 1897. The more militant Suffragettes later organised separately, in the Women’s Social and Political Union. The 1918 reforms, which abolished the property qualification for men over the age of 21 and gave the vote to women over 30 who met a property qualification, were strongly influenced by the effects of WWI, but it wasn’t until the Equal Franchise Act of 1928 that men and women over the age of 21 could vote on equal terms.

So did the online information meet my expectations? Once the paralysis of my stunned amazement had subsided, I did find some useful details, but it was hard work! The fields in the search form comprised:

Who[2]
Year
Constituency
Polling district or place
County
Country
Additional keywords

OK, not all of these fields are going to be useful for the majority of researchers, but the form did include the primary ones. When the search results were displayed, though, only the following information was presented:

Constituency
Year
Season
Polling district or place
County
Country
Image number

Where is the voter’s name, or their address, you might reasonably ask? I certainly did. The country is a waste of space given that you have the county, and I have no idea what the image number is for. The constituency is of dubious use in the search results, and it is also very long: for instance, “P[arliamentary] C[ounty] of Nottinghamshire, Bassetlaw Division” was wrapped over 5 separate lines in each row of the search results.

So are the personal names important? Surely you know which names you entered into the search form. Not necessarily: the given name and surname are each optional, so you may be looking at a group of related names. For instance, you may have entered just the surname and place to find members of the same family. The usual Variants option on the name fields is documented as not working (despite being present on the form), but wildcards are allowed. Hence, you need to see the names as they are recorded in the associated Electoral Register pages.

The root of the problem is that the data is held as discrete PDF files. That alone need not be a problem, but no database search is being performed: your search criteria are simply used to perform a direct textual search of those PDF files, and that is less than satisfactory.

The problem with a file search is that there is no context: it searches for words anywhere in the file, just as a newspaper search does. I wanted to find people in Nottingham with the surname Kirk, but several of the hits were for Kirk Street. The help claims that the search results are ordered by the proximity of the words, but that does not guarantee that the given name and surname are on the same line; if you were looking for a John Smith then you might be presented with a file that happened to have “John” at the top of the page and “Smith” at the bottom.
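To make the difference concrete, here is a minimal Python sketch contrasting a whole-file word search with a search scoped to a single line. The register page, the function names, and the word-distance "proximity" score are all hypothetical and invented for illustration; Findmypast’s actual ranking is not described beyond the word "proximity".

```python
import re

# Hypothetical register page, invented for illustration only.
PAGE = """\
1 John Brown ........ 12 Kirk Street
2 William Green ..... 3 Castle Gate
3 Mary Smith ........ 7 Long Row"""

def words(text):
    """Lower-cased word tokens, ignoring punctuation and numbers."""
    return re.findall(r"[a-z]+", text.lower())

def file_search(text, terms):
    """Whole-file search: True if every term occurs anywhere in the text."""
    tokens = set(words(text))
    return all(term.lower() in tokens for term in terms)

def proximity(text, first, second):
    """Assumed ranking score: smallest distance, in words, between the two terms."""
    tokens = words(text)
    a = [i for i, t in enumerate(tokens) if t == first.lower()]
    b = [i for i, t in enumerate(tokens) if t == second.lower()]
    if not a or not b:
        return None
    return min(abs(i - j) for i in a for j in b)

def line_search(text, terms):
    """Line-scoped search: only lines on which every term occurs."""
    return [line for line in text.splitlines()
            if all(term.lower() in words(line) for term in terms)]

print(file_search(PAGE, ["John", "Smith"]))  # True: "John" on line 1, "Smith" on line 3
print(proximity(PAGE, "John", "Smith"))      # 9: a distance, not a same-line test
print(line_search(PAGE, ["John", "Smith"]))  # []
print(line_search(PAGE, ["Kirk"]))           # the Kirk Street line, not a surname
```

A word-distance score can push such a page down the results, but it cannot exclude it, and even the line-scoped search cannot tell a surname from a street name. That needs the names to be held in fields of their own.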

Ideally, the text from those PDF files should be extracted, parsed, and then keyed according to the actual names. This isn’t rocket science, but it can be a little messy. The problem is that there isn’t a single layout: sometimes the residential address is on the same line as the personal name, and sometimes it’s on a sub-heading of its own; sometimes there are other fields associated with the name; sometimes the data is in multiple columns, and sometimes just one. I have noticed before that Ancestry’s attempts to identify names in Electoral Registers and the British Phone Books have often associated a name with the wrong address. Until the text is parsed properly, the only recourse is to examine the image for every single search result, and that is currently impractical.
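As a rough sketch of what "extracted, parsed, and then keyed" might involve, the following Python handles two hypothetical layouts: a name and address on one line, and names listed beneath a street sub-heading. The sample lines, the regular expressions, and the rule that a sub-heading is a line written in capitals are all assumptions for illustration, not the real register formats; each genuine layout would need its own pattern.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Voter:
    surname: str
    given: str
    address: Optional[str]

# Hypothetical layout A: entry number, "Surname, Given names", dot leaders, address.
INLINE = re.compile(
    r"^\d+\s+(?P<surname>[A-Za-z'-]+),\s+(?P<given>[A-Za-z' -]+?)\s*\.{2,}\s*(?P<address>.+)$"
)
# Hypothetical layout B: entry number and name only; the address is taken
# from the most recent street sub-heading.
NAME_ONLY = re.compile(r"^\d+\s+(?P<surname>[A-Za-z'-]+),\s+(?P<given>[A-Za-z' -]+)$")
# Assumption: a sub-heading is a line written entirely in capitals.
HEADING = re.compile(r"^[A-Z][A-Z' -]+$")

def parse(lines: List[str]) -> List[Voter]:
    """Turn raw register lines into Voter records keyed by name."""
    voters = []
    current_address = None
    for raw in lines:
        line = raw.strip()
        if m := INLINE.match(line):
            voters.append(Voter(m["surname"], m["given"], m["address"]))
        elif m := NAME_ONLY.match(line):
            voters.append(Voter(m["surname"], m["given"], current_address))
        elif HEADING.match(line):
            current_address = line.title()
        # Other lines are skipped here; a real pipeline would flag them for review.
    return voters

def by_surname(voters: List[Voter], surname: str) -> List[Voter]:
    """Search the surname field only, so street names cannot match."""
    return [v for v in voters if v.surname.lower() == surname.lower()]

# Invented sample lines, for illustration only.
SAMPLE = [
    "1 Smith, John ........ 4 Kirk Street",
    "LONG ROW",
    "2 Brown, Mary Ann",
    "3 Kirk, Thomas",
]

for voter in by_surname(parse(SAMPLE), "Kirk"):
    print(voter)  # Voter(surname='Kirk', given='Thomas', address='Long Row')
```

Because the surname is now a field in its own right, a search for Kirk returns Thomas Kirk but not John Smith of Kirk Street, and each record carries the address that belongs to it.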

Clicking the ‘View document’ icon to the right of one search result (1st click) gave more detailed information about the PDF file as a whole, not about the information alleged to have been found by the search, and some of the details shown in the initial search results might have been better placed here instead. Clicking a further button (2nd click) downloaded the PDF file, and (in Firefox on Windows) I then had to select it from the Downloads area (3rd click). Unfortunately, the files do not have a file extension (there’s no excuse for that!), so the browser didn’t know what type of file it was. A dialog therefore asked whether I wanted to open this “unknown file type” (4th click), and I was presented with a list of possibilities. Selecting ‘Adobe Reader’ (5th click) and clicking “OK” (6th click) caused a further dialog again asking whether I wanted to open this “unknown file type” (7th click).

I can’t heap all the blame for this on Findmypast, since there was obviously some collaboration with the British Library, but what happened to the project plan? The requirements to make this project useful must have been laid out before they started. The project was advertised as far back as 2011, so the issues of parsing the register text must have been considered, and solved. That kind of problem is common when handling legacy business data, and there are tools that can help. These tools can be programmed to handle a particular layout, or data pattern, and there will only be a finite number of possibilities in the registers. Maybe this is the plan for the future. Maybe the project overran by an enormous amount and they had to make something available quickly. I have not seen anything written on these hopefully short-term limitations.

I don’t want to make a habit of these software reviews: if this UI ever gets fixed then this post will be redundant, and I would rather my posts all stayed as relevant as I can make them. Ironically, I was Findmypast’s biggest fan in their early days, and much favoured them over the other database providers.



[1] Image courtesy of Alan Chapman/Businessballs (http://www.businessballs.com/businessballs_treeswing_pictures.htm : accessed 8 Oct 2015).
[2] This asks for “First Name” and “Last Name”. Although “Given Name” and “Surname” would be more appropriate, I’m going to resist discussing that here. Suffice to say that their own help text mentions “Surname” rather than “Last Name”.