Connected Histories was created by a partnership between the University of Hertfordshire, the Institute of Historical Research, University of London, and the University of Sheffield. Natural language processing, indexing and the development of the search engine were carried out by the Digital Humanities Institute (University of Sheffield). The website front end was implemented by the Institute of Historical Research, using designs provided by Mickey and Mallory. Evaluation was carried out by the Centre for Computing in the Humanities at King's College London. See below for project staff.
The project was made possible by a generous grant from the JISC e-Content Capital Programme. We are also grateful for assistance from the Universities of Hertfordshire, London and Sheffield.
In 2019 the website front end was transferred to the Digital Humanities Institute, University of Sheffield, which now has full responsibility for hosting this website.
Connected Histories has not created any new digital content. Instead, it provides integrated access to electronic content already available on distributed websites. Our search engine does not search these resources directly. Instead, it searches indexes we have created from the full content of each resource. Our approach to indexing depends on the nature of the electronic resource available:
The search engine uses the Apache Lucene text search engine within a Java environment. It is made available to the Connected Histories website through an application programming interface (API) built with JavaServer Pages, which returns results in XML format to the website interface hosted by the Digital Humanities Institute.
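Searching a prebuilt index rather than the source websites is the core idea behind engines such as Lucene, which store an inverted index mapping each term to the documents that contain it. The sketch below is a toy Python illustration of that idea, not the Connected Histories implementation: the document ids, sample texts and the `<results>`/`<hit>` XML element names are hypothetical, and the simple all-terms matching stands in for Lucene's much richer query parsing and relevance ranking.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Build an inverted index: each term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)

def results_to_xml(query, doc_ids):
    """Serialise hits as XML; the <results>/<hit> element names are hypothetical."""
    root = ET.Element("results", query=query, count=str(len(doc_ids)))
    for doc_id in doc_ids:
        ET.SubElement(root, "hit", id=doc_id)
    return ET.tostring(root, encoding="unicode")

# Hypothetical documents standing in for indexed resources.
docs = {
    "bho-1": "Survey of London: the parish of St Giles",
    "pp-7": "Report on the poor law in London",
    "news-3": "Londn shipping news",  # simulated OCR error
}
index = build_index(docs)
hits = search(index, "London")
print(hits)  # ['bho-1', 'pp-7']
print(results_to_xml("London", hits))
```

The misspelt "Londn" in the third document is not found by an exact-term search for "London", a small-scale version of the OCR problem described in the evaluation results.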
Evaluation of the natural language processing and search engine was carried out by the Centre for Computing in the Humanities at King's College London.
For the natural language processing (NLP), text samples from the resources were manually marked up with names, places and dates, and the results were compared with the markup produced by the NLP. Statistics were compiled on the numbers of true positives, false positives and false negatives in order to calculate precision (the number of entities correctly classified divided by the number of entities identified by the NLP) and recall (the number of entities correctly classified divided by the total number of entities that are actually of that type). These two measures were combined into a single measure, the F-measure, which can vary between 0 (totally inaccurate) and 1 (completely accurate).
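The three measures can be computed directly from the counts of true positives (TP), false positives (FP) and false negatives (FN). The following Python sketch assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the text above does not state which weighting the evaluation used, and the counts in the example are invented for illustration rather than taken from the evaluation.

```python
def precision(tp, fp):
    """Correctly classified entities / all entities the NLP marked up (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Correctly classified entities / all entities actually of that type (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1.0 gives balanced F1."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

# Hypothetical counts for one entity type in one text sample.
tp, fp, fn = 60, 20, 40
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 3), round(r, 3), round(f_measure(p, r), 3))  # 0.75 0.6 0.667
```

Setting `beta` above 1 weights recall more heavily than precision; `beta=1.0` is the conventional balanced default.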
The results of this process indicate that the success of the natural language processing varied significantly, depending on the structure of the original text (the extent to which it follows expected language patterns) and, more importantly, the quality of the transcription. Text generated by optical character recognition (OCR) produces less accurate results from the natural language processing than rekeyed text, because OCR errors make the text, both the words to be marked up and their surrounding context, less recognisable to a machine processor. The best results were for British History Online (F-measures between 0.64 and 0.74) and the Parliamentary Papers (0.625 to 0.775). Because of OCR errors, the worst results were for the 17th- and 18th-century British Newspapers (0.22 to 0.52). In general, the best results were for locations and the worst for persons and dates, although persons and dates in the Parliamentary Papers and dates in British History Online also achieved good results.
The search engine was evaluated by 1) ensuring that it was not possible to break searches; 2) checking whether the searches produced relevant results; and 3) testing the links to the distributed websites.
Some of the resources searched by Connected Histories are only accessible via subscription. While Connected Histories allows users to search these resources and examine snippet results free of charge, we do not and cannot provide non-subscribers full access to these resources. To arrange such access, it is necessary to contact the proprietors of the relevant resource directly.
If you do have subscription access to a resource and encounter a login page you cannot get through, you should first log in to that resource using your normal access procedure before clicking on links in Connected Histories.
Connected Histories is a not-for-profit project whose sole objective is to provide more efficient access to electronic resources for those engaged in researching and teaching British history. Access to this website is free to all users. Since it costs money to maintain the site, and the grant which funded its creation has ended, it is necessary to obtain separate funding to ensure its continuation. For this reason, the site includes advertising. All profits derived from advertising will be devoted to maintaining and upgrading the site.
We welcome proposals for the inclusion of additional resources. If you are responsible for an electronic resource which you believe is appropriate for Connected Histories, please consult our New content information page.
We are grateful to the following for their help in bringing this project to completion:
"About the Project" © University of Hertfordshire, University of London, University of Sheffield, 2011-2018; University of Sheffield 2019 (www.connectedhistories.org, version 1.0, 18 September 2020), https://www.connectedhistories.org/about