Mining the personal – using Open Correspondence to explore correspondents

May 25, 2011 in Open Correspondence

Jonathan Gray, in his post on mapping influence on intellectual history, asks some very sensible questions regarding the influence of books on intellectuals and how they talk to each other. He comments that:

One could even imagine using other sources (library lending data, lecture lists, reading lists, catalogues, letters, notes and other sources) to try to systematically establish things like:

* Author X corresponded with author Y
* Author X possessed a copy of work A

In Open Correspondence, one of the aims is to look at the social networks of letters. Social networks are more than just people, though. As Jonathan identifies, there are also networks of things. Books are inherently social things. When somebody reads a text, they tend to mention it to somebody else via a blog, review, tweet and so on. In the nineteenth century, the best way of doing this was in letters between people. This intersects with my personal interest in literary history and in how books ‘speak’ to each other.

As part of the recent Book Hackday, I decided to use the Letters schema, which is still in development, to try to answer some of these questions with the current dataset: letters from Charles Dickens.

At the moment, the first of Jonathan’s queries is perhaps relatively simple.

The list of correspondents from the set is currently published as an index, which offers links to the individual pages and some further data. I pulled these into a Redis NoSQL store as a separate list, with the intention of using them at a later date to build collections of the works. I have found a bug that is preventing me from finishing this at the moment, but the idea I am trying to explore is the ability for users, say a school or university, to build their own version of a collection of letters without needing to photocopy them from a book. This list in itself does not know whether the correspondent is an author. Perhaps that is a later addition to the metadata, but it does require a data source to check against, something that would also be very useful for gleaning bibliographic data.
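As a rough illustration of the "separate list" idea, the sketch below appends correspondent names to a Redis-style list. It is not the code used in Open Correspondence: the key name `correspondents`, the helper `store_correspondents` and the placeholder name "Correspondent B" are all made up for the example, and a tiny in-memory stand-in is used so that it runs without a Redis server. A real redis-py client offers `rpush` and `lrange` calls of the same shape.

```python
class InMemoryListStore:
    """Stand-in for a Redis client; covers only the two list calls used here."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, *values):
        # Append values to the end of the list stored at key.
        self._lists.setdefault(key, []).extend(values)
        return len(self._lists[key])

    def lrange(self, key, start, end):
        # Redis treats `end` as inclusive, with -1 meaning the last element.
        items = self._lists.get(key, [])
        stop = len(items) if end == -1 else end + 1
        return items[start:stop]


def store_correspondents(client, key, names):
    """Push each distinct, whitespace-normalised name onto the list at `key`."""
    seen = set()
    unique = []
    for name in names:
        cleaned = " ".join(name.split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    if unique:
        client.rpush(key, *unique)
    return unique


# Hypothetical input: names as they might be scraped from the index.
store = InMemoryListStore()
store_correspondents(
    store, "correspondents", ["W A Macready", "Correspondent B", "W A  Macready"]
)
stored = store.lrange("correspondents", 0, -1)
```

Keeping the names in a plain list preserves the order of the index, which is what a user-built collection would probably want. A Redis set would remove duplicates automatically but lose that order.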

The index can then be re-used to show the relationships using the JavaScript InfoVis Toolkit, which gives another idea for a useful graph. That answers the first of Jonathan’s questions in part. Work still needs to be done here: visualisations have been generated, just not dynamically yet, and that is part of the finalising stage.

Of course, a follow-up question is what they wrote about to each other. Out of the myriad things it might be, I’ve tried to focus on book titles. In this first attempt at finding the books, I’ve focussed on the ones that Dickens wrote himself, but when I can find the right API I hope to extend it. Perhaps the Open Bibliographic British Library dataset would be a good place to start, but that is for another day. Extended in this way, it would allow the researcher to view at a glance which books an author read, whether or not they owned them.
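One simple way to look for book titles is to scan the text of each letter against a fixed list of known titles. The sketch below assumes that approach; it is not necessarily how Open Correspondence does it. The function name `find_titles`, the short title list and the sample sentence are all invented for the example.

```python
import re

# A few of Dickens's own titles; a fuller list would come from a bibliographic source.
DICKENS_TITLES = ("Oliver Twist", "Bleak House", "David Copperfield", "Little Dorrit")


def find_titles(text, titles=DICKENS_TITLES):
    """Return the titles that appear in text, ignoring case and line breaks."""
    # Collapse runs of whitespace so a title split across lines still matches.
    normalised = " ".join(text.split())
    found = []
    for title in titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, normalised, flags=re.IGNORECASE):
            found.append(title)
    return found


# Made-up letter text, for illustration only.
sample = (
    "I have been hard at work on Bleak House,\n"
    "and think of little else; Oliver  Twist feels long ago."
)
mentions = find_titles(sample)
```

Exact matching like this will miss abbreviated or informal references (for example a letter that only says "Copperfield"), so it is best read as a first pass rather than a complete answer.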

A slightly more statistical question that I thought might be interesting is graphing the spread of letters sent to a correspondent. Each correspondent has an index at their own URI, and it struck me as interesting to try to ‘re-think’ the index to allow the user to discover information in different ways. I created a store in Redis which contains the results of a SPARQL query, and the number of letters written per year is then shown using Protovis for the visualisation. From this we might see whether the correspondence lasted for some time, suggesting either persistence or importance, or whether it was focussed or brief. The graph comes from the results for W A Macready, an actor to whom Dickens wrote for a number of years. It shows a fairly steady stream of letters, suggesting a stable relationship.

Graph of letters to WA Macready
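The counting step behind a graph like this can be sketched as follows. It assumes that the SPARQL results have already been reduced to a list of ISO-style date strings; the function name `letters_per_year` and the sample dates are hypothetical and are not taken from the Macready data.

```python
from collections import Counter


def letters_per_year(dates):
    """Count letters per year from 'YYYY-MM-DD' strings, filling gap years with 0."""
    counts = Counter(int(date[:4]) for date in dates)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Include years with no letters so that gaps show up in the chart.
    return [(year, counts[year]) for year in range(first, last + 1)]


# Made-up dates, for illustration only.
sample_dates = ["1840-03-02", "1840-11-19", "1841-06-07", "1843-01-15"]
series = letters_per_year(sample_dates)
```

Filling in the empty years matters for the reading suggested above: a steady correspondence and one with long silences look the same if the silent years are simply left out of the series.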

The toolkit is still in development and is perhaps too reliant on Redis, but I thought I’d share some of the ideas that are being developed. It needs to fit into the current database structure, or perhaps run off the current endpoint. The toolkit stores lists of data that need visualisation and provides a basic API, with the medium-term aim of putting it into the data section of the Open Correspondence site. The other benefit of the toolkit is that, through using it, various areas to be improved have come to light.

I do think that it is beginning to answer some of Jonathan’s questions. The APIs will not replace the role of scholarship, but they can show ways into the data that might not have been thought of, or give the researcher an overview of the data. They also show what can be done with indexing: indexes can be redesigned to show different pieces of data in different contexts.
