
Web designer needed for The Public Domain Review website

Adam Green - May 31, 2011 in Design, Technical

The following post is from Adam Green, Editor of The Public Domain Review.

Hello there! We are looking for someone to make alterations to The Public Domain Review website, adapting the current WordPress Thematic theme.

We want to change the homepage to more of a ‘box-like’ presentation, with many more zones visible at first sight (rather than the more blog-like two-column set-up there is at the moment), creating more immediate hooks into different areas. In addition to the main article ‘post’ pages, we want to create four new pages: an image archive, audio archive, text archive, and video archive. Basically, we want to make the whole thing into more of a ‘multi-zoned hub’.

The site currently looks like this:

We want it to look something more like this:

More specifically:

  • The title header moved up to take up less space, and the text in general a bit smaller, making more content visible at first sight.
  • New homepage to have three columns – a main central column, flanked by two smaller columns.
  • The main central column will house a ‘large snippet view’ of the most recent article (with picture), with below it two other articles in a smaller snippet view (without picture).
  • Clicking on any of the articles shown on the homepage (on the title or a “read more” link) takes you through to a dedicated reading page for that particular article, perhaps with two columns: a main column for the full article, and a second column on the right with links to more articles (in the form of the little picture icons that are there at the moment, as well as perhaps a tag cloud for categories).
  • In one of the smaller columns, snippet views of the four new pages (image, audio, video, text) – all of which link through to the respective pages, where the featured image/audio/video/text is the main thing, with a growing archive beneath. (The video and audio will normally be embedded from other websites, or else uploaded.)
  • A tag-cloud of categories (present on the homepage and also the specific article pages). Each category will link through to a page displaying snippet views of the articles in that category, which when clicked take you to that article page.

We are looking to get this done as soon as possible, over the next few weeks. If anyone is up for helping out, or knows anyone else who might be, then please get in touch! We can offer you perpetual acknowledgement on the Public Domain Review website and an exciting mystery gift!

If you’re interested, please contact: publicdomainreview@okfn.org.

Week Ahead: Maurizio Napolitano

Maurizio Napolitano - May 30, 2011 in Updates

## Past Week

  • Frustrated with the city of Trento over its licence choice for geographic data
    • They want to release it as open data, but the discussion has been put off until the autumn
  • Made contact with UNESCO (thanks Primavera!) about an OpenStreetMap project in Mantua
  • Participated in wherecamp.eu
  • Wrote some documents for the release of geo-data from the Province of Trento as open data

## This Week

Week Ahead: Primavera

Primavera De Filippi - May 30, 2011 in Uncategorized, Updates

Past Week

  • Revised the PDCalc library code to comply with the new format of Bibliographica
  • Implemented the FR Calculator

Week Ahead

  • publicdomainworks.net
  • meet up with Stefano Costa re: public heritage + OKCon?
  • meet up with Elizabeth Townsend re: creation of a PDCalc community of experts/reviewers + OKCon?
  • catchup with Will + Mark re: Bibliographica interface

Maurizio ‘napo’ Napolitano

Maurizio Napolitano - May 30, 2011 in People, Profiles

Maurizio Napolitano is a technologist at FBK – Fondazione Bruno Kessler (the research centre of the Autonomous Province of Trento, Italy). He is interested in free/open source software, social media, the construction of shared knowledge (e.g. Wikipedia and OSM), geographical information systems, and open data.

He works on several projects in direct contact with public administrations (open data, open source software, social networks, electronic voting, …). He is also very involved with the Italian communities (and associations) around the issues he is interested in. Since May 2011 he has been one of the OKFN ambassadors.

Week Ahead: Lucy Chambers

Lucy Chambers - May 30, 2011 in Updates

Availability this week: 5 days

Availability next week: 1/2 day Monday (although I will make the coordination, CKAN and OpenSpending calls)

Past Week

  • OKF Starter’s Guide (Part of OKF Manual) for Kat
  • Wrote OpenSpending Launch – now needs publishing
  • Catchup with numerous volunteers
  • CKAN Roundup
  • Created Logo for Stickers for OKCon
  • Worked on OKF User’s Manual
  • Open Data Challenge
  • Highrise binge

Week Ahead

  • Catch up with Argentina and Mexico re: their data in OpenSpending
  • Work with Martin Keegan on documentation for OpenSpending
  • Open Data Challenge Judging Procedure
  • Update projects page
  • Catch up with blog backlog
  • Get Stickers Printed for OKCon
  • Publish OpenSpending Launch

Overflow from last week

  • OKF User’s Manual
  • Add existing volunteers to email list
  • Update Events
  • Curate ideas.okfn.org

McNamara’s whereabouts

Tim McNamara - May 30, 2011 in Uncategorized

This week I will be focusing on the annexes to the Open Data Manual. I’ve also been spending some time working on the OKFN dashboard.

Sorry for the short update – sent from phone.

Week ahead: Jason Kitcat

Open Knowledge Foundation - May 28, 2011 in Updates

Past Week

  • Deal with a LOT of email
  • Created draft contracts and signed final contracts for new hires
  • Work on accounts, finances, invoices and tax
  • Catchups with Rufus
  • Prep work for two major grant applications in process
  • Corralled job ad for new Python & JS coders

This Week

Availability: Monday, Wednesday & Friday

  • Corp Tax filing
  • More grant app prep
  • Respond to job applicants
  • Draft a consultancy policy
  • Work on finance reporting

Worklog 2011-05-23 to 2011-05-27

Jonathan Gray - May 27, 2011 in Uncategorized

Some things I did this week:

Developing database front-ends for maximum use

Tim McNamara - May 25, 2011 in Design, Technical

You’ve got a database. You’ve got an audience. How should you connect them? This post looks at methods to ensure that both your website and your API are well used.

Here are some of the main recommendations:

  • Think of search engines and third-party developers as types of users. If you serve their needs, then you will greatly increase your system’s reach.
  • Avoid authentication. Requiring passwords prevents search engines from accessing your content, restricting your service’s ability to grow traffic organically.
  • Be friendly to crowd-sourcing. If you allow people to suggest improvements to your data, its utility will grow exponentially.
  • Use clear, liberal licensing. This provides lots of certainty to people making use of your service, meaning they will have the confidence to build interesting and useful tools.

Make sure you know what your goal is

Agencies spend money on tools because they want people to access information. They don’t spend money creating websites purely to keep software developers employed. Accessing the information is what’s important. However, there can be a tendency to only consider the needs of users who are accessing the system via a web browser. Taking this view really minimises the potential spread of the agency’s information.

Third parties, such as search engines and other applications, can bring large new audiences to your agency’s information. Every new third-party application is another channel of information that you didn’t need to pay for. Therefore, you should spend some time thinking about how third parties might want to access your site.

Making databases accessible to search engines

Search engines are amazingly good at giving specific results while being very simple to use. They also use their own resources to return results, which frees your resources for other users. Plus, their results are likely to be faster and more relevant than yours.

You may think that your particular archive has a very specialised audience, and that they may not need to use a search engine. However, they are still likely to be more familiar with a search engine’s search box than with yours. Search engines do need some help, though, if you want to draw people in.

Here are some tips:

  • Avoid authentication.
  • Provide your content as full text.
  • Provide a sitemap.xml.
  • Provide something like “Browse” as well as search.

Avoid Authentication

Authentication may seem sensible. You get to monitor the use of your service. You prevent abuse of your service. You can contact people if anything negative happens. But…

You make it impossible for search engines to get to your content. They don’t use passwords. Therefore, they’ll never be able to authenticate.

Another problem is that you reduce the discoverability of the system for third-party developers. If people need to sign up before they explore the system, they may not bother. If you’re worried about abuse, use standard rate-limiting measures. Attackers have many more options than to use your web API to take you down.

Provide your content as full-text

As far as you’re able, provide your content as full text. Search engines have recently begun to index PDF files; however, it’s likely you’ll get the best results with plain text.

Provide a sitemap.xml

A sitemap.xml is a file that your web developer will know about. It’s basically a menu for search engines of the best bits of content on your site. Resources that appear in your sitemap can be discovered more easily than by merely following links.
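
As a rough illustration, here is a minimal sketch of generating a sitemap.xml from database records using only Python’s standard library. The URLs and dates are placeholders standing in for whatever your database actually holds.

```python
# A minimal sketch of generating sitemap.xml for a database-backed site.
# The URLs and lastmod dates below are placeholders for records pulled
# from your own database.
import xml.etree.ElementTree as ET

records = [
    {"url": "https://example.org/records/1", "lastmod": "2011-05-01"},
    {"url": "https://example.org/records/2", "lastmod": "2011-05-20"},
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for record in records:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = record["url"]
    ET.SubElement(url, "lastmod").text = record["lastmod"]

# Write the file that search engines will fetch from your site root
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```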

Provide something like “Browse”

Make sure that it’s possible for users to reach all of the resources by clicking on links. This enables web spiders to index the whole site by following links. I recommend making this available even if you have millions of records.

Making databases accessible to social networks

If you have a data-centric service, I doubt that you need social network integration. However, you want to make sure that if someone does mention you on social media, the link makes sense.

Make sure that each resource has its own page

If you were looking through a catalogue of books, you would expect that each book has its own page. This is because the unique page is likely to have a whole lot of metadata associated with the book. Things like the author, the publisher and so forth. If you have a large catalogue, attempt to build a similar service.

Trust users

Be friendly to external annotations. Allow crowd-sourced data. People like to fill gaps or make improvements. It gives them a feeling of ownership of the records. This builds personal interest and emotional investment in the project. Many government departments struggle with this; they are very resistant to the idea.

The Semantic Web loves your metadata

If you have a high degree of structured information, use semantic web technologies. These tools are increasingly moving out of academia and are enhancing people’s web experience. For example, the Open Graph Protocol is used heavily by Facebook. It allows you to tell Facebook and other services about a resource’s attributes, so they can improve users’ experiences.
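
As a small, hedged sketch, the snippet below shows the kind of Open Graph `<meta>` tags a record page might emit. The og:* property names come from the Open Graph Protocol; the record values and URLs are made up for the example.

```python
# A sketch of emitting Open Graph <meta> tags for a single catalogue record.
# The og:* property names are from the Open Graph Protocol; the record
# itself is an invented example.
record = {
    "og:title": "Letters of Charles Dickens, Volume 1",
    "og:type": "book",
    "og:url": "https://example.org/records/dickens-letters-1",
    "og:image": "https://example.org/covers/dickens-letters-1.jpg",
}

meta_tags = "\n".join(
    '<meta property="{0}" content="{1}" />'.format(prop, value)
    for prop, value in record.items()
)
print(meta_tags)  # these tags belong in the <head> of the record's page
```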

Make your database accessible to developers

There is a worldwide community of software developers looking to improve the lives of their users.

Community

Creating a community around your data is essential. A simple step is to create a mailing list. This will allow developers and users to build their own FAQ. It will also ensure that your agency won’t be stuck with the whole support burden.

Avoid Authentication (again!)

As much as authentication is bad for search engines, it’s even worse for humans. People hate having to sign up to things. They want to be able to explore. By hiding content behind authentication, you make it harder for third parties to even work out whether it’s worth their time to build services with your data.

Export formats

You may have a wonderful XML schema for your data. However, JSON has won. It’s simple and extremely well supported. Supporting JSON lowers the barriers to entry to your application, which in turn makes it more likely that useful services will be created.
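
To make this concrete, here is a minimal sketch of serving a single record as JSON using only Python’s standard library. In practice you would hook this into whatever framework already serves your HTML pages; the record data here is invented for the example.

```python
# A minimal sketch of exposing a database record as JSON over HTTP,
# using only the standard library (WSGI). The RECORDS dict stands in
# for your real database.
import json
from wsgiref.simple_server import make_server

RECORDS = {"1": {"id": "1", "title": "Example record", "author": "A. Author"}}

def app(environ, start_response):
    record_id = environ["PATH_INFO"].strip("/").split("/")[-1]
    record = RECORDS.get(record_id)
    if record is None:
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    body = json.dumps(record).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()  # e.g. GET /records/1
```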

Licensing Matters

Complexity adds compliance costs. The more difficult it is for the terms to be understood, the less likely it is that your service will be used. Therefore, please emphasise to your legal team that plain English is called for. The idea of licensing is to clarify everyone’s responsibilities so that a positive relationship can be created. Don’t endanger that by creating a license that scares developers away.

Licensing a web service comes in two parts. The first is the terms of accessing your database. The second is the licence of the data that your database holds. Make sure that both of these elements are clear.

One of the more effective mechanisms for preventing abuse of your service is robots.txt. In effect, it provides a machine-readable license. You can prescribe a maximum crawl rate for web crawlers, such as one request every 20 seconds. Search engines do respect it. Another advantage of using robots.txt is that it’s completely unambiguous.
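
As an illustration, the sketch below shows a robots.txt fragment declaring a 20-second crawl delay, and how a well-behaved client can read that declaration with Python’s standard urllib.robotparser (the crawl_delay() helper needs a reasonably recent Python 3). The paths are examples only.

```python
# How a crawl delay declared in robots.txt can be read programmatically.
# The robots.txt content below is an example; a polite crawler would check
# can_fetch() and crawl_delay() before requesting pages.
import urllib.robotparser

robots_txt = """
User-agent: *
Crawl-delay: 20
Disallow: /admin/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.org/records/1"))  # True
print(parser.can_fetch("*", "https://example.org/admin/"))     # False
print(parser.crawl_delay("*"))                                 # 20 (seconds)
```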

Mining the personal – using Open Correspondence to explore correspondents

Iain Emsley - May 25, 2011 in Open Correspondence

Jonathan Gray, in his post on mapping influence on intellectual history, asks some very sensible questions regarding the influence of books on intellectuals and how they talk to each other. He comments that:

One could even imagine using other sources (library lending data, lecture lists, reading lists, catalogues, letters, notes and other sources) to try to systematically establish things like:

  • Author X corresponded with author Y
  • Author X possessed a copy of work A

In Open Correspondence, one of the aims is to look at the social networks of letters. Social networks are more than the people, though. As Jonathan identifies, there are networks of things. Books are inherently social things. When somebody reads a text, they tend to mention it to somebody via a blog, review, tweet and so on. In the nineteenth century, the best way of doing this was in letters between people. It intersects with my personal interest in literary history and how books ‘speak’ to each other.

As part of the recent Book Hackday, I decided to use the Letters schema, which is in development, to try to answer some of these questions using the current dataset: letters from Charles Dickens.

At the moment, the first of Jonathan’s queries is perhaps relatively simple.

The list of correspondents from the set is currently found at www.opencorrespondence.org/correspondent, which offers links to the individual pages and some further data. I pulled these into a Redis NoSQL store as a separate list, with the intention of using them at a later date to build collections of the works. I have found a bug that is preventing me from finishing this at the moment, but the idea I am trying to explore is the ability for users, say a school or university, to build their own version of a collection of letters without needing to photocopy them from a book. This list in itself does not know whether the correspondent is an author. Perhaps this is a later addition to the metadata, but it does require a data source to check against, something that would be very useful for gleaning bibliographic data.
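
As a rough sketch of that Redis step (not the actual Open Correspondence code – the key name and example URIs here are assumptions), the idea is simply to keep the correspondent index as a Redis list so it can be re-used later:

```python
# A sketch of storing the correspondent index in Redis as a list for later
# reuse. The key name and the sample URIs are assumptions, not the actual
# Open Correspondence toolkit code.
import redis

correspondents = [
    "http://www.opencorrespondence.org/correspondent/w-a-macready",   # hypothetical slug
    "http://www.opencorrespondence.org/correspondent/john-forster",   # hypothetical slug
]

r = redis.Redis(host="localhost", port=6379, db=0)
r.delete("correspondents")                # start from a clean list
for uri in correspondents:
    r.rpush("correspondents", uri)        # append each correspondent URI

print(r.lrange("correspondents", 0, -1))  # read the whole list back
```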

The index can then be re-used to show the relationships using the JavaScript InfoVis Toolkit – another idea for a useful graph. That answers the first of Jonathan’s questions in part. Work still needs to be done here, but visualisations have been generated – just not dynamically yet; that is part of the finalising stage.

Of course a follow-up question is what they wrote about to each other. Out of the myriad of things it might be, I’ve tried to focus on book titles, in particular the ones that Dickens wrote. In this first pass at finding the books I’ve focussed on his own works, but when I can find the right API I hope to extend it. Perhaps the Open Bibliographic British Library dataset would be a good place to start, but that is for another day. It would allow the researcher to see at a glance which books an author read, whether or not they owned them.

A slightly more statistical question that I thought might be interesting is graphing the spread of letters sent to a correspondent. Each correspondent has an index at their own URI, and it struck me as interesting to try to ‘re-think’ that index to allow the user to discover information in different ways. I created a store in Redis which contains the results of a SPARQL query, and then the number of letters written per year is shown using Protovis for the visualisation. From this we might see whether the correspondence lasted for some time, suggesting either persistence or importance, or whether it was focussed and brief. The graph below comes from the results for W A Macready, an actor to whom Dickens wrote for a number of years. It shows a fairly steady stream of letters, suggesting a stable relationship.

Graph of letters to W A Macready
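
For anyone curious how the per-year counts behind a graph like this might be assembled, here is a small sketch using a Redis hash. It is not the actual toolkit code: the dates are hard-coded stand-ins for the results of the SPARQL query mentioned above, and the key name is hypothetical.

```python
# A sketch of tallying letters per year for one correspondent in a Redis
# hash, so the graphing layer (Protovis in this post) can read the counts
# back cheaply. The dates and key name are made-up examples.
import redis

letter_dates = ["1846-03-02", "1846-11-17", "1847-05-09", "1851-01-23"]  # example data

r = redis.Redis(host="localhost", port=6379, db=0)
key = "letters-per-year:w-a-macready"     # hypothetical key name

r.delete(key)
for date in letter_dates:
    year = date.split("-")[0]
    r.hincrby(key, year, 1)               # increment the count for that year

# Read back {year: count} pairs for the visualisation
counts = {year.decode(): int(count) for year, count in r.hgetall(key).items()}
print(sorted(counts.items()))
```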

The toolkit is still in development and is perhaps too reliant on Redis, but I thought I’d share some of the ideas that are being developed. It needs to fit into the current database structure, or perhaps run off the current endpoint. The toolkit stores lists of data that need visualisation, and it should provide a basic API, with the medium-term aim of putting it into the data section of the Open Correspondence site. The other side of the toolkit is that, through using it, various areas to be improved have come to light.

I do think that it is beginning to answer some of Jonathan’s questions. The APIs will not replace the role of scholarship, but they can show ways into the data that might not have been thought of, or give the researcher an overview of the data. They show what can be done with indexing, though: indexes can be redesigned to show different pieces of data in different contexts.