
Shuttleworth Fellowship Quarterly Review – Feb 2012

Rufus Pollock - March 5, 2012 in Uncategorized

As part of my Shuttleworth Fellowship I’m preparing quarterly reviews of what I and the Open Knowledge Foundation have been up to. So, herewith are some highlights from the last 3 months.


  • Substantial new project support from several funders including support for Science working group and Economics working group
  • Our CKAN Data Management System selected in 2 major new data portal initiatives
  • Continuing advance of projects across the board with several projects reaching key milestones (v1.0 or beta release, adoption by third parties)
  • Rapid expansion of chapters and local groups — e.g. London Meetup now has more than 100 participants, new chapters in Belgium and Switzerland are nearly finalized
  • Completion of major upgrade of core web-presence with new branding and theme used on and across our network of sites (now numbering more than 40)
  • Announcement of School of Data which drew huge attention from the community. This will be a joint Open Knowledge Foundation / P2PU project.
  • Major strengthening of organizational capacity with new staff


Major new project support including:

CKAN and the DataHub


  • Major breakthrough with achievement of simple data upload and management process – result of more than 9 months of work
  • OpenSpending now contains more than 30 datasets with ~7 million spending items (up from 2 datasets and ~200k items a year ago, and under 10 datasets and 1.5m items just 4 months ago)
  • Substantial expansion in set of collaborators and a variety of new funding opportunities

Other Projects

  • BibServer and BibSoup, our bibliographic software and service, reached beta and have been receiving increasing attention

  • Public Domain Review celebrated its 1st Birthday. Some stats:

    • The Review now has more than 800 email subscribers and ~800 followers on Twitter
    • 20k visitors with over 40k page views per month
    • An increasing number of supporters making a monthly donation
  • Initiated a substantive collaboration on the PyBossa crowdsourcing platform with Shuttleworth Fellow Emeritus Francois Grey and his Citizen Cyberscience Centre

  • Annotator and AnnotateIt v1.0 Completed and Released

    • Annotator is now seeing uptake from several third-party projects and developers
    • Project components now have more than 100 followers on GitHub (up from ~20 in December)

Working Groups and Local Groups and Chapters

Working groups have continued to develop well:

  • New dedicated Working Group coordinator (Laura Newman)
  • Panton Fellowships run under auspices of Science Working Group
  • Funding of Economics Working Group

Rapid Chapter and local group development:

Additional items

Events and Meetings

Participated in numerous events and meetings including:

Shuttleworth Fellowship Bi-Annual Review

Rufus Pollock - November 25, 2011 in Uncategorized

As part of my Shuttleworth Fellowship I’m preparing bi-annual reviews of what I, and projects I’m involved in, have been up to. So, herewith are some highlights from the last 6 months.

CKAN and theDataHub


  • Two major point releases of OpenSpending software v0.10 and v0.11 (v0.11 just last week!). Huge maturing and development of the system. Backend architecture now finalized after a major refactor and reworking.
  • Community has grown significantly with now almost 50 OpenSpending datasets and a growing group of core “data wranglers”
  • Spending Stories was a winner of the Knight News Challenge. Spending Stories will build on and extend OpenSpending.

Open Bibliography and the Public Domain

Open Knowledge Foundation and the Community

  • In September we received a 3 year grant from the Omidyar Network to help the Open Knowledge Foundation sustain and expand its community especially in the formation of new chapters
  • Completed a major recruitment process (Summer-Autumn 2011) to bring on more paid OKFN team members including community coordinators, foundation coordinator and developers
  • The Foundation participated in launch of Open Government Partnership and CSO events surrounding the meeting
  • Working groups continuing to develop. Too much activity to summarize it all here but some highlights include:
    • WG Science Coordinator Jenny Molloy travelling to OSS2011 in SF to present Open Research Reports with Peter Murray-Rust
    • Open Economics WG developing an Open Knowledge Index in August
    • Open Bibliography working group’s work on a Metadata guide.
    • Open Humanities / Open Literature working group winning Inventare Il Futuro competition with their idea to use the Annotator
  • Development of new Local Groups and Chapters
    • Lots of ongoing activities in existing local groups and chapters such as those in Germany and Italy
    • In addition, interest from a variety of areas in the establishment of new chapters and local groups, for example in Brazil and Belgium
  • Start of work on OKFN labs

Meetups and Events

Talks and Events

  • Attended Open Government Partnership meeting in July in Washington DC and launch event in New York in September
  • Attended Chaos Computer Camp with other OKFNers in August near Berlin
  • September: Spoke at PICNIC in Amsterdam
  • October: Code for America Summit in San Francisco (plus meetings) – see partial writeup
  • October: Open Government Data Camp in Warsaw (organized by Open Knowledge Foundation)
  • November: South Africa – see this post on Africa@Home and Open Knowledge meetup in Cape Town


Introducing myself: Everton Zanella Alvarenga (Tom from Brazil)

Tom - November 7, 2011 in People, Profiles

Everton Zanella Alvarenga, also known as Tom, is a physicist who has worked as a Web developer and IT consultant for the last five years. He has been involved in a lot of projects in the free culture movement, supporting the free cultural works definition and the open definition.

When he abandoned his elementary particle physics research, he co-created a social networking platform for University of São Paulo, the Stoa project, whose main aim was to provide open educational resources and try to make the university a member of the Open Course Ware Consortium.

For a few years now, he has been involved with open educational resources projects, mainly through Wikimedia Brasil – through Wikimedia Foundation projects – and Recursos Educacionais Brasil, and has been working for the Wikimedia Foundation coordinating the Wikipedia Education Program in Brazil.

On open data, he has been participating in the Transparência Hackers group. Through work that started with the project Adopt an Alderman, together with friends from Transparência Hackers, he has brought to the City Council of São Paulo the idea of opening up its data following Open Knowledge Foundation recommendations.

Finally, now he is helping to form a Chapter of Open Knowledge Foundation in Brazil and to form a network of groups and people in his country  supporting open/free knowledge, also called “Rede pelo Conhecimento Livre”.

How to Contribute to Open Data Without Knowing a Thing About Data

Tim McNamara - October 10, 2011 in Open Data Manual

Many things are possible when the world’s data is open. Researchers are able to conduct research more freely, students get to learn from relevant examples – the real world, governments get access to policy experts, journalists get access to governments, companies get access to more raw materials to produce fantastic products for consumers. This is all great. How do you help?

Even if you don’t know a thing about programming, there are several ways that you can help. The open data community is vast and varied. The options here are too!

Expand data about your area

If you have some spare time, a neat way to make sure that the world’s computers know about your area is to add it to Freebase. Freebase is one of the easiest ways for people to contribute to a machine-readable database about everything in the world.

If you don’t mind a small technical jump, you should also consider OpenStreetMap. OpenStreetMap is a worldwide map built by volunteers. You’ll be surprised to know that it rivals commercial offerings.

Donate your voice

VoxForge stores samples of different accents to create an openly licenced repository for voice recognition. The contribution process is very simple. Most importantly, contributing to this project can build bridges over the digital divide for people who are visually impaired.

Make data easier to find

There are several catalogues for datasets. The Open Knowledge Foundation gets behind one of them in particular. Most importantly for you, it’s very easy to get started.

Here is a basic workflow:

  1. enter a topic in the search bar, such as environment, or Chile
  2. go through the list, checking that each dataset has a
    • description
    • licence
    • direct links to data, known as resources
  3. For anything missing, do a little bit of digital detective work to fill in the gaps

Help to proof read

The open data community has developed a large number of learning materials. However, there are always typos to be found and examples to be added. One of the projects I’m personally heavily involved with is the Open Data Manual. Would love to see you there!

Another idea?

Please add a comment!

Weekend project: Converting New Zealand’s Library Catalogue to RDF

Tim McNamara - September 12, 2011 in Technical

This weekend I spent some time converting 381,518 bibliographic records from the National Library of New Zealand into 9,353,243 triples. Thanks belong with the library for providing their catalogue as open data. I’ve learned a lot through the process, and thought that I would share some of that knowledge.

The code for the conversion is available. The data are awaiting upload.

Note: This is not related to Open Knowledge Foundation’s Working Group on Open Bibliography, although I hope to integrate this work somehow.


I had a few goals:

  • let the National Library and others know that they don’t need expensive tools to implement a conversion system
  • create a complete MARC to RDF system; the tools that I found would often filter lots of information from the original MARC records
  • learn more about bibliographic data
  • learn more about linked data
  • create a visualisation of the types of knowledge that New Zealanders generate [currently in progress]

My code

My system is, in principle, fairly simple: I open a record, create a node in the graph and then map the MARC tags, fields and subfields to a predicate, and use the values as values. I left out creating separate nodes for the other entities, such as authors. I wanted to get something up and running.
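The mapping step described above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses neither pymarc nor rdflib, the three entries in `PREDICATES` are hypothetical examples rather than the mapping used in the conversion, and the `marc:` fallback prefix is invented so that unmapped subfields are kept rather than filtered out.

```python
# Hypothetical mapping from (MARC tag, subfield code) to a predicate.
# These three entries are illustrative; the full mapping is not shown in the post.
PREDICATES = {
    ("245", "a"): "dc:title",
    ("100", "a"): "dc:creator",
    ("260", "b"): "dc:publisher",
}


def record_to_triples(subject, subfields):
    """Return (subject, predicate, value) triples for one record.

    `subfields` is an iterable of (tag, code, value) tuples.
    """
    triples = []
    for tag, code, value in subfields:
        # Fall back to a made-up "marc:" predicate so nothing is dropped.
        predicate = PREDICATES.get((tag, code), "marc:{}{}".format(tag, code))
        triples.append((subject, predicate, value))
    return triples
```

Each record becomes one subject, and every subfield becomes one triple whose object is the literal value, which matches the "use the values as values" approach.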

Here are some of the challenges I faced and what I did to overcome them.

Creating a namespace for non-linked records

One of the things I discovered was that the National Library doesn’t seem to provide each of the items in their catalogue with a URL1. This makes life a bit of a pain when you want to use a specific URL for each resource. As a bit of a workaround, I decided to do the following to create namespaces:

  • where there is a National Library catalogue number, use
  • where a WorldCat or RLIN number is known use
  • otherwise create a blank node
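The fallback order in the list above could look something like the following sketch. The two base URLs are placeholders on `example.org`, not the patterns actually used, and the function and parameter names are hypothetical. A blank node is represented here by an `_:`-prefixed label.

```python
import uuid

# Placeholder base URLs; the real URL patterns are not given in the post.
CATALOGUE_BASE = "http://example.org/catalogue/"
OTHER_BASE = "http://example.org/worldcat-or-rlin/"


def subject_for(catalogue_number=None, other_number=None):
    """Pick an identifier for a record, preferring the catalogue number."""
    if catalogue_number:
        return CATALOGUE_BASE + catalogue_number
    if other_number:
        return OTHER_BASE + other_number
    # No usable number: fall back to a freshly labelled blank node.
    return "_:b" + uuid.uuid4().hex
```

For example, `subject_for(other_number="12345")` returns a URL under `OTHER_BASE`, while `subject_for()` returns a new blank node label on every call.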

There are problems with this approach though.

  • They’re brittle. The library hasn’t provided any warranty that they will be keeping their hostname or that their vprimo application will remain the same.
  • They’re wrong. Using WorldCat’s search is ambiguous. Multiple results can come back when a specific number is given.
  • They’re incomplete. A URL for each of the records probably exists somewhere. I just couldn’t find it while looking through the National Library’s site.

Matching MARC21 tags to RDF predicates

One of the more troublesome things was finding what MARC tags should map to. I spent about two hours looking into different vocabularies to use and reading research published by other projects. Apart from physical descriptions, where I used Good Relations, I went with the recommendations of these three sources, which discuss Dublin Core and OWL:


In subsequent iterations, I think I will use a greater variety of more specific vocabularies. For example, use of skos:Subject will make linkages to dbpedia slightly easier. I would also like to use a vocabulary that’s specific for bibliographic data, but I didn’t come to the project with domain-specific knowledge, so went with generics.

Dublin Core/RDF questions

A question that popped into my mind several times is, “Is the dc: namespace completely deprecated?”. This was often followed by “But other people are referencing dc:, will I break something if I suddenly talk about dcterms:?”

Lastly, I didn’t know if dc:creator and dc:Creator were equivalent. I assumed that they are not. However, most people seem to use lower case, but the documentation has everything in TitleCase. That left me wondering, “Which should I use?”

Running out of memory

Once I had spent several hours in research and had developed a fairly complete mapping between MARC21 and RDF, I decided to fire it up. The graph quickly blew all 4GB of RAM on my laptop. Oops. By that stage, it was 2am on Sunday morning.

The next morning, I used BerkeleyDB to store the graph on disk. This solved my memory problems, but then soon introduced a new one.


I underestimated how long processing would take. Maybe I have been fooled into thinking that things should just take a few seconds to run, maybe a few minutes if something is inefficient. This is the first time I’ve had to deal with processing that takes hours. With rdflib’s BerkeleyDB RDF store implementation, I achieved data import of about 1000 triples/second. As a comparison, 4store is about 700 times faster.
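To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the import rate stays constant for the whole load and that the 700x comparison applies end to end, neither of which was measured.

```python
TRIPLES = 9353243   # total triples produced by the conversion
RDFLIB_RATE = 1000  # triples per second with the BerkeleyDB store
SPEEDUP = 700       # quoted relative speed of 4store

seconds_rdflib = TRIPLES / RDFLIB_RATE
hours_rdflib = seconds_rdflib / 3600        # about 2.6 hours
seconds_4store = seconds_rdflib / SPEEDUP   # about 13 seconds
```

So the same load that takes a couple of hours at 1000 triples/second would, on those assumptions, finish in well under a minute.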

I wrote my code in a fairly synchronous manner, which means lots of disc I/O blocking while I wait for the next section of the MARC file to be read and then the triples to be written. I expect that implementing things in Twisted would lead to a speed increase of several orders of magnitude.

Where to from here?

I want to extract more triples. I only have material that is directly related to the items that are held. The records hold much more. There is lots of scope for extracting information relating to authors, publishers, dates and locations.

Linking. I want to link this data with dbpedia, Freebase and the Open Library. Topics and subjects can be linked with dbpedia. Authors and publishers could link to Freebase. Items themselves would suit the Open Library. Much of the data suits all three.

The code is not yet open source. I have some tidying up to do before then. I would like to refactor things. I want to be able to do things in an async manner, probably using Twisted. I expect that the performance improvements will be extremely large, as my code is bound by I/O to a very large extent.

Tech Irks

Some issues that are of less use to a more general reader:


pymarc is a very good library. It works on chunks of data at a time, minimising the data that is sitting in RAM before processing begins.

Why are subfields in a list?
The thing I disliked is its treatment of MARC subfields. Rather than using a dict to map subfields to values, record.field.subfields comes back as a list:

  ['a', 'Scale 1:50,000 ;',
   'c', '(E 169\xb012\'17"-E 169\xb043\'31"/S 44\xb034\'04"-S 44\xb051\'10").',
   'b', 'New Zealand map grid proj.']

I used some code to express this more naturally as a mapping between keys and values:

  {'a': 'Scale 1:50,000 ;',
   'c': '(E 169\xb012\'17"-E 169\xb043\'31"/S 44\xb034\'04"-S 44\xb051\'10").',
   'b': 'New Zealand map grid proj.'}
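A minimal sketch of that conversion, assuming the flat list alternates subfield code and value (as older pymarc releases return it). The helper name is hypothetical.

```python
def subfields_to_dict(subfields):
    """Pair up a flat [code, value, code, value, ...] list into a dict."""
    codes = subfields[0::2]   # every even position is a subfield code
    values = subfields[1::2]  # every odd position is the matching value
    return dict(zip(codes, values))
```

One caveat with this design: MARC allows a subfield code to repeat within a field, and a plain dict keeps only the last value for a repeated code. Mapping each code to a list of values would avoid losing data.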

MARCReader should support ordering and counting
One design decision that I dislike is that the MARCReader class shares the behaviour of Python’s set type. A set is an unordered collection, meaning you can’t ask for the twenty-second item. I would have liked to be able to do this so that I could pick out a single record while learning the API. Instead, I used this code as a workaround:

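from pymarc import MARCReader
records = MARCReader(open('nznb_22feb1011.mrc', 'rb'))  # MARCReader reads from a file object
for record in records:
    break  # stop straight away; record stays bound to the first record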

Even though there is an immediate break statement, Python’s interpreter will assign record to the first record in records.
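If the reader behaves like an ordinary Python iterator (an assumption here), the standard library can pick out an arbitrary item without a loop. The sketch below works on any iterable; `nth` is a hypothetical helper name.

```python
from itertools import islice


def nth(iterable, n):
    """Return the item at zero-based position n, or None if there are too few items."""
    return next(islice(iterable, n, None), None)
```

With this, `nth(records, 21)` would give the twenty-second record. Note that it consumes the iterator up to that point, so it suits exploration better than repeated random access.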

Loading from compressed files
One final thing that might be handy is to be able to add a flag on the MARCReader class that asks whether the file is compressed. This is only a minor consideration and probably wouldn’t have helped me. I was working with a .gzip file living in a .zip file.
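For the nested case, the standard library alone is enough. The following is a sketch, not the code used for the conversion: the function name and the member name in the usage note are hypothetical, and it reads the whole compressed member into memory, which may not suit very large files.

```python
import gzip
import io
import zipfile


def open_gzip_inside_zip(zip_path, member_name):
    """Return a binary file object for a gzip file stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        compressed = archive.read(member_name)  # raw gzip bytes
    # GzipFile decompresses lazily as the caller reads from it.
    return gzip.GzipFile(fileobj=io.BytesIO(compressed))
```

The returned object has a `read` method, so it could be handed to anything that expects a binary file, for example `open_gzip_inside_zip('catalogue.zip', 'records.mrc.gzip')`.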

National Library

UTF-8 please
A very small fraction of records (perhaps 10 dozen) failed because of encoding issues. Here is an example:

(E 169\xb011\'14"-E 169\xb042\'37"/S 44\xb050\'15"-S 45\xb07\'22").

Notably, every one is a map. I will eventually get around to manually working out how to fix this. [In retrospect, I should probably have used pymarc’s ability to coerce data to UTF-8]
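One possible manual fix is a decode-with-fallback helper. This sketch assumes the problem bytes are Latin-1, which is a guess based on `\xb0` appearing where a degree sign would be expected. It does not handle MARC-8, and the function name is hypothetical.

```python
def decode_with_fallback(raw):
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises.
        return raw.decode("latin-1")
```

Because Latin-1 accepts any byte sequence, the fallback always succeeds, but it can silently produce the wrong characters if the data was really in some other encoding.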

Fewer full stops
Full stops are important for sentences, but they’re added noise when added to data. For example, a record might have a list of subjects, that looks similar to this:


The upshot of this is that it makes the data harder to link to other sources. For example, dbpedia encodes these two concepts as Tennis and Sport.
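Cleaning this up before linking is straightforward. A minimal sketch with a hypothetical helper name; the example values in the usage note are illustrative, not taken from a real record.

```python
def normalise_subject(subject):
    """Strip surrounding whitespace and any trailing full stops from a subject heading."""
    return subject.strip().rstrip(".").rstrip()
```

For instance, `normalise_subject("Tennis.")` gives `"Tennis"`. The approach is deliberately naive: a heading that legitimately ends in an abbreviation such as "N.Z." would lose its final full stop too.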


The MARC21 format is an extremely compact way of expressing a huge volume of information. I can see why Linked Data antagonists are skeptical. However, I haven’t begun to actually use the data for anything yet. Its real power is the ability to link multiple datasets together. Hopefully I can add more soon.

Like many software projects, it took a lot longer than I expected. There were many more learning curves than I had anticipated. Even once I had an understanding of how MARC21 worked, I then needed to learn pymarc. Once that was done, I needed to get to grips with the terribly documented RDFlib. RDFlib’s documentation should be excellent; it’s a library that is over a decade old.

I’m unhappy with some of the ugliness in the code. I would have preferred to use more efficient or elegant ways of expressing my ideas.

However, because I wanted to create a utility that could be shared, I wanted to minimise dependencies. This meant that I went for a synchronous programming style. This was a mistake. I should have used Twisted from the beginning.

Finally, I’m really happy with the final result. I have already thought of a few applications that I would like to implement that make use of this data. Most importantly though, the learning experience has been very valuable. It’s been vindication that Linked Data doesn’t need to be expensive for public agencies to produce.


1 Yes, URI or IRI are more precise terms. However, some of the people reading may be confused by the (to them) new terminology.

Thoughts on why each council has its own copyright licence

Tim McNamara - August 31, 2011 in Uncategorized

During the middle of last year, I and a few other open government ninjas compiled a list of copyright licences that New Zealand’s local governments use. This list has cropped up because of mix & mash 2011, a local competition for open content and open data. It’s very difficult to create an application or a mashup for the whole country, when every council has its own licence.

A member of another council sent me a private email with the following question:

We have been looking at your project this morning and are interested to know whether you have any accompanying commentary on the statistics? There certainly is quite a variance between councils.

Here are a few thoughts I have on the situation.

My first reaction is that this is lamentable, but expected. Councils are their own entities and have the right to do what they want. There’s potentially an argument to be made that councils may be doing what’s best for their community. However, I don’t think the issue is that sophisticated.

Here is my response, bar a spelling correction and with the addition of some formatting:

No one really cares about copyright in New Zealand. They don’t understand it. For example, they don’t seem to recognise that every time they send a PDF around the office via internal email, they’re committing a criminal act.1 That apathy leads to a promotion of the default option. The path of least resistance is to accept whatever the lawyer recommends.

The solicitor will recommend the position which is best for their commercial clients and apply it to the public sector. However, my feelings are that the public policy position should be that public information should be promoted as widely as possible. Taking photocopies of walking tracks should not be illegal. I feel that it reduces the burden on council for printing if other people are allowed to make their own copies of things.

People do care about branding. For some reason, there is a huge pressure to put logos on things. This means that when people have thought about what is the best copyright licence, they need a licence that will keep a logo on reworked material. Licences which require branding are typically fairly restrictive.

People also care about liability. However, I don’t see that copyright is the best way to protect the public from the misuse of your information. If someone has taken your material, changes it and someone gets hurt from the new material, then copyright is a thin layer of protection.

As for the variation, well TLAs are independent. Remember, no one cares about copyright. Therefore, thinking about what a copyright licence for the public sector should look like is only very recent. 2

Restrictive copyright places a burden on collaboration between councils. When I worked at MCDEM, many TLAs shared information about emergency procedure and standards. If the officials had checked the IP policy, they probably would have found that this was prohibited. Yet, collaboration happened despite the rules.

TLAs have a responsibility to appropriately licence every publication that they produce. Copyright makes it illegal to make photocopies of official documents. This sounds great, but in practice a more liberal licence could serve everyone better. If forms could be (legally) photocopied by advocacy groups applying for grants, then it would be speedier for the groups than waiting for council and the council doesn’t need to incur printing costs.

1 They’re also incurring a $10k+ liability for the organisation. The situation is worse if you save a PDF to a hard drive, because every time you open it to read it, you create multiple copies


I expect that if officials spent some time considering the full impact of providing their content under an open licence, then they would do so more often. I’m sure that they would find themselves with better-served constituents and less stressed staff.

An addition

There’s always the problem of “it’s not my job”. I imagine trying to find out who would be responsible for signing off on a non-standard licence is pretty tough.

Will this change?

I feel quite cynical today. I don’t expect that there will be much consistency between councils for a long time. Copyright will never be high up on the agenda for inter-agency meetings.

Update: In response to a query about how liability arises, please refer to subsections 3 and 5 of the Copyright Act 1994.

Adding structure to ad-hoc data sharing in emergencies

Tim McNamara - August 24, 2011 in Technical

In emergencies, people need things done. They need things done quickly. So, they create stuff. A university may render some GIS imagery, a team of volunteers will begin to archive Twitter messages. Communities create lists of offers of help. It gets messy quickly. This post proposes a solution to this.

The Situation

This ad hoc nature of emergency data has been picked up by mainstream media:

A Google Docs spreadsheet is being widely circulated on the Internet, following three bomb blasts in Mumbai, India. The document, simply titled “Mumbai Help,” provides contact information for Mumbai residents, along with information on the type of aid they can provide.

In addition to listing phone numbers for official help locations, such as blood banks and the police control room, the spreadsheet includes individual offers to help with food and shelter. Other Mumbai residents have offered to donate blood, while some have said that they can make calls, send text messages and emails, or tweet on behalf of others. Additional tabs within the document provide opportunities for people to add the names of those who are missing or injured as a result of the blasts.

The Solution is an excellent data catalogue. Its flexible nature allows it

  • to act as a central point for all ad-hoc sources,
  • to support all data formats, and most importantly
  • to offer storage capability.

The storage aspect means that people who create files can upload them and make them easily discoverable to others. This is really important because it frees people from needing to find a web or FTP server to host the files. Removing infrastructure concerns is one of the best things that we could be doing for people under the stress of responding.

The Details

We, as an international technology community, already have volunteers curating data sets. They are teams like Humanity Road, the Standby Volunteer Task Force and others who contribute to Sahana and Ushahidi instances. They are excellent candidates for adding entries to a catalogue of related data.

An event’s emergent Twitter hashtag is what can link datasets. It takes 20s to get messages to Twitter about a sudden onset event. It’s highly likely that a hashtag would have emerged by the time the first datasets are created.

Some things I learned at NetHui

Tim McNamara - July 4, 2011 in Events

NetHui was an amazing event. I’m delighted to have participated. Thanks also to everyone who managed to come along to my session on the impact that digital communities can have on emergency response and recovery. This is the first of a few posts I hope to make on the event, and it relates to things that I have taken away personally.

Things that I’ve learned

New Zealand is special

There is something unique here. It enables cabinet ministers to engage with hackers face to face. All of the dialogue was genuine. Officials came to the meeting wanting to know more, not to simply express departmental views.

Open data is here

Government no longer needs to be encouraged to release its information. Open data advocates now really need to move towards enabling agencies to release their information. Only one of the participants seemed to question the capability of departments to move to this mode of openness and several, including the Minister of Finance, spoke in favour of it.

Access to the Internet as a human right is plausible

I went into the conference thinking that notions such as this were daft. A right to access would imply an obligation on providers to provide access. I’ve changed my view. A right to access to the Internet would not preclude providers from denying access for non-payment; it would restrict the ability of governments to prevent communication for political reasons. This means that rather than creating a sui generis right, I feel that access to the Internet sits within a long tradition of freedom of speech & expression. For example, if we follow John Milton’s view, freedom of expression contains three parts:

  • the right to seek information and ideas
  • the right to receive information and ideas
  • the right to impart information and ideas

If Internet access were to be denied for arbitrary reasons, then I think that these rights would be impaired. I can also see how the right to freedom of association could be called in. Political groups associate virtually. Therefore, to create barriers to cyberspace would be to create barriers to associate with whomever people would like.

Lastly, and perhaps most importantly, the Internet’s role of facilitating participation in society will be ever-increasing. Government agencies already use the Internet as the primary means of accessing their services. Health providers connect patients with specialists via web cams. Therefore, if these arguments are weak today, then they will strengthen over time.

Ultra-fast broadband won’t be ultra-fast

Most of New Zealand’s traffic comes from off shore. This means that we’ll always be struggling with latency. TCP requires acknowledgement between sender and receiver. This means that extremely fast speeds to places which are extremely far away won’t be too important.
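The latency point can be made concrete with the usual window-over-round-trip bound: a single TCP connection cannot move more than one window of unacknowledged data per round trip. The figures below are illustrative assumptions (a 65,535-byte window without window scaling and hypothetical round-trip times), not measurements.

```python
def max_tcp_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on single-connection throughput in bits per second."""
    return window_bytes * 8 / rtt_seconds


WINDOW = 65535  # bytes; the classic maximum window without window scaling

offshore = max_tcp_throughput_bps(WINDOW, 0.150)  # hypothetical 150 ms round trip
local = max_tcp_throughput_bps(WINDOW, 0.005)     # hypothetical 5 ms round trip
```

On those assumptions the offshore connection tops out at roughly 3.5 Mbit/s and the local one at roughly 105 Mbit/s, whatever the speed of the access link. Larger windows raise both ceilings, but the ratio between them stays the same.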

DRM on public domain is happening

Seriously – wtf. I can understand the (misguided) rationale for using technical measures to protect unique and new creative works. However, to add DRM to public domain works is contrary to everything the public domain stands for. If it’s public, then no one has a right to create a monopoly.

Opinions I’ve formed

Data caps for local bandwidth are stupid

No local ISPs I know of provide access to uncapped data. That means that there’s no real incentive for local application developers to use local services. It’s always going to be better for me to store bulk content in Amazon S3 or Google Storage rather than a local provider. Until ISPs move on local data, New Zealand Internet users will always suffer from latency problems.

Cheap local data centres won’t happen

Despite the fact that New Zealand has mainly hydro power, I tend to support nuclear power when I build web applications. That is, I use the services of cloud providers like Amazon & Google. I’m saddened by this, but the price of local hosting is more expensive by orders of magnitude.

Local solutions would be faster. Bandwidth is cheaper. Latency is lower. Power is renewable. Yet, while they’re much better for my New Zealand customers, they’re not better for me. Until pricing changes, local data centres won’t happen.

The Internet builds monopolies

Metcalfe’s Law is extremely powerful. Large providers will consistently pull people magnetically towards them. This creates opportunities for vertically integrated products and monopoly rent seeking behaviour.

Free culture is entrenched

In opposition to this cultural magnetism, free culture is also spreading.

Content really matters

As ubiquitous broadband spreads over the country, content will be the main area for competition. Copyright owners have an opportunity to be able to add genuine value for everyone, including children who wish to remould creative works. Yet, I fear that the current content oligopolies will insist on moving their read-only, broadcast model to the Internet.

Copyright should be tempered

Copyright is purportedly used to promote artists’ creations. However, I feel that it’s distributors who benefit from the current system and that many artists would be better off under a different system. I’m starting to think that a pricing model that looks very similar to the broadcasting model would work really well. As a society, we shouldn’t create laws that cause inefficiency. Expensive and slow distribution methods, e.g. physical media, should make way for highly efficient modes of transport.

We have a stratified Internet culture

If there was ever an event for content industries to engage with ISPs and politicians in a public forum, NetHui was it. However, defenders of decades-long copyright protection were muted at best. I noticed at least one RIANZ member in attendance, however he never spoke up. This is a real shame. If you want to be the figurehead of your industry, you should have the resolve to speak up in the face of opposition. NetHui was an extremely cordial and polite affair. If the content industries had taken more of an opportunity to make their views heard, I’m sure they would have been negative.

Criminalising most of our society will be negative

The term criminal is generally used for a person who is part of a small fraction of society who acts against the generally accepted rules of the society. There are officials in government who don’t understand that emailing a PDF report amongst colleagues is illegal. Yet they all do it. I fear that criminalising the large bulk of people will lead to a general level of resentment with politicians and officials.

Parliamentarians often speak of wanting to uphold public trust and confidence. They should listen to people. Officials need to provide better advice. The choices are not simply between the current copyright system and abolition. There are many alternatives which policy could choose between.

Writeup: Open Govt Data and Information Charter workshop

Tim McNamara - May 18, 2011 in Events, Policy

Some news from New Zealand. Today I had the pleasure of attending a joint policy workshop with many open data stakeholders. In attendance were senior officials from about 5 public sector agencies, one from a Crown research institute1 and many from the open data community.

The workshop’s three hours were used to frame the Open Data and Information Principles to underpin the Open Government Data and Information Charter, which is being developed for release. The whole process has been extraordinarily participative. Rather than keeping everything secret, officials have been altering Open New Zealand’s wiki.

My interpretation of the seven principles confirmed at the meeting is below. The principles’ wiki page outlines them in greater detail.

Open: All official information and data should be made openly available by default.
Protected: Personal and sensitive2 information is protected.
Readily Available: Data and information are provided to everyone under like terms.
Well Managed: The Crown undertakes to preserve data over time.
Reasonably Priced: Charging discouraged. Pricing for access may be warranted when expense is incurred. Royalties are not.
Reusable: Data formats are non-proprietary and the data are provided at finest granularity.

Right at the end, I added the possibility of:

Improveability: An acknowledgement that the Crown’s data and information are imperfect and benefit from processes that allow them to be updated.

People seemed happy with the idea, but there wasn’t any time to discuss it in any detail. My idea was that we need to allow space for the benefits of open data for officialdom to be recognised in the principles. Many of the other principles are obligations. I think it would be best to foster a spirit of willingness, rather than obedience when it comes to open government data. With that in mind I’ll be drafting up a proposal to add Improveability (especially if I can think of an actual English word to describe it) to the other attendees for its inclusion.

1 A government-owned research lab.
2 I don’t recall if sensitive was defined. From memory, it refers to security classifications.

Talking at Legal Aspects of Public Sector Information (LAPSI) Conference in Milan

Rufus Pollock - May 3, 2011 in Talks

This week on Thursday and Friday I’ll be in Milan to speak at the 1st LAPSI (Legal Aspects of Public Sector Information) Primer & Public Conference.

I’m contributing to a “primer” session on The Perspective of Open Data Communities and then giving a conference talk on Collective Costs and Benefits in Opening PSI for Re-use in a session on PSI Re-use: a Tool for Enhancing Competitive Markets where I’ll be covering work by myself and others on pricing and regulation of PSI (see e.g. the “Cambridge Study” and the paper on the Economics of the Public Sector of Information).

Update: slides are up.

Community, Openness And Technology

PSI: Costs And Benefits Of Openness