You’ve got a database. You’ve got an audience. How should you connect them? This post looks at methods to ensure that both your website and your API are well used.
These are the main recommendations:
- Think of search engines and third-party developers as types of users. If you serve their needs, then you will greatly increase your system’s reach.
- Avoid authentication. Requiring passwords prevents search engines from accessing your content, restricting your service’s ability to grow traffic organically.
- Be friendly to crowd-sourcing. If you allow people to suggest improvements to your data, its utility will grow exponentially.
- Use clear, liberal licensing. This provides lots of certainty to people making use of your service, meaning they will have the confidence to build interesting and useful tools.
Make sure you know what your goal is
Agencies spend money on tools because they want people to access information. They don’t spend money creating websites purely to keep software developers employed. Accessing the information is what’s important. However, there can be a tendency to only consider the needs of users who are accessing the system via a web browser. Taking this view really minimises the potential spread of the agency’s information.
Third parties, such as search engines and other applications, can bring large new audiences to your agency’s information. Every new third-party application is another channel of information that you didn’t need to pay for. Therefore, you should spend some time thinking about how third parties might want to access your site.
Making databases accessible to search engines
Search engines are amazingly good at giving specific results while being very simple to use. They also use their own resources to return results, which leaves yours free for other users. Plus, their results are likely to be faster and more relevant than yours.
You may think that your particular archive has a very specialised audience that doesn’t need a search engine. However, even specialists are likely to be more familiar with that search box than with yours. Search engines do need some help if you want to draw people in.
Here are some tips:
- Avoid authentication.
- Provide your content as full text.
- Provide a sitemap.xml.
- Provide something like “Browse” as well as search.
Authentication may seem sensible. You get to monitor the use of your service. You prevent abuse of your service. You can contact people if anything negative happens. But:
You make it impossible for search engines to get to your content. They don’t use passwords. Therefore, they’ll never be able to authenticate.
Another problem is that you reduce the discoverability of the system for third-party developers. If people need to sign up before they can explore the system, they may not bother. If you’re worried about abuse, use standard rate-limiting measures. Attackers have many better options than your web API for taking you down.
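To make the rate-limiting suggestion concrete, here is a minimal sketch of a token-bucket limiter, one of the standard measures mentioned above. The class name and parameters are illustrative, not from any particular framework; in production you would more likely reach for your web server’s or API gateway’s built-in limits.

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second, with short bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = time.monotonic()

    def allow(self):
        """Return True if a request may proceed, False if the caller should back off."""
        now = time.monotonic()
        # Refill tokens in proportion to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each client (identified by IP address or similar) gets its own bucket; a request that finds the bucket empty receives an HTTP 429 rather than a ban, which keeps the service open while still blunting abuse.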
Provide your content as full-text
As far as you’re able, provide your content as full text. Search engines have begun to index PDF files; however, you’re likely to get the best results with plain text.
Provide a sitemap.xml
sitemap.xml is a file that your web developer will know about. It’s basically a menu that tells search engines where the best content on your site is. Resources listed in your sitemap can be discovered more easily than by following links alone.
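The format itself is tiny. As a rough sketch, a sitemap for a record database could be generated straight from the list of record URLs (the example.org addresses here are hypothetical):

```python
from xml.sax.saxutils import escape


def sitemap_xml(urls):
    """Render a minimal sitemap.xml listing one <url> entry per record URL."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )
```

Regenerate the file whenever records change, serve it at the site root, and reference it from robots.txt so crawlers find it without being told.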
Provide something like “Browse”
Make sure that it’s possible for users to reach every resource by clicking links. This enables web spiders to index the whole site by following those links. I recommend making this available even if you have millions of records.
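Even with millions of records, a paginated “Browse” only needs a predictable URL per page so that every record is a finite number of clicks from the front page. A sketch of the idea, with a hypothetical base URL:

```python
def browse_page_urls(total_records, per_page=50, base="https://example.org/browse"):
    """Yield one browse-page URL per page of records, so a crawler following
    page links eventually reaches every record."""
    pages = (total_records + per_page - 1) // per_page  # ceiling division
    for n in range(1, pages + 1):
        yield f"{base}?page={n}"
```

Each browse page then links to its records and to the next page, giving spiders a complete path through the catalogue.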
Making databases accessible to social networks
If you have a data-centric service, I doubt that you need social network integration. However, you do want to make sure that if someone mentions you on social media, the link makes sense.
Make sure that each resource has its own page
If you were looking through a catalogue of books, you would expect that each book has its own page. This is because the unique page is likely to have a whole lot of metadata associated with the book. Things like the author, the publisher and so forth. If you have a large catalogue, attempt to build a similar service.
Be friendly to external annotations
Allow crowd-sourced data. People like to fill gaps and make improvements, and it gives them a sense of ownership over the records. That builds personal interest and emotional depth into the project. Many government departments struggle with this; they are very resistant to the idea.
The Semantic Web loves your metadata
If you have a high degree of structured information, use semantic web technologies. These tools are increasingly moving out of academia and are enhancing people’s web experience. For example, the Open Graph Protocol is used heavily by Facebook. It allows you to tell Facebook and other services a resource’s attributes, so they can improve users’ experiences.
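Open Graph attributes are just `<meta>` tags in a page’s head. A small helper like the sketch below could render them from a record’s metadata (the record values are invented; `og:title`, `og:type`, `og:url` and `og:image` are the protocol’s basic properties):

```python
from html import escape


def og_meta_tags(props):
    """Render Open Graph <meta> tags from a dict of og: property -> value."""
    return "\n".join(
        f'<meta property="og:{escape(k)}" content="{escape(v)}" />'
        for k, v in props.items()
    )


# A hypothetical catalogue record expressed as Open Graph properties.
tags = og_meta_tags({
    "title": "Shipping manifest, 1854",
    "type": "article",
    "url": "https://example.org/records/42",
})
```

With those tags in place, a share or mention on a social network picks up the record’s real title and link instead of a generic page description.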
Make your database accessible to developers
There is a worldwide community of software developers looking to improve the lives of their users.
Creating a community around your data is essential. A simple step is to create a mailing list. This will allow developers and users to build their own FAQ. It will also ensure that your agency won’t be stuck with the whole support burden.
Avoid authentication (again!)
As bad as authentication is for search engines, it’s even worse for humans. People hate having to sign up for things; they want to explore. By hiding content behind authentication, you make it harder for third parties to even work out whether your service is worth building on.
Provide JSON
You may have a wonderful XML schema for your data. However, JSON has won. It’s simple and extremely well supported. Supporting JSON lowers the barriers to entry to your application, which in turn makes it more likely that useful services will be created.
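The low barrier is the whole point: in most languages a record becomes a JSON response body in one call. A sketch, with invented field names for a hypothetical catalogue record:

```python
import json

# A hypothetical catalogue record; the field names are illustrative only.
record = {
    "id": 42,
    "title": "Shipping manifest, 1854",
    "url": "https://example.org/records/42",
}

# One call turns the record into a response body that any client can parse.
body = json.dumps(record, indent=2, ensure_ascii=False)
```

Serve that with a `Content-Type: application/json` header and a developer in any language can consume your API without touching a schema document.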
Use clear, liberal licensing
Licensing a web service comes in two parts. The first is the terms of accessing your database; the second is the licence of the data that your database holds. Make sure that both of these elements are clear.
Complexity adds compliance costs. The more difficult your terms are to understand, the less likely it is that your service will be used. Therefore, emphasise to your legal team that plain English is called for. The point of licensing is to clarify everyone’s responsibilities so that a positive relationship can be created. Don’t endanger that by creating a licence that scares developers away.
One of the more effective mechanisms for limiting abuse of your service is robots.txt. In effect, it provides a machine-readable licence: you can prescribe maximum crawl speeds, such as one request every 20 seconds, and well-behaved crawlers respect it (though support for the Crawl-delay directive varies between search engines). Another advantage of robots.txt is that its directives are simple and unambiguous.
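That machine-readability is real: standard libraries ship parsers for it. As an illustration, Python’s `urllib.robotparser` reads a robots.txt (the file contents and URLs below are hypothetical) and answers both “may I fetch this?” and “how fast may I crawl?”:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: all crawlers wait 20 seconds between
# requests and stay out of the admin area.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed = parser.can_fetch("*", "https://example.org/records/42")   # public record page
blocked = parser.can_fetch("*", "https://example.org/admin/login")  # behind Disallow
delay = parser.crawl_delay("*")                                     # seconds between requests
```

A polite crawler checks `can_fetch` before each request and sleeps for `crawl_delay` between them, which is exactly the behaviour the file asks for.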