Places (part 1)

Back in late 2009, we started writing something we nicknamed “parlytags” – a first attempt at geotagging Parliamentary data, an experiment in what was possible using only existing public-facing data. It was retired today with a view to replacing it with a newer, leaner, faster version (but that’s a different post entirely).

It presented itself as a search box that accepted a UK place name.

screenshot of the parlytags homepage

The app searched a local copy of data from the GeoNames.org project, found the right grid references and rendered an embedded Google Map, along with any suitably tagged data we’d scraped up earlier. The buttons underneath the text were clickable links to other places the data had been tagged with. Partly as a functionality demo and partly to compensate for the small data sample we were using, the results included tagged data from within a radius of up to 40km rather than just direct hits on the name.

screenshot of an example map and results page
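For the curious, the radius part can be sketched roughly as below. This is a minimal illustration rather than the parlytags code itself: it assumes each tagged item carries plain latitude and longitude values, and the names haversine_km and items_within, along with the "lat" and "lon" keys, are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def items_within(centre, items, radius_km=40.0):
    """Return the tagged items within radius_km of centre, nearest first.

    centre is a (lat, lon) tuple; each item is a dict with hypothetical
    "lat" and "lon" keys.
    """
    lat, lon = centre
    hits = []
    for item in items:
        distance = haversine_km(lat, lon, item["lat"], item["lon"])
        if distance <= radius_km:
            hits.append((distance, item))
    hits.sort(key=lambda pair: pair[0])
    return [item for _, item in hits]
```

Sorting nearest first means direct hits on the searched place come out at the top, with the wider catchment filling in underneath.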

One of the nice side effects of using the GeoNames.org data was the ability to match against alternate place names, so non-English names, regional variations, colloquialisms and common alternative spellings became valid search terms for little extra effort.

screenshot of search results for 'glaschu'
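A rough sketch of why this comes almost for free: if every name a place is known by points at the same record, a search for 'glaschu' lands on Glasgow without any special handling. The record layout here (a "name" key plus an "alternate_names" list) and the function names are assumptions for illustration, not the real parlytags schema.

```python
def build_name_index(places):
    """Map every case-folded name, primary or alternate, to its places."""
    index = {}
    for place in places:
        # Use a set so a place is listed once even if a name repeats.
        names = {place["name"], *place.get("alternate_names", [])}
        for key in {name.casefold() for name in names}:
            index.setdefault(key, []).append(place)
    return index


def find_places(index, query):
    """Look up a search term, ignoring case and surrounding whitespace."""
    return index.get(query.strip().casefold(), [])
```

Building the index once and doing dictionary lookups keeps each search cheap; a database table of alternate names with an index on the name column would do the same job.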

Where the initial place search results were ambiguous (i.e. the name could refer to two or more places), it chose one to focus on initially – the first one on the list, prioritised by the place type if my memory serves – and offered links to the others.

screenshot of search for 'Leeds'
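As a sketch of that "pick one, link to the rest" behaviour, something like the following would do. The priority table is an assumption: it uses GeoNames-style feature codes, but the ordering, the "feature_code" key and the function names are invented for the example and may not match what the app really did.

```python
# Lower number = preferred. The ordering is an illustrative assumption.
TYPE_PRIORITY = {"PPLC": 0, "PPLA": 1, "PPL": 2}


def rank_candidates(candidates):
    """Order candidate places by place type, keeping ties in list order."""
    fallback = len(TYPE_PRIORITY)  # unknown types sort last
    return sorted(
        candidates,
        key=lambda place: TYPE_PRIORITY.get(place["feature_code"], fallback),
    )


def choose_focus(candidates):
    """Return (place to show on the map, other places to offer as links)."""
    ranked = rank_candidates(candidates)
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]
```

Because Python's sort is stable, places of the same type stay in whatever order the original search returned them.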

The service may be gone now, but the code can still be found on GitHub.

Laying Siege to the server

To be able to “stress test” our server, we’re preparing to deploy an Open Source application called Siege which simulates multiple users making HTTP requests to the web application. It didn’t do quite what we wanted in the logging department, so we’ve hacked it about a bit and, in the spirit of Open Source, made it available on GitHub.

Now we can get down to writing some test scripts that use Siege to test specific things on the server to see how it (and by extension our application) will behave under load. To make these artificial tests more realistic, Siege helpfully includes the ingenious -i command line option, which makes it choose from the list of URLs we give it in a cheerfully haphazard fashion, turning it into the server testing equivalent of a roomful of monkeys with internet browsers that will only connect to our site.

(The full command we’re looking to use is: siege -c10 -r50 --file=urls-to-test.txt -E session.log -i, in case anyone was really interested in the geeky details)
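To picture what the -i option is doing, here is a small sketch of the idea: it builds a plan of which URL each simulated user would hit, without making any HTTP requests. This is not how Siege is implemented; the function names are made up, and skipping blank lines and # comments in the URL list is an assumption of the sketch. It also assumes -c10 -r50 means 10 users making 50 requests each.

```python
import random


def load_urls(text):
    """Parse a URL list: one URL per line, skipping blanks and # comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def random_requests(urls, users, repetitions, seed=None):
    """Plan each user's requests by picking URLs at random from the list."""
    rng = random.Random(seed)  # pass a seed to make a plan repeatable
    return [
        [rng.choice(urls) for _ in range(repetitions)]
        for _ in range(users)
    ]
```

Under those assumptions, random_requests(urls, users=10, repetitions=50) plans 500 requests in total, spread haphazardly across the list rather than walked in order.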

Now to solve the problem where my plastic laptop starts to melt before the server looks even vaguely stressed…

Blog theme changed

I changed the theme of our blog. Our spartan ‘design’ was hurting my eyes. I’m trying out the ‘Day Dream’ theme – let me know what you think.

Finding your MP

Work continues on the Find Your MP project.

I’ve aggregated our issues from Google project hosting and our GitHub commits into one @fymp Twitter account.

We’re working on the Find Your MP service API, which – at the time of writing – includes REST responses for HTML, JSON, YAML and plain text. I’ve written a scratch Google App Engine app that calls our YAML output. It works! Not bad for an hour’s hacking.
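The idea of one resource with several representations can be sketched like this. It is only an illustration: the render function and the record passed to it are hypothetical, not the Find Your MP API, and HTML and YAML are left out to keep the sketch to the standard library.

```python
import json


def render(record, fmt):
    """Render one record (a flat dict) in the requested format."""
    if fmt == "json":
        return json.dumps(record, sort_keys=True)
    if fmt == "text":
        # One "key: value" line per field, in a predictable order.
        return "\n".join(f"{key}: {record[key]}" for key in sorted(record))
    raise ValueError(f"unsupported format: {fmt}")
```

Keeping the lookup separate from the rendering means a new output format is one more branch, not a new code path through the whole service.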

We’re intending to release as much demo code as we can.

People, rather than Members

Whenever the blog goes quiet, it’s a sign that we’re coding a lot.

The latest public release of the Hansard Prototype now refers to ‘people’ rather than Members: http://hansard.millbanksystems.com/people – for example: Mr. Tony Benn.

This represents a substantial improvement in how we recognise and display those recorded as speaking in Hansard. Your comments are most welcome.

If you’re on Twitter, please feel free to comment there if you prefer – you can find me at @robertbrook. Apparently Tony Benn is on Twitter as well – but he’s not as chatty.

Table formatting

We currently style tables in a slightly smaller font size than the main text on the Hansard prototype site. We’d really like to know if this is working.

Have you spotted a table that was incorrectly formatted? Do you find them readable and accessible? We’d be grateful if you could take the time to make us aware of any odd-looking tables using the Google issue reporting form.

No more GeoNames, no more KML

We’re no longer recognising GeoNames place names in the Hansard text, nor are we producing KML files from sitting days.

Although it was an interesting experiment, it wasn’t useful enough to enough people in real-world use. Removing these experimental features also makes reparsing the source files quicker.

We’re still working out what to roll out in terms of geographic search features.

Generic Phusion Passenger Server

As part of the Find Your MP project, a member of our team created a Generic Phusion Passenger Server VMware virtual machine instance at Elastic Server. Don’t worry if none of that made sense.

What this means is: we use Elastic Server’s – er – service to create servers for local and cloud deployment. We use them in testing, development and production. We’re happy with what Elastic Server provides for us. Very happy. We’re even happier about the price.

We aim to release as much of our work as we legally can, which means practically all of it. This now includes parts of our infrastructure, released so others can build on it.

Why didn’t we just use Elastic Server’s default Passenger build? We tried it, but it didn’t do all we wanted exactly the way we wanted it to. We were able to tweak and build our own version and make that available publicly. Wins all round!