Monday, March 21, 2011

Mix & Mash Rehash

This blog post has been a long time coming. Can't remember how long I've been promising to write it...
Now that libraryhack has started up and the wheels at mixandmash.org.nz are starting to spin for 2011, I guess it's about time I finished this off.

Enjoy. (or jump directly to the Cliffs Notes)



Our heroes set out on a journey
During November 2010, I worked with Joshua Smyth on a mashup entry for The Great NZ Remix and Mashup Competition (2010), a wicked competition run by DigitalNZ and supported by many partners and sponsors.
With $30,000 in total prizes and $10,000 for the winning mashup, we decided to turn a few ideas we'd been discussing into reality.

About a month before the competition was announced, Trade Me had just launched their API, and we'd spent a few beer-filled evenings just brainstorming ideas: tracking stolen guitars through serial numbers posted in auction descriptions/comments, analysing bidding trends for similar items over time and generating "buy/sell" indicators to maximise margin, and other ideas that were so implausible I can't even remember them.

One idea we settled on early was how cool it would be if we could walk around our suburb with a GPS-enabled iPhone or Android device and find out where the open homes and rentals within walking distance were.
I'd also recently attended some inspiring geospatial data discussions at the GSoC Mentor Summit, which had me thinking about how we could bring mapping to Trade Me and open government data. Listings with addresses (ie. real estate) were prime candidates.

M to the A to the Shing Up
So real estate it was. The content sources suggestions page at mixandmash.org.nz introduced us to the site that truly helped to turn our idea from "trademe map frontend" into "house-seeker mashup": koordinates.com
I can't praise koordinates enough. They rock. They rock more than David Bowie's sock.
They host a large number of public domain (as well as some commercial/pay) map layers that you can download and reuse. Most of these map layers come directly from NZ government or local government bodies like Environment Waikato.
They also have an API which allows direct vector or raster queries on map layers. This is what we used to mash school zone data in with the rental listings we were mapping from Trade Me, and for one piece of the electoral data we included.

The school zone (and school decile rating) data was an early suggestion from Stuart Lewis (who helped heaps with ideas and feedback but would always be too humble to say so) that we knew would be of great interest to families looking for houses to rent or buy. That was our first "open data" angle.

We had a plan... now we just needed to spend every night and weekend designing, implementing and perfecting it. Which we did. After we'd each tinkered with Trade Me's API and Google's geocoding/maps in our own way (I'm an open source / Java freak, Josh is a .NET kinda guy), we designed a PostGIS database and split up the rest of the tasks: I was responsible for the web services and the website, while Josh was responsible for the Android app and for helping me figure out arc tangents when I tried to get each listing's Street View camera centred on the house.
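
For the curious, that Street View trick boils down to the standard initial-bearing formula between two coordinates: point the camera from its own lat/long toward the house's lat/long. A minimal Java sketch of the maths (not our actual code):

public final class Bearing {
    // Compass heading in degrees (0 = north, 90 = east) from the Street View
    // camera position to the house, via the standard initial-bearing formula.
    public static double headingDegrees(double camLat, double camLng,
                                        double houseLat, double houseLng) {
        double phi1 = Math.toRadians(camLat);
        double phi2 = Math.toRadians(houseLat);
        double dLng = Math.toRadians(houseLng - camLng);
        double y = Math.sin(dLng) * Math.cos(phi2);
        double x = Math.cos(phi1) * Math.sin(phi2)
                 - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLng);
        double theta = Math.toDegrees(Math.atan2(y, x)); // the arc tangent in question
        return (theta + 360.0) % 360.0; // normalise to 0-360 for the heading parameter
    }
}

Feed the result into the panorama's heading and the camera swings around to face the listing.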

Many cigarettes and bottles of Mac's Gold later, we had a working product. People other than just my Mum were saying it was cool, so I was pretty happy.

The school zone additions were working really nicely, and we'd enhanced them by looking up decile ratings for each school listed. We'd also combined a list of current MPs with an electoral boundaries map to offer some data about representation in the area. OK, not as useful to a family looking for a good area to move into, but we hoped it helped emphasise that there's a lot of location-specific data not already offered on Trade Me that we could access.

Example of a Home Sweet Home listing view


We come close to being eaten by a grue
The original plan was to cover rentals, residential sales, and open homes. Our prototype started with rentals only, since they were a smaller data set (and therefore easier to grab and work with) than residential sales, and their addresses and prices were generally more precise than those in sales listings.

At this stage, the Trade Me API did not return pre-geocoded coordinates with its search results (it does now), so we were manually geocoding every single new listing that came our way (there tend to be 17,000 - 20,000 rental listings at any given time) using the Google Geocoding API. The rate limit on geocoding requests meant that we couldn't go from 0 to 20,000 listings in a single day... we could keep up with the rate of new Trade Me listings, but populating our initial database was slow going.
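
The backfill loop itself was nothing fancy. Here's a stripped-down Java sketch of the idea -- not our production code (the endpoint shown is Google's current v3 JSON one, the address is made up, and API key handling is omitted):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class GeocodeBackfill {
    // Crude rate limiter: pause between requests so a long backfill
    // stays under the per-second limit (the daily cap still applies).
    private static final long DELAY_MS = 250;

    static String geocode(String address) throws Exception {
        String url = "https://maps.googleapis.com/maps/api/geocode/json?address="
                + URLEncoder.encode(address, "UTF-8");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        StringBuilder json = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) json.append(line);
        in.close();
        // Parse results[0].geometry.location for lat/lng; location_type
        // tells you whether it's a ROOFTOP match or just an approximation.
        return json.toString();
    }

    public static void main(String[] args) throws Exception {
        String[] queue = { "123 Hypothetical Street, Waipukurau" }; // new listings
        for (String address : queue) {
            System.out.println(geocode(address));
            Thread.sleep(DELAY_MS);
        }
    }
}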

In other words, if I made a mistake storing or resolving addresses to latitude and longitude (not an exact science -- have a look at the kind of stuff that makes it into the address field of a Trade Me listing), it would be a long slog to get back up to date. I made a mistake. Twice. By this stage, it was clear that if we wanted "rooftop"-level confidence for most of the houses we were mapping, we'd need to keep focusing on rentals alone and ignore sales, or we'd never catch up. Open homes were a small enough subset of sales that we could still cover them, so we got those going in the Android app.

We are eaten by a grue
With that obstacle behind us, we suffered our next blow. Although it didn't impact our competition entry per se, it meant we wouldn't be the first people to launch a GPS-enabled houses-for-sale/rent smartphone app in New Zealand. Realestate.co.nz, with funding from Westpac, hired an excellent team of developers (whose name I have embarrassingly forgotten -- Barnacle Barnes told me, and I forgot) and launched a slick-looking iPhone app to do precisely what we'd been working on, two weeks before we were due to submit our entry to Mix & Mash.

Looking back, maybe it wasn't as bad as it felt at the time... it was iPhone only; we were doing Android. It used private data from realestate.co.nz; we were using open data from Trade Me. We knew we were never going to achieve the slick look & feel of the realestate.co.nz app, but we were a bit gutted that we weren't first across the finish line.

Triumphant music plays as our heroes emerge bruised and bloody, but victorious
We got our submission in, entering the "Best geo application for mobile" and "Best search experience" categories, with over 20 hours to spare. This was after spending all night trying to record a demo video in which I didn't swear or start mumbling to myself, and also after we convinced Jawyei to make a logo for us with promises of fame and fortune.  We were finished!

Over the next 7 days, all the entries were judged. Now that we weren't allowed to work on our projects, all the people who'd entered the competition started to come out of the woodwork. There were some seriously cool mashups out there, and some cool mapping stuff that I hadn't thought to do (mostly non-Google stuff).
The diversity of entries was also better than I'd expected.

The day finally arrived when results would be announced. Live. Over Twitter. This was the first time most of the entries had been made public, so as well as being nervous, I was also having heaps of fun checking out all the other mashups.

Best geo application went to...... Robert Coup's* Yachter Mobile. Hard to be too upset about that; it looked bloody awesome.

Best search experience went to...... Kim Shepherd and Joshua Smyth for Home Sweet Home. Oh well, better luck next ye- WAIT A MINUTE THAT'S ME IUSFPHFIUWEYFIUWHFIUWHFF

(office mates will confirm that I maintained a stoic and almost ho-hum composure after this, but it wasn't too long until I made it clear that I was super-fucking-stoked)

You can see the full list of winners here. We didn't get the $5000 or $10000 "big prizes", but quite frankly, we didn't deserve to. Daniel Pietzsch's NZ Walks Info is a perfect example of intuitive interface design and nice-looking map textures, markers and clusters, and the interactive elevation profile / trail position was the icing on the cake. Check it out.

Our official prize was $2000, which was divvied up: it helped reimburse Josh for the test Android device he'd had to buy, and it has helped me cover hosting costs to keep the site running and updated.

That wasn't all, though: discovering all these excellent NZ open government data sources, and learning what can be done with them, has to be the grand prize that every entrant took away. As a library system/repository developer, I'm fully behind the open access research movement, and open data is just another chapter in the same book. If the sponsors' aims included promoting reuse of open government data (and I'm pretty sure they did), they definitely succeeded.

Home Sweet Home will keep running**, and I have more plans for it. I also have some new ideas for the 2011 competition that I'll keep up my sleeve for now...

Cliffs Notes:

  • We entered this site and won a prize in The Great NZ Remix and Mashup Competition
  • Many hilarious hurdles were met along the way (including maths and stuff)
  • It was hard work and we only just finished it all in time, but was definitely worth it
  • DigitalNZ, the sponsors, and the judges all did an awesome job at running the competition and promoting it (and promoting the causes it stood for)



* I've only just realised this is the same 'rcoup' who was driving a lot of the development effort for eq.org.nz, a very important project that I wish I'd gotten more involved in.


** As Murphy would have it, it's currently a few days out of date due to some API problems. Nearly fixed, though.

Tuesday, November 30, 2010

Discovering Discovery: DSpace + Solr tips & tricks


DSpace 1.7.0, which is due for release on December 17th, will include a new module called "DSpace Discovery", contributed by the fine folk at @mire.

Discovery adds the ability to use Apache Solr for search, an XMLUI aspect that replaces (most of) the old 'ArtifactBrowser' to enable easy navigation through configurable facets, and a service to allow external sites to perform searches. In future releases, searching will get even easier as autocompletion is added to search boxes.

It's incredibly easy to set up, and because the Solr index exists alongside your traditional plain-old-Lucene search indices, you can switch back and forth without any hassle: no rebuilds, no re-indexing; just enabling/disabling the relevant XMLUI aspects.
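
The enabling/disabling happens in [dspace]/config/xmlui.xconf. From memory it's something like the following (the exact aspect names to swap out vary between releases, so treat this as a sketch and check the Discovery docs):

<!-- Comment out the old Lucene-backed search aspect... -->
<!-- <aspect name="Searching Artifacts" path="resource://aspects/SearchArtifacts/" /> -->

<!-- ...and enable the Discovery aspect in its place -->
<aspect name="Discovery" path="resource://aspects/Discovery/" />

Comment them back the other way and you're on the old indices again, no rebuild required.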

You may have seen similar interfaces in other sites: Solr is being used for generic discovery interfaces like Blacklight, as a full-text search module in Drupal and as a custom solution for in-house sites.

You've also possibly seen DSpace Discovery in action at Dryad, an international biosciences data repository.

You can read some more information, including the official documentation and development roadmap on the DSpace wiki.

I've installed DSpace 1.7, now what do I do?

The Discovery Configuration guide in the DSpace documentation/wiki will get you up and running in no time.

I want to create some custom facets/filters. They don't exist as fields in my metadata registries so I can't easily configure them in dspace-solr-search.cfg. Can I configure Solr directly?

Yes! Let me give you an example:

(note: please excuse and ignore my horrible usage of qualified DC -- it's just an example!)

I've been working on a new repository/archive for the Archive of Māori and Pacific Music at The University of Auckland Library, and we had a few pieces of metadata we wanted to treat differently for the purposes of navigation -- a 3-tier "location" for each recording, which we wanted to combine into a single "Place" facet, and fields for both "iwi of the performer" and "iwi of the composer", which we wanted to combine into a single "Iwi" facet.

(for those outside New Zealand, iwi means 'people', and in this case, refers to Māori tribal affiliation, eg. Ngāti Porou or Tainui)

Here's how the Solr schema for DSpace Discovery is configured for faceted/filtered search:

* Defines a dspaceFilter type, which is a fairly simple Solr field type that converts to lowercase and preserves the entire string as a single token (ie. no splitting on spaces or commas, etc.)

* Copies every metadata value into a dynamic field named [schema.element.qualifier]_filter, eg. dc.title_filter or dc.identifier.issn_filter
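
Reconstructed from that description -- check your own [dspace]/solr/search/conf/schema.xml for the exact definitions -- those two pieces look roughly like this:

<!-- Lowercases the value but keeps it as a single token, so multi-word
     values like "New Zealand" survive as one facet entry -->
<fieldType name="dspaceFilter" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Catch-all dynamic field: every metadata value gets a *_filter twin -->
<dynamicField name="*_filter" type="dspaceFilter" indexed="true" stored="true" multiValued="true"/>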

So we have three tiers of location data that might look something like:

dc.coverage.spatial_country: "New Zealand"
dc.coverage.spatial_region: "Hawkes Bay"
dc.coverage.spatial_locality: "Waipukurau"

Now, we edit [dspace]/solr/search/conf/schema.xml and add the following new field definitions beneath the definitions for internal fields like "search.resourceid":

<!-- One combined field to hold every tier of location data -->
<field name="spatial_filter" type="dspaceFilter" indexed="true" stored="true" multiValued="true"/>
<!-- ...and a wildcard copy of all dc.coverage.* values into it -->
<copyField source="dc.coverage.*" dest="spatial_filter"/>

This will take all values where schema is "dc" and element is "coverage", and copy them into a new spatial_filter field, which can then be accessed by dspace-solr-search.cfg when configuring your facets/filters.

Note that this particular example would also copy dc.coverage.temporal values, if any existed -- dc.coverage.spatial* would be a stricter source pattern for this example, but matching on a whole element is more relevant to most use cases (eg. dc.subject.*, dc.identifier.*, dc.contributor.*, dc.title.*).
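
The same trick gives us the combined "Iwi" facet I mentioned earlier -- the only difference is that the sources are two specific fields rather than a wildcard. (The field names below are from our local registry, so treat them as illustrative.)

<!-- One combined facet field fed by two distinct metadata fields -->
<field name="iwi_filter" type="dspaceFilter" indexed="true" stored="true" multiValued="true"/>
<copyField source="dc.contributor.performerIwi" dest="iwi_filter"/>
<copyField source="dc.contributor.composerIwi" dest="iwi_filter"/>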

Now all that's left is to add our new "spatial" field to our lists of facets and filters in [dspace]/config/dspace-solr-search.cfg, rebuild our discovery index (I recommend deleting and rebuilding when altering schema.xml) and create some new i18n labels for displaying in XMLUI.
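
The rebuild itself is quick. Assuming the 1.7 launcher command -- check the Discovery docs if your copy differs -- it's along the lines of:

# Stop your servlet container, throw away the old index data...
rm -rf [dspace]/solr/search/data

# ...then start it up again and re-index everything from scratch
[dspace]/bin/dspace update-discovery-index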

DSpace Discovery will surface our new, helpful "Places" facet which we've created without touching our stored metadata or legacy browse/search indices. Check it out:

If we select "new zealand" and "waikato" to filter our results, the Place facet is now going to tell us about places just within "Waikato, New Zealand"


And that's all! The data does most of the work for us, and DSpace Discovery handles the rest.

In DSpace 1.6.x, I could export a CSV containing item metadata from my search results. Is that possible in DSpace Discovery?

Yes, sort of -- I've written an updated CSV exporter for XMLUI to work with Discovery, but it wasn't written in time for 1.7. It should be in the next release, and I will put a patch up on JIRA shortly for those who wish to use it with 1.7.0.

You mentioned the ability for external sites to query DSpace Discovery -- tell me more!


I'd love to, but I haven't played around with it quite enough to feel like I could do this topic justice -- watch this space!

If you have any questions or tips to share about DSpace Discovery or Apache Solr, please send me an email or leave a comment, or hop over to the DSpace Mailing Lists.

Monday, October 25, 2010

GSoC Mentor Summit 2010

(Note: I'll stick up some photos in a followup post once the official ones go up, and I get the stuff off my camera)

So, the GSoC mentor summit was pure awesome. I've had virtually no unconference/barcamp experience, and I knew the level of smarts would be extremely high, so I was a bit nervous at first about being overwhelmed, or that there would be an atmosphere of elitism, or that the whole thing would be a chaotic mess that I wouldn't be able to participate in. My concerns turned out to be entirely unfounded.

Perhaps it's the fact that the sort of people who get into mentoring have a great attitude in the first place, or perhaps I've just been too cynical about 'FOSS personalities'... whatever the case, the unconference worked brilliantly. There were no egos driving sessions, no elitism or flaming; it was all just pure, unadulterated geekery that allowed for participation by everyone and somehow ran like clockwork. I learned heaps, met a whole bunch of cool smart people, and I have my usual post-conference 'vibe' that motivates me to spend even more time hacking and contributing to all sorts of stuff.

I also wasn't sure how many projects would be related in any way to the education/GLAM sector, and was pleasantly surprised there too: I met people working on enhancing text with semantic markup (FISE), some folk developing an open source web conferencing tool built to plug into LMSes (BigBlueButton), the Creative Commons people were there, and many more that have just slipped my mind right now.

Of personal interest were the sessions around GIS and managing/manipulating geospatial data. I've been doing some mashup and webapp work at home around the new Trade Me API (amongst other things) using Google Maps and geocoding (or reverse geocoding) locations, so it was great to learn more about OpenStreetMap, PostGIS, OpenLayers and similar tools, as well as the challenges facing developers in data storage and interchange. (I also met the other two kiwis attending the summit at this session, oddly enough)

Sessions I attended: (some names are paraphrased since they were just written on a whiteboard)
  • Liberate your data!
  • Distributed systems and security
  • OpenStreetMap routing demo (shortest path) with geofabrik.de
  • Geo-spatial data
  • Anyone can be a great mentor
  • Open Source licensing and copyright issues
  • Final session/feedback
Notes were taken in realtime, en masse, using an Etherpad instance provided by the ever-helpful OSUOSL team (and a few similar tools like TypeWithMe). I'll put notes up once they're available on the wiki or I've saved them somewhere.

As well as attending the unconference sessions, I spent a fair bit of time hacking DSpace with fellow committer Mark Diggory and talking geek with him -- always a good opportunity when most DSpace developers are a whole hemisphere away from me.

I just missed out on the "git for data" session, which was a pity, but I'll take a look at the notes once they're up -- they should be full of goodness.

Post-summit resolutions:
  • Get even more involved in GSoC next year and put the lessons I've learned into practice
  • Start pronouncing "data" properly (I wince every time I hear myself say "dah-tah")
  • Follow up on all the GIS tricks and tools I learned about
  • Introduce BigBlueButton to the NZ e-learning community and any university staff who run webinars
  • Start using my camera instead of leaving it in my damn backpack all the time
  • Blog more (or at least write more)
  • Come back to San Francisco some time

Big props to the Google Open Source Programs Office for running GSoC in the first place (especially Carol and Cat) and for organising a brilliant mentor summit, and to all the org mentors/admins who showed up and made the summit what it was.

ps. If you want to catch a glimpse of what was going down while it was going down, as well as some of the aftermath, take a look at the #gsoc #mentorsummit twitter stream

Obligatory first post

I finally wrote something longer than 140 characters that I felt like sharing, and rather than try to resurrect my old WordPress blog which was more of a travel diary than anything else, I figured I'd keep this one just about work and other geeky activities, and let Blogspot do all the hosting for me.
That may change, who knows.