kurtis random blog


...my blog about music tech, linked data, misc... (Kurt Jacobson)

The Return of CatfishSmooth

Sun, 05/16/2010 - 01:34



I've really let this blog languish lately - my excuses include writing my thesis, preparing for the birth of my daughter (literally any day now), and moving back across the Atlantic.

To make up for it, I've done something that is pretty cool (I hope you'll agree). I've named it CatfishSmooth for no particular reason other than that it's a funny domain name I happen to own (some of my more astute readers might recall the previous award-winning CatfishSmooth incarnation).

CatfishSmooth is a music artist navigation tool built on the web of linked data. We fuse data from DBpedia, DBTune, the Echonest, the BBC, Last.fm and MusicBrainz to find arbitrary connections between music artists. For example, if we examine the page for James Brown, we see a dizzying array of artist similarities: a list of artists whose hometown is also Augusta, Georgia, a list of artists who are also American soul singers, a list of artists who are also incarcerated celebrities, and several other lists.
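The core trick of turning shared facts into lists of connected artists can be sketched in a few lines of Python. This is a hypothetical illustration, not CatfishSmooth's code: it assumes the facts have already been fetched into a dict mapping each artist to a set of (property, value) pairs, and it simply inverts that dict to find which other artists share each pair. The function name `connection_lists` is made up for this example.

```python
from collections import defaultdict


def connection_lists(facts, artist):
    """For each (property, value) fact about `artist`, list the other artists sharing it.

    `facts` is a hypothetical pre-fetched dict: artist name -> set of (property, value)
    tuples, e.g. ("hometown", "Augusta, Georgia"). Only non-empty lists are returned.
    """
    # Invert artist -> facts into fact -> artists.
    index = defaultdict(set)
    for name, props in facts.items():
        for prop in props:
            index[prop].add(name)

    lists = {}
    for prop in facts.get(artist, set()):
        others = sorted(index[prop] - {artist})
        if others:
            lists[prop] = others
    return lists
```

With a toy dict in which "James Brown" and a placeholder "Artist X" both carry ("hometown", "Augusta, Georgia"), the result maps that pair to ["Artist X"]; each key becomes one of the "also from..." lists on the page.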
We also get some audio and YouTube videos courtesy of the amazing Echonest. And finally, alongside the random linked-data connections, we include some more traditional "similar artist" recommendations from the Echonest and Last.fm.
So how does it all work, you ask? Most of the random connections come from DBpedia, the linked data translation of Wikipedia. We specifically leverage the YAGO types to create really simple SPARQL queries that yield some interesting lists of artists (e.g. incarcerated celebrities). If you don't speak semantic web, you probably understood very little of that last sentence. So let's look at the big picture.
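For readers who do speak a little semantic web, here is roughly what "a really simple SPARQL query over a YAGO type" looks like, wrapped in Python. It's a sketch under assumptions: the public endpoint `https://dbpedia.org/sparql`, the Virtuoso-style `format` parameter, and any example class URI such as `http://dbpedia.org/class/yago/AmericanSoulSingers` are our guesses (DBpedia's YAGO class naming has changed across releases), and the helper names are hypothetical. Parsing follows the standard W3C SPARQL JSON results layout.

```python
from urllib.parse import urlencode

# Assumed public endpoint; the post doesn't say which endpoint CatfishSmooth queried.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def yago_members_query(yago_class_uri, limit=50):
    """Build a SPARQL query listing every resource typed with one YAGO class."""
    return (
        "SELECT DISTINCT ?artist WHERE {\n"
        f"  ?artist a <{yago_class_uri}> .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(query, endpoint=DBPEDIA_SPARQL):
    """Encode the query as a GET URL asking for SPARQL JSON results.

    The `format` parameter is a Virtuoso convention (assumed here), not part of
    the SPARQL protocol itself.
    """
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def bindings(results_json, var="artist"):
    """Extract one variable's values from parsed SPARQL 1.1 JSON results."""
    return [
        row[var]["value"]
        for row in results_json["results"]["bindings"]
        if var in row
    ]
```

Fetching the URL (with `urllib.request` or similar), running `json.loads` on the body and passing it to `bindings` would give the list of DBpedia artist URIs that share the type.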
There's tons of reasonably well-structured data about music artists all over the web. Unfortunately, these various data resources are not always easy to link up. How do we know some webpage is talking about James Brown "the Godfather of Soul" and not James Brown the drummer for the post-rock group Veil Maker? Disambiguating all this data is really the crux of what CatfishSmooth does, and CatfishSmooth by no means does it alone.
MusicBrainz is the ultimate resource for music artist/album/track disambiguation. A unique MusicBrainz id (mbid) is created for each artist, album, and track. A wide variety of metadata is maintained by an active community of users, ensuring a high level of accuracy and a wide breadth of coverage. In a sense, we treat MusicBrainz as the center of the musical universe. With an mbid it is a simple matter to find information from Last.fm (which uses mbids directly) or the Echonest (which supports mbids through the Rosetta Stone project).
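To make the "mbid as hub" idea concrete, here's a small Python sketch that turns one mbid into lookups for the other services. The Last.fm URL uses the `artist.getinfo` method with its `mbid` parameter; the `musicbrainz:artist:` prefix is how we recall Rosetta Stone ids being written, and the Echonest API has since been retired, so treat that part as historical and unverified. All function names are hypothetical.

```python
import uuid
from urllib.parse import urlencode

LASTFM_API = "https://ws.audioscrobbler.com/2.0/"


def normalise_mbid(mbid):
    """mbids are UUIDs; return the lowercase hyphenated form or raise ValueError."""
    return str(uuid.UUID(mbid))


def lastfm_artist_info_url(mbid, api_key):
    """Build a Last.fm artist.getinfo request keyed directly by mbid."""
    params = {
        "method": "artist.getinfo",
        "mbid": normalise_mbid(mbid),
        "api_key": api_key,
        "format": "json",
    }
    return LASTFM_API + "?" + urlencode(params)


def echonest_rosetta_id(mbid):
    """Rosetta Stone-style foreign id (assumed format; the Echonest API is retired)."""
    return "musicbrainz:artist:" + normalise_mbid(mbid)
```

Validating the mbid up front means a malformed id fails immediately with a `ValueError` rather than as a confusing empty response from a remote API.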
MusicBrainz also serves as our entry point to the world of linked data. The BBC uses mbids for its music artist pages (which are awesomely published as linked data). We use the amazing sameAs.org service to go from mbid-based BBC URIs to DBpedia URIs. With DBpedia URIs we're plugged into all the knowledge of Wikipedia. Most of our lists come from here, although some come from the DBTune.org/musicbrainz resource.
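The hop from MusicBrainz to DBpedia can be sketched as three small steps: build the BBC URI for an mbid, ask sameAs.org for its equivalents, and keep the DBpedia one. The BBC URI pattern (`http://www.bbc.co.uk/music/artists/<mbid>#artist`) and the sameAs.org `json?uri=` endpoint are assumptions about how those services were laid out, and the helper names are hypothetical; the HTTP call and response parsing are left out so as not to hard-code a JSON layout we can't confirm.

```python
from urllib.parse import urlencode


def bbc_artist_uri(mbid):
    """Assumed pattern for a BBC /music artist resource keyed by mbid."""
    return f"http://www.bbc.co.uk/music/artists/{mbid}#artist"


def sameas_lookup_url(uri):
    """Assumed sameAs.org JSON lookup URL for one URI."""
    return "http://sameas.org/json?" + urlencode({"uri": uri})


def pick_dbpedia(equivalent_uris):
    """Return the first DBpedia resource URI in a bundle of equivalents, or None."""
    for u in equivalent_uris:
        if u.startswith(("http://dbpedia.org/resource/", "https://dbpedia.org/resource/")):
            return u
    return None
```

Given the equivalence bundle for a BBC artist URI, `pick_dbpedia` yields something like `http://dbpedia.org/resource/James_Brown`, which is the URI the SPARQL queries then run against.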
With this matching done, we fire off a slew of API calls and SPARQL queries to generate the different elements of the page. CatfishSmooth currently has absolutely no local data store. Everything is "in the cloud" (oh, I hate myself for writing that, but it's true). We've sort of created the views and the controllers of an MVC web application; the model is RDF (and some web APIs), so we can just plug in to linked data. Future incarnations will likely be backed by a triple store that aggregates and collects new data from users.
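The "slew of API calls" with no local store fits a simple fan-out pattern. The sketch below assumes each data source is wrapped in a hypothetical fetcher function that takes an artist URI; it runs them concurrently with Python's `concurrent.futures` and collects the results by name. The post doesn't say whether CatfishSmooth made its calls concurrently, and the 10-second per-source timeout is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def build_page(artist_uri, fetchers, max_workers=8, timeout=10):
    """Run every fetcher concurrently and return {name: result}.

    `fetchers` is a hypothetical dict of name -> callable(artist_uri). A fetcher that
    raises, or doesn't finish within `timeout` seconds of being awaited, is recorded
    as None. Note: leaving the `with` block still waits for straggling threads.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, artist_uri) for name, fn in fetchers.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=timeout)
            except Exception:
                results[name] = None
    return results
```

One design choice here that the post doesn't claim for CatfishSmooth: a broken source turns into `None`, so a single failing API leaves a gap on the page rather than a stack trace.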
CatfishSmooth is far from a finished work, but I was keen to release it in its current state because (1) I thought the crowd at SF Music Hack Day might take a liking to it, (2) I think it's pretty fun to use as-is, (3) I'm not sure how much time I will have to continue working on it in the very near future. Planned future improvements include:

  • better media support - more YouTube videos, play.me, Playdar, and/or Spotify support
  • semantic playlist building - another idea that probably warrants an entirely new post
  • more links! - from BBC, MySpace, Discogs, and others
One final note: CatfishSmooth is a bit of an experiment. It is prone to crash, and you will probably see the Python stack trace when it does. I also need to set up a bug-reporting system. So take it for what it's worth. You'll also find it is not so useful for "long tail" artists - the connections are really only found for artists that have detailed MusicBrainz and Wikipedia entries. But hopefully this thing is still fun and useful. I would love to get any ideas or feedback.
