kurtis random blog


...my blog about music tech, linked data, misc... by kurt jacobson

Enter HeVeA

Fri, 01/22/2010 - 23:32

In my last post, we discussed problems with converting LaTeX documents into HTML. I briefly described DocBook, the most compelling alternative to LaTeX I found. I got a lot of responses re: DocBook from people telling me I was crazy and describing the horrible experiences they'd had with it. But I was undaunted, and I forged ahead. I have converted my ISMIR 2009 paper into DocBook and then to HTML. I did this in about half a day, mostly converting LaTeX commands to DocBook XML by hand using the Emacs nXML mode.
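For what it's worth, the most mechanical part of that hand conversion could probably be scripted. Below is a minimal Python sketch, not what I actually used: it assumes flat LaTeX arguments (no nested braces) and handles only a hypothetical handful of commands, mapping \emph and \textit to <emphasis>, \textbf to <emphasis role="bold">, \texttt to <literal>, and \cite{a,b} to one <citation> per key.

```python
import re

# Hypothetical subset of inline LaTeX commands and their DocBook equivalents.
# Only flat arguments are handled: \emph{a {b} c} will not match.
SIMPLE_COMMANDS = {
    "emph": "<emphasis>{}</emphasis>",
    "textit": "<emphasis>{}</emphasis>",
    "textbf": '<emphasis role="bold">{}</emphasis>',
    "texttt": "<literal>{}</literal>",
}


def _citations(match):
    # \cite{a, b} -> <citation>a</citation><citation>b</citation>
    keys = [key.strip() for key in match.group(1).split(",") if key.strip()]
    return "".join(f"<citation>{key}</citation>" for key in keys)


def latex_to_docbook(text):
    """Rewrite a few brace-delimited LaTeX commands as DocBook inline markup."""
    for command, template in SIMPLE_COMMANDS.items():
        pattern = re.compile(r"\\" + command + r"\{([^{}]*)\}")
        text = pattern.sub(lambda m, t=template: t.format(m.group(1)), text)
    return re.sub(r"\\cite\{([^{}]*)\}", _citations, text)
```

Anything it doesn't recognise is left untouched, so the output still needs a pass in nXML mode to catch leftovers and validation errors.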
You can view the results at http://docs.kurtisrandom.com/ismir2009-docbook/
Or you can download a tarball with my source as well. I applied the most basic docbook-xsl transform. On Ubuntu, you can apt-get docbook-xsl, and MacPorts seems to include this stuff as well. I used xsltproc to apply the transform with some arguments:

xsltproc --output index.html \
--stringparam bibliography.numbered 1 \
--stringparam bibliography.collection ./bib.xml \
/usr/share/xml/docbook/stylesheet/nwalsh/html/docbook.xsl \
db-ismir09.xml

We are using --stringparam to pass some optional parameters to docbook.xsl: bibliography.numbered 1 gives a numbered bibliography, and bibliography.collection points to the external bibliography file. A complete reference list of these parameters is provided on the DocBook SourceForge page.
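If you end up re-running the transform a lot, it may be handy to keep the parameters in one place. Here is a small Python sketch that builds the same xsltproc argument list from a dict. The function names are my own (hypothetical), and the stylesheet path is the Ubuntu one from the command above; under MacPorts it will be somewhere else.

```python
import subprocess

# Ubuntu location of the stock HTML stylesheet (from the command above);
# MacPorts and other installs will put it somewhere else.
DOCBOOK_HTML_XSL = "/usr/share/xml/docbook/stylesheet/nwalsh/html/docbook.xsl"


def build_xsltproc_command(source, output, params, stylesheet=DOCBOOK_HTML_XSL):
    """Build an xsltproc argument list, one --stringparam per dict entry."""
    command = ["xsltproc", "--output", output]
    for name, value in params.items():
        command += ["--stringparam", name, str(value)]
    return command + [stylesheet, source]


def run_xsltproc(command):
    """Run the transform; raises CalledProcessError if xsltproc fails."""
    subprocess.run(command, check=True)
```

Calling build_xsltproc_command("db-ismir09.xml", "index.html", {"bibliography.numbered": 1, "bibliography.collection": "./bib.xml"}) should give back the same command as above, split into a list, which run_xsltproc can then execute (assuming xsltproc is on your PATH).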
You may notice that not everything went according to plan. I had to spend a lot of time converting the bibliography by hand - the tools I found to do this didn't really work. So in the end, I left the bibliography a bit of a mess, with some left-over LaTeX in there.
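For the record, here is roughly what that bibliography conversion has to do, as a rough Python sketch rather than a working tool: read a single BibTeX entry and emit a DocBook 4-style biblioentry whose id matches the citation key (which, as I understand it, is how bibliography.collection matches entries). The parser is a deliberately naive assumption (one entry, field = {value} pairs, no nested braces), so fields with accents or other leftover LaTeX markup inside braces are silently skipped, which is exactly where things got messy for me.

```python
import re
import xml.etree.ElementTree as ET

# Deliberately naive BibTeX reading: one entry, field = {value} pairs,
# no nested braces (fields containing {\"o}-style markup are skipped).
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.S)
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bibtex_entry(text):
    """Return (entry type, citation key, {field: value}) for one entry."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError("not a single BibTeX entry")
    entry_type, key, body = match.groups()
    fields = {name.lower(): value.strip() for name, value in FIELD_RE.findall(body)}
    return entry_type.lower(), key, fields


def split_name(name):
    """Split 'Last, First' or 'First Last' into (first, last)."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        first, last = " ".join(parts[:-1]), parts[-1]
    return first, last


def to_biblioentry(text):
    """Convert one BibTeX entry into a DocBook 4-style <biblioentry> string."""
    _, key, fields = parse_bibtex_entry(text)
    entry = ET.Element("biblioentry", id=key)  # id must match the citation key
    authors = [n.strip() for n in fields.get("author", "").split(" and ") if n.strip()]
    if authors:
        group = ET.SubElement(entry, "authorgroup")
        for name in authors:
            first, last = split_name(name)
            author = ET.SubElement(group, "author")
            if first:
                ET.SubElement(author, "firstname").text = first
            ET.SubElement(author, "surname").text = last
    if "title" in fields:
        ET.SubElement(entry, "title").text = fields["title"]
    if "year" in fields:
        ET.SubElement(entry, "pubdate").text = fields["year"]
    return ET.tostring(entry, encoding="unicode")
```

The entries it produces would still need wrapping in a <bibliography> element to make a collection file like bib.xml.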
Furthermore, the PDF version was a bit of a disaster. I used the dblatex utility to generate the PDF as follows:


dblatex --output=db-ismir09.pdf -T simple -L bib.xml db-ismir09.xml


While I was able to get the DocBook toolchain up and working rather quickly, I did have lots of questions. My posts to the mailing list and to the IRC channel were virtually ignored (I had one pleasant reply from a jsmith on IRC pointing me to a master's thesis written in DocBook, which was helpful but includes no citations or bibliography - the thesis is really technical software documentation). This was really off-putting for me and (sigh) has led me back to LaTeX.
Luckily, thanks to my long-time friend and colleague Ben Fields, I discovered HeVeA. This is a more modern and complete LaTeX-to-HTML package that seems very promising. I was able to convert the same ISMIR 2009 paper to HTML in a matter of minutes.
http://docs.kurtisrandom.com/ismir2009-hevea/
This actually looks much better than the DocBook version IMHO. The only major problem I see is that the URLs in my LaTeX don't seem to automagically become hrefs in the HTML as I had hoped. There was also some funkiness with the images that I had to fix by hand. But this package seems to be well documented and I am optimistic.
In addition to HeVeA, Ben suggested TtH, which I had tried before but couldn't get working (probably user error, to be fair). But I hope to give it a real try again this week.
So in conclusion, in my brief experience DocBook is not as horrible as everybody said. But the DocBook user community seemed to give me the cold shoulder (sorry I'm a lamer noob who has questions). It seems using DocBook instead of LaTeX would, in the end, create a lot more problems than it would solve. I still like the idea of DocBook - it seems so much more "future proof" than LaTeX - but I'm going to have to stick to the beaten path on this one. Looks like LaTeX wins.

UPDATE:

I've managed to use TtH to create an HTML version of the same paper.

http://docs.kurtisrandom.com/ismir2009-tth/

Looks pretty good, maybe the best yet, but not sure what it's doing with the figures - seems to create a link to the image instead of displaying it with an appropriate <img> tag. Not bad at all, but it seems HeVeA allows more control and is better documented.

Beyond TeX? Discovering DocBooks

Thu, 01/21/2010 - 14:01

As you may or may not know, it is about time for me to start writing my PhD thesis. In preparation for writing what will be the longest and perhaps most important document I've ever written, I've been exploring what document authoring tools are available.
Now most PhD students would simply use LaTeX and get on with it. LaTeX is a document markup language front end for the 30-year-old typesetting system known as TeX. The intent of these tools has always been to allow anyone to create good-looking printed documents. And these TeX-based tools perform this task rather well despite their ancient syntax and tedious compilation workflow. But while LaTeX, ConTeXt, and other tools based on TeX are definitely compelling in their handling of reference citations and typesetting for print, I find they fall short in one very important respect - the Web. Compiling LaTeX documents into HTML is a cumbersome and error-prone process. By far the best tool I have found for rendering LaTeX documents for the Web is TeX4ht. But this tool is not without its quirks and, out of the box, it does not handle some very common LaTeX commands, including \footnote and \mbox. In my brief experiments I've been unable to render anything but the most basic LaTeX documents as HTML using TeX4ht. Perhaps there is a helpful guide on this topic I am missing...
My thesis is about music and the Web. Therefore, I want to write my thesis to be published on the Web. For me, Web publishing is not an afterthought - it is a priority. My experience with TeX4ht has left me searching for a solution not based on LaTeX, despite a lengthy round of humorous taunting from my peers.
I have gotten some interesting suggestions regarding LaTeX alternatives. Fellow Linked Data enthusiast Tom Morris suggested LyX (with some reservations) - however, it is basically a GUI on top of LaTeX/TeX and does not really allow me to write for the Web. There were also some murmurings of LyX being a bit buggy and not well suited to creating a very large document. Other suggestions were generally some TeX derivative (e.g. ConTeXt), some software that makes LaTeX authoring less painful (e.g. RecSys heavyweight Xavier Amatriain suggested Kile, which is quite good, but LaTeX pain was never really my problem anyway), or some alternative document conversion solution (e.g. Pandoc).
Finally, Chris Lowis from the BBC's Web team suggested DocBook. DocBook was one of those things I had heard about vaguely long ago but never really grokked. So I did a bit of research, starting with the Wikipedia page on DocBook. Put simply, DocBook is an XML-based (or SGML-based) semantic markup language for writing documentation. It is "presentation neutral", meaning that the same source can easily be converted to HTML, XHTML, or PDF. Using an XSLT stylesheet, you can convert DocBook source into just about anything. In fact, it seems the preferred method for creating PDFs actually involves a DocBook->LaTeX->PDF conversion. I've had some experience writing and applying XSLT to create RDF/XML from some proprietary XML schema. I found XSLT to be relatively easy - much less confounding than TeX style files anyway.
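To make the "presentation neutral" idea a bit more concrete: the stylesheet walks the DocBook tree and decides how each element should look in the target format. XSLT isn't in Python's standard library, so the sketch below swaps in a plain recursive ElementTree walk that does the same kind of renaming for a hypothetical handful of DocBook 4 elements (para to p, emphasis to em, ulink to a, and so on). The real docbook-xsl stylesheets handle far more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical element mapping; anything unknown becomes a <span>.
DOCBOOK_TO_HTML = {
    "article": "div",
    "section": "div",
    "title": "h2",
    "para": "p",
    "emphasis": "em",
    "ulink": "a",
}


def transform(node):
    """Recursively copy a DocBook element into HTML, renaming tags as we go."""
    html = ET.Element(DOCBOOK_TO_HTML.get(node.tag, "span"))
    if node.tag == "ulink":
        html.set("href", node.get("url", ""))
    html.text = node.text
    for child in node:
        html.append(transform(child))
    html.tail = node.tail  # keep the text that follows the element
    return html


def docbook_to_html(source):
    """Parse a DocBook 4 (no namespace) string and return an HTML string."""
    return ET.tostring(transform(ET.fromstring(source)), encoding="unicode")
```

Swapping in a different mapping table (say, DocBook to some other XML vocabulary) without touching the source document is the whole point of the "presentation neutral" design.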
So the great experiment begins. I will be trying out DocBook over the next few days (weeks?). So far I've found writing DocBook source with Emacs (or Aquamacs on OS X) in nXML mode to be a joy - I've got autocomplete and validation right there as I type. Tonight I will attempt to do some rendering to PDF. Depending on how painful this process proves... well, we'll see.
Will Kurt fall in love with DocBooks? Will he run back to LaTeX in tears begging for forgiveness? Will he complete his thesis on time? Stay tuned for the exciting conclusion!
UPDATE: Had some early, not-so-encouraging but hilarious responses re: DocBook.
From IRC: "[...]: I chuckled at your thing about latex and docbook just now. I wrote the ------ user manual entirely in docbook and it was the most painful experience imaginable. I would never, ever, ever, ever attempt that again unless I had a nice user interface to make all the xml invisible to me" -(some badass developer)
"[...]: docbook is terrible. we used docbook for iteration 1 of the ----- manual, too. it was almost impossible to assemble a toolchain to compile it." -(some guy with long hair)
From Twitter: "@kurtjx Dude, if you're considering DocBook to get around LaTeX authoring woes... UR DOING IT RONG." -(another guy with long hair)

Echonest Artist Graph

Thu, 11/26/2009 - 12:10

I've just recently had a chance to mess around with the Echonest data set that Paul Lamere and Justin Donaldson presented in their tutorial on Music Visualization at ISMIR 2009. Unfortunately I wasn't able to attend the tutorial because we were in the other room talking about Linked Data and music. But I'm a big fan of music viz. I wanted to do a "remote hack" using the data set for Boston Music Hackday à la Adam Lindsay's Remix Shell, but alas the weekend got away from me.


But finally I made some pretty pictures.



I suppose first we had better discuss the data set. This is a "small" dump from the Echonest API that includes about 70k artists and their connections to "similar artists" as decided by Echonest (which in my experience does a pretty good job). The data set also includes Echonest's hotttness and familiarity values for each artist.
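The data boils down to an artist-to-similar-artists mapping. Assuming it has been loaded into a plain Python dict of lists (the dump's actual file format isn't shown here, and the artist names in the check below are placeholders), turning it into an undirected graph mostly means collapsing A->B and B->A into a single edge. This is a minimal sketch, not the code behind the picture:

```python
def build_similarity_graph(similar):
    """Turn {artist: [similar artists]} into an undirected adjacency map.

    A->B and B->A collapse into one edge, self-links are dropped, and
    artists with no similar artists still appear as isolated nodes.
    """
    adjacency = {}
    for artist, others in similar.items():
        adjacency.setdefault(artist, set())
        for other in others:
            if other == artist:
                continue
            adjacency[artist].add(other)
            adjacency.setdefault(other, set()).add(artist)
    return adjacency


def edge_count(adjacency):
    # every undirected edge is stored once in each endpoint's set
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2
```

Storing neighbours as sets makes the duplicate-edge handling free, at the cost of some memory for ~70k artists.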

Now about the picture. We take the largest connected component (it turns out that includes about 54k artists) and create an undirected graph where each artist is a node and each similarity relation is an edge. We use the awesome igraph library to handle the graph structure. Luckily igraph includes some nice graph layout options, including the super-fast Distributed Recursive Layout algorithm (DrL), which we use here. This is similar to what Tamas did for the Last.fm graph. Also note that the color corresponds to the "hotness" - hotter artists are more red - and the size corresponds to "familiarity" - the more familiar artists are larger.
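igraph does the heavy lifting here, but the "largest connected component" step is simple enough to sketch in plain Python: a breadth-first search from every unvisited node, keeping the biggest component found. It assumes an adjacency dict where every neighbour is also a key. The second function is a hypothetical version of the styling: it assumes hotttness and familiarity are scores in [0, 1], and the blue-to-red ramp and size range are made up for illustration. The DrL layout itself is left to igraph.

```python
from collections import deque


def largest_component(adjacency):
    """Return the node set of the largest connected component (BFS)."""
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best


def node_style(hotttness, familiarity, min_size=2.0, max_size=20.0):
    """Map scores in [0, 1] to an RGB colour (redder = hotter) and a size."""
    h = min(max(hotttness, 0.0), 1.0)
    f = min(max(familiarity, 0.0), 1.0)
    colour = (int(255 * h), 0, int(255 * (1 - h)))  # blue -> red ramp
    size = min_size + f * (max_size - min_size)  # linear in familiarity
    return colour, size
```

Each node is visited once and each edge checked twice, so the search stays linear in the size of the graph even at ~54k artists.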

Of course it would be nice to see some artist names and some more interactivity, but hopefully that's coming soon along with some interesting details about the structure of the Echonest artist network. But now it's turkey time :-)
update: We've got names on the artist nodes now. You can zoom in using the incredible embedded Seadragon widget above!
Note you can embed the above interface with this code:

<script src="http://seadragon.com/embed/gwo.js?width=auto&height=400px"></script>

