RDFa and Drupal

Last year, I wrote a blog post titled Drupal, the semantic web and search that outlined how search engines like Google and Yahoo! are getting increasingly hungry for structured data. It is no surprise, because if they could build a global, vertical search engine that, say, searches all products online, or one that searches all job applications online, they could disintermediate many existing companies.

More exciting to me is how search engines can help bootstrap the semantic web as they build out these vertical search engines, and the role that content management systems like Drupal get to play in this. Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics. Unfortunately, that structure is hidden deep in Drupal's database and doesn't surface to the HTML code generated by Drupal.

In my DrupalCon Boston keynote presentation last year, I laid down the challenge that we need to put fields in core and make them first class citizens. Once fields are thus empowered, they can be associated with rich, semantic meta-data that Drupal could output in its XHTML as RDFa. For example, say we have an HTML textfield that captures a number, and that we assign it an RDF property of 'price'. Semantic search engines then recognize it as a 'price' field. Add fields for 'shipping cost', 'weight', 'color' (and/or any number of others) and the possibilities become very exciting. I envisioned a Drupal core CCK with the power to do just that.
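To make the 'price' example concrete, here is a minimal sketch of what attaching RDF properties to fields and emitting them as RDFa might look like. It is written in Python purely for illustration and is not how Drupal or CCK implements anything; the `FIELD_PROPERTIES` mapping, the `ex:` prefix, its `http://example.com/vocab#` namespace and both function names are hypothetical.

```python
from html import escape

# Hypothetical mapping from field names to RDF properties. The "ex:" prefix
# and its namespace are placeholders, not a real vocabulary.
FIELD_PROPERTIES = {
    "price": "ex:price",
    "shipping_cost": "ex:shippingCost",
    "weight": "ex:weight",
    "color": "ex:color",
}


def render_field(name, value):
    """Wrap a field value in a span, adding an RDFa `property` attribute if one is mapped."""
    text = escape(str(value))
    prop = FIELD_PROPERTIES.get(name)
    if prop is None:
        # No RDF property assigned: output plain markup with no semantics.
        return f"<span>{text}</span>"
    return f'<span property="{escape(prop)}">{text}</span>'


def render_item(about, fields):
    """Render several fields inside one element that describes the resource `about`."""
    body = "\n".join("  " + render_field(n, v) for n, v in fields.items())
    return (
        f'<div xmlns:ex="http://example.com/vocab#" about="{escape(about)}">\n'
        f"{body}\n"
        "</div>"
    )


if __name__ == "__main__":
    print(render_item("/node/1", {"price": "19.99", "color": "red"}))
```

A field without a mapped property is rendered as plain markup, so the semantics stay opt-in per field. A crawler that understands RDFa could then read the `property` attributes instead of guessing which number on the page is the price.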

In the year since Boston, the Drupal community has built exactly what I asked for. I was planning to show a video of their work in my keynote presentation at DrupalCon DC earlier this month. Unfortunately, I ran out of time before I could show it. However, it was shown in the "Semantic Web and Why You Should Care" session, and today Stephane Corlosquet posted all the details in the semantic web group on drupal.org. The video paints a picture of what is possible with today's Drupal technology, and also of what will hopefully become possible with Drupal core at some point. The prototypes in this video were built using contributed modules for Drupal 6. However, since last year, we have fields in core and we've already begun putting some RDFa in core, too.

Ben Lavender produced the screencast, Josh Huckabee built the Exhibit view and Stephane Corlosquet built the SearchMonkey applications and the social network site. Other people who helped include Axel Polleres and Andreas Harth (creator of VisiNav). The work on both this video and the featured modules has been sponsored by DERI Galway, Harvard IIC and OpenBand.


Matt Farina (not verified):

RDFa adds weight to a page to make it more readable to machines. But extra weight on a page causes it to load a little slower.

This leaves me wondering if RDFa should have an on/off switch, and if RDF with an auto-discovery link is better.

Cindy (not verified):

This is very exciting and in line with a project that I am aware of. It is described here:


They say "This project proposes to develop the technology to facilitate remixing of content between WikiEducator and the Connexions authoring platforms."

Do you see RDFa and Drupal in line for "remixing"? Could this project use something like Drupal as a prototype?

vango (not verified):

I'd also like to see the same interface whether you are mapping RDF terms from attributes in core, contributed modules, CCK fields or whatever.