Apr. 26, 2010

Posted in Facebook | 2 comments

Facebook’s Open Graph Still Faces Semantic Web Hurdles

Geek level: Fairly technical. Aimed at web developers and security researchers.

In the wake of last week’s Facebook announcements, people have begun dissecting more of the technical details involved and adding various critiques. One point of discussion has been Facebook’s use of the buzzword “open,” with some observers feeling the description masks certain negative aspects of the new Open Graph.

But amid all the debate about openness, critics and supporters alike seem at times to inadvertently conflate three different (albeit related) technologies. First, the Open Graph Protocol defines a structure for website authors to provide certain bits of metadata (such as title, type, description, location, etc.) about their pages. Second, Facebook is expanding their “social graph” concept by building a database of connections among people, brands, groups, etc. The label “Open Graph” has been variously applied to this new map. Finally, the social networking site has introduced new methods for accessing these stored connections as part of their Graph API.

From a technical perspective, each of these offers great potential. But as they are currently being implemented, they still face difficulties that may hinder Facebook’s vision of the Semantic Web. In fact, while Facebook may have brought certain Semantic Web ideas to a more mainstream audience, they have not addressed some of the issues that have stymied advocates of similar technologies – including criticisms found in Cory Doctorow’s famous “Metacrap” essay from 2001. But first, I think it worthwhile to explore some of the details of Facebook’s three new components.

According to the spec’s website, the Open Graph Protocol is an RDFa vocabulary created by Facebook, though “inspired by” a few other related specs. Four properties are required for every OGP-enabled page, providing a title, type, image, and canonical URI. Optional fields include a description, a site name, location data, certain product codes, and contact information. Since OGP uses RDFa, each of these properties is specified via “meta” tags in the page’s “head” element.
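
To make the required fields concrete, here is a minimal sketch (mine, not code from Facebook or the spec) of how a consumer might pull og:* properties out of a page’s head using Python’s standard html.parser module. The sample markup reuses the IMDb “The Rock” page that comes up again below, with a placeholder image URL.

```python
# Minimal sketch: extract Open Graph Protocol properties declared as
# <meta property="og:..." content="..."> tags in a page's <head>.
from html.parser import HTMLParser


class OGPExtractor(HTMLParser):
    """Collects og:* meta properties from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:"):
            self.properties[prop] = attrs.get("content", "")


sample_page = """
<html><head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:image" content="http://example.com/rock.jpg" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
</head><body></body></html>
"""

parser = OGPExtractor()
parser.feed(sample_page)
print(parser.properties)

# The four properties the spec requires on every OGP-enabled page:
missing = {"og:title", "og:type", "og:image", "og:url"} - parser.properties.keys()
print("Missing required properties:", missing or "none")
```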

Anyone is free to implement OGP in their pages or consume it with their services, as the technology is published under the Open Web Foundation Agreement 0.9. In that sense, the spec is certainly “open,” though some seem disappointed that the label is applied to a vocabulary apparently developed privately by one company without feedback from others. While Facebook does note the already published standards they drew on for inspiration, OGP at times seems to be reinventing the wheel a bit. (Update: One reader pointed out to me that Facebook’s approach uses RDFa to specify data in a separate namespace, so my criticism may have been unjustified.) For instance, the HTML spec has always included a way to specify a page’s description via a “meta” tag – a feature many abused in the past to improve search rankings.

Facebook will not be immune to such abuse in their new namespace for metadata. Doctorow’s first problem with “meta-utopia” was that people lie. In my testing thus far, the OGP properties of title, canonical URI, and site name are essentially arbitrary. This means that not only can page authors add “like” buttons for other pages, they can also add false metadata that produces deceptive feed stories. For instance, a feed story may say that a user “liked The Rock on IMDb” when the story’s links actually point to a malware host. If Facebook wants to build a semantic search engine, they will still have to deal with old black hat SEO tricks.
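
As a rough illustration of the kind of check a semantic indexer might need, here is a short sketch (my own idea, not something Facebook is known to do) that flags a page whose declared og:url names a different host than the one the page was actually fetched from. Both URLs below are made up.

```python
# Sketch of one sanity check against spoofed metadata: does og:url
# claim a host other than the one actually serving the page?
from urllib.parse import urlparse


def ogp_url_mismatch(fetched_url, declared_og_url):
    """Return True when og:url points at a different host than the page's real one."""
    fetched_host = (urlparse(fetched_url).hostname or "").lower()
    declared_host = (urlparse(declared_og_url).hostname or "").lower()
    return fetched_host != declared_host


# A page served from an arbitrary host while claiming to be an IMDb title page:
print(ogp_url_mismatch(
    "http://attacker.example/fake-rock.html",
    "http://www.imdb.com/title/tt0117500/",
))  # True -- the "liked The Rock on IMDb" story would be misleading
```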

In addition to OGP properties, Facebook checks pages for an “fb:admins” parameter that sets which Facebook users can administer analytics and information for a given website. Since the site requires no further authentication, I find it a bit disconcerting that a simple XSS hole could give an attacker that much power over a site that integrates heavily with Facebook. I was glad to see that redirection techniques or spoofed metadata did not enable cross-domain application of “fb:admins”, but I’m still unsure of how some cross-domain (or cross-subdomain) issues will factor into Facebook’s graph technologies.
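
For what it’s worth, here is a small sketch of how a site owner might audit the fb:admins declarations on their own pages. The parser and the user IDs are illustrative, but it shows why any meta tag an attacker manages to inject into the served markup would look just like a legitimate admin grant.

```python
# Sketch: list the Facebook user IDs a page grants admin rights to via
# fb:admins (a comma-separated list of IDs). The IDs here are placeholders.
from html.parser import HTMLParser


class AdminAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.admin_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "fb:admins":
            self.admin_ids.extend(
                uid.strip() for uid in attrs.get("content", "").split(",") if uid.strip()
            )


head = '<head><meta property="fb:admins" content="100000000001,100000000002" /></head>'
audit = AdminAudit()
audit.feed(head)
print(audit.admin_ids)  # ['100000000001', '100000000002']
# An injected <meta property="fb:admins" content="..."> tag would be
# indistinguishable from these legitimate entries.
```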

Ironically enough, Facebook has yet to add OGP metadata to their own pages, and the new “like” button will not work for pages on facebook.com domains.

While the OGP can help authors describe individual pages, it does not include any way of establishing links between pages. That’s where Facebook’s ambitions become perhaps a little less “open.” The Open Graph of connections between Facebook profiles and OGP-enabled pages is housed on Facebook’s servers. The company does offer many simple ways for other applications to add or access edges of the graph, including the new Graph API. But Facebook is the gatekeeper, and some fear what that control could produce. Also, while Facebook has updated their privacy policy to reflect recent feature changes, their terms of service still include a clause about accessing data using “automated means.” Consequently, I’m still not entirely certain how much of the Open Graph can be automatically replicated.
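
To give a sense of how simple that access is, here is a hedged sketch of reading a single node the way the Graph API worked at launch: an unauthenticated HTTP GET to graph.facebook.com that returns JSON for public objects. The object ID below is a placeholder, and most connection lookups still require an OAuth access token.

```python
# Sketch: fetch one node of the graph as JSON. OBJECT_ID is a placeholder;
# substitute the username or numeric ID of a public object.
import json
from urllib.request import urlopen

OBJECT_ID = "someobject"  # placeholder

with urlopen("https://graph.facebook.com/" + OBJECT_ID) as response:
    node = json.load(response)

print(node.get("id"), node.get("name"))

# Connections hang off the same URL pattern, e.g.
#   https://graph.facebook.com/<id>/likes
# but most of those require an access token -- which is exactly where
# Facebook's gatekeeper role shows up.
```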

Apart from concerns about control, however, the new Open Graph opens many possibilities by providing a set of links between pages and people with far more structure than the hyperlinks crawled by search engines today. But several factors may limit that potential. If sites do not implement OGP metadata in their pages (and that will include a significant percentage for the foreseeable future), Facebook has to infer data from the page. As already noted, data poisoning could become a significant factor. Keeping such a complex database accurate will also require ongoing maintenance, and currently the Open Graph can lead to redundant entries or caching of expired data.
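
The caching issue is easy to picture with a small sketch: once metadata for a URL lands in a cache like the one below, stale values keep being served until the entry’s TTL runs out. The interval and the refetch hook are assumptions for illustration, not Facebook’s actual re-scrape policy.

```python
# Sketch of the staleness problem: cached OGP metadata is served as-is
# until an assumed TTL expires, even if the page has since changed.
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # assumed re-crawl interval
_cache = {}  # url -> (fetched_at, ogp_properties)


def get_ogp(url, fetch_live):
    """Return cached OGP properties for url, refetching only after the TTL expires."""
    now = time.time()
    entry = _cache.get(url)
    if entry and now - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]  # possibly out of date, but still what gets served
    properties = fetch_live(url)  # e.g. fetch the page and run the extractor above
    _cache[url] = (now, properties)
    return properties
```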

If all website authors sought to protect their visitors and provide accurate, structured information on their pages, Facebook’s Open Graph would be a fairly certain success – but then again, it may not even be needed in that case. Meanwhile, since indexing online content means accounting for a range of errors and attacks, Facebook will still have to address the basic problems encountered by past implementations of Semantic Web ideas. The company’s vision for mapping connections is ambitious, but plenty of work still remains.

  1. I disagree that this article is too geeky or technical. It is, however, conceptual – and you do a great job at outlining these three distinct ideas of the Open Graph.

    I’m no developer but try to follow these developments as closely as possible. Reading this, I see an easily implemented but limited add-on for websites, a strong improvement for Facebook’s databases, and a restricted API developers may be able to tap into.

    Overall, I see a lot of truth in some people’s concerns that this is Facebook taking over the web and benefiting from it hugely.

  2. I am really looking forward to the dream of Tim Berners-Lee: the Semantic Web, using data from all the websites across the world.
