Facebook’s Open Graph Still Faces Semantic Web Hurdles
Geek level: Fairly technical. Aimed at web developers and security researchers.
In the wake of last week’s Facebook announcements, people have begun dissecting more of the technical details involved and adding various critiques. One point of discussion has been Facebook’s use of the buzzword “open,” with some observers feeling the description masks certain negative aspects of the new Open Graph.
But amid all the debate about openness, critics and supporters alike seem at times to inadvertently conflate three different (albeit related) technologies. First, the Open Graph Protocol defines a structure for website authors to provide certain bits of metadata (such as title, type, description, and location) about their pages. Second, Facebook is expanding their “social graph” concept by building a database of connections among people, brands, groups, and so on. The label “Open Graph” has been variously applied to this new map. Finally, the social networking site has introduced new methods for accessing these stored connections as part of their Graph API.
From a technical perspective, each of these offers great potential. But as they are currently being implemented, they still face difficulties that may hinder Facebook’s vision of the Semantic Web. In fact, while Facebook may have brought certain Semantic Web ideas to a more mainstream audience, they have not addressed some of the issues that have stymied advocates of similar technologies – including criticisms found in Cory Doctorow’s famous “Metacrap” essay from 2001. But first, I think it worthwhile to explore some of the details of Facebook’s three new components.
According to the spec’s website, the Open Graph Protocol is an RDFa vocabulary created by Facebook, though “inspired by” a few other related specs. Four properties are required for every OGP-enabled page, providing a title, type, image, and canonical URI. Optional fields include a description, a site name, location data, certain product codes, and contact information. Since OGP uses RDFa, each of these properties is specified via a “meta” tag in the page’s “head” element.
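To make the structure concrete, here is a minimal sketch of how a consumer might read these tags, using only Python’s standard library. The property names (og:title, og:type, og:image, og:url) come from the published spec; the class name, helper functions, and sample page are hypothetical and are not meant to describe how Facebook’s own crawler works.

```python
from html.parser import HTMLParser

# The four properties the spec requires on every OGP-enabled page.
REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OGPCollector(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def parse_ogp(html):
    """Return a dict of og:* properties found in the markup."""
    collector = OGPCollector()
    collector.feed(html)
    return collector.properties


def missing_required(properties):
    """List the required properties that the page did not supply."""
    return [name for name in REQUIRED if name not in properties]


# Made-up page for illustration; it deliberately omits og:image.
SAMPLE_PAGE = """
<html>
<head>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://movies.example/the-rock" />
</head>
</html>
"""

props = parse_ogp(SAMPLE_PAGE)
print(missing_required(props))  # ['og:image']
```

Note that nothing in this parsing step can tell whether the values are true; it only checks that they are present.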
Anyone is free to implement OGP in their pages or consume it with their services, as the technology is published under the Open Web Foundation Agreement 0.9. In that sense, the spec is certainly “open,” though some seem disappointed that the label is applied to a vocabulary apparently developed privately by one company without feedback from others. While Facebook does note already published standards they drew on for inspiration, OGP at times seems to be reinventing the wheel a bit. (Update: One reader pointed out to me that Facebook’s approach uses RDFa to specify data in a separate namespace, so my criticism may have been unjustified.) For instance, the HTML spec has always included a way to specify a page’s description via a “meta” tag – a feature many abused in the past to improve search rankings.
Facebook will not be immune to such abuse in their new namespace for metadata. Doctorow’s first problem with “meta-utopia” was that people lie. In my testing thus far, the OGP properties of title, canonical URI, and site name are essentially arbitrary. This means that not only can page authors add “like” buttons for other pages, they can add false metadata that produces deceptive feed stories. For instance, a feed story may say that a user “liked The Rock on IMDb” when the story links actually point to a malware host. If Facebook wants to build a semantic search engine, they will still have to deal with old black hat SEO tricks.
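One simple defense a consumer of this metadata could consider is comparing the host named in the canonical URI against the host that actually served the page. The sketch below is only an illustration of that idea under my own assumptions; the function names and example domains are made up, and I do not know whether Facebook performs any check like it.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Lower-cased hostname of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def claims_foreign_url(page_url, og_url):
    """True when og:url names a host other than the one serving the page.

    Assumption: an exact host match is required. A real check would likely
    need to tolerate legitimate variations such as "www." prefixes.
    """
    return host_of(page_url) != host_of(og_url)


# A page on one host claiming to be a page on another (both domains invented).
print(claims_foreign_url("http://malware.example/landing",
                         "http://movies.example/the-rock"))  # True
```

A check like this would flag the most blatant mismatches, but it does nothing about a false title or site name, which remain arbitrary strings.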
In addition to OGP properties, Facebook checks pages for an “fb:admins” parameter that sets which Facebook users can administer analytics and information for a given website. Since Facebook requires no further authentication, I find it a bit disconcerting that a simple XSS hole could give an attacker so much power over a site that heavily integrates with Facebook. I was glad to see that redirection techniques or spoofed metadata did not enable cross-domain application of “fb:admins”, but I’m still unsure of how some cross-domain (or cross-subdomain) issues will factor into Facebook’s graph technologies.
Ironically enough, Facebook has yet to add OGP metadata to their own pages, and the new “like” button will not work for pages on facebook.com domains.
Apart from concerns about control, however, the new Open Graph opens many possibilities by providing a set of links between pages and people with far more structure than the hyperlinks crawled by search engines today. But several factors may limit those possibilities. If sites do not implement OGP metadata in their pages (and that will include a significant percentage for the foreseeable future), Facebook has to infer data from the page. As already noted, data poisoning could become a significant factor. Running a complex database will also require other kinds of upkeep, and currently the Open Graph can lead to issues of redundancy or caching of expired data.
If all website authors sought to protect their visitors and provide accurate, structured information on their pages, Facebook’s Open Graph would be a fairly certain success – but then again, it may not even be needed in that case. Meanwhile, since we have to take into account a range of problems and attacks when indexing online content, Facebook will still have to address basic problems encountered by past implementations of Semantic Web ideas. The company’s vision for mapping connections is ambitious, but plenty of work still remains.