Joi Ito's conversation with the living web.

There is an interesting discussion going on across various blogs about "The Semantic Web". Russell Beattie starts a discussion about it. Les Orchard makes a nice attempt at explaining it on his blog. According to Les, Tim Berners-Lee talked about it on NPR Science. I thought about it a lot when I was running Infoseek Japan and thinking about how XML would impact search engines. How metadata and the schemas for metadata will evolve is a very interesting topic. It is as much political and social as it is technical. When you run a search engine, there is a constant trade-off between brute-force indexing of information and waiting for the metadata. If the semantic web really existed and you could run the queries in a distributed way, you wouldn't REALLY need search engines, at least in the way they are designed now. What's so exciting about blogs and things like RSS is that the community is pushing the metadatification of the semantic web in a way that no standards body or company could ever have done.


I talked to my XML class a lot last year about this concept. There are some good resources online. One that I like is an article that Tim Berners-Lee co-authored in Scientific American: The Semantic Web.

As a librarian by training, however, I think that the exchange of data among blogs is far from what metadata needs to be in order to supplant search engines. The big problem is lack of what librarians call a "controlled vocabulary" for describing content. RSS doesn't address topical content, only publication data. It's the equivalent of a distributed card catalog.

One of the reasons libraries (and librarians) still exist is that computers are still very very bad at organizing information conceptually. Google is better than most at figuring out what something is really "about," but it's still flawed in that regard.

For example, today I did a search on "blog" in Amazon's database. It found....nothing. (It did ask me if I meant to type "blow", which I found amusing.) Turns out most of the books are on "blogs." But a search on "blogs" doesn't turn up weblog, or blogging.
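To make the "controlled vocabulary" idea concrete, here is a minimal Python sketch. It is not how Amazon or any library catalog actually works; the `PREFERRED` mapping, the function names, and the book titles are all made up for illustration. The point is only that every variant is filed under one preferred heading, so a search on any variant finds them all.

```python
# Hypothetical controlled vocabulary: each variant maps to one preferred heading.
PREFERRED = {
    "blog": "weblogs",
    "blogs": "weblogs",
    "weblog": "weblogs",
    "weblogs": "weblogs",
    "blogging": "weblogs",
}

def normalize(term):
    """Return the preferred heading for a term, or the lowercased term if unknown."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)

def build_index(catalog):
    """Map each preferred heading to the set of titles filed under it."""
    index = {}
    for title, subjects in catalog.items():
        for subject in subjects:
            index.setdefault(normalize(subject), set()).add(title)
    return index

def search(index, query):
    """Look up a query under its preferred heading and return sorted titles."""
    return sorted(index.get(normalize(query), set()))

# Made-up catalog: each title was described with a different variant.
CATALOG = {
    "Book A": ["blogs"],
    "Book B": ["weblog"],
    "Book C": ["blogging"],
    "Book D": ["gardening"],
}
INDEX = build_index(CATALOG)
```

With this in place, `search(INDEX, "blog")` returns all three blog-related titles, even though none of them was described with the exact word "blog". The hard part, of course, is the human work of building and maintaining the mapping, not the lookup.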

Similarly, I recently wanted to see if I could find syllabi for academic courses that were about HTML and/or web design. It's almost impossible to design a query in Google that will retrieve this information. It made me wish for *real* metadata, so that I could search for words only in the "subject" or "keywords" of a page, rather than in the full text.
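Searching only the "keywords" of a page, rather than its full text, might look something like the Python sketch below. It assumes pages carry an honest `<meta name="keywords">` tag, which most do not; the sample pages and the function names `keywords_of` and `matches_subject` are hypothetical.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        content = attrs.get("content")
        if name and content:
            self.meta[name.lower()] = content

def keywords_of(html):
    """Return the page's declared keywords as a set of lowercased phrases."""
    parser = MetaExtractor()
    parser.feed(html)
    raw = parser.meta.get("keywords", "")
    return {k.strip().lower() for k in raw.split(",") if k.strip()}

def matches_subject(html, wanted):
    """True only if the wanted term is a declared keyword, ignoring body text."""
    return wanted.lower() in keywords_of(html)

# Two made-up pages: one is a syllabus, the other merely mentions one.
SYLLABUS = ('<html><head><meta name="keywords" content="syllabus, HTML, web design">'
            '</head><body>Week 1: introductions.</body></html>')
MENTION = ('<html><head><meta name="keywords" content="travel">'
           '</head><body>I once saw a syllabus about HTML.</body></html>')
```

Here `matches_subject(SYLLABUS, "syllabus")` is true while `matches_subject(MENTION, "syllabus")` is false, even though the second page contains the word in its body. A full-text search cannot make that distinction.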

Wish I had answers on this...or more optimism about the organization and description of content. It's a tough problem. But I suppose until it's solved, my MLS skills will give me some job security. :-)

You're right, Liz. The Semantic Web is one more level up from RSS, but RSS will allow people to start working on tools that will get closer and closer to what we want. At least we don't have to "scrape" HTML anymore. When we were building the Infoseek Japan service, we licensed a weird database from Omron called SuperMorph-J. This was a huge dictionary of words and synonyms. We needed this because Japanese does not use spaces between words, so we needed to "break" the words, and because Japanese has four ways to write "computer" and we wanted the search results to show all of them. I think there are similar databases for English that cover plurals and conjugations. Of course, understanding synonyms and being intelligent are completely different, but it's a start. If you could use something like a huge multi-dimensional vector space or a vast library of links to set up meaning relationships... There must be a huge body of academic work in this field already...
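The two jobs described above, "breaking" unspaced text into words and matching different spellings of the same word, can be sketched in Python. This is a toy under stated assumptions, not SuperMorph-J's actual method: it uses greedy longest-match against a tiny made-up `LEXICON`, and the `VARIANTS` group lists three illustrative spellings of "computer" rather than claiming to be the four mentioned.

```python
# Toy lexicon; a real dictionary would hold a great many more entries.
LEXICON = {"検索", "エンジン", "検索エンジン", "の", "設計"}

# Hypothetical variant groups: different spellings treated as one search term.
VARIANTS = [
    {"コンピュータ", "コンピューター", "計算機"},
]

def segment(text, lexicon=LEXICON):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    longest = max(len(word) for word in lexicon)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

def expand(term, groups=VARIANTS):
    """Return every spelling grouped with the term, or just the term itself."""
    for group in groups:
        if term in group:
            return sorted(group)
    return [term]
```

Calling `segment("検索エンジンの設計")` splits the string into "検索エンジン", "の", and "設計", preferring the longer dictionary entry over "検索" alone. Calling `expand` on any one spelling of "computer" returns the whole group, so a query for one form can be run against all of them.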

But I guess what I was trying to get at was that as we struggle to define "blogs" and understand what "IT" is, we are really creating vocabulary and connecting a variety of ideas together. This "meaning" is what is still missing. I think that blogs as a method for coming up with meaning and some sort of method for storing and linking to instances of that meaning could be a "social" thing rather than a more organized thing. Anyway, I'm groping, but it just feels like we have a bunch of tools in our hands that we didn't have before and that maybe they are solutions to some problems we haven't used them on yet...

The idea that came to me is that RSS is bootstrapping the Semantic Web. It's making people think "oh, how convenient it is to have all this in a portable format." They have to see the value in RSS, in RDF, in XML::RPC and SOAP, for things to unify into the next level. Without that vision, people begin to think "Oh yeah, that and flying cars", adding the Semantic Web onto visions of the future that came out of Metropolis and the Pulps.

There is tremendous potential. Metadata would already have worked without XML if it weren't for bad human nature, a.k.a. spammers, ruining it for everyone. Ronald Reagan had a good line for this: "Trust, but verify".

eBay has a model that works. It may have holes, but it works. Some of us that understand XML and metadata and the technical nature of the web have many things we can do to make Joi Ito's dream a reality. We can use metadata and combine it with an *interactive* web to enhance the experience.

Google is testing this with the smiley / frowny faces on their toolbar. Unfortunately, spammers will probably ruin this as well, but Amazon's reviewer system borrowed a page from eBay's manual and it is a successful experiment imho, and Epinions also did a nice job with this...

I'd say that Joi Ito sees the potential and it's true, it's there, and Liz sees the reality... But every achievement begins with a dream. The ball is in our court...

This discussion is above my head but I am striving. I am especially interested in the librarian's point of view since I am working at/with the Library of Congress searching for web-based innovations to help with subject headings (SH). Joi knows that I have been very inspired by Plumbdesign's Thinkmap and their visual thesaurus. Their new version seems to be a step further towards a semantic web. By mousing over any word, you can get its derivatives. And the connections to related meanings appear, just as they should in a printed SH catalogue, but online it is very much more flexible. I wonder if the people who have commented on this, including Joi, have thought about SH as part of the semantic web and how it could be integrated in a cool interactive way. BTW, did any of you play with ?

The whole metadata concept is very interesting. I'm continually disappointed that XML Topic Maps are not generally included in this discussion. My guess is that, because they are subject-based, as opposed to resource-based like RDF, they don't work with the vision.

That brings up a really interesting question: who owns the metadata? Related to this: whose metadata do you trust? My thought is that the user ought to be able to edit the metadata that comes with a resource, but this is only a partially formed idea. My thought is that when users have control over how they associate stuff (see The Brain) then there will be uptake. We can already find stuff on Google.

I'm working slowly to develop a Plumb-like graph interface for XTM, and I'm guessing others are too. There's a lot of work to be done on usable interfaces for this stuff.

