There is an interesting discussion going on across various blogs about "The Semantic Web". Russell Beattie starts a discussion about it, and Les Orchard makes a nice attempt at explaining it on his blog. According to Les, Tim Berners-Lee talked about it on NPR Science. I thought about it a lot when I was running Infoseek Japan and was thinking about how XML would affect search engines. Thinking about how metadata, and schemas for metadata, will evolve is a very interesting topic. It is as much political and social as it is technical. When you run a search engine, there is a constant trade-off between brute-force indexing of information and waiting for the metadata. If the semantic web really existed and you could run queries in a distributed way, you wouldn't REALLY need search engines, at least not in the way they are designed now. What's so exciting about blogs and things like RSS is that the community is pushing the metadatification of the semantic web in a way that no standards body or company could ever have done.
The Semantic Web
Joi
Nov 04, 2002 - 06:12 UTC
6 Comments
1 TrackBack
Listed below are links to blogs that reference this entry: The Semantic Web.
TrackBack URL for this entry: http://joi.ito.com/mt/-touchme.cgi/305
Speaking of links going round-and-round... I followed a few links on the Tara Grubb political weblog trail after reading about...
I talked to my XML class a lot last year about this concept. There are some good resources online. One that I like is an article that Tim Berners-Lee co-authored in Scientific American: The Semantic Web.
As a librarian by training, however, I think that the exchange of data among blogs is far from what metadata needs to be in order to supplant search engines. The big problem is lack of what librarians call a "controlled vocabulary" for describing content. RSS doesn't address topical content, only publication data. It's the equivalent of a distributed card catalog.
One of the reasons libraries (and librarians) still exist is that computers are still very, very bad at organizing information conceptually. Google is better than most at figuring out what something is really "about," but it's still flawed in that regard.
For example, today I searched for "blog" in Amazon's database. It found... nothing. (It did ask whether I meant "blow", which I found amusing.) It turns out most of the books are filed under "blogs." But a search on "blogs" doesn't turn up "weblog" or "blogging."
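The "controlled vocabulary" idea can be sketched in a few lines: every variant a searcher might type is mapped to one preferred heading before the lookup, so "blog", "blogs", "weblog" and "blogging" all retrieve the same records. This is a minimal illustration under stated assumptions: the vocabulary table, the heading "Weblogs" and the catalogue records are hypothetical, not Amazon's data or any real subject-heading list.

```python
# Hypothetical controlled vocabulary: each variant a searcher might type
# maps to one preferred heading. Not taken from any real subject-heading list.
CONTROLLED_VOCABULARY = {
    "blog": "Weblogs",
    "blogs": "Weblogs",
    "weblog": "Weblogs",
    "weblogs": "Weblogs",
    "blogging": "Weblogs",
}

# Hypothetical catalogue records, indexed by preferred heading rather than
# by whatever words happen to appear in the title.
CATALOGUE = [
    {"title": "Weblogs for Beginners", "heading": "Weblogs"},
    {"title": "The Blogging Life", "heading": "Weblogs"},
    {"title": "Introduction to XML", "heading": "XML"},
]


def search_by_heading(query, vocabulary, catalogue):
    """Map the query to its preferred heading, then match on the heading field only."""
    heading = vocabulary.get(query.strip().lower())
    if heading is None:
        return []  # not in the vocabulary: no match, however close the spelling
    return [record["title"] for record in catalogue if record["heading"] == heading]


for term in ("blog", "blogs", "weblog", "blogging", "blow"):
    print(f"{term!r} -> {search_by_heading(term, CONTROLLED_VOCABULARY, CATALOGUE)}")
```

The design choice is that the catalogue never has to guess what a record is "about"; a cataloguer assigned the heading once, and the vocabulary absorbs the spelling variation on the query side.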
Similarly, I recently wanted to see if I could find syllabi for academic courses that were about HTML and/or web design. It's almost impossible to design a query in Google that will retrieve this information. It made me wish for *real* metadata, so that I could search for words only in the "subject" or "keywords" of a page, rather than in the full text.
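The difference between searching the full text and searching only declared metadata fields can be shown with two hypothetical pages. This is a sketch under assumptions: the URLs, the `subject` and `keywords` field names and the page texts are invented for illustration, and real pages rarely carry metadata this clean, which is exactly the complaint above.

```python
# Hypothetical pages: each carries metadata fields plus its full body text.
PAGES = [
    {
        "url": "http://example.edu/courses/web101/syllabus.html",
        "subject": "syllabus",
        "keywords": ["HTML", "web design", "course"],
        "text": "Week 1: introduction to HTML and page layout.",
    },
    {
        "url": "http://example.com/blog/html-tips.html",
        "subject": "tutorial",
        "keywords": ["HTML"],
        "text": "This is not a syllabus, but here are some HTML and web design tips.",
    },
]


def full_text_search(pages, *terms):
    """Match if every term appears anywhere in the body text (case-insensitive)."""
    return [p["url"] for p in pages
            if all(t.lower() in p["text"].lower() for t in terms)]


def field_search(pages, subject, keyword):
    """Match only on declared metadata: the subject field and the keyword list."""
    return [p["url"] for p in pages
            if p["subject"].lower() == subject.lower()
            and keyword.lower() in (k.lower() for k in p["keywords"])]


print("full text:", full_text_search(PAGES, "syllabus", "HTML"))
print("fields:   ", field_search(PAGES, "syllabus", "HTML"))
```

Here the full-text query finds the page that merely mentions the word "syllabus", while the fielded query finds the actual syllabus, whose body never uses the word.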
Wish I had answers on this... or more optimism about how content gets organized and described. It's a tough problem. But I suppose until it's solved, my MLS skills will give me some job security. :-)
You're right, Liz. The Semantic Web is one more level up from RSS, but RSS will allow people to start working on tools that get closer and closer to what we want. At least we don't have to "scrape" HTML anymore. When we were building the Infoseek Japan service, we licensed a weird database from Omron called SuperMorph-J. This was a huge dictionary of words and synonyms. We needed it for two reasons: Japanese does not use spaces between words, so we had to "break" text into words, and Japanese has four ways to write "computer," and we wanted the search results to show all of them. I think there are similar databases for English that handle plurals and conjugations. Of course, understanding synonyms and being intelligent are completely different things, but it's a start. If you could use something like a huge multi-dimensional vector space or a vast library of links to set up meaning relationships... There must be a huge body of academic work in this field already...
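The two jobs described above, breaking unspaced Japanese text into words and folding spelling variants together, can be sketched with a generic greedy longest-match segmenter over a dictionary. This is an assumption-laden illustration: SuperMorph-J's actual contents, algorithm and interface are not described here, the tiny dictionary and variant table are hypothetical, and only one of the variant pairs for "computer" is shown. Production morphological analyzers use considerably more sophisticated methods than greedy matching.

```python
# Toy dictionary standing in for a real morphological dictionary such as
# SuperMorph-J (whose actual contents and interface are not shown here).
DICTIONARY = {"東京", "大学", "東京大学", "の", "研究", "コンピュータ", "コンピューター"}

# Hypothetical variant table: alternative spellings -> one canonical form,
# so a query in either spelling can find documents written with the other.
VARIANTS = {"コンピューター": "コンピュータ"}


def segment(text, dictionary, max_len=8):
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                words.append(candidate)
                i += length
                break
    return words


def normalize(words, variants):
    """Replace each spelling variant with its canonical form."""
    return [variants.get(word, word) for word in words]


tokens = segment("東京大学のコンピューター研究", DICTIONARY)
print(tokens)
print(normalize(tokens, VARIANTS))
```

Indexing the normalized tokens means a query for either spelling of "computer" lands on the same index entry; the English analogue would be a stemmer folding plurals and conjugations.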
But I guess what I was trying to get at is that as we struggle to define "blogs" and understand what "IT" is, we are really creating vocabulary and connecting a variety of ideas together. This "meaning" is what is still missing. I think that blogs, as a way of arriving at meaning, together with some method for storing and linking to instances of that meaning, could make this a "social" process rather than a more formally organized one. Anyway, I'm groping, but it just feels like we have a bunch of tools in our hands that we didn't have before, and maybe they are solutions to some problems we haven't applied them to yet...
The idea that came to me is that RSS is bootstrapping the Semantic Web. It's making people think "oh, how convenient it is to have all this in a portable format." They have to see the value in RSS, in RDF, in XML-RPC and SOAP, for things to unify into the next level. Without that vision, people begin to think "Oh yeah, that and flying cars," filing the Semantic Web alongside the visions of the future that came out of Metropolis and the pulps.
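What "a portable format" buys you over scraping HTML can be seen by reading a feed with a standard XML parser. A minimal sketch, assuming the simple RSS 0.91/2.0 layout without namespaces (RSS 1.0 is RDF-based and namespaced, so it would need namespace-qualified lookups); the feed content below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed in the simple RSS 0.91/2.0 shape (no namespaces).
FEED = """<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <item>
      <title>First entry</title>
      <link>http://example.com/archives/1.html</link>
      <pubDate>Mon, 04 Nov 2002 06:12:00 GMT</pubDate>
    </item>
    <item>
      <title>Second entry</title>
      <link>http://example.com/archives/2.html</link>
      <pubDate>Mon, 04 Nov 2002 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link, pubDate) for every <item>, in document order."""
    root = ET.fromstring(feed_xml)
    return [
        (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        )
        for item in root.iter("item")
    ]


for title, link, date in read_items(FEED):
    print(f"{date}  {title}  <{link}>")
```

Note that every field extracted here is publication data (title, link, date); nothing says what an item is about, which is the "distributed card catalog" limitation raised earlier in the thread.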
There is tremendous potential. Metadata would already have worked, even without XML, if it weren't for bad human nature, a.k.a. spammers, ruining it for everyone. Ronald Reagan had a good line for this: "Trust, but verify."
eBay has a model that works. It may have holes, but it works. Those of us who understand XML, metadata and the technical nature of the web can do many things to make Joi Ito's dream a reality. We can use metadata and combine it with an *interactive* web to enhance the experience.
Google is testing this with the smiley/frowny faces on its toolbar. Unfortunately, spammers will probably ruin this as well, but Amazon's reviewer system borrowed a page from eBay's manual and is a successful experiment, IMHO, and Epinions also did a nice job with this...
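One way to read "Trust, but verify" in code is to weight each person's metadata votes by a reputation earned from other people's feedback, so accounts with a bad record simply stop counting. This is a hypothetical sketch loosely in the spirit of eBay-style feedback; it is not eBay's, Amazon's or Google's actual scoring, and the users, tags and the `min_score` threshold are invented for illustration.

```python
from collections import defaultdict

# Hypothetical feedback log: (rater, ratee, +1 or -1).
FEEDBACK = [
    ("alice", "bob", +1), ("carol", "bob", +1), ("dave", "bob", +1),
    ("alice", "spammer", -1), ("bob", "spammer", -1), ("carol", "spammer", -1),
    ("bob", "alice", +1),
]

# Hypothetical tag votes: (user, tag they want attached to some resource).
VOTES = [
    ("bob", "semantic web"),
    ("alice", "semantic web"),
    ("spammer", "cheap pills"),
    ("spammer", "cheap pills"),
]


def reputation(feedback):
    """Net score per user: positive feedback received minus negative feedback received."""
    scores = defaultdict(int)
    for _rater, ratee, value in feedback:
        scores[ratee] += value
    return dict(scores)


def weighted_tag_votes(votes, scores, min_score=1):
    """Tally tags, weighting each vote by the voter's reputation; ignore low-reputation voters."""
    tally = defaultdict(int)
    for user, tag in votes:
        weight = scores.get(user, 0)
        if weight >= min_score:
            tally[tag] += weight
    return dict(tally)


scores = reputation(FEEDBACK)
print(scores)
print(weighted_tag_votes(VOTES, scores))
```

The obvious hole, which the comment anticipates, is that a spammer can manufacture positive feedback with fake accounts; real systems add further checks on who is allowed to leave feedback.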
I'd say that Joi Ito sees the potential and it's true, it's there, and Liz sees the reality... But every achievement begins with a dream. The ball is in our court...
This discussion is above my head, but I am striving. I am especially interested in the librarian's point of view, since I am working at/with the Library of Congress looking for web-based innovations to help with subject headings (SH). Joi knows that I have been very inspired by Plumb Design's Thinkmap and their Visual Thesaurus. Their new version seems to be a step further toward a semantic web. By mousing over any word, you can see its derivatives, and the connections to related meanings appear, just as they should in a printed SH catalogue; online, though, it is much more flexible. I wonder if the people who have commented on this, including Joi, have thought about SH as part of the semantic web and how it could be integrated in a cool, interactive way. BTW, did any of you play with www.kartoo.com?
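Subject headings become navigable once their cross-references are stored as data. A minimal sketch, assuming the standard thesaurus relations UF (used for), BT (broader term), NT (narrower term) and RT (related term); the headings below are hypothetical and are not actual Library of Congress Subject Headings. The `neighbours` function is roughly what a mouse-over in a visual thesaurus would need to fetch.

```python
# Hypothetical thesaurus fragment (not real LCSH entries), using standard
# thesaurus relations: UF = used for, BT = broader, NT = narrower, RT = related.
THESAURUS = {
    "Weblogs": {
        "UF": ["Blogs", "Web logs"],
        "BT": ["Web sites"],
        "NT": ["Moblogs"],
        "RT": ["Online journalism"],
    },
    "Web sites": {"UF": [], "BT": ["Internet"], "NT": ["Weblogs"], "RT": []},
    "Moblogs": {"UF": [], "BT": ["Weblogs"], "NT": [], "RT": []},
}


def preferred(term, thesaurus):
    """Resolve a term to its preferred heading via the UF references, or None."""
    if term in thesaurus:
        return term
    for heading, relations in thesaurus.items():
        if term in relations.get("UF", []):
            return heading
    return None


def neighbours(term, thesaurus):
    """Broader, narrower and related headings for a term (empty dict if unknown)."""
    heading = preferred(term, thesaurus)
    if heading is None:
        return {}
    relations = thesaurus[heading]
    return {rel: relations.get(rel, []) for rel in ("BT", "NT", "RT")}


print(preferred("Blogs", THESAURUS))
print(neighbours("Blogs", THESAURUS))
```

Because every relation is an explicit edge, the same data could drive either a printed SH listing or an interactive graph where clicking a neighbour re-centres the display.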
The whole metadata concept is very interesting. I'm continually disappointed that XML Topic Maps are not generally included in this discussion. My guess is that because they are subject-based, as opposed to resource-based like RDF, they don't fit the vision.
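The subject-based versus resource-based distinction is easiest to see in the shape of the data. The sketch below is a deliberate simplification, not RDF/XML or XTM 1.0 syntax: RDF-style statements are (subject, predicate, object) triples hanging off a resource URI, while a topic-map-style topic represents the subject itself and points out to resources as occurrences. The page URL is hypothetical; the predicates use the Dublin Core element namespace.

```python
from dataclasses import dataclass, field

# --- Resource-centred (RDF-style) ---------------------------------------
# Statements are (subject, predicate, object) triples about a resource URI.
DC = "http://purl.org/dc/elements/1.1/"
TRIPLES = [
    ("http://example.com/semweb.html", DC + "title", "The Semantic Web"),
    ("http://example.com/semweb.html", DC + "subject", "Semantic Web"),
]


def describe_resource(uri, triples):
    """Everything asserted about one resource, as predicate -> object."""
    return {p: o for s, p, o in triples if s == uri}


# --- Subject-centred (topic-map-style, simplified) ----------------------
@dataclass
class Topic:
    key: str
    names: list
    occurrences: list = field(default_factory=list)  # resources about this subject


@dataclass
class Association:
    assoc_type: str
    roles: dict  # role name -> topic key


TOPICS = {
    "semantic-web": Topic("semantic-web", ["Semantic Web"], ["http://example.com/semweb.html"]),
    "rdf": Topic("rdf", ["RDF", "Resource Description Framework"]),
}
ASSOCIATIONS = [Association("built-on", {"application": "semantic-web", "foundation": "rdf"})]


def describe_subject(key, topics, associations):
    """Names, occurrences and associated topics for one subject."""
    topic = topics[key]
    related = [
        other
        for a in associations
        if key in a.roles.values()
        for other in a.roles.values()
        if other != key
    ]
    return {"names": topic.names, "occurrences": topic.occurrences, "related": related}


print(describe_resource("http://example.com/semweb.html", TRIPLES))
print(describe_subject("semantic-web", TOPICS, ASSOCIATIONS))
```

In the first model you start from a page and ask what is said about it; in the second you start from an idea and ask which names, pages and other ideas hang off it.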
That brings up a really interesting question: who owns the metadata? And, related to this: whose metadata do you trust? My thought is that users ought to be able to edit the metadata that comes with a resource, but this is only a partially formed idea. I suspect that when users have control over how they associate things (see The Brain), there will be uptake. We can already find stuff on Google.
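One way to let users edit the metadata that ships with a resource, without anyone overwriting the publisher's original, is to keep the two in separate layers and look fields up user-first. A minimal sketch using Python's `collections.ChainMap`; the field names and values are hypothetical. Reversing the layer order is one crude way to express "whose metadata do you trust".

```python
from collections import ChainMap

# Hypothetical metadata shipped with a resource by its publisher.
publisher_metadata = {
    "title": "Untitled",
    "subject": "misc",
    "keywords": ("blogs",),
}

# The reader's own edits live in a separate layer that they control.
user_metadata = {}

# Lookups check the user's layer first, then fall back to the publisher's.
view = ChainMap(user_metadata, publisher_metadata)

# Writes to a ChainMap go to its first mapping, i.e. the user's layer,
# so the publisher's original values are never modified.
view["subject"] = "semantic web"

print(view["subject"], "|", view["title"])
print("publisher still says:", publisher_metadata["subject"])

# A reader who trusts the publisher over themselves just flips the order.
publisher_first = ChainMap(publisher_metadata, user_metadata)
print("publisher-first view:", publisher_first["subject"])
```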
I'm working slowly to develop a Plumb-like graph interface for XTM, and I'm guessing others are too. There's a lot of work to be done on usable interfaces for this stuff.