If I were Microsoft I would probably like micro-content and metadata. IE and the browser wars were the pits for them. They should hate html by now. Microsoft also hates Google. Google hates metadata. Google likes scraping html, mixing it with their secret sauce and creating the all-mighty page ranking. Anything that detracts value from this rocket science or makes things complicated for Google or easy for other people is probably a bad thing for Google.

If the Net started to look more and more like XML based syndication and subscriptions with lots of links in the feeds to metadata and other namespaces, it would be more and more difficult to create page ranking out of plain old html.

My guess is that Microsoft knows this and intends to be there when it happens instead of totally missing it at the beginning like when the Internet got started. I have a feeling they will embrace a lot of the open standards that we are creating in the blog space now, but that they will add their usual garbage afterwards in the namespaces and metadata so that at the end of the day it all turns funky and Microsoft.

Just a thought...

24 Comments

I suspect the reason Google avoids metadata is that it generally tends to be complete garbage. A lot of it is fake, some of it is legally questionable, and some of it is just incompetently written.

Yes, but when MS extends the format in XML, namespaces are a mechanism for dealing with that.

We can parse the tags and grok them or we can ignore them.
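The "grok or ignore" approach can be sketched in a few lines. This is only an illustration: the vendor namespace `http://example.com/ns/vendor` and the `vx:rating` element are made up, and the reader here is assumed to understand only a handful of core RSS elements. Anything namespace-qualified that it does not recognize is skipped rather than treated as an error.

```python
import xml.etree.ElementTree as ET

# A feed with one hypothetical vendor extension element (vx:rating).
FEED = """<rss version="2.0" xmlns:vx="http://example.com/ns/vendor">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <vx:rating>5</vx:rating>
    </item>
  </channel>
</rss>"""

# Core elements this reader understands.
KNOWN = {"title", "link", "description"}

def read_items(xml_text):
    """Return one dict per item, keeping known elements and ignoring extensions."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        fields = {}
        for child in item:
            # ElementTree spells namespaced tags as "{uri}localname".
            if child.tag.startswith("{"):
                continue  # an extension we do not grok: ignore it
            if child.tag in KNOWN:
                fields[child.tag] = (child.text or "").strip()
        items.append(fields)
    return items

items = read_items(FEED)
```

A reader that does understand the extension could look for the qualified tag instead of skipping it; the feed itself stays valid for everyone else.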

As a 'softie, I'd have to say the company has been filled with metadata zealots for years. Ever wonder why MS Word stores so much info that gets people in trouble and all ticked off? Metadata. These features were part of dreams that this would make indexing, author based search, relevance ratings, etc. of Word documents possible in 1997 (and in fact we have internal servers that do use the mechanism.) There are certainly those who could care less about metadata, but I think they're a minority.

RSS is very interesting in that it's a metadata standard that has been embraced by much more visible folks than those eating/breathing document management, databases and the such. It's one of the first examples of where it's easy to identify my public diary (blog) with a way to make it easier to read a bunch of them (aggregator). RSS is the magical, easily understandable thing that links those two.

So yes, my bet is that MS will get more active publicly in this space and get knee deep in RSS or something similar. I'd argue the change is not in MS, but in the rest of the world embracing it. The jury's still out, I think, on whether metadata will live up to the zealotry behind it.

And as for HTML, think about how abused 'meta' tags have gotten over the years.

Perhaps I'm naive, but I thought the key to Google's success was augmenting a fast-but-dumb full-text search engine with one laboriously-calculated numeric metadata field, the incoming link count.
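To make that description concrete, here is a toy sketch. It is not Google's actual algorithm (PageRank weights links rather than simply counting them); it just pairs a naive full-text match with an incoming link count used for ordering. The page names, texts and links are invented.

```python
from collections import Counter

# Hypothetical pages: name -> (text, outgoing links)
PAGES = {
    "a": ("metadata and rss feeds", ["b", "c"]),
    "b": ("rss aggregators", ["c"]),
    "c": ("rss and metadata standards", []),
    "d": ("cooking recipes", ["c", "b"]),
}

def incoming_link_counts(pages):
    """Count, for each page, how many other pages link to it."""
    counts = Counter()
    for name, (_text, links) in pages.items():
        for target in set(links):  # count each linking page once
            if target != name:     # ignore self-links
                counts[target] += 1
    return counts

def search(pages, term):
    """Dumb full-text match, ordered by incoming link count (ties by name)."""
    counts = incoming_link_counts(pages)
    hits = [name for name, (text, _links) in pages.items() if term in text.split()]
    return sorted(hits, key=lambda name: (-counts[name], name))

results = search(PAGES, "rss")
```

The text match alone cannot tell pages "a", "b" and "c" apart; the single numeric field does all the ranking work.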

I have an essay on this at http://www.tbray.org/ongoing/When/200x/2003/07/29/SearchMeta

As for Microsoft, it's never wrong to be paranoid about them, Joi could well be right. But the great thing about blogs is the human voice, which is even more expensive to capture than metadata.

Tim, I've learned that it pays to be paranoid about Microsoft, for sure -- but I've also learned to not turn my back on you, and I get a sneaky suspicion I should be watching out for Joi too.

I've been personally involved in some of the secret blog discussions inside Microsoft and so far we're playing the high road. Will we go down the low road at some point in the future? I'll try to prevent it.

Come to the PDC to see how we're using RSS 2.0.

Re. MS, I suspect it's just that as an entity it's a big lumbering beast, and slow to move into new areas. Smart individuals within MS are certainly looking at blogging and metadata in general. Also don't forget MS came out with the Channel Definition Format (CDF) for syndication even before the good Mr. Winer got in on the act.
Another consideration is that the relational file system I believe is intended to go in Longhorn will probably fit well with metadata-based formats.

As Tim's piece suggests, the "Metacrap" article is off-target and well past its sell-by date. It made sense in the context of html meta, but those days have passed. Even Cory Doctorow (author of that article) and co. now have a totally non-crap RDF metadata feed:

http://boingboing.net/rss.xml

OK... let me clarify a few things. Metadata is a loaded word. The metadata I was thinking about when I wrote this was things like friend of a friend (FOAF), Musicbrainz music ID tags, ISBN #'s, blogrolls vs. permalinks, Creative Commons license, etc. All these things are things that blog tools understand and love.

I forgot that one of the things I would do if I were MS would be to hire Scoble. ;-) I agree that MS is on high ground and I think they benefit from this. I'm sure they'll try to do it the right way. I just think that when they finally "embrace" all of this and try to really do something, it is inevitable that it will get very complicated. I guess if cell phones have embraced the standards too, then it might be a bit more difficult for MS to own everything, but it's just my sense that when everything is integrated into Longhorn, less will be open and it will get more and more messy... but I'd love to be wrong. ;-)

Robert - a bit of typing latency there, I would of course include you in the smart individuals at MS ;-)

Let me add one more thought...

So if the Net turned into a bunch of people reading RSS feeds rich with metadata, aggregated by web services, read in news readers, wouldn't MS love it and Google hate it?

PS, I would also add GeoURL as another form of good metadata.

First of all - thank God Joi is NOT Microsoft. Though it would be fun to have Joi running around with a $50B war chest to invest, something tells me he wouldn't be the same Joi. And as far as Microsoft supporting open standards, I'm afraid it'll probably have to be the other way around. Our open standards will have to gateway or support THEIR kludges. But there's no reason why that can't happen. And as Dave will tell yah - since we can only trust them as far as they can be thrown (and Ballmer is a BIG guy) - then we HAVE to assume that any good ideas we come up with - they'll steal (oh excuse me, Scoble's reading this - I mean "learn from.....")

Marc: What is there to steal?

If someone shows that there's a business model in all of this, we'll just acquire. No?

The best thing Microsoft could do is just provide a platform on which RSS or Atom will do more things than if you viewed RSS or Atom on Linux or Macs.

In a good world, Microsoft will get back to being a platform company.

Microsoft tried syndication once before and failed miserably. Remember CDF?

Why did it fail? Not for technology but because Microsoft forgot about the little guy and only cared about what Disney wanted.

If Microsoft has a disease, that's it. We focus on where the business is today and the heck with the rest of the stuff.

Of course, for you and Joi, that's the opportunity. Make a business out of this stuff and you'll be set.

This entire discussion is silly. The premise that a Web full of structured metadata would somehow "foil" Google is laughable.

Yes, entirely too silly. And now for something completely different.

Marc, you're full of shit too.

And if you can tell me what I know I can tell you what you know.

You know that it ain't just Microsoft who plays the nasty game. I seem to remember Quicktime causing you a lot of grief. Hey at least they didn't call it Videoworks. That would have really messed things up for you, wouldn't it?

Microsoft isn't the bastard you have to worry about right now. First get everyone else on the same page about RSS and we can talk about the Big's. Right now it's Joi's little buddies who decided to reinvent, just like Apple, and they did something even uglier than what Apple did and Microsoft does.

How many times have you read an article about RSS where they talk about how the people can't even agree on what it means? Give that squarely to Ben and Mena (and don't forget the Google guys). And look at Joi's page. He's doing it too.

I've heard Joi talk the talk, now it's time to walk the walk. You can't have it both ways mah man. Either it's evil to pollute the standard formats, or it's not. You can't have it not be evil when the baby squirrels do it, and be evil when Scoble's employer does it. (And they haven't even done it yet, how about that.)

After taking out extraneous and controversial parts out of Joi's hasty post, I do find there is an interesting grain of thought:

How will the arguable trend toward small XML fragments (micro-content with fine-grained metadata) affect the market?

OK Dave - let's use this forum to try and work things out. First of all - no name calling or personal anything. 2nd of all - I have been a supporter of RSS 2.0 - and will always be. 3rd of all - you appear to be winning the subscription wars, so don't gloat. I agree that having two different formats with practically the same name - is crazy - and they don't even HAVE a name for this new format, so.....

Anyway - I'd like to use a parallel issue that Ray Ozzie and I blogged about today - which is about the new Upcoming calendar service. The question is: "why is there no "standard" for Calendar Events in RSS?"

Upcoming appears to be using RSS 2.0 for its subscription format, but when you receive that feed - it has all the event information in text form, lacking any representation of the event's date, time, venue or category. What we want is to make a Calendar Event a new kind of micro-content. Just like a blog post, it can easily be standardized and utilized throughout a wide range of on-line apps and services. We all can exchange these events and......

Now I KNOW there is a way to do that with RSS 2.0 and I'm not sure if you'd call it a namespace extension or what - but I KNOW it's possible. Obviously aggregators will have to know of this namespace extension - but Matt and Paolo have been showing what can be done with their ENT extension, so why not an extension for Calendar Events? And why not an extension for Resumes, Reviews or even People?
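One way such an extension might look, purely as a sketch: the `ev` prefix, its namespace URI and the element names `startdate`, `location` and `category` below are invented for illustration and are not an existing standard. The code builds an RSS 2.0 `item` that carries the event's date, venue and category as structured elements, then reads them back the way an extension-aware aggregator could.

```python
import xml.etree.ElementTree as ET

EV = "http://example.org/ns/calendar-event"  # hypothetical namespace URI
ET.register_namespace("ev", EV)              # serialize with the "ev" prefix

def event_item(title, link, start, venue, category):
    """Build an RSS <item> with hypothetical calendar-event extension elements."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, f"{{{EV}}}startdate").text = start
    ET.SubElement(item, f"{{{EV}}}location").text = venue
    ET.SubElement(item, f"{{{EV}}}category").text = category
    return item

def read_event(item):
    """Extract the structured event fields from an <item> element."""
    ns = {"ev": EV}
    return {
        "title": item.findtext("title"),
        "start": item.findtext("ev:startdate", namespaces=ns),
        "venue": item.findtext("ev:location", namespaces=ns),
        "category": item.findtext("ev:category", namespaces=ns),
    }

item = event_item(
    "Blogger meetup",
    "http://example.org/events/1",
    "2003-10-27T19:00:00-08:00",
    "Community Hall",
    "meetup",
)
xml_text = ET.tostring(item, encoding="unicode")
event = read_event(ET.fromstring(xml_text))
```

An aggregator that has never heard of the extension still sees an ordinary item with a title and a link, so the same pattern could in principle carry resumes, reviews or people.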

Right now I ask - "how would you like to see these sorts of new extensions developed?" "Should they happen within the RSS 2.0 framework or part of some new, unnamed subscription format?" I really would prefer to stay out of the format battles - I just wanna build some cool new digital lifestyle aggregation products and support the concept of subscribing to these new kinds of "micro-content". What do you suggest?

Dave, careful with the analogies you pick.

Director (nee Videoworks) was a proprietary, closed format with some hard-coded limited assumptions (uniform framerate, slowing down video to match CPU is the right answer, Packbits compression is all you need, keeping everything in RAM is how you solve performance problems).

QuickTime lifted these assumptions and generalised them - published file format; arbitrary frame durations; keeping audio and video in sync by dropping video frames; general compression model; arbitrary number of data references in an edit list.

These are some of the reasons QT is used to edit uncompressed HD video today with substantially the same infrastructure and API, and Director isn't.

Don't assume this is me taking sides in the RSS wars, just noting my viewpoint on the history of your analogy.

Robert, CDF really wasn't a markup for syndication as we know it -- it was the model that was the failure, not the data.

As for the Windows platform providing more things to do with RSS than Linux or Mac OS X -- I hope so. These are operating systems, they shouldn't do a damn thing with RSS. That's the lesson that MS keeps forgetting, and I see no difference with the new directions.

Don, you're using RSS as a model for metadata and content I'm assuming with your comment -- metadata with microcontent. The world does not revolve around RSS, and I don't think Microsoft's new interest is focused on Better Things to do With RSS.

Dare, well organized metadata will not foil Google, but undercut Google's edge with its magical algorithms, yes. I imagine that MS also plans on being more direct -- like building search engine capability into the OS.

Finally, I apologize to one and all for ever coining the 'baby squirrel' term, as I see it being used out of context.

In this syndication debate, the juice is in the aggregator tool space and not with weblog tools.

Why? There is so much out there that is worth subscribing to if it had a feed (from HBO high-definition schedules to low cost air fares on routes I frequently travel). Current aggregators just fall short, and that is likely where Longhorn will clean up. To make this clearer, the writing tools are much less important than the tools by which you aggregate and manipulate the data you subscribe to (the ratio of readers to writers will always be 100 to 1). A smart approach for Microsoft would be to embrace the quirky weblog world's syndication format, put an advanced aggregator with a world class search engine on everyone's desktop, and extend the format into everything else. The fact that it started with weblogs will be historical errata in five years.

Thank you, John! Finally I understand why RSS and aggregators are such a big thing.

I don't agree that we have to wait for Longhorn to get aggregation+search, nor that Google needs to lose sleep over it. Imagine if Google were to buy Bloglines and integrate it with a Subscribe button placed next to every link in a results set that was to a site containing an RSS feed.

I am an admittedly biased source, but I completely agree with John's take. How you capture and organize all of your online information (whether received via automatic feeds, emails or discovered while you search and browse) and what you can actually "do" with that information once it’s in your local sandbox is pretty integral.

Let me give you an example. My experience in planning a trip to New York is a hell of a lot more meaningful both to me and an eventual recipient of that information (the 100 to 1 factor that John refers to) if the listings of hotels, restaurants and sites to see that I chose to visit can be meaningfully organized and acted upon (e.g., push pinned into an interactive map, plugged into my trip planner rolodex). Similarly, if the price quotes, online articles, blog posts and/or emails that drove my trip planning decisions are interconnected to all of this information, I have created a context that transcends folder, inbox or bookmark in a more than the sum of the parts fashion. If anything, such an outcome only increases the value of the human voice. At the same time, this implies a more structured and action-able approach than the aggregators I have played with so far.

I could be wrong, but I certainly hope not to, and don’t expect to, have to wait for Longhorn to be released for this scenario to start playing out (although MS has a really strong position when it is), and I suspect MS will focus on big corporations first, and the consumer only later.

10 TrackBacks

Listed below are links to blogs that reference this entry: If I were Microsoft....

TrackBack URL for this entry: http://joi.ito.com/MT-4.35-en/mt-tb.cgi/1012

Joi Ito, presumably in response to this news, wrote the following about a possible Microsoft strategy as regards to Google, searching, and metadata: Google likes scraping html, mixing it with their secret sauce and creating the all-mighty page ranking.... Read More

Joi Ito has some interesting "out of the blue" commentary about Microsoft, Metadata, Search, and Google, and Scoble commented back Read More

Robert Scoble has been spending the past few weeks trying to answer stuff about Microsoft's blogging intentions. Today, he's just come out with a nice article that looks at Microsoft's philosophy behind developing Software for the masses and not for th... Read More

Read two posts recently that I think demonstrated the use of rhetorical devices that I don't find as effective as their authors apparently do. Read More

to displace Google even with all the metadata and micro-content that Joi might imagine. Read More

In Metadata, Semiotics, and the Tower of Babel , Tim Oren rants against Read More
