Joi Ito's Web

Joi Ito's conversation with the living web.

I haven't really commented on the "should blogs be in Google search results" debate, but one random question. What is a blog? What's the technical difference (from the perspective of a search engine) between my blog and The Register? I don't see how you can "filter" blogs. You can obviously change the page ranking mechanism to give certain types of sites an advantage or disadvantage, but I don't see how you can filter blogs. My blog is just a bunch of HTML created by a content management system.

If more people think that the Google search results are poor because the top results are not "relevant", it means the ranking system is broken, not that something has to be "filtered". The whole point of a search engine is that it searches everything and finds the most relevant pages.


Perhaps then the suggestion is that Google has too much power, i.e. what if they only search, say, the Pyra blogs?

Algorithm: count the instances of "I", "me" and "my blog". If total count is >100, it's a blog ;-)
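Taken literally, the tongue-in-cheek rule above could be sketched like this in Python. The function names and the choice to match "I" case-sensitively are assumptions made for illustration; only the three phrases and the threshold of 100 come from the comment.

```python
import re


def first_person_count(text: str) -> int:
    """Count occurrences of "I", "me" and "my blog" in a page's text."""
    # "I" is matched case-sensitively so that a lowercase "i" used as a
    # variable name or list marker is not counted (an assumption).
    count_i = len(re.findall(r"\bI\b", text))
    count_me = len(re.findall(r"\bme\b", text, flags=re.IGNORECASE))
    count_my_blog = len(re.findall(r"\bmy blog\b", text, flags=re.IGNORECASE))
    return count_i + count_me + count_my_blog


def looks_like_blog(text: str, threshold: int = 100) -> bool:
    """Apply the joke rule: more than `threshold` hits means "it's a blog"."""
    return first_person_count(text) > threshold


if __name__ == "__main__":
    sample = "I told me about my blog. I did."
    print(first_person_count(sample), looks_like_blog(sample))
```

A first-person column on a news site would trip the same rule, which is rather the point of the original post.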



Nice to see someone who *gets it*.

I'd reference comments I've made elsewhere on the topic, but I don't feel like digging them up. One's over on Scripting News' RFC on keeping changes.xml pure.

Seems to me that Google and the other big search engines started off with the goal and the promise of searching everything. This was back when everything meant mostly static web pages.

Now, if Google is to change this, then they are going to have to come up with policy on what they will index and what they won't. Good luck Google, that sounds hard, especially if they have to satisfy a broad and fairly picky userbase.

They could just say "no blogs" and then end up with the technical "what is a blog?" question. But they could also try to answer the "what does useful, indexable content look like?" question. That is hard to answer, but it is where the value is, I figure.

Deep thought required, Google. Good luck. Realise that breaking the original covenant with users, or complicating it, will send people elsewhere.

I don't think blog pages need to be identified if blog tools are changed to mark blog front pages (what goes into an RSS file) and the common sections of each blog page (primarily blogrolls) as not indexable because of their fast-changing nature.

I think most blog tool developers will go for this idea. Once the practice becomes a de facto standard like robots.txt, peer pressure will limit the proliferation of rogue blog tools.
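As a rough illustration of how such a convention might look with the existing robots.txt mechanism, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The host `blog.example.com` and the paths are hypothetical. Note also that robots.txt rules match URL paths, so this covers a front page and a separately served blogroll, not a blogroll section embedded inside each page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a blog tool might generate: keep crawlers away
# from the fast-changing front page and blogroll, leave archives open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.html
Disallow: /blogroll/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_indexable(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/blogroll/", "/archives/2003/06/post.html"):
        url = "http://blog.example.com" + path
        print(path, is_indexable(url))
```

The permanent archive pages stay crawlable, which matches the commenter's aim of excluding only the parts that change too quickly to be useful search results.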

On balance I don't see what the problem is. If I'm searching for a specific topic and come across a well-informed blog, then what's wrong with that? It provides me with information that probably isn't spun like the 'official' sites will be. In my eyes that's more important when researching things.

Blogs should be included in searches - what they write about might not be sales literature, or in-depth but the information is still valid...

What if I wanted to search for Python scripts on Google? I would probably come across this blog at some stage (I probably should go and try this, shouldn't I?) and why not? There has been much mention of Python on here recently and the content could prove to be pretty useful...

actually it strikes me as fairly simple.

google's search is based upon a core idea: when a human types in some search terms, they're looking for a document that other humans point to where either the humans pointing or the document has those terms.

blogs seem like a very good implementation of humans pointing.

however, on their previous tuning of the algorithm, there were few enough personal web pages that there was no real skewing of pages away from the "official sites." it may be that the personal web pages, blogging, are now over emphasized in the weighting of the search (since blogs are much easier to put up/use) and all you would need to do is tweak the weighting. searches are statistical events anyway... (in terms of signal and noise.)

Is this still an issue? Has anyone said "blogs should/will not be included in Google indexing/search results"?
1- don't see how it's possible.
2- don't see why it's even desirable to do such a thing.

If anything it further fulfills Google's "raison d'etre" and tests its algorithms as the body of searchable content multiplies every second...

The folks complaining about this are the ones that don't "get it"... which is fine... there are always those. They will be assimilated... ;)

Boris, there are a BUNCH of articles, but here's one.,12449,959151,00.html

Scoble says that Orlowski didn't make it up.

Hrmm... OK, if Google is getting pressure from advertisers, then I see why they may consider it... I'd like to think that Google sticks by its guns (and its algorithms)... In Scoble's NEC example, the simple fact that NEC would have to come to grips with is that maybe, just maybe, somebody's blog entry about their Tablet is more relevant than their marketing copy.

And that's what it's all about. Valuable information over hollow information. If I want specs on that tablet, I will eventually end up at the manufacturer's site and hunt down the specs (not their useless marketing copy), but first I wanna know what people who have used it think. And no advertiser should weigh in on that. That would be like a dictator muscling in on Emergent Democracy...

(yeah! this is, like, emergent democracy in the marketplace.. yeah... ;)

I always assumed that one of the reasons Google bought Blogger was to have a very large sample of Blogs available so they could tweak their algorithms and cancel out some of the blog borg effect, since, as Joi points out, there isn't really any other way for them to know what's a blog and what's not.

And, the rest of you must be having a very different search experience than me -- I find it annoying that so many useless blog entries come up in my searches: very often it is one search term in one post, another term in a different post and the page actually has nothing to do with what I was searching for.
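The false matches described above are easy to reproduce. The following Python sketch contrasts matching a query against a whole multi-post page with matching it against individual posts. The tokenizer, the function names, and the sample posts are all assumptions for illustration and say nothing about how Google actually matches terms.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase a text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def page_matches(posts: List[str], query: str) -> bool:
    """True if every query term appears somewhere on the page."""
    page_terms: Set[str] = set()
    for post in posts:
        page_terms |= tokens(post)
    return tokens(query) <= page_terms


def post_matches(posts: List[str], query: str) -> bool:
    """True only if a single post contains every query term."""
    query_terms = tokens(query)
    return any(query_terms <= tokens(post) for post in posts)


if __name__ == "__main__":
    front_page = [
        "Went hiking near Kamakura today.",
        "Trying out a new tablet PC at the office.",
    ]
    print(page_matches(front_page, "kamakura tablet"))
    print(post_matches(front_page, "kamakura tablet"))
```

The page as a whole matches the query even though no single post is about both terms, which is the kind of irrelevant hit the commenter is complaining about.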

If Google is rejiggering based on their knowledge of what bloggers are talking about in real time, I doubt it's because of "pressure" from Big Corporations; more likely it's because the search results have been steadily getting worse over the last year or two.

Well, here's a sign that Google has totally and completely and utterly LOST it. According to them, I am an authority on this! Above the actual research I quoted, and all I did was babble nonsensically. That's seriously wrong. Technically, I don't think it will be difficult for Google to separate blogs from non-blogs. The issues are more politically fraught since not all blogs are equal (some are more valuable than "official" sources and certainly not as frivolous as mine). What I'd suggest to Google is that they separate blogs from non-blogs but present them side by side like so (instead of siphoning them off to another tabbed page)...and to show ranking numbers beside each search result. That way no political favoritism is shown and it gives users the option of filtering out blogs altogether if they are not interested.

The problem is that not all blogs are bad, just most of them, the same way that not all websites are crap, just most of them. How can an algorithm distinguish between good blogs and crap blogs? PageRank was the solution for websites, but it doesn't work for blogs.

I imagine the solution will be to make PageRank a bit smarter, by making it sensitive to 'mutual appreciation clubs' on the web, where rings of sites simply link to one another, without much connection to the world at large.
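One way to picture a "mutual appreciation club" signal is to measure how many of a group's outbound links point back into the group. The Python sketch below computes such a score. The name `insularity`, the toy link graph, and the idea of using this ratio at all are assumptions made for illustration; this is not PageRank and not anything Google is known to do.

```python
from typing import Dict, Set


def insularity(links: Dict[str, Set[str]], group: Set[str]) -> float:
    """Fraction of the group's outbound links that stay inside the group.

    `links` maps each site to the set of sites it links to. A score near
    1.0 means the group mostly links to itself; near 0.0 means it mostly
    links out to the wider web.
    """
    internal = 0
    total = 0
    for site in group:
        for target in links.get(site, set()):
            total += 1
            if target in group:
                internal += 1
    return internal / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical link graph: a, b and c only link to each other,
    # while d links out to several unrelated sites.
    web = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a", "news", "docs"},
    }
    print(insularity(web, {"a", "b", "c"}))
    print(insularity(web, {"d"}))
```

A ranking scheme could, in principle, discount links exchanged within highly insular groups, though choosing the groups and the discount is the hard part.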

Ultimately, though, Google faces a semantic issue. It's impossible (and will be for the foreseeable future) for an algorithm to recognise a 'good' blog or site, or for it to understand what the site is actually about. It is dependent on reading cues, such as links and terms, which are provided by humans.

I would think this is a sensitive issue for Google, because it presents an opportunity for a competitor to 'box in' and devalue what is perceived to be one of their key pieces of intellectual property, the PageRank patent.

maybe google headings in the HTML of webpages.

4 TrackBacks

Listed below are links to blogs that reference this entry: To Google, what is a blog?.

Blog Definition from Third Superpower
June 9, 2003 9:18 AM

“What is a blog? What’s the technical difference (from the perspective of a search engine) between my blog and The... Read More

Google Weblog Filtering versus Algorithm Improvements from Randy Holloway's Blog
June 9, 2003 11:35 AM

Blog Definition from Third Superpower
June 16, 2003 10:31 AM

What is a blog? What’s the technical difference (from the perspective of a search engine) between my blog and The... Read More

No Robot.txt? You lose... from Observations from a Tech Architect: Enterprise Implementation Issues & Solutions
February 18, 2005 11:46 PM

People are just starting to see the downside of the power in search engines. Google, Yahoo, Altavista, and all the others aggregate an incredible amount of information from websites. Sadly, webmasters and system administrators seem to be unaware of u... Read More