[Image: islam1.gif - IBM History Flow visualization of the "Islam" article on Wikipedia.]
I think the gaps are where the page has been erased and restored.
See the IBM History Flow page for more details and examples.
I think this has been mentioned in the press already, but I confirmed with Jimmy Wales that a study done by IBM (the group that did the History Flow work) tried to measure how quickly vandalism on Wikipedia was identified and corrected. They searched for pages where all of the content suddenly disappeared or a huge amount was deleted. They found that the median time for such a page to be restored was 5 minutes. This did not take into account that Wikipedians often refactor or move pages and redirect them, which would show similar behavior, so the median time is probably less than 5 minutes. In the context of our discussion about Wikipedia's authority, I think this is quite an interesting and impressive statistic.
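
For the curious, here is a rough Python sketch of the kind of detection the study describes: scan a page's revision history for sudden mass deletions and measure how long the content took to come back. The 90% drop threshold and the (timestamp, size) history format are my assumptions for illustration, not details from the IBM study.

    from datetime import datetime
    from statistics import median

    def median_restore_minutes(history, drop_ratio=0.9):
        """history: list of (timestamp, size_in_bytes) revisions, oldest first.
        Flags revisions where the page shrank by more than drop_ratio (a crude
        mass-deletion signal) and measures minutes until the size recovers.
        Caveat from the post: legitimate page moves/redirects look the same."""
        restore_minutes = []
        for i in range(1, len(history)):
            prev_size, size = history[i - 1][1], history[i][1]
            if prev_size > 0 and (prev_size - size) / prev_size > drop_ratio:
                for ts, later_size in history[i + 1:]:
                    if later_size >= 0.9 * prev_size:  # content is mostly back
                        restore_minutes.append((ts - history[i][0]).total_seconds() / 60)
                        break
        return median(restore_minutes) if restore_minutes else None

    hist = [
        (datetime(2004, 9, 1, 12, 0), 54000),
        (datetime(2004, 9, 1, 12, 2), 300),     # page blanked
        (datetime(2004, 9, 1, 12, 6), 54000),   # restored four minutes later
    ]
    print(median_restore_minutes(hist))  # -> 4.0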

15 Comments

The IBM study is irrelevant, since the current criticism of Wikipedia is about minor errors. It is much easier to spot massive vandalism of a contentious topic than to locate minor errors in obscure topics.

I love that IBM study/project. However, as Charles notes - though not precisely enough, since contentious information is not an all-out error - the original issue of the authority of Wikipedia is unaffected by this. Vandalism and blatant misinformation are one thing. Reliability of the information - information which even a bunch of folks may believe to be "true" - is another.

Either way, IMHO, it's all good. Always question everything.
Folks who cynically mistrust easily do so because they want to trust too much, no? ;)

There is some relevance. Wikipedians follow Recent Changes, where all new changes show up on top, and the people watching a page will see the changes coming in. The point is that there is a mechanism for alerting people to changes; it's not just people randomly wandering in and finding an error. You can also track users on Wikipedia, so if someone goes around making lots of minor changes to corrupt the information, that is trackable too, like comment spam. I agree with some of the reliability-of-information issues, but I'm responding more to Alex's challenge about how long it would take for the Wikipedians to find a bunch of intentional errors inserted into Wikipedia.

Ahh, yes, totally agree... they do seem to have a great system running. :)

I just talked to Jimmy about this, and he said that people have "watchlists" tracking pages they work on. Some people have thousands of pages in their watchlists and will check any changes on the pages they watch. If someone changes 1964 to 1963, it will alert all of the editors of the page, and if the person is intentionally doing something stupid, they will be tracked. Also, there is what he calls the piranha effect: if someone makes a change, people all see the change, and this often causes a flurry of activity around it.
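
As a toy illustration of how that alert mechanism can work (the data model here is invented for the example; MediaWiki's real schema differs): invert the per-user watchlists once, so every incoming edit becomes a single lookup rather than a scan over all users.

    from collections import defaultdict

    # Invented example data: user -> pages on their watchlist.
    watchlists = {
        "alice": {"Islam", "History of Japan"},
        "bob": {"Islam"},
    }

    # Invert once, so each incoming edit is a dictionary lookup
    # rather than a scan over every user's watchlist.
    watchers = defaultdict(set)
    for user, pages in watchlists.items():
        for page in pages:
            watchers[page].add(user)

    def notify_on_edit(page, editor, summary):
        """Return the users whose watchlists flag this edit for review."""
        alerted = sorted(u for u in watchers[page] if u != editor)
        for user in alerted:
            print(f"[{user}] {page} edited by {editor}: {summary}")
        return alerted

    notify_on_edit("Islam", "anon-192.0.2.1", "changed 1964 to 1963")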

Doesn't seem to always work, though. This guy (http://www.frozennorth.org/C2011481421/E652809545/index.html) posted a mistake a day for 5 days and waited to see how long they'd take to be corrected. After between 20 hours and 5 days, nobody had corrected any of them, so he restored the correct information himself.

It'd be interesting to compare this with a popular open source software project where someone puts in code that's not blatantly stupid (i.e., that fails to compile or fails a check-in test) but that's rather shortsighted in design or that leaks a little memory.

it's interesting, but not impressive. this data only holds for vandalism, mind you. as chris mentioned, some badly researched information, which is faulty but looks plausible, can be in there for a long time before anyone cares to falsify/correct it.

About 50 people on the English-language Wikipedia have thousands of articles on their watchlists. The 100 largest watchlist sizes may be of interest.

The average is two watchers per article. Obtaining better statistics is on my to-do list, with no specific timeline. I'll probably produce a list of all articles with a count of the number of watchers when I do that, since that will assist those looking out for vandalism.
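
A list like that is essentially an inversion of the watchlist data. A minimal sketch, assuming watchlists are available as a user-to-pages mapping (an assumption for illustration, not the actual MediaWiki schema):

    from collections import Counter

    def watcher_counts(watchlists):
        """watchlists: dict mapping user -> set of watched page titles.
        Inverts it into a per-article watcher count; sorting ascending
        surfaces the least-watched articles, the ones most exposed to
        unnoticed vandalism."""
        counts = Counter()
        for pages in watchlists.values():
            counts.update(pages)
        return counts

    # watcher_counts(w).most_common()[-10:] lists the ten least-watched articles.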

The next release of the MediaWiki software contains a feature to make it easier for the RC Patrol to get good coverage: a marking system to show that an edit has been reviewed. Personally, I think it's too easy to game that system, but it'll probably be of some help in dealing with those who aren't being clever in their attacks, which is the common case.
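
In essence, such a marking system only needs a shared set of reviewed revision ids. A hedged sketch of the idea (not the actual MediaWiki implementation):

    reviewed = set()  # revision ids already marked as patrolled

    def mark_reviewed(rev_id):
        reviewed.add(rev_id)

    def unpatrolled(recent_changes):
        """Filter a recent-changes feed (here, dicts with a 'rev_id' key)
        down to edits nobody has checked yet, so patrollers don't
        duplicate each other's work."""
        return [rc for rc in recent_changes if rc["rev_id"] not in reviewed]

The gaming risk James mentions is visible even in this toy version: nothing stops a vandal from marking their own edit as reviewed unless marking rights are restricted.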

James: Why not learn from the Slashdot experience and include metamoderation from the get-go? It seems like a system that includes both moderation (i.e., positive checks on both changes *and* on content), and meta-moderation that notes when moderators may have been wrong, makes some sense. Of course, this means that moderators will be less likely to touch an article on, say, scientology, but this needn't be a bad thing. It lets you know which articles have been carefully fact-checked and for which you need to be more careful in your own fact-checking.
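
For concreteness, here is a toy sketch of the bookkeeping such a scheme implies: moderators rate edits, meta-moderators rate the moderations, and each moderator accumulates a reliability score. Everything here is invented for illustration, not taken from Slashdot's or MediaWiki's code.

    from collections import defaultdict

    moderations = {}                # (moderator, rev_id) -> verdict
    meta_votes = defaultdict(list)  # (moderator, rev_id) -> [fair?, ...]

    def moderate(moderator, rev_id, verdict):
        moderations[(moderator, rev_id)] = verdict  # e.g. "accurate" / "vandalism"

    def metamoderate(key, fair):
        meta_votes[key].append(fair)

    def moderator_reliability(moderator):
        """Fraction of a moderator's judgements that meta-moderators endorsed."""
        votes = [v for key, vs in meta_votes.items() if key[0] == moderator for v in vs]
        return sum(votes) / len(votes) if votes else 1.0

Articles whose recent edits were all checked by high-reliability moderators would be the "carefully fact-checked" ones in this proposal.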

See this validation test code and this revision by revision evaluation for one experimental approach. I have some reservations about how that sort of revision-by-revision evaluation will work for a million edits a month - that's roughly 23 edits a minute, around the clock. But we'll see - incremental improvement is the way, and if it's not right initially, it'll be refined until it, or something else, is useful, effective and scales well enough to be usable.

Other issues being considered are better ways to record sources, and ways to indicate that you're a professional in the field and have reviewed a particular version for accuracy and completeness. Both of these should eventually assist librarians and others in applying their traditional methods for evaluating the quality of an information source. Previously, and still in many areas, building the content has been a far higher priority than evaluating it in any formal way.

For all of these, how they scale in human-labor terms and in server load is a big issue, in part because we're busy enough that the "slashdot effect" is now an insignificant load factor. That's good for demonstrating our capacity, but bad if we make a mistake and do something which scales badly. :)

The best I can really say is "watch that space" and make suggestions for approaches which you think may be helpful - but preferably on the wiki pages discussing quality or in the bug/suggestion system at bugzilla.wikimedia.org.

It seems that the question is, who is watching the watchers?

In the case of the Wikimedia projects the answer is: the whole world is watching, and the oversight is certain to increase. :) It's very hard to avoid lots of oversight when you're this prominent, conduct almost all business online and keep extensive history of those things online for anyone to look at. This is all good, even the temporary inconveniences of testing the quality control systems, which will help to improve those systems for the future. The more the merrier - we'll learn from all available sources, and if capable pros want to study the project and provide useful tips in their study results, I for one will thank them for their help and encourage further work.

Michael, I didn't give you as full an answer as I should have. Here is more detail on the review systems for the watchers.

For deletion of articles or blocking of IP addresses or accounts there is a log, available for anyone to read. That's been OK for smaller projects but isn't sufficient as the project grows, because the logs become too large for easy oversight of the actions of an individual, a key transparency requirement. Not logged at all at present are page moves, which can be done by anyone (though we can restrict it if someone is using it as an attack).

Having recognised this, improved logging is being developed and will probably be included in the next release of the MediaWiki software. Ultimately I expect all deletions, blockings, moves and other activities which may need review to appear both as a part of the contributions record of each individual contributor and, where applicable, in the history of an article.
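
What this amounts to is one append-only log with two views over it: per-contributor and per-article. A minimal sketch of that shape (the field names are assumptions for illustration, not MediaWiki's schema):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class LogEntry:
        when: datetime
        actor: str
        action: str   # "delete", "block", "move", ...
        target: str   # article title, or the blocked account/IP
        reason: str = ""

    log = []

    def by_actor(actor):
        """Everything one contributor has done: the per-user transparency view."""
        return [e for e in log if e.actor == actor]

    def by_target(target):
        """Everything that has happened to one page: the per-article view."""
        return [e for e in log if e.target == target]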

One limitation which continues at present is the inability of non-administrators to see the contents of a deleted article. Eventually I expect that to be overcome as well, in some way which prevents linking to what might be a copyright infringement or other content which must be generally invisible, but does permit broad community oversight of what was done and why. In this case, what we have works well enough, but not as well as it could.

There's a spam filter which prevents some things from being added. I don't think that the details of what is on it are generally visible at present. They probably should be (though it'll help spammers a bit if they are; that's something we can work around).

I wonder if a wiki can be used as a template for community-based verification of data:

i.e., do the numbers of asthma cases that volunteers report correspond to factory emissions? People with asthma can be a collective human barometer for identifying impurities in the air. Somehow bring in the idea of glogging to develop an interesting sample that can be correlated with photos of factory emissions: yellow hues and subjective smells near and about the NJ Turnpike.
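
The statistical core of the idea is just a correlation between two community-collected series. A toy sketch with made-up numbers, purely to show the shape of the computation:

    # Hypothetical paired daily samples: volunteer-reported asthma cases and a
    # crude emissions proxy (e.g. haze readings near the NJ Turnpike).
    asthma_reports = [12, 9, 15, 22, 18, 7, 25]
    emission_index = [3.1, 2.4, 3.8, 5.0, 4.2, 2.0, 5.6]

    def pearson(xs, ys):
        """Pearson correlation coefficient, dependency-free."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy)

    print(f"correlation: {pearson(asthma_reports, emission_index):.2f}")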



10 TrackBacks

Listed below are links to blogs that reference this entry: Wikipedia heals in 5 minutes.


IBM tried to measure how quickly vandalism on Wikipedia, an open source encyclopedia, was identified and corrected. They found that the median time for such a page to be restored was just 5 minutes. This is just another reason why...

Joi Ito adds to the debate over the ability of wiki users to collaboratively locate and fix errors in wikis; in the case Ito discusses, experimenters used IBM's History Flow to track how quickly vandalism in a wiki was located and fixed by users...

Self-Healing Community Content from Coffeehouse at the End-Of-Days
September 8, 2004 12:35 AM

Because Wikis allow nearly anyone to edit content, they are susceptible by definition to vandalism. Intriguingly, nearly all Wikipedia vandalism is corrected within five minutes. Of course, hundreds if not thousands of bona fide editors watch for chan...

Steve Rubel (Micro Persuasion) cites (via Joi Ito) an IBM study of Wikipedia: IBM tried to measure how quickly vandalism on Wikipedia, an open source encyclopedia, was identified and corrected. They found that the median time for such a page to...

Wikipedia heals in 5 minutes from BlueHereNow - Your Local Wireless Station
September 8, 2004 1:55 AM

From Joi Ito. IBM found that the median time for a vandalized Wikipedia page to be restored was 5 minutes. See full story...

Dan Bricklin's elegant essay on the lessons for system design and use of on-line and other information sources (Learning From Accidents and a Terrorist Attack) is very informative and makes some excellent points around the ability and availability of t...

If you are still in doubt that Wikipedia, a Wiki that everyone out there can edit and change around, can work, I suggest you check out Joi's post entitled Wikipedia heals in 5 minutes. It links to a study (with...

For decades one historical meme of ARPANET was that if a hub on the net was damaged (by anything ranging from human error to nukes), the rest of the network would "route around" the problem and the network would still...

Dispatches from the Frozen North suggests a number of ways someone could create lasting errors in Wikipedia.
