Joi Ito's Web

Joi Ito's conversation with the living web.

I'm mercilessly IP-banning comment spammers until I figure out a better solution. If you try to post and I've banned a dialup IP address that you get stuck with, send me an email and let me know.


Get the Trotts to add a comment approval queue and / or user moderation system. A nice-to-have would be some basic spam checking techniques or the possibility to integrate the comment system with spam blocking software.

Or find a third-party solution that does it.

However, what's really a pain is the way MT requires rebuilds of the pages after you delete the comments and how often (at least for me and other weblogs) it posts comments multiple times.

God, I hate Perl (*puts on asbestos suit*)

Hey Seyed! Haven't seen you around here for a while. How are you?

Yeah, the crew is working on stuff right now. There are a few third party solutions out there. We're trying out some of the third party stuff. I'll post a status report once we have a view.

and what happens if our email gets tagged as spam? it's sad when legitimate use of technology is hindered as a result of misuse. most likely the people who get wrongly blocked from sending you a comment because they happen to share an IP address with a spammer (if the spammer isn't spoofing IP address - what will you do then?) won't bother with the hassle of validating themselves before they can post a simple comment.

the motivation for spam is to generate traffic. if you remove the URL link from your comments, you'll remove the motivation, without disabling the primary function of comments: communication. :: Killing Comment Spam Dead. This should do the trick. It's a brilliant solution. Though the next version of MT really needs to do something about comment spam. I think a comment queue is a really bad idea though. A Bayesian filter would be nice. Something like SpamAssassin. Many servers are already running it, so it shouldn't be too hard to send comments through it. After some tweaking, it works really well.

I forget, he's releasing it as a plugin: MT-Blacklist. If it works as advertised, then it definitely belongs in MT.

The draw for a certain percentage of spammers seems to be to put their links on popular pages to get the indexers like daypop and popdex to notice their sites.
I've gotten a few of those, the Borises and ZipCode spams, so I wrote a little script that puts a layer between them and my site. Then, using robots.txt, you tell the bots not to index there.
If something like that were built in, much of the problem would go away, because there would be no benefit to making that kind of post.
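A minimal sketch of that kind of redirect layer, in Python rather than whatever the original script was written in. The `/go` path, the `u` parameter, and both function names are assumptions made up for illustration; the commenter's own script is not shown in the thread.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical path served by the redirect script.
REDIRECT_PATH = "/go"

# robots.txt entry asking well-behaved crawlers not to index the redirector.
ROBOTS_TXT = "User-agent: *\nDisallow: /go\n"


def wrap_link(url: str) -> str:
    """Rewrite a commenter-supplied URL so it points at the local redirector."""
    return f"{REDIRECT_PATH}?u={quote(url, safe='')}"


def resolve(redirect_url: str) -> str:
    """Recover the target URL, as the redirect script would before sending a 302."""
    query = urlsplit(redirect_url).query
    return parse_qs(query)["u"][0]
```

Comment templates would print `wrap_link(url)` instead of the raw URL, so crawlers that honor robots.txt never follow the link to the spammer's site. Crawlers that ignore robots.txt are not affected.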

Of course, a better solution would be to have permalinks that did not include commenters' links, so that the aggregators wouldn't index links other than those added by the blog author.

If you're fighting Net.Kooks, you'll have a harder time. You'll need moderation, a karma system and an editor to comb through the tricks these losers come up with. The classic example is Slashdot: they get the most comments, a huge percentage of crap posts and malicious attempts to disrupt the site, but they never delete a post. Yet still, most readers never see the garbage: it's all moderated down out of sight.

MT-Blacklist (or something like it) DOES belong in MT, because its functionality requires access to low and primitive levels of the MT system and its configuration should be seamless and adjacent to IP banning (which is a useless exercise in whack-a-mole futility, btw).

We'll see what happens...

I will probably release the first version tomorrow.

Scott, I will try out other things... It appears that many of the spam posts come from the same fixed IP addresses at this point. I will go through and try to delete the dialup IP addresses if I have a chance. It's just getting really bad these days...

MT needs to implement a solution -- this really threatens blogging with comments.

I admit that SPAM is a big concern for me. At work we are getting so much SPAM these days that I am truly starting to wonder about the future of e-mail. To see it start to proliferate in the blogging community is, at the very least, horrifying.

That being said, one of my pet peeves is seeing the potential of a new technology stifled by the mere possibility that someone might abuse it. More and more companies are locking down their systems and disabling features to prevent abuse (*cough* *cough* RIAA *cough*). At this rate I see e-mail systems that don't receive e-mail, and who knows, someday an Internet that no longer connects computers. Do we really want a meta-community that does not accept meta-data?

What I see now is less a reason to be afraid than a reason to be angry. The blogging community is very politically and socially active (as Mr. Dean's campaign and the resignation of Gen. Clark's campaign manager have shown). What if it were possible to motivate bloggers to proactively find a way to stop SPAM from ruining blogging for all? Not by shutting the doors to communication, but by opening them up and screaming out into the street.

This may sound like naive optimism. I agree that this isn't much of a solution, but I just can't tolerate the idea of running and hiding when confronted with a problem.

Last month people were running around "arrg-ing" because some website declared "Talk Like a Pirate Day". I suggest a similar "SPAMmers not wanted" campaign.

What do you think Joi? If anyone has the clout to motivate the blogging community to a cause...


I set up an app on Feedster to capture IP addresses that we want to ban as comment spammers, and I'm happy to produce a queryable API that IP addresses can be tested against before comments are accepted.
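As a rough illustration of how a comment script might consult such a shared list before accepting a post, here is a sketch in Python. The actual Feedster API is not described in this thread, so the lookup is passed in as a plain function; `make_lookup` and `accept_comment` are hypothetical names, and the in-memory set stands in for a remote query.

```python
from typing import Callable, Iterable


def make_lookup(banned: Iterable[str]) -> Callable[[str], bool]:
    """Stand-in for a remote query; a real client would ask the service over HTTP."""
    banned_set = set(banned)
    return lambda ip: ip in banned_set


def accept_comment(ip: str, is_banned: Callable[[str], bool]) -> bool:
    """Accept a comment only if the shared list does not flag its source address.

    Fails open: if the lookup raises, the comment is accepted, so an outage
    of the list does not silence legitimate commenters.
    """
    try:
        return not is_banned(ip)
    except Exception:
        return True
```

Failing open is a design choice, not a requirement. A site under heavy attack might prefer to hold comments for review when the list cannot be reached.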

Here are the docs:


I've published this hack for Greymatter:

perhaps someone can port it.

Shit is an opportunity. Who wants to figure this out and make a little money? I'm not talking about comment SPAM, I'm talking about the whole SPAM/virus industry.

Got the image verification working now for MT. The patch requires commenters to key in a security code displayed as an image.


Until Jay's plugin is out you could try my subclass of MT::App::Comments which also implements a blacklist.

Spam comments have been cut by an order of magnitude this week.

Quick comment about James Seng's solution: I had the same idea about a month ago and did a quick non-working mockup of it on my blog. Within hours I was told by folks who concern themselves with web-site accessibility that such a solution poses an insurmountable obstacle for screen-readers, which are used by the sight-impaired.

In other words, it's not the best way to go as it would effectively shut out a portion of the community.

Just thought I should mention that.

(Screen readers obviously cannot read text in images. They read alt tags, but providing them in this case negates the purpose, as it would give spam-bots something to read and use to bypass the mechanism.)

Let's keep working on this potentially disastrous problem though! Courage!

Dougal Campbell [1] reports the problem of blog spammers posting porn spam to comment fields in various blogs. He proposes a centralized IP-tracking service to which blogs can subscribe, passing new comments, IPs and possibly other parameters to the service for checking. Validation could then take place.


when will anti-spammer learn that blacklist is at best a political tool?

spammers...they are no longer the dumb guys who just spam & spam. they have machines under their control, enough to DoS several anti-spam blacklists out of existence.

How long do you think blacklist is going to last against them, when they turn their might on blog comments?

Easy solution - implement a 'Turing Test' solution along the lines of GIMPY from the CAPTCHA project. Both Yahoo! and Paypal use it on account creation, and it is surprisingly simple to implement.

How it would fit into MT is that in the comment entry section an image would be displayed, and the end user would simply have to enter the word that they see in the image before submitting.

The EZ-GIMPY code is written in Perl, so integrating it into MT should be a breeze (only one dependency - gimp libs). Rolling-your-own can be done with any capable graphics library.
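The form flow described above can be sketched without the image rendering itself. The following Python is an illustration only, not EZ-GIMPY and not any MT patch: the word list, `new_challenge`, and `verify` are invented for the example, and drawing the distorted image is left to whatever graphics library is at hand.

```python
import hashlib
import hmac
import secrets

# Per-site secret, assumed to stay on the server.
SECRET_KEY = secrets.token_bytes(32)

# Illustrative word list; a real one would be much larger.
WORDS = ["river", "candle", "orange", "window"]


def _sign(word: str) -> str:
    """Keyed digest of the word, safe to place in a hidden form field."""
    return hmac.new(SECRET_KEY, word.lower().encode(), hashlib.sha256).hexdigest()


def new_challenge() -> tuple[str, str]:
    """Pick a word to render as a distorted image, plus a token for the form."""
    word = secrets.choice(WORDS)
    return word, _sign(word)


def verify(answer: str, token: str) -> bool:
    """Check the typed word against the token without server-side state."""
    return hmac.compare_digest(_sign(answer.strip()), token)
```

As written, a solved token could be replayed, so a real deployment would presumably also bind the token to a nonce or an expiry time. The sketch does nothing about the accessibility objections raised elsewhere in this thread.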

Hey Joi - I'm well, thanks for asking :) Just getting back into blogging after a prolonged absence.

Good to see Jay Allen on the case... Jay rocks! That said, I hope SixApart compensate you for your work ;)

IP banning has too many weaknesses to be considered a real solution.

What we really need (IMO) is a federated identity service and global moderation system that doesn't block commenters so much as allows users to filter them based on the reputation of the poster. A reputation and trust system that isn't bound to a single vendor is seriously lacking. There are many ways to make this work but without the toolbuilders tackling the issue we're left with primitive firewalling or textual analysis. However, getting the increasingly fractured world of blog software vendors to agree on this seems unlikely at present. Guess we'll just have to wait for Hailstorm 2: Microsoft Strikes Back (aargh) to come and gobble it all up.

Oh and I really hate the idea of image recognition tests as a means of distinguishing genuine comments from trolls / spam. Not just for accessibility issues but also because it only solves one part of the problem (ie. barring automation).

> Good to see Jay Allen on the case... Jay rocks!

heh. Glad you think so, Seyed. And here I thought you were mad at me... :-)

By the way, all of you people who are advocating image recognition, stop. It is an accessibility nightmare. I was product manager for the team that came up with the first commercial implementation of it at Hotmail (Paypal and all of the blind people of the world have us to thank/kill). We KNEW at the time that it was an accessibility nightmare but had to implement it to repel the serious abusers of our system.

It is NOT a good solution. Just because YOU can see, does not mean that your users can. To make that assumption is not only a mistake, but a slap in the face to a certain segment of society.

"Oh and I really hate the idea of image recognition tests as a means of distinguishing genuine comments from trolls / spam. Not just for accessibility issues but also because it only solves one part of the problem (ie. barring automation)."

It is not just 'image recognition', the Turing test can take many forms - including sounds. It is just a way to distinguish a human from a machine. Barring automation *is* a very important factor - simply because it is automation that makes spam so widespread. Spam has such a low 'hit' ratio that it is not worthwhile to manually send emails or post weblog comments. The spammers would have to go back to door-knocking and ripping off little old ladies.

The simple 'English word' that GIMPY presents is a lot more convenient than entering a string of letters and/or numbers. Obviously a lot easier to set up than a 'national database'.

A central authentication system has issues - it is hard to make it internationalised (who holds the data?). Microsoft Passport would be the closest, and would be simple to integrate into MT, but who wants to have both their identification *and* internet usage history sitting with a corporate? (or a government, for that matter).

Jay said:

"heh. Glad you think so, Seyed. And here I thought you were mad at me... :-)"

Not a clue as to why you'd think that but rest assured - you rock in my books :)

Nik said

"It is not just 'image recognition', the Turing test can take many forms"

Sure, which is why I said 'image recognition' not Turing tests... what non-image recognition Turing test mechanism would you recommend? I remember a similar discussion @ Mark Pilgrim's site a while back....

"A central authentication system has issues - it is hard to make it internationalised (who holds the data?)."

Aargh. I hate the concept of a central authentication system which is why I didn't suggest that but a federated one. The idea I'm advancing is identity management that's controlled by the users not corporations but is supported by tool vendors including MT. Think FOAF plus Web Services.

All it would take is an agreed format for identity requests and set of standardised services plus necessary tools to make it a no-brainer for non-hardcore geeks.

The centralised (prob. distributed) services wouldn't manage identities so much as utilise them to provide add-on services such as reputation.


* I use Tool X to generate my digital identity file (which may or may not be a REAL identity but just one I want to make visible online)

* I come to comment on Joi's web site and fill in the usual stuff (name, email etc). However, as this is my first time posting (checked via cookies or whatever) additionally I'm asked for either a URL to my identity file plus my passphrase. This would be a one-time deal for each site or comment system if using a third-party hosted solution. (The exact process may be something else, e.g. a challenge-response via email)

* The comment service does a simple request to one of potentially several, federated reputation systems and either accepts the comment (based on whatever threshold Joi has set for his blog) or not.

* Readers of Joi's blog may also choose to have higher thresholds than Joi has set, to filter out commenters they don't want.

Whilst the reputation services should follow a common request-response process they can use different internal mechanisms for calculating it. One might use the relative authority of the blog the identity has. Another may use some moderation system which can be plugged into blog tools. Another may be tied to the various social network systems emerging. Yet another might do a combination of these things. The identity has the choice on which he attaches himself to and the blog owner has the choice to select which he accepts on his blog.
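The accept-or-filter steps above might look roughly like this in Python. Everything here is a hypothetical shape for discussion: the thread does not define a score range, so a value between 0 and 1 is assumed, and a reputation service is modelled as any function from an identity URL to a score.

```python
from typing import Callable

# Assumed interface: a reputation service maps an identity URL to a score in [0, 1].
ReputationService = Callable[[str], float]


def accept(identity_url: str, service: ReputationService, site_threshold: float) -> bool:
    """The blog accepts a comment if the poster meets the owner's threshold."""
    return service(identity_url) >= site_threshold


def visible_to_reader(
    identity_url: str,
    service: ReputationService,
    site_threshold: float,
    reader_threshold: float,
) -> bool:
    """A reader can only raise the bar above the owner's setting, never lower it."""
    effective = max(site_threshold, reader_threshold)
    return service(identity_url) >= effective
```

Because the service is just a callable, the blog owner could plug in any of the scoring schemes mentioned above (blog authority, moderation, social networks) without changing the acceptance logic.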

I'm just chewing out some ideas and I'm 100% certain others can think of better mechanisms. The key points though I think will remain:

* The person should have responsibility for his digital identity (or identities, one for each purpose)

* The provision of tangential services such as reputation should be done in a federated manner with standardised access (though not necessarily equivalent implementations).

* Trust should be delegated voluntarily, e.g. the site owner delegates trust to the reputation service to provide quality metrics.

Identity management should be a market not a centralised system IMO.

Some more spam that I came across appears to be this piece of Blogdex spam.

Seems like a tediously effective piece of spam only let down by the obvious title.

> * I come to comment on Joi's web site and fill in the usual stuff (name, email etc). However, as this is my first time posting (checked via cookies or whatever) additionally I'm asked for either a URL to my identity file plus my passphrase.

Giving your passphrase to any site other than the "trusted" site which holds your identity registration isn't going to be a good idea. Instead, you want a scheme where you authenticate directly with the identity service, and some sort of opaque tokens are passed between that server and the blog server to verify that you authenticated correctly.
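One way to picture that scheme, as a sketch in Python under several assumptions: the identity service issues a random one-time token after the user authenticates with it directly, and the blog server redeems that token with the service. The class and method names are invented, and the plain SHA-256 hash is only there to keep the example short.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class IdentityService:
    """Hypothetical identity service; the only party that ever sees the passphrase."""

    def __init__(self) -> None:
        self._passphrase_hashes: Dict[str, bytes] = {}
        self._tokens: Dict[str, str] = {}

    @staticmethod
    def _hash(passphrase: str) -> bytes:
        # Plain SHA-256 keeps the sketch short; real storage would use a salted, slow KDF.
        return hashlib.sha256(passphrase.encode()).digest()

    def register(self, identity: str, passphrase: str) -> None:
        self._passphrase_hashes[identity] = self._hash(passphrase)

    def authenticate(self, identity: str, passphrase: str) -> Optional[str]:
        """Called by the user directly. Returns an opaque token, or None on failure."""
        stored = self._passphrase_hashes.get(identity)
        if stored is None or not hmac.compare_digest(stored, self._hash(passphrase)):
            return None
        token = secrets.token_urlsafe(32)
        self._tokens[token] = identity
        return token

    def redeem(self, token: str) -> Optional[str]:
        """Called by the blog server. Each token is valid exactly once."""
        return self._tokens.pop(token, None)
```

The blog only ever handles the token. It learns which identity authenticated by calling `redeem`, and a token that has been used or was never issued yields nothing.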

5 TrackBacks

Listed below are links to blogs that reference this entry: IP Banning comment spammers.

Many of you probably find "setting up a weblog" and all that a hassle and hard to follow. But saying what you want to say, in your own words, to the public at large is quite ... Read More

Is this a blogdex spam? So I checked out Blogdex today and found Read More

Comment spamming on blogs is steadily increasing (example 1). Spammers use automated scripts to flood articles with comments. Why? To leave links on pages, which Google, for example, naturally indexes as well. That way a spammer can, bit by bit ... Read More

Derek Powazek's essay on Gaming the system: How moderation tools can backfire. I also think this is relevant to the recent posts from Joi Ito on IP Banning comment spammers. Read More