Why are comment spammers using the ‘Jakarta Commons-HttpClient/2.0.1’ client?
Ho hum. For techies only. While you’re reading this, the server is probably being irked by a comment spammer trying to post comments that point to a card-playing site.
The comments don’t appear (150-odd attempts in 5 hours). I’ve tweaked the plugin, which simply deletes them at once, to report the http_user_agent that’s trying to post them. This gives ‘Jakarta Commons-HttpClient/2.0.1’ and reports that the http_connection_status (usually ‘keep-alive’ for normal connections, I think) is ‘close’.
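For anyone wanting to do something similar, here is a minimal sketch in Python (not the PHP plugin itself) of what reporting those two headers could look like. The function names and the idea of passing the headers in as a plain dictionary are assumptions made for illustration.

```python
def describe_request(headers):
    """Return a one-line report of the User-Agent and Connection headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "(none)")
    connection = lowered.get("connection", "(none)")
    return f"user-agent={user_agent!r} connection={connection!r}"


def looks_like_httpclient(headers):
    """Hypothetical check: does the User-Agent name the Jakarta HttpClient library?"""
    lowered = {name.lower(): value for name, value in headers.items()}
    return "jakarta commons-httpclient" in lowered.get("user-agent", "").lower()


sample = {"User-Agent": "Jakarta Commons-HttpClient/2.0.1", "Connection": "close"}
print(describe_request(sample))
print(looks_like_httpclient(sample))
```

Bear in mind that both headers are supplied by the client, so a check like this only catches spammers who leave the library’s default name in place.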
Here’s the Google search: it’s an Apache program. Anyone got an idea why and how comment spammers are up to this? Nothing much turns up on this. Or am I the first to notice it, which seems astronomically unlikely?




November 1st, 2004 at 4:25 pm
It’s more likely a Java app - Jakarta is the Apache project for Java software.
November 1st, 2004 at 4:41 pm
I’d suggest that this interesting piece in Wired might be relevant as to ‘why’ (Google page rankings for their spamvertised sites) http://www.wired.com/wired/archive/12.03/google.html?pg=7
And if they’re using Jakarta Commons http client it’s probably because of the functionality it provides.
November 1st, 2004 at 5:20 pm
Oh, I totally understand *why* the blog spammers are trying to do it (which is also why I’m keen to stop it). When I wondered “why comment spammers are up to this”, I meant the persistent banging on the site despite the fact that it would not work. It *couldn’t*, because of how this blog is configured: the links they wanted to post just aren’t allowed by my software. They would always fail. A perfect computerised meeting of the irresistible and the immovable.
It’s the relentless way that they wouldn’t stop when they were trying precisely the same thing again and again without effect that I don’t understand. If it’s automated, was it trying to do it until it got a single result? In which case why not include some text saying “we’ll keep doing this until we get a link on your blog”? That at least I could understand, Tony Soprano-style. (I still wouldn’t allow it, but at least I’d feel there was a human behind it.)
As for the Jakarta Commons thing - what is the functionality it provides? The other “browser” allegedly trying to post here identified itself as Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0; PCUser).
November 1st, 2004 at 7:34 pm
Brute force is a wonderful thing. All muscle and no brains. The comment spamming program just tries, tries and tries again. Perhaps it doesn’t recognise that it’s failing. Or is it trying to leave lots of comments for better page rankings? As for the software, that has to automate what I’ve just done - find the comment link, ‘click’ on it, fill in name, e-mail (both false no doubt…) and the comment (advert?!) and then ‘click’ again. All by ‘reading’ the HTML. Not something my browsers can do unaided. And perhaps the browser name should be taken with a pinch of salt - that can be programmed in as well (the Sam Spade spam investigation tool is an example of that).
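To show how little the browser name proves, here is a small Python sketch that builds (but does not send) a request carrying whatever User-Agent string it is told to. The URL and the `build_request` helper are made up for illustration; only the standard library is used.

```python
import urllib.request


def build_request(url, claimed_agent):
    """Construct a request object with an arbitrary User-Agent; nothing is sent."""
    return urllib.request.Request(url, headers={"User-Agent": claimed_agent})


# The string below is the 'browser' name quoted earlier in this thread.
req = build_request(
    "http://blog.example/",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0; PCUser)",
)
# urllib stores header names capitalised as 'User-agent'.
print(req.get_header("User-agent"))
```

Any HTTP library lets the caller set this header, so a ‘Mozilla’ or ‘MSIE’ label says nothing certain about the program actually making the request.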
So who are the people trying to do this? Where are they? The only link to them is the site they’re trying to advertise (given they’re using a botnet to place the comments, it seems). Are they sending ordinary spam too? (a trawl in groups may find sightings)
All this trouble you’re having puts me off the idea of a blog anytime soon…you spend the time setting it up and then along comes somebody to spoil the party. And then you have to spend more time in putting up defences, deleting the comments that get through and so on. Hugely, hugely irritating but what can you do? Perhaps there’s something like a realtime IP block list (like Sorbs, DNSBL, or Spamhaus for blocking e-mail spam) for bloggers suffering from comment spam?
November 1st, 2004 at 10:47 pm
Truly, it’s not that much of a hassle. Using Wordpress (as I do; I picked it because it requires the least software - just PHP and MySQL) solves a lot of problems, especially if you get the right plugins (I’ve referred to the ones I use - Three Strikes Plugin has saved me over the past few days, and Kitten’s Spam Words is the foundation too.)
Having a blog seems more positive than negative. As I have said elsewhere, the problem mostly came from three infected machines on the Verio network. Having blocked them, the problem stopped. I could ignore it completely, apart from that I like to know what’s been blocked.
November 3rd, 2004 at 2:46 pm
Are you sure it was three machines? I also had three Verio IP addresses; but they were very close to each other, and I guessed it was one machine getting its IP dynamically updated every 12 hours or so.
November 3rd, 2004 at 5:36 pm
Completely certain it was three different machines. The attempts came too close together. Here’s an extract from the reports:
Date: Mon, 01 Nov 2004 12:52:20 +0000 Blocked comment from http://ws.arin.net/cgi-bin/whois.pl?queryinput=168.143.113.128
Date: Mon, 01 Nov 2004 12:54:48 +0000 Blocked comment from http://ws.arin.net/cgi-bin/whois.pl?queryinput=168.143.113.124
Date: Mon, 01 Nov 2004 13:00:51 +0000 Blocked comment from http://ws.arin.net/cgi-bin/whois.pl?queryinput=168.143.113.126
3 machines in 8 minutes. And it continued for hours. Either broadband Trojans (odd though that their IPs are so close; or maybe that’s how botnets get set up) or someone with a homespun spambot.
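For the curious, the arithmetic on that extract can be checked with a short Python sketch. The regular expression, the `summarise` helper and the choice of a /24 block as the test for ‘close together’ are all assumptions for illustration, not part of the plugin’s reporting.

```python
import ipaddress
import re
from datetime import datetime

REPORT_LINES = [
    "Date: Mon, 01 Nov 2004 12:52:20 +0000 Blocked comment from "
    "http://ws.arin.net/cgi-bin/whois.pl?queryinput=168.143.113.128",
    "Date: Mon, 01 Nov 2004 12:54:48 +0000 Blocked comment from "
    "http://ws.arin.net/cgi-bin/whois.pl?queryinput=168.143.113.124",
    "Date: Mon, 01 Nov 2004 13:00:51 +0000 Blocked comment from "
    "http://ws.arin.net/cgi-bin/whois.pl?queryinput=168.143.113.126",
]

# Tolerates a missing space between the timezone and 'Blocked'.
LINE_RE = re.compile(
    r"Date: (?P<when>.+? [+-]\d{4})\s*Blocked comment from \S*queryinput=(?P<ip>[\d.]+)"
)


def parse(line):
    """Extract the timestamp and the IP address from one report line."""
    match = LINE_RE.search(line)
    when = datetime.strptime(match.group("when"), "%a, %d %b %Y %H:%M:%S %z")
    return when, ipaddress.ip_address(match.group("ip"))


def summarise(lines):
    """Count distinct addresses, the time span, and whether they share a /24."""
    parsed = [parse(line) for line in lines]
    times = [when for when, _ in parsed]
    ips = {ip for _, ip in parsed}
    blocks = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in ips}
    return {
        "distinct_ips": len(ips),
        "span_seconds": (max(times) - min(times)).total_seconds(),
        "same_slash_24": len(blocks) == 1,
    }


print(summarise(REPORT_LINES))
```

This only counts addresses and measures timing; on its own it cannot settle whether the addresses belong to three machines or to one machine changing address.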