Mollom blocks more than 500,000 comment spam attempts a day. That volume provides a unique perspective on the world of comment spammers, including the world's best and worst spam techniques. Below are excerpts from some of the more interesting spam attempts that we frequently see on Mollom's back end.

1. Some spammers try to embed Flash objects in the comments section of a blog post or article. Really? Yes.

Spam techniques flash

2. Spammers randomly generate spam messages, as illustrated by the excerpt below. Some comment spammers obviously have buggy scripts ...

Spam techniques buggy script

3. Some spammers try to take advantage of other companies’ positive brand and reputation. In the example below, the spammer tries to leverage Facebook's reputation to build a positive Mollom or Akismet reputation of its own.

Spam techniques facebook

4. In the example below, the spammer used a free site-building service, webs.com, to build a spam site. When they are not using a free site-building service, spammers will abuse incorrectly configured content management systems. Of course, there is some good old shouting too.

Spam techniques good ol shouting

5. A very common spam technique is to copy relevant content from a site, and to sprinkle in some advertising. The excerpt below shows a spam message posted on a blog post that talks about Drupal.

Spam techniques content reuse

6. As strange as it may seem, there are spammers who will simply post gibberish. My unproven theory is that they keep track of the gibberish they posted, and then register the domain after it has a reasonable ranking on Google. Spam first, create the spam pages later. This is one of the more difficult techniques for Mollom to block.

Spam techniques gibberish

7. Then there are spammers who try to leverage image tags to inject image spam in the comments of a blog post.

Spam techniques image

8. Some will try to use OpenID to bypass e-mail verification.

Spam techniques openid

9. Another trick that spammers will try is to insert the Google ad section start tag. This tag is normally used by site owners to tell Google about the text and HTML content that they'd like Google to emphasize when matching ads to a site's content. Spammers try to trick Google into believing that their spam comment is the most important content on the page. That could be deadly for your search engine ranking, and could really hurt your advertising revenue. Evil!

Spam techniques google ad section start

10. Last is the simple, but somewhat clever approach of trying to trick spam filters by injecting unnecessary spacing.

Spam techniques spaces
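To illustrate why the spacing trick in #10 is only "somewhat" clever, here is a minimal Python sketch of how a filter might undo it before matching keywords. It is not Mollom's implementation; the function name `collapse_spacing` and the choice of separator characters are assumptions made for illustration.

```python
import re

# Hypothetical pre-processing step (not Mollom's actual code): find runs of
# single characters separated by one space, tab, dot, underscore or hyphen,
# such as "v i a g r a" or "c-h-e-a-p". At least three characters are required
# so that ordinary pairs like "a b" are left alone.
_SPACED_RUN = re.compile(r"\b(?:\w[ \t._\-]){2,}\w\b")


def collapse_spacing(text: str) -> str:
    """Rejoin words whose letters were split apart by filler characters."""

    def join(match: re.Match) -> str:
        # Drop the separators inside the matched run, keep the letters.
        return re.sub(r"[ \t._\-]", "", match.group(0))

    return _SPACED_RUN.sub(join, text)


if __name__ == "__main__":
    print(collapse_spacing("buy v i a g r a now"))
```

A rule this simple will also merge legitimate text such as "U.S.A" or "I a m", so it is better suited to producing an extra, normalized copy of the comment for the filter to inspect than to rewriting what gets published.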

There are other techniques, but this should give you a sense of the strategies used by comment spammers. It seems like they are becoming more and more creative every day!
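Techniques #1, #7 and #9 all depend on markup surviving in the comment body. As a rough sketch, and not a description of how Mollom works, the Python below uses the standard library's `html.parser` to flag such markup. The tag list in `BLOCKED_TAGS`, the `google_ad_section` marker string and the names `MarkupScanner` and `scan_comment` are assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Assumed list of tags a comment form has no reason to accept.
BLOCKED_TAGS = {"object", "embed", "img"}

# Assumed marker text found inside ad section comments.
AD_SECTION_MARKER = "google_ad_section"


class MarkupScanner(HTMLParser):
    """Collects suspicious tags and HTML comments found in a comment body."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in BLOCKED_TAGS:
            self.findings.append("tag:" + tag)

    def handle_comment(self, data):
        if AD_SECTION_MARKER in data:
            self.findings.append("comment:" + AD_SECTION_MARKER)


def scan_comment(body: str) -> list:
    """Return a list of findings; an empty list means nothing was flagged."""
    scanner = MarkupScanner()
    scanner.feed(body)
    scanner.close()
    return scanner.findings
```

A site would typically pair a check like this with an allow-list sanitizer, so that anything not explicitly permitted is stripped before the comment is stored.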

Comments

Jeremy Martin (not verified):

Good post - it's nice to know what strategies are being deployed. Another common one I've seen (when the user name is allowed to be a link back to the poster's website), is to simply stuff some SEO keywords into the user name, link it back to the site, and then duplicate another comment in the message board for the message body. I have a notification that lets me know whenever there is a duplicate comment posted... 99% of the time it's spam.
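The duplicate-comment notification Jeremy describes could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not his actual setup: the names `fingerprint` and `DuplicateDetector` are hypothetical, and the in-memory set stands in for whatever storage a real site would use.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a comment body after normalizing whitespace and letter case."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateDetector:
    """Remembers fingerprints of comments seen so far (in memory only)."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, body: str) -> bool:
        fp = fingerprint(body)
        if fp in self._seen:
            return True
        self._seen.add(fp)
        return False
```

Only exact copies (ignoring spacing and case) are caught this way; a spammer who changes a single word would slip through, which is where fuzzier comparisons would come in.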

t-dub (not verified):

Fun read! I'd be curious to hear more about #3. It would be interesting to hear how Mollom is counteracting such attempts. I also have to wonder if this approach could damage the reputation of legitimate sites from Mollom's perspective. It seems unlikely that facebook.com is going to acquire a poor rep, even if this becomes a popular technique among spammers, but I could see a site with fewer organic links in Mollom's database getting nicked.

Patrick Hayes (not verified):

#6, "This is one of the more difficult techniques to block for Mollom.".

Couldn't you make a rule of thumb that checks to see if there is at least one or two words in the comment that are in the dictionary? This may catch a good 50% of the random text spam. Or am I missing something?
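Patrick's rule of thumb might look something like the Python sketch below. It is only an illustration: `SAMPLE_WORDS` is a tiny stand-in for a real dictionary, and the function names and the `min_known` threshold of two words are assumptions taken from the "at least one or two words" wording above.

```python
import re

# Tiny stand-in word list; a real check would load a full dictionary,
# ideally one per language the site receives comments in.
SAMPLE_WORDS = frozenset(
    {"the", "a", "is", "this", "post", "great", "thanks", "for", "and", "to"}
)


def count_dictionary_words(comment: str, dictionary=SAMPLE_WORDS) -> int:
    """Count tokens in the comment that appear in the dictionary."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return sum(1 for token in tokens if token in dictionary)


def looks_like_gibberish(comment: str, dictionary=SAMPLE_WORDS, min_known=2) -> bool:
    """Flag comments containing fewer than `min_known` dictionary words."""
    return count_dictionary_words(comment, dictionary) < min_known
```

The obvious weakness is that very short legitimate comments, or comments in a language the dictionary does not cover, would also be flagged, so a result like this is better treated as one signal among several than as a verdict.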

dalin (not verified):

#6, "This is one of the more difficult techniques to block for Mollom.".

You can also use DNS to validate the email address. Make sure that there is an MX or at least an A record for the domain.
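A rough Python sketch of the DNS idea follows. Python's standard library can resolve address records but has no MX lookup, so this version only covers the "at least an A record" half of the suggestion; an MX check would need a third-party DNS library. The function names, and the injectable `lookup` parameter that lets the logic be exercised without network access, are assumptions for illustration.

```python
import socket


def domain_of(email: str):
    """Return the lowercased domain part of an address, or None if malformed."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()


def resolves(domain: str) -> bool:
    """True if the system resolver returns at least one address for the domain."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: name did not resolve; UnicodeError: invalid host label.
        return False


def email_domain_exists(email: str, lookup=resolves) -> bool:
    """Check that the address has a domain and that the domain resolves."""
    domain = domain_of(email)
    return domain is not None and lookup(domain)
```

Because a lookup can fail for temporary reasons, a failed result is safer as a reason to raise a comment's spam score than as grounds for rejecting it outright.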

Ironic that Mollom thinks that this post is spam.

Damien Petitjean (not verified):

I never saw that ! Waw

At the beginning, I thought it was a joke. I don't understand why some spam comments try to make links to websites like facebook, do they think that they can put Facebook in sandbox into Google's servers ? I don't think so.

Damien, CEO of CompareMandataire

Giles Kennedy (not verified):

The few spam comments that I've seen that have got past Mollom have been of type #6 or similar (sometimes there is a short sentence in the body as well as the junk link). I've also noticed occasional searches for what looks like a random string, and I wondered if a bot or a spammer was searching for some gibberish they'd previously tried to post, but I've not tried to correlate against the Mollom spam entries in the watchdog. Maybe a dumb bot even mistook the search form for a comment form..!

My own unproven theory was actually that #6 is "probing" the defences, testing if a certain type of content gets past (which as I say it did a few times). Possibly also the spammers hope that by getting something through this would improve their reputation.

In terms of trapping #6, I guess that if the same domain name (which has not been encountered before) suddenly starts appearing on multiple sites then it's a bit suspicious, especially if there is no A record for it.
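The cross-site heuristic Giles suggests could be sketched in Python along these lines. It assumes a central service that sees comments from many sites; the class name `DomainTracker` and the threshold of three sites are hypothetical, and the sketch ignores the "suddenly" part, since it keeps no timestamps.

```python
from collections import defaultdict


class DomainTracker:
    """Tracks on how many distinct sites each linked domain has appeared."""

    def __init__(self, site_threshold: int = 3):
        self.site_threshold = site_threshold
        self._sites_by_domain = defaultdict(set)

    def record(self, domain: str, site: str) -> bool:
        """Record a sighting; return True once the domain looks suspicious."""
        sites = self._sites_by_domain[domain.lower()]
        sites.add(site)
        return len(sites) >= self.site_threshold
```

A fuller version would store the time of each sighting and count only those inside a recent window, and would exempt well-known domains that legitimately appear everywhere.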

Dries:

We're not currently checking whether the A records or MX records exist. I did some research and it seems like it can be done with reasonable effort. I even found some example code; e.g. http://www.rgagnon.com/javadetails/java-0452.html. Based on the article, it sounds like it is easy to run into various problems though (e.g. greylisting forcing you to try again later). We need to investigate that more.