Email obfuscation is a fucking joke

I'm getting fed up with the programming community's refusal to think like the enemy. Many web applications that have something to do with email (most of them) use techniques to obfuscate email addresses. Geeks also do it when they post to public forums, thinking that it will protect them from spam. This whole canon of obfuscation is a total joke. Let me give you some examples of what people are doing and why it's dangerous. I say dangerous because a false sense of security is far worse than no security at all.

Movable Type

Movable Type utilizes a feature called spam_protect. It turns user@example.com into user&#64;example&#46;com. Those funny-looking strings where the @ sign and . used to be are HTML-encoded versions of the @ and . signs. The idea that this would prove to be anything more than a hiccup for a spammer is ludicrous. I'll just give you an example rule that a spam-harvesting robot could use. If you find the string ".com" or ".org" or ".net" in a web page, with the dot in plain or encoded form, search backward for the first space and capture that stretch as an email address. Then either run the address through an HTML interpreter or simply do a search and replace on the two encoded strings to turn them back into @'s and .'s. I could write this in one line of code, and I can't program worth shit, definitely not well enough to write an efficient spam harvester.
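Here is a rough sketch of that rule, spread over a few more lines than one for readability. It assumes the encoding is plain numeric character references; the function names and the deliberately loose address pattern are made up for illustration and are not taken from any real harvester.

```javascript
// Hypothetical harvester rule: decode numeric HTML entities, then match addresses.
// All names here are illustrative assumptions.

// Turn a code point into a character, leaving out-of-range values untouched.
function fromCode(codePoint, fallback) {
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : fallback;
}

// Decode &#NN; and &#xHH; references; named entities are left alone.
function decodeNumericEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (m, hex) => fromCode(parseInt(hex, 16), m))
    .replace(/&#(\d+);/g, (m, dec) => fromCode(parseInt(dec, 10), m));
}

// Decode first, then look for anything shaped like local@domain.tld.
function harvestEncoded(html) {
  const decoded = decodeNumericEntities(html);
  // Loose pattern on purpose; real address syntax is broader than this.
  return decoded.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g) || [];
}

// Logs an array holding the single address user@example.com
console.log(harvestEncoded("<p>Mail user&#64;example&#46;com today</p>"));
```

Decoding the whole page once and then matching means the harvester never needs to know which characters were encoded, which is why this kind of encoding costs it almost nothing.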

php.net and every geek out there

So you've probably seen this clever technique where the @ sign is replaced with the word "at" and the . is replaced with the word "dot". You'll find this in comments all over Slashdot. This way the spam harvester can't merely look for an @ sign or a .com. I'm sorry, but this is never going to work. How often in normal text do you see any of these words being used? Dot, com, org, net. OK, "net" is probably common, but "dot"? Looking for ".com" and looking for " dot com" seem about the same to me. Again, a harvester would bypass this trick with a single line of code.
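As a minimal sketch of what that bypass might look like, assuming the address is written in the common "user at example dot com" shape with whitespace around the words (the function name and pattern are my own illustration):

```javascript
// Hypothetical rule for spelled-out addresses such as "user at example dot com".
function harvestSpelledOut(text) {
  // local part, the word "at", then a domain with one or more "dot" separators
  const pattern =
    /([A-Za-z0-9._%+-]+)\s+at\s+([A-Za-z0-9-]+(?:\s+dot\s+[A-Za-z0-9-]+)+)/gi;
  const found = [];
  for (const m of text.matchAll(pattern)) {
    // Rebuild the address: "@" for "at", "." for each "dot".
    found.push(m[1] + "@" + m[2].replace(/\s+dot\s+/gi, "."));
  }
  return found;
}

// Logs an array holding the single address user@example.com
console.log(harvestSpelledOut("Reach me: user at example dot com for details"));
```

A pattern this loose will also produce junk from ordinary sentences that happen to contain "at" and "dot", but a harvester loses nothing by collecting a few bad addresses along with the good ones.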

Mailman and many other programs

The reason that this has come to my attention is twofold. One, I recently disabled the display of "obfuscated email addresses" in Movable Type, since a child could write around the obfuscation. Two, I've just installed Mailman for our mailing lists (a great program, mind you), and it uses just as lame a system. This time it's combining the "at" for @ and "dot" for . with a mailto: link containing the full email address. This approach is the worst of all of them. When a human visits a page with email addresses, they display as user at example dot com, but when a harvester views them (at the HTML source level, not the rendered level) it has the full email address, user@example.com, right there with a mailto: link in case it wasn't obvious enough that it was an email address.
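To show how little work the mailto: case takes, here is a sketch that pulls addresses straight out of link targets. The sample markup in the usage line is invented to match the description above, not copied from real Mailman output, and the function names are hypothetical.

```javascript
// Hypothetical rule: read addresses directly from mailto: link targets.

// Percent-decode a value, falling back to the raw text if it is malformed.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

function harvestMailto(html) {
  // Capture everything after "mailto:" up to a quote, "?", ">" or whitespace.
  const pattern = /href\s*=\s*["']?mailto:([^"'?>\s]+)/gi;
  const found = new Set(); // a Set drops duplicate addresses
  for (const m of html.matchAll(pattern)) {
    found.add(safeDecode(m[1]));
  }
  return [...found];
}

// Logs an array holding the single address user@example.com
console.log(
  harvestMailto('<a href="mailto:user@example.com">user at example dot com</a>')
);
```

The visible link text never enters into it, so whatever obfuscation is applied there does nothing against a harvester that reads the source.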

I don't have a killer app solution here, but we need to get rid of this placebo shit and accept that we don't have an answer and that we're vulnerable. With the placebo it will just take that much longer for a real solution to be found and implemented.

14 Comments

  • Jacob says:

    Occasionally I’ll use l33t 5p34k to protect my email address from spam bots. But that only works if the person reading the page is fluent in l33t.
    For the most part, I’m of the opinion that one can’t stop spam at the address-harvesting level. As you’ve noted, solutions that obfuscate an address are ridiculously easy to bypass. And once you obfuscate past a certain level, “decoding” becomes more of a pain for a real person that wants to send you email than for the harvesting bot.

  • gene says:

    Looks like last summer the CDT did a kickass experiment on how spammers harvest addresses. Their findings, “Why Am I Getting All This Spam?”, are very interesting, especially the data about spam and email addresses on public web sites. I stick with my beliefs about the uselessness of obfuscation, however.

  • Jacob says:

    My favorite quote from the study:
    “E-mail addresses need not be incomprehensible, but a user with a common or short name may want to modify or add to it in some way in his or her e-mail address.
    For further information, please contact Ari Schwartz at the Center for Democracy & Technology, 202-637-9800, __ari@cdt.org.__”
    (emphasis added)
    ari@cdt.org? What were you just saying about modifying short names?

  • didofoot says:

    you make it sound so scary, like there are aliens waiting to take our email addresses. and our children. i am clutching a glass of water right now and waiting for one to show up in my computer. just try it, spamalien.

  • Brooks says:

    Do spammers really want the email addresses of people that are trying that hard to not get any? If I were a spammer, I don’t think I would spend any effort trying to decode email addresses of people that will only hate any company that spams them whether directly or indirectly. Just a thought.

  • dianna says:

    But if you were a spammer, you probably wouldn’t care much about the success of the company for which you were spamming, because you’re not an employee, just someone who’s been paid a few bucks to send out a crapload of spam. The more craploads you can send out, the more bucks you can get paid. So you have every incentive to go out and grab as many emails as possible.

  • gene says:

    That’s the main point: the spammers don’t have any incentive to collect specific types of addresses, they get paid on a per-address basis regardless of its quality. An email address translates to cash for them, regardless of the outcome between the email address holder and the company selling the products. The methodology is broadcast, not targeted. The idea isn’t to increase the quality of the addresses they send spam to, it’s to increase the quantity.
    If spammers were attempting to increase the quality of the addresses, the world would be a better place since they would work with people like you and me to remove our names from their lists and therefore increase the quality of the lists.
    Unfortunately, that’s not the world we live in.

  • Brooks Ayola says:

    Point well taken. It seems that the spammers that had been the source of aggravation a year ago for one of my clients aren’t using the methods to bypass simple JavaScript email scrambling techniques, as this has worked wonders for them. So I guess it’s better than nothing, considering how easy it is to copy and paste the code into a page.

  • gene says:

    Ya, no matter what I say, that study done by the CDT says that stuff like javascript and HTML encoding, at least as of last summer, works and stops spam. I don’t think this is an extensible solution though. And I’m a zealot about some stuff and refuse to change my opinion even in the face of overwhelming evidence.

  • Scaberous says:

    I truly and sincerely hope that every scum spammer contracts brain cancer and dies a horrible painful death.

  • AB says:

    What’s needed is a server-side script that will dynamically convert email addresses to gif images. That way it looks like a text email, but it’s not, it’s an image. Of course this wouldn’t help a mailto: link very much, but it’s a start…

  • gene says:

    This is a good idea, except it’s been considered. Optical character recognition could easily convert an image to text in an automated fashion. So you may ask, what if they obfuscate the image? Carnegie Mellon is constantly working on a project called CAPTCHA which does just this. However, every other university is working on AI programs that show they can still successfully OCR the image, Berkeley being one of them.

  • Alexánder Murillo H. says:

    This code works a little bit better than having an @ in the HTML code.
    <html>
    <head>
    <title>
    </title>
    <script type="text/javascript">
    // Write the address as plain text, assembled only when the page renders.
    function myEmail( user, dom )
    {
    document.write( user + '@' + dom );
    }
    // Write a mailto: link; an empty caption falls back to the address itself.
    function myMailTo( user, dom, caption )
    {
    var address = user + '@' + dom;
    if ( caption == "" )
    {
    document.write( '<a href="mailto:' + address + '">' + address + '</a>' );
    }else
    {
    document.write( '<a href="mailto:' + address + '">' + caption + '</a>' );
    }
    }
    </script>
    </head>
    <body>
    <b>Hide e-mail from spam indexers</b>
    <br>
    <br>
    <font color="navy">
    <b>
    <script type="text/javascript">myEmail( "user", "domain.com" )</script>
    </b>
    </font>
    <br>
    <br>
    <script type="text/javascript">myMailTo( "user", "domain.com", "" )</script>
    <br>
    <br>
    <script type="text/javascript">myMailTo( "user", "domain.com", "contact us" )</script>
    </body>
    </html>

  • btreehugger says:

    I’ve found that there is no easy way to do this kind of thing on the client side. I personally have a thing against people relying on active scripting as a cop-out for doing things properly. The only way I can see someone effectively stopping spammers from getting their email address is to avoid it altogether: have an online facility to send the message instead of a mailto link in your source. That way, the person sends you their message and your program forwards it to you. Of course, if someone really wanted to get to you they would hammer this “service”, but at the very least it becomes a DDoS-type attack instead of a spamfest. Someone would really have to want to get to you, because you could impose limits on the number of incoming connections from particular IP-addys or particular message bodies/search strings. Still imperfect, and requires another extra layer of crap. But hey, it’s not as much of a placebo as I feel the other solutions are, nor is it as reliant on active scripting, and it helps make your site far more accessible. Thanks for stating what should be common knowledge; seeing at least a few people on this forum informed makes it all worthwhile to me.
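The per-IP limit described in this last comment could be sketched roughly as follows. This is only an in-memory illustration for a server-side contact-form handler; the function name, the limits, and the idea of keeping timestamps in a Map are assumptions of mine, not something from the comment.

```javascript
// Hypothetical per-IP limiter for a contact-form handler.
// maxMessages and windowMs are illustrative parameters.
function createRateLimiter(maxMessages, windowMs) {
  const recent = new Map(); // ip -> timestamps (ms) of accepted messages

  // Returns true if a message from this IP may be accepted at time `now`.
  return function allow(ip, now = Date.now()) {
    const cutoff = now - windowMs;
    // Keep only the timestamps that still fall inside the window.
    const kept = (recent.get(ip) || []).filter((t) => t > cutoff);
    const ok = kept.length < maxMessages;
    if (ok) kept.push(now);
    recent.set(ip, kept);
    return ok;
  };
}

// Allow at most 3 messages per IP per minute.
const allowMessage = createRateLimiter(3, 60 * 1000);
console.log(allowMessage("192.0.2.1")); // true for the first message
```

Entries for idle addresses are never removed here, so a real deployment would need some cleanup, and a limit on message bodies would need a separate check.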
