Damn Spam!

A descent into a world of organ enlargements, toner cartridges, con men and spam fighters

He also serves as chairman for the Anti-Spam Research Group, an arm of the Internet Research Task Force, one of the foremost independent Internet industry think tanks. Which pretty much makes the 26-year-old Judge a big wheel in geekdom.

Of the various spam-filtering programs, the most basic is a rule-based filter, which weeds out mail that has selected spammy words and phrases in the subject line, such as "wild teen sluts," "low-priced," "toner cartridges" and that old standby, "wild teen sluts prefer our low-priced toner cartridges."

The way around this pitfall, spammers have found, is to misspell key words, add hyphens or use an unrelated phrase in the subject box, such as "hi there."
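A rule-based filter of this kind can be sketched in a few lines of Python. This is only an illustration, not any vendor's actual code: the phrase list and the function name `is_spam_by_rules` are made up for the example.

```python
# Hypothetical blocklist; a real filter would carry far more entries.
BLOCKED_PHRASES = ["low-priced", "toner cartridges"]


def is_spam_by_rules(subject: str) -> bool:
    """Flag a subject line if it contains any blocked phrase (case-insensitive)."""
    lowered = subject.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


print(is_spam_by_rules("Low-priced TONER CARTRIDGES today"))  # True
print(is_spam_by_rules("l0w-pr1ced t-o-n-e-r cartridgez"))    # False: misspellings slip through
print(is_spam_by_rules("hi there"))                           # False: unrelated subject slips through
```

The last two calls show the weakness: the filter only knows the exact strings it was given, so a hyphen or a swapped letter is enough to get past it.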

A somewhat more sophisticated filter is one used by Brightmail, a company contracted to screen out spam for EarthLink -- which calls the service "the Spaminator" -- and several other large ISPs.

Brightmail sets up thousands of decoy e-mail addresses and fake open relays that are intended to attract spam much like shit draws flies. As soon as spam hits a bogus inbox, the program scans it and then proceeds to eradicate any duplicate e-mails.

The spammers' antidote to Brightmail, however, is the string of random characters that often appears in the subject line: "Hot granny-grabbing action! vvgh3y7kxwq." Using a program that adds different characters to each e-mail, spammers have found they often can foil the filter.
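Why a few random characters are enough is easier to see in code. The sketch below assumes the filter recognizes duplicates by an exact fingerprint of each message; that is an assumption made for illustration, since the article does not say how Brightmail compares e-mails, and the class and method names are invented.

```python
import hashlib


def fingerprint(message: str) -> str:
    """Exact-match fingerprint: any change to the text changes the hash."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


class DecoyFilter:
    """Hypothetical duplicate filter fed by decoy inboxes."""

    def __init__(self) -> None:
        self.known_spam: set[str] = set()

    def report_decoy_hit(self, message: str) -> None:
        # Anything that lands in a decoy inbox is treated as spam.
        self.known_spam.add(fingerprint(message))

    def is_spam(self, message: str) -> bool:
        return fingerprint(message) in self.known_spam


decoys = DecoyFilter()
decoys.report_decoy_hit("Cheap toner! vvgh3y7kxwq")
print(decoys.is_spam("Cheap toner! vvgh3y7kxwq"))  # True: exact duplicate
print(decoys.is_spam("Cheap toner! q8zr2mmp1xa"))  # False: new random suffix, new fingerprint
```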

Last year saw the introduction of the most effective anti-spam filter so far, a type of program based upon the probability theories of Thomas Bayes, an 18th-century British mathematician who founded a branch of probability theory that is the basis of many modern-day Internet search engines.

The advantage of a Bayesian filter is that the program gets progressively "smarter" over time, according to its creator, MIT-educated hotshot hacker Paul Graham.

To get the program started, you need two inbox trash cans -- one to dispose of legitimate e-mail and the other to get rid of spam. The filter analyzes each e-mail, picking out the 15 most "statistically significant" words in order to compare the occurrence of spammy words (e.g., "orgy," "refinancing," "prescriptions") against those of unspammy words (e.g., "Kafkaesque," "hypotenuse").
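The idea can be sketched roughly as follows. This is a simplified illustration, not Graham's actual program: the class name `BayesFilter`, the tokenizer, the 0.4 score for never-seen words and the clamping of scores to the 0.01 to 0.99 range are all assumptions made for the example. Only the two training piles and the "15 most significant words" come from the description above.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class BayesFilter:
    def __init__(self) -> None:
        self.spam_words: Counter = Counter()  # word -> number of spam e-mails containing it
        self.ham_words: Counter = Counter()   # word -> number of legitimate e-mails containing it
        self.spam_count = 0
        self.ham_count = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Add one e-mail to the spam pile or the legitimate pile."""
        words = set(tokenize(text))
        if is_spam:
            self.spam_words.update(words)
            self.spam_count += 1
        else:
            self.ham_words.update(words)
            self.ham_count += 1

    def word_probability(self, word: str) -> float:
        """Estimate how likely an e-mail containing this word is spam."""
        spam_rate = self.spam_words[word] / max(self.spam_count, 1)
        ham_rate = self.ham_words[word] / max(self.ham_count, 1)
        if spam_rate + ham_rate == 0:
            return 0.4  # assumption: unseen words lean slightly innocent
        return min(0.99, max(0.01, spam_rate / (spam_rate + ham_rate)))

    def spam_probability(self, text: str, top_n: int = 15) -> float:
        """Combine the top_n word scores that sit farthest from a neutral 0.5."""
        scores = [self.word_probability(w) for w in set(tokenize(text))]
        scores.sort(key=lambda p: abs(p - 0.5), reverse=True)
        spam_product, ham_product = 1.0, 1.0
        for p in scores[:top_n]:
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)


f = BayesFilter()
f.train("refinancing and prescriptions at low prices", is_spam=True)
f.train("cheap prescriptions, refinancing offer", is_spam=True)
f.train("the hypotenuse of a right triangle", is_spam=False)
f.train("a Kafkaesque meeting this morning", is_spam=False)
print(f.spam_probability("refinancing prescriptions") > 0.9)  # True
print(f.spam_probability("Kafkaesque hypotenuse") < 0.1)      # True
```

Every e-mail dropped into either pile updates the counts, which is why a filter of this type gets "smarter" the longer it is used.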

The filter starts from scratch but should be almost totally effective within three or four days, says Graham, who sounds like a well-read surfer dude.

"If you write a Bayesian filter program that doesn't screen out at least 99 percent of the spam, you'll be laughed at by the other programmers," he says.

By contrast, Brightmail's effectiveness as a spam-blocker is estimated at only around 70 percent. "All they end up filtering out is spam by people who don't know what they're doing," Graham says.

Graham, who is semi-retired after signing a lucrative programming deal with Yahoo! a few years back, would like nothing better than to be known as the man who killed spam -- and he thinks it could happen.

"Contrary to popular belief, sending spam isn't free," he says. "Spammers do have a profit margin and if you can cut it down to almost nothing, it won't be worth their time."

For instance, he says, the cost of sending out 1 million e-mails is approximately $200, for which a spammer might earn $500 in commissions -- a profit of $300. But if a spammer is forced to send out 2 million e-mails in order to make the same $500, eventually he'll go into a different line of work -- or, better yet, starve.
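Graham's arithmetic can be worked through directly. The figures below are the ones he quotes ($200 per million e-mails, $500 in commissions); the function names are made up for the example, and the assumption that cost scales linearly with volume is ours.

```python
COST_PER_MILLION = 200  # dollars, per Graham's estimate


def profit(emails_sent: int, commissions: float) -> float:
    """Commissions minus sending cost, assuming cost scales linearly with volume."""
    cost = emails_sent / 1_000_000 * COST_PER_MILLION
    return commissions - cost


def break_even_volume(commissions: float) -> float:
    """Number of e-mails at which sending cost eats the entire commission."""
    return commissions / COST_PER_MILLION * 1_000_000


print(profit(1_000_000, 500))   # 300.0
print(profit(2_000_000, 500))   # 100.0
print(break_even_volume(500))   # 2500000.0
```

On these numbers, a filter that forces a spammer to send 2.5 million e-mails to earn the same $500 wipes out the profit entirely.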

The bad news is, of course, that spammers study every new anti-spam program to try to learn how to beat it, Judge says. Already we're seeing spam that includes sequences of non-spammy words that are invisible to the recipient, but can be read by the filter. It will be another year or so before the long-term effectiveness of the Bayesian system can be determined, he says.

In the meantime, EarthLink recently introduced its newest product, Spam Blocker. Termed a "challenge-response" system, it automatically responds to e-mail from every new sender, requesting that he copy a three-character series into a box before his message will be delivered. The trick is that the characters are contained in an image that cannot be read by a computer program, ensuring that your mail actually comes from a real person.

Blame It On Boca Raton

Do you recall a time when, still un-jaded by the sight of your e-mail inbox clogged with anti-aging offers and wicked deals on toner ink, you wondered, "Where does all this crap come from?"