Lunchtime Musings: Ed Felten on Bayesian Filtering


Okay, it’s a little sad that I’m sitting here typing between bites of my sandwich (at least it’s not a cheese sandwich), but I came across Ed Felten’s Victims of Spam Filtering post this morning and wanted to note a couple of things about it. Well, I suppose that I actually want to note one thing: that I entirely disagree with his logic.

While it’s best for you to go and read his entire post, I’ll copy the first two paragraphs here, since they’re the ones that set off alarm bells in my head:

Anyway, this reminded me of an interesting problem with Bayesian spam filters: they’re trained by the bad guys.

[Background: A Bayesian spam filter uses human advice to learn how to recognize spam. A human classifies messages into spam and non-spam. The Bayesian filter assigns a score to each word, depending on how often that word appears in spam vs. non-spam messages. Newly arrived messages are then classified based on the scores of the words they contain. Words used mostly in spam, such as “Viagra”, get negative scores, so messages containing them tend to get classified as spam. Which is good, unless your name is Jose Viagra.]

Now let’s compare that to a snippet from Paul Graham’s A Plan for Spam, the document that introduced the word “bayesian” to so many of us:

Because it is measuring probabilities, the Bayesian approach considers all the evidence in the email, both good and bad. Words that occur disproportionately rarely in spam (like “though” or “tonight” or “apparently”) contribute as much to decreasing the probability as bad words like “unsubscribe” and “opt-in” do to increasing it. So an otherwise innocent email that happens to include the word “sex” is not going to get tagged as spam.
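Graham’s point can be sketched in a few lines of Python. This is a toy illustration, not his actual code: the per-word spam probabilities below are invented, and they’re combined with the naive-Bayes rule his essay describes, where a handful of ordinary words easily drowns out one very spammy one.

```python
from math import prod

def combined_spam_probability(word_probs):
    """Combine per-word spam probabilities with the naive-Bayes
    rule sketched in "A Plan for Spam": p / (p + q), where p is
    the product of the word probabilities and q the product of
    their complements."""
    p = prod(word_probs)
    q = prod(1.0 - x for x in word_probs)
    return p / (p + q)

# One very spammy word on its own...
print(combined_spam_probability([0.99]))                          # 0.99
# ...buried among ordinary, hammy words:
print(combined_spam_probability([0.99, 0.10, 0.10, 0.20, 0.05]))  # ~0.016
```

Five unremarkable words pull a 0.99 word all the way down to about 0.016, which is exactly why an otherwise innocent email containing “sex” doesn’t get tagged as spam.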

It appears that Felten, like many other people (me, for example), has made the mistake of viewing a Bayesian filter as something like a keyword filter on steroids. A story:

At one point I was using one of the popular open source Bayesian filters. I had it set up so that it wasn’t just marking “spam” and “ham,” but rather was categorizing all of my mail for me: tech/programming mail into one bucket, personal mail another, mailing lists a third, and so on. This worked well, and led to the brilliant idea of training my system to recognize a special “password” that people could include in their emails if they wanted to end up in my “priority” bucket.

This didn’t work well.

Why not? Well, a couple of reasons. The first reason is that I have a bunch of immature programmers as friends, so within fifteen minutes of sending out a note asking for email containing the word “avocado” so that I could train my system, there were no fewer than three scripts written that did nothing but send me email after email, all containing nothing but the word “avocado” repeated over and over. Ha, ha, you bastards.

The second reason this didn’t work is that the whole point of a Bayesian-style filter is that it looks at texts as a whole, not just at individual words. Bayesian filters aren’t “trained by the bad guys,” they’re trained by the bad guys, your mother-in-law, your co-workers, your friends…they’re trained by everyone who sends you email. My avocado password never did work very well, because a single word (“avocado”) was rarely, if ever, enough to change the overall character of a message. If the rest of the message content (good and bad) looked like my other “personal” messages, it would end up in the personal bucket; if it looked like my “social software” messages, it would end up in that bucket.

The kind of attack that Ed Felten is imagining would be crippling if Bayesian filtering worked on a sort of “adaptive keyword” basis, picking out the messages with spam words and looking for new spam words to filter…but that’s just not the case. Let’s take Felten’s example of a spammer trying to poison the word “fahrenheit” prior to the release of the Michael Moore film:

You send me 50, 500, or 5000 spam messages containing “fahrenheit,” a word that has never before appeared in any message I’ve received. All of them get marked as spam because of the rest of their spammy content, which drives up the spam potential of “fahrenheit.” Then a friend sends me a note with some thoughts on the movie Fahrenheit 9/11. Will that message go into the spam folder? It could, but it’s not really likely. Because it’s appeared in n spam messages and no good ones, “fahrenheit” will carry a really high spam potential, but there’s other content in the message: if it’s an email from my friend about a movie they just saw, the rest of the content (my friend’s email address, my name, the words used in normal conversation, etc.) probably all has very low spam potential. The odds are that the “ham” potential of the 500 other words my friend wrote will dramatically outweigh the “spam” potential of a single word, the message will make it into my inbox, and that in turn reduces the spam potential of “fahrenheit.”
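To make that feedback loop concrete, here’s a toy sketch (all counts invented, and the clamping is a simplification of what real filters do): “fahrenheit” starts out seen only in spam, so its score pegs at the maximum, but every good message containing it that lands in the inbox pulls the score back down.

```python
def word_spam_prob(spam_count, ham_count, total_spam, total_ham):
    """Graham-style per-word estimate from message counts,
    clamped away from 0 and 1 so that no single word can ever
    be absolutely decisive."""
    b = spam_count / total_spam   # how often the word shows up in spam
    g = ham_count / total_ham     # how often it shows up in good mail
    if b + g == 0:
        return 0.5                # never seen: no evidence either way
    return min(0.99, max(0.01, b / (b + g)))

# "fahrenheit" right after the poisoning run: 50 spam sightings, 0 ham.
before = word_spam_prob(50, 0, total_spam=1000, total_ham=1000)
# After five good messages about the movie make it into the inbox:
after = word_spam_prob(50, 5, total_spam=1000, total_ham=1005)
print(before, after)   # 0.99, then roughly 0.91
```

The poisoned word never gets to stay poisoned: each correctly delivered message about the movie is new training data working against the spammer.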

The whole idea of Bayesian filtering is to get away from this “one bad word poisons the message” sort of thinking. So forget about this one and go worry about Google bombing or something.