One of the tasks I've taken on over the years is to go through the spam mail of people who no longer work at my agency and use it to make our filtering better.

What most people don't realize is that the great majority of the spam an organization gets (at least any organization that I've ever seen) goes to employees who no longer work there.  This is just the reality of spam: the spammers never give up, and since their messages weren't asked for in the first place, they have no idea when the person they are sending to has left.

So once the number of employees who have left is greater than the number currently employed, you start to see these "ghost accounts" getting the majority of your spam.

As a policy we forward someone's mail to their replacement and/or boss for the first 6 months and then relegate it to a spam account after that (the spam account has an "out of office" type message saying that employee no longer works with us).  From there these accounts continue to receive mail for, well, I assume forever.

At this point the spam account receives a message every 3 seconds or so.

The one advantage to this otherwise annoying situation is that it provides you with a good way to (a) detect spam and (b) test your filters' reliability.  This is more helpful than you might think, because spam filtering is almost impossible to gauge with existing users.

I've always likened configuration of a spam filter to someone handing you a box that contains something you know how to fix.  They then seal up the box and cut two holes in the side just big enough for your hands to get through and tell you to stick your hands into the box and fix the thing. 

That's basically what spam filtering is.  I can see what's going on outside, but I can't (ethically, not technically) look into the user's mailbox, so I'm forced to guess my way through to an effective strategy.

But by redirecting all these old accounts to a central spam folder I can go through them and refine the spam filtering.  Usually I use keyword filtering to increase our ability to catch spam.  This works in two ways...

  1. It allows me to see the various euphemisms for sex and male genitalia and block mail that contains them (Bayesian filtering can be surprisingly dense).  So I block all the "se>.<u@1s" and all the "organ/instrument/device/participle" talk, all of which increases our catch level tremendously.
  2. Even more effective, it allows me to block the URLs.  In the end every spam message is trying to get you to go somewhere, and in every case that URL is something that can be blocked.  At this point I must have a collection of 2000+ bogus URLs that we block in e-mails.  One trick here, though: most spammers will put the URL at the end of a sentence, because a text filter sees a difference between "http://www.fakeurl.com" and "http://www.fakeurl.com."  So you'll see them cycle through "http://www.fakeurl.com!" and "http://www.fakeurl.com?" as well.  To be effective you have to make sure you get every variant.
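
If your filter lets you script the check, one alternative to listing every punctuation variant by hand is to strip trailing punctuation from each URL before comparing it against the blocklist. The following is only a minimal sketch of that idea, not how my keyword checker actually works. The names `BLOCKED_HOSTS`, `extract_hosts`, and `contains_blocked_url` are made up for illustration, and the blocklist entry is the placeholder URL from above. It also assumes plain `http`/`https` links and compares on the host name only.

```python
import re

# Hypothetical blocklist; the entry is a placeholder, not a real spam host.
BLOCKED_HOSTS = {"www.fakeurl.com"}

# Rough URL matcher: scheme, then everything up to whitespace or a quote/bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Characters that usually belong to the sentence, not to the URL itself.
TRAILING_PUNCTUATION = ".,!?;:)]}'\""


def extract_hosts(body):
    """Return the lowercased host of every http(s) URL found in the body."""
    hosts = set()
    for match in URL_PATTERN.findall(body):
        # Drop sentence punctuation glued to the end of the URL.
        url = match.rstrip(TRAILING_PUNCTUATION)
        # Everything after "://" up to the first "/", "?" or "#" is the host.
        rest = url.split("://", 1)[1]
        host = re.split(r"[/?#]", rest, maxsplit=1)[0]
        hosts.add(host.lower())
    return hosts


def contains_blocked_url(body):
    """True if any URL in the body points at a blocked host."""
    return not BLOCKED_HOSTS.isdisjoint(extract_hosts(body))
```

With this, "http://www.fakeurl.com.", "http://www.fakeurl.com!" and "http://www.fakeurl.com?" all reduce to the same host, so one blocklist entry covers them. The trade-off is that matching on the host blocks every page on that host, which may be broader than you want for shared hosting.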

There are other tricks that you pick up as you go along, but those two alone will get you pretty far.  At this point my keyword checker outpaces the Bayesian filter by about 4-to-1.

One caution, though: don't go overboard.  There's no legit reason for an e-mail to have "Sexual Organ" in it, but there are plenty of reasons for just the word "Organ".  Be as specific as possible or you'll end up costing users more than you benefit them.
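
To make the "be specific" point concrete, here is a rough sketch of matching whole phrases instead of single words. It assumes a filter that accepts regular expressions, which yours may not. `BLOCKED_PHRASES`, `build_phrase_pattern`, and `matches_blocked_phrase` are hypothetical names, and the phrase list is just the example from above.

```python
import re

# Hypothetical phrase list; note that it holds the full phrase, not "organ" alone.
BLOCKED_PHRASES = ["sexual organ"]


def build_phrase_pattern(phrases):
    """Compile one case-insensitive pattern that matches any whole phrase."""
    alternatives = []
    for phrase in phrases:
        # Escape each word and allow any run of whitespace between words.
        words = [re.escape(word) for word in phrase.split()]
        alternatives.append(r"\s+".join(words))
    # \b on both sides keeps the match from starting or ending mid-word.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


PHRASE_PATTERN = build_phrase_pattern(BLOCKED_PHRASES)


def matches_blocked_phrase(body):
    """True if the body contains any blocked phrase as whole words."""
    return PHRASE_PATTERN.search(body) is not None
```

A message about a church organ recital or organ donor registration passes, while "Sexual  Organ" with odd spacing or capitalization is still caught. Because of the trailing word boundary, plurals and other variants need their own entries.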