Filter Hebrew, Russian, Chinese… spam with SpamAssassin

Hi!

According to several reports from my users, we have been getting more and more spam written in foreign languages.
Despite my good amavis/SpamAssassin filtering setup, Bayesian filters of any kind are a no-op against it, and this spam usually comes from valid Yahoo/Gmail/other accounts that aren’t reported to Pyzor or DCC.
Real pain…

The good news is that nobody around here speaks Hebrew, so I can safely tag these mails as junk. Here is a quick howto to enable this on Debian:

Edit /etc/spamassassin/v310.pre and uncomment the following line:
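
The line in question is presumably the one that loads TextCat, the language-guessing plugin configured in the rest of this post. As a sketch, assuming a stock Debian v310.pre, it should read as follows once uncommented:

```
# /etc/spamassassin/v310.pre
# Assumption: TextCat is the plugin meant here; it ships commented out.
loadplugin Mail::SpamAssassin::Plugin::TextCat
```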

Configure this new plugin from /etc/spamassassin/local.cf:
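
A minimal sketch of what these settings can look like, assuming the TextCat option names ok_languages and inactive_languages, the UNWANTED_LANGUAGE_BODY rule and the _LANGUAGES_ template tag. The empty-quotes value is an assumed trick to clear the default inactive list, so check it against your version:

```
# /etc/spamassassin/local.cf
# Language codes are listed in the TextCat plugin documentation.
# Languages considered legitimate (here: French and English)
ok_languages fr en
# Assumption: a dummy value that matches no language code, so no language stays inactive
inactive_languages ''
# Raise the score of the "unwanted language" rule (default is 2.8)
score UNWANTED_LANGUAGE_BODY 5
# Add an X-Spam-Languages header listing the detected languages
add_header all Languages _LANGUAGES_
```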

“ok_languages” contains only the languages I actually understand (French and English). You can add yours (see the commented URL for “language codes”).
The second line enables all supported languages. TextCat disables a couple of rare languages by default to save server resources, but honestly, who cares about CPU usage on servers nowadays…
Then I increase the score to 5 (default is 2.8), and the last line adds an X-Spam-Languages header so I can check my spam/ham to see which languages have been detected.

However, amavis rewrites all headers on its own and drops X-Spam-Languages.
So, edit “/etc/amavis/conf.d/50-user” and add the following lines before “1;”:

This asks amavis to keep this header from SpamAssassin. Please note that it won’t work unless you’re running amavis >= 2.7!

You may want to check that SpamAssassin can load the module properly:

Don’t worry about the “language possibly: en” line, it doesn’t mean anything (when using --lint, SpamAssassin behaves as if it were processing a real mail).

Restart amavis and enjoy!

Here is what you should soon find in the headers of a mail from your Junk mailbox: