Hi!
According to several reports from my users, we have been getting more and more spam written in foreign languages.
Despite my good amavis/spamassassin filtering setup, Bayesian filters are useless against it, and because this spam usually comes from valid Yahoo/Gmail/other accounts, it isn't reported to Pyzor or DCC.
Real pain…
The good news is that nobody around here speaks Hebrew, so I can safely tag these mails as junk. Here is a quick howto to enable this on Debian:
Edit /etc/spamassassin/v310.pre and uncomment the following line:
    loadplugin Mail::SpamAssassin::Plugin::TextCat
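If you would rather script this step, the uncommenting can be sketched in shell. This is only a minimal sketch, assuming GNU sed (the Debian default) and that the line is already present in the file in commented form; `enable_textcat` is a hypothetical helper name, not something shipped with SpamAssassin.

```shell
# Hypothetical helper: uncomment the TextCat loadplugin line in a .pre file.
# Assumes GNU sed; the original file is kept next to it with a .bak suffix.
enable_textcat() {
    pre_file="$1"
    # Strip the leading "#" (and surrounding blanks) from the TextCat line only.
    sed -i.bak -E \
        's/^[[:space:]]*#[[:space:]]*(loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::TextCat)/\1/' \
        "$pre_file"
}
```

Run as root, this would be called as `enable_textcat /etc/spamassassin/v310.pre`. Other commented lines in the file are left untouched because the pattern only matches the TextCat plugin name.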
Configure this new plugin from /etc/spamassassin/local.cf:
    # SpamAssassin TextCat (Language Guesser Plugin)
    # http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_TextCat.html
    ok_languages en fr                      # I can't understand anything else than french or english
    inactive_languages ''                   # Enable all languages
    score UNWANTED_LANGUAGE_BODY 5          # Increase score
    add_header all Languages _LANGUAGES_    # Write the detected langs in X-Spam-Languages
“ok_languages” lists only the languages I actually understand (French and English). You can add yours (see the URL in the comment for the language codes).
The second line enables all supported languages. By default, TextCat disables a couple of rare languages to save server resources, but honestly, who cares about CPU usage on servers nowadays…
Then I increase the score to 5 (default is 2.8), and the last line adds an X-Spam-Languages header so I can check my spam/ham to see which languages have been detected.
However, amavis rewrites all headers on its own and drops X-Spam-Languages.
So, edit “/etc/amavis/conf.d/50-user” and add the following lines before “1;”:
    # Print X-Spam-Languages header from TextCat SpamAssassin plugin
    $allowed_added_header_fields{lc('X-Spam-Languages')} = 1;
This tells amavis to keep this header from spamassassin. Please note that it won't work unless you're running amavis >= 2.7!
You may want to check that spamassassin can load the module fine:
    user@server:~$ sudo spamassassin --lint -D 2>&1 | grep -i textcat
    Oct 22 21:34:25.772 [17852] dbg: plugin: loading Mail::SpamAssassin::Plugin::TextCat from @INC
    Oct 22 21:34:25.778 [17852] dbg: textcat: loading languages file...
    Oct 22 21:34:25.885 [17852] dbg: textcat: loaded 73 language models
    Oct 22 21:34:26.541 [17852] dbg: config: fixed relative path: /var/lib/spamassassin/3.003002/updates_spamassassin_org/25_textcat.cf
    Oct 22 21:34:26.542 [17852] dbg: config: using "/var/lib/spamassassin/3.003002/updates_spamassassin_org/25_textcat.cf" for included file
    Oct 22 21:34:26.543 [17852] dbg: config: read file /var/lib/spamassassin/3.003002/updates_spamassassin_org/25_textcat.cf
    Oct 22 21:34:28.913 [17852] dbg: plugin: Mail::SpamAssassin::Plugin::TextCat=HASH(0xa0af534) implements 'extract_metadata', priority 0
    Oct 22 21:34:28.915 [17852] dbg: textcat: classifying, skipping: ''
    Oct 22 21:34:28.936 [17852] dbg: textcat: language possibly: en
    Oct 22 21:34:28.937 [17852] dbg: textcat: X-Languages: "en", X-Languages-Length: 1342
Don’t worry about the “language possibly: en” line, it doesn’t mean anything (when using --lint, spamassassin behaves as if it were processing a real mail).
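To reduce that debug output to a single check, a small filter can pull the number of loaded language models out of the log. This is a sketch that assumes the debug line format shown in the run above ("textcat: loaded N language models"); `textcat_models_loaded` is a hypothetical name.

```shell
# Hypothetical helper: read spamassassin debug output on stdin and print the
# number of language models TextCat reported. Prints nothing if the line is absent.
textcat_models_loaded() {
    sed -n -E 's/.*textcat: loaded ([0-9]+) language models.*/\1/p' | head -n 1
}
```

It would be used as `sudo spamassassin --lint -D 2>&1 | textcat_models_loaded`. An empty result suggests the plugin was not loaded, in which case the loadplugin line is the first thing to recheck.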
Restart amavis and enjoy!
Here is what you should soon find in the headers of a mail from your Junk mailbox:
    X-Spam-Flag: YES
    X-Spam-Score: 14.858
    X-Spam-Level: **************
    X-Spam-Status: Yes, score=14.858 tagged_above=-999 required=6.31
        tests=[AWL=-2.517, BAYES_99=6, DCC_CHECK=2.5, HTML_MESSAGE=0.001,
        MPART_ALT_DIFF=0.79, NORMAL_HTTP_TO_IP=0.001, RCVD_IN_PSBL=2.7,
        RP_MATCHES_RCVD=-0.735, SPF_PASS=-0.5, T_KHOP_FOREIGN_CLICK=0.01,
        UNWANTED_LANGUAGE_BODY=5, URIBL_WS_SURBL=1.608] autolearn=no
    X-Spam-Languages: pt
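Once the header is being written, a quick way to see which languages TextCat detects is to tally the X-Spam-Languages values in a mail folder. This is a rough sketch, assuming one message per file (Maildir-style storage) and that multiple language codes in the header are separated by spaces; `count_spam_languages` is a hypothetical helper name.

```shell
# Hypothetical helper: count the language codes found in X-Spam-Languages
# headers under a directory, most frequent first.
count_spam_languages() {
    mail_dir="$1"
    # -r recurse, -h omit file names, -i match the header name case-insensitively.
    grep -rhi '^X-Spam-Languages:' "$mail_dir" \
        | sed -E 's/^[^:]*:[[:space:]]*//' \
        | tr -s ' ' '\n' \
        | sort | uniq -c | sort -rn
}
```

Pointed at your own Junk folder (the path depends on your mail setup), it prints one line per language code with its count. It is a plain text match, so a body line that happens to start with the header name would be counted as well.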
Hi!
This saved my day 🙂
May I still ask how spamassassin scores a mail too short to detect language?
Or more generally a mail where it cannot determine the language.
Thanks!
Hi,
I’m not sure I understand your question. If spamassassin can’t detect any language, this feature is simply a no-op!
Great post, thank you! Works like a charm.
Amazing things here. I am very happy to have read your post.
Thank you so much, and I am looking forward to getting in touch with you.
Will you kindly drop me a mail?
Why is that? You can reach me at gandalf@le-vert.net