{"id":67,"date":"2012-10-22T21:43:17","date_gmt":"2012-10-22T19:43:17","guid":{"rendered":"http:\/\/blog.le-vert.net\/?p=67"},"modified":"2012-10-22T21:43:17","modified_gmt":"2012-10-22T19:43:17","slug":"filter-hebrew-russian-chinese-spams-with-spamassassin","status":"publish","type":"post","link":"https:\/\/blog.le-vert.net\/?p=67","title":{"rendered":"Filter Hebrew, Russian, Chinese&#8230; spam with SpamAssassin"},"content":{"rendered":"<p>Hi!<\/p>\n<p>According to several reports from my users, we have been getting more and more spam written in foreign languages.<br \/>\nDespite my good amavis\/spamassassin filtering system, Bayesian filters of any kind have no effect on it, and this spam usually comes from valid Yahoo\/Gmail\/other accounts and isn&#8217;t reported to Pyzor or DCC.<br \/>\nA real pain&#8230;<\/p>\n<p>The good news is that nobody around here speaks Hebrew, so I can safely tag these mails as junk. 
Here is a quick howto to enable this on Debian:<\/p>\n<p>Edit <strong>\/etc\/spamassassin\/v310.pre<\/strong> and uncomment the following line:<\/p>\n<pre>loadplugin Mail::SpamAssassin::Plugin::TextCat<\/pre>\n<p>Configure this new plugin in <strong>\/etc\/spamassassin\/local.cf<\/strong>:<\/p>\n<pre># SpamAssassin TextCat (Language Guesser Plugin)\r\n# http:\/\/spamassassin.apache.org\/full\/3.3.x\/doc\/Mail_SpamAssassin_Plugin_TextCat.html\r\n<strong>ok_languages en fr<\/strong> # I can't understand anything other than French or English\r\n<strong>inactive_languages ''<\/strong> # Enable all languages\r\n<strong>score UNWANTED_LANGUAGE_BODY 5<\/strong> # Increase score\r\n<strong>add_header all Languages _LANGUAGES_<\/strong>  # Write the detected languages in X-Spam-Languages<\/pre>\n<p>&#8220;ok_languages&#8221; lists only the languages I actually understand (French and English). You can add yours (see the URL in the comment for the &#8220;language codes&#8221;).<br \/>\nThe second line enables all supported languages. By default, TextCat disables a couple of rare languages to save server resources, but honestly, who cares about CPU usage on servers nowadays&#8230;<br \/>\nThen I increase the score to 5 (the default is 2.8), and the last line adds an X-Spam-Languages header so I can check my spam\/ham to see which languages have been detected.<\/p>\n<p>However, amavis rewrites all headers on its own and drops X-Spam-Languages.<br \/>\nSo edit &#8220;\/etc\/amavis\/conf.d\/50-user&#8221; and add the following lines before &#8220;1;&#8221;:<\/p>\n<pre># Print X-Spam-Languages header from TextCat SpamAssassin plugin\r\n$allowed_added_header_fields{lc('X-Spam-Languages')} = 1;<\/pre>\n<p>This asks amavis to keep this header from SpamAssassin. 
Please note that it won&#8217;t work unless you&#8217;re running amavis >= 2.7!<\/p>\n<p>You may want to check that SpamAssassin can load the module properly:<\/p>\n<pre>user@server:~$ sudo spamassassin --lint -D 2>&1 | grep -i textcat\r\nOct 22 21:34:25.772 [17852] dbg: plugin: loading Mail::SpamAssassin::Plugin::TextCat from @INC\r\nOct 22 21:34:25.778 [17852] dbg: textcat: loading languages file...\r\nOct 22 21:34:25.885 [17852] dbg: textcat: loaded 73 language models\r\nOct 22 21:34:26.541 [17852] dbg: config: fixed relative path: \/var\/lib\/spamassassin\/3.003002\/updates_spamassassin_org\/25_textcat.cf\r\nOct 22 21:34:26.542 [17852] dbg: config: using \"\/var\/lib\/spamassassin\/3.003002\/updates_spamassassin_org\/25_textcat.cf\" for included file\r\nOct 22 21:34:26.543 [17852] dbg: config: read file \/var\/lib\/spamassassin\/3.003002\/updates_spamassassin_org\/25_textcat.cf\r\nOct 22 21:34:28.913 [17852] dbg: plugin: Mail::SpamAssassin::Plugin::TextCat=HASH(0xa0af534) implements 'extract_metadata', priority 0\r\nOct 22 21:34:28.915 [17852] dbg: textcat: classifying, skipping: ''\r\nOct 22 21:34:28.936 [17852] dbg: textcat: language possibly: en\r\nOct 22 21:34:28.937 [17852] dbg: textcat: X-Languages: \"en\", X-Languages-Length: 1342<\/pre>\n<p>Don&#8217;t worry about the &#8220;language possibly: en&#8221; line; it doesn&#8217;t mean anything (with --lint, spamassassin behaves as if it were processing a real mail).<\/p>\n<p>Restart amavis and enjoy!<\/p>\n<p>Here is what you should soon find in the headers of a mail from your Junk mailbox:<\/p>\n<pre>X-Spam-Flag: YES\r\nX-Spam-Score: 14.858\r\nX-Spam-Level: **************\r\nX-Spam-Status: Yes, score=14.858 tagged_above=-999 required=6.31\r\n\ttests=[AWL=-2.517, BAYES_99=6, DCC_CHECK=2.5, HTML_MESSAGE=0.001,\r\n\tMPART_ALT_DIFF=0.79, NORMAL_HTTP_TO_IP=0.001, RCVD_IN_PSBL=2.7,\r\n\tRP_MATCHES_RCVD=-0.735, SPF_PASS=-0.5, T_KHOP_FOREIGN_CLICK=0.01,\r\n\t<strong>UNWANTED_LANGUAGE_BODY=5<\/strong>, 
URIBL_WS_SURBL=1.608] autolearn=no\r\n<strong>X-Spam-Languages: pt<\/strong><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Hi! According to several reports from my users, we have been getting more and more spam written in foreign languages. Despite my good amavis\/spamassassin filtering system, Bayesian filters of any kind have no effect on it, and this spam usually comes &hellip; <a href=\"https:\/\/blog.le-vert.net\/?p=67\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=\/wp\/v2\/posts\/67"}],"collection":[{"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=67"}],"version-history":[{"count":5,"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=\/wp\/v2\/posts\/67\/revisions"}],"predecessor-version":[{"id":72,"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=\/wp\/v2\/posts\/67\/revisions\/72"}],"wp:attachment":[{"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=67"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=67"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.le-vert.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=67"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}