Centreon “log” table getting insanely huge

Hi there,

I’m currently migrating some old Centreon 2.5/2.6 installations (Nagios/NDO) to Centreon 2.7 (Centreon-Engine/Centreon-Broker), but I’m running into trouble with some insanely large MySQL tables that have to be migrated:

root@server:~# ls -lah /var/lib/mysql/centreon_storage/log.*
-rw-rw---- 1 mysql mysql  13K Apr 15  2015 /var/lib/mysql/centreon_storage/log.frm
-rw-rw---- 1 mysql mysql  16G Dec  8 09:18 /var/lib/mysql/centreon_storage/log.MYD
-rw-rw---- 1 mysql mysql 6.0G Dec  8 09:18 /var/lib/mysql/centreon_storage/log.MYI

This table contains old Nagios logs. According to a forum post, it is used when clicking on Monitoring > Event logs and when running reports.
Fair enough, but I don’t care about what happened last year anyway: reporting is done on a monthly basis.

So let’s see what the oldest entry in there is:

root@server:~# echo 'SELECT FROM_UNIXTIME(ctime) FROM log ORDER BY ctime ASC LIMIT 1' | mysql -N centreon_storage
2015-06-14 19:19:00

Sadly, it’s stored as a Unix timestamp and not in MySQL datetime format, so we’ll have to do some conversion to make it human-readable.
To be honest, when I started the cleanup the oldest entry was even older.
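If you want to double-check a conversion outside MySQL, here is a minimal Python sketch. It assumes the MySQL session time zone is UTC, which happens to match the sample values in this post (1434309540 and 2015-06-14 19:19:00); on a server in another time zone FROM_UNIXTIME and UNIX_TIMESTAMP will give shifted results. The function names are mine, not anything from Centreon.

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"

def to_unix(text):
    """Rough equivalent of UNIX_TIMESTAMP("YYYY-MM-DD hh:mm:ss"), assuming UTC."""
    parsed = datetime.strptime(text, FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

def from_unix(ctime):
    """Rough equivalent of FROM_UNIXTIME(ctime), assuming UTC."""
    return datetime.fromtimestamp(ctime, tz=timezone.utc).strftime(FORMAT)

# Oldest entry seen in the log table, and the cutoff used for the cleanup.
print(from_unix(1434309540))
print(to_unix("2016-06-08 00:00:00"))
```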

I’m not sure if Centreon is supposed to clean this out. I guess it is, probably through one of the various cron jobs Centreon installs, but in my experience this is highly borked and can easily leave old entries behind.

Let’s validate that we’re not going to delete the wrong entries by running a SELECT first:

root@server:~# echo 'SELECT FROM_UNIXTIME(ctime), ctime, output FROM log WHERE ctime < UNIX_TIMESTAMP("2016-06-08 00:00:00") LIMIT 5' | mysql -N centreon_storage
2015-06-14 19:19:00	1434309540	Max concurrent service checks (200) has been reached.  Nudging server1:traffic_eth0 by 11 seconds...
2015-06-14 19:19:00	1434309540	Max concurrent service checks (200) has been reached.  Nudging server1:Ping by 7 seconds...
2015-06-14 19:19:00	1434309540	Max concurrent service checks (200) has been reached.  Nudging server2:Memory by 12 seconds...
2015-06-14 19:19:00	1434309540	Max concurrent service checks (200) has been reached.  Nudging server3:Processor by 6 seconds...
2015-06-14 19:19:01	1434309541	Max concurrent service checks (200) has been reached.  Nudging server3:Memory by 10 seconds...

Looks okay. Be sure to compare "ctime" and the converted date and play with the WHERE condition so you can be sure it's really working properly.
For instance, if you swap "2016-06-08 00:00:00" with "2015-06-14 19:19:01" the last line should disappear.

Once you've confirmed you're not deleting anything useful, go ahead with a DELETE statement:

root@server:~# time echo 'DELETE FROM log WHERE ctime < UNIX_TIMESTAMP("2016-06-08 00:00:00") LIMIT 1000000' | mysql -N centreon_storage

real	0m51.884s
user	0m0.000s
sys	0m0.008s

I decided to use LIMIT here to avoid loading the server too much for an unknown amount of time. The "time" command is there to measure how long it takes to delete 1 000 000 entries (52s here).

You can now recheck the oldest log entry you have:
root@server:~# echo 'SELECT FROM_UNIXTIME(ctime) FROM log ORDER BY ctime ASC LIMIT 1' | mysql -N centreon_storage
2015-06-19 21:29:54

It seems it'll be a long way to go before getting to June 2016 😉

Bonus:
All in one command, so you just have to check your terminal when coming back from the coffee machine to see its progress:

root@server:~# while true; do echo 'DELETE FROM log WHERE ctime < UNIX_TIMESTAMP("2016-06-08 00:00:00") LIMIT 100000' | mysql -N centreon_storage && echo 'SELECT FROM_UNIXTIME(ctime) FROM log ORDER BY ctime ASC LIMIT 1' | mysql -N centreon_storage && sleep 2; done
2015-06-21 01:47:32
2015-06-21 10:59:55
2015-06-21 19:57:21
2015-06-22 04:58:59
[...]
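
Note that the loop above runs until you interrupt it. As a rough illustration of the same batching idea with a stop condition, here is a Python sketch that keeps deleting in chunks until a batch removes nothing. To stay runnable without a Centreon database it uses an in-memory SQLite table as a stand-in for centreon_storage.log (an assumption of mine); default SQLite builds have no DELETE ... LIMIT, hence the rowid subquery, whereas on MySQL you would keep the plain DELETE ... LIMIT shown earlier.

```python
import sqlite3

def purge_in_batches(conn, cutoff, batch_size):
    """Delete rows older than cutoff in chunks; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM log WHERE rowid IN ("
            "SELECT rowid FROM log WHERE ctime < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            # Nothing older than the cutoff is left: stop looping.
            return total
        total += cur.rowcount

# Stand-in table with 250 fake entries, one per second.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ctime INTEGER, output TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [(1434309540 + i, "entry %d" % i) for i in range(250)],
)

deleted = purge_in_batches(conn, cutoff=1434309540 + 200, batch_size=100)
remaining = conn.execute("SELECT COUNT(*), MIN(ctime) FROM log").fetchone()
print(deleted, remaining)
```

Small batches keep each statement short, so other queries get a chance to run in between.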

Nginx SSL vhosting using Server Name Indication

Here is the issue: I have a tcp/443 DNAT to a specific machine running some specific HTTPS app that does not work behind a reverse proxy.

Obviously, I want to run other applications on 443 and I’m not allowed to get any other port.

Sounds pretty bad, right ?
Actually, there’s a way out and it’s called “nginx-is-so-fuckin-powerfull” 😉

As you may know, a long time ago a feature called “Server Name Indication” was added to TLS. Before this, it was impossible to serve multiple HTTPS virtual hosts on a single address because the SSL session was negotiated before the client actually sent the requested vhost name.

With SNI, there’s a quick chat between your HTTPS server and the remote browser, something like:

- Client: hey I'm an HTTPS client
- Server: Ok, which server ?
- Client: blog.le-vert.net
- Server: Serving blog.le-vert.net certificate...
- Client: #*/-[}$$ (start talking SSL)

Ok, that’s not really accurate (in reality the client announces the name unprompted, in clear text, in its very first TLS message, the ClientHello), but who cares about what exactly happens. The thing is: there’s a routing opportunity before the SSL certificate is served, because we know the requested domain name at this point; and guess what: NGINX can route on the SNI name !!

First thing… You need a really really new NGINX version (1.11.5 or later), but if your distro doesn’t have it you can use the NGINX repositories.
Second, you must understand that very old clients may not use SNI. A client that doesn’t will hit the default destination, so make sure to keep the old behavior as the default, just in case.
Here is the client compatibility list for SNI: https://en.wikipedia.org/wiki/Server_Name_Indication
I leave it to you to decide if you care about handling Internet Explorer < 7.

So let's configure NGINX correctly. You need to define a stream {} section at the top level of nginx.conf, just like the http one:

stream {
    include /etc/nginx/stream.conf.d/*.conf;
}

Of course, you need to stop the existing http/server blocks from listening on port 443 (comment out lines like “listen 443 ssl” in all your existing configuration).

Now we’ll create a stream server, which is a plain TCP proxy:
In /etc/nginx/stream.conf.d/443.conf:

map $ssl_preread_server_name $name {
    default original_dest;
    new.hostname.com local_https;
}

upstream original_dest {
    server 1.2.3.4:443;
}

upstream local_https {
    server 127.0.0.1:8443;
}

log_format stream_routing '$remote_addr [$time_local] '
                          'with SNI name "$ssl_preread_server_name" '
                          'proxying to "$name" '
                          '$protocol $status $bytes_sent $bytes_received '
                          '$session_time';

server {
    listen 443;
    ssl_preread on;
    proxy_pass $name;
    access_log /var/log/nginx/stream_443.log stream_routing;
}

And that’s it 😀

You can now create a new http/server instance on port 8443 to serve your different new https vhosts but I suggest starting with the default virtual host (/etc/nginx/conf.d/default.conf) by adding “listen 8443 ssl default_server” and some ssl cert and key directives.

Here is an example of the stream_443.log:

192.168.0.100 [01/Dec/2016:11:16:53 +0100] with SNI name "" proxying to "original_dest" TCP 200 3135 1161 10.256
192.168.0.100 [01/Dec/2016:11:17:56 +0100] with SNI name "new.hostname.com" proxying to "local_https" TCP 200 1467 747 0.070
192.168.0.100 [01/Dec/2016:11:18:12 +0100] with SNI name "new.hostname.com" proxying to "local_https" TCP 200 16505 1365 16.178
192.168.0.100 [01/Dec/2016:11:18:15 +0100] with SNI name "local.server.hostname" proxying to "original_dest" TCP 200 2461 557 25.59
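
To make the routing decision visible, here is a tiny Python sketch that mimics what the map block does for the SNI names seen in this log. It is only a model of the lookup, not how NGINX implements it; the lower-casing reflects my understanding that map compares plain strings case-insensitively.

```python
# Same entries as the map block: one known name, everything else
# (including clients sending no SNI at all) goes to the default.
ROUTES = {"new.hostname.com": "local_https"}
DEFAULT_UPSTREAM = "original_dest"

def pick_upstream(sni_name):
    """Return the upstream name for a given SNI value ("" when absent)."""
    return ROUTES.get(sni_name.lower(), DEFAULT_UPSTREAM)

for name in ("", "new.hostname.com", "local.server.hostname"):
    print(repr(name), "->", pick_upstream(name))
```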

Nice work NGINX, as usual !

Going further:
There’s just a little issue here: the real HTTPS server on port 8443 will always see the incoming IP address as 127.0.0.1. However, there’s a small header protocol called “proxy_protocol” that passes the original connection details from one proxy to the next. NGINX supports it, but the equipment running behind mine doesn’t like it.

So the idea here is to use proxy_protocol between my stream/443 and http/8443 instances, and to strip it when proxying to original_dest by going through a dummy stream server that does nothing other than pop the proxy_protocol data and forward to the real server. Then I will restore remote_addr in http/8443.

The new config file is now:

map $ssl_preread_server_name $name {
    default original_dest;
    new.hostname.com local_https;
}

upstream original_dest {
    # Forward to a dummy server to strip out proxy_protocol
    # Otherwise original_dest won't work
    server 127.0.0.1:8080;
}

upstream local_https {
    server 127.0.0.1:8443;
}

log_format stream_routing '$remote_addr [$time_local] '
                          'with SNI name "$ssl_preread_server_name" '
                          'proxying to "$name" '
                          '$protocol $status $bytes_sent $bytes_received '
                          '$session_time';
server {
    listen 443;
    ssl_preread on;
    proxy_pass $name;
    proxy_protocol on;
    access_log /var/log/nginx/stream_443.log stream_routing;
}

# Dummy server to strip out proxy_protocol before sending to original_dest
server {
    listen 8080 proxy_protocol ;
    proxy_pass 1.2.3.4:443;
}

In the http/8443 vhost, we set the following to restore original client IP address:

listen 8443 default_server proxy_protocol ssl;
set_real_ip_from 127.0.0.1/32;
real_ip_header proxy_protocol;
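
If you wonder what actually travels in front of the TLS bytes, here is a minimal Python sketch that splits such a header off a stream. It assumes the human-readable version 1 format of the PROXY protocol (a single "PROXY ..." line ending in CRLF); the addresses and ports in the sample are made up for illustration.

```python
def parse_proxy_v1(data):
    """Split a PROXY protocol v1 header off the front of a byte stream.

    Returns (info, rest) where info is a dict, or None for "PROXY UNKNOWN".
    """
    header, sep, rest = data.partition(b"\r\n")
    if not sep or not header.startswith(b"PROXY "):
        raise ValueError("no PROXY protocol v1 header found")
    fields = header.decode("ascii").split(" ")
    if fields[1] == "UNKNOWN":
        return None, rest
    proto, src, dst, src_port, dst_port = fields[1:6]
    info = {
        "proto": proto,
        "src": src,
        "dst": dst,
        "src_port": int(src_port),
        "dst_port": int(dst_port),
    }
    return info, rest

# Hypothetical header followed by the first bytes of a TLS record.
sample = b"PROXY TCP4 192.168.0.100 10.0.0.1 51234 443\r\n\x16\x03\x01"
info, rest = parse_proxy_v1(sample)
print(info["src"], len(rest))
```

This is the line the dummy server on port 8080 swallows, and the one real_ip_header proxy_protocol reads the client address from.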

Nginx -_-

Bonus stuff:

In case you’re having issues with SELinux (and you will; for instance it will prevent NGINX from opening a connection from port 8080 to a remote host), you can use the following to extract the failures from audit.log and turn them into a permanent SELinux exception:

tail -n 2 /var/log/audit/audit.log (you may want to get more or less lines, depending on what you see happening)
tail -n 2 /var/log/audit/audit.log |audit2allow -m nginx_proxy_connect (create a plain text SELinux rule, so you can see what's going to be done)
tail -n 2 /var/log/audit/audit.log |audit2allow -M nginx_proxy_connect (create the real SELinux rule)
semodule -i nginx_proxy_connect.pp (install the rule)

Disable HiLink mode and force tty modem on NEW Huawei E3272

There’s plenty of documentation on the Internet about this issue, but none of it works with recent firmware. It all talks about using the embedded web interface to force serial mode through some call and then sending an AT command to choose the default mode.
It’s NOT working ANYMORE on firmware 22.470.07.00.00.

And sorry, you’ll need a Windows computer for this… (probably a clean pre-Windows 8 one)

First you need to confirm that your modem is actually working correctly in HiLink mode.
Plug it in and wait for the browser to open automatically:
00_HiLink_web_interface

You should confirm from device manager that there’s a new NDIS network interface
01_NDIS_network_interface

Run E3272s_Update_21.420.07.00.00.exe which is a firmware installer containing an older version that permits default mode change
02_Firmware_downgrade

After a while it will fail with the error below. The firmware updater turned the device into serial mode but there’s no driver available
04_Device_not_found

Confirm from device manager that there’re some unknown devices
05_Unrecognized_devices

Install Mobile Partner from Huawei and fix the driver file because it doesn’t contain the IDs for this device
02_Install_Mobile_Partner

Go to C:\Program Files (x86)\Mobile Partner\Driver\Driver\X64 (for 64-bit systems)
and edit the ewser2k.inf file.

In the [QcomSerialPort.NTamd64] section, add the following two lines:

%QcomDevice00% = QportInstall01, USB\VID_12d1&PID_1442&MI_00
%QcomDevice01% = QportInstall00, USB\VID_12d1&PID_1442&MI_01

Now go back to device manager and update driver by choosing the path containing the inf file
06_Update_driver_locally
07_Select_directory

If you get this error, you need to disable driver signature verification first (google for it).
BE SURE TO RESTART THE FIRMWARE UPDATER BEFORE TRYING TO FIX THE DRIVERS AGAIN, OTHERWISE THE DEVICE WON'T BE SWITCHED TO SERIAL MODE.
09_Driver_installing

After a successful installation you should now see two additional COM ports
10_Driver_installed_succesfully

Start the firmware updater and wait a bit
11_Upgrade_step_2

On my Windows 8.1 computer it gets stuck here and fails with an error but it worked correctly on Windows 7…

Here is what you should see if it’s working correctly
00_Upgrade_in_progress

Finally, the success message saying your firmware has been downgraded to 21.xx
00_Upgrade_succeeded

Now we have access to the serial port and we’ll have to issue a few AT commands to set a new default mode. First, find the COM port now used by your modem
00_COM_device

And start Putty on it
00_Putty_connection

Now we can send a few commands (press the Enter key after each one):

AT: Will reply "OK", which means you're actually talking to something that understands AT commands
AT^FHVER: Confirm you are running firmware 21.xx
AT^SETPORT?: Show current modem default config
AT^SETPORT=?: Display available modes
AT^SETPORT="FF;10,12": Enable diag interface and classic serial based modem emulation (this is what we need to use with wvdial)
AT^RESET: Restart the modem

The screenshots below are a bit wrong: I used AT^SETPORT="FF;12,10" instead of AT^SETPORT="FF;10,12", so the modem is on ttyUSB1 instead of ttyUSB0 !

Here you can see my AT session (please note that AT^SETPORT? won’t refresh until the modem is restarted)
00_Putty_session

After issuing AT^RESET the COM id will change (probably increased by 1), you can restart Putty and check default mode is now the one expected.
00_Putty_after_reset

You can now restart Linux and enjoy the stick being detected correctly now:

Aug 18 22:58:23 thrall kernel: [ 283.080966] usb 5-1.2: new high-speed USB device number 5 using xhci_hcd
Aug 18 22:58:23 thrall kernel: [ 283.173491] usb 5-1.2: New USB device found, idVendor=12d1, idProduct=1506
Aug 18 22:58:23 thrall kernel: [ 283.173496] usb 5-1.2: New USB device strings: Mfr=2, Product=1, SerialNumber=0
Aug 18 22:58:23 thrall kernel: [ 283.173497] usb 5-1.2: Product: HUAWEI Mobile
Aug 18 22:58:23 thrall kernel: [ 283.173499] usb 5-1.2: Manufacturer: HUAWEI Technology
Aug 18 22:58:23 thrall kernel: [ 283.184269] usbcore: registered new interface driver usbserial
Aug 18 22:58:23 thrall kernel: [ 283.184280] usbcore: registered new interface driver usbserial_generic
Aug 18 22:58:23 thrall kernel: [ 283.184287] usbserial: USB Serial support registered for generic
Aug 18 22:58:23 thrall kernel: [ 283.186411] usbcore: registered new interface driver option
Aug 18 22:58:23 thrall kernel: [ 283.186422] usbserial: USB Serial support registered for GSM modem (1-port)
Aug 18 22:58:23 thrall kernel: [ 283.186513] option 5-1.2:1.0: GSM modem (1-port) converter detected
Aug 18 22:58:23 thrall kernel: [ 283.186597] usb 5-1.2: GSM modem (1-port) converter now attached to ttyUSB0
Aug 18 22:58:23 thrall kernel: [ 283.186613] option 5-1.2:1.1: GSM modem (1-port) converter detected
Aug 18 22:58:23 thrall kernel: [ 283.186656] usb 5-1.2: GSM modem (1-port) converter now attached to ttyUSB1

Modem is on /dev/ttyUSB0.
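
If you need to find the ports from a script rather than by reading the log, here is a small Python sketch that pulls the ttyUSB names out of kernel messages like the ones above. The regular expression is tailored to the exact wording of these lines and is only an illustration; other kernels or drivers may phrase it differently.

```python
import re

# Matches e.g. "usb 5-1.2: GSM modem (1-port) converter now attached to ttyUSB0"
ATTACH_RE = re.compile(
    r"usb (\S+): GSM modem \(1-port\) converter now attached to (ttyUSB\d+)"
)

def attached_ports(log_text):
    """Return (usb_path, tty_name) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in ATTACH_RE.finditer(log_text)]

SAMPLE = """\
Aug 18 22:58:23 thrall kernel: [ 283.186597] usb 5-1.2: GSM modem (1-port) converter now attached to ttyUSB0
Aug 18 22:58:23 thrall kernel: [ 283.186613] option 5-1.2:1.1: GSM modem (1-port) converter detected
Aug 18 22:58:23 thrall kernel: [ 283.186656] usb 5-1.2: GSM modem (1-port) converter now attached to ttyUSB1
"""

for usb_path, tty in attached_ports(SAMPLE):
    print(usb_path, "/dev/" + tty)
```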

Bonus stuff:

Udev rules that will create /dev/gsm0 (in case you have other /dev/ttyUSBx):

SUBSYSTEM=="tty", ATTRS{idVendor}=="12d1", ATTRS{idProduct}=="1506", SYMLINK+="gsm%n"

And a working wvdial configuration (PIN code disabled, POST.lu APN so you probably want to change this, no user, no password):

[Dialer Defaults]
Init1 = ATZ
Init2 = AT+CGDCONT=1,"IP","web.pt.lu"
Stupid Mode = 1
MessageEndPoint = "0x01"
Modem Type = Analog Modem
ISDN = 0
Phone = *99#
Modem = /dev/gsm0
Username = { }
Password = { }
Baud = 460800
Auto Reconnect = on

Finally, a systemd service file with automatic restart:

[Unit]
Description=wvdial

[Service]
Type=simple
ExecStart=/usr/bin/wvdial
RestartSec=2
Restart=always

[Install]
WantedBy=multi-user.target

Fixing non-working iDrac on PowerEdge server (R610)

It seems Dell released a couple of servers with a broken embedded iDrac.
Actually the issue comes from the on-board Broadcom ethernet chip which is not configured correctly: http://permalink.gmane.org/gmane.linux.hardware.dell.poweredge/42033

Spot the issue

Here is how to confirm your issue is related to this bug and not something else. Boot the server and press CTRL+E to get into the iDrac BIOS. Select the network submenu and check the Active LOM entry. LOM stands for LAN On Motherboard.

If it says No Active LOM even though you selected Shared above, the iDrac is unable to bind to any on-board LAN port, which means you are hitting this issue.

01_broken_lom

Then, we’ll create a DOS-based floppy disk image containing some Broadcom firmware tools that will reconfigure the embedded network controller so it can be used by the iDrac board.

Create a PXE bootable disk image with Broadcom utilities

Download Bcom_LAN_14.2.x_DOSUtilities_A03.exe from http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=29DKK and get a terminal in the download directory.

We will now download a FreeDOS disk image (that can be PXE booted) and add the required tools to it.


wget http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.0/fdboot.img
mkdir mount
sudo mount -t vfat -o loop fdboot.img mount/

unzip Bcom_LAN_14.2.x_DOSUtilities_A03.exe
sudo cp ./Userdiag/NetXtremeII/uxdiag.exe mount/

sudo sh -c 'echo uxdiag -t abcd -mfw 1 > mount/idrac.bat'

sudo umount mount

mv fdboot.img fdboot-fix-poweredge-idrac.img

Now we have a FreeDOS image containing the Broadcom uxdiag tool as well as an idrac.bat script that will run the required command.

Copy the img file to your PXE server and set the following to start it with PXELinux (pxelinux.cfg/default):

LABEL fix-idrac
KERNEL memdisk
APPEND initrd=fdboot-fix-poweredge-idrac.img

If you don’t have memdisk binary it can be found in package syslinux-common.

Then you can restart your server and trigger a PXE boot. Once FreeDOS starts, select the Safe Mode entry (I had some issue of memory being full when using another entry).

02_freedos_booting

Then, type idrac.bat to start the batch script we added to the disk image:

03_run_script

Broadcom tools will run for a couple of seconds and output something like this:

04_uxdiag_processing

Restart the server and hit CTRL+E to get inside the iDrac again; it’s now binding on LOM1 aka the ethernet port with label “1”:

05_fixed_lom

Master-master simple email server with Dovecot

The purpose of this article is to explain how to create a high-availability email server with Dovecot.
We will use plain text files as the users backend. It can of course easily be extended to use LDAP or SQL, but this article won’t cover that setup.

Install required packages

On both servers we’ll install Dovecot as well as its POP3 and IMAP daemons:

apt-get install dovecot-core dovecot-imapd dovecot-pop3d

To use dovecot clustering feature, known as dsync, we need dovecot 2.2 or later. Debian Jessie’s version is ok.

Setup file-based users database

Edit /etc/dovecot/conf.d/auth-passwdfile.conf.ext and set both passdb and userdb like this:

passdb {
  driver = passwd-file
  args = scheme=PLAIN username_format=%u /etc/dovecot/users
}

userdb {
  driver = passwd-file
  args = username_format=%u /etc/dovecot/users
  default_fields = uid=vmail gid=mail home=/srv/vmail/%u
}

I will use plain-text passwords here because I really want to be able to read the users straight from the configuration file. You can of course use a hashed format, see the Dovecot documentation.

The file /etc/dovecot/users will contain the user accounts and we’ll deliver all emails using paths like /srv/vmail/user@domain.com.
Dovecot is set up to always use the vmail user with the mail group to avoid uid/gid madness.

First I tried to create a multi-domain setup, using “username_format=%n /etc/dovecot/%d/users” and “default_fields = uid=vmail gid=mail home=/srv/vmail/%d/%n” but current master/master plugin is unable to handle such configuration (Error: passwd-file: User iteration isn’t currently supported with %variable paths) so I decided to use a single authentication file using email as login (%u instead of %n).

We need to create the system user for dovecot:

adduser --system --ingroup mail --uid 500 vmail --home /srv/vmail

Now we need to enable this backend by commenting auth-system and un-commenting auth-passwdfile from /etc/dovecot/conf.d/10-auth.conf

#!include auth-system.conf.ext
#!include auth-sql.conf.ext
#!include auth-ldap.conf.ext
!include auth-passwdfile.conf.ext
#!include auth-checkpassword.conf.ext
#!include auth-vpopmail.conf.ext
#!include auth-static.conf.ext

Configure Postfix to use Dovecot as delivery agent

In /etc/postfix/master.cf add the following section:

# Dovecot LDA 
dovecot    unix  -       n       n       -       -       pipe
  flags=DRhu user=vmail:mail argv=/usr/lib/dovecot/dovecot-lda -f ${sender} -a ${original_recipient} -d ${user}@${nexthop}

Then run the following commands to make sure Postfix is configured correctly (postconf -e edits the main.cf config file):

postconf -e "myhostname=`hostname -f`"
postconf -e "mydestination=`hostname -f`, `hostname -s`.localdomain, `hostname -s`, localhost.`hostname -d`, localhost.localdomain, localhost"

Please MAKE SURE your /etc/hosts and /etc/hostname are configured correctly !
The following commands should return short/full/domain names:

hostname -s
hostname -f
hostname -d

Now we’ll enable the Dovecot LDA and declare our mail domain:

postconf -e virtual_transport="dovecot"
postconf -e dovecot_destination_recipient_limit=1
postconf -e virtual_mailbox_domains=domain.com

Additional Dovecot config

In /etc/dovecot/conf.d/10-mail.conf set

mail_location = maildir:~/Maildir

It will deliver emails in Maildir format like this: /srv/vmail/user@domain.com/Maildir

In /etc/dovecot/conf.d/10-auth.conf we’ll enable plain text login because we don’t care about SSL and stuff (non-encrypted auth is disabled for any host except localhost by default):

disable_plaintext_auth = no

Create first user and try it

Create /etc/dovecot/users with the following content:

test@domain.com:{plain}testpassword::::

And secure the file permissions:

chown root:dovecot /etc/dovecot/users
chmod 640 /etc/dovecot/users
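
To make the layout of that line explicit, here is a small Python sketch that splits a passwd-file entry into its parts. It assumes the usual field order of the format (user, password, uid, gid, gecos, home, ...) and falls back to the /srv/vmail/%u home from default_fields when the home field is empty. It is a naive illustration, not Dovecot's parser: a password containing ":" would break it.

```python
def parse_passwd_line(line, home_template="/srv/vmail/%u"):
    """Split one passwd-file line into user, scheme, password and home."""
    fields = line.rstrip("\n").split(":")
    user, password = fields[0], fields[1]
    scheme = ""
    if password.startswith("{") and "}" in password:
        # "{plain}secret" -> scheme "plain", password "secret"
        scheme, _, password = password[1:].partition("}")
    if len(fields) > 5 and fields[5]:
        home = fields[5]
    else:
        # Empty home field: use the default_fields template instead.
        home = home_template.replace("%u", user)
    return {
        "user": user,
        "scheme": scheme.upper(),
        "password": password,
        "home": home,
    }

entry = parse_passwd_line("test@domain.com:{plain}testpassword::::")
print(entry)
```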

Finally restart dovecot, postfix and send a test email:

systemctl restart dovecot
systemctl restart postfix
echo test | mail -s test test@domain.com && tail -f -n 20 /var/log/syslog

You should see something like this in the logs:

Mar 29 10:16:40 smtp1 postfix/pickup[26046]: 0620580AE772: uid=0 from=
Mar 29 10:16:40 smtp1 postfix/cleanup[26052]: 0620580AE772: message-id=<20160329101640.0620580AE772@smtp1.service.domain.com>
Mar 29 10:16:40 smtp1 postfix/qmgr[26047]: 0620580AE772: from=, size=339, nrcpt=1 (queue active)
Mar 29 10:16:40 smtp1 dovecot: lda(test@domain.com): msgid=<20160329101640.0620580AE772@smtp1.service.domain.com>: saved mail to INBOX
Mar 29 10:16:40 smtp1 postfix/pipe[26055]: 0620580AE772: to=, relay=dovecot, delay=0.04, delays=0.02/0.01/0/0.02, dsn=2.0.0, status=sent (delivered via dovecot service)
Mar 29 10:16:40 smtp1 postfix/qmgr[26047]: 0620580AE772: removed

The key part here is dovecot: lda(test@domain.com): msgid=<20160329101640.0620580AE772@smtp1.service.domain.com>: saved mail to INBOX.

We can now check what happened on the filesystem:

root@smtp1.service.domain.com:~# find /srv/vmail/
/srv/vmail/
/srv/vmail/test@domain.com
/srv/vmail/test@domain.com/Maildir
/srv/vmail/test@domain.com/Maildir/cur
/srv/vmail/test@domain.com/Maildir/new
/srv/vmail/test@domain.com/Maildir/new/1459247005.M110518P26261.smtp1.service.domain.com,S=412,W=423
/srv/vmail/test@domain.com/Maildir/tmp
/srv/vmail/test@domain.com/Maildir/dovecot.index.log
/srv/vmail/test@domain.com/Maildir/dovecot-uidvalidity.56fa579d
/srv/vmail/test@domain.com/Maildir/dovecot-uidvalidity
/srv/vmail/test@domain.com/Maildir/dovecot-uidlist
/srv/vmail/test@domain.com/Maildir/dovecot.index.cache

Now we can test the IMAP login with the following transcript using telnet:

telnet 127.0.0.1 143
. LOGIN test@domain.com testpassword
. EXAMINE INBOX
. FETCH 1 BODY[]
. LOGOUT
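
If you prefer a scripted check over typing into telnet, here is a minimal sketch of the same four steps using Python's standard imaplib module. The function name is mine; host, user and password are whatever you configured above. It opens the mailbox read-only (which makes imaplib send EXAMINE rather than SELECT), so the test does not flip the message to "read".

```python
import imaplib

def fetch_first_message(host, user, password, port=143):
    """LOGIN, EXAMINE INBOX, FETCH 1 BODY[], LOGOUT; return the raw message."""
    conn = imaplib.IMAP4(host, port)
    try:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)
        status, data = conn.fetch("1", "(BODY[])")
        # data[0] is a (envelope, raw_message_bytes) tuple on success.
        return data[0][1] if status == "OK" else None
    finally:
        conn.logout()
```

Calling fetch_first_message("127.0.0.1", "test@domain.com", "testpassword") on the mail server should return the raw bytes of the test email.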

You should see the message body containing “test”. If so, we now have a fully working email server.

Enable doveadm service and replication plugin

Create a new file /etc/dovecot/local.conf with the following content:

# Doveadm (used by sync service)
service doveadm {
  inet_listener {
    # any port you want to use for this:
    port = 2727
  }
}

doveadm_port = 2727
doveadm_password = mysecretpasswordsharedamongservers

# Fix permissions for vmail user
service aggregator {
  fifo_listener replication-notify-fifo {
    user = vmail
    group = root
    mode = 0660
  }
  unix_listener replication-notify {
    user = vmail
    group = root
    mode = 0660
  }
}

Then we’ll configure the peer address for replication plugin in /etc/dovecot/conf.d/90-plugin.conf:

plugin {
  mail_replica = tcp:5.6.7.8:2727
}

Now we will globally enable the replication plugin as well as the notify one (required), in /etc/dovecot/conf.d/10-mail.conf:

mail_plugins = notify replication

And that’s it… Yes, really, we’re done here !

Replicate config to secondary server

Here is my synchronisation script

#!/bin/sh

me="1.2.3.4"
peer="5.6.7.8"

# Postfix
rsync -avz --delete /etc/postfix/ root@${peer}:/etc/postfix/
ssh root@${peer} 'postconf -e "mydestination=`hostname -f`, `hostname -s`.localdomain, `hostname -s`, localhost.`hostname -d`, localhost.localdomain, localhost"'
ssh root@${peer} 'postconf -e "myhostname=`hostname -f`"'
rsync -vz /etc/aliases root@${peer}:/etc/aliases
ssh root@${peer} newaliases
systemctl restart postfix
ssh root@${peer} systemctl restart postfix
sleep 1
ssh root@${peer} systemctl status postfix

# Dovecot
rsync -avz --delete /etc/dovecot/ root@${peer}:/etc/dovecot/
ssh root@${peer} "sed -i \"s|mail_replica = tcp:${peer}|mail_replica = tcp:${me}|\" /etc/dovecot/conf.d/90-plugin.conf"
systemctl restart dovecot
ssh root@${peer} systemctl restart dovecot
sleep 1
ssh root@${peer} systemctl status dovecot

Basically it syncs the whole Postfix and Dovecot configuration, replaces the hostname with the secondary server’s one in the Postfix configuration and changes the address in Dovecot’s mail_replica setting.

You can now run echo test | mail -s test test@domain.com on both servers and check that both filesystems are updated with all emails 🙂

root@smtp1.service.domain.com:~# find /srv/vmail/
/srv/vmail/
/srv/vmail/test@domain.com
/srv/vmail/test@domain.com/Maildir
/srv/vmail/test@domain.com/Maildir/cur
/srv/vmail/test@domain.com/Maildir/new
/srv/vmail/test@domain.com/Maildir/new/1459247005.M110518P26261.smtp1.service.domain.com,S=412,W=423
/srv/vmail/test@domain.com/Maildir/new/1459248844.M932607P26622.smtp1.service.domain.com,S=412,W=423
/srv/vmail/test@domain.com/Maildir/new/1459249870.M304816P27522.smtp1.service.domain.com,S=412,W=423
/srv/vmail/test@domain.com/Maildir/new/1459250003.M334397P27770.smtp1.service.domain.com,S=412,W=423
/srv/vmail/test@domain.com/Maildir/new/1459250051.M437424P14567.smtp2.service.domain.com,S=436,W=447
/srv/vmail/test@domain.com/Maildir/tmp
/srv/vmail/test@domain.com/Maildir/dovecot.index.log
/srv/vmail/test@domain.com/Maildir/dovecot-uidvalidity.56fa579d
/srv/vmail/test@domain.com/Maildir/dovecot-uidvalidity
/srv/vmail/test@domain.com/Maildir/dovecot-uidlist
/srv/vmail/test@domain.com/Maildir/dovecot.index.cache

Of course, you can now connect two Thunderbird instances to 1.2.3.4 and 5.6.7.8 and then create folders, move emails, toggle the read flag. Both will show the changes with very little delay.

Thanks for reading and I hope that will help

Stop backscattering when using Postfix as an Exchange frontend

Hey,

Not much to say here because everything is already explained in the GitHub README file.

In a few words, I wrote a script that extracts all Exchange email addresses from Active Directory over LDAP and exports them as a Postfix map. The idea is to be able to reject invalid recipients instead of whitelisting the whole domain. By doing this, your infrastructure will stop sending “non-delivery notifications” back to forged sender addresses, which is what happens when you let emails for invalid recipients into your system.

Everything is available there:
https://github.com/eLvErDe/exchange-active-directory-to-postfix-map
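
To give an idea of the output shape, here is a minimal Python sketch that turns a list of Active Directory proxyAddresses values into Postfix map lines. This is NOT the script from the repository, just an illustration under my own assumptions: the values look like "SMTP:user@domain" (primary) or "smtp:alias@domain" (secondary), and the map is an "address OK" lookup table of the kind you could feed to something like relay_recipient_maps. Check the README for what the real script does.

```python
def to_postfix_map(proxy_addresses):
    """Build "address OK" lines from AD proxyAddresses-style values."""
    recipients = set()
    for value in proxy_addresses:
        prefix, _, address = value.partition(":")
        # Keep SMTP entries only; skip X400, SIP and other address types.
        if prefix.lower() == "smtp" and "@" in address:
            recipients.add(address.strip().lower())
    return "".join("%s OK\n" % r for r in sorted(recipients))

# Hypothetical directory values, including a duplicate and a non-SMTP entry.
sample = [
    "SMTP:John.Doe@domain.com",
    "smtp:jdoe@domain.com",
    "X400:C=US;A= ;P=Org;",
    "smtp:jdoe@domain.com",
]
print(to_postfix_map(sample), end="")
```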

Fighting DNS flood with Shorewall

Hello,

One of my servers had its whole syslog full of lines like this:

Feb 17 09:47:05 ns1 named[27994]: client 71.103.252.61#35085 (gqvqvprkgvp.www.x99moyu.net): view external: query (cache) 'gqvqvprkgvp.www.x99moyu.net/A/IN' denied
Feb 17 09:47:05 ns1 named[27994]: client 42.225.49.146#34132 (jwhywxp.www.x99moyu.net): view external: query (cache) 'jwhywxp.www.x99moyu.net/A/IN' denied
Feb 17 09:47:05 ns1 named[27994]: client 122.183.74.13#15741 (whtotgmli.www.x99moyu.net): view external: query (cache) 'whtotgmli.www.x99moyu.net/A/IN' denied
Feb 17 09:47:05 ns1 named[27994]: client 79.31.53.2#53568 (qhnzwuw.www.x99moyu.net): view external: query (cache) 'qhnzwuw.www.x99moyu.net/A/IN' denied
Feb 17 09:47:05 ns1 named[27994]: client 86.96.114.164#7526 (psgbswn.www.x99moyu.net): view external: query (cache) 'psgbswn.www.x99moyu.net/A/IN' denied
Feb 17 09:47:05 ns1 named[27994]: client 107.68.3.25#55201 (k.www.x99moyu.net): view external: query (cache) 'k.www.x99moyu.net/A/IN' denied

And it had been happening for a long time. It wasn’t a big deal because the requests are denied anyway, until I had to do some serious modifications on this server and discovered that syslog was nearly unusable, thanks to this amazing flood:

root@ns1.domaim.com:~# wc -l /var/log/syslog
84960 /var/log/syslog

It seems to be impossible to get fine-grained logging with bind9, so I decided to try something else: let’s use shorewall (an iptables frontend) to drop all packets matching the pattern “x99moyu.net” (all requests are against this specific domain).

Let’s give iptables a try:

iptables -A INPUT -p udp --dport 53 -m string --algo bm --string x99moyu -j DROP

Yeah! Syslog stopped complaining. However, I’m not really happy with this solution:

  • TCP is not handled
  • IPv6 isn’t either
  • It matches x99moyu instead of x99moyu.net
  • It’s not integrated into the system
  • It’s not self-documenting

Let’s try to figure out how to match the whole domain first:

iptables -A INPUT -p udp --dport 53 -m string --algo bm --string x99moyu.net -j DROP

Won’t work. In fact, the DNS request is constructed in a different way:
http://stackoverflow.com/questions/14096966/can-iptables-allow-dns-queries-only-for-a-certain-domain-name?answertab=votes#tab-top

If you look at the contents of the DNS request packet in wireshark or similar you will find that the dot character is not used. Each part of the domain name is a counted string, so the actual bytes of the request for google.com will be:

06 67 6f 6f 67 6c 65 03 63 6f 6d
The first byte (06) is the length of google, followed by the 6 ASCII characters, then a count byte (03) for the length of com followed by… you get the idea.

Yep, I got it. We’ll also need to do a “hex” match instead of a simple string:

iptables -A INPUT -p udp --dport 53 -m string --algo bm --hex-string "|07|x99moyu|03|net" -j DROP

Here we go, that’s the proper iptables line to use. Now we can integrate it into our /etc/shorewall/rules and /etc/shorewall6/rules, above the “DNS/ACCEPT” line.

# With logging (x99moy is a "tag" displayed in the log lines, limited to 6 chars)
#DNS/DROP:info:x99moy loc fw ; -m string --algo bm --hex-string "|07|x99moyu|03|net"
# Without logging
DNS/DROP loc fw ; -m string --algo bm --hex-string "|07|x99moyu|03|net"
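
If you need the same kind of rule for another domain, here is a small Python sketch that builds both the raw length-prefixed bytes and the --hex-string pattern used above. The function names are mine. It only handles plain ASCII names and does not append the terminating zero byte of a full DNS name, since we are matching a suffix inside the packet.

```python
def encode_labels(domain):
    """Encode "google.com" the way it appears on the wire (no final 00)."""
    out = bytearray()
    for label in domain.strip(".").split("."):
        raw = label.encode("ascii")
        out.append(len(raw))  # one length byte, then the label itself
        out += raw
    return bytes(out)

def iptables_hex_string(domain):
    """Build the value for iptables --hex-string, e.g. |07|x99moyu|03|net."""
    return "".join(
        "|%02x|%s" % (len(label), label)
        for label in domain.strip(".").split(".")
    )

print(encode_labels("google.com").hex())
print(iptables_hex_string("x99moyu.net"))
```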

					

A short story about PHP CMS, Spam, RBL and Postfix rate-limiting

We had some issues today, at work, with a PHP-based CMS (hello |*@#-?! joomla) being used as a spam gateway.

 

  • The root cause (Joomla)

I fixed the issue by figuring out which PHP file was compromised, using the findbot.pl tool from abuseat.org. But my main concern is that there’s no way to prevent this from happening again. PHP is broken by design, especially when it’s used for a CMS.

Abuseat’s script helped me find the suspicious code, which the Apache logs then confirmed:

62.84.241.155 - - [10/Feb/2016:07:08:03 +0000] "POST /templates/_old2/session.php HTTP/1.1" 200 316 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
62.84.241.155 - - [10/Feb/2016:07:08:51 +0000] "POST /templates/_old2/session.php HTTP/1.1" 200 316 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
62.84.241.155 - - [10/Feb/2016:07:09:38 +0000] "POST /templates/_old2/session.php HTTP/1.1" 200 341 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"

In the meantime, Joomla has been updated and hopefully the security issue has been fixed.
After I removed the bad file, whoever had turned my CMS into a spambox seemed annoyed and tried to break in again:

195.206.253.146 - - [10/Feb/2016:08:18:21 +0000] "GET /administrator/index.php HTTP/1.0" 200 7778 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
195.206.253.146 - - [10/Feb/2016:08:18:21 +0000] "POST /administrator/index.php HTTP/1.0" 303 228 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
195.206.253.146 - - [10/Feb/2016:08:18:21 +0000] "GET /administrator/index.php HTTP/1.0" 200 7778 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
195.206.253.146 - - [10/Feb/2016:08:18:21 +0000] "POST /administrator/index.php HTTP/1.0" 303 228 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

No thanks, really. It’s been a pleasure but it’s time for me to move on:

root@some.server.com:~# shorewall drop 195.206.253.146
195.206.253.146 Dropped

 

  • Preventing this from happening again ?

So what can you do about this ? First, make sure your main SMTP IP address is not involved. Relay the CMS emails through a dedicated SMTP server that is not hidden behind the same NAT as your main SMTP server. Otherwise, you will get blacklisted as soon as any flaw in Joomla gets exploited.

To make sure you’re fine, you can use one of the online multi-RBL checks like anti-abuse.org or senderbase.org by Cisco. If you’re not listed there, you’re probably fine. Otherwise it’s time to ask for removal from each blacklist and be patient. Your SMTP server won’t be trusted again for at least a couple of hours, and it will probably take a couple of days to be un-blacklisted across the whole Internet.

Of course, you may consider upgrading Joomla, changing passwords and getting rid of thousands of useless plugins, but I guess you’re not in charge of this Joomla website, right ?

Another thing that may help is to enable a PHP hardening extension called “suhosin”. It wasn’t ready when Debian Jessie was released, so we’ll use the official upstream repository to get it.

Here’s an extract of my Dockerfile to enable this extension:

RUN echo 'deb http://repo.suhosin.org/ debian-jessie main' >> /etc/apt/sources.list
RUN curl https://sektioneins.de/files/repository.asc | apt-key add -
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y -o 'Dpkg::Options::=--force-confdef' -o 'Dpkg::Options::=--force-confold' --no-install-recommends --no-install-suggests install \
  php5-suhosin-extension
RUN php5enmod suhosin

 

  • Treat the symptoms, as well as the cause

So now, you’re using a different SMTP server to relay emails coming from the insecure website… To avoid spamming the world and/or overloading the internet connection, we’ll set up rate-limiting on the postfix server.

We’ll use postfwd for this.

apt-get install postfwd

If using Debian Wheezy, make sure to get the one from backports; the default one is completely broken.

Then, we set up a rule forcing each client_address (the IP connecting to this SMTP server) to send no more than 5 emails every 5 minutes.

Create new /etc/postfix/postfwd.cf configuration file containing the following:

id=RULE001
	action=rate(client_address/5/300/450 4.7.1: $$client_address: only 5 messages per 5 minutes allowed)

Then set STARTUP=1 in /etc/default/postfwd.

Then, edit your postfix configuration in /etc/postfix/main.cf to add a new smtpd_recipient_restrictions setting like this:

smtpd_recipient_restrictions = 
  check_policy_service inet:127.0.0.1:10040,
  permit_mynetworks,
  reject_unauth_destination,
  permit

The check_policy_service will check postfwd running on port 10040 which will return either permit or deny. Postfwd will reply with a 450 temporary error if the rate has been exceeded.

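Beware of the order: in this example, even hosts allowed to relay emails through this SMTP server (those listed in $mynetworks) are rate-limited, because the policy check comes before permit_mynetworks.
The reason is that this SMTP server is outside the main corporate network and I don’t trust any of the hosts using it.

Here’s another snippet from a production server, where the policy check comes after permit_mynetworks and permit_sasl_authenticated, so trusted and authenticated clients are not rate-limited: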

smtpd_recipient_restrictions =
    permit_mynetworks,
    permit_sasl_authenticated,
    reject_non_fqdn_recipient,
    reject_unknown_recipient_domain,
    reject_unauth_destination,
    check_policy_service inet:127.0.0.1:10040,
    permit

If you don’t have this setting yet, you can get the default value on your system by running:

postconf smtpd_recipient_restrictions

I suggest always adding “permit” as the last action; even if it’s implicit, it makes the workflow much easier to understand.

You can now restart both services and check the log files:

service postfwd restart
service postfix restart

Feb 10 08:27:10 server postfwd2/policy[14962]: [RULES] rule=0, id=RULE001, client=server.some.domain[123.123.123.123], sender=, recipient=, helo=, proto=SMTP, state=RCPT, rate=rate/6/0.00s, delay=0.00s, hits=RULE001, action=450 4.7.1: 123.123.123.123: only 5 messages per 5 minutes allowed
Feb 10 08:27:10 server postfix/smtpd[15881]: NOQUEUE: reject: RCPT from server.some.domain[123.123.123.123]: 450 4.7.1 : Recipient address rejected: 4.7.1: 123.123.123.123: only 5 messages per 5 minutes allowed; from= to= proto=SMTP helo=
Feb 10 08:27:10 server postfix/smtpd[15881]: disconnect from server.some.domain[123.123.123.123]
Feb 10 08:27:10 server postfix/smtpd[15881]: connect from server.some.domain[123.123.123.123]
Feb 10 08:27:10 server postfwd2/policy[14962]: [RULES] rule=0, id=RULE001, client=server.some.domain[123.123.123.123], sender=, recipient=, helo=, proto=SMTP, state=RCPT, rate=rate/6/0.00s, delay=0.00s, hits=RULE001, action=450 4.7.1: 123.123.123.123: only 5 messages per 5 minutes allowed
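
To see at a glance which clients keep hitting the limit, the postfwd lines can be summarised per IP. This is a rough sketch that assumes the log format shown above; rate_limited_clients is a hypothetical helper name:

```shell
# Hypothetical helper: read mail log lines on stdin and print how many
# times postfwd answered each client IP with a 450 (rate limit hit).
rate_limited_clients() {
  # Keep only postfwd lines carrying a 450 action, reduce them to the IP
  # found between the brackets after client=, then count per IP.
  sed -n '/postfwd/s/.*client=[^[]*\[\([^]]*\)].*action=450 .*/\1/p' |
    sort | uniq -c | sort -rn
}
```

On a Debian box it would typically be fed with something like "rate_limited_clients < /var/log/mail.log", but the log path depends on your syslog setup.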

Of course, postfwd has many more features, check its online documentation !

Fixing suspend on ACPI 5.0 motherboards

2015-06-09: Current state on Linux 4.0

Today I got really pissed off to see that my Debian testing machine was still unable to resume correctly from suspend out of the box…
So I decided to upgrade to kernel 4.0 and update the motherboard BIOS to the latest release: no luck.

While searching Google about that issue, I finally found some information… right here, on my own blog. That was quite disappointing.

I figured out I still had all these error messages, just like 3 years ago:

Jun  9 23:06:04 thrall kernel: [    0.162246] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20150204/hwxface-580)
Jun  9 23:06:04 thrall kernel: [    1.154306] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.154314] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff8802160bc478), AE_NOT_FOUND (20150204/psparse-536)
Jun  9 23:06:04 thrall kernel: [    1.154820] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.154828] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT2._GTF] (Node ffff8802160bc400), AE_NOT_FOUND (20150204/psparse-536)
Jun  9 23:06:04 thrall kernel: [    1.155221] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.155229] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff8802160bc478), AE_NOT_FOUND (20150204/psparse-536)
Jun  9 23:06:04 thrall kernel: [    1.156398] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.156406] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT2._GTF] (Node ffff8802160bc400), AE_NOT_FOUND (20150204/psparse-536)
Jun  9 23:06:04 thrall kernel: [    1.161873] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.161881] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT3._GTF] (Node ffff8802160bc388), AE_NOT_FOUND (20150204/psparse-536)
Jun  9 23:06:04 thrall kernel: [    1.165882] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.165891] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT3._GTF] (Node ffff8802160bc388), AE_NOT_FOUND (20150204/psparse-536)
Jun  9 23:06:04 thrall kernel: [    1.184009] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.184017] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff8802160bc4f0), AE_NOT_FOUND (20150204/psparse-536)
Jun  9 23:06:04 thrall kernel: [    1.213989] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150204/psargs-359)
Jun  9 23:06:04 thrall kernel: [    1.213997] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff8802160bc4f0), AE_NOT_FOUND (20150204/psparse-536)

So what happened to that patch fixing the issue in 2012 ? Did it get rejected ? Lost somewhere in some random git space ?
I checked the official git master branch and the fix had been applied. The idea was good but, again, no luck.

Finally, I figured out the issue was triggered by some (new?) ACPI/SATA-related stuff that probably didn’t exist in 2012.

The proper fix is simply to boot your kernel with libata.noacpi=1 and resume works again, YAY \o/

To make it permanent on Debian, edit /etc/default/grub and set the following line:

GRUB_CMDLINE_LINUX_DEFAULT="libata.noacpi=1"

Then regenerate grub config by running update-grub.
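
After the reboot, a quick way to confirm that the option really reached the kernel is to look at /proc/cmdline. A small sketch (has_kernel_param is a hypothetical helper name; the optional second argument only exists so the function can be tried against a sample file):

```shell
# Hypothetical helper: succeed if the given parameter is present on a
# kernel command line (read from /proc/cmdline unless a file is given).
has_kernel_param() {
  param="$1"
  cmdline_file="${2:-/proc/cmdline}"
  for word in $(cat "$cmdline_file" 2>/dev/null); do
    [ "$word" = "$param" ] && return 0
  done
  return 1
}

if has_kernel_param libata.noacpi=1; then
  echo "libata.noacpi=1 is active"
else
  echo "libata.noacpi=1 is missing, check /etc/default/grub and rerun update-grub"
fi
```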

After a reboot I checked my ACPI-related error messages:

Jun  9 23:26:00 thrall kernel: [    0.170951] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20150204/hwxface-580)

That’s a lot less and suspend seems to be reliable so far.


Original post from 2012

There’s a bunch of new motherboards coming out which use ACPI 5.0; however, it’s not supported yet by the Linux kernel.

If you own a recent Asus motherboard (it seems to happen on the P67, H67 & Z68 chipset-based series) and your computer suspends just fine but never wakes up, you may really love this post 😉

Here we go, check if your system is affected by this bug:

dmesg | grep AE_NOT_FOUND

If you see lines looking like this:

ACPI Error: [RAMB] Namespace lookup failure, AE_NOT_FOUND (20110112/psargs-359)

Great, you should be able to fix your suspend issue !

The next step is quite easy: we’re going to build the 3.2.1 kernel patched with ACPICA_Fix_to_allow_region_arguments_to_reference_other_scopes.

Install the latest 3.2 kernel from Debian along with the compilation tools:

sudo aptitude install linux-image-3.2.0-rc7-amd64 build-essential

Download and extract 3.2.1 kernel sources:

wget ftp://ftp.kernel.org/pub/linux/kernel/v3.x/linux-3.2.1.tar.bz2
tar xvjf linux-3.2.1.tar.bz2

Download the patch attached to this blog post, and apply it:

cd linux-3.2.1
patch -d drivers -p0 < /tmp/ACPICA_Fix_to_allow_region_arguments_to_reference_other_scopes.patch

Copy Debian’s 3.2 kernel config:

cp /boot/config-3.2.0-rc7-amd64 .config

Start building (use -j N, where N is the number of CPU cores):

make -j 4

It will ask about a few new config options (drivers added between 3.2.0-rc7 and 3.2.1); just accept the default settings.

Install new kernel:

make modules_install
make install

On my system, the initrd was updated automatically, the Debian way. Despite being 100 MB (??), it works as expected.

After rebooting into this new kernel, you should be able to suspend and resume successfully.

References:
GIT commit in linux-next
ArchLinux forum topic who gave me the right fix
ACPI devel mailing list archive

Known motherboards affected by this bug:
Asus P8Z68-V LX
Asus P8Z68-V LE
Asus P8H67

Future:
According to a co-worker (a kernel developer), the patch has been committed to the linux-next git repository, so it should be integrated into the official kernel releases starting with version 3.4.

Postfix: SSL relayhost

Hi,

Here is a quick workaround to make postfix use a remote server as a relay (aka “relayhost”) using SSL on port 465.

The idea is to set up a stunnel daemon on an arbitrary local port, which will operate as an SSL TCP proxy to your real server.

apt-get install stunnel4

Then, edit /etc/stunnel/stunnel.conf and comment out the “cert = /etc/stunnel/mail.pem” line and any built-in proxy section ([pop3s], [imaps]…).

Add a new section:

[postfix-ssl-relayhost]
accept = 2525
client = yes
connect = my.remote-server.com:465

Enable stunnel daemon by setting ENABLED=1 in /etc/default/stunnel4.

Restart stunnel:

/etc/init.d/stunnel4 restart
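
Before touching Postfix, it may be worth checking that stunnel really accepts connections on the local port. A minimal sketch, assuming bash (it relies on bash’s /dev/tcp pseudo-device); port_open is a hypothetical helper name. It only proves that something is listening locally, not that the SSL session to the remote server succeeds:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if something accepts TCP connections on
# host:port. Relies on bash's /dev/tcp pseudo-device, so it needs bash.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 2525; then
  echo "stunnel is listening on 127.0.0.1:2525"
else
  echo "nothing is listening on 127.0.0.1:2525, check the stunnel logs"
fi
```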

Add the following settings in /etc/postfix/main.cf:

# 465 isn't filtered...
# relayhost = smtp.internal-server.com 
# relay thru stunnel forwarding to my.remote-server.com:465
relayhost = [127.0.0.1]:2525

And restart the service:

/etc/init.d/postfix restart

You should now see something like this in your log file:

Feb 10 14:12:47 my.server.local postfix/cleanup[5121]: 6D8A8100E6F: message-id=<20140210131247.6D8A8100E6F@my.server.local>
Feb 10 14:12:47 my.server.local postfix/qmgr[5112]: 6D8A8100E6F: from=, size=336, nrcpt=1 (queue active)
Feb 10 14:12:47 my.server.local stunnel: LOG5[5009:3083459504]: postfix-ssl-relayhost connected from 127.0.0.1:59355
Feb 10 14:12:47 my.server.local postfix/smtp[5123]: 6D8A8100E6F: to=, relay=127.0.0.1[127.0.0.1]:2525, delay=0.09, delays=0.02/0/0.06/0.01, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 7F5E340569E3)
Feb 10 14:12:47 my.server.local postfix/qmgr[5112]: 6D8A8100E6F: removed
Feb 10 14:12:47 my.server.local stunnel: LOG5[5009:3083459504]: Connection closed: 511 bytes sent to SSL, 313 bytes sent to socket