Centreon “log” table getting insanely huge

Hi there,

I’m currently migrating some old Centreon 2.5/2.6 setups with Nagios/NDO to Centreon 2.7 with Centreon-Engine/Centreon-broker, but I’m running into an issue: some of the MySQL tables to migrate are insanely large:
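You can spot the culprit with a query along these lines; this is a sketch assuming the old-style schema where Centreon keeps its data in the centstorage database (adjust the database name and credentials to your setup):

    # List the biggest tables of the centstorage database by on-disk size
    mysql -u root -p -e "
      SELECT table_name,
             ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb,
             table_rows
      FROM information_schema.tables
      WHERE table_schema = 'centstorage'
      ORDER BY (data_length + index_length) DESC
      LIMIT 5;"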

This table contains old Nagios logs; according to a forum post, it is used when clicking on Monitoring > Event logs and when doing reporting.
Fair enough, but I don’t really care about what happened last year anyway; reporting is done on a monthly basis.

So let’s see what is the oldest entry there:
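A quick sketch, assuming the logs live in the log table of the centstorage database (adjust to your own schema):

    # Oldest entry, as a raw Unix timestamp
    mysql -u root -p centstorage -e "SELECT MIN(ctime) FROM log;"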

Sadly, it’s stored as a Unix timestamp and not in MySQL datetime format, so we’ll have to do some conversion to get it human-readable.
To be honest, when I started the cleanup the oldest entry was even older.
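MySQL’s FROM_UNIXTIME() takes care of the conversion:

    # Same query, converted to a readable date
    mysql -u root -p centstorage -e "SELECT FROM_UNIXTIME(MIN(ctime)) FROM log;"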

I’m not sure whether Centreon is supposed to clean this out. I guess it does, probably through one of the various cron jobs Centreon installs, but in my experience these are highly borked and can easily leave old entries behind.

Let’s validate that we’re not going to delete the wrong entries by running a SELECT first:
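For example, to preview the oldest entries matching a cut-off of June 8th, 2016 (again a sketch, using the centstorage.log names from above):

    # Show the ten oldest entries older than the cut-off date
    mysql -u root -p centstorage -e "
      SELECT ctime, FROM_UNIXTIME(ctime) AS readable_date
      FROM log
      WHERE ctime < UNIX_TIMESTAMP('2016-06-08 00:00:00')
      ORDER BY ctime ASC
      LIMIT 10;"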

Looks okay. Be sure to compare “ctime” with the converted date, and play with the WHERE condition to make sure it’s really behaving as expected.
For instance, if you replace “2016-06-08 00:00:00” with “2015-06-14 19:19:01”, the last line should disappear.

Once you’ve confirmed you’re not deleting anything useful, go ahead with a DELETE statement:
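Something along these lines, wrapped in time and capped with LIMIT:

    # Delete one million of the matching entries and measure how long it takes
    time mysql -u root -p centstorage -e "
      DELETE FROM log
      WHERE ctime < UNIX_TIMESTAMP('2016-06-08 00:00:00')
      LIMIT 1000000;"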

I decided to use LIMIT here to avoid overloading the server for an unknown amount of time. The “time” command is there so you get a measurement of how long it takes to delete 1 000 000 entries (52 s here).

You can now recheck the oldest log you have now:
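Same MIN(ctime) query as before:

    # Oldest remaining entry after the first DELETE batch
    mysql -u root -p centstorage -e "SELECT FROM_UNIXTIME(MIN(ctime)) FROM log;"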

It seems there’s still a long way to go before we reach June 2016 😉

Bonus:
All in one command, so you just have to check your terminal when coming back from the coffee machine to see its progress:
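Here is one possible way to write such a loop (it asks for the MySQL password once so you don’t get prompted at every iteration):

    # Delete 1M-row batches in a loop, printing the oldest remaining date after each pass
    read -s -p "MySQL root password: " MYPW; echo
    while true; do
        mysql -u root -p"$MYPW" centstorage -e \
            "DELETE FROM log WHERE ctime < UNIX_TIMESTAMP('2016-06-08 00:00:00') LIMIT 1000000;"
        mysql -u root -p"$MYPW" centstorage -e \
            "SELECT FROM_UNIXTIME(MIN(ctime)) FROM log;"
    done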

When the loop keeps outputting the same date, it means the DELETE is not removing anything anymore: time to hit Ctrl+C!

Let’s have a look at the table size now:
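With innodb_file_per_table (or a MyISAM table) you can simply look at the table’s data files on disk; the path below assumes the default MySQL datadir:

    # Size of the log table files on disk
    ls -lh /var/lib/mysql/centstorage/log.*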

Uh?

Thanks to Google, it seems I need to run “OPTIMIZE TABLE” to reclaim the freed disk space. But there are two things I know about OPTIMIZE and huge tables like this one:
* It will write lock the table
* It will last for ages (I mean up to *days*)

Let’s try to make this process a bit quicker… Ever heard of eatmydata?
It disables the fsync() system call, giving you some kind of write cache on steroids; drawback: you’re no longer protected against file corruption in case of a crash.

For now, we’ll take the risk and hack the MySQL init script to run with eatmydata:
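A sketch of the idea on an init-script based setup; package, service and script names vary between distributions, so adapt as needed:

    # Install eatmydata ("eatmydata" on Debian/Ubuntu, "libeatmydata" on EL via EPEL)
    yum install -y libeatmydata    # or: apt-get install eatmydata

    # Stop MySQL, then edit the init script (may be called mysqld on your distro) so the
    # daemon is started through the eatmydata wrapper, e.g. prefix the mysqld_safe line:
    #     eatmydata /usr/bin/mysqld_safe ... &
    service mysql stop
    vi /etc/init.d/mysql
    service mysql start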

It’s pretty hard to figure out whether the trick worked or not. What eatmydata actually does is set an LD_PRELOAD environment variable so that the unprotected calls override the libc ones.
Thanks to /proc, we can check this by looking at the mysqld PID attributes:
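One way to do it:

    # Get the PID of the running mysqld and look for LD_PRELOAD in its environment
    pid=$(pidof mysqld | awk '{print $1}')
    tr '\0' '\n' < /proc/$pid/environ | grep LD_PRELOAD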

(basically, I get the PID of /usr/sbin/mysqld, which is the main MySQL server process, and check /proc/<pid>/environ)

If it worked, you should find a line like this:
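The exact path depends on your distribution and eatmydata version, but it should look something like:

    LD_PRELOAD=/usr/lib/libeatmydata/libeatmydata.so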

We can now run optimize on this table:
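Still on the centstorage.log table from above:

    # Rebuild the table to reclaim the space freed by the DELETEs
    mysql -u root -p centstorage -e "OPTIMIZE TABLE log;"

If the table is InnoDB, MySQL will report “Table does not support optimize, doing recreate + analyze instead”, which is expected and does the job.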

You can watch it progress by running:
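One way to follow it, assuming the default datadir, is to watch the temporary files MySQL creates while rebuilding the table:

    # MySQL rebuilds the table into a temporary file (e.g. #sql-*); watch it grow
    watch -n 10 "ls -lh /var/lib/mysql/centstorage/"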

Now you will have to wait a couple of hours for the optimization to complete…
