Fixing VMware vCenter template customization for Debian Stretch (nic detected as “ether”)

Hello,

I’m a big fan of Foreman. I use it everywhere to spawn my virtual machines (mostly with VMware vCenter or AWS) and then apply Puppet classes directly to them to get a fully configured new host in a few clicks. Maybe I’ll write about it one day, let’s see.

Anyway, this week’s theme was mostly “Let’s upgrade from Jessie to Stretch, I’m craving Python 3.5 and the new async/await syntax”.
Sadly, it went wrong. I was unable to use my Foreman against ESX 6.0 anymore because, when injecting the customization XML file (used to define IP settings within the VM through open-vm-tools), the resulting VM had no network configured.
After looking at what happened, I figured out that /etc/network/interfaces had been generated wrongly: instead of using eth0 (yes, I disabled predictable interface names in my template), it was written as if the interface was named ether. Uh?

A quick Google search for “debian stretch vmware ether” led me to a GitHub bug opened against open-vm-tools. Sadly, the issue doesn’t come from open-vm-tools: it comes from a VMware script that doesn’t correctly parse the current ifconfig output (yeah, I added net-tools to my template too).

Here is an extract of the net-tools package NEWS.Debian file:

Wow, that’s a pretty dangerous move you did here….

The script creating the network configuration is actually a piece of Perl crap copied directly from the vCenter server into the VM filesystem. Yeah, that sounds like black magic but the good news is that it’s Perl, so it’s fixable.

So I searched for this “Customization.pm” file on my vCenter Windows server and I found it here:
C:\Program Files\VMware\vCenter Server\vpxd\imgcust\linux\imgcust-scripts\Customization.pm

I managed quite easily to understand what was wrong, and I must say that original output parsing was pretty cheap.
Anyway, here’s a better one that just works:

Nothing to restart: this file is copied every time you apply customization to a template. You’ll find attached a text version of the patch: vcenter_Customization_pm.diff
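To illustrate why the old parsing breaks, here is a small Python sketch (not the Perl patch itself). It assumes the two layouts I know of: older net-tools prints `eth0 Link encap:Ethernet HWaddr <mac>` on one line, while the Stretch version prints `eth0: flags=...` and puts the MAC on a separate `ether <mac>` line, which is presumably where the bogus “ether” interface name came from. The sample outputs and the MAC address are made up and trimmed.

```python
import re

# Trimmed, made-up sample outputs; real ifconfig prints more fields.
OLD_STYLE = """\
eth0      Link encap:Ethernet  HWaddr 00:50:56:aa:bb:cc
          inet addr:10.0.0.5  Bcast:10.0.0.255  Mask:255.255.255.0
"""

NEW_STYLE = """\
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.5  netmask 255.255.255.0  broadcast 10.0.0.255
        ether 00:50:56:aa:bb:cc  txqueuelen 1000  (Ethernet)
"""

MAC = r"(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}"


def interfaces_by_mac(ifconfig_output):
    """Return {mac: interface_name} for both old and new ifconfig layouts."""
    result = {}
    current = None
    for line in ifconfig_output.splitlines():
        # An interface block starts at column 0; the name may end with ':'.
        start = re.match(r"^([^\s:]+):?\s", line)
        if start:
            current = start.group(1)
        # Old layout: "HWaddr <mac>", new layout: "ether <mac>".
        mac = re.search(r"\b(?:HWaddr|ether)\s+(" + MAC + r")", line)
        if mac and current:
            result[mac.group(1).lower()] = current
    return result


print(interfaces_by_mac(OLD_STYLE))
print(interfaces_by_mac(NEW_STYLE))
```

Both calls should map the MAC to eth0. The point is to remember the interface name from the block header instead of guessing it from the line that carries the MAC.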

Good luck!

Policy Based Routing (PBR) with Shorewall to migrate a server

Hey,

Today I’m doing pure sysadmin work: I’ve been asked to migrate several servers from an obsolete IP range (192.168.x) to 10.x. Things were quite easy until I reached the internal mail server, which may be used as a relay by hundreds of field devices. Everybody is supposed to use the DNS entry, but I won’t trust that.

So my idea is to switch eth0 to the new network and add a new eth1 in the old one, to keep the service working and to be able to log what’s still using the obsolete address.
There’s just a little problem: if my default gateway is on eth0, any packet entering eth1 from a routed network will be answered through the default gateway on eth0 (hosts directly connected to the legacy local network are fine). That’s asymmetric routing and it just doesn’t work.

Okay, so how do I solve that? With Shorewall of course! The idea is to tag any packet entering eth1 with a different mark than the ones coming through eth0 and to provide a different routing table for each mark. I’ll do this on CentOS today but it should be basically the same on any Linux system. Shorewall is usually available everywhere, but you can try doing this by hand with “ip” and “iptables”. Looks like a lot of pain to me, though.
Having both addresses routed and working is a nice step, but it’s pretty useless if I have no way to find out who’s using the obsolete address, so we’ll use Shorewall to log these accesses and create a specific rsyslog/logrotate configuration to get a dedicated log.
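For the curious, here is roughly what the by-hand variant with “ip” and “iptables” could look like. This is only a sketch that builds the command lines without running them; the gateway address 192.168.0.1 is a made-up example, and Shorewall does more than this (it also copies the connected routes into each table, for instance).

```python
def pbr_commands(iface, mark, table, gateway):
    """Build (not run) the commands for one marked routing table."""
    return [
        # Remember on the connection which interface it came in through.
        f"iptables -t mangle -A PREROUTING -i {iface} -m conntrack "
        f"--ctstate NEW -j CONNMARK --set-mark {mark}",
        # Dedicated routing table with its own default gateway.
        f"ip route add default via {gateway} dev {iface} table {table}",
        # Packets carrying this mark are routed with that table.
        f"ip rule add fwmark {mark} table {table}",
    ]


def shared_commands():
    """Copy the connection mark back onto locally generated replies."""
    return ["iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark"]


# Hypothetical legacy gateway, for illustration only.
for command in pbr_commands("eth1", 2, 2, "192.168.0.1") + shared_commands():
    print(command)
```

The connection mark is what makes replies leave through the interface the request came in on: the mark is stored when the connection is created and restored on every outgoing packet, so the fwmark rule picks the matching table.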

First, we’ll change the network configuration to have both interfaces up with a default gateway only on the first interface (connected to the new network). The gateway will later be overridden by Shorewall, but it’s always saner to have a working default configuration, even with limited features.

So make sure to create proper ifcfg-eth0 and ifcfg-eth1 files in /etc/sysconfig/network-scripts and make sure GATEWAY is defined only for the new network. You should also check that the server is reachable on its new address, and on the old address from a machine directly connected to the legacy network.

Let’s continue with a very basic Shorewall configuration. Run yum -y install shorewall and then make sure to have the following files in /etc/shorewall:

  • interfaces – List of network adapter handled by Shorewall
  • policy – Default firewall policies between each zone
  • providers – This one is PBR specific, we’ll use this to mark packets
  • rules – Overrides default policies with port/host rules
  • shorewall.conf – Global settings
  • zones – Map interfaces to firewall zones

If one is missing, copy it from /usr/share/shorewall/configfiles/

    So let’s do a few adjustments in shorewall.conf first:

    IP_FORWARDING=No (No, this machine should NEVER be used as a gateway between the legacy and new networks, we're not here to create security flaws ;-))
    DISABLE_IPV6=Yes (Sadly, there's no IPv6 here so it's better to let Shorewall4 kill the whole stack)
    LOGTAGONLY=Yes (Changes the way Shorewall generates the log prefix, otherwise ours would be too long and get truncated)

    Now define the interfaces in the interfaces file:

    And map them to IPv4 zones:

    fw is a default zone meaning “myself”.

    And we create a default policy allowing the machine itself to reach legacy and new network zones and blocking any incoming packets.

    Finally, we’ll add a set of default rules to be able, at least, to SSH into the server again:

    Just like in the policy file, you can use loc,old if you want to permit ping and SSH from the old network too.

    I’ll also add a few rules to permit mail related services from the new zone:

    Okay, now we can enable and start Shorewall.

    Now we’ll ask Shorewall to mark packets differently according to the incoming interface. This will be done in providers file.

    Last column is the gateway to use on each network.

    Let’s permit mail-related traffic from the legacy network but ask Shorewall to log these packets. Add the following in rules file:

    Reload Shorewall and try to telnet to tcp/25 from a routed network: both IPs are now working!

    If you check /var/log/messages you will see log lines like:
    Jul 24 16:55:39 mailsrv kernel: Shorewall:MailMigration:ACCE IN=eth1 OUT= MAC=XXX SRC=192.168.55.4 DST=192.168.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=22405 DF PROTO=TCP SPT=39474 DPT=25 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x2

    You can also check your routing tables with ip route show table:
    ip route show table main no longer shows a default gateway.
    ip route show table 1 shows the local route for the eth0 network and the default gateway of the new network. It’ll be used for packets marked as 1.
    ip route show table 2 shows the local route for the eth1 network and the default gateway of the legacy network. It’ll be used for packets marked as 2 (note MARK=0x2 in the log above).

    Your server is now completely accessible from both networks and you can easily monitor the log file to find clients still using the legacy address. But we can make it a lot easier by asking rsyslog to create a separate log file with these specific messages:

    Create /etc/rsyslog.d/mailmigration.conf with the following content:

    And the associated logrotate /etc/logrotate.d/mailmigration file to avoid having a single never ending file:

    If you want to go further and handle this in a more automated way, I’d definitely suggest having a look at the rsyslog AMQP module to publish events to RabbitMQ, and writing a quick Python consumer using Pika to parse them and notify “someone” (may I suggest calling some API to create an internal support ticket?). The “worker.py” file should be enough for testing; just wrap your handler in try/except and call ch.basic_nack so the message goes back to the queue in case of failure.
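A minimal sketch of such a consumer callback, assuming each message body is one raw log line like the one shown above. The queue name `mailmigration` and the `notify` function are made up; replace `notify` with whatever opens a ticket on your side.

```python
import re


def extract_source(log_line):
    """Return the SRC= address of a Shorewall log line (raises if missing)."""
    match = re.search(r"\bSRC=(\S+)", log_line)
    if match is None:
        raise ValueError("no SRC= field in message")
    return match.group(1)


def notify(source_ip):
    """Hypothetical placeholder: call your ticketing API here."""
    print("legacy address still used by", source_ip)


def on_message(ch, method, properties, body):
    """Pika callback: ack on success, put the message back on failure."""
    try:
        notify(extract_source(body.decode(errors="replace")))
    except Exception:
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    else:
        ch.basic_ack(delivery_tag=method.delivery_tag)


def main():
    """Wire the callback to RabbitMQ; needs the pika package and a broker."""
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="mailmigration", durable=True)
    channel.basic_consume(queue="mailmigration", on_message_callback=on_message)
    channel.start_consuming()
```

Call main() to start consuming. Keep in mind that requeueing blindly means a message that can never be parsed will come back forever, so you may want to drop or park those instead.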

Building native Python 3.4 module with Pip on Windows

Hello,

Yesterday I decided to use the Python setproctitle module in a project to rename the process of a Python script (for prettier display in netstat, ps…).
The RPM package for CentOS 7 was done very quickly by modifying the current Fedora development one to get a Python34 flavor for my old CentOS, but I try to keep my code compatible with Windows too (mostly for my colleagues’ development needs).

As usual, I would just go for “pip install setproctitle” on Windows (I would clearly advise to NEVER do that in production, but it’s fine for development).
Sadly it failed with the following error:
error: Microsoft Visual C++ 10.0 is required (Unable to find vcvarsall.bat).

According to Google this error is quite famous, but most people seem to be trying to fix it without having any clue about what is really going on.
The root of this issue is that renaming a process is something really specific and thus platform dependent. If you look at the setproctitle code you’ll see it’s all C code with a specific section for each family of operating systems. So we are having an issue installing this module on Windows because:

  • You need a compiler, but unlike on Linux you need the same compiler as the one the Python team used to build the Python interpreter you have installed
  • You will probably also need the Windows SDK, because setproctitle is very likely to use low-level Windows headers

According to the pip error message when installing the setproctitle module, I need the Visual Studio 10.0 compiler. Okay.
Thanks to Wikipedia, I’m now aware that version 10.0 is actually Visual Studio 2010.
Microsoft confirms this but adds an interesting piece of information: Visual Studio 2010 is commercial software, so I need a free alternative, which is the Microsoft Windows SDK for Windows 7 and .NET Framework 4; it embeds the Visual Studio 2010 compiler.
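If you want to double check which compiler your interpreter was built with before downloading anything, Python can tell you. The sketch below maps the “MSC v.NNNN” number reported by the interpreter to a Visual C++ release; the table only lists the few values I’m sure about, anything else is reported as unknown.

```python
import platform
import re

# Well-known _MSC_VER values and the Visual C++ release they belong to.
MSC_VERSIONS = {
    1500: "Visual C++ 9.0 (Visual Studio 2008)",
    1600: "Visual C++ 10.0 (Visual Studio 2010)",
    1900: "Visual C++ 14.0 (Visual Studio 2015)",
}


def required_compiler(compiler_string):
    """Translate e.g. 'MSC v.1600 64 bit (AMD64)' into a compiler name."""
    match = re.search(r"MSC v\.(\d+)", compiler_string)
    if match is None:
        return None  # not a Microsoft build (GCC, Clang...)
    number = int(match.group(1))
    return MSC_VERSIONS.get(number, "unknown MSC version %d" % number)


# Prints the compiler of the interpreter running this script.
print(platform.python_compiler())
print(required_compiler(platform.python_compiler()))
```

On an official Python 3.4 for Windows the compiler string contains “MSC v.1600”, which is consistent with the Visual C++ 10.0 requested by pip.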

I’d suggest getting the ISO version instead because the previous link is an online installer. It may not work anymore next time you need to install it…

A funny thing: you’ll be offered three different ISO files without any information about what the difference is… So here is the explanation:

  • GRMSDK_EN_DVD.iso: for regular x86 Windows running in 32-bit mode
  • GRMSDKIAI_EN_DVD.iso: Intel Itanium 64-bit, you don’t want that
  • GRMSDKX_EN_DVD.iso: x64 version, that’s probably the one you need

If you get the wrong one, the installer will fail with a weird error message saying there’s an MSI file missing!

Before trying to install this, uninstall any “Visual Studio 2010” related software, especially the classic Microsoft Visual C++ 2010 Redistributable x86 and x64 packages, which are very likely to be installed already. Otherwise, the SDK will fail to install without any understandable error message (but you’re free to give it a try and figure out what’s going on in the Windows Installer log file, good luck).

I also had trouble running the installer from a mapped network drive, so you can safely extract the ISO content with 7-Zip, but you might have to copy the folder locally before running it (any feedback would be appreciated in the comments if you give it a try).

You may now think you’re ready but wait… There’s more.
It seems the Windows SDK package installs a broken Visual Studio distribution: see KB2519277 (FIX: Visual C++ compilers are removed when you upgrade Visual Studio 2010 […] if Windows SDK v7.1 is installed). According to the title, it’s not exactly what we are doing, but you really need it. Get VC-Compiler-KB2519277.exe here:

Last but not least, although Microsoft released a fix to repair the broken Visual Studio installation from the SDK package, they still managed to release it broken: it’s not working on x64, as there’s a missing BAT file to set environment variables when running from an x64 shell 😀 No kidding…

Even worse, it has been reported to Microsoft but they closed the issue with no explanation: https://connect.microsoft.com/VisualStudio/feedback/details/510784/vcvarsall-bat-amd64-environment-is-missing

Fortunately, some people on StackOverflow fixed the issue by themselves.

I made a batch script so anyone can just run the script and enjoy the fix:

Again, it cannot be run from a network drive (wtf, Windows really…) so you’ll have to create this script on your desktop with “.bat” extension and run it with administrator privileges using right click.

Now you can go back to pip and enjoy the package building and installing successfully 🙂

PS: Did I mention the setproctitle 1.1.10 module is real good shit? If you’re running tons of Python processes, especially networking-related ones, you may benefit from renamed processes when using ps or netstat!

A working Microsoft RDP (remote desktop) client

Hey,

Recent Windows Server releases (like 2012) seem to require some additional features that the good old “rdesktop” tool does not handle. Here is what happens when connecting:

Autoselected keyboard map en-us
ERROR: CredSSP: Initialize failed, do you have correct kerberos tgt initialized ?
Failed to connect, CredSSP required by server.

Many people around the Internet suggest disabling something on the server, but that means disabling a security feature. Moreover, you might need RDP to disable it (ZeroDivisionError) and you may not be allowed to do so. Anyway, shitty answer.

Here is the proper one: https://github.com/FreeRDP/FreeRDP

This client just works but has the same issue as rdesktop: it’s highly stupid. For instance, look at the error message above and notice “Autoselected keyboard map en-us”.
Sorry, what ? It’s not because I’m using en_US locale that I’m actually staying in the United States and using a regular ANSI QWERTY keyboard. In fact, I’m not, not at all.
Another issue is the screen size, which always seems to be set to 1024×768. That’s a pity nowadays, when everybody uses at least a “Full HD” screen.

So I made a shell wrapper script implementing dynamic screen size selection at 90% of your current display (configurable) and setting the right keymap according to your keyboard layout and variant (layout=ch, variant=fr for me, which is a French-oriented QWERTZU layout used in Luxembourg and called “Swiss French” by Windows).

It also features a configuration file to override the defaults and some handy default options to share the clipboard and home directory with the remote target. All you have to do is put saner-xfreerdp in /usr/local/bin/ and use it instead of the real binary.
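The resizing part is plain integer arithmetic. Here it is as a tiny Python sketch (the real wrapper is a shell script and may round differently; the function name is mine):

```python
def scaled_resolution(width, height, percent=90):
    """Shrink a screen size to the given percentage, keeping whole pixels."""
    return width * percent // 100, height * percent // 100


# A 1920x1200 monitor at 90% gives a 1728x1080 window.
print(scaled_resolution(1920, 1200))
```

The two values are then handed to xfreerdp through its /w: and /h: options.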

Get the script here: https://github.com/eLvErDe/saner-xfreerdp

Here is a very simple usage example:

user@host:~$ saner-xfreerdp -u username -a some-srv-01.domain.lan
INFO: Detected active screen on monitor DVI-0 with width=1920 and height=1200
INFO: Will use resized resolution of 1728x1080

INFO: Running xfreerdp +clipboard +home-drive /u:"username" /v:"some-srv-01.domain.lan" /kbd:"Swiss French" /w:1728 /h:1080

[xrdp logs…]
Password:

Debugging “no output” in Nagios or Centreon

You just set up a new check that was running perfectly fine when run by hand but fails completely after integration in the monitoring software?
Of course, you suspect that the command actually run is invalid because of parameters, quotes, escapes or whatever, but you’re having a hard time figuring out what was run exactly…

Been there, done that. But here’s a magic trick:
Let’s do some kind of “ps | tail -f | grep” on the monitoring poller itself:

while true; do ps aux | grep check_script_name | grep -v grep; done

Now, trigger a forced check and get the full command in your terminal.
Some quotes might be missing because ps aux doesn’t show the argument separators, but I guess that could be worked around with a real script querying /proc/${pid}/cmdline, which contains \0 as the argument separator…
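Such a script could look like the sketch below (Linux only, since it relies on /proc). It splits the NUL-separated content of /proc/&lt;pid&gt;/cmdline and re-quotes every argument for the shell, so you can copy and paste the exact command. The function names are mine.

```python
import os
import shlex
import time


def format_cmdline(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/cmdline into a quoted string."""
    if not raw:
        return ""  # kernel threads have an empty cmdline
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the terminator of the last argument
    args = [arg.decode(errors="replace") for arg in raw.split(b"\0")]
    return " ".join(shlex.quote(arg) for arg in args)


def find_commands(needle):
    """Yield (pid, quoted command line) for processes matching needle."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        if pid == str(os.getpid()):
            continue  # do not report ourselves
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as handle:
                line = format_cmdline(handle.read())
        except OSError:
            continue  # the process exited in the meantime
        if needle in line:
            yield pid, line


def watch(needle, interval=0.1):
    """Poll until interrupted, printing each matching command line once."""
    seen = set()
    while True:
        for pid, line in find_commands(needle):
            if (pid, line) not in seen:
                seen.add((pid, line))
                print(pid, line)
        time.sleep(interval)
```

Run watch("check_script_name") on the poller, trigger the forced check and stop it with Ctrl+C. Very short-lived checks can still slip between two polls, so lower the interval if you miss them.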

Centreon “log” table getting insanely huge

Hi there,

I’m currently migrating some old Centreon 2.5/2.6 installations with Nagios/NDO to Centreon 2.7 with Centreon-Engine/Centreon-Broker, but I’m experiencing some issues with insanely large MySQL tables to migrate:

This table contains old Nagios logs and, according to a forum post, it’s used when clicking on Monitoring > Event logs and when doing reporting actions.
Fair enough, I don’t care about what happened last year anyway; reporting is done on a monthly basis.

So let’s see what is the oldest entry there:

Sadly, it’s using Unix timestamps and not the MySQL datetime format, so we’ll have to do some conversion to get it human-readable.
To be honest, when I started the cleanup the oldest entry was even older.
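If you want to sanity-check a ctime value outside MySQL, the conversion is easy to reproduce in Python. This sketch works in UTC, which is an assumption: MySQL converts using the session time zone, so expect an offset if your server is not running in UTC.

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"


def to_readable(ctime):
    """Unix timestamp -> 'YYYY-MM-DD HH:MM:SS' in UTC."""
    return datetime.fromtimestamp(ctime, tz=timezone.utc).strftime(FORMAT)


def to_ctime(text):
    """'YYYY-MM-DD HH:MM:SS' in UTC -> Unix timestamp."""
    parsed = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# Cut-off used later in this post, as a raw ctime value.
print(to_ctime("2016-06-08 00:00:00"))
print(to_readable(to_ctime("2016-06-08 00:00:00")))
```

Comparing the raw number against the ctime column is also a cheap way to double check a WHERE condition before deleting anything.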

I’m not sure if Centreon is supposed to clean this out. I guess it does, probably using one of the various cron jobs installed by Centreon, but in my experience this is highly borked and can surely leave entries uncleaned.

Let’s validate we’re not going to delete the wrong entries by running a SELECT first:

Looks okay. Be sure to compare “ctime” and the converted date and play with the WHERE condition so you can be sure it’s really working properly.
For instance, if you swap “2016-06-08 00:00:00” with “2015-06-14 19:19:01” the last line should disappear.

Once you’ve confirmed you’re not deleting anything useful, go ahead with a DELETE statement:

I decided to use LIMIT here, to avoid loading the server too much for an unknown amount of time. The “time” command has been added so you can measure the time required to delete 1 000 000 entries (52s here).

You can now recheck the oldest log entry you have:

It seems there’s a long way to go before getting to June 2016 😉

Bonus:
All in one command, so you just have to check your terminal when coming back from the coffee machine to see its progress:

When the loop keeps outputting the same date, it means DELETE is not removing anything anymore: time to hit Ctrl+C!

Let’s have a look at the table size now:

Uh ?

Thanks to Google, it seems I need to run “OPTIMIZE TABLE” to reclaim the freed disk space. But there are two things I know about optimize and huge tables like this one:
* It will write-lock the table
* It will last for ages (I mean up to *days*)

Let’s try to make this process a bit quicker… Ever heard about eatmydata?
It will disable the fsync() system call, giving you some kind of write cache on steroids. The drawback: you’re no longer protected from file corruption in case of a crash.

For now, we’ll take the risk and hack mysql init script to run with eatmydata:

It’s pretty hard to figure out if the trick worked or not. Actually, eatmydata sets an LD_PRELOAD environment variable to override the libc calls with unprotected ones.
Thanks to /proc, we can check this by looking at the environment of the mysqld process:

(basically, I get the PID of /usr/sbin/mysqld, which is the main MySQL server process, and check /proc/<pid>/environ)
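The same check can be scripted. /proc/&lt;pid&gt;/environ is a list of NAME=value entries separated by NUL bytes, so a Python sketch only has to split it and look for LD_PRELOAD. Function names are mine, the library path in the example is made up, and you need to be root (or the mysql user) to read the file.

```python
def preloaded_libraries(environ_bytes):
    """Return the LD_PRELOAD entries found in a /proc/<pid>/environ dump."""
    for entry in environ_bytes.split(b"\0"):
        name, _, value = entry.partition(b"=")
        if name == b"LD_PRELOAD":
            # LD_PRELOAD entries may be separated by spaces or colons.
            return value.decode().replace(":", " ").split()
    return []


def uses_eatmydata(pid):
    """True if the given process was started with libeatmydata preloaded."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        libraries = preloaded_libraries(handle.read())
    return any("eatmydata" in library for library in libraries)


# Made-up environment dump, for illustration only.
sample = b"PATH=/usr/bin\0LD_PRELOAD=/usr/lib/libeatmydata.so\0HOME=/root\0"
print(preloaded_libraries(sample))
```

Pass the mysqld PID to uses_eatmydata() to get a plain yes/no answer.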

If it worked, you should find a line like this:

We can now run optimize on this table:

You can see it processing by running:

Now you will have to wait a couple of hours for the optimization to complete…

Nginx SSL vhosting using Server Name Indication

Here is the issue: I have a tcp/443 DNAT to a specific machine running some specific HTTPS app that does not work behind a reverse proxy.

Obviously, I want to run other applications on 443 and I’m not allowed to get any other port.

Sounds pretty bad, right ?
Actually, there’s a way out and it’s called “nginx-is-so-fuckin-powerfull” 😉

As you may know, a long time ago a feature called “Server Name Indication” was added to TLS. Before this it was impossible to serve multiple virtual hosts on a single address because the SSL session was negotiated before the client actually sent the requested vhost name.

With SNI, there’s a quick chat between your HTTPS server and the remote browser, something like:

Ok, that’s probably not really accurate, but who cares about what exactly happens. The thing is: there’s a routing opportunity before serving the SSL certificate, since we know the requested domain name at this point; and guess what: NGINX can route using the SNI name!!
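For those who do care about what exactly happens: with SNI the client puts the requested name, in clear text, in the very first message it sends (the ClientHello), so a proxy can read it without decrypting anything. The Python sketch below shows the idea; it is only an illustration of the mechanism, not NGINX’s code. It handles a ClientHello that fits in a single TLS record, and the hostname is made up.

```python
import ssl
import struct


def extract_sni(data):
    """Return the server name from a TLS ClientHello, or None if absent."""
    try:
        # Record type 22 = handshake, handshake type 1 = ClientHello.
        if data[0] != 22 or data[5] != 1:
            return None
        pos = 5 + 4            # record header + handshake header
        pos += 2 + 32          # legacy version + random
        pos += 1 + data[pos]   # session id
        pos += 2 + struct.unpack_from("!H", data, pos)[0]  # cipher suites
        pos += 1 + data[pos]   # compression methods
        end = pos + 2 + struct.unpack_from("!H", data, pos)[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", data, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # Skip list length (2) and name type (1), then read the name.
                name_len = struct.unpack_from("!H", data, pos + 3)[0]
                return data[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass
    return None


def sample_client_hello(hostname):
    """Build a real ClientHello in memory, without touching the network."""
    context = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: nobody answers, we only want the first message
    return outgoing.read()


print(extract_sni(sample_client_hello("app.example.com")))
```

A client that sends no server name makes extract_sni() return None, which is exactly the case where you fall back to the default destination.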

First thing… You need a really, really new NGINX version (1.11.5 at least), but if your distro doesn’t have it you can use the NGINX repositories.
Second, you must understand that very old clients may not use SNI. A client that doesn’t will hit the NGINX default vhost, so make sure to keep the old behavior as the default, just in case.
Here is the client compatibility list for SNI: https://en.wikipedia.org/wiki/Server_Name_Indication
I leave it to you to decide if you care about handling Internet Explorer < 7.

So let's configure NGINX correctly:

You need to define a stream {} section at the top of nginx.conf, just like the http one.

Of course, you need to stop the default http/server from listening on port 443 (comment out lines like "listen 443 ssl" in all your existing configuration).

Now we'll create a stream server, which is a plain TCP proxy. In /etc/nginx/stream.conf.d/443.conf:

And that's it 😀

You can now create a new http/server instance on port 8443 to serve your different new HTTPS vhosts, but I suggest starting with the default virtual host (/etc/nginx/conf.d/default.conf) by adding "listen 8443 ssl default_server" and some SSL cert and key directives.

Here is an example of the stream_443.log:

Nice work NGINX, as usual!

Going further:

There's just a little issue here: the real HTTPS server on port 8443 will always see the incoming IP address as 127.0.0.1. However, there's a small protocol header called "proxy_protocol" that can help pass proxying-related information between NGINX servers, but the equipment running behind mine doesn't like it. So the idea here is to use proxy_protocol between my stream/443 and http/8443 instances, and to strip it when proxying to original_dest using a dummy stream server that does nothing other than pop out the proxy_protocol data and forward to the real server. Then I will restore remote_addr in http/8443. The new config file is now:

In the http/8443 vhost, we set the following to restore original client IP address:
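In case you wonder what proxy_protocol actually puts on the wire: in its version 1 text form, it’s a single line like `PROXY TCP4 <client ip> <server ip> <client port> <server port>` sent before the real traffic. A Python sketch of what “popping out” that header means (addresses are made up, and the binary version 2 of the protocol is not handled):

```python
def split_proxy_v1(data):
    """Split a PROXY protocol v1 header from the payload that follows it.

    Returns (client_ip, client_port, payload). The client fields are None
    when there is no header or when the proxy reports an UNKNOWN source.
    """
    if not data.startswith(b"PROXY "):
        return None, None, data
    header, separator, payload = data.partition(b"\r\n")
    if not separator:
        raise ValueError("incomplete PROXY header")
    fields = header.decode("ascii").split(" ")
    if fields[1] == "UNKNOWN":
        return None, None, payload
    # PROXY <TCP4|TCP6> <src ip> <dst ip> <src port> <dst port>
    _, _family, src_ip, _dst_ip, src_port, _dst_port = fields
    return src_ip, int(src_port), payload


sample = b"PROXY TCP4 203.0.113.7 192.0.2.10 56324 443\r\n\x16\x03\x01..."
print(split_proxy_v1(sample))
```

Equipment that doesn’t speak the protocol would see that first line as garbage in front of the TLS handshake, which is why it has to be stripped before reaching it.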

Nginx -_-

Bonus stuff:

In case you're having issues with SELinux (and you will, for instance it will prevent NGINX from starting a connection from port 8080 to a remote host), you can use the following to extract failures from audit.log and turn them into a permanent SELinux exception:

Disable HiLink mode and force tty modem on NEW Huawei E3272

There’s plenty of documentation on the Internet about this issue but none of it works with recent firmware. It all talks about using the embedded web interface to force serial mode through some call and then sending an AT command to choose the default mode.
It’s not working ANYMORE on the 22.470.07.00.00 firmware.

And sorry, you’ll need a Windows computer for this… (probably a clean pre-Windows 8 one)

First you need to confirm that your modem is actually working correctly in HiLink mode.
Plug it in and wait for the browser to open automatically:
00_HiLink_web_interface

You should confirm from device manager that there’s a new NDIS network interface
01_NDIS_network_interface

Run E3272s_Update_21.420.07.00.00.exe which is a firmware installer containing an older version that permits default mode change
02_Firmware_downgrade

After a while it will fail with the error below. The firmware updater turned the device into serial mode but there’s no driver available
04_Device_not_found

Confirm from device manager that there are some unknown devices
05_Unrecognized_devices

Install Mobile Partner from Huawei and fix the driver file because it doesn’t contain the IDs for this device
02_Install_Mobile_Partner

Go to C:\Program Files (x86)\Mobile Partner\Driver\Driver\X64 (for 64-bit systems)
and edit the ewser2k.inf file.

In the [QcomSerialPort.NTamd64] section, add the following two lines:

%QcomDevice00% = QportInstall01, USB\VID_12d1&PID_1442&MI_00
%QcomDevice01% = QportInstall00, USB\VID_12d1&PID_1442&MI_01

Now go back to device manager and update driver by choosing the path containing the inf file
06_Update_driver_locally
07_Select_directory

If you get this error, you need to disable driver signature verification first (google for it).
BE SURE TO RESTART THE FIRMWARE UPDATER BEFORE TRYING TO FIX THE DRIVERS AGAIN, OTHERWISE THE DEVICE WON’T BE TURNED INTO SERIAL MODE.
09_Driver_installing

After a successful installation you should now see two additional COM ports
10_Driver_installed_succesfully

Start the firmware updater and wait a bit
11_Upgrade_step_2

On my Windows 8.1 computer it gets stuck here and fails with an error but it worked correctly on Windows 7…

Here is what you should see if it’s working correctly
00_Upgrade_in_progress

Finally, the success message saying your firmware has been downgraded to 21.xx
00_Upgrade_succeeded

Now we have access to the serial port and we’ll have to issue a few AT commands to set a new default mode. Find the COM port now used by your modem
00_COM_device

And start Putty on it
00_Putty_connection

Now we can send a few commands (press the Enter key at the end of each):

AT: Will reply "OK", which means you’re actually talking to something that understands AT commands
AT^FHVER: Confirm you are running firmware 21.xx
AT^SETPORT?: Show current modem default config
AT^SETPORT=?: Display available modes
AT^SETPORT="FF;10,12": Enable diag interface and classic serial based modem emulation (this is what we need to use with wvdial)
AT^RESET: Restart the modem

The screenshots below are a bit wrong: I used AT^SETPORT="FF;12,10" instead of AT^SETPORT="FF;10,12", so the modem is on ttyUSB1 instead of ttyUSB0!

Here you can see my AT session (please note that AT^SETPORT? won’t refresh until the modem is restarted)
00_Putty_session

After issuing AT^RESET the COM port number will change (probably increased by 1). You can then restart Putty and check that the default mode is now the expected one.
00_Putty_after_reset

You can now reboot into Linux and enjoy the stick being detected correctly:

Aug 18 22:58:23 thrall kernel: [ 283.080966] usb 5-1.2: new high-speed USB device number 5 using xhci_hcd
Aug 18 22:58:23 thrall kernel: [ 283.173491] usb 5-1.2: New USB device found, idVendor=12d1, idProduct=1506
Aug 18 22:58:23 thrall kernel: [ 283.173496] usb 5-1.2: New USB device strings: Mfr=2, Product=1, SerialNumber=0
Aug 18 22:58:23 thrall kernel: [ 283.173497] usb 5-1.2: Product: HUAWEI Mobile
Aug 18 22:58:23 thrall kernel: [ 283.173499] usb 5-1.2: Manufacturer: HUAWEI Technology
Aug 18 22:58:23 thrall kernel: [ 283.184269] usbcore: registered new interface driver usbserial
Aug 18 22:58:23 thrall kernel: [ 283.184280] usbcore: registered new interface driver usbserial_generic
Aug 18 22:58:23 thrall kernel: [ 283.184287] usbserial: USB Serial support registered for generic
Aug 18 22:58:23 thrall kernel: [ 283.186411] usbcore: registered new interface driver option
Aug 18 22:58:23 thrall kernel: [ 283.186422] usbserial: USB Serial support registered for GSM modem (1-port)
Aug 18 22:58:23 thrall kernel: [ 283.186513] option 5-1.2:1.0: GSM modem (1-port) converter detected
Aug 18 22:58:23 thrall kernel: [ 283.186597] usb 5-1.2: GSM modem (1-port) converter now attached to ttyUSB0
Aug 18 22:58:23 thrall kernel: [ 283.186613] option 5-1.2:1.1: GSM modem (1-port) converter detected
Aug 18 22:58:23 thrall kernel: [ 283.186656] usb 5-1.2: GSM modem (1-port) converter now attached to ttyUSB1

Modem is on /dev/ttyUSB0.

Bonus stuff:

A udev rule that will create /dev/gsm0 (in case you have other /dev/ttyUSBx devices):

SUBSYSTEM=="tty", ATTRS{idVendor}=="12d1", ATTRS{idProduct}=="1506", SYMLINK+="gsm%n"

And a working wvdial configuration (PIN code disabled, POST.lu APN so you probably want to change this, no user, no password):

[Dialer Defaults]
Init1 = ATZ
Init2 = AT+CGDCONT=1,"IP","web.pt.lu"
Stupid Mode = 1
MessageEndPoint = "0x01"
Modem Type = Analog Modem
ISDN = 0
Phone = *99#
Modem = /dev/gsm0
Username = { }
Password = { }
Baud = 460800
Auto Reconnect = on

Finally, a systemd service file with autorestart

[Unit]
Description=wvdial

[Service]
Type=simple
ExecStart=/usr/bin/wvdial
RestartSec=2
Restart=always

[Install]
WantedBy=multi-user.target

Fixing non-working iDrac on PowerEdge server (R610)

It seems Dell released a couple of servers with a broken embedded iDrac.
Actually the issue comes from the on-board Broadcom ethernet chip which is not configured correctly: http://permalink.gmane.org/gmane.linux.hardware.dell.poweredge/42033

Spot the issue

Here is how to confirm your issue is related to this bug and not something else. Boot the server and press CTRL+E to get into the iDrac BIOS. Select the network submenu and check the Active LOM entry. LOM stands for LAN On Motherboard.

If it says No Active LOM even though you selected Shared above, the iDrac is unable to bind to any on-board LAN, which means you are hitting this issue.

01_broken_lom

Then, we’ll create a DOS-based floppy disk image containing some Broadcom firmware-related tools that will reconfigure the embedded network controller so it can be used by the iDrac board.

Create a PXE bootable disk image with Broadcom utilities

Download Bcom_LAN_14.2.x_DOSUtilities_A03.exe from http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=29DKK and get a terminal in the download directory.

We will now download a FreeDOS disk image (that can be PXE booted) and add the required tools to it.


wget http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.0/fdboot.img
mkdir mount
sudo mount -t vfat -o loop fdboot.img mount/

unzip Bcom_LAN_14.2.x_DOSUtilities_A03.exe
sudo cp ./Userdiag/NetXtremeII/uxdiag.exe mount/

sudo sh -c 'echo uxdiag -t abcd -mfw 1 > mount/idrac.bat'

sudo umount mount

mv fdboot.img fdboot-fix-poweredge-idrac.img

Now we have a FreeDOS image containing the Broadcom uxdiag tool as well as an idrac.bat script that will start the required command.

Copy the img file to your PXE server and set the following to start it with PXELinux (pxelinux.cfg/default):

LABEL fix-idrac
KERNEL memdisk
APPEND initrd=fdboot-fix-poweredge-idrac.img

If you don’t have the memdisk binary, it can be found in the syslinux-common package.

Then you can restart your server and trigger a PXE boot. Once FreeDOS starts, select the Safe Mode entry (I ran into memory-full issues when using another entry).

02_freedos_booting

Then, type idrac.bat to start the batch script we added inside the disk image:

03_run_script

Broadcom tools will run for a couple of seconds and output something like this:

04_uxdiag_processing

Restart the server and hit CTRL+E to get inside the iDrac again; it’s now binding on LOM1 aka the ethernet port with label “1”:

05_fixed_lom

Master-master simple email server with Dovecot

The purpose of this article is to explain how to create a high-availability email server with Dovecot.
We will use internal plain text files as the users backend. It can of course easily be extended to use LDAP or SQL, but this article won’t cover that setup.

Install required packages

On both servers we’ll install dovecot as well as the POP3 and IMAP backends

To use dovecot clustering feature, known as dsync, we need dovecot 2.2 or later. Debian Jessie’s version is ok.

Setup file-based users database

Edit /etc/dovecot/conf.d/auth-passwdfile.conf.ext and set both userdb and passdb like this:

I will use plaintext clear password here because I really want to be able to read the users from the configuration file directly. You can of course use an encrypted format, see Dovecot documentation.

The file /etc/dovecot/users will contain the user accounts and we’ll deliver all emails using paths like /srv/vmail/user@domain.com.
Dovecot is set up to always use the vmail user with the mail group to avoid uid/gid madness.

First I tried to create a multi-domain setup, using "username_format=%n /etc/dovecot/%d/users" and "default_fields = uid=vmail gid=mail home=/srv/vmail/%d/%n", but the current master/master plugin is unable to handle such a configuration (Error: passwd-file: User iteration isn't currently supported with %variable paths), so I decided to use a single authentication file with the email address as login (%u instead of %n).

We need to create the system user for dovecot:

Now we need to enable this backend by commenting out auth-system and uncommenting auth-passwdfile in /etc/dovecot/conf.d/10-auth.conf

Configure Postfix to use Dovecot as delivery agent

In /etc/postfix/master.cf add the following section:

Then run the following command to make sure Postfix is configured correctly (postconf is a command that edits the main.cf config file):

Please MAKE SURE your /etc/hosts and /etc/hostname are configured correctly!
The following commands should return short/full/domain names:

Now we’ll enable the Dovecot LDA and our mail domain:

Additional Dovecot config

In /etc/dovecot/conf.d/10-mail.conf set

It will deliver emails in Maildir format like this: /srv/vmail/user@domain.com/Maildir

In /etc/dovecot/conf.d/10-auth.conf we’ll enable plain text login because we don’t care about SSL and stuff (non-encrypted auth is disabled for any host except localhost by default):

Create first user and try it

Create /etc/dovecot/users with the following content:

And secure the file permissions:

Finally restart dovecot, postfix and send a test email:

You should see something like this in the logs:

The key part here is dovecot: lda(test@domain.com): msgid=: saved mail to INBOX.

We can now check what happened on the filesystem:

Now we can test IMAP login with the following telnet transcript:

You should see the message body containing “test”. If so, we now have a fully working email server.

Enable doveadm service and replication plugin

Create a new file /etc/dovecot/local.conf with the following content:

Then we’ll configure the peer address for replication plugin in /etc/dovecot/conf.d/90-plugin.conf:

Now we will globally enable the replication plugin as well as the notify one (required), in /etc/dovecot/conf.d/10-mail.conf:

And that’s it… Yes, really, we’re done here !

Replicate config to secondary server

Here is my synchronisation script

Basically, it syncs the whole Postfix and Dovecot configuration, replaces the hostname with the secondary server’s one in the Postfix configuration and changes the address in Dovecot’s mail_replica setting.

You can now run echo test | mail -s test test@domain.com on both servers and check that both filesystems are updated with all emails 🙂

Of course, you can now connect two Thunderbird instances to 1.2.3.4 and 5.6.7.8 and then create folders, move emails or toggle read flags. Both will show the changes with very little delay.

Thanks for reading, I hope this helps!