Multicast UDP surprises

Posted on .

In my job I have to deal with UDP multicast traffic constantly. Writing software that communicates using multicast messages in C or C++ with the BSD sockets API can lead to surprising situations and behavior due to the way the API and the kernel work (Linux in my case). I think some of those aspects are interesting and will try to detail them here. I’ll assume you are somewhat familiar with the sockets API and I’ll try to work my way up from TCP to “What you should know if you’re going to use UDP multicast”. I’ll also assume IPv4. It should be mostly the same with IPv6.

TCP

Most people start socket programming with TCP as it’s easy and convenient. You start by creating a socket with the socket() system call and from there proceed to do the following.

In the client, you call connect() to connect to a server at a given address. After that, the socket can be used like a file with read(), write() and close(). You can call bind() before calling connect() but in most cases it’s not needed.

In the server, you first bind the socket to a listening address with bind(). At that point you normally specify a listening port and a listening IP address. The listening address can be INADDR_ANY to listen on any (all) IP addresses at that port. If you run netstat --listen --tcp -n on your machine and you see an asterisk before the port name, that means the process specified INADDR_ANY. You can also specify another listening address, like 127.0.0.1 or the IP address of a network interface, which you can dynamically obtain prior to calling bind(), so as to restrict traffic to a given interface or avoid listening to the world.

After that, you call listen() to put the socket in listen mode and accept() to accept new connections. The initial socket corresponds to the listening part you can see with netstat, while new sockets returned by accept() represent a specific connection with a client. Like before, you can call write(), read() and close().
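As a rough sketch of the server-side steps just described (socket(), bind() to INADDR_ANY, listen()), something like the following could be used. The helper names tcp_listen_any() and bound_port() are made up for this illustration and are not part of any API.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on INADDR_ANY:port.
 * Pass port 0 to let the kernel pick a free port. Returns the listening
 * descriptor, or -1 on error. */
int tcp_listen_any(uint16_t port)
{
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd == -1)
        {
                return -1;
        }

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);

        if (bind(fd, (const struct sockaddr *)(&addr), sizeof(addr)) == -1 ||
            listen(fd, SOMAXCONN) == -1)
        {
                close(fd);
                return -1;
        }
        return fd;
}

/* Hypothetical helper: report the local port a socket is bound to,
 * in host byte order, or 0 on error. */
uint16_t bound_port(int fd)
{
        struct sockaddr_in addr;
        socklen_t len = sizeof(addr);

        assert(fd >= 0);
        if (getsockname(fd, (struct sockaddr *)(&addr), &len) == -1)
        {
                return 0;
        }
        return ntohs(addr.sin_port);
}
```

The descriptor returned by tcp_listen_any() is the listening socket that netstat shows; each accept() on it would return a new descriptor for one client connection.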

Once you have written a couple of client/server programs doing that, congratulations: you’re now at the beginner level of socket programming. You can move up the ladder by trying select(), poll(), non-blocking reads, threading or multi-processing to handle client connections on the server, etc.

UDP

Coming from a TCP background, a few things may surprise you. In the client you usually call bind() with port 0 and INADDR_ANY so the socket gets a local address that will be used to both send messages and receive responses. UDP being connectionless, there is no connect() handshake to set that address up for you. (Strictly speaking, Linux will bind an unbound UDP socket to an ephemeral port on the first sendto(), but binding explicitly keeps the step visible.) After that, sendto() and recvfrom() will allow you to send data to other processes and receive data from them.

In the server, you normally bind() to a specific port and use a specific IP address or INADDR_ANY as before, and you sendto() and recvfrom() as in the client.

Few things appear to have changed, yet there is an important difference. In TCP, the operating system queues data for you grouping it by connection (i.e. socket) and it’s essentially a queue of bytes. If 20 bytes have arrived and you request to read 40, read() returns the 20 available bytes right away; it only blocks when nothing at all is pending and the socket is in blocking mode (or when you explicitly ask for the full amount, for example with recv() and MSG_WAITALL). Likewise, if you request to read 10 you’ll read 10, and the 10 remaining bytes will wait for a future read().
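Because a TCP socket is a queue of bytes, code that needs a fixed amount of data usually loops until it has all of it. The following is a minimal sketch of such a loop, assuming a blocking descriptor; read_exact() is a hypothetical helper name, not a standard function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until exactly len bytes have
 * been stored in buf. Returns len on success, a smaller count if the
 * stream ended first, or -1 on error. Assumes a blocking descriptor. */
ssize_t read_exact(int fd, void *buf, size_t len)
{
        size_t done = 0;

        assert(buf != NULL);
        while (done < len)
        {
                ssize_t n = read(fd, (char *)buf + done, len - done);
                if (n == 0)
                {
                        break;          /* End of stream. */
                }
                if (n == -1)
                {
                        if (errno == EINTR)
                        {
                                continue;       /* Interrupted: retry. */
                        }
                        return -1;
                }
                done += (size_t)n;
        }
        return (ssize_t)done;
}
```

The same loop works on any byte-stream descriptor, so it can be tried out on a pipe without writing a server first.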

In UDP, by contrast, the operating system queues your messages by order of arrival (they are not grouped by connection because connections don’t exist) and handles each message (UDP datagram) as an indivisible block. So if you receive a 20-byte datagram and request to read 10, the operating system will dequeue that message and serve you the first 10 bytes, discarding the other 10. If you request to read 40, the operation will not block unless there are no datagrams pending and you’re in blocking mode, and in our case it will immediately return with the 20-byte datagram.
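The discarding behavior can be observed with a small experiment over the loopback address. This is only a sketch: udp_truncation_demo() is a made-up helper, the 64-byte buffers and the one-second timeout are arbitrary choices, and it assumes the loopback interface is up.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: sends a datagram of sent_len bytes followed by
 * a one-byte marker datagram to a loopback socket, then reads the first one
 * with a buffer of read_len bytes. Returns what the first recv() returned,
 * or -1 if anything failed or the second recv() did not yield the marker. */
ssize_t udp_truncation_demo(size_t sent_len, size_t read_len)
{
        char payload[64], buf[64], marker = 'B';
        struct timeval timeout = { .tv_sec = 1, .tv_usec = 0 };
        struct sockaddr_in addr;
        socklen_t addr_len = sizeof(addr);
        ssize_t first, second, result = -1;
        int rx, tx;

        assert(sent_len <= sizeof(payload) && read_len <= sizeof(buf));
        memset(payload, 'A', sizeof(payload));
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = 0;                      /* Any free port. */
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        rx = socket(AF_INET, SOCK_DGRAM, 0);
        tx = socket(AF_INET, SOCK_DGRAM, 0);
        if (rx == -1 || tx == -1)
                goto out;

        /* Do not hang forever if a datagram goes missing. */
        if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO,
                       &timeout, sizeof(timeout)) == -1)
                goto out;

        /* Bind the receiver and learn which port the kernel picked. */
        if (bind(rx, (const struct sockaddr *)(&addr), sizeof(addr)) == -1 ||
            getsockname(rx, (struct sockaddr *)(&addr), &addr_len) == -1)
                goto out;

        if (sendto(tx, payload, sent_len, 0,
                   (const struct sockaddr *)(&addr), sizeof(addr)) == -1 ||
            sendto(tx, &marker, 1, 0,
                   (const struct sockaddr *)(&addr), sizeof(addr)) == -1)
                goto out;

        /* A short read consumes the whole datagram: the next recv() sees
         * the marker, not the leftover bytes of the first datagram. */
        first = recv(rx, buf, read_len, 0);
        second = recv(rx, buf, sizeof(buf), 0);
        if (first != -1 && second == 1 && buf[0] == 'B')
                result = first;
out:
        if (rx != -1)
                close(rx);
        if (tx != -1)
                close(tx);
        return result;
}
```

With a 20-byte datagram, asking for 10 bytes yields 10 and asking for 40 yields 20; in both cases the following recv() returns the marker datagram.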

This normally means when designing a UDP protocol you either create fixed-size datagrams or always use a large buffer and specify the maximum read size with each recvfrom() call. In my case, radars serve my software variable-sized data up to a maximum size of 64 KiB, which more or less matches the maximum size of a normal UDP datagram, so I usually declare reception buffers with that size and use it in the recvfrom() call.
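If you would rather detect an undersized buffer than silently lose data, recvmsg() reports truncation through the MSG_TRUNC flag in msg_flags. The sketch below assumes nothing beyond the POSIX sockets API; recv_datagram() and the MAX_UDP_DATAGRAM constant are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Illustrative buffer size: large enough for any UDP datagram. */
#define MAX_UDP_DATAGRAM 65536

/* Hypothetical helper: receive one datagram into buf (at most len bytes).
 * Returns the number of bytes stored, or -1 on error. On success,
 * *truncated is set to 1 if the datagram was larger than the buffer and
 * the excess was discarded, 0 otherwise. */
ssize_t recv_datagram(int fd, void *buf, size_t len, int *truncated)
{
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg;
        ssize_t n;

        assert(buf != NULL && truncated != NULL);
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        n = recvmsg(fd, &msg, 0);
        if (n != -1)
        {
                *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
        }
        return n;
}
```

A receiver would typically declare a buffer of MAX_UDP_DATAGRAM bytes once and pass it to every call, treating a set truncated flag as a protocol error.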

Unless someone has warned you or explained this to you, it usually catches you by surprise in your first UDP programs.

Multicast UDP

Sending data

If you want to create a program that will send traffic to a destination multicast address, an additional step is needed. After creating the socket with socket() and binding it to a local address with bind(), you need to call setsockopt() to specify the IP address of the network interface you want outgoing multicast traffic to go through. Take into account that, because this is multicast traffic, the operating system cannot reliably decide which interface to send the message from using the routing table alone: multicast traffic can legitimately go out through any interface. See the following code:

int fd;
int ret;

// Create socket.
fd = socket(AF_INET, SOCK_DGRAM, 0);
if (fd == -1)
{
        /* Handle error. */
}

// Bind to local address.
struct sockaddr_in bind_addr = {
        .sin_family = AF_INET,
        .sin_port = 0,
        .sin_addr = {
                .s_addr = INADDR_ANY
        }
};
ret = bind(fd, (const struct sockaddr *)(&bind_addr), sizeof(bind_addr));
if (ret == -1)
{
        // Handle error.
}

// Set multicast interface.
struct in_addr if_ip;
if_ip.s_addr = ...;

ret = setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF,
                 (const void *)(&if_ip), sizeof(if_ip));
if (ret == -1)
{
        // Handle error.
}

After that, you can use the socket normally and indicate a multicast IP address as the destination address. To send multicast traffic through several interfaces, the easiest option is to create one socket per interface.
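To illustrate "indicating a multicast IP address as the destination", here is a small sketch. The helpers make_group_addr() and send_to_group() are hypothetical names, and the socket passed in is assumed to have been prepared as in the code above.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: fill addr with a multicast destination.
 * Returns 0 on success, or -1 if group is not a valid IPv4 multicast
 * address in dotted-quad notation. */
int make_group_addr(const char *group, uint16_t port,
                    struct sockaddr_in *addr)
{
        assert(group != NULL && addr != NULL);
        memset(addr, 0, sizeof(*addr));
        addr->sin_family = AF_INET;
        addr->sin_port = htons(port);
        if (inet_pton(AF_INET, group, &addr->sin_addr) != 1)
        {
                return -1;
        }
        /* IN_MULTICAST() expects the address in host byte order. */
        if (!IN_MULTICAST(ntohl(addr->sin_addr.s_addr)))
        {
                return -1;
        }
        return 0;
}

/* Hypothetical helper: send one datagram to group:port through a socket
 * on which IP_MULTICAST_IF has already been set. Returns the number of
 * bytes sent, or -1 on error. */
ssize_t send_to_group(int fd, const char *group, uint16_t port,
                      const void *data, size_t len)
{
        struct sockaddr_in dst;

        if (make_group_addr(group, port, &dst) == -1)
        {
                return -1;
        }
        return sendto(fd, data, len, 0,
                      (const struct sockaddr *)(&dst), sizeof(dst));
}
```

For several outgoing interfaces, the same call would simply be repeated on each per-interface socket.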

Receiving data

The next step is creating a program that will receive multicast traffic, say to IP address 230.1.1.1 and port 4001. There are several steps involved. A program wanting to receive multicast traffic needs to, at least, add a multicast subscription to that IP address on a given interface (also specified by its IP address). That subscription is added to a socket and is usually related to the bind address of the socket, but it operates independently.

In the bind() call, you must use the given port (4001 in this case). The bind address is usually set to INADDR_ANY or the interface address if you want to receive unicast traffic on that socket too, in addition to the multicast traffic. If you don’t want to receive unicast traffic on the socket and restrict it to multicast traffic, you normally specify the multicast IP address as the bind address.

But that’s not enough to receive multicast traffic on that port. The operating system needs more information. It won’t let multicast traffic in through an interface unless a running process has explicitly requested to receive multicast traffic to that IP address through that interface. This needs, again, a call to setsockopt() to add a multicast group subscription through that interface.

// Suppose we have already called socket() and bind().
struct ip_mreq group;
group.imr_interface.s_addr = /* Interface IP address in network order. */;
group.imr_multiaddr.s_addr = /* 230.1.1.1 in network order. */;

ret = setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                 (const void *)(&group), sizeof(group));
if (ret == -1)
{
        // Handle error.
}

After a running process has successfully added a multicast group to a socket, the subscription will be listed by netstat -gn, the command you can run to check everything is working.

As soon as you want to receive traffic from more than one multicast IP address to the same port, some questions arise. If you bind() to INADDR_ANY, you will open yourself to unicast traffic. If you bind() to the multicast address, you can only receive from that multicast address. Normally, the easy solution is to create one socket per multicast address, binding each socket to each address and adding the corresponding multicast subscription to each socket.

SO_REUSEADDR

Depending on the way you manage your sockets, you might have to activate the SO_REUSEADDR option for your sockets. This option is set with setsockopt(), as before, and needs to be set before calling bind().

int one = 1;
ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                 (const void *)(&one), sizeof(one));
if (ret == -1)
{
        // Handle error.
}

SO_REUSEADDR, in the context of multicast UDP, allows sockets to share the same listening address. If you create several sockets and all of them listen on a different multicast IP address, there will be no conflict. But, for example, if you want to receive unicast traffic to port 4001 by binding to INADDR_ANY and have separate sockets for multicast traffic, binding to *:4001 and 230.1.1.1:4001 at the same time will only be possible by specifying SO_REUSEADDR on both sockets. Either no reuse is allowed and there’s only one listening socket or reuse is allowed by everyone and there may be several listening sockets. That’s one example conflict but your application may have others that force you to use SO_REUSEADDR.
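The "all or nothing" nature of the option can be checked with two UDP sockets competing for the same address and port. The sketch below reflects Linux behavior, which is the only system discussed here; other systems may treat SO_REUSEADDR differently for UDP. can_share_port() is a made-up helper name.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo helper: tries to bind two UDP sockets to the same
 * wildcard address and port, with SO_REUSEADDR set to reuse (0 or 1) on
 * both. Returns 1 if both binds succeeded, 0 if the second one failed
 * with EADDRINUSE, and -1 on any other error. */
int can_share_port(int reuse)
{
        struct sockaddr_in addr;
        socklen_t addr_len = sizeof(addr);
        int result = -1;
        int a, b;

        assert(reuse == 0 || reuse == 1);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = 0;                      /* Any free port. */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);

        a = socket(AF_INET, SOCK_DGRAM, 0);
        b = socket(AF_INET, SOCK_DGRAM, 0);
        if (a == -1 || b == -1)
                goto out;

        /* The option goes on BOTH sockets, and before bind(). */
        if (setsockopt(a, SOL_SOCKET, SO_REUSEADDR,
                       (const void *)(&reuse), sizeof(reuse)) == -1 ||
            setsockopt(b, SOL_SOCKET, SO_REUSEADDR,
                       (const void *)(&reuse), sizeof(reuse)) == -1)
                goto out;

        /* Bind the first socket and learn the port the kernel picked. */
        if (bind(a, (const struct sockaddr *)(&addr), sizeof(addr)) == -1 ||
            getsockname(a, (struct sockaddr *)(&addr), &addr_len) == -1)
                goto out;

        /* Try to bind the second socket to exactly the same address. */
        if (bind(b, (const struct sockaddr *)(&addr), sizeof(addr)) == 0)
                result = 1;
        else if (errno == EADDRINUSE)
                result = 0;
out:
        if (a != -1)
                close(a);
        if (b != -1)
                close(b);
        return result;
}
```

On Linux, can_share_port(1) is expected to report that sharing works and can_share_port(0) that the second bind() is refused.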

Relationship between the multicast address, the port and the interface address

Here comes the surprise. One of the most common mistakes is to think that all three elements are a single entity that acts as a filter to the traffic you will receive on a given socket. After all, the three of them are specified in relation to the socket. The multicast address and port are used when binding, and the multicast address and interface address are used when subscribing to the multicast group.

So if you bound to 230.1.1.1:4001 and subscribed to 230.1.1.1/192.168.1.1 on a socket, many people would expect that socket to receive traffic through the interface with address 192.168.1.1 going to address 230.1.1.1 and port 4001, and only that traffic. But it doesn’t work like that.

When you bind, you specify your listening address. You cannot receive traffic that goes to other addresses. When you subscribe to a multicast group you are saying two things to the operating system. On the one hand, that socket will have a subscription to that multicast group. On the other hand, traffic to that multicast group should be allowed through that interface because there’s a process interested in receiving it (we’re not taking into account firewalling rules, of course).

So when the operating system sees multicast traffic coming through that interface, it will first check if someone has requested traffic to that multicast group through that interface. If someone has, it will let those datagrams in. Immediately and surprisingly, the operating system will forget (or stop caring about) which interface the traffic came through. It will deliver it to every socket bound to a compatible destination address and port, including any socket bound to INADDR_ANY on that port.

So if you want to receive traffic to a given multicast group on several interfaces and you create one socket per interface, binding with reuse and adding the multicast group on one interface per socket, you will be surprised when you receive traffic from every interface replicated on all those sockets. And it doesn’t stop at the process boundary. One program of yours listens for multicast traffic and enables the multicast group on a given interface. Another process, outside your control, subscribes to the same multicast group on a completely different interface and suddenly both processes start receiving all the traffic.
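One way to at least see which interface a datagram arrived on, and filter in your own code, is the Linux IP_PKTINFO option described in ip(7). This goes beyond what the post covers and is only a sketch: enable_pktinfo() and recv_with_ifindex() are hypothetical helper names, and the code assumes glibc on Linux.

```c
#define _GNU_SOURCE     /* For struct in_pktinfo with glibc. */
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: ask the kernel to attach packet information to
 * every datagram received on fd. Returns 0 on success, -1 on error. */
int enable_pktinfo(int fd)
{
        int one = 1;
        return setsockopt(fd, IPPROTO_IP, IP_PKTINFO,
                          (const void *)(&one), sizeof(one));
}

/* Hypothetical helper: receive one datagram and store in *ifindex the
 * index of the interface it arrived on (0 if the kernel did not report
 * it). Returns the number of bytes received, or -1 on error. */
ssize_t recv_with_ifindex(int fd, void *buf, size_t len,
                          unsigned int *ifindex)
{
        /* The union keeps the control buffer suitably aligned. */
        union {
                char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
                struct cmsghdr align;
        } control;
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg;
        struct cmsghdr *cmsg;
        ssize_t n;

        assert(buf != NULL && ifindex != NULL);
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof(control.buf);

        n = recvmsg(fd, &msg, 0);
        if (n == -1)
        {
                return -1;
        }

        *ifindex = 0;
        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
             cmsg = CMSG_NXTHDR(&msg, cmsg))
        {
                if (cmsg->cmsg_level == IPPROTO_IP &&
                    cmsg->cmsg_type == IP_PKTINFO)
                {
                        struct in_pktinfo info;
                        memcpy(&info, CMSG_DATA(cmsg), sizeof(info));
                        *ifindex = (unsigned int)info.ipi_ifindex;
                }
        }
        return n;
}
```

A receiver could compare the reported index with if_nametoindex() for the interface it cares about and drop everything else. The unwanted datagrams still reach the socket; they are only discarded afterwards in user space.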

If you’re now thinking about the security implications of the previous paragraphs, you’re on the right track. With the calls shown so far, you cannot isolate a process receiving multicast traffic to a specific interface from the source code alone. A third party outside your control can enable that multicast group on another interface and the traffic will reach your process. I’ve experienced this first hand. Comments are welcome in this regard.
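One knob that may help on Linux, and which the text above does not cover, is the IP_MULTICAST_ALL socket option. According to ip(7) it exists since Linux 2.6.31, defaults to 1, and when set to 0 makes the socket receive only messages from groups joined on that particular socket. Whether it also filters by arrival interface in your setup is something to verify on your own kernel. The helper names below are invented for this sketch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux-specific): ask the kernel to deliver only
 * multicast traffic for groups this socket joined itself.
 * Returns 0 on success, -1 on error. */
int restrict_to_own_groups(int fd)
{
        int zero = 0;

        assert(fd >= 0);
        return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL,
                          (const void *)(&zero), sizeof(zero));
}

/* Hypothetical helper: read back the current IP_MULTICAST_ALL setting.
 * Returns 1 or 0, or -1 on error. */
int multicast_all_enabled(int fd)
{
        int value = -1;
        socklen_t len = sizeof(value);

        assert(fd >= 0);
        if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL,
                       &value, &len) == -1)
        {
                return -1;
        }
        return value;
}
```

The option would be set right after socket(), next to SO_REUSEADDR, and before adding any multicast subscription.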

If you want to test all of this by yourself, I have created a repository on GitHub containing a couple of example test programs for sending and receiving multicast traffic. All my tests ran on real hardware with real network interfaces. I haven’t bothered to test if everything works the same way in virtual machines.

Many thanks to W. Richard Stevens for extracting the multicast API documentation from Steve Deering’s original README published in 1989 and to Gary R. Wright for restoring Stevens’ kohala.com site after his passing.

Superblock last write time is in the future

Posted on . Updated on .

Update

Both bugs mentioned below were closed while the issue is still present. Fortunately, there is one specific bug where this problem is being worked on: bug 1202024. I’ll keep the /etc/adjtime workaround active until that one is fixed, but it’s probably just a matter of time, as an upstream patch to e2fsck from Ted Ts’o has been committed and a future e2fsprogs update will probably fix the issue. The definitive solution seems to be to make e2fsprogs fix the superblock last write time without causing a full filesystem check. fsck will continue to be run twice for the root filesystem in the boot process, but that second issue will probably be handled in bug 1201979.

End of update

I hit a small Fedora bug two months ago and I wanted to post about it just in case it’s useful for somebody else. Basically, an e2fsprogs update from March 5th, 2015 made the system take much longer to boot if you’re East of GMT and keep your system clock in local time. Running systemd-analyze critical-chain should reveal the problem is systemd-fsck-root.service.

I asked about the problem in the unofficial Fedora forums and the thread is easily reachable with a web search. Reading my comments in that thread reveals the whole story. It also mentions two related bug reports in the Fedora bug tracker: bug number 1198761 and number 1201978, which I also commented on.

Long story short, Fedora will set the system clock to use local time if it detects a Windows installation on the computer at installation time, as was my case. After the update from March 5th, fsck running from the initial RAM disk will be unaware of this and think the clock is set to UTC, so the first time it runs it adds your timezone offset to an already-local time when storing the superblock last write time. The fact that fsck runs at this point could be the real problem and, as far as I know, is still being investigated. Later, when fsck runs a second time during the boot process to check the root filesystem, it’s aware the system clock contains the local time and then it finds the superblock write time to be in the future. Then it will spend a lot of time checking the filesystem and fixing that problem to make sure everything is alright.

The real problem is probably a combination of fsck apparently running twice and too early, and suboptimal behavior of e2fsck when fixing the superblock last write time problem. My solution leaves those issues to the Fedora people and goes for a simple workaround as I mentioned in both a bug comment and the forums thread: just include /etc/adjtime in the initial RAM disk and the lengthy filesystem check will not take place anymore.

As root, lsinitrd will allow you to check if /etc/adjtime is already included in it. If it’s not and you’re suffering this problem, create the file /etc/dracut.conf.d/adjtime.conf (the base name is up to the user) and put the following line in it:

install_items+=" /etc/adjtime "

This will make dracut (the initial RAM disk management tool in Fedora and other distributions) include the file in all future initial RAM disks, which will be generated automatically for future kernel upgrades. You probably want to fix your current initial RAM disk too. For that, simply run dracut -f and reboot.

Fell in love with an old-timer: Music On Console

Posted on .

One aspect where Fedora is more inconvenient to use than Slackware is playing music. Specifically, playing MP3 files. I’m not sure about the legal status of software MP3 players. Wikipedia has a section in the MP3 article about it mentioning there are still pending patent issues that may be resolved this year. However, Slackware has been shipping MP3 decoding software like mpg321 and libmad for several years. Audacious, as shipped with Slackware, includes the MP3 plugin, for example. Fedora, on the other hand, doesn’t ship any software to play MP3s by default. I’m not sure Slackware’s legal status is solid in this regard but I’m not a lawyer, and having MP3 playing capabilities out-of-the-box is definitely handy.

In Fedora, as I stated a few times in the past, I like to build my multimedia toolchain by hand. This means, basically, that I build ffmpeg, mpv and a few other packages with my own SPEC files. Other people like the convenience of RPM Fusion, a repository providing additional packages that cannot be found in the official Fedora repositories. It’s definitely worth taking a look at it. If you use RPM Fusion, it’s easy to install a few packages here and there and give Audacious the capability of playing MP3 files, or having a few command-line tools available that can handle them.

But if you decide not to use RPM Fusion, adding MP3 support to Fedora can be a bit inconvenient. RPM Fusion gives you several options, the easiest one being its Audacious and GStreamer plugin packages. If you want to replicate that work you need to install libav instead of or in addition to ffmpeg and build a few plugins (note I don’t take any sides in the ffmpeg vs libav controversy and I’m simply used to installing and using ffmpeg because it seems better supported for the software I normally use).

In that situation, I decided to try to find a music player that could use ffmpeg directly and didn’t depend on a specific desktop environment to simplify my build chain. To my surprise, there aren’t many. Most of them use libav or ffmpeg indirectly through GStreamer, as mentioned above. Enter MOC: a simple client-server music player with an included curses interface. Super-simple to build and use. It uses ffmpeg directly and works amazingly from both a tty and a terminal emulator. The server part is handled automatically in most cases and allows music to continue playing while the client is closed. It supports playlists or playing music from a directory. 100% recommended.

Note if you do use RPM Fusion under Fedora, you can try MOC too without any hassles. RPM Fusion provides a MOC package.

Dealing with a newborn baby

Posted on .

It’s been some time since my last post. I have three posts in mind I want to write but they require time and there’s a newborn baby at home. Things will improve but right now my spare-moments budget is a bit limited.

I’ll be back soon with more content about a Fedora bug I had to work around (it may be interesting to others), a fantastic music player I’ve only recently discovered and a few programming notes about UDP multicast reception. Those notes may surprise developers who haven’t had to deal with multiple subscriptions and interfaces, a situation that pops up from time to time in my day job.

SELinux interfering with Postfix

Posted on .

This is the last post I’ll write about migrating to Fedora and I’ll cover the problem that made me waste the most time. I hadn’t run a system with SELinux before and I had trouble finding the source of the problem. Only after searching and reading a lot of web pages, forums and wikis did I find a reference to SELinux that clicked in my mind. I’ll bear SELinux in mind for future similar problems.

Long story short, I monitor my hard drives with “smartd” and have it configured to send an email to my user account in case of trouble. I soon realized local mail wasn’t working because Fedora’s minimal installation doesn’t include an MTA by default. From the possible choices in Fedora, I went with Postfix. It’s modern and simple to configure for local mail delivery only. In fact, Fedora’s default configuration file comes prepared exactly for that purpose, with very sane defaults. A quick test confirmed local mail was working, and I forwarded root’s mail to my user account too.

Stage two of the plan involved setting up a $HOME/.forward file for my user that delivered mail to both the local mailbox and an external mail address at my domain. I already had a script prepared for that using some common tools for this situation: “msmtp” to deliver mail to the external address, “formail” (from the procmail package) to slightly modify it before sending it, and “getmail_mbox” (from the getmail package) to deliver it locally unmodified. I’m posting the script below in case you find it useful.

#!/bin/sh
FROM_ADDRESS=foo@example.com
TO_ADDRESS=bar@example.com
MAILBOX=/var/spool/mail/user
TEMPORARY_FILE="$( mktemp /tmp/mailforward.XXXXXX )"
cat >"$TEMPORARY_FILE"
SUBJECT="$( formail -x Subject -c <"$TEMPORARY_FILE" )"
<"$TEMPORARY_FILE" formail \
        -i "From: $FROM_ADDRESS" \
        -i "To: $TO_ADDRESS" \
        -i "Subject: [localhost]$SUBJECT" \
        | msmtp -f "$FROM_ADDRESS" "$TO_ADDRESS"
getmail_mbox "$MAILBOX" <"$TEMPORARY_FILE"
rm "$TEMPORARY_FILE"

That script usually sits as a link named $HOME/bin/mailforward pointing to the real location of the file somewhere else in my home directory (where I keep it under version control), and my .forward file pipes all incoming local mail to it.

"|/home/user/bin/mailforward"

I enabled that but, surprisingly, I noticed I was receiving my local mail only at the local mailbox. SELinux just wasn’t on my mind, but it was preventing the local mail process in Postfix from executing the script, which resided in my $HOME/bin directory as mentioned before. I thought it was a problem with Postfix and wasted more than one hour trying to find out what could be wrong.

There are several possible solutions to the real problem. One is simply disabling SELinux. You can do that by editing /etc/selinux/config and changing SELINUX to the value “disabled”. A reboot is needed, as far as I know, and I don’t remember if running dracut to regenerate the initial RAM disk is needed too.

A second solution is to disable SELinux only for that Postfix process. I read you can do that but I didn’t investigate how to do it.

The third and more elegant solution is simply allowing that specific action that’s currently being denied. I know nothing about SELinux but it’s surprisingly easy to do and well documented. It turns out when SELinux denies permission to do something, it writes a line in /var/log/audit/audit.log describing that event in detail, and that log line is enough for a tool called “audit2allow” to generate a SELinux policy file that can be added to the system allowing the operation, hence the tool name.

Suppose you extract the specific log lines to a file named denied.log and you settle on the name “postfixlocal” for the set of rules you want to create. You’d simply run the following.

audit2allow -M postfixlocal <denied.log
semodule -i postfixlocal.pp

The first command creates the SELinux policy, in two forms. A textual form in postfixlocal.te and a “compiled” form in postfixlocal.pp. The second line copies postfixlocal.pp to a special directory and adds the rules in it to the current set of SELinux policies, making it a permanent change.

In my case, the contents of postfixlocal.te were something like this:

module postfixlocal 1.0;

require {
        type user_home_t;
        type home_bin_t;
        type postfix_local_t;
        class lnk_file read;
        class file { execute execute_no_trans };
}

#============= postfix_local_t ==============
allow postfix_local_t home_bin_t:lnk_file read;
allow postfix_local_t user_home_t:file { execute execute_no_trans };

After that, mail forwarding was working. If SELinux gives me more headaches, I’ll consider disabling it. I don’t think it’s really important for a workstation. But so far that’s the only problem I had with it, and it’s nice to have that extra protection for free. You can simply forget about it. For future weird problems, I’ll remember to check /var/log/audit/audit.log just in case.