I/O Event Loops
Using mio in Rust

Topics On
The Rust Programming Language
Edition 2018

Daniel Joseph Pezely

24 July 2019

Two years into using Rust as a primary programming language, the time arrived to get back to the lower-level work of non-blocking network input/output calls.

It had been a few years since using epoll() on Linux, or kqueue()/kevent() on FreeBSD and (then) Mac OS X, from C or Common Lisp (specifically SBCL), plus earlier uses of select() on Solaris and SunOS.

Here are a few nuggets of experience worth passing along to others from getting my sea-legs again on event loops using network socket TCP/IP streams for a code challenge.

The challenge was to spend less than a day implementing a minimal chat server with the hard constraint of using only one particular networking library beyond the included standard Rust library.

That one library is mio. Its name means “Metal I/O”, but versions before Rust 1.0-stable identified it as “Mini IO”.

Since mio is a thin wrapper around the OS event loop, it’s worth noting underlying behaviours in the details that follow.

Handle False Events

It’s important to know that there may be false events.

Calling mio::Poll::register() will trigger such a false event. Therefore, the next event loop iteration may indicate that a stream is ready for read when there is nothing to read yet, and the read then fails with io::ErrorKind::WouldBlock. Treat that result as “try again later” rather than as a failure.
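A minimal sketch of tolerating a false event, using only the standard library. The names drain_readable, ReadOutcome and FakeSocket are hypothetical and exist only for this illustration; FakeSocket stands in for a non-blocking stream so the sketch runs without mio.

```rust
use std::io::{self, ErrorKind, Read};

/// Outcome of servicing one "readable" event.
#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    /// Nothing more to read for now; keep the stream registered and wait.
    Pending,
    /// The peer closed the connection (read returned Ok(0)).
    Closed,
}

/// Reads until the source reports WouldBlock or end-of-stream,
/// appending whatever arrived to `out`.
pub fn drain_readable<R: Read>(source: &mut R, out: &mut Vec<u8>) -> io::Result<ReadOutcome> {
    let mut buf = [0u8; 1024];
    loop {
        match source.read(&mut buf) {
            Ok(0) => return Ok(ReadOutcome::Closed),
            Ok(n) => out.extend_from_slice(&buf[..n]),
            // A false event ends up here on the very first read: not an error.
            Err(ref e) if e.kind() == ErrorKind::WouldBlock => return Ok(ReadOutcome::Pending),
            // Interrupted by a signal: simply retry.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Hypothetical stand-in for a non-blocking socket: yields its queued
/// chunks one per read, then WouldBlock. Chunks are assumed to fit in
/// the caller's buffer.
pub struct FakeSocket {
    pub chunks: Vec<Vec<u8>>,
}

impl Read for FakeSocket {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.chunks.is_empty() {
            return Err(io::Error::new(ErrorKind::WouldBlock, "no data yet"));
        }
        let chunk = self.chunks.remove(0);
        let n = chunk.len().min(buf.len());
        buf[..n].copy_from_slice(&chunk[..n]);
        Ok(n)
    }
}
```

In a real event loop, the source would presumably be the registered TCP stream, and a Pending result just means returning to the poll call.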

Sockets Being Writable Before Readable

A new TCP stream, such as one just accepted from a listening socket, typically becomes writable before there’s anything to read.

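This makes sense when considering the 3-way handshake of a TCP/IP connection. Once the handshake completes, the socket’s send buffer is empty and can accept data immediately, while the peer’s first data packets might still be traversing the physical medium of network cables at that particular nanosecond.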
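The ordering can be observed with the standard library alone, as a rough sketch. It assumes a working loopback interface, and probe_new_connection is a hypothetical helper name. The client connects but sends nothing, so the accepted stream has nothing to read yet still takes a small write.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::net::{TcpListener, TcpStream};

/// Accepts one loopback connection and reports, for the freshly accepted
/// stream: (did a read report WouldBlock?, bytes accepted by a write).
pub fn probe_new_connection() -> io::Result<(bool, usize)> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: the OS picks one
    let addr = listener.local_addr()?;

    // The client connects but deliberately sends nothing. It stays in
    // scope until the function returns, so the connection remains open.
    let _client = TcpStream::connect(addr)?;
    let (mut accepted, _peer) = listener.accept()?;
    accepted.set_nonblocking(true)?;

    // Nothing has arrived, so a read cannot make progress yet.
    let mut buf = [0u8; 64];
    let read_would_block = match accepted.read(&mut buf) {
        Err(ref e) if e.kind() == ErrorKind::WouldBlock => true,
        Ok(_) => false,
        Err(e) => return Err(e),
    };

    // The send buffer is empty, so a small write is accepted at once.
    let written = accepted.write(b"welcome\n")?;
    Ok((read_would_block, written))
}
```

A chat server can use this to send a greeting on the first writable event without waiting for the client to speak.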

Avoid Cloning A TCP Stream

Duplication of socket streams (e.g., using .try_clone() in Rust) can trigger anomalous behaviour with epoll on Linux.

This gets called out by a contributor to mio itself in comments on past issues but is not obvious in the primary documentation.{1} {2}

This requires special attention for Rust programmers.

While writing early iterations of code and wrestling with the borrow checker (borrowck) within the compiler, it is not uncommon to simply clone a value, add a “FIXME” comment and move on.

This, however, should be avoided.

Questions arise as to whether the clone operation is a simple Rust memory management trick versus going all of the way into the network library stack and performing a dup(2) system call that duplicates the file descriptor. (There are a sufficient number of dependencies for mio itself that this is left as an exercise for the reader. Also consider those Rust dependencies for different operating systems than your own. Have fun!)
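For the standard library’s own TcpStream, at least, the question can be probed directly. The sketch below is Unix-only and assumes a working loopback interface; clone_descriptors is a hypothetical helper name. It shows only what std::net::TcpStream::try_clone() does, and whether mio’s wrapper delegates to it on a given platform is an assumption to verify against the mio source.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::os::unix::io::{AsRawFd, RawFd};

/// Returns the raw file descriptors of a stream and of its try_clone() twin.
pub fn clone_descriptors() -> io::Result<(RawFd, RawFd)> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: the OS picks one
    let stream = TcpStream::connect(listener.local_addr()?)?;

    // Not a cheap in-memory copy: the OS hands back a second descriptor
    // that refers to the same underlying socket.
    let twin = stream.try_clone()?;

    // Both handles are still open here, so the two numbers must differ.
    Ok((stream.as_raw_fd(), twin.as_raw_fd()))
}
```

Two distinct descriptors for one socket matter on Linux because epoll tracks the underlying open file description rather than the descriptor number, so a lingering duplicate can keep events arriving after the original handle has been closed (see the epoll(7) manual page).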

Array vs Vector For Buffer

One detail of the read() method of the std::io::Read trait deserves attention:

Note that its buf parameter is a slice, &mut [u8], with fixed-length Array semantics rather than Vector semantics, even though you may supply a vector when calling it. (There’s a different method, read_to_end(), that takes a mutable Vec, but it reads until end-of-stream and so suits non-blocking I/O poorly.)

A slice, like an Array, has only .len() and no notion of capacity. When a Vector is passed as the buffer, read() sees just the vector’s first .len() elements and never its spare .capacity(). A vector created by Vec::with_capacity(n) therefore looks like a zero-length buffer within the scope of read().

Therefore, if allocating a Vector at run-time (e.g., when having a run-time configurable buffer size rather than one that must be known at compile-time), it must be pre-populated such that the vector’s length matches its capacity.

Conventionally, fill it with zeros.

If experiencing WouldBlock (or a return of Ok(0)) on all reads, re-examine Rust’s documentation for std::io::Read::read() carefully, where it notes for a return value of zero, “2. The buffer specified was 0 bytes in length.” (Emphasis added.) When supplying a Vector as your buffer, it can be easy to conflate semantics of length with capacity, which leads to this error. Again, pre-filling the entire buffer with zeros solves the issue. The main clue is that read() specifies &mut [u8] rather than Vec<u8>.
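The difference is easy to reproduce without any sockets. In this sketch the two function names are hypothetical, and a byte slice serves as the reader because &[u8] implements std::io::Read.

```rust
use std::io::{self, Read};

/// Reads once into a Vec that has spare capacity but a length of zero.
pub fn read_with_capacity_only<R: Read>(source: &mut R, size: usize) -> io::Result<usize> {
    let mut buf: Vec<u8> = Vec::with_capacity(size);
    // `&mut buf` coerces to a zero-length slice, so read() can store nothing.
    source.read(&mut buf)
}

/// Reads once into a Vec pre-filled with zeros, so its length equals `size`.
pub fn read_with_zero_fill<R: Read>(source: &mut R, size: usize) -> io::Result<usize> {
    let mut buf: Vec<u8> = vec![0u8; size];
    // The slice now spans `size` bytes that read() may overwrite.
    source.read(&mut buf)
}
```

Given a reader holding five bytes and a size of 16, the first function returns Ok(0) while the second returns Ok(5).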

Read vs Write Events

A single event might indicate that a stream is ready for both writing and reading. Inside an application’s event loop, it’s important to test for both with two if conditions (rather than if-else).

The decision about whether to handle reads before writes seems arbitrary, but under Load & Capacity testing in much earlier versions of BSD and Linux kernels, reading first was preferred.

Under extremely high network loads, reading too infrequently leads to errors due to OS buffers getting filled and the OS discarding packets. (Such errors can be tracked via the netstat command on BSD Unix and Debian-based Linux including Ubuntu.{3} {4})
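The shape of that dispatch, with reads handled first, might look like the sketch below. Readiness is a hypothetical stand-in for the readiness flags that mio attaches to each event, used here so the sketch compiles without the library; dispatch is likewise an illustrative name.

```rust
/// Hypothetical stand-in for the readiness flags carried by one event.
#[derive(Clone, Copy)]
pub struct Readiness {
    pub readable: bool,
    pub writable: bool,
}

impl Readiness {
    pub fn is_readable(&self) -> bool {
        self.readable
    }
    pub fn is_writable(&self) -> bool {
        self.writable
    }
}

/// Dispatches one event and returns the handlers that ran, in order.
pub fn dispatch(ready: Readiness) -> Vec<&'static str> {
    let mut ran = Vec::new();
    // Reads go first, so kernel receive buffers are emptied promptly.
    if ready.is_readable() {
        ran.push("read");
    }
    // A separate `if`, not `else if`: both flags may be set on one event.
    if ready.is_writable() {
        ran.push("write");
    }
    ran
}
```

With an else-if, an event that carries both flags would skip the write handler until some later event happened to report writable again.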

Load & Capacity

Load & Capacity test and measurement was beyond the scope of implementation for the coding challenge but should be performed as a matter of course for applications involving such event loops.

The rationale here is that you truly don’t know until it’s been measured.

There are simply too many variables in that mental model, and many change over time with the release of new OS kernels.

Conclusion

There are lots of caveats in Unix systems programming, and working with event loops for TCP/IP network socket streams seems particularly finicky.

I benefited from a short write-up like this while learning the C equivalent many years ago; hopefully someone else will appreciate the insights from this one.

Copyright © 2019 Daniel Joseph Pezely
May be licensed via Creative Commons Attribution.