
The Firecracker Jailer

Disclaimer

The jailer is a program designed to isolate the Firecracker process in order to enhance Firecracker’s security posture. It is meant to address the security needs of Firecracker only and is not intended to work with other binaries. Additionally, each jailer binary should be used with a statically linked Firecracker binary (with the default musl toolchain) of the same version. Experimental gnu builds are not supported.

Jailer Usage

The jailer is invoked in this manner:

jailer --id <id> \
       --exec-file <exec_file> \
       --uid <uid> \
       --gid <gid>
       [--parent-cgroup <relative_path>]
       [--cgroup-version <cgroup-version>]
       [--cgroup <cgroup>]
       [--chroot-base-dir <chroot_base>]
       [--netns <netns>]
       [--resource-limit <resource=value>]
       [--daemonize]
       [--new-pid-ns]
       [--...extra arguments for Firecracker]

Here is an example of how to set multiple resource limits by repeating the --resource-limit argument:

  --resource-limit fsize=250000000 --resource-limit no-file=1024
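To make the `<resource>=<value>` format concrete, here is a minimal Python sketch of how such strings could be parsed and applied. It is not the jailer's own code: the function names are hypothetical, and the mapping of `fsize` and `no-file` to `RLIMIT_FSIZE` and `RLIMIT_NOFILE` is our assumption based on the names in the example above.

```python
import resource

# Assumed mapping from the names used in the example above to Linux rlimits.
# Only the two names that appear in this document are covered.
RESOURCE_NAMES = {
    "fsize": resource.RLIMIT_FSIZE,     # size limit for files the process creates
    "no-file": resource.RLIMIT_NOFILE,  # limit on open file descriptors
}


def parse_resource_limit(arg: str) -> tuple[int, int]:
    """Split one `<resource>=<value>` string into (rlimit constant, value)."""
    name, sep, value = arg.partition("=")
    if not sep or name not in RESOURCE_NAMES:
        raise ValueError(f"unsupported resource limit: {arg!r}")
    return RESOURCE_NAMES[name], int(value)


def apply_resource_limits(args: list[str]) -> None:
    """Set both the soft and the hard limit for every parsed argument."""
    for arg in args:
        res, value = parse_resource_limit(arg)
        resource.setrlimit(res, (value, value))
```

Calling `apply_resource_limits(["fsize=250000000", "no-file=1024"])` would mirror the command line fragment above for the calling process.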

Jailer Operation

After starting, the Jailer goes through the following operations:

Example Run and Notes

Let’s assume Firecracker is available as /usr/bin/firecracker, and the jailer can be found at /usr/bin/jailer. We pick the unique id 551e7604-e35c-42b3-b825-416853441234, and we choose to run on NUMA node 0 (to isolate the process in the 0th NUMA node we need to set cpuset.mems=0 and cpuset.cpus equal to the CPUs of that NUMA node), using uid 123, and gid 100. For this example, we are content with the default /srv/jailer chroot base dir.

We start by running:

/usr/bin/jailer --id 551e7604-e35c-42b3-b825-416853441234 \
--cgroup cpuset.mems=0 --cgroup cpuset.cpus=$(cat /sys/devices/system/node/node0/cpulist) \
--exec-file /usr/bin/firecracker --uid 123 --gid 100 \
--netns /var/run/netns/my_netns --daemonize
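When the jailer is launched from a program rather than a shell, the `$(cat ...)` substitution has to be done by hand. The following Python sketch builds the same argument vector; the helper names are ours and purely illustrative.

```python
from pathlib import Path


def node_cpulist(node: int) -> str:
    """Return the CPU list of a NUMA node as exposed by sysfs (for example "0-7")."""
    return Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()


def jailer_argv(vm_id: str, cpus: str, uid: int, gid: int, netns: str,
                jailer: str = "/usr/bin/jailer",
                exec_file: str = "/usr/bin/firecracker",
                mems: str = "0") -> list[str]:
    """Build the argument vector of the example invocation, one list item per word."""
    return [
        jailer,
        "--id", vm_id,
        "--cgroup", f"cpuset.mems={mems}",
        "--cgroup", f"cpuset.cpus={cpus}",
        "--exec-file", exec_file,
        "--uid", str(uid),
        "--gid", str(gid),
        "--netns", netns,
        "--daemonize",
    ]
```

Passing `jailer_argv("551e7604-e35c-42b3-b825-416853441234", node_cpulist(0), 123, 100, "/var/run/netns/my_netns")` to `subprocess.run(..., check=True)` as root should then be equivalent to the shell command above.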

After opening the file descriptors mentioned in the previous section, the jailer will create the following resources (and all their prerequisites, such as the path which contains them):

We are going to refer to /srv/jailer/firecracker/551e7604-e35c-42b3-b825-416853441234/root as <chroot_dir>.
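The layout of that path can be read off the example: chroot base dir, then the file name of the exec file, then the id, then `root`. A small Python sketch, with a hypothetical helper name, makes the rule explicit:

```python
from pathlib import PurePosixPath


def chroot_dir(vm_id: str, exec_file: str, chroot_base: str = "/srv/jailer") -> PurePosixPath:
    """Compose <chroot_base>/<exec_file_name>/<id>/root, as in the example."""
    exec_name = PurePosixPath(exec_file).name  # "/usr/bin/firecracker" -> "firecracker"
    return PurePosixPath(chroot_base) / exec_name / vm_id / "root"
```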

Let’s also assume the cpuset cgroups are mounted at /sys/fs/cgroup/cpuset. The jailer will create the following subfolder (which will inherit settings from the parent cgroup):

It’s worth noting that, whenever a folder already exists, nothing will be done, and we move on to the next directory that needs to be created. This should only happen for the common firecracker subfolder (but, as for creating the chroot path before, we do not issue an error if folders directly associated with the supposedly unique id already exist).

The jailer then writes the current pid to /sys/fs/cgroup/cpuset/firecracker/551e7604-e35c-42b3-b825-416853441234/tasks. It also writes 0 to /sys/fs/cgroup/cpuset/firecracker/551e7604-e35c-42b3-b825-416853441234/cpuset.mems, and the corresponding CPUs to /sys/fs/cgroup/cpuset/firecracker/551e7604-e35c-42b3-b825-416853441234/cpuset.cpus.
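The cgroup setup boils down to creating a directory and writing three small files. The Python sketch below illustrates this under our own assumptions: the function name and the `cgroup_root` parameter are hypothetical (the latter exists so the code can be tried against a scratch directory), and we write the cpuset values before the pid because a cgroup v1 cpuset refuses new tasks while its cpus or mems are empty.

```python
from pathlib import Path


def setup_cpuset_cgroup(vm_id: str, pid: int, mems: str, cpus: str,
                        exec_name: str = "firecracker",
                        cgroup_root: str = "/sys/fs/cgroup/cpuset") -> Path:
    """Create <cgroup_root>/<exec_name>/<id> and place `pid` in it."""
    cgroup = Path(cgroup_root) / exec_name / vm_id
    # Folders that already exist are silently reused, as described above.
    cgroup.mkdir(parents=True, exist_ok=True)
    # Configure the cpuset first, then attach the process.
    (cgroup / "cpuset.mems").write_text(mems)
    (cgroup / "cpuset.cpus").write_text(cpus)
    (cgroup / "tasks").write_text(str(pid))
    return cgroup
```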

Since the --netns parameter is specified in our example, the jailer opens /var/run/netns/my_netns to get a file descriptor fd, uses setns(fd, CLONE_NEWNET) to join the associated network namespace, and then closes fd.
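The open, setns, close sequence can be reproduced from Python through ctypes. This is only a sketch of the same three calls, not the jailer's implementation; `join_netns` is a hypothetical name and the constant value is taken from `<sched.h>`. Joining a namespace requires appropriate privileges.

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # value from <sched.h>


def join_netns(path: str) -> None:
    """Join the network namespace referenced by `path`, then close the descriptor."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(path, os.O_RDONLY)
    try:
        if libc.setns(fd, CLONE_NEWNET) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        # The descriptor is only needed for the setns() call itself.
        os.close(fd)
```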

The --daemonize flag is also present, so the jailer opens /dev/null as RW and keeps the associated file descriptor as dev_null_fd (we do this before going inside the jail), to be used later.

Build the chroot jail. First, the jailer uses unshare() to enter a new mount namespace, and changes the propagation of all mount points in the new namespace to private using mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL), as a prerequisite to pivot_root(). Another required operation is to bind mount <chroot_dir> on top of itself using mount(<chroot_dir>, <chroot_dir>, NULL, MS_BIND, NULL). At this point, the jailer creates the folder <chroot_dir>/old_root, changes the current directory to <chroot_dir>, and calls syscall(SYS_pivot_root, ".", "old_root"). The final steps of building the jail are unmounting old_root using umount2("old_root", MNT_DETACH), deleting old_root with rmdir, and finally calling chroot(".") for good measure. From now on, the process is jailed in <chroot_dir>.

Create the special file /dev/net/tun, using mknod("/dev/net/tun", S_IFCHR | S_IRUSR | S_IWUSR, makedev(10, 200)), and then call chown("/dev/net/tun", 123, 100), so Firecracker can use it after dropping privileges. This is required to use multiple TAP interfaces when running jailed. Do the same for /dev/kvm.
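For illustration, the mknod and chown pair maps directly onto Python's os module. The helper names below are hypothetical, and creating the parent directory first is our assumption (mknod does not create missing directories). Creating device nodes requires root.

```python
import os
import stat


def char_device_args(major: int, minor: int) -> tuple[int, int]:
    """Return the (mode, device) pair for a character device readable and writable by its owner."""
    mode = stat.S_IFCHR | stat.S_IRUSR | stat.S_IWUSR
    return mode, os.makedev(major, minor)


def create_char_device(path: str, major: int, minor: int, uid: int, gid: int) -> None:
    """mknod() the device inside the jail and hand it over to uid:gid."""
    mode, dev = char_device_args(major, minor)
    # Assumption: the parent folder (e.g. /dev/net) may not exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    os.mknod(path, mode, dev)
    os.chown(path, uid, gid)
```

In the example run this would be `create_char_device("/dev/net/tun", 10, 200, 123, 100)`. For /dev/kvm the device numbers are not given in the text; on Linux /dev/kvm is normally character device 10:232, which is easy to confirm on the host with `ls -l /dev/kvm`.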

Change ownership of <chroot_dir> to uid:gid so that Firecracker can create its API socket there.

Since the --daemonize flag is present, call setsid() to join a new session, a new process group, and to detach from the controlling terminal. Then, redirect standard file descriptors to /dev/null by calling dup2(dev_null_fd, STDIN), dup2(dev_null_fd, STDOUT), and dup2(dev_null_fd, STDERR). Close dev_null_fd, because it is no longer necessary.
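The daemonization step is small enough to sketch in full. The function name is ours; the calls mirror the ones listed above, with the standard descriptors written as 0, 1 and 2.

```python
import os


def detach_and_silence(dev_null_fd: int) -> None:
    """Start a new session and point stdin, stdout and stderr at /dev/null."""
    # New session and process group, no controlling terminal.
    # setsid() fails if the caller is already a process group leader.
    os.setsid()
    for std_fd in (0, 1, 2):
        os.dup2(dev_null_fd, std_fd)
    # The original descriptor is no longer needed once it has been duplicated.
    if dev_null_fd > 2:
        os.close(dev_null_fd)
```

Note that `dev_null_fd` has to be opened before entering the jail, as mentioned earlier, because /dev/null is not reachable through the same path afterwards.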

Finally, the jailer switches the uid to 123 and the gid to 100, and execs:

./firecracker \
  --id="551e7604-e35c-42b3-b825-416853441234" \
  --start-time-us=<opaque> \
  --start-time-cpu-us=<opaque>
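A Python sketch of this last step follows. The function names are hypothetical, and the two start time values are treated as opaque inputs, as in the command above. We change the gid before the uid because an unprivileged process is generally no longer allowed to change its gid; this ordering is our choice for the sketch.

```python
import os


def firecracker_argv(vm_id: str, start_time_us: int, start_time_cpu_us: int) -> list[str]:
    """Build the argument vector shown above (no shell quoting is needed in an argv list)."""
    return [
        "./firecracker",
        f"--id={vm_id}",
        f"--start-time-us={start_time_us}",
        f"--start-time-cpu-us={start_time_cpu_us}",
    ]


def drop_privileges_and_exec(uid: int, gid: int, argv: list[str]) -> None:
    """Switch to gid and uid, then replace the current process image."""
    os.setgid(gid)  # must happen while the process is still privileged
    os.setuid(uid)
    os.execv(argv[0], argv)  # does not return on success
```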

Firecracker now creates its API socket at /srv/jailer/firecracker/551e7604-e35c-42b3-b825-416853441234/root/<api-sock>, which is used to interact with the VM.

Note: the default value for <api-sock> is /run/firecracker.socket.
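Because <api-sock> is an absolute path inside the jail, a client on the host has to prefix it with the chroot dir. The helper below is a hypothetical sketch of that computation. Note the `lstrip`: joining an absolute path onto another path would otherwise discard the prefix.

```python
from pathlib import PurePosixPath


def api_socket_host_path(chroot_dir: str, api_sock: str = "/run/firecracker.socket") -> PurePosixPath:
    """Translate the in-jail socket path into the path seen from the host."""
    return PurePosixPath(chroot_dir) / api_sock.lstrip("/")
```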

Observations

Caveats