Epistemic Status: 🌱 Sprouting


While doing some reading on the NixOS Discourse forums to try to learn more about how the NixOS module system works, I came across a post comparing it to expert systems.

NixOS modules is one of the largest world expert systems, with knowledge acquired from dozens of sysadmins and devops.

I don’t know much about expert systems, and based on the descriptions I’ve read, I think NixOS modules are missing one of their key features: an inference engine that can derive new “facts” (though maybe that’s just what modules are). Still, I’ve been thinking about what that insight implies, and about how I use the module system myself, now that I finally understand the basic concept after years of using NixOS (a case of the knowledge hiding in plain sight).

Nix’s flexibility might be its biggest weakness

One of the things I’ve noticed about Nix and how it’s used is that it’s flexible enough that everyone sorta does their own thing with it. Outside of the modules provided by Nixpkgs, that makes it hard to standardize on a common set of modules.

They also come in varying levels of maturity. To quote one of the replies to that thread on Discourse:

My intuition is that there are different levels of integrations for the services.* subtree. In my mental model, the scale goes roughly like this:

  1. Trivial - Only an “enable” option, possibly a “configFile” option if applicable, otherwise rely on whatever is in /etc/xxx/config
  2. Minimal - Some basic options, at least one of extraConfig or configFile[1]
  3. Mixed - Several config options covering “most” use cases, the remaining in extraConfig
  4. Complete - The service can be fully configured in Nix options, without relying on an extraConfig argument (think bluetooth)
  5. Advanced - The service is fully configurable in nix, and options are type-checked, and tested with config testers. Currently bluetooth does not prevent the insertion of mis-typed keys. Think knot-config-checked where the check is done with the right version of the package as configured in NixOS.

I think this scale is useful, but it misses an important role that well-written modules can play: distilling some of that “knowledge acquired from dozens of sysadmins and devops” into reusable pieces. That could make it much easier for the average contributor to write modules higher up the scale. This might be best explained with an example.

Securing systemd services

One of the things I’ve been spending way too much of my spare time on is writing a module for running a modded Minecraft server. There are several things that make that less than straightforward, but that’s a story for another day.

For any service, I always want to limit its capabilities as much as possible to minimize the damage if it’s ever compromised. systemd has a lot of settings for this, and it’s not always clear what can and can’t be restricted. It does provide a tool to assess a service’s security from the perspective of what it’s allowed to do: systemd-analyze security [service]. But that mostly lists the individual settings along with an overall exposure score; it doesn’t really help with understanding how they interact with each other, or which one might be blocking a service from doing a particular thing.
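For reference, here is a minimal sketch of the kind of hand-written hardening this involves today. It assumes a hypothetical service called example, with pkgs.hello standing in for a real program; the directives themselves are real systemd.exec(5) settings passed through NixOS’s serviceConfig. Which of them a given program can tolerate is exactly the part that takes trial and error.

```nix
# Hypothetical service "example"; pkgs.hello stands in for a real program.
{ pkgs, ... }:
{
  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.hello}/bin/hello";
      DynamicUser = true;            # run as a throwaway unprivileged user
      NoNewPrivileges = true;        # no escalation via setuid binaries
      ProtectSystem = "strict";      # filesystem read-only (except /dev, /proc, /sys)
      ProtectHome = true;            # /home, /root and /run/user inaccessible
      PrivateTmp = true;             # private /tmp and /var/tmp
      PrivateDevices = true;         # no access to physical devices
      ProtectKernelTunables = true;
      ProtectKernelModules = true;
      ProtectControlGroups = true;
      CapabilityBoundingSet = "";    # no capabilities at all
      RestrictAddressFamilies = [ "AF_UNIX" ];  # no network sockets
      SystemCallFilter = [ "@system-service" ];
    };
  };
}
```

After a rebuild, systemd-analyze security example.service reports how much exposure is left; any setting it still flags is a candidate for the next restriction to try, and any restriction the program can’t live with shows up as a failing service.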

I certainly don’t know all the little details, and starting from a highly restricted configuration, it’s hard to figure out which settings need to change to let the service do its regular work. I’ve spent a lot of time trying to work out why a service can’t do something I expect it to be able to do given the settings I’ve given it, so it’s not an easy task.

The options NixOS currently provides for systemd services are fairly open. That makes sense for the general use case of “I need to configure a systemd service for this thing in the context of this module I’m writing”, where starting from a blank slate is reasonable. But it’s a very low-level mechanism. At the other end of the spectrum are options like services.<name>.enable, where you turn on a service and it sets some reasonable defaults (or at least it should, if the module is well written).

What I think is missing is a middle layer: a set of modules for creating services that handle a lot of this in a consistent way, maybe based on predefined capabilities. This would turn that sysadmin knowledge into a form that others can easily reuse, and that is also easy to document[2].

A sketch of an idea

Let’s imagine this middle layer already exists. What might it look like to use it?[3] Say we’re setting up a new service that needs a few things. Maybe it’s a web server, so it needs to bind privileged ports (below 1024), have access to the internet, and so on. Its definition could look something like this:

# This is just an example. With more thought a better approach 
# might become apparent.
newServices.webserver = {
  separateUser = true;
  networking = {
    reservePorts = [ 80 443 ];
  };
  filesystem = {
    readWritePaths = [ "/var/lib/www" ];
  };
};

Then, the newServices module would take that and, under the hood, create a systemd.services entry that starts from the most restrictive settings and opens up only the specific ones this service needs, with each option codifying the changes required to open up that particular capability.
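A minimal sketch of what such a module could look like, assuming the option names from the webserver example plus a hypothetical exec option for the command to run (the example leaves that out). Everything under newServices is hypothetical; only systemd.services, serviceConfig and the systemd directives are real. Every entry starts from the same restrictive baseline, and each option loosens only the settings it needs.

```nix
# Hypothetical sketch of a "newServices" middle layer. Only systemd.services,
# serviceConfig and the systemd directives below are real NixOS/systemd.
{ config, lib, ... }:

let
  inherit (lib) mkOption types;

  # Options accepted by each newServices.<name> entry.
  serviceOpts = {
    options = {
      separateUser = mkOption {
        type = types.bool;
        default = true;
        description = "Run under a dedicated dynamic user.";
      };
      # Hypothetical: the example definition leaves out the command to run.
      exec = mkOption {
        type = types.str;
        description = "Command line used for ExecStart.";
      };
      networking.reservePorts = mkOption {
        type = types.listOf types.port;
        default = [ ];
        description = "Ports the service needs to bind.";
      };
      filesystem.readWritePaths = mkOption {
        type = types.listOf types.str;
        default = [ ];
        description = "Paths the service may write to.";
      };
    };
  };

  # Restrictive baseline; each option opens up only what it needs.
  toSystemdService = _name: cfg:
    let
      ports = cfg.networking.reservePorts;
      usesNetwork = ports != [ ];
      needsPrivilegedPort = lib.any (p: p < 1024) ports;
    in
    {
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        ExecStart = cfg.exec;
        DynamicUser = cfg.separateUser;
        NoNewPrivileges = true;
        ProtectSystem = "strict";
        ProtectHome = true;
        PrivateTmp = true;
        PrivateDevices = true;
        ReadWritePaths = cfg.filesystem.readWritePaths;
        RestrictAddressFamilies =
          [ "AF_UNIX" ] ++ lib.optionals usesNetwork [ "AF_INET" "AF_INET6" ];
        # Binding ports below 1024 is the only capability handed out here.
        CapabilityBoundingSet =
          if needsPrivilegedPort then "CAP_NET_BIND_SERVICE" else "";
        AmbientCapabilities =
          lib.optionals needsPrivilegedPort [ "CAP_NET_BIND_SERVICE" ];
        SystemCallFilter = [ "@system-service" ];
      };
    };
in
{
  options.newServices = mkOption {
    type = types.attrsOf (types.submodule serviceOpts);
    default = { };
    description = "Hypothetical capability-style service definitions.";
  };

  config.systemd.services = lib.mapAttrs toSystemdService config.newServices;
}
```

With this imported, the webserver definition only needs an extra exec line to produce a systemd.services.webserver unit that holds CAP_NET_BIND_SERVICE (since 80 and 443 are below 1024), while an entry with no ports ends up with an empty capability set. Building the whole serviceConfig in one function sidesteps merge conflicts between separately defined fragments; a real version would need many more options, and tests to back them.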

In the webserver example, setting networking.reservePorts (maybe it should be called bindPorts?) would make sure the CapabilityBoundingSet includes what’s needed for that network access. Since those are privileged ports, the module would know the service needs enough access to bind them (CAP_NET_BIND_SERVICE) and then drop that privilege (I forget how that’s done, which is one more reason to have something like this), and so on, even down to the system call level. Services should be as limited and sandboxed as possible: enough to get their work done and nothing more. This is really just an attempt to build a capability system with the tools at hand. It certainly won’t be as robust as a true capability system, but it’s a start.
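As a rough sketch, these are the settings such an option might add on top of a locked-down baseline for the webserver service. This shows one common approach, assuming the program doesn’t itself need to start as root: rather than binding as root and then dropping privileges, systemd starts the process as an unprivileged user that holds only CAP_NET_BIND_SERVICE. The service name comes from the example; the directives are real systemd settings.

```nix
# Hypothetical "webserver" unit: only the settings that binding
# ports 80/443 loosens relative to a locked-down baseline.
{ ... }:
{
  systemd.services.webserver.serviceConfig = {
    DynamicUser = true;  # never runs as root
    # The bounding set limits which capabilities the process can ever hold...
    CapabilityBoundingSet = "CAP_NET_BIND_SERVICE";
    # ...and the ambient set passes this one to the non-root process at exec.
    AmbientCapabilities = "CAP_NET_BIND_SERVICE";
    # Still forbid gaining anything more through setuid binaries.
    NoNewPrivileges = true;
    # Allow IP sockets (plus local Unix sockets), nothing else.
    RestrictAddressFamilies = [ "AF_UNIX" "AF_INET" "AF_INET6" ];
  };
}
```

If the program supports socket activation, a systemd .socket unit can bind the port and hand over the file descriptor instead, so the service never holds the capability at all.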

Expanding beyond the services example

What I’m suggesting here is using the NixOS module system to codify best practices in ways that are easier to build on for those of us who don’t know all the details of what it takes to get there. This could start as a library of building blocks that target specific areas.

It also provides a way to document these practices and allow others to learn from that experience being shared. I’ve learned a lot about Linux system configuration and management through NixOS and this sort of codified expert knowledge would help others as well.

This ties into a larger train of thought I’ve had around how I’ve started to view code as memory. The knowledge base that is NixOS modules can be viewed as a shared form of that.

Footnotes

  1. These days, the suggested approach is to try to provide a settings option if the configuration file uses a format that Nix can easily generate.

  2. There is the question of how we verify that these solutions are actually good rather than just promoting bad ideas, but that seems like something community involvement and code review would address.

  3. I find it useful to think about API design this way: how do I want the user of the API to work with it? Then figure out how to make that happen.