It's a bad idea to put a coding agent in a box. At least not if you're developing embedded software that runs on an external target. Let me tell you why!
It all started because I wanted to try letting GitHub Copilot iterate on a problem totally autonomously: let it run whatever shell commands it wanted, then come back half an hour later and see what it came up with. However, seeing as the best analogy for coding agents right now is a 12-year-old who is just a bit too smart for his own good, rather than a superintelligence bent on world domination, it is a bad idea to give the agent unrestricted access to the shell on your personal machine. It might remove some important files or exfiltrate your data if it looks at the wrong webpage, just as a 12-year-old might give your credit card information to a website telling him he just won a FREE IPHONE!! Come to think of it, the superintelligence bent on world domination would also be a bad candidate for permission to run arbitrary commands on your personal machine, for other reasons. But obviously, I digress.
The method for sandboxing that I reach for first these days is Linux
namespaces. So I set up a container using bwrap with all the usual mounts
and poked around with bash to see if the environment was suitable for
development.
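For reference, the sandbox looked roughly like this. This is a sketch, not my literal invocation; the mounts and IDs are illustrative:

```shell
# Argument list for a typical bwrap sandbox (illustrative; adjust paths
# to taste). Building it as an array keeps it easy to extend.
BWRAP_ARGS=(
  --ro-bind /usr /usr        # toolchain and libraries, read-only
  --symlink usr/bin /bin
  --symlink usr/lib /lib
  --proc /proc
  --dev /dev                 # a minimal /dev, not the host's device nodes
  --tmpfs "$HOME"            # throwaway home directory
  --bind "$PWD" "$PWD"       # the project tree, writable
  --unshare-all --share-net  # fresh namespaces, but keep networking
  --uid "$(id -u)" --gid "$(id -g)"
)
# Launch a shell inside the sandbox:
#   bwrap "${BWRAP_ARGS[@]}" bash
```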
The first problem I saw was that even though I had mapped the user and group
ids in the container to my own uid and gid, none of my supplementary groups
were passed through to the container. And since testing my software required
serial ports that were only accessible to the dialout group, this was a
problem. It turned out to be intractable due to the semantics of the user
namespaces set up via the clone syscall: an unprivileged process has to
permanently disable setgroups() before it can write the namespace's gid map,
so supplementary groups can never be carried in.
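You can poke at the kernel mechanism behind this from the shell; the file involved is documented in user_namespaces(7):

```shell
# Before an unprivileged process may write gid_map for a new user
# namespace, it must write "deny" to /proc/<pid>/setgroups. That
# permanently disables setgroups() in the namespace, so supplementary
# groups like dialout can never be granted inside it.
cat /proc/self/setgroups   # typically "allow" in the root namespace
```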
So that already meant that I would have to expose the ports to the container
via a program running outside the container. In particular, I wound up using
a chain of socats for my experiments: from a serial port in the root
namespace, to a Unix socket shared between the namespaces, to a PTY in the
container namespace. This had to be managed by a server, as the ports would
appear and disappear on every flash. All that to pass through the power of
the dialout group to the container.
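The chain amounted to something like the following pair of helpers. Again a sketch; the device path, socket path, and baud rate are illustrative:

```shell
# Run OUTSIDE the sandbox by the server: bridge a real serial port to a
# Unix socket that can be bind-mounted into the container.
start_bridge() {
  local port="$1" sock="$2"   # e.g. /dev/ttyACM0  /run/agent/tty0.sock
  socat "$port,raw,b115200" "UNIX-LISTEN:$sock,fork" &
}

# Run INSIDE the sandbox: turn the socket back into a PTY that flashing
# and monitoring tools can open like a normal serial port.
attach_pty() {
  local sock="$1" link="$2"   # e.g. /run/agent/tty0.sock  /tmp/ttyV0
  socat "UNIX-CONNECT:$sock" "PTY,link=$link,raw" &
}
```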
But then I looked at actually flashing the board. And I realized that the
current flash method (copy a UF2 to a temporarily mounted filesystem)
wouldn't work in the container without giving the container access to every
filesystem in /media/$USER, which would be a bit of a security risk. So
would I have the server also do flashing?
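For context, the flash step itself is just a file copy; the mount point name below is a hypothetical example, since it depends on the board's bootloader:

```shell
# Flash by copying a UF2 onto the mass-storage device the board exposes
# in bootloader mode. /media/$USER/BOOT is a hypothetical mount name.
flash_uf2() {
  local image="$1"
  cp "$image" "/media/$USER/BOOT/" && sync   # board reboots into the new firmware
}
```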
On top of that, the procedure for collecting crash logs is very sensitive to
timing, and I realized that doing it after waiting for the socat chain to
come up could be problematic.
Overall, it became clear that the container sandbox for agents is inelegant and overly particular for embedded software projects. Would you want to go through this process of identifying every required resource for every embedded project you work on?
The main problem is that Linux namespaces just don't encapsulate devices very well. There is no way to say "have everything connected to this USB hub be in namespace X rather than the root namespace." I suspect that containers work a lot better for software that runs exclusively on the host PC.
One final snag was that the project required the Zephyr SDK, which, when installed in the container, landed in the home directory, a tmpfs. So unless I wanted to redownload that massive tree over and over, I needed to think about persisting some parts of the temporary container hierarchy. Ick ick ick.
You know what gets around all of these issues? The Unix user system. Just make a user for the agent and all of these problems become much easier to solve: device access, persisting the SDK, giving the agent supplementary groups.
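Setting that up is a couple of one-time commands as root. The account name and group list here are examples, not a prescription:

```shell
# One-time setup: a dedicated account for the agent, with exactly the
# supplementary groups it needs. "agent" and dialout are examples.
setup_agent_user() {
  useradd --create-home agent
  usermod -aG dialout agent   # direct serial port access, no socat chain
}
# The SDK then persists naturally in /home/agent, and you run the agent
# under that identity, e.g.:  sudo -u agent -i <your-agent-command>
```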
So for embedded projects, this is what I recommend.
P.S. I see something really interesting here: you could have an agent always
running as the other user, polling /var/requests/ or whatever, and then
putting deliverables (reports, patches, artifacts) in /var/deliverables. It
could copy the files it needs to work on from your home directory into its
own. You could even have a header in each request description, e.g. in
/var/requests/req1.md, stating the API budget. Then you get the results in
/var/deliverables/req1, or a report on why it couldn't do the task within
budget. I guarantee someone will make something like this in the next 6
months.
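A minimal dispatcher for that scheme might look like this. The directory layout and the `budget:` header format are my assumptions, and the actual agent invocation is left as a comment:

```shell
# Hypothetical dispatcher run as the agent user: scan the requests
# directory once; a loop or timer would call this repeatedly.
REQUESTS_DIR=${REQUESTS_DIR:-/var/requests}
DELIVERABLES_DIR=${DELIVERABLES_DIR:-/var/deliverables}

process_requests() {
  for req in "$REQUESTS_DIR"/*.md; do
    [ -e "$req" ] || continue
    local name out budget
    name=$(basename "$req" .md)
    out="$DELIVERABLES_DIR/$name"
    mkdir -p "$out"
    # Pull an API-budget header like "budget: 2.00" out of the request.
    budget=$(sed -n 's/^budget: *//p' "$req")
    # ... run the agent against "$req" with "$budget", writing reports,
    # patches, and artifacts into "$out" ...
    mv "$req" "$out/request.md"   # claim the request
  done
}
```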
Please send comments to blogger-jack@pearson.onl.