It's a bad idea to put a coding agent in a box. At least not if you're developing embedded software that runs on an external target. Let me tell you why!
It all started because I wanted to try letting GitHub Copilot iterate on a problem totally autonomously: let it run whatever shell commands it wanted, then come back half an hour later and see what it came up with. However, seeing as the best analogy for coding agents right now is a 12-year-old who is just a bit too smart for his own good, rather than a superintelligence bent on world domination, it is a bad idea to give the agent unrestricted access to the shell on your personal machine. It might remove some important files or exfiltrate your data if it looks at the wrong webpage, just as a 12-year-old might give your credit card information to a website telling him he just won a FREE IPHONE!! Come to think of it, the superintelligence bent on world domination would also be a bad candidate for permission to run arbitrary commands on your personal machine, for other reasons. But obviously, I digress.
The method for sandboxing that I reach for first these days is Linux
namespaces. So I set up a container using bwrap with all the usual mounts
and poked around with bash to see if the environment was suitable for
development.
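For reference, the sandbox looked roughly like this. This is a sketch, not my literal invocation; the mounts and IDs are illustrative:

```shell
# Argument list for a typical bwrap sandbox (illustrative; adjust paths
# to taste). Building it as an array keeps it easy to extend.
BWRAP_ARGS=(
  --ro-bind /usr /usr        # toolchain and libraries, read-only
  --symlink usr/bin /bin
  --symlink usr/lib /lib
  --proc /proc
  --dev /dev                 # a minimal /dev, not the host's device nodes
  --tmpfs "$HOME"            # throwaway home directory
  --bind "$PWD" "$PWD"       # the project tree, writable
  --unshare-all --share-net  # fresh namespaces, but keep networking
  --uid "$(id -u)" --gid "$(id -g)"
)
# Launch a shell inside the sandbox:
#   bwrap "${BWRAP_ARGS[@]}" bash
```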
The first problem I saw was that even though I had mapped the user and group
ids in the container to my own uid and gid, none of my supplementary groups
were passed through to the container. And since testing my software required
serial ports that were only accessible to the dialout group, this was a
problem. It turned out to be intractable due to the semantics of the user
namespaces set up via the clone syscall: an unprivileged process has to
permanently disable setgroups() before it can write the namespace's gid map,
so supplementary groups can never be carried in.
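You can poke at the kernel mechanism behind this from the shell; the file involved is documented in user_namespaces(7):

```shell
# Before an unprivileged process may write gid_map for a new user
# namespace, it must write "deny" to /proc/<pid>/setgroups. That
# permanently disables setgroups() in the namespace, so supplementary
# groups like dialout can never be granted inside it.
cat /proc/self/setgroups   # typically "allow" in the root namespace
```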
So that already meant that I would have to expose the ports to the container
via a program running outside the container. In particular, I wound up using
a chain of socats for my experiments: from a serial port in the root
namespace, to a Unix socket shared between the namespaces, to a PTY in the
container namespace. This had to be managed by a server, as the ports would
appear and disappear on every flash. All that to pass through the power of
the dialout group to the container.
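The chain amounted to something like the following pair of helpers. Again a sketch; the device path, socket path, and baud rate are illustrative:

```shell
# Run OUTSIDE the sandbox by the server: bridge a real serial port to a
# Unix socket that can be bind-mounted into the container.
start_bridge() {
  local port="$1" sock="$2"   # e.g. /dev/ttyACM0  /run/agent/tty0.sock
  socat "$port,raw,b115200" "UNIX-LISTEN:$sock,fork" &
}

# Run INSIDE the sandbox: turn the socket back into a PTY that flashing
# and monitoring tools can open like a normal serial port.
attach_pty() {
  local sock="$1" link="$2"   # e.g. /run/agent/tty0.sock  /tmp/ttyV0
  socat "UNIX-CONNECT:$sock" "PTY,link=$link,raw" &
}
```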
But then I looked at actually flashing the board. And I realized that the
current flash method (copy a UF2 to a temporarily mounted filesystem)
wouldn't work in the container without giving the container access to every
filesystem in /media/$USER, which would be a bit of a security risk. So
would I have the server also do flashing?
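For context, the flash step itself is just a file copy; the mount point name below is a hypothetical example, since it depends on the board's bootloader:

```shell
# Flash by copying a UF2 onto the mass-storage device the board exposes
# in bootloader mode. /media/$USER/BOOT is a hypothetical mount name.
flash_uf2() {
  local image="$1"
  cp "$image" "/media/$USER/BOOT/" && sync   # board reboots into the new firmware
}
```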
On top of that, the procedure for collecting crash logs is very sensitive to
timing, and I realized that doing it after waiting for the socat chain to
come up could be problematic.
Overall, it became clear that the container sandbox for agents is inelegant and overly particular for embedded software projects. Would you want to go through this process of identifying every required resource for every embedded project you work on?
The main problem is that Linux namespaces just don't encapsulate devices very well. There is no way to say "have everything connected to this USB hub be in namespace X rather than the root namespace." I suspect that containers work a lot better for software that runs exclusively on the host PC.
One final snag was that the project required the Zephyr SDK, which, when installed in the container, landed in the home directory, a tmpfs. So unless I wanted to redownload that massive tree over and over, I needed to think about persisting some parts of the temporary container hierarchy. Ick ick ick.
You know what gets around all of these issues? The Unix user system. Just make a user for the agent and all of these problems become much easier to solve: device access, persisting the SDK, giving the agent supplementary groups.
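Setting that up is a couple of one-time commands as root. The account name and group list here are examples, not a prescription:

```shell
# One-time setup: a dedicated account for the agent, with exactly the
# supplementary groups it needs. "agent" and dialout are examples.
setup_agent_user() {
  useradd --create-home agent
  usermod -aG dialout agent   # direct serial port access, no socat chain
}
# The SDK then persists naturally in /home/agent, and you run the agent
# under that identity, e.g.:  sudo -u agent -i <your-agent-command>
```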
So for embedded projects, this is what I recommend.
P.S. I see something really interesting here: you could have an agent always
running as the other user, polling /var/requests/ or whatever, and then
putting deliverables (reports, patches, artifacts) in /var/deliverables. It
could copy the files it needs to work on from your home directory into its
own. You could even have a header in each request description, e.g. in
/var/requests/req1.md, stating the API budget. Then you get the results in
/var/deliverables/req1, or a report on why it couldn't do the task within
budget. I guarantee someone will make something like this in the next 6
months.
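A minimal dispatcher for that scheme might look like this. The directory layout and the `budget:` header format are my assumptions, and the actual agent invocation is left as a comment:

```shell
# Hypothetical dispatcher run as the agent user: scan the requests
# directory once; a loop or timer would call this repeatedly.
REQUESTS_DIR=${REQUESTS_DIR:-/var/requests}
DELIVERABLES_DIR=${DELIVERABLES_DIR:-/var/deliverables}

process_requests() {
  for req in "$REQUESTS_DIR"/*.md; do
    [ -e "$req" ] || continue
    local name out budget
    name=$(basename "$req" .md)
    out="$DELIVERABLES_DIR/$name"
    mkdir -p "$out"
    # Pull an API-budget header like "budget: 2.00" out of the request.
    budget=$(sed -n 's/^budget: *//p' "$req")
    # ... run the agent against "$req" with "$budget", writing reports,
    # patches, and artifacts into "$out" ...
    mv "$req" "$out/request.md"   # claim the request
  done
}
```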
Please send comments to blogger-jack@pearson.onl.