Building out a small fleet of coding and research agents for LazerBunny means I regularly have requests running against LLMs hosted on various hardware across my local network. This is not a big deal most of the time, but things get a little more interesting when multiple agents compete for the limited local resources. I spent some time this week looking into potential solutions.
I have worked on software for Apple devices on and off during my career. It goes as far back as working on a desktop application, before things were called apps or the little glass slabs became our daily companions. Some of my highlights include having an app featured in the App Store, shipping an Apple Watch integration with day-one support (mostly, thanks to the review process), and some other fun stuff.
This week was highly unstructured with regard to working on LazerBunny, to the point that you could have rolled a die to figure out which task from the todo list I would pick up on any given day. The two most important ones were progress on the avatar and adding context compression to the coding agent.
I am a very big fan of self-hosting services I use on a daily basis or that hold my data. Some services are mostly a backup, some are in day-to-day production use for my company, and some are just to toy around with. Whenever one of the big providers messes up, has a data breach, or is bought and shut down, people advocate for self-hosting similar services, and you might think I'd be one of them. Sadly, it is not that easy.
One of the key components of any personal assistant is the ability to remember what you ask it to do and put that in the context of what happened before. So this week I mostly focused on building the memory layer of LazerBunny. A fun little exercise in pushing search and retrieval with minimal resource usage.