Thanks for the memories
One of the key components of any personal assistant is the ability to remember what you ask it to do and to put that in the context of what happened before. So this week I mostly focused on building the memory layer of LazerBunny. A fun little exercise in pushing search and retrieval with minimal resource usage.
First of all, I want a cookie point for not titling this blog post "thnks fr th mmrs".
Now, most people who spend any time with LLMs will have seen the term RAG: Retrieval Augmented Generation. Long story short: store a bunch of data, retrieve the relevant parts and add them to your LLM's context. It is often conflated with using a vector-based datastore, with the vectors generated by embedding models. You can do this and it will perform well. But nothing in the actual concept requires any of it. A single big text file can be used for RAG just as well as ChromaDB.
Having had my fair share of adventures with search, databases and vectorization, including screaming at an Elasticsearch cluster at 3am asking why the stupid node would randomly turn red, fall over and nuke the cluster, but still pretend to be healthy when checked directly, I have opinions and ideas on how to do this.
There are two ways you would usually search for relevant data in a large dataset: keyword or semantic. Keyword search is useful for exact matches such as shell commands, function names or error codes. Vector search is better at returning results based on the intent of the question, such as "how do I do x?". When you mix the two, the usual way to merge the result lists is Reciprocal Rank Fusion (RRF).
Now that rolls right off the tongue. It is a very fancy way of saying:
- Find the top X results via keyword search
- Find the top X results via vector search
- Merge based on a mathematical formula (no, we won't go deep into the math and discuss why 60 is the best magic number for the constant in the formula)
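In code, the merge step is only a few lines. A minimal sketch in Go, not the actual LazerBunny implementation: document IDs and the magic constant 60 are the only inputs, and the two ranked lists would come from the keyword and vector searches above:

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses ranked result lists. Each list is ordered best-first;
// a document's fused score is the sum of 1/(k + rank) over every
// list it appears in. k=60 is the usual magic constant.
func rrf(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, id := range list {
			scores[id] += 1.0 / (k + float64(rank+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	keyword := []string{"doc-a", "doc-b", "doc-c"} // top hits, keyword search
	vector := []string{"doc-c", "doc-a", "doc-d"}  // top hits, vector search
	fmt.Println(rrf(60, keyword, vector))          // [doc-a doc-c doc-b doc-d]
}
```

Note how doc-a wins because it ranks high in both lists, while doc-b and doc-d only show up in one each.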
For keyword search things are pretty easy: we use BM25, like any self-respecting engineer too lazy to do an in-depth comparison of other options for a well-solved and well-understood problem. From past experience it is also most likely not worth looking at other solutions, except in very specific edge cases.
SQLite vs Postgres
The age-old battle of the only two databases I will never complain about using. To be fair, I took a brief look at Meilisearch, which looks like a pretty neat replacement for Elasticsearch, but that was mostly because I was curious and had not had a chance to build anything with it.
Between my two favorites, both would have worked, just with different sets of constraints. I built a small proof of concept with each. The data I will be dealing with is not outlandishly big, maybe a few tens of GB over a long time. CPU and memory are available and latency is not the biggest concern. So it came down to what is easier to use.
I had alluded to dropping PydanticAI and that happened. Full rewrite in Go of the "brain" and I regret nothing. I based it on the same foundation as the coding agent, which gave me a solid enough starting point to iterate quickly.
As the major differences between the two systems do not matter too much, I started with SQLite. SQLite has FTS5 support built in, and there is the sqlite-vec extension for vector search. It works, I can say that much. But it was a bit clunky to work with, and I ran into the good old "which SQLite driver to use with Go" problem, especially once you throw in extension management. Two other things I noticed: I needed a lot of locking via mutexes (a good number of goroutines are doing things already), and as I threw in more and more data the lack of HNSW indexing became noticeable.
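For reference, the SQLite side of such a setup looks roughly like the following sketch. Table and column names are illustrative, not from the actual project; FTS5 ranks with BM25 out of the box, and sqlite-vec stores vectors in a `vec0` virtual table:

```sql
-- Hypothetical schema: one FTS5 table for keywords, one vec0 table
-- for embeddings (sqlite-vec). Names are made up for illustration.
CREATE VIRTUAL TABLE memories_fts USING fts5(body);
CREATE VIRTUAL TABLE memories_vec USING vec0(embedding float[768]);

-- Keyword side: FTS5's rank column is BM25-based, best match first.
SELECT rowid FROM memories_fts
WHERE memories_fts MATCH 'error code'
ORDER BY rank LIMIT 20;

-- Vector side: brute-force KNN scan, no HNSW -- which is what
-- starts to hurt as the table grows.
SELECT rowid, distance FROM memories_vec
WHERE embedding MATCH :query_vec
ORDER BY distance LIMIT 20;
```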
Postgres on the other hand was "same old, same old". Throw in pgvector and vector search is solved. For BM25 I actually spent a second looking at pg_search and pg_textsearch. That was a bit of a tough choice: pg_search is more resource hungry but has a ton of features like fuzzy matching and facet support, while pg_textsearch is noticeably faster but lacks term position support. The tradeoffs of pg_textsearch seem to be nicely mitigated by pairing it with vector search, so that is what I went with for now.
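The pgvector half of that setup is a plain cosine-distance query with an HNSW index on top. A sketch with made-up table and column names, assuming 768-dimensional embeddings (nomic-embed's default); the BM25 side produces its own top-N list, and RRF merges the two:

```sql
-- Illustrative schema, not the actual project's.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector(768)
);

-- The HNSW index SQLite's sqlite-vec was missing.
CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops);

-- Top 20 nearest neighbors by cosine distance ($1 is the
-- embedded search query).
SELECT id, body
FROM memories
ORDER BY embedding <=> $1
LIMIT 20;
```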
Embedding models
Now this is a fun one. Depending on your needs you might want to spend some time evaluating different models, or maybe even mix and match. My go-to so far has always been nomic-embed. Never the best, but always good enough. And if I ever want to optimize for storage space, I can easily truncate the vectors to less than half their size.
BGE-M3 does a really good job when I mix English and German documents, like when I let it run on my email database. I am not sure it is worth the additional resource overhead though; I do not even know if I want all my emails pulled into the datastore at this point. If I were building a system without resource limitations, I would most likely go with it.
Jina Code v2 is the one specialized model I might start using at some point, if I ever pull in large bodies of technical documentation or codebases. Keep in mind that mixing models also means embedding the search query with whichever model indexed the data you are querying, so things can get a bit messy with two embedding models in the same project.
Progress
Right now there is an ingest process running to get some data into my Postgres database. The coding agent is humming along, trying to build a web interface based on my specification. Endirillia's body is slowly starting to look like, well, kind of a body. It still has those tutorial-introduced Disney vibes. I will most likely focus a bit more on Blender next week. The last two weeks were full of fun tech; time to get a bit more traction behind the avatar as things come together.
posted on April 5, 2026, 6:51 p.m. in AI, lazerbunny