Full text search with Django and SQLite
One thing I wanted to add to this blog for a long time was full text search. Content is steadily increasing and sometimes I have a hard time finding the exact right post to link to when writing a new post. My wife was also complaining that she could never find the post about backups for artists when she wanted to link it. So what a better way to spend a few hours of my vacation than to add full text search functionality.
Usually when adding search to a system people eye ElasticSearch. A monstrosity I had my fair share of "fun" with and something I do not want to touch if I can avoid it. Avid Postgres users might lean on PostgreSQLs excellent full text search and vectorisation options. This blog is running on Django and SQLite and there is no chance I would add either of these two to the stack for "just a blog".
SQLite has an extension for full text search. Technically multiple, but we are using FTS5. Per documentation we need a virtual table and fill it with our data. For the table we add the title of the post and its raw content. A row ID
is automatically generated for you and we will use this as a back reference to our post IDs. (Please do not ask me why name
and not title
. There are good(tm) reasons)
create virtual table search USING fts5(name, content);
The big question is how to get data into the table. We could create an unmanaged model, setup post_save
receivers and let Django do its magic. Or we create a few triggers for insert, update and delete in SQLite. I am usually not a fan of this approach, especially with larger codebases and / or teams. But there is a fairly good chance no one but me will ever touch this code.
CREATE TRIGGER post_insert
AFTER INSERT ON blog_post BEGIN
INSERT INTO search (rowid, name, content)
VALUES (new.id, new.name, new.content);
END;
And we obviously want to backfill all existing content.
INSERT INTO search (rowid, name, content) SELECT id, name, content FROM blog_post;
Since rowid
is automatically generated we re-use it to set it to the content tables ID. It feels like adding a separate field for that might be cleaner, but I also have not found any suggestion that manually managing row IDs is a bad idea. So here we are, being super efficient, not storing 300 additional integers!
Creating the table and setting up the triggers happens in an empty database migration.
Querying the search index in Django is fairly easy thanks to Django exposing a database connection to run raw SQL queries with. But we are using a raw database connection, so the general guidance and warnings about using params
, etc. to prevent SQL injections apply.
cursor.execute("SELECT rowid, rank FROM search WHERE search MATCH %s ORDER BY rank", (query,))
The query returns the rowid
which is the ID
of our post and a rank. The data is already ordered, no need to further sort it. With Case
and When
we preserve the order when fetching posts by ID from the database.
ids = [doc[0] for doc in result]
ordering = []
for idx, val in enumerate(ids):
ordering.append(When(id=val, then=Value(idx)))
return self.published().filter(id__in=ids).order_by(Case(*ordering, output_field=IntegerField()))
I uploaded the model manager and migrations, which will likely make it a bit clearer how things are set up. The triggers need to be created in a separate migration from the one creating the virtual table. I am not yet sure why this is the case, but this was the only way to reliably create the triggers. There is a try / catch blog in the model manager that will make a lot more sense in a second.
Limitations
I think this search works well enough for my blog. But there are some noticeable limitations, especially when you are used to services requiring a distributed cluster to figure out of there is a document with the name "foo".
SQLites FTS really does not like any special characters like >
. This is fine. Also do not take this as an invitation to hammer this server with some automated sql injection tools to verify it is actually working as expected. Thanks in advance.
sqlite3.OperationalError: fts5: syntax error near ">"
There are ways to make this work by wrapping >
in quotation marks ">"
but this means FTS5 will query for an exact match for the string.
The second thing that will be obvious very quickly when misspelling a word is that there is no fuzzy search. backpu
will not bring up the article about backup strategies. Neither will back
. Word matches only.
There is also no custom weighting of title matches vs content matches.
Improving Search
One option is to use a different tokeniser which will make search behave more like traditional search you know. There is a porter
tokeniser implementing the Porter stemming algorithm which will make sure running
matches run
. There is also a trigram
tokeniser indexing three character sequences for partial matching. I still want to run some experiments to see if it is worth using either of them. For some content and use cases it will absolutely be worth it, I cannot imagine building search for e-commerce with them.
Tokenisers support some more options including unicode vs ascii tokenisers and handling diacritics.
Weighting and ranking should be solvable using SQLites bm25()
method. I am saying should be solvable because a quick test on my data did not influence the rank at all, even if the content got a 10x boost over the title. There is a fairly good chance this is me messing up, but I do not think this will be a big enough difference to not rollout search v1.
The real fun begins when we install Python with a compile time option to load extensions in SQLite and compile the spellfix1 SQLite extension. This should help with correcting misspelled words and should work with FTS5. I did not spend enough time playing with spellfix, but from its description I think it is doing what you would expect when you ever hand rolled fuzzy search.
I am curious if spellfix will be worth the effort or if it will be easier to query the FTS5 vocabulary table and calculate the Levenshtein distance in memory considering the fairly small dataset of this blog.
Good enough
Compared to an ElasticSearch cluster, PostgreSQL and all the other fancy search engines out there this seems very rudimentary and more like a toy. But for this blog it is more than good enough. The fact that something exists is already a huge improvement.
But do not let this simple example make you believe this is all FTS5 is capable of. It is worth quickly scanning the FTS5 documentation to get an idea what is possible, because there is a lot in there including some fun stuff like custom tokenisers, external content references and contentless tables to highlight and snippet functions that could come in very handy in certain situations.
I already toyed around with some of the tokenisers and spellfix, but I will need a bit more time for testing and to actually compare them to ElasticSearch to get an idea how good they actually work and what the pitfalls are. I will likely iterate on search every now and then and see what is feasible to do in SQLite. So far I am more than pleasantly surprised and I already was a fan of SQLite before.
posted on Sept. 15, 2025, 4:46 p.m. in django, software engineering