How over-engineered do you want your blog to be? Yes.

It has been some time since I last published an article, and this time there is actually a good reason: I was busy over-engineering my blogging platform, and the migration cost for every single added post would have added to the delay. So I did the most sensible thing and wrote around twelve drafts while working on proofs of concept (POCs) and, finally, the foundation for the new platform.

While I really enjoyed my Django-based blog, it was also a bit annoying for all the reasons any Python project gets annoying at some point. Django Admin is still one of the best automatic admin interface generators I am aware of, but using it as the primary content editor and review tool without further customization is less than pleasant. What does not help is that I strongly prefer BBEdit and iA Writer for - you guessed it - writing.

I still prefer plain text files over a database, even if the database is SQLite. While static site generators provide a certain peace of mind, they are a bit too inflexible for my taste and I would like to integrate features such as webmentions some day. Maybe aggregating comments across the web? Or suggesting related posts based on recent traffic or referrer count? I am sure I will find a few neat features to build. Also, there has to be a two-click maximum to publish a new post, or I will consider the process too bothersome at some point and start to hate it - which does not help when trying to motivate yourself to publish more.

There are some ground rules I set for whatever I would end up with, which guided me through the POC phase. (As I was not sure which direction I actually wanted to take, and wanted to play around with a few libraries and deployment strategies anyway, I did not exactly plan this project, nor did I care about finishing it in a timely manner.)

  1. whatever I do it has to work for me and no one else
  2. hard-coding everything that does not have to be dynamic is fine (as this one is solely for me)
  3. as resilient to any potential error as possible
  4. written in Golang without any dependencies
  5. publishing an article with at most two clicks, preferably one

That being said, the process went somewhat like this:

Considering I ended up choosing Golang, it was pretty much a given that whatever I built would end up being a single binary to deploy to the server and run my blog. But how to get the content there?

I considered syncing a folder with markdown files using git or some file sharing solution such as NextCloud. This would have been pretty lightweight, but parsing every single file for every request felt wasteful, especially when implementing something such as a tagging system. But what if I read the files only once and operated on an in-memory data structure? Well, I would still have to somehow get the content there and trigger some reload functionality.

So, what would be the easiest way to review content before publishing and automatically deploy it to a server? Considering I already run a local Gitea instance and DroneCI, the choice to manage content in Git and let the CI do the deployment when merging into main seemed pretty straightforward. The next step would be letting the CI trigger the reload. Not too bad.

But how can I guarantee I did not mess up the format of my markdown files? Try parsing the whole filesystem and throw an error if loading fails? Easy. But how would I get the error to me? Tail a log? An API endpoint? Pushover? Sigh, this is getting complicated. I would prefer to verify all data before deploying to the server.

Extracting the parser into a library and building a small binary to verify all data on the CI worked. But let us be honest, this is far from elegant: it requires me to maintain two binaries and a library, and to not forget to update them at the same time. I really did not like this approach.

Deploying content this way would also mean I need a second deployment script for the actual application running the blog. This is getting a bit unwieldy.

So why not bundle the markdown files with the app server and only have one binary to deploy, containing everything needed to run the blog? This was actually the question I spent most of my time answering before building anything. I ended up giving it a try, and it turned out to meet all requirements and come with some nice little gimmicks.
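
Go makes this part almost trivial since version 1.16 thanks to the embed package. A minimal sketch of the idea - not my actual implementation; the Post type and the content directory name are made up for illustration:

package main

import (
	"embed"
	"io/fs"
	"log"
	"strings"
)

// The whole content directory is compiled into the binary.
//go:embed content
var content embed.FS

// Post is a stand-in; the real thing obviously carries more fields.
type Post struct {
	Path string
	Tags []string
	Raw  []byte
}

var posts []Post

// loadPosts reads and parses everything exactly once at startup.
func loadPosts() error {
	return fs.WalkDir(content, "content", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(path, ".md") {
			return nil
		}
		raw, err := content.ReadFile(path)
		if err != nil {
			return err
		}
		posts = append(posts, Post{Path: path, Raw: raw})
		return nil
	})
}

func main() {
	if err := loadPosts(); err != nil {
		log.Fatal(err) // refuse to start on broken content
	}
	log.Printf("loaded %d posts", len(posts))
}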

When pushing to the main branch of my git repository a few things happen on the CI:

  1. tests for the application run
  2. two binaries are built embedding the content directory - one for production, one for debugging (more on that later)
  3. the production binary is executed with the -verify flag which reads all data and makes sure a few paths such as index and feed return the expected data
  4. the production binary is executed with the -compare flag which fetches some metadata such as number of posts, last build and a bit more from the application currently serving screamingatmyscreen.com and compares it to its own data
  5. the production binary is copied to the server and the user systemd service is restarted

If any of the steps after the build fails, the production and debug binaries are uploaded to a local Minio share for debugging.

-verify and -compare actually uncovered two issues with the way data was exported from my old blog while I was initially writing the ingestion layer and parser. And I have fairly high trust in those two commands to let me know if anything is obviously wrong with the state of the blog.

Let us talk about embedding content for a bit. First of all, it is fast. And I mean really fast. I still only read and parse all content once during startup and keep the data in memory. From a performance perspective, the difference between iterating over all content for requests (such as a tag list) and having all tags in a slice in memory is hardly noticeable. But there is a difference, so why not optimize it anyway? Isn't this the fun part of side projects?
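
The optimization itself is nothing fancy: an index built once, right after loading all posts. Continuing the sketch from above with its hypothetical Post type:

// byTag is built once during startup, so a request for a tag list
// never has to iterate over all posts.
var byTag = map[string][]*Post{}

func indexTags() {
	for i := range posts {
		for _, tag := range posts[i].Tags {
			byTag[tag] = append(byTag[tag], &posts[i])
		}
	}
}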

Depending on how much data I add, there will be a startup penalty at some point. I estimate I will reach this point in roughly 21 thousand years, based on my current blogging schedule.

-verify in itself is not anything special. If you squint a bit you might actually confuse the code with integration tests. It is reassuring to have a deployment artifact which can verify it will not horribly blow up, by running some tests on its bundled data. Also, I would be lying if I said I did not take some inspiration from other engineering fields.
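
If I had to sketch the shape of it, it would look roughly like this - assuming, for illustration, that the blog exposes its routes as a plain http.Handler:

package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// verify starts the fully loaded application on an ephemeral local
// listener and checks that the important paths respond.
func verify(h http.Handler) error {
	srv := httptest.NewServer(h)
	defer srv.Close()

	for _, path := range []string{"/", "/feed"} {
		resp, err := http.Get(srv.URL + path)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("%s returned %d", path, resp.StatusCode)
		}
	}
	return nil
}

func main() {
	// dummy handler standing in for the real application routes
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "ok") })
	if err := verify(mux); err != nil {
		log.Fatal(err)
	}
}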

-compare on the other hand is a smoke test set up to never fail the CI pipeline. The test itself can fail, but it will not have any impact. I mostly compare the number of posts, the number of tags and the hashes of post contents to get a list of what exactly changed compared to the currently deployed application. If five tags are missing, 12 posts have a changed checksum and the index did not regenerate when I publish a new post, I know something went wrong without opening a browser.
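
Stripped down, the idea is to fetch a small metadata document from the live site and diff it against the local state. A sketch - the /meta endpoint and its fields are invented for illustration:

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Meta is hypothetical; the real payload has a few more fields,
// such as the last build time.
type Meta struct {
	Posts  int               `json:"posts"`
	Tags   int               `json:"tags"`
	Hashes map[string]string `json:"hashes"` // post slug -> content checksum
}

func compare(local Meta) error {
	resp, err := http.Get("https://screamingatmyscreen.com/meta")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	var live Meta
	if err := json.NewDecoder(resp.Body).Decode(&live); err != nil {
		return err
	}

	// Only log differences - the CI ignores this step's exit code
	// anyway, the output is what I am after.
	if local.Posts != live.Posts {
		log.Printf("posts: %d local vs %d live", local.Posts, live.Posts)
	}
	if local.Tags != live.Tags {
		log.Printf("tags: %d local vs %d live", local.Tags, live.Tags)
	}
	for slug, sum := range local.Hashes {
		if live.Hashes[slug] != sum {
			log.Printf("changed: %s", slug)
		}
	}
	return nil
}

func main() {
	local := Meta{Hashes: map[string]string{}} // dummy local state
	if err := compare(local); err != nil {
		log.Fatal(err)
	}
}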

I am planning to publish the code at some point, but I first want to make sure it is "done". I took shortcuts.

The one I despise the most is having used echo. Not because the framework itself is bad, but because it is unnecessary for what I do and actually complicated a few things. This was purely a comfort pick: I knew that if I wanted to implement more functionality than serving rendered content, echo would be useful and not get in my way. But it will go at some point soon.

Next on the list is blackfriday. When I exported my posts I simply dumped the markdown representation into a file. This is how I wrote nearly all of my posts, besides a few when I used WordPress. I want to explore the option of writing my posts in HTML. I only need a few tags such as p, li, a, img and code for nearly all my posts, and you can omit closing tags for the ones used most often. The overhead of writing HTML is not really that bad. At the same time, remembering how to write a blockquote and a code block for the specific markdown spec the parser implements is annoying. I am not sure if this is a good idea, and I will most likely start with writing a few posts this way while keeping the markdown library as is, but if it works out well I might simply remove it.

Last but not least, gorilla/feeds. Generating an RSS feed is not that hard, the spec is pretty light and quick to implement. And the Gorilla Toolkit seems to have some maintainer issues, making it a good candidate to cause problems down the line.
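
To illustrate how light the spec is: the parts of RSS 2.0 I actually need map onto a handful of structs using encoding/xml from the standard library. A rough sketch with placeholder values:

package main

import (
	"encoding/xml"
	"os"
)

type RSS struct {
	XMLName xml.Name `xml:"rss"`
	Version string   `xml:"version,attr"`
	Channel Channel  `xml:"channel"`
}

type Channel struct {
	Title       string `xml:"title"`
	Link        string `xml:"link"`
	Description string `xml:"description"`
	Items       []Item `xml:"item"`
}

type Item struct {
	Title   string `xml:"title"`
	Link    string `xml:"link"`
	PubDate string `xml:"pubDate"` // RFC 822 formatted
}

func main() {
	feed := RSS{
		Version: "2.0",
		Channel: Channel{
			Title:       "example feed",
			Link:        "https://screamingatmyscreen.com",
			Description: "placeholder values throughout",
			Items: []Item{
				{
					Title:   "an example post",
					Link:    "https://screamingatmyscreen.com/example",
					PubDate: "Thu, 21 Jul 2022 00:00:00 GMT",
				},
			},
		},
	}
	out, _ := xml.MarshalIndent(feed, "", "  ")
	os.Stdout.Write(append([]byte(xml.Header), out...))
}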

Getting rid of these three dependencies would mean I only need Golang's standard library to build my blog. And Golang is doing a pretty good job keeping things compatible and improving them, without requiring me to gently kick the bits and bytes of a project every other month to get it to play nicely with a new version.

A reduced version of .drone.yml (I removed some setup code and steps gluing things together) to outline the steps looks simple. But believe me when I say that my lack of in-depth knowledge of Drone and the various base images caused me a few headaches.

kind: pipeline
name: default

steps:
  - name: test
    image: golang
    commands:
      - go test
  - name: build
    image: golang
    commands:
      - make production
      - cp build/sams /build
      - make debug
      - cp build/sams-debug /build
    volumes:
      - name: build
        path: /build
  - name: verify
    image: golang
    commands:
      - /build/sams -verify
    volumes:
      - name: build
        path: /build
  - name: compare
    image: golang
    failure: ignore
    commands:
      - /build/sams -compare
    volumes:
      - name: build
        path: /build
    when:
      branch:
        - main
  - name: deploy
    image: alpine:latest
    environment:
      SSH_KEY:
        from_secret: SSH_KEY
      SSH_HOST_KEY:
        from_secret: SSH_HOST_KEY
      PRODUCTION_IP:
        from_secret: PRODUCTION_IP
    commands:
      - apk --no-cache add openssh-client bash
      - bash scripts/deploy.sh $PRODUCTION_IP /build
    volumes:
      - name: build
        path: /build
    when:
      branch:
        - main
      event:
        - push

volumes:
  - name: build
    temp: {}

verify and compare decided to fail when run on alpine:latest - /build/sams cannot be found. This might be shell related; I did not really debug it yet, as the golang image is already cached anyway.

deploy is doing some stunts setting up SSH keys and adding a host key to known_hosts. This is the other part I want to debug at some point; I am fairly certain this should not be needed to the extent some examples suggested.

The deploy script on the other hand is as simple as it looks, and most likely shows that I have not written shell scripts in a very, very long time.

#!/usr/bin/env bash

set -e

if [ -f "$2/sams" ]
then
  echo "binary found"
else
  echo "binary does not exist"
  exit 1
fi

ssh "sams@$1" "rm -f ~/sams"
scp "$2/sams" "sams@$1:~/sams"
ssh "sams@$1" "systemctl --user restart sams"

echo "deployment complete"

I might end up skipping some of the SSH-related setup by using the UserKnownHostsFile option. Passing the IP and the path to the binary to the script means I can build and deploy from any system connected to my VPN, as long as the SSH key is accessible. Did I mention I do not trust computers? This also holds true for the host running Drone.

The systemd file living in ~/.config/systemd/user/sams.service is only a few lines long.

[Unit]
Description=sams
After=network.target

[Service]
Type=simple
ExecStart=/home/sams/sams

[Install]
WantedBy=default.target

The only gotcha here was setting WantedBy to default.target. A quick loginctl enable-linger sams as root and systemctl --user {enable|start} sams as the user sams later, and all is good to go.

I am aware that this seems overly complicated for a blog. I actually agree. But it was a nice exercise to see if this deployment strategy actually works as expected. I have some plans for the near future where this will come in handy.

But overall I am pretty happy with the current implementation. It checks all the boxes, it is fast, it is easy to port to new hosts if needed, and so far it has not failed in some unexpected way. The day will surely come, but hopefully the whole system will have grown on me by then, so the failure will not be an excuse to spend even more time on it.

>> posted on July 21, 2022 in #life #software-engineering