Local LLMs & ChatGPT Pro

OpenAI announced ChatGPT Pro for $200 a month. That is a lot of money for a hallucination machine built on top of copyright infringement, and I believe it is a price point not many people besides enthusiasts are willing to pay. Let us assume this is the price OpenAI calculated it needs to charge to offer the service sustainably. This might be the first market check for AI companies that do not just want to burn money. And I think it will be a tough one.

Now, $200 also puts GPU prices in a better context if you're considering a local LLM you can have all to yourself. While a modern system surely helps, you can build something competent enough for around $1000, which means you reach the break-even point within about five months (plus a bit for electricity). This is obviously napkin math, so take it with a grain of salt. And you can do a lot more with your personal hallucination machine. A nice added bonus is that it is not sending your data to companies with questionable ethics.
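For what it's worth, here is that napkin math spelled out. All the inputs are assumptions (a $1000 build, roughly 300W under load, a few hours of use per day, and a guessed electricity price), so plug in your own numbers:

```python
# Napkin math: months until a local box beats a $200/month subscription.
# Every input below is an assumption -- adjust for your situation.
SUBSCRIPTION = 200.0    # USD per month (ChatGPT Pro)
BUILD_COST = 1000.0     # USD, one-time: GPU plus the rest of the box
POWER_KW = 0.3          # kW drawn under load (~300 W)
HOURS_PER_DAY = 4       # hours of heavy use per day
PRICE_PER_KWH = 0.30    # USD per kWh

electricity_per_month = POWER_KW * HOURS_PER_DAY * 30 * PRICE_PER_KWH
savings_per_month = SUBSCRIPTION - electricity_per_month
months_to_break_even = BUILD_COST / savings_per_month

print(f"Electricity: ~${electricity_per_month:.2f}/month")
print(f"Break even after ~{months_to_break_even:.1f} months")
```

With these numbers the electricity comes out around $11 a month and the break-even point lands at roughly 5.3 months, which is where the "five months plus a bit" above comes from.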

Getting your hands on a cheap GPU that works "good enough" for a local LLM is not that easy, especially when focusing on the word cheap. I got an Nvidia RTX 3060 with 12GB of memory. 12GB is sufficient for smaller models that perform well, though as a general rule you can say more memory is better (with some exceptions). qwen2.5-coder runs extremely well on this card, and I don't think I ran into any issues with 8-13b models; 22b is the point where things get slow. I got the card for about 320 euros a few months ago, so it was a pretty good deal.
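A rough rule of thumb for why it plays out this way: a 4-bit quantized model needs about half a byte per parameter, plus some overhead for the KV cache and runtime. This is a back-of-the-envelope sketch, not a benchmark, and the overhead figure is my own guess:

```python
# Rough VRAM estimate for a quantized model. Real usage depends on the
# quantization format, context length, and inference runtime.
def vram_gb(params_billion: float, bytes_per_param: float = 0.5,
            overhead_gb: float = 1.5) -> float:
    """Q4 quantization is ~0.5 bytes/param; overhead covers KV cache etc."""
    return params_billion * bytes_per_param + overhead_gb

for size in (8, 13, 22):
    est = vram_gb(size)
    fits = "fits" if est <= 12 else "spills out of VRAM (slow)"
    print(f"{size:>2}b model: ~{est:.1f} GB -> {fits} on a 12GB card")
```

By this estimate an 8b model needs around 5.5 GB and a 13b model around 8 GB, both comfortable on 12GB, while a 22b model lands right at the limit. That lines up with 22b being where things get slow.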

Beware when buying this card: the Ti variant only comes with 8GB, and the 8GB non-Ti has a narrower memory interface (128-bit instead of 192-bit), so you want the 12GB non-Ti. There is also the 4060 Ti with 16GB, but its memory seems to be slower. I haven't spent too much time looking at benchmarks, but if you are deciding to get a GPU, you might want to do some research first.
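Memory speed matters more than it might seem, because single-batch LLM inference is largely memory-bandwidth-bound: every generated token streams the whole quantized model through the memory bus once, so bandwidth divided by model size gives a crude ceiling on tokens per second. The bandwidth figures below are the published specs as I remember them, so double-check the spec sheets before buying:

```python
# Crude tokens/s ceiling: bandwidth / model size in bytes.
CARDS_GBPS = {               # memory bandwidth in GB/s (verify before buying)
    "RTX 3060 12GB": 360,    # 192-bit bus
    "RTX 3060 8GB": 240,     # narrower 128-bit bus
    "RTX 4060 Ti 16GB": 288, # 128-bit bus, more but slower memory
}
MODEL_GB = 8.0               # e.g. a ~13b model at Q4

for card, bw in CARDS_GBPS.items():
    print(f"{card}: <= ~{bw / MODEL_GB:.0f} tokens/s ceiling")
```

This is only an upper bound and ignores compute, but it shows why the 16GB 4060 Ti is not automatically the better deal: more memory lets you load bigger models, while slower memory caps how fast they run.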

Intel is about to release their new B-series Arc GPUs. So far, Arc has been a mixed bag, except for AV1 encoding for streamers. But they have really stepped up their driver game lately, and the new cards seem promising. I would still hold out for a bit: Arc drivers were known to introduce issues when running LLMs, and making hallucinations worse really is not something you need when you spend money explicitly for this use case.

Something at this price point running freely available models will not be able to compete with o1. But the question everyone will have to answer is whether the price difference between your own box and OpenAI's offering is actually worth it.

posted on Dec. 6, 2024, 10:02 p.m. in AI

This entry was posted as a "note" and did not undergo the same editing and review as regular posts.