

Souring a bit on local models

Maybe renting ain't so bad

I've been playing around with """AI""" stuff for a while now, and I've been pretty impressed with what it is capable of. That said, it obviously has its limitations. I don't think that AI as it stands right now will really be replacing anyone's job. What I've come to realize is that code and code velocity are really meaningless, especially in the corporate world. Ultimately what matters is delivering feature-complete functionality that doesn't fuck up and waste the customer's time and money. AI seems to be very strong at cranking out code quickly, but fails at delivering features that actually work.

To give an example: I was working on a feature that involved placing labels on a plot in real time as the plot moved. This sounds easy, but it's actually surprisingly difficult. The AI came up with some off-the-top-of-its-head solutions, many of which were just off-the-shelf implementations of common algorithms, but for the use case I had, they weren't enough. I ended up needing to guide it, very carefully, down the path of how to design and engineer the solution so that it wasn't a resource hog and worked as expected.

I expect that in the future it will get a little better, but the reality is that we are still always going to be bound by the inaccuracy and "hand-waviness" that comes with the English language. If you were to remove all the ambiguity from English, you'd end up reinventing programming languages.

Now how does this relate to local models? Well, as you may know if you are one of the zero people who read this blog, I have built myself a server (the one you are actually reading this on) that hosts my AI experiments. I have been playing around with a variety of different models, but to be quite honest with you, local LLMs just are not ready for any real workloads. The local stuff lags behind. Is it better than GPT-2? Maybe? It's certainly slower.

The best results I have gotten are from using IBM's Granite 4 models for fill-in-the-middle completions. They seem to handle those well enough. One-shotting a feature from a prompt is just far beyond the capabilities of any model I have been able to run. Even gpt-oss can't really handle that, despite being the largest model I can comfortably run without spending hours and hours waiting for a response.
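If you've never seen it, fill-in-the-middle just means handing the model the code before and after the cursor and asking it to fill the gap. Here's a minimal sketch of what a request looks like if you happen to serve a model through llama.cpp's llama-server; the /infill endpoint and field names come from llama.cpp's server docs, while the port and the code snippet are placeholders, and your setup may well differ:

```python
import requests

# Code before and after the gap we want the model to fill.
prefix = "def moving_average(values, window):\n    "
suffix = "\n    return result\n"

resp = requests.post(
    "http://localhost:8080/infill",  # llama-server's default port; adjust to taste
    json={
        "input_prefix": prefix,  # everything before the cursor
        "input_suffix": suffix,  # everything after the cursor
        "n_predict": 64,         # cap how much the model writes
        "temperature": 0.2,      # keep the completion conservative
    },
    timeout=60,
)
print(resp.json()["content"])  # the model's guess at the missing middle
```

It makes sense that small local models do better here than at open-ended generation: the surrounding code pins down most of the ambiguity for them.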
That brings me back to paying for access to models. In the past few months I have been experimenting with different providers. For now I have settled on Anthropic, simply because I seem to have the best luck with them. I also paid for a month or two of Codex, but I wasn't impressed with its abilities compared to Opus. I'm planning on investigating Mistral next. Their free plan is quite generous, and I used vibe to put together a shitty landing page for my company.

Thinking about it further, though: if the future really is in paying some other company to compute your tokens for you, I think that ultimately the service is going to get cheaper and cheaper. There will be a point in the future where local models catch up to hosted ones, but simply due to economies of scale, it's almost impossible to compete. The reality is that someone pooling resources across multiple users is always going to be better off than a single guy building a server in his basement out of decommissioned server parts. (Hmm, wonder who I could be referring to?)

Following the natural progression, it's not unrealistic to see companies paying a provider for their employees, much like they currently do with internet, email, and office tools. For the individual? Well, that's kind of interesting.

What I see is this: much like how MS Office and Windows try to rope individuals into using their products in the HOPE that, if those people were ever in a position to make a business decision, they would choose Windows, I think AI companies will treat the individual as a loss leader. Ultimately they have the most to gain if an employee uses some tool, likes it, and then asks their boss to buy a subscription.

Now, when it comes to pricing, the doomers might say something like "Well, the tech companies are going to price you out of the market and make it too expensive." And that may very well be true, but I think that in this specific case, small players would eat the big guys' lunch if they did that. Ultimately, running an AI model costs nothing but the servers, the hardware, and the dataset. Small companies CAN and HAVE put this together in the past; there is no reason they can't do it again. If the MAGA companies (Meta, Amazon, Google, Apple) start to rest on their laurels, they will simply be outcompeted by a more agile small fry.

Am I giving up on local models? No, but I'm not expecting as much anymore. I think the real benefit of local models is going to come from extremely niche use cases: time-series prediction, embedding searches, etc. All the things that are too niche for a foundation model to bother with are going to be the things that local models excel at.
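To make "embedding searches" concrete, here's a toy sketch of the kind of thing I mean, using the sentence-transformers library; the model name is just a common small default, and the documents are made up:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Small, CPU-friendly embedding model; swap in whatever you like.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to configure nginx as a reverse proxy",
    "A recipe for sourdough bread",
    "Troubleshooting GPU passthrough on Proxmox",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vecs = model.encode(["my VM can't see the graphics card"],
                          normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
scores = doc_vecs @ query_vecs[0]
print(docs[int(np.argmax(scores))])  # should print the Proxmox doc
```

That kind of workload is small, specialized, and runs happily on hardware like mine, which is exactly where I think local models will keep earning their keep.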