A new Rune project tries to force LLM arithmetic

A new Rune effort by Alvaro Videla aims to make language models do arithmetic by monitoring internal states and inserting the right results during inference. The experiment ultimately “sort-of worked” but was judged a failure, another reminder that deterministi
Ask an LLM to do arithmetic and you’ll get a familiar kind of laugh, sometimes of relief, sometimes of confusion, because the numbers don’t behave like numbers. A well-written chatbot interface can spot that you’re asking for arithmetic (like summing 1 and 1) and hand the job off to a dedicated calculator application. But that doesn’t fix the deeper issue: an LLM still can’t reliably count the ‘r’s in ‘strawberry’ or handle math the way a deterministic system does.
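The calculator hand-off can be pictured with a short Python sketch. Everything here is an illustrative assumption rather than any real chatbot's code: the `route` and `evaluate` names, the regular expression, and the placeholder string returned for non-arithmetic prompts are all hypothetical.

```python
import ast
import operator
import re

# Operators this hypothetical router is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expr: str):
    """Deterministically evaluate a small arithmetic expression."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)

# Matches prompts made only of digits, operators, spaces, dots, and parentheses.
_ARITHMETIC = re.compile(r"^[\d\s+\-*/().]+$")

def route(prompt: str) -> str:
    """Send arithmetic to the calculator; anything else would go to the model."""
    text = prompt.strip().rstrip("=?").strip()
    if text and _ARITHMETIC.match(text) and any(c.isdigit() for c in text):
        try:
            return str(evaluate(text))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # fall through to the model on anything we cannot evaluate
    return "<forwarded to the language model>"

print(route("1 + 1"))               # handled by the calculator path
print(route("Tell me a joke"))      # handled by the model path
print("strawberry".count("r"))      # counting letters is trivial for ordinary code
```

The point of the sketch is that the deterministic path never involves the model at all, which is why the hand-off leaves the model's own arithmetic ability unchanged.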
The question at the center of a new Rune project from Alvaro Videla is this: can arithmetic be done with a language model at all, or is the entire premise mismatched from the start? Videla’s starting point is blunt. At its core, an LLM is a vector space of probabilities. A matrix-based inference process then produces a probabilistic output of tokens, so you wouldn’t expect the kind of deterministic behavior arithmetic demands.
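To make the probabilistic point concrete, here is a minimal Python sketch of sampling a next token from a softmax distribution. The scores in `logits` are invented for illustration and do not come from any real model; the prompt they supposedly follow ("17 + 25 = ") is likewise hypothetical.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Made-up next-token scores after the hypothetical prompt "17 + 25 = ".
logits = {"42": 3.0, "43": 1.5, "41": 1.2, "52": 0.5}
probs = softmax(logits)

def sample(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

rng = random.Random(0)
draws = [sample(probs, rng) for _ in range(1000)]
wrong = sum(1 for d in draws if d != "42")

print(f"most likely token: {max(probs, key=probs.get)}")
print(f"wrong answers in 1000 samples: {wrong}")
```

Even when the correct token is the single most likely one, sampling still returns a wrong answer some fraction of the time, which is the mismatch with deterministic arithmetic that the article describes.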
Rune takes a different approach. Instead of leaning on the model’s raw “best guess,” the project is described as ‘a mechanism-aware JIT compilation project for language-model arithmetic.’ The idea is to treat arithmetic like something you can intercept mid-flight. An LLM cannot be expected to get an arbitrary series of arithmetic calculations right by probability alone, so the project proposes monitoring the model’s internal state and intervening once the parameters of an arithmetic calculation have been identified.
When those parameters are found, the plan is to put the correct result back into the inference process and then let the model continue. That means the system doesn’t need external tools to finish the job, at least in theory. The arithmetic wouldn’t be “solved” by the LLM as a thinker; it would be nudged back onto the right track during computation.
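The monitor-and-inject idea can be illustrated with a toy Python loop. This is not Rune's code and makes a large simplifying assumption: Rune is described as watching internal model state, while this sketch only watches the visible text context. The `toy_model_step`, `detect_addition`, and `generate` names are hypothetical, and the "model" is a deliberately unreliable stand-in.

```python
import random
import re

# Pattern for a context that ends in "a + b = ", i.e. an answer is due next.
_PENDING_SUM = re.compile(r"(\d+) \+ (\d+) = $")

def toy_model_step(context: str, rng: random.Random) -> str:
    """Stand-in for one inference step: returns an unreliable next token."""
    match = _PENDING_SUM.search(context)
    if match:
        a, b = int(match.group(1)), int(match.group(2))
        # Deliberately noisy: the toy model may be off by one in either direction.
        return str(a + b + rng.choice([-1, 0, 1]))
    return "<eos>"

def detect_addition(context: str):
    """Hypothetical monitor: recover the operands once 'a + b = ' is visible."""
    match = _PENDING_SUM.search(context)
    return (int(match.group(1)), int(match.group(2))) if match else None

def generate(prompt: str, rng: random.Random, intervene: bool) -> str:
    """Run the toy decoding loop, optionally injecting exact results."""
    context = prompt
    while True:
        operands = detect_addition(context) if intervene else None
        if operands is not None:
            token = str(operands[0] + operands[1])  # inject the exact result
        else:
            token = toy_model_step(context, rng)    # let the model continue
        if token == "<eos>":
            return context
        context += token

print(generate("17 + 25 = ", random.Random(0), intervene=True))
print(generate("17 + 25 = ", random.Random(0), intervene=False))
```

With intervention switched on, the exact sum is written into the stream before the model gets a chance to guess, and generation then resumes as usual. The hard part in a real system, identifying the operands from internal activations, is exactly what this toy skips.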
In the end, the attempt only goes so far: it “sort-of worked,” but it was deemed a failure. The conclusion is hard to miss: for now, a language model still looks like the wrong tool for replacing the humble calculator.
So basically it’s trying to make ChatGPT count? Lol.
“Sort-of worked” sounds like it didn’t work. Why are we still forcing LLMs to do math when calculators exist. Also what does “monitoring internal states” even mean, like spying on its thoughts?
Wait I thought the whole point was it can spot when you ask “1+1” and then just uses a calculator app. So isn’t that just cheating? Like the model never actually does arithmetic, it just punts.
Numbers don’t behave like numbers?? That’s exactly why I don’t trust any of these systems. Next they’ll say the LLM is allergic to the letter R in strawberry or whatever. If it can’t count r’s then how’s it gonna do my taxes. Sounds like a lot of extra steps for the same wrong answers.