Published 8/23/2023, last updated 6/24/2024
LLMs Will Never Be Able to Do (Complicated) Math
Since contemporary LLM architectures lack recursion, they're fundamentally incapable of doing some math operations.
UPDATE 2024: Just to clarify, this post is about mathematical operations that inherently involve multiple recursive steps, like exponentiation. As some cool research has shown, Transformers can learn to do basic arithmetic rather well with some tweaks. Adding a "scratchpad" can further improve model performance and may be a good workaround to the problems mentioned in this article.
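To make the scratchpad idea concrete, here's a minimal sketch of what a step-by-step trace for an exponent problem might look like if the intermediate results were written out as text. The function name `scratchpad_trace` and the line format are my own assumptions for illustration, not the format used in any particular paper or model.

```python
def scratchpad_trace(x: int, y: int) -> list[str]:
    """Return one line of text per multiplication step of x**y.

    Hypothetical illustration: each line plays the role of an
    intermediate result a model could write down and read back.
    """
    lines = []
    result = 1
    for step in range(1, y + 1):
        previous = result
        result = previous * x
        lines.append(f"step {step}: {previous} * {x} = {result}")
    return lines


for line in scratchpad_trace(7, 3):
    print(line)
# step 1: 1 * 7 = 7
# step 2: 7 * 7 = 49
# step 3: 49 * 7 = 343
```

The point is that each step only needs one multiplication, and the running result lives in the text rather than inside the network.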
The Problem
LLMs have tremendous potential in many areas, but most contemporary models have one inherent limitation: they're solely feed-forward in structure. This means that data flows linearly from input to output, with no recursion or backtracking. This design is what makes training with gradient descent and back-propagation so fast and efficient, since computations can be done in parallel using matrix multiplication.
Unfortunately, this lack of recursion makes some types of mathematical operations impossible. Consider exponentiation. ChatGPT can handle simple exponent problems, but when asked what X^Y is for large values of X or Y, it becomes inaccurate.
Though any single exponentiation can be broken down into a linear sequence of multiplications, it's impossible for a finite, feed-forward neural net to handle a recursive operation of arbitrary depth (for example, X^Y for every possible value of Y). The amount of recursion an LLM can "simulate" is limited by the number of its parameters and layers.
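Here's a rough sketch of that argument in code. It's an analogy, not a model of how a Transformer actually computes: `power_loop` uses as many sequential steps as the exponent demands, while `power_fixed_depth` gets a hard cap of `depth` steps, where `depth` is a hypothetical stand-in for a fixed number of layers. Both function names are made up for this illustration.

```python
def power_loop(x: int, y: int) -> int:
    """Compute x**y with y sequential multiplications (no cap on steps)."""
    result = 1
    for _ in range(y):
        result *= x
    return result


def power_fixed_depth(x: int, y: int, depth: int) -> int:
    """Same idea, but with at most `depth` sequential steps.

    `depth` is a hypothetical stand-in for a fixed layer count:
    once the steps run out, the computation simply stops.
    """
    result = 1
    for step in range(depth):
        if step < y:  # only multiply while the exponent still needs it
            result *= x
    return result


print(power_loop(7, 15) == 7 ** 15)             # True
print(power_fixed_depth(7, 15, 20) == 7 ** 15)  # True: 20 steps is enough
print(power_fixed_depth(7, 15, 10) == 7 ** 15)  # False: stops at 7**10
```

Smarter schemes like repeated squaring need fewer steps, but the number of steps still grows with Y, so any fixed cap eventually runs out.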
Summary
Lack of recursion is an inherent design limitation in current GPT-style LLMs, and it prevents them from performing complicated math operations. That doesn't matter in most use cases for LLMs, though! They're still powerful and helpful in a wide variety of circumstances.
Fun Stuff
There's still a lot of work to be done in understanding the behavior of trained large language models. Here's something fascinating I found while writing this article:
When I asked ChatGPT what 7^15 equals, it gave the answer 170,859,375. The correct answer is 4,747,561,509,943.
Though the answer is obviously incorrect, 170,859,375 has a curious property: it equals 15^7, which factors into (3^7)*(5^7). The model seems to have converted A^(B*C) into (B^A)*(C^A) under the hood, in effect swapping the base and the exponent. I'd be interested to learn why this happens!
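If you want to check the arithmetic yourself, a few lines of Python will do it. The variable names here are just for illustration; Python integers have arbitrary precision, so the results are exact.

```python
correct = 7 ** 15
model_answer = 170_859_375  # the value ChatGPT returned

print(correct)                        # 4747561509943
print(model_answer == 3**7 * 5**7)    # True
print(model_answer == 15 ** 7)        # True: base and exponent swapped
print(model_answer == correct)        # False
```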
If you liked this article, don't forget to share it and follow me at @nebrelbug on X!