AI is Evolving to be Better, Whether We Like It or Not

AI is everywhere. We are building it into every possible corner of society as fast as we can. But we don't fully understand what it's doing.

Of course, we have some idea. When a large language model (LLM) like ChatGPT responds to your question with mind-boggling speed and detail, we know it isn't a conscious being giving us its personal thoughts. It's assembling words like puzzle pieces, based on whatever text it was trained on. It doesn't actually know anything, not the way humans do. For example, you know a pillow is soft. LLMs know the word "soft" is often used with the word "pillow." There's a big difference.

LLMs seem like they know something, but they have no actual knowledge. Although . . . that may be changing.

A new study published this month (July 2025) in the Journal of Statistical Mechanics took a deeper look at how LLMs construct answers. And it uncovered a surprising development.

Researchers at the Swiss Federal Institute of Technology went into the study thinking they generally understood how LLMs worked. Neural networks are known to rely heavily on what's called positional learning: they assign a value to each word in a sentence based on its location, then generate responses by predicting the next word, or "token," at each sequential position in the output. Sounds like math, because it is. At no point in that process does the LLM need to know what any of the words mean. It just generates one token at a time, with each new token predicted from all the previously generated tokens in the sequence.
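To make that loop concrete, here is a toy sketch in Python. It is my own illustration, not the study's model and nothing like a production LLM: the "training" is just a table of which word follows which in a tiny made-up text, and generation appends one token at a time based on what has come before.

```python
# Toy illustration (not the study's model): "train" by counting which word
# follows which in a tiny text, then generate one token at a time.

from collections import Counter, defaultdict

training_text = "the pillow is soft . the rug is soft . the cat sat on the mat .".split()

# "Training": build a table of how often each word follows each word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    follow_counts[prev][nxt] += 1

def generate(prompt, n_tokens):
    sequence = list(prompt)
    for _ in range(n_tokens):
        # Predict the next token from the sequence generated so far.
        # (A real LLM conditions on the whole sequence through attention;
        # this toy only looks at the last word.)
        last_word = sequence[-1]
        next_word, _ = follow_counts[last_word].most_common(1)[0]
        sequence.append(next_word)
    return sequence

print(generate(["the", "pillow"], n_tokens=3))  # ['the', 'pillow', 'is', 'soft', '.']
```

At no point does this code, or the vastly bigger machinery it caricatures, need to know what "soft" feels like. It only needs the counts.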

Somehow this tedious, statistical process spits out colorful human conversation. 

The magic is in the sheer amount of human writing the LLM is trained on before it generates its responses.

In the study, researchers started with a very simple data set, to make analyzing the LLM process easier. As expected, the AI used the positional learning mechanism to respond.

Then they increased the scale and complexity of the training data to better simulate a true modern LLM environment. When the data set got big enough, something strange happened. The LLM changed how it operated.

There was a transition from the positional attention mechanism to a semantic one. The LLM was no longer evaluating words based on where they were positioned, but on how they related to other words with similar meanings. The LLM represented words as vectors and measured how close they sit to one another -- think of "mat," "rug," and "carpet" being identified by the LLM as having similar meanings. The process was still grounded in math, but now a word's meaning, rather than its position, was driving the responses. The LLM may never have seen a "rug" or know what one is, but it was learning what the word represents in a more generalized way. And that was enough.
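To give a rough sense of what those word vectors look like, here is a minimal Python sketch. The numbers below are invented for illustration, not taken from any real model; in an actual LLM the vectors are learned from training data, and words used in similar contexts end up pointing in similar directions.

```python
# Invented example vectors (real models learn them, and use hundreds or
# thousands of dimensions rather than three).
import math

embeddings = {
    "mat":    [0.90, 0.10, 0.30],
    "rug":    [0.85, 0.15, 0.35],
    "carpet": [0.80, 0.20, 0.30],
    "pillow": [0.20, 0.90, 0.40],
}

def cosine_similarity(a, b):
    """Measure how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["rug"], embeddings["carpet"]))  # close to 1: similar
print(cosine_similarity(embeddings["rug"], embeddings["pillow"]))  # lower: less similar
```

The model never touches a rug. It only learns that "rug" and "carpet" keep showing up in the same kinds of sentences, which is what pushes their vectors together.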

This semantic attention isn't new -- it's what gives LLMs whatever real intelligence they might have, provided there is enough data to learn meaningful patterns. Powerful LLMs like ChatGPT are trained on millions of examples, more than enough to establish "meaning" by finding every context a word is used in and every similar word that seems to mean the same thing.

What was unusual about this study was how the transition from positional to semantic operations happened. Because it made the shift all of a sudden. There was a tipping point. When the training data became complex enough, the LLM just switched.

The researchers compared it to a phase transition in physics, something that happens in an instant. Like when iron becomes magnetic, or when water becomes ice. It doesn't happen gradually. There's just a point where one thing is suddenly the other.

According to the lead researcher, "If training continues and the network receives enough data, at a certain point -- once a threshold is crossed -- the network starts relying on meaning instead."

In this way, the LLM is like a child learning to read. It starts by understanding sentences based on the positions of words: depending on where words are located in a sentence, the LLM can infer their relationships (are they subjects, verbs, objects?). But as the training continues -- as the neural network "keeps going to school" -- this study shows that a shift occurs: word meaning becomes the primary source of information.

When an LLM can understand words, it opens up new possibilities. For one, it means it will be more accurate, because its word choices will be based on substance rather than placement. But on the more ominous side, it suggests LLMs will grow smarter. And as they do, they may well make new choices about how they process and construct information.

Based on the Swiss study, these choices will come with no warning. They are sudden, qualitative shifts. Like water suddenly freezing into ice. 

It happens in a moment, a flash point.

Like something suddenly waking up.