A new study has shed light on one of the more mysterious aspects of modern artificial intelligence: how language models decide what to focus on when processing text. Researchers have discovered that transformer-based AI systems—like those behind ChatGPT, Gemini, and other conversational platforms—undergo a sudden and dramatic shift in their learning strategy depending on how much data they are trained with.
Published in the Journal of Statistical Mechanics: Theory and Experiment, the research, titled “A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention,” explores how attention-based neural networks evolve during training. When exposed to limited data, these systems tend to rely on the position of words in a sentence to understand relationships. For instance, they learn that the subject usually comes before the verb in English, according to TechXplore.
However, once a certain volume of training data is reached, the network switches strategies almost instantaneously. Rather than relying on where words appear, it begins to prioritize semantic meaning—grasping the deeper significance of words and their context.
The study is based on a simplified model of the “self-attention” mechanism, a key component in transformer language models that allows the network to weigh the importance of each word relative to others in a sequence. This mechanism enables transformers to capture complex dependencies within text, powering everything from autocomplete tools to advanced chatbots.
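To give a concrete sense of what dot-product attention computes, here is a minimal single-head sketch in Python with NumPy. This is a generic illustration, not the simplified, solvable model analyzed in the paper; the projection matrices W_q, W_k, and W_v are placeholder weights chosen only for the example.

```python
import numpy as np

def dot_product_attention(X, W_q, W_k, W_v):
    """Minimal single-head self-attention over a sequence of token embeddings X."""
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token offers
    V = X @ W_v  # values: the content that gets mixed together
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V  # each output vector is a weighted blend of all values

# Toy usage: 4 tokens with 8-dimensional embeddings and random projection weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = dot_product_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one contextualized vector per token
```

In full transformers, positional information is supplied separately (for example, through positional encodings added to the token embeddings), which is what gives the network the choice the study describes: attending to tokens by where they sit in the sequence or by what they mean.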
What makes this discovery particularly interesting is not just the transition itself, but the way it happens. Below a certain threshold, the model uses only positional cues. Cross that threshold, and it suddenly starts using meaning instead—with no gradual blend of the two.
While the model used in the study is simpler than the large-scale systems used in commercial applications, the implications are significant. The findings could help AI researchers better understand—and potentially control—how and when language models adopt different learning strategies, which could contribute to making them more efficient, interpretable, and reliable in the future.