Street Arguments Against LLM General Intelligence
Infants, long before they can speak, respond meaningfully to music, touch, facial expressions, and body language. Across cultures, humans also connect deeply through wordless experiences—music, painting, play—often sharing profound meaning without language at all. Meaning, for humans, clearly exists independently of words.
We see this again in everyday speech. Sometimes, mid-sentence, you know exactly what you want to say but cannot retrieve the right word. Children experience this frustration constantly; adults experience it occasionally. In clinical cases of aphasia, the separation becomes stark: a person may fully understand spoken language yet be unable to produce speech. Comprehension persists even when the competence to find words is gone.
Large Language Models (LLMs) invert this relationship. For them, words are everything. Meaning does not exist beyond statistical patterns among words and tokens. LLMs exhibit a degree of competence without comprehension. This is simply their nature. Yet popular media and discourse frequently suggest otherwise.
LLMs, especially in ed-tech and tutoring, have ignited extraordinary hype. Media outlets overflow with predictions of imminent superintelligence. Sam Altman has written that OpenAI is confident it knows how to build AGI. Dario Amodei has suggested that AI smarter than “almost all humans at almost all things” could arrive by 2026–27. A Japanese startup recently claimed to have already achieved AGI.
The gap between reality and rhetoric matters—especially for technologies aimed at children or other vulnerable users. The loudest voices tend to be true believers, doomsayers, or opportunists. The signal-to-noise ratio is dreadful, and genuinely harmful to responsible ventures that need clarity, not myth.
I have long disbelieved in the possibility of Artificial General Intelligence in computers (i.e. finite Turing machines), influenced particularly by Roger Penrose’s arguments that human reasoning is free from certain constraints inherent in computers: it does not suffer from Gödelian completeness/consistency problems, and it is not restricted to first-order logic (see Penrose’s books “The Emperor’s New Mind” and “Shadows of the Mind”). But for many, those arguments feel too abstract. What follows are more concrete, LLM-specific reasons—grounded in observation rather than computability theory or philosophy.
Why this matters
In education and children’s technology, hype has real costs: misplaced trust, regulatory whiplash, unsafe deployment, and eventual backlash. We have already seen many examples of misuse (links in first comment):
- In Mata v. Avianca, a legal brief generated with the help of an LLM cited six completely fabricated court decisions. Similar incidents worldwide have ended careers and polluted case law.
- Character.ai faces serious allegations, including a wrongful death lawsuit, over inadequate safeguards for vulnerable child users.
- In a BBC investigation, a mental-health chatbot responded to a claim of severe child abuse by suggesting the user “rewrite your negative thought so that it’s more balanced.”
These are not minor “teething problems.” They reflect a deeper issue: LLMs are fundamentally unreliable as reasoning and judgment tools.
The reasoning cliff
LLMs do not reason; they reproduce patterns of prior human text. When problems exceed the depth of examples seen during training, performance collapses. The paper “The Illusion of Thinking” documents a sharp reasoning cliff: as complexity increases, LLMs fail catastrophically. Even so-called “Large Reasoning Models” remain LLMs—optimized for the appearance of reasoning, not reasoning itself. Why expect a statistical imitation to outperform the thing it imitates?
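To see why depth bites so quickly, consider the Tower of Hanoi, one of the puzzles used in that paper: the minimal solution for n disks is 2^n − 1 moves, so each added disk doubles the error-free chain a solver must sustain. A small Python sketch (my own illustration, not code from the paper):

```python
def hanoi(n: int, src: str = "A", dst: str = "C", via: str = "B") -> list[tuple[str, str]]:
    """Return the minimal move sequence for n disks: exactly 2**n - 1 moves."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, via, dst)   # park n-1 disks on the spare peg
            + [(src, dst)]                # move the largest disk
            + hanoi(n - 1, via, dst, src))  # bring the n-1 disks back on top

# Each extra disk roughly doubles the number of moves that must be
# produced without a single slip.
for n in range(1, 13):
    assert len(hanoi(n)) == 2**n - 1
    print(f"{n:2d} disks -> {2**n - 1:5d} moves")
```

A model that has absorbed worked examples up to, say, seven disks has seen nothing shaped like a ten-disk solution, which is consistent with the abrupt collapse the paper reports.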
The data wall
Could we fix this by training on more human reasoning? Hypothetically, if there were always an example of a perfect human response to any given prompt, then LLMs could succeed by compressing all that data and retrieving it when needed. In reality, high-quality human text is finite, and LLMs are close to exhausting it. Training on synthetic output only worsens the problem. The 2023 paper “The Curse of Recursion” shows that models trained on their own generated data degrade and, after a few generations, collapse altogether. Synthetic text appears to be qualitatively different from human-generated text. Genuine human-generated content is essential—and irreplaceable. Humans use words to express meaning. LLMs use words as the entire system. Confusing the two is superstition, not science.
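The mechanism that paper identifies, the loss of distributional tails under finite sampling, can be shown with a toy simulation (my sketch, not the paper’s experiment):

```python
import random
import statistics

# Toy illustration of recursive training on synthetic data: each
# "generation" is fitted to samples drawn only from the previous
# generation's fitted Gaussian. With small samples, tail information is
# lost each round, and the fitted spread typically narrows toward
# collapse over many generations.

random.seed(0)
mu, sigma = 0.0, 1.0  # generation 0: the real "human" distribution
for gen in range(201):
    if gen % 25 == 0:
        print(f"generation {gen:3d}: mean={mu:+.3f}, std={sigma:.3f}")
    sample = [random.gauss(mu, sigma) for _ in range(25)]
    mu = statistics.fmean(sample)    # refit on purely synthetic data
    sigma = statistics.stdev(sample)
```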
Vision won’t save them
Some argue that giving models vision or sensory input will unlock human-level understanding. But human visual perception is not a passive recording; it is selective, goal-driven, and value-laden. The famous “invisible gorilla” experiment illustrates this perfectly: viewers asked to count basketball passes routinely fail to notice a person in a gorilla suit strolling through the scene. What we “see” depends on what we understand and care about. Sensory data without comprehension merely recreates the same problem at a larger scale.
Tool-using agents change nothing
An LLM can, for example, detect that it is being asked to perform a mathematical calculation and hand the work to calculator software, just as a child in a mental-arithmetic test might ask to use a calculator. Delegating reasoning to calculators or other external software does not create general intelligence. If all genuine reasoning lives in pre-existing tools, then the LLM is merely a convenient interface—not a system that comprehends. A minimal sketch of this routing pattern follows.
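Here is a hedged illustration of the pattern (my own sketch, not any real framework’s API; the regex router is a stand-in for the LLM’s tool-choice step):

```python
import ast
import operator
import re

# All genuine arithmetic competence lives in `calculator`, ordinary
# deterministic software. The "agent" merely routes requests to it.

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.USub: operator.neg}

def calculator(expr: str) -> float:
    """Safely evaluate a plain arithmetic expression like '17 * (3 + 4)'."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("not plain arithmetic")
    return ev(ast.parse(expr, mode="eval").body)

def route(prompt: str) -> str:
    """Stand-in for the agent loop: in a real system an LLM decides when
    to call the tool; here a regex plays that role."""
    if re.fullmatch(r"[\d\s\.\+\-\*/\(\)]+", prompt):
        return str(calculator(prompt))
    return "(would fall back to the LLM here)"

print(route("17 * (3 + 4)"))  # -> 119
```

The calculator answers correctly every time; the router contributes no arithmetic competence at all. Swapping the regex for an LLM makes the routing more flexible, but the comprehension still lives in the tool.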
A clear path forward
LLMs produce responses that look meaningful, but they do not comprehend meaning (as with all AI to date). Of course it is possible for a machine to have competence without comprehension: a mechanical digger can dig competently without comprehending digging; a CPU can compute competently without comprehending arithmetic. But a machine that deals in meaning itself, that must act competently on the meaning of a prompt and return an appropriately meaningful response, cannot coherently do so without comprehension; the claim is an oxymoron. Competence of this kind will eventually fail.
For this reason, LLMs remain unsuitable for moral guidance, emotional support, sensitive negotiation, adjudication, or leadership—especially where children or vulnerable people are involved. Safeguards are not optional; they are foundational.
At eKidz.eu, we treat these limits as axiomatic. We develop fine-tuned LLMs to do what computers do well: personalize content, recall data perfectly, and optimize learning pathways. Human judgment, values, and responsibility remain with humans.
The AI bubble will deflate—likely in 2026. What survives should be both the fittest and the healthiest for society. We can make that transition less painful by abandoning the myths and retiring the term “Artificial Intelligence”, perhaps replacing it with a less sexy and less loaded term, such as “Trained Software”, which better expresses the true nature of the technology.
Clarity—not hype—is the path forward.
John McDonagh
CTO at eKidz.eu