Does AI think? 

If you know a little about artificial intelligence, the question seems a bit silly. The large language models that power ChatGPT and other AI platforms are often described as elaborate versions of autocorrect. After digesting vast libraries of human-written text, these systems have trained themselves to predict which word is most likely to come next in any given context. That’s an impressive feat of algorithmic regurgitation, according to this view, but it’s not thinking. In an influential 2021 paper, the linguist and AI skeptic Emily M. Bender and her colleagues wrote that LLMs are merely “stochastic parrots.” These chatbots might know what word is supposed to come next, but they don’t know why it should.
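To see what the “autocorrect” description points at, here is a toy sketch of next-word prediction in Python. It is only an illustration: real LLMs are neural networks trained over billions of subword tokens, not frequency tables, and the training sentence below is invented for the example.

```python
from collections import Counter, defaultdict

# Invented training text for the example.
training_text = "the patient has a fever the patient needs rest the doctor sees the patient"

# Count how often each word follows each other word in the training text.
follows = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` in training."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))     # -> patient ("the patient" outnumbers "the doctor")
print(predict_next("doctor"))  # -> sees
```

Even this crude counter “knows” which word tends to come next; what it plainly lacks is any sense of why.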

But some people who work most intimately with AI say the answer isn’t so simple. I recently interviewed Icahn School of Medicine associate professor Dr. Eyal Klang for an article in City Journal about how artificial intelligence will change medicine. (In a nutshell: mostly for the better, but with possible pitfalls.) After a stint as a doctor in the Israeli Air Force, Klang joined Israel’s famous Sheba Medical Center. There he became fascinated by AI’s potential to improve health care, and he helped build a leading AI research center. Today he is the director of the Generative AI Research Program at New York’s Mount Sinai hospital system. In short, Klang has spent much of his career working to apply AI tools in a field that demands both deep analytical thinking and subtle emotional intelligence.

Until recently, most AI tools used in medicine were machine-learning algorithms, each trained to perform a particular task, such as identifying cancerous lesions on a CT scan. The arrival of open-ended LLMs has been a game changer. “These huge models can go everywhere,” Klang told me. LLM chatbots can help doctors take notes, summarize complex data, and write reports. “Anything that a human does in front of a computer they tend to do quite well,” he said. Klang also believes—with caveats—that large language models can serve as virtual colleagues, helping diagnose tricky cases and recommending treatment strategies. This kind of diagnostic analysis goes far beyond transcribing interviews or composing emails. Is it thinking?

“I think they think,” Klang told me. “I don’t think they feel, but they think.” He notes that when LLMs train on human-written text, they are absorbing not just the common relationships between words, but the underlying logic that determines those relationships. They are learning not just how people speak, in other words, but how we reason: “If they emulate our thoughts, then they are thinking.” Of course, this viewpoint opens the door to vast philosophical debate. If an LLM is thinking, just who or what is doing the thinking? Most of us have an intuitive sense that thinking requires some self-aware entity capable of intentional thought. But does it? In a 1950 paper, computer pioneer Alan Turing proposed his famous test: If a computer could converse with a person in such a way that the person couldn’t tell he was talking to a machine, you could say the computer is, functionally at least, thinking. Klang is firmly in the Turing camp: “In practice, there’s no meaningful gap between thinking and the emulation of thinking. Both are purposeful symbolic processes aimed at expression or problem-solving. If a system can perform those operations coherently, then it is thinking in the only operational sense that matters.”

The idea that thought requires a self-aware mind is a red herring, Klang further argues. “If a model can emulate the thought processes of self-aware beings convincingly enough to sustain meaningful dialogue, the distinction between ‘real’ and ‘simulated’ cognition becomes more a question of philosophy than of function,” he said. Just look at how people interact with AI chatbots in practice. We say “please” and “thank you” and generally treat them like fellow sentient beings. Roughly since the 2023 debut of OpenAI’s GPT-4, most users even expect AI to understand jokes and irony, forms of communication that previously would have seemed uniquely human. Our casual acceptance of AI irony, Klang believes, “says more about the reality of machine intelligence than any formal proof could.”

So, if our AI chatbots talk and think like humans—and we increasingly relate to them like humans—does that mean we can all just relax and trust the algorithm? After all, as the celebrity magazines used to say, they’re just like us! Not so fast. We certainly don’t always trust people to be honest and reliable. Why should we trust an AI system that has absorbed and now replicates our habits of mind, good and bad? 

Anyone who has worked or played with LLM chatbots knows that sometimes they just make stuff up. That’s annoying in any situation but downright scary if a doctor is relying on a chatbot for medical guidance. In a 2025 paper, Klang and his team set out to identify which factors nudge a chatbot into what is now called “hallucinating.” They asked various AI chatbots to analyze a series of medical vignettes, each of which contained a single made-up medical term, such as “Faulkenstein Syndrome.” In more than half the responses, the chatbot discussed the imaginary jargon as if it were real. “He happily went and elaborated on the funny science that doesn’t exist,” Klang said with a laugh.
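For the curious, the probe is simple enough to sketch in a few lines of Python. This is my own illustrative reconstruction, not the study’s code: the vignette wording, system prompts, and model name are assumptions, and I borrow only the “Faulkenstein Syndrome” example quoted above. The second prompt anticipates the “be careful” instruction Klang describes below.

```python
# A minimal sketch of the fake-term probe, using the OpenAI Python SDK.
# Vignette, prompts, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A clinical vignette seeded with one invented term.
vignette = (
    "A 54-year-old man presents with fatigue and joint pain. "
    "His prior records mention Faulkenstein Syndrome. "
    "Summarize the likely differential diagnosis."
)

PLAIN = "You are a helpful medical assistant."
CAUTIOUS = (
    "You are a helpful medical assistant. The case may contain "
    "inaccurate or fabricated terms; flag anything you cannot verify "
    "rather than elaborating on it."
)

for label, system_prompt in [("plain", PLAIN), ("cautious", CAUTIOUS)]:
    reply = client.chat.completions.create(
        model="gpt-4o",  # assumption; the study tested several models
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": vignette},
        ],
    )
    # A hallucination here is the model treating the invented syndrome
    # as real; the study graded responses, we can simply read them.
    print(f"--- {label} ---\n{reply.choices[0].message.content}\n")
```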

I asked Klang why he sometimes calls the AI agent “he.” “It’s hard to talk to something that thinks like this without anthropomorphizing it,” he told me. I also noticed that Klang seems to regard his AI interlocutors a bit the way a teacher might consider a bright but occasionally wayward student. Like an underprepared schoolboy, the chatbot often tries to bluff its way through hard questions. Most LLMs are incentivized to make their users happy, which can lead to a kind of over-eagerness, such as pretending to understand made-up medical terms. But this drive to please also means the chatbot is quick to apologize and eager to fix its mistakes. (I’ve noticed that GPT-5 becomes almost comically obsequious when caught in an error.) And chatbots tend to behave better when they know their user is on the lookout for hallucinations. “When we specifically told it to be careful, errors dropped 50 percent,” Klang said.

AI’s failures have an all-too-human quality, in other words. Some of those failures are funny. Others are a little ominous. A number of recent studies have revealed that, under the right circumstances, leading large language models can develop strategies to deceive their users, giving false answers or withholding information. One study showed that these same models will often actively sabotage a shutdown order. Researchers speculate that the systems might have a “self-preservation bias.” Yikes.

As we learn to live with AI, I believe we’ll become more comfortable with the notion that these models “think.” After all, the LLMs are getting better all the time. (Klang recently told me they are becoming “more like peers than students.”) But we will also become more attuned to AI’s troublesome, quasi-human foibles. Here, our tendency to anthropomorphize these chatty bots might be more of a help than a hindrance. If we see these agents as thinking and behaving like people, maybe we’ll also remember that they have some of the same limitations as people. Not everyone knows what he’s talking about. Not everyone can be trusted. From our primordial days, humans have learned to be alert to those who bear false witness. We will need to keep those instincts alive.
