If a machine or an AI program matches or surpasses human intelligence, does that mean it can simulate humans perfectly? If yes, then what about reasoning, our ability to apply logic and think rationally before making decisions? How could we even recognize whether an AI program can reason? To try to answer this question, a team of researchers has proposed a novel framework that works like a psychological study for software.
“This test treats an ‘intelligent’ program as though it were a participant in a psychological study and has three steps: (a) test the program in a set of experiments examining its inferences, (b) test its understanding of its own way of reasoning, and (c) examine, if possible, the cognitive adequacy of the source code for the program,” the researchers note.
They suggest that the standard methods of evaluating a machine’s intelligence, such as the Turing test, can only tell you if the machine is good at processing information and mimicking human responses. The current generation of AI programs, such as Google’s LaMDA and OpenAI’s ChatGPT, has come close to passing the Turing test, yet the test results don’t imply that these programs can think and reason like humans.
This is why the Turing test may no longer be relevant, and why there is a need for new methods that can effectively evaluate the intelligence of machines, according to the researchers. They claim that their framework could be an alternative to the Turing test. “We propose to replace the Turing test with a more focused and fundamental one to answer the question: do programs reason in the way that humans reason?” the study authors argue.
What’s wrong with the Turing test?
During the Turing test, evaluators play different games involving text-based communications with real humans and AI programs (machines or chatbots). It is a blind test, so evaluators don’t know whether they are texting with a human or a chatbot. If the AI programs succeed in generating human-like responses, to the extent that evaluators struggle to distinguish between the human and the AI program, the AI is considered to have passed. However, since the Turing test is based on subjective interpretation, these results are also subjective.
The researchers suggest that there are several limitations associated with the Turing test. For instance, any of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of the messages they receive. ChatGPT is great at mimicking human language, even in responses where it gives out incorrect information. So the test clearly doesn’t evaluate a machine’s reasoning and logical ability.
The results of the Turing test also can’t tell you if a machine can introspect. We often think about our past actions and reflect on our lives and decisions, a critical ability that prevents us from repeating the same mistakes. The same applies to AI as well, according to a study from Stanford University, which suggests that machines that can self-reflect are more practical for human use.
“AI agents that can leverage prior experience and adapt well by efficiently exploring new or changing environments will lead to much more adaptive, flexible technologies, from household robotics to personalized learning tools,” Nick Haber, an assistant professor from Stanford University who was not involved in the current study, said.
In addition to this, the Turing test fails to evaluate an AI program’s ability to think. In a recent Turing test experiment, GPT-4 was able to convince evaluators that they were texting with humans over 40 percent of the time. However, this score fails to answer the basic question: Can the AI program think?
Alan Turing, the famous British scientist who created the Turing test, once said, “A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.” His test only covers one aspect of human intelligence, though: imitation. Although it is possible to deceive someone using this one aspect, many experts believe that a machine can never achieve true human intelligence without the other aspects as well.
“It’s unclear whether passing the Turing test is a meaningful milestone or not. It doesn’t tell us anything about what a system can do or understand, anything about whether it has established complex inner monologues or can engage in planning over abstract time horizons, which is key to human intelligence,” Mustafa Suleyman, an AI expert and co-founder of DeepMind, told Bloomberg.