Ever since ChatGPT was released to the public in late 2022, artificial intelligence experts have been asking whether it passes the Turing test: producing output indistinguishable from human responses. According to two researchers from the University of California San Diego, it comes close, but not quite.
ChatGPT is undoubtedly smart, quick, and apparently intelligent (it sounds humanlike and even displays humor), but it also occasionally provides completely false information: it hallucinates and does not reflect on its own output.
The researchers, Cameron Jones and Benjamin Bergen, turned to the work of Alan Turing, who proposed a test of whether a machine could reach a level of intelligence and conversational prowess at which it could fool someone into thinking it was human.
In their report “Does GPT-4 Pass the Turing Test?” they gathered 650 participants and generated 1,400 “games”: brief conversations between a participant and either another human or a GPT model. Participants were then asked to judge whether they had been talking to a human or a machine.
According to Techxplore, GPT-4 models fooled participants 41% of the time, while GPT-3.5 models fooled them only 5–14% of the time. More interesting still, human interlocutors convinced participants they were not machines in only 63% of the trials.
The researchers concluded that they did not find evidence that GPT-4 passes the Turing test. However, they noted that the Turing test remains a valuable measure of the effectiveness of machine dialogue.
Nevertheless, they warned that chatbots can still communicate convincingly enough to fool users in many instances. “A success rate of 41% suggests that deception by AI models may already be likely, especially in contexts where human interlocutors are less alert to the possibility they are not speaking to a human,” they said, adding that AI models that can impersonate people could have widespread social and economic consequences.
When it came to identifying their conversation partner, participants focused on several cues. Formality, for example: models that were too formal or too informal raised red flags. Other indicators included responses that were too wordy or too brief, and grammar and punctuation that were either too good or “unconvincingly” bad.
Lastly, participants were reportedly sensitive to generic-sounding responses. The researchers explained: “LLMs learn to produce highly likely completions and are fine-tuned to avoid controversial opinions. These processes might encourage generic responses that are typical overall, but lack the idiosyncrasy typical of an individual: a sort of ecological fallacy.”
The researchers concluded that it is important to track AI models as they grow more fluent and absorb more humanlike conversational quirks, stating that it will become increasingly important to identify the factors that lead to deception and to develop strategies to mitigate it.