Study Looks at Whether AI Language Models Can Replace Human Participants in Research


As generative AI systems grow more advanced, their potential applications in research continue to expand. However, a recent study has found critical limitations when using large language models (LLMs) as stand-ins for human participants in qualitative research. While LLMs offer efficiency and cost savings, researchers from Carnegie Mellon University (CMU) have concluded that these models fall short of replicating the depth and nuance of human experience required for certain types of studies.

The research, presented at CHI 2025 in Japan, explored whether LLM-based agents could adequately replace human voices in qualitative studies, such as interviews or ethnographic research, where subjective perception, emotional context, and personal narrative are essential. According to TechXplore, the study involved 19 people who interacted with an LLM via a chatbot interface, with the goal of assessing the accuracy, depth, and ethical implications of the responses the AI generated.

Findings revealed that LLMs, although capable of producing coherent and structured answers, often merged conflicting viewpoints into a single narrative. For example, in a hypothetical study on factory conditions, an LLM might blend the perspectives of a manager and a floor worker into a unified—but misleading—response. This conflation distorts reality, erasing important distinctions between social roles, experiences, and power dynamics, according to the researchers.

Another major concern is informed consent. LLMs are typically trained on large datasets, some of which include publicly available user data scraped from online platforms. The study questions whether individuals whose data contribute to training these models ever truly consented to that use, especially in contexts where their words are repurposed as synthetic participants in scientific studies.

The research also raises broader concerns about how LLMs may encode and perpetuate societal biases embedded in their training data. These models, the researchers warn, do not simply reflect knowledge—they shape it, potentially reinforcing dominant narratives while marginalizing others.

Ultimately, the conclusion is clear: while LLMs may serve as useful tools in research support roles, they cannot authentically replicate the human voice in studies where understanding individual perspectives is key. The technology, for now, is best seen as a complement to—not a replacement for—human insight in qualitative research.