New Study Highlights Gaps in AI Search Accuracy and Reliability

A recent study has raised important concerns about the reliability of AI-powered search tools, showing that many of their responses are not backed by the sources they cite. The findings suggest that while these tools may be fast and convenient, their outputs often require closer scrutiny.

The study, conducted by researchers at Salesforce AI Research, examined the performance of several leading AI-based search platforms—including Perplexity, You.com, Microsoft’s Bing Chat, and OpenAI’s GPT-4.5—using a purpose-built evaluation framework called DeepTRACE. The audit involved more than 300 questions and assessed the tools across eight different criteria.

According to TechXplore, the results showed significant inconsistencies. GPT-4.5, for example, produced unsupported claims in nearly half of its responses (47%), while the rates for the other tools fell between 30% and 40%. In many cases, citations either failed to support the claims they accompanied or pointed to unrelated material altogether.

The audit also tested how these systems handle two types of questions: debate-oriented prompts, which involve contentious or politically sensitive topics, and expertise-driven queries that demand subject-specific knowledge. In debate scenarios, AI responses often presented a one-sided viewpoint, with little consideration for counterarguments. These unbalanced replies were frequently delivered in an authoritative tone, raising concerns about their potential to reinforce user biases and limit exposure to diverse perspectives.

Human reviewers verified the results produced by DeepTRACE to ensure the findings were accurate and reflective of real-world usage.

Beyond identifying flaws, the study also demonstrates a path forward. DeepTRACE provides a practical method for systematically evaluating how AI systems handle information retrieval and citation. The researchers argue that such frameworks are essential as AI tools become more integrated into public and professional information workflows.
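To make the kind of scoring such a framework performs more concrete, here is a minimal sketch of how an unsupported-claim rate could be computed once each statement in a response has been matched against its citations. This is an illustration only, not the DeepTRACE implementation; the `Statement` structure and the `supported` labels are hypothetical stand-ins for judgments that, in the study, were produced automatically and then verified by human reviewers.

```python
from dataclasses import dataclass

@dataclass
class Statement:
    text: str
    citations: list[str]  # sources the answer cites for this statement
    supported: bool       # whether at least one cited source actually backs it

def unsupported_rate(statements: list[Statement]) -> float:
    """Fraction of statements whose citations fail to support them."""
    if not statements:
        return 0.0
    return sum(not s.supported for s in statements) / len(statements)

# Toy three-statement answer in which one claim lacks real support.
answer = [
    Statement("X rose 12% in 2024.", ["source-a"], supported=True),
    Statement("Experts agree the trend will continue.", ["source-b"], supported=False),
    Statement("The policy took effect in March.", ["source-a"], supported=True),
]
print(f"Unsupported-claim rate: {unsupported_rate(answer):.0%}")  # prints 33%
```

A per-response rate like this, averaged over hundreds of questions, is what allows figures such as the 47% reported for GPT-4.5 to be compared across systems.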

While AI continues to offer benefits in automating search and research tasks, the findings serve as a reminder that users should approach outputs critically and verify sources when accuracy matters.

The full findings are available on the arXiv preprint server.