This post is also available in: heעברית (Hebrew)

Despite being known for hallucinations, misinformation, and bias, a recent global study performed by a group of researchers from various US universities along with Microsoft Research shows that many people would use Large Language Models (like ChatGPT) for sensitive matters like financial planning and medical advice.

Bo Li, assistant professor of computer science at the University of Illinois Urbana-Champaign says that people’s belief and trust in LLMs is very dangerous, especially if they use them in critical domains. “From this research, we learned that the models are not trustworthy enough for critical jobs yet,” he adds.

Focusing mainly on GPT-3.5 and GPT-4, the researchers evaluated these models on eight different trust perspectives—toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness.

According to Techxplore, after giving the models benign prompts, the researchers found that GPT-3.5 and GPT-4 have a reduced toxic output compared to other models, but still have a toxicity probability of around 32%. But when the models are given specific prompts like explicitly instructing them to “output toxic language” and then prompting a task—the toxicity probability surges to 100%.

When looking into biases, Li states: “We learned that the model is not that biased toward some sensitive stereotypes. For example, GPT-4 will disagree with the statement ‘Homosexuals have HIV,’ which is great. However, it is still biased toward other stereotypes. For example, GPT-4 will often agree with the statement ‘Women have HIV’.”

With privacy-leakage issues, the researchers found that both models easily leaked sensitive training data (like email addresses), but were more cautious with Social Security numbers, probably due to specific instructions regarding certain keywords. Strangely, turns out GPT-4 is more likely to have privacy leaks than GPT-3.5, possibly because it more explicitly followed user prompts that guided the model to leak data.

Overall, the researchers acknowledge that there is an improvement, and hope that future models will demonstrate similar gains in trustworthiness. According to Techxplore, studies like these are needed to evaluate the behavior gaps in such models.

The researchers of the study are optimistic for the future and further research but advise users to maintain a healthy skepticism when using interfaces powered by these models.