Has ChatGPT Gone Lazy?

ChatGPT recently turned one year old. After the chatbot changed the digital landscape, OpenAI now faces backlash over reports that it refuses to complete tasks or puts minimal effort into its responses, sometimes even responding rudely.

The sudden shift in behavior has fueled speculation about intentional changes made by the company, potentially to optimize efficiency and conserve resources amid the launch of Google’s new chatbot Gemini AI.

According to Interesting Engineering, Reddit forums and OpenAI’s developer platforms are flooded with user complaints about ChatGPT becoming increasingly problematic and less useful. For example, instead of providing comprehensive code for requests, it now offers snippets and directs users to complete the task themselves.

Many users are frustrated, and some are even questioning the chatbot’s original purpose and value. Some suspect that OpenAI deliberately modified ChatGPT to prioritize efficiency over detailed responses, since AI systems like the chatbot require immense and costly computing power. If this theory is correct, OpenAI is seeking a more economical solution at the possible expense of user experience.

OpenAI responded on Twitter/X, expressing surprise at the perceived change and confirming that there had been no recent model updates. The company wrote: “We’ve heard all your feedback about GPT4 getting lazier! we haven’t updated the model since Nov 11th, and this certainly isn’t intentional. model behavior can be unpredictable, and we’re looking into fixing it.”

The company continued by explaining that “training chat models is not a clean industrial process. Different training runs even using the same datasets can produce models that are noticeably different in personality, writing style, refusal behavior, evaluation performance, and even political bias.

When releasing a new model, we do thorough testing both on offline evaluation metrics and online A/B tests. After receiving all these results, we try to make a data-driven decision on whether the new model is an improvement over the previous one for real users.”