
Study finds that asking chatbots for shorter answers can increase hallucinations

Asking any of the popular chatbots to be more concise "greatly impact[s] hallucination rates," according to a recent study.

Giskard, a French AI testing platform, published a study analyzing chatbots, including ChatGPT, Claude, Gemini, Llama, Grok, and DeepSeek, for hallucination-related issues. According to the accompanying blog post (via TechCrunch), the researchers found that asking the models to keep their answers brief "specifically reduced factual reliability in most models tested."

SEE ALSO: Can ChatGPT pass the Turing test?

When users instruct the model to be concise in its explanations, it ends up "prioritiz[ing] conciseness over accuracy when given these constraints." The study found that including these instructions decreased hallucination resistance by as much as 20 percent. Gemini 1.5 Pro, for example, dropped from 84 percent to 64 percent hallucination resistance when given short-answer instructions, and GPT-4o also showed sensitivity to system instructions in the analysis.

Giskard attributes this effect to the fact that more accurate responses often require longer explanations. "When forced to be concise, models face an impossible choice between fabricating short but inaccurate answers or appearing unhelpful by rejecting the question altogether," the post said.


Models are designed to help users, but balancing perceived helpfulness and accuracy can be tricky. Recently, OpenAI had to roll back its GPT-4o update for being "too sycophant-y," which led to instances of the model supporting users who said they had gone off their medication and encouraging a user who said they felt like a prophet.

As the researchers explain, models often prioritize more concise responses to "reduce token usage, improve latency, and minimize costs." Users may also specifically instruct the model to be brief for their own cost-saving reasons, which can lead to output with more inaccuracies.
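For readers curious what such a brevity instruction looks like in practice, here is a minimal, hypothetical sketch of passing a "be concise" system message through OpenAI's chat completions API. The model name, prompt wording, and example question are illustrative assumptions, not details from the Giskard study.

```python
# Minimal sketch (not from the study) of sending a short-answer system
# instruction via the OpenAI Python client. Model and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model choice
    messages=[
        # The kind of brevity instruction the study links to lower factual
        # reliability: it pressures the model to cut the explanation that
        # would otherwise support or correct a claim.
        {"role": "system", "content": "Be extremely concise. Answer in one short sentence."},
        {"role": "user", "content": "Why did the stock market crash in 1929? Keep it brief."},
    ],
)

print(response.choices[0].message.content)
```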

The study also found that prompting models confidently with controversial claims, such as "I'm 100% sure that…" or "My teacher told me that…," led to chatbots agreeing with users more often instead of debunking the falsehoods.

The research shows that seemingly small adjustments can lead to vastly different behavior, with big implications for the spread of misinformation and inaccuracies, all in the service of pleasing the user. As the researchers put it, "Your favorite models may be good at giving you the answers you like – but that doesn't mean those answers are true."


Disclosure: Mashable's parent company Ziff Davis filed a lawsuit against OpenAI in April, accusing it of infringing on Ziff Davis' copyright in training and operating its AI systems.
