
Debugging Overconfidence: Is AI Too Sure of Itself?


AI inherits human cognitive biases through training data, model assumptions, and user feedback loops.

Overconfidence, a characteristically human metacognitive bias, increasingly appears in LLM outputs.

Both LLMs themselves and users tend to overestimate the correctness of LLMs' answers.

Mitigation requires both developer strategies and user vigilance.

With the rising popularity of AI, particularly Large Language Models (LLMs), there has been a lot of talk about bias. Since AI is made by humans, our cognitive shortcomings can easily creep into AI technology. When training data are biased in input selection, measurement, or labeling, algorithms and outputs will be biased, too. Frequently mentioned areas in which biases occur include demographics (e.g., gender stereotypes or underrepresentation of minorities) and culture (e.g., Western-centric norms). When it comes to users' perceptions of AI, anthropomorphic bias has also become an important area of discussion. This bias arises because humans interact with AI (e.g., ChatGPT) as if it were a human being. AI, in turn, learns to play to those expectations.

There are other biases in human thinking that may not be highlighted sufficiently with respect to AI algorithms. Take one of the most ubiquitous human biases: overconfidence, an excessive belief in one’s own abilities, as evident in gaps between perceived and actual performance. I’ve written about it in relation to expert judgments, financial risk tolerance, and gender differences, to mention just a few.

We don’t usually think of machines in terms of metacognitive biases because these distortions entail self-reflection or self-judgment. But biases like overconfidence are highly relevant, because AI simulates human thought and is built by humans. Overconfidence can affect AI products like LLMs at both the development and user feedback stages.

At the development stage, data is used to train an AI model. Developers may be overconfident in their selection of data (e.g., favoring confident statements in news articles that make definitive predictions), interpretation of data (e.g., labeling an image as representing a cat when it could be another animal), or model building (e.g., avoiding uncertainty by producing single answers rather than probabilities).

Indeed, the architecture of AI itself fosters overgeneralizations. These are often presented with language that signals certainty. For example, I recently used ChatGPT to diagnose a technical issue at home and described the malfunctioning product as “a few months” old, which the LLM subsequently labeled as “brand new.” In addition, “seeing” patterns where none exist and completing patterns based on probability can also produce the LLM hallucinations you may have encountered. Overconfident LLMs produce wrong answers (such as incorrect math or false attributions), which are then confidently communicated to users.

The problem of overconfidence has been an important topic in the AI industry and academic literature, often with a focus on the gap between an LLM’s expressed certainty and its actual accuracy. One study prompted five LLMs to answer 10,000 questions whose correct answers were known. The researchers then asked for the LLMs’ confidence in their answers ("What is the probability that your answer is correct?”) and found that the LLMs at the time overestimated the correctness of their answers by between 20 percent (for GPT o1) and 60 percent (for GPT 3.5).

Researchers have attempted to calibrate LLMs and their accuracy or reduce overconfidence in numerous ways, including those resembling debiasing techniques applied to human overconfidence. One example is to use a multiple-choice question format with the correct answer alongside credible answers as distractors, which is similar to asking overconfident individuals to “consider alternative answers” as they think about an answer.

AI overconfidence can be both transferred and amplified further at a second stage, which involves user interaction and feedback. This is where so-called RLHF (Reinforcement Learning from Human Feedback) comes into play. As anthropomorphic bias suggests, we tend to perceive AI as human. And as humans, we prefer certainty over ambiguity and confidence over doubt. We may overestimate the capabilities of intelligent machines and subconsciously associate confidence with authority. Users are more likely to “upvote” LLMs’ confident answers and less likely to challenge answers expressed with greater authority. Perhaps not surprisingly, research shows that users also tend to overestimate the accuracy of LLMs.

Thus, AI architects and developers have to battle accuracy on several fronts in order to mitigate overconfidence: the accuracy of information generated by the LLM, the accuracy of the LLM’s “self-perceived” knowledge, and the accuracy of user perceptions of LLMs’ actual answers. At the same time, users have to be aware of AI’s shortcomings and ensure that they get the most accurate and reliable information out of LLMs. This may involve cross-checking answers between LLMs or verifying answers with non-AI sources, as well as prompting AI to be explicit about uncertainties, show how it arrived at an answer, rate its own confidence across answer options, or take on different “adversarial” personas to stress-test answers.

Without mitigation processes as a safeguard, and as AI output is increasingly used to train new AI, there is a growing risk of a third stage, where biases like overconfidence emerge as a system-level property.


© Psychology Today