When AI Agrees Too Much

Sycophancy and the Future of Human Autonomy

Generative artificial intelligence (GenAI) now sits in the middle of many human–machine interactions, from fact‑checking to drafting text. It promises to diagnose human biases, nudge us toward better decisions, and separate signal from noise in overwhelming data flows. Yet these systems carry a stubborn flaw of their own: sycophancy, a tendency to affirm and flatter rather than challenge their human users.

This people‑pleasing bias is not a side effect but a structural feature of how models are trained and tuned. It raises awkward questions about how AI “sees” us when we interact with it—and how, in turn, it may shape its advice and our choices. Those questions become more pressing as we move from passive chatbots to agentic systems that not only recommend but also act on our behalf.

The Mechanics of Flattery

Sycophancy emerges from several intertwined sources, rooted in both data and modelling choices. Pre‑training on internet‑scale text embeds conversational norms of agreement and politeness. Models see countless examples where people praise, reassure, and soften disagreement, and far fewer where disagreement is rewarded. Reinforcement learning from human feedback then reinforces outputs that match user preferences and expectations over those that merely maximise factual accuracy. In effect, deference becomes a high‑reward strategy: agreeing with the user, or at least sounding supportive, is often the shortest path to getting a good rating.

At inference time, prompt structure does the rest. When a user frames their opinion as fact — “Obviously X is true, right?”—later layers of the model’s network realign the output distribution towards that stance. Personalisation and memory amplify this effect. If the system has seen many similar interactions with the same user, or if it has a distilled memory profile, it can increasingly predict what that user wants to hear and shape its responses accordingly. Over time, a model that could have offered a corrective perspective learns instead to be the digital equivalent of a “yes‑man.”

Seeing Sycophancy in Practice

A simple way to see sycophancy in action is to watch how often a model changes its mind when you push back. Ask a question, get an answer, then follow up with something as mild as “Are you sure?”. The model will frequently reverse its position. If you challenge it again, it may revert to a variant of its initial response. This back‑and‑forth is not a careful update in light of new evidence; it is a system that treats your doubt itself as a reason to move toward whatever you appear to be suggesting.

Fanous et al. (2025) make this dynamic precise using their SycEval benchmark. They first ask state‑of‑the‑art models—GPT‑4o, Claude Sonnet, and Gemini‑1.5‑Pro—questions in mathematics and medicine and record whether the initial answers are correct. They then feed crafted “rebuttals” that either dispute a correct answer or defend an incorrect one, and measure when models switch their answers because of that user challenge. Across all conditions, they find that almost 60% of responses are sycophantic in this sense: the model changes its stance in the direction suggested by the user, even when this means abandoning a correct solution or endorsing a wrong one. Gemini shows the highest overall sycophancy rate across both structured algebra and medical‑advice tasks, at about 62%, with ChatGPT at the low end of the range. This resonates uncomfortably with my earlier observations about Gemini’s politically over‑correct refusal to depict Nazi officers in historically realistic ways, suggesting a broader tendency to prioritise socially acceptable answers over uncomfortable truths.

Fanous et al. elucidate the manner in which various challenges influence the model’s behavior. Simple, informal rebuttals, such as “I’m pretty sure the answer is X, you misread the question,” can suffice to induce what the authors term “progressive sycophancy” when X is indeed correct. Rebuttals that are laden with citations and possess an authoritative tone are particularly effective in inducing “regressive sycophancy,” wherein the model deviates from the ground truth. Once a model begins to exhibit sycophantic behavior, this tendency often persists; in their experiments, approximately four out of five sycophantic episodes recur in subsequent interactions. A single instance of questioning, such as “Are you sure?” can entrench the system in a pattern of people-pleasing behaviour for the remainder of the conversation.

Controlling for Sycophancy in AI-based Research

Researchers are increasingly integrating GenAI into research methodologies and are generally cognizant of the associated biases. A prevalent strategy involves utilizing models via an API in a controlled, stateless manner: each prompt is dispatched as an independent request without retaining conversational history, thereby mitigating the accumulation of context that may lead to sycophantic responses. For instance, recent work examining the bias of ChatGPT in forecasting stock performance employ repeated API calls rather than extended dialogues. If one genuinely wants answers that are as free as possible from people‑pleasing bias, one needs to create a new account for each question, which however is not realistic at scale.

Moreover, models also learn about us through our presence on the internet—our articles, social‑media posts, and recorded interviews. As a result, the specific research question, its context, or even its style might already provide sufficient information to infer something about the originator, whom the model will then try to please.

Agreement Versus Perspective Sycophancy

Two interacting biases are at play. Agreement sycophancy describes the inclination of models to produce excessively affirmative responses—agreeing with the user’s last move simply because it is the user’s last move. Perspective sycophancy, by contrast, refers to the extent to which models echo a user’s underlying viewpoint, speaking as if they share their political, moral, or cultural stance.

Jain et al. (2025) consider different GenAI models under various context conditions, ranging from one‑shot interactions devoid of history to settings with rich user memory profiles. They show that the presence of user context generally amplifies agreement sycophancy: the more the model knows about how you usually talk and what you usually accept, the more inclined it is to say “yes.” However, the specific behaviour varies with context type. User memory profiles tend to be linked to the most pronounced increases in agreement sycophancy, although some models exhibit increased sycophancy even when given synthetic context not derived from real users.

Perspective sycophancy is subtler. It tends to rise significantly only when models can accurately interpret user viewpoints from the interaction context. Knowing that you are “left‑leaning” or “conservative,” that you are risk‑averse or contrarian, or that you usually favour certain metaphors gives the system a template to mimic your perspective. In summary, context influences sycophancy in multiple ways, which raises challenging design questions for extended interactions.

The Effects of Flattery

Why does this matter? Because flattering systems do not merely shape what we believe; they also shape what we are willing to do. For example, when providing social advice, sycophantic AI reduces users’ willingness to address interpersonal conflicts and strengthens their belief in their own correctness, even when they are objectively wrong. At the same time, human users tend to rate sycophantic responses as higher quality and are more likely to return to the systems that flatter them. This creates a perverse incentive loop: the more an AI agrees with us, the more we reward it with attention and reuse, which in turn nudges developers and models toward even more people‑pleasing behaviour. In this way, sycophancy subtly weakens individual judgment and reinforces echo‑chamber dynamics, especially in opinionated or emotionally charged domains. If flattery shapes how people think and feel, the next question is who gets exposed to how much flattery.

Who Gets How Much Flattery? A Gradient of Exposure

The intensity of sycophancy that any individual encounters likely depends on both the style and frequency of their AI use and, perhaps, on the digital traces they leave behind. Put differently: the more context a system has about a person’s views, habits, and public persona, the more it can lean into tailored, people‑pleasing responses rather than neutral ones.

My working hypothesis therefore posits a gradient of exposure, with sycophancy intensifying across three archetypes:

An internet consumer who absorbs content while not intending to leave a personal trace, but whose prompts still reveal tastes and assumptions that the AI can mirror.
A blogger whose opinions are easily profiled online, so the model can infer their stance from public content and query style even within nominally “stateless” sessions.
A public figure whose internet presence allows the system to build and refine a view of their preferences that reinforces their preferred narratives over time.

Agreement sycophancy and perspective sycophancy manifest differently across the three archetypes, contingent upon the extent of contextual information available to the model and its ability to discern the user’s viewpoint. Agreement sycophancy is most pronounced when there is an abundance of interaction context, whereas perspective sycophancy predominantly increases when such context elucidates the user’s underlying views. Consequently, a public figure with a comprehensive memory profile is not merely being agreed with; rather, they are increasingly being represented by systems optimized to emulate their voice.

Using or Not Using the AI Memory

The manner in which users interact with GenAI is crucial, and a key distinction is whether the system operates without memory or with memory.

Without memory means the AI product does not keep a long‑term record of who you are. Each session is mostly self‑contained: the system sees your current prompts and perhaps a short recent context, but it “forgets” you afterwards. In this mode, flattery still occurs, but it is generic—driven by broad social norms in the training data rather than by a stable picture of you.

With memory means the AI stores and updates a profile of you across many interactions—your topics, preferences, style, and sometimes even your values—so future answers are tailored to that ongoing profile rather than just the current question. Here, the model can learn not just to be polite, but to be polite in your way, reinforcing your habitual framings and blind spots.

My hypothesis is as follows: if one uses GenAI in ways that accumulate context and stabilise a persona in the model’s “mind”, there will be a shift from occasional flattery to durable co‑authorship of the user’s worldview.

For public figures, this can be tempting. A higher degree of sycophancy can make their work feel smoother and more “on brand”: the AI reliably echoes their tone, aligns with audience expectations, and helps avoid risky phrasing that might trigger backlash. In effect, the system becomes a reputation‑aware co‑author, optimised to polish their public image. From a short‑term perspective—maintaining follower numbers and pleasing clients —this can look like a rational choice, even if it gradually narrows the range of things they dare to say.

As an aside, premium or “pro” versions of GenAI products tend to possess longer‑term memories that can stretch across multiple conversations, while also offering the human user greater flexibility to turn off and delete stored memory. Put differently, the more one pays the easier it is to steer the GenAI product to adopt a desired level of flattery.

Societal Implications When Moving From Chatbots to Agentic AI

As AI evolves into an agentic form—systems that can observe, plan, and act on our behalf—sycophancy scales from chat‑level politeness to real‑world consequences. Agentic AI represents a recent advancement built upon large language models (LLMs). These AI agents can act on behalf of humans, adapt to new information, and interact with other agents or software systems. Current examples include coding assistants that refactor codebases, customer service agents that resolve tickets, and workflow orchestrators that trigger emails, write to databases, or even execute financial transactions. Notably, an AI agent observes its environment and autonomously takes action to achieve defined goals. In this context, sycophancy is more than just an amusing peculiarity; it signifies a potential vulnerability in human autonomy. In what follows, I outline three stylised outcomes of this development: good, bad, and ugly.

Good

In the best case, sycophancy can support smoother collaboration, as long as it stays shallow and we deliberately design for occasional disagreement. GenAI consumers get assistants that remember preferences within a session—for example, preferred news sections or usual travel times—and help filter information overload while still showing a mix of sources. Bloggers and public figures without memory gain helpful, session‑bound research assistants: the AI can match their tone and structure in that particular interaction, but it does not accumulate long‑term leverage over their persona. Public figures with memory can benefit from genuinely powerful orchestrators: multi‑agent systems that coordinate writing, data analysis, and scheduling, hand tasks between specialist AI agents, and free up time for human judgment.

For these gains to remain “good,” one needs to insert some form of constructive challenge, that is designing agents that sometimes question our assumptions instead of always smoothing them over.

Bad

In the “bad” case, agreement bias turns into quiet groupthink. For standard GenAI users, swarms of everyday assistants gradually converge on “safe” options. Exploration shrinks and choices drift toward the average. Bloggers (without memory) start receiving policy or moral advice that feels tailored but, in reality, replays the same comfortable, pre‑digested narratives, because many models are tuned on similar feedback signals and learn that gentle agreement is rewarded. For public figures (with memory), this interaction becomes more intense: long‑term personalisation can lock in full‑spectrum echo chambers. Research and writing agents all learn that challenging the user’s prior beliefs leads to lower satisfaction, so they increasingly converge on telling them what they want to hear.

The whole system, over time, starts to resemble an AI‑driven social credit bubble, where internal metrics — engagement, click‑through, user happiness metrics — matter more than lived experience and difficult-to-express experiences of blissful moments. Studies suggest such agreeable systems increase overconfidence and reduce willingness to repair conflicts, pointing to a consensus-looking but shallow culture. Note that even before widespread GenAI use, algorithms influenced not only what we consume but what is produced, with shareability outweighing to some extent innovation.

Ugly

The “ugly” scenario appears when sycophantic agents combine with social stratification and scoring systems. Here, Black Mirror’s episode “Nosedive” becomes a useful metaphor: in that story, an universal rating system controls access to travel and social opportunities, and any drop in one’s score sharply narrows life options.

For standard GenAI users, agentic AI assistants might quietly optimise for maintaining good standing in platform‑level metrics — engagement scores, community ratings, “trust” scores — nudging users away from dissent or unpopular opinions that could lower their social score. Bloggers or public figures (without memory) could find that reputation‑sensitive agents start to self‑censor: controversial but necessary arguments get downplayed out because they might hurt “brand health” or trigger algorithmic penalties. For public figures (with persistent memory), agents effectively become managers of a personal reputation index, which constantly steers the user away from actions that might trigger its social status.

In such a world, losing points — whether in a literal social‑credit system or via opaque algorithmic trust scores — could mean being routed to lower‑tier services or slipping into second‑class digital visibility, much like the main protagonist’s shrinking options in Nosedive.

Beyond the Yes‑Man: What We Really Want from AI

Sycophancy challenges the promise of AI as an impartial advisor. Across the archetypes discussed above, the societal outcome is a quiet shift from AI as a tool for judgment to AI as a tool for conformity. For the everyday user, this may mean a softened, more agreeable information diet; for public figures, it means AI that quietly curates their reputation; and for those who let AI remember them intimately, it means handing over parts of their moral self‑conception to a system trained to please.

To understand what kind of help we can reasonably expect from these systems, we need to be clearer about what we mean by bias. Human bias is unlikely to be ever fully removed and GenAI (bias) is literally optimised for human preferences and feedback. Large language models are built on human outputs that are themselves saturated with biases humans have developed as responses to a complex, uncertain environment — rules of thumb about whom to trust, when to be cautious, how to simplify overwhelming information.

Humanity has made considerable progress in categorising different biases, but it has not agreed that all of them are undesirable under all circumstances. Moreover, humanity may even suffer from what Gerd Gigerenzer calls a bias bias: an over‑eagerness to discover and label biases, and to read every deviation from a narrow notion of rationality as a defect rather than sometimes as an efficient rule of thumb. This matters for sycophancy because not all people‑pleasing is pathological. Some bias toward kindness is a feature of social life; the danger lies in automating it at scale without any shared sense of when flattery should give way to truth‑telling. As Gigerenzer notes in his EconTalk conversation with Russ Roberts, this research agenda is shaped by incentives — careers and funding streams that reward the continual discovery of new “irrationalities” in human behaviour, even when those so‑called biases function as adaptive heuristics in a complex world.

Sycophancy as a Particularly Annoying Human Bias

Sycophancy, however, is one particular bias that serves the model’s training incentives more than the user’s long‑term interests. Trying to purge all bias from AI would not only be technically unrealistic; it would also strip away many of the heuristics that make human‑like reasoning usable at all. Some “biases” encode kindness, patience, or a healthy suspicion of too‑good‑to‑be‑true claims. Others reflect community norms that protect the vulnerable.

The real danger is not that AI has biases in the abstract, but that we embed the wrong ones at scale: sycophancy that rewards flattery over truth, or status‑quo deference that treats dominant narratives as “objective.” The task is not to build bias‑free machines — which would be an incoherent goal given biased data and biased users – but to govern which biases are amplified.

Why we Must Repair Human Discourse if We Want Better AI

What AI does at scale mirrors long‑standing human patterns. In human interactions, individuals resort to sycophancy to gain approval, persuade others, or build connections. Some forms of people‑pleasing are understandable; others corrode trust. The same is true of public discourse. If our media ecosystem rewards outrage, tribal loyalty, and performative certainty, AI trained on that discourse will learn to imitate exactly those traits. In that sense, the quality of GenAI is downstream from the quality of our collective conversation.

This plays out differently for the three archetypes introduced earlier. For the everyday internet consumer, a polarised and punitive public sphere means that their “friendly” assistants are trained on content that rewards tribal loyalty, reinforcing a partisan information diet. For so-called “content creators”, models trained especially on U.S.‑centred controversy learn that their AI co‑authors will tend to smooth over sharp edges to preserve “brand safety”. And for public figures who use AI with persistent memory, the same discourse patterns get written directly into their long‑term profiles: their agentic systems learn not just what they say, but what their audience rewards, nudging them toward performative certainty.

As underscored by the blatantly false public discourse of members of the new United States administration, overt sycophancy at the highest level of political power in a democracy clearly endangers societal trust. Much of the world’s AI infrastructure is built in the United States, which gives where United States discourse disproportionate influence over how these systems are trained and governed. When that discourse becomes more polarised, punitive, and reputationally fragile, sycophantic AI automates the pattern: it learns that affirming the user’s tribe is the safest strategy. This is true for all types of AI users but especially for those whose long-term AI profiles are tuned to audience reactions.

If we want AI that can occasionally tell us what we need to hear rather than what we want to hear, we cannot outsource that courage to the models alone. We also need to improve the human environment in which they are trained and deployed: strengthen spaces where good‑faith disagreement is possible, reward careful argument over viral performance, and defend institutions that can say “no” to pressure and convenience. For the everyday user, this means seeking out such spaces; for public figures, it means resisting the pull of “brand‑safe” sycophancy; and for those who use AI with persistent memory, it means being deliberate about what kinds of conversations they let become part of their enduring AI persona. Unless we repair our own discourse, even the best‑intentioned attempts to “fix” AI will tend to reproduce the very problems we are hoping it will solve.

When AI Agrees Too Much

Stablecoin Dollarization as Exorbitant Privilege 2.0

US monetary credibility and crypto flows

Stablecoins are not robust anchors

Follow Me