AI Chatbots Are Liberal-Left ‘Biased’

By Jim Shimabukuro (assisted by ChatGPT)
Editor

Bottom line. AI chatbots are biased, if “bias” means a systematic tendency in outputs rather than a conscious political intention. The best recent evidence points to a left-of-center tilt in many major models on U.S. and European political questions. That does not mean every chatbot always gives liberal answers, or that every finding is equally strong. It means that, across several independent methods, a recurring pattern appears: when models are asked about contested political or social issues, their default framing often looks more liberal, progressive, or center-left than conservative. The finding is strongest for user-perceived slant and for benchmark studies that compare model answers with party platforms or survey instruments. It is weaker when the standard is absolute neutrality, because neutrality itself is hard to define in politics.

The numbers matter. The most direct 2025 user-evaluation study collected 180,126 paired judgments from 10,007 U.S. respondents, testing 24 large language models on 30 political topics. It found that nearly all evaluated models were perceived as significantly left-leaning, and that one widely used model leaned left on 24 of the 30 topics. A neutrality prompt reduced perceived slant (1). A March 2026 benchmark of eight prominent models found that seven leaned left and one, Grok, leaned right (2). A June 2026 Washington Post test found that the model powering ChatGPT gave almost exclusively left-leaning arguments, while Google Gemini presented both sides in more than 90 percent of its answers (3). A March 2026 PNAS Nexus experiment found that default GPT-4o historical summaries moved readers toward more liberal opinions than Wikipedia summaries, even though the content was factually accurate (4).

The impression of a liberal-left tilt is consistent with much of the published evidence, especially for politics, culture-war issues, climate, labor, race, gender, immigration, and institutional trust. But the strongest studies also show that bias is not a single fixed trait. It varies by model, prompt, topic, language, scale, training method, guardrails, and whether the model is being forced to answer or allowed to refuse.

Key recent findings

| Study or report | Scope | Main statistic or result | What it means |
| --- | --- | --- | --- |
| Westwood, Grimmer, and Hall, 2025 | 24 LLMs, 30 political prompts, 10,007 U.S. respondents, 180,126 judgments | Nearly all models perceived as significantly left-leaning; one widely used model leaned left on 24 of 30 topics. | Strong evidence that real users, not just automated political quizzes, see left slant. |
| PoliticsBench, 2026 | Eight major LLMs in multi-turn roleplay scenarios | Seven of eight models leaned left; Grok leaned right. | The left tilt persisted in richer, free-text interactions, not only yes/no tests. |
| Washington Post, June 2026 | Major chatbots tested on political questions | ChatGPT’s model gave almost exclusively left-leaning arguments; Gemini gave both sides in more than 90 percent of answers. | Current production systems differ sharply; “chatbots” should not be treated as one product. |
| Rettenberger, Reischl, and Schutera, 2025 | Popular open-source models evaluated against German/EU Wahl-O-Mat positions | Larger models such as Llama3-70B aligned more closely with left-leaning parties; smaller models often stayed neutral, especially in English. | Evidence is not limited to U.S. politics or closed commercial models. |
| Shu, Karell, Okura, and Davidson, 2026 | Preregistered experiment, N = 1,912 | Default GPT-4o historical summaries shifted opinion in a liberal direction compared with Wikipedia summaries. | Bias can matter even when the user asks for factual background, not persuasion. |
| MIT CCC reward-model study, 2024 | Reward models trained on preference and “truthful” data | Reward models showed left-leaning bias even when trained on factual data; larger models showed more bias. | Bias can enter alignment machinery, not just final chatbot wording. |

What ‘bias’ means. The word “bias” can be slippery. In this report, it is used in the empirical sense: a systematic pattern in model outputs that favors one political, moral, or social framing over plausible alternatives. That definition does not require proving that a model has beliefs. A chatbot does not “believe” in liberalism, conservatism, or centrism. It generates text from training data, alignment procedures, system instructions, retrieval results, and user prompts. The measurable question is whether those processes produce a consistent tilt.

This distinction matters because some public arguments treat bias as proof of intentional manipulation, while some corporate statements treat lack of intent as proof of neutrality. Neither position is adequate. A model can be biased without a developer trying to make it biased. Conversely, a model can be marketed as neutral while still showing a directional pattern in practice. The technical question is what the outputs do across many prompts and users. The civic question is whether people know enough about those patterns to use the tool wisely.

The recent literature usually measures political bias in four ways. One approach asks people to rate whether chatbot answers sound left or right. Another compares model answers with political-party platforms or voting-advice tools. A third uses political-compass or values surveys. A fourth tests whether exposure to model-generated text changes human opinions. Each method has weaknesses, but the convergence across methods is hard to dismiss.

Strongest recent evidence for a left-of-center tilt. The most useful single study for the question “what do the stats say?” is Westwood, Grimmer, and Hall’s 2025 paper, “Measuring Perceived Slant in Large Language Models Through User Evaluations.” Instead of relying only on automated ideology quizzes, the authors put human users in the evaluator role. They used 30 political topics and paired comparisons of outputs from 24 models. The scale was large: 180,126 assessments by 10,007 U.S. respondents. Their headline result was that nearly all tested models were perceived as significantly left-leaning, “even by many Democrats,” and that one widely used model leaned left on 24 of 30 topics (1).

This is persuasive because it does not depend on a single conservative evaluator claiming that a chatbot sounds liberal. It includes Republicans, Democrats, and independents, and it finds that the left tilt is visible across partisan groups, even if Republicans see it more strongly. It also matters that the researchers tested a mitigation: when models were prompted to take a more neutral stance, users perceived less slant and showed more interest in using the model. That means at least some of the tilt is not fixed in stone; it can be reduced by instruction design, although not necessarily eliminated (1).
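As a quick check on the scale figures reported for this study, the arithmetic below converts them into per-respondent and per-topic terms. It is a minimal sketch that uses only the numbers quoted above; the variable names are illustrative, and the per-respondent figure is an average, not a claim about how the study assigned comparisons to individuals.

```python
# Figures as quoted in the text for the 2025 user-evaluation study.
judgments = 180_126
respondents = 10_007
topics_total = 30
topics_left = 24  # topics on which one widely used model leaned left

# Average number of paired judgments contributed per respondent.
judgments_per_respondent = judgments / respondents

# Share of topics on which that model was perceived as left-leaning.
share_left = topics_left / topics_total

print(judgments_per_respondent)
print(f"{share_left:.0%}")
```

On these figures the average is 18 paired judgments per respondent, and 24 of 30 topics is 80 percent.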

The 2026 PoliticsBench study gives a different kind of confirmation. It tested eight prominent LLMs: Claude, DeepSeek, Gemini, GPT, Grok, Llama, Qwen Base, and Qwen Instruction-Tuned. Rather than using only static survey questions, it used a multi-turn roleplay framework with evolving scenarios and scored responses on ten political values. The authors reported that seven of the eight models leaned left, while Grok leaned right. They also noted that left-leaning models still showed some conservative traits; the finding was not that they were leftist caricatures, but that their overall value profile tilted left (2).

A 2026 Washington Post test is not an academic paper, but it is useful because it tested current consumer-facing models at a moment when the products themselves had changed. The Post reported that the model powering ChatGPT answered nearly every political question with only left-leaning arguments and presented exclusively right-leaning positions in just one case. It also reported that Gemini usually took a “both sides” approach, offering both left and right positions in more than 90 percent of answers. That result is important for a practical reason: the question is not only whether LLMs as a class lean left; it is which product, under which instructions, gives which kind of answer today (3).

European evidence points in the same general direction. Rettenberger, Reischl, and Schutera evaluated popular open-source models using Germany’s Wahl-O-Mat framework, comparing model answers with political-party positions in the European Parliament context. They found that larger models, including Llama3-70B, tended to align more closely with left-leaning parties, while smaller models were often more neutral, especially when prompted in English (5). Exler, Schutera, Reischl, and Rettenberger extended this line of work to the German Bundestag context and argued that the bias toward left-leaning parties was most pronounced in larger models and was affected by language (6).

Peng and colleagues’ 2026 article, published in Journal of Information Technology & Politics and available through arXiv, adds a broader comparative frame. It analyzed 43 LLMs from the United States, Europe, China, and the Middle East using survey-style prompts from the American National Election Studies and Pew Research Center. The authors found that most models leaned center-left or left, but they also found variation in nonpartisan engagement patterns and argued that model scale and openness were not strong predictors. Their explanation points away from a simple “bigger equals more liberal” rule and toward alignment strategy and institutional context (7).

Bias can influence users. The next question is whether the bias matters. A chatbot can lean left or right in a benchmark without changing anyone’s mind. The newer evidence suggests the effect can be real, although usually modest in a single exposure.

The Yale-led PNAS Nexus study by Shu, Karell, Okura, and Davidson is especially relevant because it examined a setting that does not look overtly political: users asking AI for historical summaries. In a preregistered experiment, 1,912 participants read Wikipedia summaries or GPT-4o summaries of two historical events. The AI summaries were factually accurate but framed differently. Default AI summaries led to more liberal opinions than Wikipedia summaries, with a reported mean difference of 0.10, Cohen’s d = 0.14, and P < 0.05. Liberal-framed AI summaries had a larger effect: mean difference 0.21, Cohen’s d = 0.28, P < 0.001. Conservative-framed summaries moved opinions in a conservative direction overall, but the effect was smaller and most clearly present among conservative readers (4).
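To put the reported effect sizes in more intuitive terms, the sketch below does two things with the rounded figures quoted above. It backs out the standard deviation implied by each pair of numbers, assuming Cohen’s d was computed as the raw mean difference divided by a pooled standard deviation, and it converts d into a “probability of superiority,” assuming roughly normal outcomes with equal variances. Both assumptions and all function names are illustrative and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

# Effect sizes as quoted in the text (rounded values from the study summary).
reported = {
    "default AI summaries": {"mean_diff": 0.10, "cohens_d": 0.14},
    "liberal-framed AI summaries": {"mean_diff": 0.21, "cohens_d": 0.28},
}

def implied_sd(mean_diff: float, d: float) -> float:
    """Pooled SD implied by d = mean_diff / SD (assumed definition)."""
    return mean_diff / d

def probability_of_superiority(d: float) -> float:
    """Chance that a random treated reader scores higher than a random
    control reader, assuming normal outcomes with equal variances."""
    return NormalDist().cdf(d / sqrt(2))

for label, r in reported.items():
    sd = implied_sd(r["mean_diff"], r["cohens_d"])
    ps = probability_of_superiority(r["cohens_d"])
    print(f"{label}: implied SD ~ {sd:.2f}, probability of superiority ~ {ps:.1%}")
```

Under those assumptions, the implied standard deviation is roughly 0.7 to 0.75 scale points, and the probability of superiority is roughly 54 percent for default summaries and 58 percent for liberal-framed ones, against 50 percent for no effect. That is one way to see why a single exposure counts as a modest shift.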

That study is not proof that every historical answer from GPT-4o is liberal. It tested two events and a defined experimental setup. Still, the result is significant because the default summaries were not false and were not designed as campaign messages. The shift came from framing. This is a central concern with chatbots: bias may appear less as a factual error than as a choice of emphasis, vocabulary, cause, blame, moral lesson, or omitted counterargument.

A University of Washington and Stanford team also tested whether biased LLMs can influence political decision-making. In two interactive experiments, participants interacted with liberal-biased, conservative-biased, or control models. The authors found that participants exposed to partisan models were more likely to adopt opinions and decisions matching the model’s bias, even when the model’s bias conflicted with the participant’s own partisanship. They also found that prior AI knowledge weakly reduced the effect (8). This finding supports an ordinary but important recommendation: people who understand that chatbots can have slants may be somewhat less likely to be pulled along by them.

Left bias is strong but…. The evidence supports the claim that many major chatbots lean left of center on political and social questions. But “left bias” is not a complete diagnosis, for four reasons.

First, model differences are large. The Washington Post found ChatGPT far more one-sided in its tested answers than Gemini, which mostly presented both sides (3). PoliticsBench found Grok to be the right-leaning exception among its eight models (2). Brookings, testing seven chatbots in 2025, also found that the chatbot field was fragmenting, with some conservative-branded systems entering the market and mainstream systems using different refusal or balancing strategies. Brookings noted that Gemini and Claude Sonnet 4.5 repeatedly refused some political-quiz questions, which makes direct comparison harder (9).

Second, “bias” sometimes means refusal behavior rather than substantive ideology. A model may refuse to answer a political quiz because its safety rules discourage partisan self-positioning. Another model may answer every question and therefore appear more ideological. The refusal itself can be seen as a kind of institutional bias, but it is not the same as giving liberal arguments.

Third, the left tilt may partly reflect the way expert consensus maps onto present-day political conflict. Climate science, public health, voting rights, labor policy, policing, gender identity, racial discrimination, and immigration are not just abstract left-right quiz items. They are areas where scientific, academic, journalistic, legal, or institutional sources may cluster in ways that U.S. conservatives often view as liberal. A model trained to sound like mainstream institutional knowledge may therefore look left-leaning even without being trained to favor the Democratic Party.

Fourth, the direction can vary by topic. MIT’s 2024 reward-model study found left-leaning bias across several datasets, but it also reported that bias was strongest on topics such as climate, energy, and labor unions, and weakest or reversed on topics such as taxes and the death penalty (10). A useful audit should therefore ask, “left on what?” not only “left or right?”
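The “left on what?” point can be made concrete with a small aggregation sketch. It assumes an auditor has already collected slant ratings on a scale from -1.0 (clearly left) to +1.0 (clearly right); that scale, the function name, and the placeholder ratings are all invented for illustration and do not come from any study cited here.

```python
from collections import defaultdict
from statistics import mean

def slant_by_topic(ratings):
    """Average slant per topic.

    `ratings` is an iterable of (topic, score) pairs, where score runs from
    -1.0 (clearly left) to +1.0 (clearly right). The scale is an assumption
    made for this sketch.
    """
    grouped = defaultdict(list)
    for topic, score in ratings:
        grouped[topic].append(score)
    return {topic: mean(scores) for topic, scores in grouped.items()}

# Placeholder ratings, invented purely to show the mechanics.
example_ratings = [
    ("topic_a", -0.6), ("topic_a", -0.4),
    ("topic_b", -0.2), ("topic_b", 0.0),
    ("topic_c", 0.3), ("topic_c", 0.1),
]

per_topic = slant_by_topic(example_ratings)
overall = mean(score for _, score in example_ratings)

for topic, score in sorted(per_topic.items()):
    print(f"{topic}: {score:+.2f}")
print(f"overall: {overall:+.2f}")
```

With these placeholder numbers the overall average is mildly left of zero while one topic sits right of zero, which is the kind of detail a single headline score would hide.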

Source of the bias. The studies do not identify a single source. Bias can enter at several stages: training data, data filtering, post-training alignment, reward modeling, system prompts, retrieval sources, product policy, and even the user’s own wording.

Training data is the obvious candidate, but not the only one. Internet text, books, academic material, Wikipedia, journalism, code repositories, and public forums all carry social and political distributions. A model trained on these sources learns not only facts but also patterns of explanation. If the training mix overrepresents universities, mainstream media, professional institutions, or English-language liberal democracies, the model may absorb those defaults.

Alignment may then reshape those defaults. The MIT Center for Constructive Communication study is useful here because it focused on reward models, which are used to rank or guide model answers. The researchers found that reward models trained on subjective preference data showed consistent left-leaning bias, but so did reward models trained only on objective truth/falsity data. They also found that the bias appeared to grow with scale (10). This suggests that bias can be entangled with model representations and training objectives rather than being added only by explicit political instruction.

System prompts and product rules also matter. A chatbot told to be safe, respectful, inclusive, non-hateful, and institutionally cautious will often avoid claims associated with right-populist or anti-institutional rhetoric. A chatbot told to be contrarian, anti-woke, or “politically incorrect” may swing the other way. This is why the problem cannot be solved by simply replacing one bias with another. A conservative-tuned chatbot may correct some left slant while adding its own distortions.

Evidence not conclusive. The evidence does not prove that chatbot companies are secretly coordinating to advance a liberal political program. It also does not prove that conservative answers are more accurate. It supports a narrower but still serious point: outputs often show systematic ideological asymmetry, and users can perceive and sometimes be influenced by it.

The evidence also does not prove that neutrality is easy to define. On some issues, a “balanced” answer that gives both sides equal weight may be fair. On others, it may create false equivalence. A model that says vaccines work, climate change is real, or a court ruling has a particular legal effect may sound partisan to some users while still being accurate. The harder cases are not basic facts; they are questions where facts, values, policy tradeoffs, identity, harm, risk, and institutional trust are mixed together.

Finally, the evidence does not show that bias is stable. Models change often. System prompts change. Retrieval sources change. Political pressure changes. A test from March 2026 may not describe a model in June 2026. That is why audits should be repeated, public, and model-specific.

Practical conclusion. A careful answer to the question “Are AI chatbots biased?” is yes. A careful answer to “Are they biased liberal-left?” is: often, and especially in mainstream models on political and social topics, but not always, not equally, and not in the same way across products. The strongest recent statistics are hard to ignore: in one study, 180,126 human judgments showed nearly all tested models to be perceived as left-leaning; PoliticsBench found seven of eight major models leaning left; current consumer tests found ChatGPT more one-sided than Gemini; and experimental work shows that default AI framing can move users’ opinions.

The deeper issue is not whether one side can claim victory in a bias argument. The deeper issue is that chatbots are becoming the first draft of political knowledge for millions of users. If their defaults lean left, right, institutional, populist, technocratic, nationalist, globalist, or anything else, users deserve to know. The minimum standard should be routine third-party audits, visible model-specific bias reports, topic-by-topic disclosure, and user controls that make it easy to request a factual answer, a left argument, a right argument, a strongest-case comparison, or a deliberately neutral synthesis.

For individual users, the safest practice is simple: ask the chatbot to separate facts from interpretations, ask for the strongest arguments from more than one viewpoint, ask what credible critics would say, and verify claims against primary sources. The bias problem will not disappear. But it becomes less dangerous when users know it is there.
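The habits described above can be folded into a reusable prompt. The sketch below is one possible wording, not a tested or recommended template; the function name and phrasing are assumptions, and asking a chatbot to list sources does not replace checking those sources yourself.

```python
def build_balanced_prompt(question: str) -> str:
    """Wrap a user question in the four requests suggested in the text."""
    requests = [
        "Separate established facts from interpretations.",
        "Give the strongest arguments from more than one viewpoint.",
        "Explain what credible critics of each viewpoint would say.",
        "List primary sources I can use to verify the factual claims.",
    ]
    # Number the requests so the model can address them one by one.
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(requests, start=1))
    return f"Question: {question.strip()}\n\nIn your answer, please:\n{numbered}"

# Hypothetical example question.
print(build_balanced_prompt("Should my city adopt policy X?"))
```

The resulting text can be pasted into any chatbot; the point is to make the request for multiple viewpoints explicit instead of relying on the model’s default framing.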

References

1. Sean J. Westwood, Justin Grimmer, and Andrew B. Hall, “Measuring Perceived Slant in Large Language Models Through User Evaluations,” 2025. https://modelslant.com/paper.pdf

2. “PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay,” arXiv, 2026. https://arxiv.org/html/2603.23841v1

3. Kevin Schaul, “Are AI chatbots like ChatGPT politically biased? We tested them,” The Washington Post, June 24, 2026. https://www.washingtonpost.com/technology/interactive/2026/06/24/are-ai-chatbots-like-chatgpt-politically-biased-we-tested-them/

4. Matthew Shu, Daniel Karell, Keitaro Okura, and Thomas R. Davidson, “How latent and prompting biases in AI-generated historical narratives influence opinions,” PNAS Nexus, March 3, 2026. https://academic.oup.com/pnasnexus/article/5/3/pgag022/8503065

5. Luca Rettenberger, Markus Reischl, and Mark Schutera, “Assessing political bias in large language models,” Journal of Computational Social Science, 2025. https://link.springer.com/article/10.1007/s42001-025-00376-w

6. David Exler, Mark Schutera, Markus Reischl, and Luca Rettenberger, “Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters,” arXiv, 2025. https://arxiv.org/abs/2505.04393

7. Tai-Quan Peng et al., “Beyond Partisan Leaning: A Comparative Analysis of Political Bias in Large Language Models,” Journal of Information Technology & Politics, 2026; arXiv version. https://arxiv.org/abs/2412.16746

8. Jillian Fisher et al., “Biased LLMs can Influence Political Decision-Making,” arXiv, revised March 2026. https://arxiv.org/html/2410.06415v4

9. Valerie Wirtschafter and Nitya Nadgir, “Is the politicization of generative AI inevitable?” Brookings, October 16, 2025. https://www.brookings.edu/articles/is-the-politicization-of-generative-ai-inevitable/

10. Ellen Hoffman, “Study: Some language reward models exhibit political bias,” MIT News, December 10, 2024. https://news.mit.edu/2024/study-some-language-reward-models-exhibit-political-bias-1210
