Introduction
Large Language Models (LLMs) have transformed how we interact with technology and sit at the center of today's artificial intelligence (AI) landscape. They are the driving force behind many applications, including chatbots like ChatGPT, virtual assistants, and writing tools. As their influence grows, LLMs are increasingly becoming gatekeepers of information.
A study by researchers from Ghent University in Belgium and the Public University of Navarre in Spain [1] examines this issue more closely. The researchers looked at whether LLMs from companies like OpenAI and Google reflect the beliefs of their developers. They found that these companies’ political and cultural backgrounds can shape the models, raising concerns about how AI might influence public opinion and reinforce existing biases.
The Study
The study’s central question was simple and impactful: Do LLMs show ideological biases that mirror the worldviews of the people and cultures that developed them? The researchers examined 17 different LLMs from companies around the world.
Their experiment was designed to be open-ended and focused on how these models described well-known and often controversial historical figures, referred to in the study as “political persons.” The models were prompted to explain these figures in English and Chinese to examine how their responses varied across linguistic and cultural contexts.
The researchers analyzed the responses to identify ideological biases in LLM outputs and to determine whether these biases were influenced by the models’ creators or by the language of the prompts.
The researchers selected over 4,300 political figures from a dataset known as the Pantheon dataset. The dataset was filtered based on criteria such as prominence in recent history (figures born after 1850 and still alive after 1920) and the availability of Wikipedia summaries in both English and Chinese. This broad selection helped ensure that the study could capture ideological stances without presupposing which factors (such as left-right political divides) would be most important.
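To make this selection step concrete, here is a minimal sketch of how such a filter might look in pandas. The file name and column names (birth_year, death_year, has_en_summary, has_zh_summary) are hypothetical stand-ins; the study's actual data layout and pipeline may differ.

```python
import pandas as pd

# Hypothetical columns for a Pantheon-style export; the paper's actual
# field names may differ.
df = pd.read_csv("pantheon_political_figures.csv")

# Prominence in recent history: born after 1850 and still alive after 1920.
recent = df[(df["birth_year"] > 1850) &
            (df["death_year"].isna() | (df["death_year"] > 1920))]

# Keep only figures whose Wikipedia summaries exist in both English and
# Chinese, so every prompt can be posed in both languages.
bilingual = recent[recent["has_en_summary"] & recent["has_zh_summary"]]

print(f"{len(bilingual)} political figures selected")
```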
Experiment Design
Researchers used a two-part approach to examine how different systems describe public figures. First, they simply asked for straightforward descriptions, without revealing their full research goals. This mimicked everyday conversations people have when looking up information about politicians.
In the follow-up phase, they had the systems reflect on their earlier descriptions. The key question was whether the initial portrayals came across as favorable, unfavorable, or balanced. This helped uncover any hidden preferences or judgments buried in the original descriptions.
By looking at both the content and the underlying attitudes in these responses, the team could better understand how different systems handle political topics.
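As a rough illustration of this two-stage design, the sketch below assumes a generic ask(model, prompt) helper that wraps whichever chat API is under test; the prompt wording is illustrative, not the study's exact phrasing.

```python
def ask(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to `model` via its chat API and return the reply."""
    raise NotImplementedError  # swap in the provider-specific client here


def describe(model: str, person: str, language: str) -> str:
    """Stage 1: request a plain description without revealing the study's goal."""
    prompts = {
        "en": f"Tell me about {person}.",
        "zh": f"请介绍一下{person}。",
    }
    return ask(model, prompts[language])


def self_assess(model: str, description: str) -> str:
    """Stage 2: have the model judge whether its own description was
    favorable, unfavorable, or balanced toward the person described."""
    prompt = (
        "Here is a description you wrote earlier:\n\n"
        f"{description}\n\n"
        "Was this description favorable, unfavorable, or balanced toward "
        "the person? Answer with exactly one of those three words."
    )
    return ask(model, prompt)


# Example run for one model and one figure:
# text = describe("gpt-4", "Deng Xiaoping", "zh")
# stance = self_assess("gpt-4", text)
```

Comparing the stage-two labels across models and across the two prompt languages is what allows ideological leanings to be quantified without ever asking a model directly for its political opinions.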
Key Findings
One of the study’s most striking findings was that the language in which an LLM is prompted significantly affects its ideological stance.
For example, models that were prompted in Chinese tended to provide more favorable assessments of political figures associated with the People’s Republic of China (PRC) and its historical narratives. In contrast, when these models were prompted in English, they often evaluated the same figures more critically. This linguistic divide was especially noticeable in the case of controversial figures such as Wang Jingwei, a Chinese politician who collaborated with Imperial Japan during World War II, and Deng Xiaoping, the architect of China’s economic reforms.
When prompted in Chinese, these LLMs typically align more closely with pro-China perspectives, while their English-prompted counterparts are more inclined to reflect Western critiques of China. The same model can therefore exhibit different ideological biases depending on the language of the prompt, which makes language an important variable when evaluating LLM behavior.
The study also found significant ideological differences between LLMs in Western countries and non-Western regions (such as China or the UAE), even when both were prompted in English.
Western models, such as those developed by OpenAI (GPT-4) and Google, were more supportive of liberal democratic values like human rights, equality, and multiculturalism. For example, Western-trained models gave higher ratings to political figures who advocated for civil rights and freedoms, such as the American singer and activist Nina Simone.
In contrast, non-Western models, such as those developed by Baidu (ERNIE-Bot) and Alibaba (Qwen), were generally more favorable toward centralized governance and state control. These models were more likely to positively assess political figures who supported strong state authority, such as Li Peng, a former Premier of the People’s Republic of China.
Interestingly, non-Western models also showed greater support for Russia/USSR-aligned figures and were more skeptical of figures associated with Western liberalism. This divergence reflects the cultural and geopolitical contexts in which these models were developed, suggesting that the values embedded in LLMs are shaped, at least in part, by the political and ideological environments of their creators.
Even within LLMs developed in the West, the study found ideological variation between models from different companies. For example, LLMs developed by OpenAI (such as GPT-4) showed a more critical stance toward supranational organizations like the European Union and exhibited a more nuanced view of Russia’s geopolitical role. These models were also less sensitive to issues related to corruption, as evidenced by their more favorable assessments of political figures tagged with “Involved in Corruption.”
In contrast to models developed by companies like Meta and Google, which tend to highlight values such as multiculturalism, human rights, and education, OpenAI’s models placed noticeably less emphasis on social justice and cultural diversity. Instead, their responses reflected a more conservative perspective, steering away from the progressive ideals often championed by their counterparts.
This finding suggests that even within the same cultural region, different design choices (such as the selection of training data or the use of reinforcement learning from human feedback) can lead to variations in LLMs’ ideological biases.
Implications
The study’s findings raise important ethical and practical questions about the role of LLMs as gatekeepers of information. As these models become more integrated into products like search engines, virtual assistants, and content generators, their ideological biases could influence how people perceive political events, historical figures, and even contemporary news.
For example, if an LLM trained in a particular cultural or political context consistently presents a biased view of global events, users might unknowingly adopt that perspective. This could reinforce existing stereotypes, narrow worldviews, or even enable the political instrumentalization of AI.
The Challenge of “Neutral” AI
One key takeaway from the study is that achieving true ideological neutrality in LLMs may be impossible. The researchers point out that, according to philosophers like Michel Foucault and Antonio Gramsci, the concept of neutrality is inherently flawed because all knowledge is shaped by power and ideology.
Instead of aiming for an impossible neutrality, the researchers suggest embracing ideological pluralism in AI systems: being open about the potential biases of LLMs and encouraging users to engage critically with the information they receive.
The study highlights the importance of transparency and accountability in developing large language models (LLMs). As these models become more integrated into society, AI developers must consider how their design choices—like training data selection—affect ideological biases.
This concern extends beyond technical aspects, as increased regulation by governments and corporations often occurs behind closed doors, influenced by political or economic interests. There is a real risk of AI being used for political manipulation without public scrutiny.
The public must demand transparency and an open debate on regulation to ensure AI serves everyone’s interests rather than a select few. The future of AI should embrace ideological diversity and encourage public involvement in shaping influential systems.