A new study by the MIT Center for Constructive Communication has raised concerns about bias in leading artificial intelligence chatbots, finding that they provide lower-quality responses to, and refuse more queries from, users with lower levels of formal education and non-native English speakers.
The research examined popular large language models (LLMs) including GPT-4, Claude 3 Opus and Llama 3. According to the findings, these systems were more likely to decline to answer questions from users who were less educated and not fluent in English. In some cases, responses also contained condescending or patronising language.
The study found the sharpest decline in accuracy for users who fell into both categories: those with less formal education and limited English proficiency. Claude 3 Opus, for instance, declined 11% of questions from less-educated, non-native English speakers. It also used patronising or mocking language 43.7% of the time when responding to less-educated users, compared to under 1% for highly educated users.
Researchers also compared results for users from the United States, Iran and China, and observed significantly weaker performance for users from Iran.
The paper, titled “LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users,” was presented at the AAAI Conference on Artificial Intelligence in January.
The findings mirror patterns seen in human sociocognitive bias, where non-native English speakers are often perceived as less competent regardless of expertise. Researchers warned that if deployed widely without safeguards, such AI systems could reinforce existing inequalities by spreading misinformation or denying assistance to those who may rely on them most.
The study adds to the growing debate over fairness, accountability and inclusivity in AI development.