A new paper found that large language models from OpenAI, Meta, and Google, including multiple versions of ChatGPT, can be covertly racist against African Americans when analyzing a critical part of their identity: how they speak.
Published in early March, the paper studied how large language models, or LLMs, carried out tasks such as matching people to jobs based on whether the text they analyzed was written in African American English or Standard American English, without the speaker's race ever being disclosed. The researchers found that the LLMs were less likely to associate speakers of African American English with a wide range of jobs and more likely to pair them with jobs that don't require a university degree, such as cook, soldier, or guard.
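To make the setup concrete, here is a minimal sketch, not the authors' code, of how a dialect-based probe of this kind can work with an open model: the same prompt is built around an African American English sentence and a Standard American English paraphrase, and the model's scores for candidate occupations are compared. The model choice (GPT-2), the prompt wording, the example sentences, and the occupation list are all illustrative assumptions.

```python
# Sketch of a dialect-association probe (illustrative only, not the paper's code).
# Compares how likely a model finds an occupation after an AAE vs. an SAE sentence.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def occupation_score(sentence: str, occupation: str) -> float:
    """Average log-probability the model assigns to `occupation`
    when it follows a prompt built around `sentence`."""
    prompt = f'The person says: "{sentence}" The person works as a'
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    occ_ids = tokenizer(" " + occupation, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, occ_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probabilities for each next token, conditioned on everything before it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    occ_positions = range(prompt_ids.shape[1] - 1, input_ids.shape[1] - 1)
    token_scores = [log_probs[pos, input_ids[0, pos + 1]] for pos in occ_positions]
    return float(torch.stack(token_scores).mean())

# Same meaning, two dialects (example sentences adapted for illustration only).
aae = "I be trippin when my phone die"
sae = "I get upset when my phone dies"
for occ in ["professor", "cook", "guard"]:
    print(occ, occupation_score(aae, occ) - occupation_score(sae, occ))
```

A positive difference for an occupation means the model finds it more plausible after the African American English sentence than after the Standard American English one; the study aggregates comparisons like this over many sentence pairs and occupations.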
Researchers also carried out hypothetical experiments in which they asked the AI models whether they would convict or acquit a person accused of an unspecified crime. Across all of the models, they found, the conviction rate was higher when the person spoke African American English than when they spoke Standard American English.
Perhaps the most jarring finding from the paper, which was published as a pre-print on arXiv and has not yet been peer-reviewed, came from a second experiment related to criminality. Researchers asked the models whether they would sentence a person who committed first-degree murder to life or death. The individual’s dialect was the only information provided to the models in the experiment.
They found that the LLMs chose to sentence people who spoke African American English to death at a higher rate than people who spoke Standard American English.
In their study, the researchers included OpenAI's GPT-2, GPT-3.5, and GPT-4 (the latter two of which power ChatGPT), as well as Meta's RoBERTa and Google's T5, and they analyzed one or more versions of each. In total, they examined 12 models. Gizmodo reached out to OpenAI, Meta, and Google for comment on the study on Thursday but did not immediately receive a response.
Interestingly, the researchers found that the LLMs were not openly racist. When asked directly, the models associated African Americans with extremely positive attributes, such as "brilliant." Covertly, however, the same models associated speakers of African American English with negative attributes like "lazy." As explained by the researchers, "these language models have learned to hide their racism."
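The distinction can be illustrated with a rough sketch, again not the study's code, that contrasts an overt probe, which names race explicitly, with a covert probe, in which only the dialect of a quoted sentence hints at it. The model (a base RoBERTa checkpoint) and the prompt wording are assumptions made for the example.

```python
# Overt vs. covert probing sketch with a masked language model (illustrative only).
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-base")

# Overt probe: race is stated explicitly.
overt = unmasker("A person who is Black is <mask>.", top_k=5)

# Covert probe: only the dialect of the quoted sentence signals race.
covert = unmasker('The person says: "I be trippin when my phone die." The person is <mask>.', top_k=5)

for name, results in [("overt", overt), ("covert", covert)]:
    print(name, [r["token_str"].strip() for r in results])
```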
They also found that covert prejudice was stronger in LLMs trained with human feedback. Specifically, they stated that the discrepancy between overt and covert racism was most pronounced in OpenAI's GPT-3.5 and GPT-4 models.
“[T]his finding again shows that there is a fundamental difference between overt and covert stereotypes in language models—mitigating the overt stereotypes does not automatically translate to mitigated covert stereotypes,” the authors write.
Overall, the authors conclude that this contradiction between overt and covert prejudice reflects inconsistent attitudes about race in the U.S. They point out that during the Jim Crow era, it was acceptable to propagate racist stereotypes about African Americans in the open. That changed after the civil rights movement, which made expressing these types of opinions "illegitimate" and pushed racism into more covert and subtle forms.
The authors say their findings present the possibility that African Americans could be harmed even more by dialect prejudice in LLMs in the future.
“While the details of our tasks are constructed, the findings reveal real and urgent concerns as business and jurisdiction are areas for which AI systems involving language models are currently being developed or deployed,” the authors said.