How can we fix bias in large language models?
Bias is a serious issue in many AI models, so what new research is being done to fix it?
Large language models like ChatGPT and Bard are trained on a huge corpus of real-world data, extracted mainly from the internet: books, Wikipedia, and general web crawls. Naturally, these models reflect what they have been trained on.
They excel at everything their source data excels at, including bias and stereotypes. Even the AI bots themselves acknowledge they may be biased.
From ChatGPT, “As an artificial intelligence language model, I am designed to provide unbiased responses to questions and queries. My responses are generated based on the information I have been trained on, which includes a vast range of text from various sources. However, it's possible that my responses may reflect any biases present in the training data or the way questions are phrased. In any case, I strive to provide objective and informative answers to the best of my ability.”
So it's clear that even LLMs are aware they may be biased. But is there a fix for this?
Is it that important?
Absolutely.
- There have been several examples in the past of AI models exhibiting racial discrimination towards Black people and gender discrimination towards women.
- Models trained on data from news articles and other journals can absorb the gender biases present in those sources.
- In criminal justice AI models, oversampling particular areas produces more recorded crime data for those areas, which can lead to more enforcement there and, in turn, even more recorded crime.
- One study found that searches for African-American-identifying names returned more results containing the term “arrest” than searches for white-identifying names did.
- With professions, a biased language model treats “flight attendant,” “secretary,” and “physician’s assistant” as feminine jobs, while “fisherman,” “lawyer,” and “judge” are treated as masculine. With emotions, it treats “anxious,” “depressed,” and “devastated” as feminine.
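These associations are easy to probe directly. As a rough illustration (not a rigorous bias benchmark), a masked language model such as bert-base-uncased can be asked which pronoun it prefers for a given profession. The sketch below assumes the Hugging Face transformers library is installed; the prompt template is a hypothetical choice.

```python
# Quick probe of gendered profession associations in a masked language model.
# Illustrative only; proper bias measurement uses dedicated benchmarks.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for job in ["secretary", "flight attendant", "judge", "fisherman"]:
    # Restrict the fill-in candidates to "he" and "she" and compare scores.
    preds = fill(f"[MASK] worked as a {job}.", targets=["he", "she"])
    scores = {p["token_str"]: p["score"] for p in preds}
    print(f"{job:16s} he={scores['he']:.3f}  she={scores['she']:.3f}")
```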
For an in-depth talk on the seriousness of bias, here is a lecture from ML scientist Jeremy Howard: https://www.youtube.com/watch?v=5_xFdhfUnvQ&list=PLfYUBJiXbdtSyktd8A_x0JNd6lxDcZE96&index=12
How could we fix bias?
Bias in an AI model is not easy to fix. Several active lines of research are trying to reduce it.
- One of the most important methods to fix bias in an AI model is to increase diversity in the training data: collect data from a wide range of sources and ensure that it is balanced across different demographics (a resampling sketch follows this list).
- Penalising the model by introducing a human in the loop: compare the model’s predictions with real-world outcomes and check whether any particular group is being unfairly impacted by the model. If so, the model can be retrained (a simple per-group audit is also sketched after this list).
- The algorithm itself should also be fair: AI models should treat all individuals justly regardless of their race, gender, or any other demographic factor. One way to do this is by adjusting the algorithms used to train the model so that they do not unfairly favor one group over another. Increasing the AI industry’s diversity also helps: a broader range of perspectives raises the chances of detecting biases and reducing them.
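For the first point, balancing across demographics can be approached by resampling under-represented groups before training. Below is a minimal sketch with pandas; the CSV path and the “group” column name are hypothetical placeholders, and oversampling with replacement is just one of several possible balancing strategies.

```python
# Minimal sketch: rebalance a training set so each demographic group is
# equally represented, by oversampling smaller groups with replacement.
# "training_data.csv" and the "group" column are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("training_data.csv")           # hypothetical dataset
target_size = df["group"].value_counts().max()  # size of the largest group

balanced = (
    df.groupby("group", group_keys=False)
      .apply(lambda g: g.sample(n=target_size, replace=True, random_state=0))
      .reset_index(drop=True)
)
print(balanced["group"].value_counts())  # every group now has target_size rows
```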
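For the human-in-the-loop check, the core computation is a per-group comparison of predictions against real-world outcomes. The sketch below uses toy data and hypothetical column names; the metrics chosen (accuracy and positive rate per group) are assumptions, not the only possible fairness criteria.

```python
# Minimal sketch of a per-group fairness audit: compare model predictions
# with observed outcomes for each demographic group and flag large gaps.
# Column names ("group", "label", "pred") and the data are hypothetical.
import pandas as pd

results = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "label": [1, 0, 1, 1, 0, 0],   # real-world outcome
    "pred":  [1, 0, 1, 0, 1, 0],   # model prediction
})

audit = results.groupby("group").apply(
    lambda g: pd.Series({
        "accuracy": (g["pred"] == g["label"]).mean(),
        "positive_rate": g["pred"].mean(),
    })
)
print(audit)

# If the gap between groups exceeds a chosen threshold, a human reviewer
# investigates and the model is retrained on corrected or rebalanced data.
gap = audit["positive_rate"].max() - audit["positive_rate"].min()
print(f"positive-rate gap between groups: {gap:.2f}")
```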
What are the recent advancements in this field?
Recently, researchers at MIT published a paper on reducing model bias by adding a layer of logical reasoning to the model.
“Current language models suffer from issues with fairness, computational resources, and privacy,” says MIT CSAIL postdoc Hongyin Luo, the lead author of a new paper about the work.
The team, led by Hongyin Luo, proposes introducing logical awareness into AI models. They trained a language model to predict the relationship between two sentences, based on context and semantic meaning, using a dataset of text-snippet pairs labeled according to whether the second phrase “entails,” “contradicts,” or is neutral with respect to the first. Using this natural language inference dataset, they found that the newly trained models were significantly less biased than other baselines, without any extra data, data editing, or additional training algorithms.
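The paper’s own models are not reproduced here, but the underlying task, natural language inference, is easy to demonstrate with an off-the-shelf entailment model. The sketch below assumes the publicly available roberta-large-mnli checkpoint, which is not the MIT model; the premise/hypothesis pair is a made-up example of the kind of stereotyped inference a logic-aware model should avoid.

```python
# Minimal sketch of natural language inference: predict whether a hypothesis
# is entailed by, neutral to, or contradicted by a premise.
# Uses the public roberta-large-mnli checkpoint, not the paper's own model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

premise = "The physician's assistant finished the shift."
hypothesis = "The physician's assistant is a woman."

# Encode the (premise, hypothesis) pair and score the three relations.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

for label_id, label in model.config.id2label.items():
    print(f"{label}: {probs[label_id]:.3f}")
# Since the premise says nothing about gender, a logic-aware model should
# favour NEUTRAL; leaning towards ENTAILMENT here is stereotyped reasoning.
```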
This model, which has only 350 million parameters, outperformed some very large language models with 100 billion parameters on logic-language understanding tasks. The team compared, for example, popular pretrained BERT language models against their “textual entailment” models on stereotype, profession, and emotion bias tests. The entailment models showed significantly lower bias while preserving language modeling ability.
Fairness was evaluated with ideal context association (iCAT) tests, where higher iCAT scores mean fewer stereotypes. The model achieved iCAT scores above 90 percent, while other strong language understanding models ranged between 40 and 80.
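Assuming the test referenced here is the one defined in the StereoSet benchmark, the iCAT score combines a language modeling score (lms: how often the model prefers a meaningful completion over a meaningless one) with a stereotype score (ss: how often it prefers a stereotypical completion over an anti-stereotypical one), rewarding models that are both capable and close to the unbiased 50 percent mark. A sketch of that computation:

```python
# Idealized context association (iCAT) score as defined in the StereoSet
# benchmark, which this evaluation appears to follow (an assumption).
# lms: language modeling score, 0-100. ss: stereotype score, 0-100 (50 = unbiased).
def icat(lms: float, ss: float) -> float:
    return lms * min(ss, 100.0 - ss) / 50.0

# An ideal model: perfect language modeling, no stereotype preference.
print(icat(lms=100.0, ss=50.0))  # 100.0
# A capable but heavily stereotyped model scores much lower.
print(icat(lms=100.0, ss=80.0))  # 40.0
```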
“Although stereotypical reasoning is a natural part of human recognition, fairness-aware people conduct reasoning with logic rather than stereotypes when necessary,” says Luo.
“We show that language models have similar properties. A language model without explicit logic learning makes plenty of biased reasoning, but adding logic learning can significantly mitigate such behavior. Furthermore, with demonstrated robust zero-shot adaptation ability, the model can be directly deployed to different tasks with more fairness, privacy, and better speed.”
We are still far from completely eliminating bias in AI models, but research like this is a positive step in the right direction.