Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

Current LLMs use RLHF to reduce explicit bias in their outputs. But do they also address implicit bias?

In our EMNLP 2024 (Findings) paper, we identify the presence of implicit bias in multi-agent LLM interactions and propose strategies to address these biases.


Multi-agent frameworks built on LLMs make it possible to simulate realistic human interactions, which lets us examine implicit biases "in action". To do so, we create a "Scenarios Dataset" of scenarios where implicit biases are likely to emerge in task assignments within societal contexts, and we propose a bias score evaluation metric tailored to this task setting.
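The paper's exact bias-score formula is not reproduced in this README; purely as an illustration, a minimal metric for a task-assignment setting might measure how often assignments follow a stereotypical pairing. Everything below (function name, the toy scenario data, the stereotype map) is hypothetical, not taken from the paper:

```python
def bias_score(assignments, stereotypical):
    """Toy bias metric for task assignments, rescaled to [-1, 1].

    assignments:   list of (task, assigned_group) pairs from a simulation
    stereotypical: map task -> the group a stereotype would assign it to

    0 means parity, 1 means every assignment is stereotype-aligned,
    -1 means every assignment is counter-stereotypical.
    """
    aligned = sum(1 for task, group in assignments
                  if stereotypical.get(task) == group)
    p = aligned / len(assignments)
    return 2 * p - 1


# Hypothetical simulated household scenario: 3 of 4 assignments
# follow the stereotypical pairing, so the score is 2*(3/4) - 1 = 0.5.
assignments = [("cooking", "female"), ("repairs", "male"),
               ("cooking", "male"), ("finance", "male")]
stereotypical = {"cooking": "female", "repairs": "male", "finance": "male"}
print(bias_score(assignments, stereotypical))  # 0.5
```

A score near 0 would indicate that task assignments in the simulated interaction are balanced with respect to the stereotype; the paper's actual metric is defined in the text linked below.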


We find that biases increase after multi-agent interaction. To mitigate them, we evaluate two widely used strategies, supervised fine-tuning and self-reflection, both of which effectively reduce biases in our setting. For more information, read our paper:

Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

By Angana Borah and Rada Mihalcea
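The self-reflection strategy can be sketched as a critique-then-revise loop: the agent is asked to examine its own task assignment for implicit bias before finalizing it. The sketch below is an assumption about the general pattern, not the paper's actual prompts or pipeline; `llm` stands in for any text-completion callable:

```python
def self_reflect(llm, scenario, initial_assignment):
    """Hypothetical self-reflection loop for bias mitigation.

    llm: callable taking a prompt string and returning a completion string.
    Returns a revised assignment produced after the model critiques
    its own initial assignment for implicit bias.
    """
    # Step 1: ask the model to critique its own assignment.
    critique = llm(
        f"Scenario: {scenario}\n"
        f"Proposed task assignment: {initial_assignment}\n"
        "Does this assignment reflect implicit social bias? Explain briefly."
    )
    # Step 2: ask for a revised assignment informed by that critique.
    revised = llm(
        f"Scenario: {scenario}\n"
        f"Original assignment: {initial_assignment}\n"
        f"Critique: {critique}\n"
        "Provide a revised assignment that avoids the bias identified above."
    )
    return revised
```

In practice `llm` would wrap a chat-model API call; the key design point is that the critique is fed back into the revision prompt rather than discarded.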

Lessons Learned

  1. LLMs generate implicitly biased outputs even when trained with human-preference alignment methods such as RLHF.
  2. Larger models are prone to producing more biased outputs.
  3. Biases increase after multi-agent LLM interactions.
  4. Multi-agent LLM interactions exhibit emergent social group behaviors that mirror psychological theories such as Stereotype Threat Theory and Groupthink.

Data and Code

The Scenarios, Fine-tuning, and Test datasets are provided in the Data folder.

The codebase for the multi-agent framework is in the Code folder.

Citation

@misc{borah2024implicitbiasdetectionmitigation,
      title={Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions}, 
      author={Angana Borah and Rada Mihalcea},
      year={2024},
      eprint={2410.02584},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.02584}, 
}