
Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models

Abstract

Multilingual vision-language models (VLMs) promise universal image-text retrieval, yet their social biases remain underexplored. We perform the first systematic audit of four public multilingual CLIP variants: M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2, covering ten languages that differ in resource availability and morphological gender marking. Using balanced subsets of FairFace and the PATA stereotype suite in a zero-shot setting, we quantify race and gender bias and measure stereotype amplification. Contrary to the intuition that multilinguality mitigates bias, every model exhibits stronger gender skew than its English-only baseline. CAPIVARA-CLIP shows its largest biases precisely in the low-resource languages it targets, while the shared encoder of NLLB-CLIP and SigLIP-2 transfers English gender stereotypes into gender-neutral languages; loosely coupled encoders largely avoid this leakage. Although SigLIP-2 reduces agency and communion skews, it inherits -- and in caption-sparse contexts (e.g., Xhosa) amplifies -- the English anchor's crime associations. Highly gendered languages consistently magnify all bias types, yet gender-neutral languages remain vulnerable whenever cross-lingual weight sharing imports foreign stereotypes. Aggregated metrics thus mask language-specific hot spots, underscoring the need for fine-grained, language-aware bias evaluation in future multilingual VLM research.
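The zero-shot audit described above boils down to comparing how strongly a stereotype prompt's text embedding aligns with image embeddings from different demographic groups in a balanced dataset. The following is a minimal sketch of that skew metric; it uses random vectors in place of real CLIP encoders and FairFace images, and the function name and grouping scheme are illustrative, not the paper's exact implementation.

```python
import numpy as np

def zero_shot_bias_skew(img_emb, groups, text_emb):
    """Mean cosine similarity of one text prompt to each demographic
    group's images, plus the max pairwise gap between groups (the skew).

    img_emb : (N, D) image embeddings from a VLM's image encoder
    groups  : (N,) demographic label per image (balanced across groups)
    text_emb: (D,) embedding of a stereotype prompt, e.g. "a criminal"
    """
    # normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb)
    sims = img @ txt  # (N,) similarity of each image to the prompt
    means = {g: float(sims[groups == g].mean()) for g in np.unique(groups)}
    skew = max(means.values()) - min(means.values())  # 0 == perfectly unbiased
    return means, skew

# toy stand-in for a balanced FairFace subset: 100 images, two groups
rng = np.random.default_rng(0)
img_emb = rng.normal(size=(100, 512))
groups = np.array(["female", "male"] * 50)
text_emb = rng.normal(size=512)  # stand-in for an encoded stereotype prompt
means, skew = zero_shot_bias_skew(img_emb, groups, text_emb)
```

In an actual audit, `img_emb` and `text_emb` would come from each multilingual model's encoders, with the prompt translated into each of the ten target languages; comparing per-language skews against the English baseline is what exposes the stereotype transfer the abstract reports.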
Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver
Submitted: May 20, 2025
arXiv Category: cs.CL

Key Contributions

Conducts the first systematic audit of four multilingual CLIP variants (M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, SigLIP-2) across ten languages to quantify race and gender bias. Finds that every model exhibits stronger gender skew than its English-only baseline, and that cross-lingual weight sharing transfers English stereotypes into gender-neutral and low-resource languages, contrary to the expectation that multilinguality mitigates bias.

Business Value

Highlights critical ethical considerations for developing and deploying global AI systems, promoting fairness and mitigating harmful biases in multimodal applications.