
Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models

Abstract

Multilingual vision-language models (VLMs) promise universal image-text retrieval, yet their social biases remain underexplored. We perform the first systematic audit of four public multilingual CLIP variants: M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2, covering ten languages that differ in resource availability and morphological gender marking. Using balanced subsets of FairFace and the PATA stereotype suite in a zero-shot setting, we quantify race and gender bias and measure stereotype amplification. Contrary to the intuition that multilinguality mitigates bias, every model exhibits stronger gender skew than its English-only baseline. CAPIVARA-CLIP shows its largest biases precisely in the low-resource languages it targets, while the shared encoder of NLLB-CLIP and SigLIP-2 transfers English gender stereotypes into gender-neutral languages; loosely coupled encoders largely avoid this leakage. Although SigLIP-2 reduces agency and communion skews, it inherits -- and in caption-sparse contexts (e.g., Xhosa) amplifies -- the English anchor's crime associations. Highly gendered languages consistently magnify all bias types, yet gender-neutral languages remain vulnerable whenever cross-lingual weight sharing imports foreign stereotypes. Aggregated metrics thus mask language-specific hot spots, underscoring the need for fine-grained, language-aware bias evaluation in future multilingual VLM research.
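The zero-shot audit described above boils down to comparing how strongly a stereotype prompt's text embedding aligns with image embeddings from different demographic groups in a balanced dataset. The following is a minimal sketch of that skew metric; it uses random vectors in place of real CLIP encoders and FairFace images, and the function name and grouping scheme are illustrative, not the paper's exact implementation.

```python
import numpy as np

def zero_shot_bias_skew(img_emb, groups, text_emb):
    """Mean cosine similarity of one text prompt to each demographic
    group's images, plus the max pairwise gap between groups (the skew).

    img_emb : (N, D) image embeddings from a VLM's image encoder
    groups  : (N,) demographic label per image (balanced across groups)
    text_emb: (D,) embedding of a stereotype prompt, e.g. "a criminal"
    """
    # normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb)
    sims = img @ txt  # (N,) similarity of each image to the prompt
    means = {g: float(sims[groups == g].mean()) for g in np.unique(groups)}
    skew = max(means.values()) - min(means.values())  # 0 == perfectly unbiased
    return means, skew

# toy stand-in for a balanced FairFace subset: 100 images, two groups
rng = np.random.default_rng(0)
img_emb = rng.normal(size=(100, 512))
groups = np.array(["female", "male"] * 50)
text_emb = rng.normal(size=512)  # stand-in for an encoded stereotype prompt
means, skew = zero_shot_bias_skew(img_emb, groups, text_emb)
```

In an actual audit, `img_emb` and `text_emb` would come from each multilingual model's encoders, with the prompt translated into each of the ten target languages; comparing per-language skews against the English baseline is what exposes the stereotype transfer the abstract reports.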
Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver
Submitted: May 20, 2025
arXiv Category: cs.CL

Key Contributions

Conducts the first systematic audit of four multilingual CLIP variants (M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, SigLIP-2) across ten languages to quantify race and gender bias. Finds that every model exhibits stronger gender skew than its English-only baseline, and that cross-lingual weight sharing transfers English stereotypes into gender-neutral and low-resource languages, contrary to the expectation that multilinguality mitigates bias.

Business Value

Highlights critical ethical considerations for developing and deploying global AI systems, promoting fairness and mitigating harmful biases in multimodal applications.