arxiv_cl · Research Paper · Linguists, Computational Linguists, NLP Researchers

Quantifying Phonosemantic Iconicity Distributionally in 6 Languages

Abstract

Language is, as commonly theorized, largely arbitrary. Yet, systematic relationships between phonetics and semantics have been observed in many specific cases. To what degree could those systematic relationships manifest themselves in large scale, quantitative investigations--both in previously identified and unidentified phenomena? This work undertakes a distributional approach to quantifying phonosemantic iconicity at scale across 6 diverse languages (English, Spanish, Hindi, Finnish, Turkish, and Tamil). In each language, we analyze the alignment of morphemes' phonetic and semantic similarity spaces with a suite of statistical measures, and discover an array of interpretable phonosemantic alignments not previously identified in the literature, along with crosslinguistic patterns. We also analyze 5 previously hypothesized phonosemantic alignments, finding support for some such alignments and mixed results for others.
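To make "alignment of phonetic and semantic similarity spaces" concrete, here is a minimal sketch of one standard way to measure it: a Mantel-style permutation test that correlates pairwise phonetic distances with pairwise semantic distances. This is not the authors' implementation. It assumes each morpheme is given as a sequence of phoneme symbols (phonetic distance = Levenshtein edit distance) and as an embedding vector (semantic distance = cosine distance); the function names and the toy lexicon are hypothetical.

```python
import math
import random
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two phoneme sequences (unit cost for insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two embedding vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return 1.0 - dot / norm


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mantel_test(phones, vectors, n_perm=999, seed=0):
    """Correlate pairwise phonetic and semantic distances over all morpheme pairs.

    Returns (r, p): r is the observed Pearson correlation and p a one-sided
    permutation p-value obtained by shuffling which embedding belongs to which
    morpheme (i.e. permuting rows and columns of the semantic distance matrix).
    """
    n = len(phones)
    pairs = list(combinations(range(n), 2))
    phon = [levenshtein(phones[i], phones[j]) for i, j in pairs]
    sem_matrix = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
                  for i in range(n)]
    observed = pearson(phon, [sem_matrix[i][j] for i, j in pairs])

    rng = random.Random(seed)
    perm = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        shuffled = [sem_matrix[perm[i]][perm[j]] for i, j in pairs]
        if pearson(phon, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy lexicon: two sound groups paired with two meaning groups.
    phones = [("g", "l", "o"), ("g", "l", "i"), ("g", "l", "a"),
              ("s", "n", "u"), ("s", "n", "i"), ("s", "n", "a")]
    vectors = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
               [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
    r, p = mantel_test(phones, vectors)
    print(f"r = {r:.3f}, permutation p = {p:.3f}")
```

A positive r means morphemes that sound alike also tend to mean alike. Permuting morpheme labels, rather than individual pairwise distances, respects the fact that pairs sharing a morpheme are not independent.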

Authors (2)

George Flint

Kaustubh Kislay

Submitted

October 15, 2025

arXiv Category

cs.CL

Key Contributions

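This work presents a distributional approach to quantifying phonosemantic iconicity at scale across six diverse languages (English, Spanish, Hindi, Finnish, Turkish, and Tamil) by measuring how closely morphemes' phonetic and semantic similarity spaces align. It identifies interpretable phonosemantic alignments not previously reported in the literature, along with cross-linguistic patterns, and tests five previously hypothesized alignments, finding support for some and mixed results for others. Together, these results contribute to a deeper understanding of the non-arbitrary aspects of language.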
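A more local question is whether one particular sound pattern lines up with one region of meaning, as in the often-cited English onset gl- (glow, gleam, glint). The sketch below is a hypothetical permutation test of that kind, not the paper's procedure: it asks whether morphemes sharing a given prefix are more similar to one another in an embedding space than randomly drawn sets of the same size. Representing forms as plain strings, using cosine similarity, and the names `cluster_test` and `mean_pairwise_similarity` are all assumptions for illustration.

```python
import math
import random
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two embedding vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))


def mean_pairwise_similarity(indices, vectors):
    """Average cosine similarity over all pairs drawn from `indices`."""
    pairs = list(combinations(indices, 2))
    return sum(cosine_similarity(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def cluster_test(forms, vectors, prefix, n_perm=999, seed=0):
    """Test whether morphemes starting with `prefix` cluster in semantic space.

    Compares the mean pairwise similarity of the prefix group with that of
    random same-size subsets of the whole lexicon; returns (observed, p).
    """
    members = [i for i, form in enumerate(forms) if form.startswith(prefix)]
    if len(members) < 2:
        raise ValueError("need at least two morphemes with this prefix")
    observed = mean_pairwise_similarity(members, vectors)

    rng = random.Random(seed)
    lexicon = list(range(len(forms)))
    hits = 0
    for _ in range(n_perm):
        # Sorting keeps the summation order identical to `members` when the
        # sample happens to be the same set, so exact ties compare as equal.
        sample = sorted(rng.sample(lexicon, len(members)))
        if mean_pairwise_similarity(sample, vectors) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: four gl- forms near one direction, four unrelated forms.
    forms = ["glow", "gleam", "glint", "glare", "dog", "tax", "run", "bowl"]
    vectors = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0], [0.95, 0.1, 0.1],
               [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 1.0, 0.2], [0.0, 0.3, 1.0]]
    observed, p = cluster_test(forms, vectors, "gl")
    print(f"mean similarity = {observed:.3f}, permutation p = {p:.3f}")
```

Scanning many candidate sound patterns this way would call for a multiple-comparison correction, which this sketch omits.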

Business Value

Understanding systematic relationships between sound and meaning could inform the design of natural language processing systems, with potential applications in machine translation, speech recognition, and text generation.

Paper Metadata

Innovation Type

Methodological

Deployment Feasibility

High: the method is an analysis procedure that can be applied to existing linguistic data.

Limitations Addressed

Previous studies on phonosemantic iconicity were often limited to specific cases or qualitative observations; this work provides a large-scale, quantitative investigation.

Technical Tags

phonosemantics, iconicity, distributional semantics, linguistics, computational linguistics, cross-linguistic analysis, morpheme analysis, phonetic similarity, semantic similarity

Research Topics

Linguistic Theory, Computational Linguistics, Phonetics and Phonology, Semantics, Language Diversity

Methods & Architectures

Distributional approach, Statistical measures, Comparative analysis

Applications & Tasks

Linguistics research, Language acquisition studies, Natural Language Processing, Quantifying linguistic phenomena, Identifying systematic relationships in language, Quantifying phonosemantic iconicity, Analyzing morpheme similarity spaces

Related Fields

Linguistics, Psycholinguistics, Cognitive Science, Natural Language Processing

Keywords

phonosemantic iconicity, distributional semantics, linguistics, language, phonetics, semantics, cross-linguistic, morpheme, quantitative analysis, language structure

Commercial Potential

Competitive Edge

Offers a more scalable and quantitative approach compared to previous qualitative or case-specific studies of phonosemantic iconicity.

Resource Requirements

Compute Needs

Moderate, depending on the size of the linguistic corpora analyzed.

Data Requirements

Large linguistic corpora in multiple languages.

Scalability

The distributional approach can in principle be extended to larger datasets and additional languages.

Production Readiness

Maturity Level

Research

Patent Potential

Low
