OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

Abstract

Empathetic interaction is a cornerstone of human-machine communication: it requires understanding speech enriched with paralinguistic cues and generating emotional and expressive responses. However, the most powerful empathetic large speech language models (LSLMs) are increasingly closed off, leaving crucial details about their architecture, data and development opaque to researchers. Given the critical need for transparent research into LSLMs and empathetic behavior, we present OpenS2S, a fully open-source, transparent, end-to-end LSLM designed to enable empathetic speech interactions. Building on our empathetic speech-to-text model BLSP-Emo, OpenS2S further employs a streaming interleaved decoding architecture to achieve low-latency speech generation. To facilitate end-to-end training, OpenS2S incorporates an automated data construction pipeline that synthesizes diverse, high-quality empathetic speech dialogues at low cost. By leveraging large language models to generate empathetic content and controllable text-to-speech systems to introduce speaker and emotional variation, we construct a scalable training corpus with rich paralinguistic diversity and minimal human supervision. We release the fully open-source OpenS2S model, including the dataset, model weights, and pre-training and fine-tuning code, to empower the broader research community and accelerate innovation in empathetic speech systems. The project webpage can be accessed at https://casia-lm.github.io/OpenS2S
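
As a rough illustration of the data construction idea in the abstract, the sketch below pairs LLM-written dialogue text with several speaker and emotion settings before synthesis. It is not the OpenS2S pipeline: `generate_dialogue` and `synthesize` are hypothetical stand-ins for an LLM call and a controllable text-to-speech call, and the topic, speaker and emotion labels are invented for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Sample:
    query_text: str
    response_text: str
    speaker: str
    emotion: str
    audio: bytes


def generate_dialogue(topic: str, emotion: str) -> tuple[str, str]:
    """Hypothetical stand-in for an LLM call that writes a query and an empathetic reply."""
    query = f"[{emotion}] I want to talk about {topic}."
    response = f"I hear that you feel {emotion} about {topic}."
    return query, response


def synthesize(text: str, speaker: str, emotion: str) -> bytes:
    """Hypothetical stand-in for a controllable TTS call; returns placeholder bytes, not audio."""
    return f"{speaker}|{emotion}|{text}".encode("utf-8")


def build_corpus(topics, speakers, emotions):
    """Cross topics with speaker and emotion settings to vary paralinguistic cues."""
    corpus = []
    for topic, speaker, emotion in product(topics, speakers, emotions):
        query, response = generate_dialogue(topic, emotion)
        audio = synthesize(query, speaker, emotion)
        corpus.append(Sample(query, response, speaker, emotion, audio))
    return corpus


corpus = build_corpus(["exams"], ["speaker_a", "speaker_b"], ["sad", "happy"])
print(len(corpus))  # 1 topic x 2 speakers x 2 emotions = 4 samples
```

Crossing a small set of texts with many speaker and emotion settings is one way a corpus could grow in paralinguistic variety without extra human labeling, which is the property the abstract emphasizes.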
Authors (11)
Chen Wang
Tianyu Peng
Wen Yang
Yinan Bai
Guangfu Wang
Jun Lin
+5 more
Submitted
July 7, 2025
arXiv Category
cs.CL

Key Contributions

Presents OpenS2S, a fully open-source, end-to-end large speech language model (LSLM) designed for empathetic speech interactions. The model uses a streaming interleaved decoding architecture for low-latency speech generation, and its automated data construction pipeline uses LLMs to generate empathetic content and controllable text-to-speech systems to add speaker and emotional variation.
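
To make the latency argument behind interleaved decoding concrete, here is a minimal sketch of one generic interleaving scheme, in which short runs of text tokens alternate with short runs of speech tokens so that audio can begin before the full text reply is finished. The listing does not describe the actual OpenS2S scheme: the function name `interleave` and both chunk sizes are assumptions, and the token sequences are precomputed here only for illustration, whereas a real decoder would generate them step by step.

```python
from typing import Iterator, List, Sequence, Tuple


def interleave(
    text_tokens: Sequence[str],
    speech_tokens: Sequence[int],
    text_chunk: int = 2,    # assumed chunk size, not taken from the paper
    speech_chunk: int = 4,  # assumed chunk size, not taken from the paper
) -> Iterator[Tuple[str, List]]:
    """Yield alternating ("text", chunk) and ("speech", chunk) segments."""
    t = s = 0
    while t < len(text_tokens) or s < len(speech_tokens):
        if t < len(text_tokens):
            yield "text", list(text_tokens[t:t + text_chunk])
            t += text_chunk
        if s < len(speech_tokens):
            yield "speech", list(speech_tokens[s:s + speech_chunk])
            s += speech_chunk


segments = list(interleave(["I", "am", "here", "."], list(range(10))))
for kind, chunk in segments:
    print(kind, chunk)
```

In this toy run the first speech segment is emitted after only two text tokens, rather than after the whole reply, which is the kind of saving a streaming design aims for.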

Business Value

Provides an open-source, transparent platform for research and development in empathetic AI. This could lead to more natural and emotionally intelligent human-machine interactions.