Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

arXiv cs.CL·Pablo Riera, Pablo Brusco, Cristina Kuo, Marcelo Sancinetti, S. R. K. Branavan

5/21/2026 · en

Quick Take

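The study simulates full-duplex dialogues between two instances of the Moshi model. It finds strong representational synchronization between the two instances in noise-free conditions, which degrades as channel noise increases, and shows that their internal states carry anticipatory turn-taking cues.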

Key Points

Full-duplex models enable simultaneous listening and speaking.
Synchronization is measured with Centered Kernel Alignment (CKA) across temporal lags.
Internal states encode anticipatory turn-taking cues.
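The abstract says synchronization is measured with CKA "across temporal lags" but does not give the CKA variant or the lag convention. As a minimal sketch, assuming the standard linear form of CKA, CKA(X, Y) = ||Yᵀ X||²_F / (||Xᵀ X||_F · ||Yᵀ Y||_F) on column-centered matrices, the code below compares one agent's per-frame hidden states with the other's at a range of frame offsets. The helper names `linear_cka` and `lagged_cka`, and the arrays `a` and `b` standing in for the two Moshi instances' activations, are hypothetical and not taken from the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices whose rows are aligned frames.

    x has shape (n, d1), y has shape (n, d2); the feature widths may differ.
    Columns are mean-centered before the comparison.
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    self_x = np.linalg.norm(x.T @ x, ord="fro")
    self_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (self_x * self_y))


def lagged_cka(a: np.ndarray, b: np.ndarray, max_lag: int) -> dict:
    """CKA between agent A's states at frame t and agent B's states at frame t + lag.

    A positive lag means B's representation trails A's by `lag` frames.
    Hypothetical helper: a and b stand for per-frame hidden states of the
    two model instances, each of shape (T, d).
    """
    n = min(len(a), len(b))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xa, yb = a[: n - lag], b[lag:n]
        else:
            xa, yb = a[-lag:n], b[: n + lag]
        scores[lag] = linear_cka(xa, yb)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((200, 16))
    b = np.roll(a, 3, axis=0)  # synthetic: b[t] equals a[t - 3]
    scores = lagged_cka(a, b, max_lag=5)
    print(max(scores, key=scores.get))  # → 3
```

The synthetic shift of 3 frames only demonstrates that the lag profile peaks where the two streams are aligned; the paper reports its peak near zero lag. Linear CKA is invariant to orthogonal transforms and isotropic scaling of either representation, which is what makes it usable across two independently running model instances. The sketch assumes each lagged window keeps at least two frames and has non-constant features, otherwise the denominator is zero.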

[Submitted on 19 May 2026]

Abstract: Full-duplex spoken dialogue models (SDMs) can listen and speak simultaneously, enabling interaction dynamics closer to human conversation than turn-based systems. Inspired by neural coupling in human communication, we study how such models coordinate their internal representations during interaction. We simulate full-duplex dialogues between two instances of the pretrained Moshi model under controlled conditions, manipulating channel noise and decoding bias. Synchronization is measured using Centered Kernel Alignment (CKA) across temporal lags, while anticipatory turn-taking cues are probed from delayed internal activations using causal LSTM models, from both speaker and listener perspectives. We find strong representational synchronization under noise-free conditions, peaking near zero lag and degrading with noise, and we show that internal states encode anticipatory information that supports turn-taking prediction ahead of time.
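The abstract states only that anticipatory cues are probed from delayed activations with causal LSTMs. One plausible way to align data for such a probe is sketched below, under two hypothetical assumptions: a per-frame track of which agent holds the floor, and a fixed delay measured in frames. `turn_shift_labels` and `delayed_pairs` are illustrative names, not the authors' code, and the LSTM itself is omitted.

```python
import numpy as np


def turn_shift_labels(speaker: np.ndarray) -> np.ndarray:
    """Mark frames where the floor holder differs from the previous frame.

    speaker: shape (T,), hypothetical encoding 0 = agent A holds the floor,
    1 = agent B holds the floor. Returns a (T,) array of 0/1 labels.
    """
    labels = np.zeros(len(speaker), dtype=np.int64)
    labels[1:] = (speaker[1:] != speaker[:-1]).astype(np.int64)
    return labels


def delayed_pairs(acts: np.ndarray, labels: np.ndarray, delay: int):
    """Pair the activation at frame t - delay with the label at frame t.

    acts: shape (T, d); labels: shape (T,). Returns (inputs, targets), each of
    length T - delay. A causal sequence model (for example an LSTM run left to
    right) fed `inputs` must predict each shift `delay` frames before it occurs.
    """
    T = len(labels)
    if len(acts) != T:
        raise ValueError("acts and labels must cover the same frames")
    if not 0 <= delay < T:
        raise ValueError("delay must satisfy 0 <= delay < T")
    return acts[: T - delay], labels[delay:]


if __name__ == "__main__":
    speaker = np.array([0, 0, 0, 1, 1, 0, 0])
    labels = turn_shift_labels(speaker)
    acts = np.arange(28, dtype=float).reshape(7, 4)
    inputs, targets = delayed_pairs(acts, labels, delay=2)
    print(labels.tolist())   # → [0, 0, 0, 1, 0, 1, 0]
    print(targets.tolist())  # → [0, 1, 0, 1, 0]
```

Because the model only ever sees activations up to frame t - delay when scoring the label at frame t, above-chance accuracy at larger delays would indicate that the states already carry information about an upcoming turn shift. Under this reading, the speaker and listener perspectives in the abstract would correspond to running the same pairing on each instance's activations separately.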

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD)
Cite as:	arXiv:2605.20356 [cs.CL]
	(or arXiv:2605.20356v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.20356 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Pablo Riera Dr
[v1] Tue, 19 May 2026 18:11:03 UTC (1,605 KB)

— Originally published at arxiv.org