Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
Quick Take
The study examines how two instances of a full-duplex speech dialogue model synchronize their internal representations during simulated conversation, and whether those representations carry cues that predict turn-taking ahead of time.
Key Points
- Full-duplex models enable simultaneous listening and speaking.
- Synchronization is measured with Centered Kernel Alignment (CKA) across temporal lags.
- Internal states encode anticipatory turn-taking cues.
Abstract: Full-duplex spoken dialogue models (SDMs) can listen and speak simultaneously, enabling interaction dynamics closer to human conversation than turn-based systems. Inspired by neural coupling in human communication, we study how such models coordinate their internal representations during interaction. We simulate full-duplex dialogues between two instances of the pretrained Moshi model under controlled conditions, manipulating channel noise and decoding bias. Synchronization is measured using Centered Kernel Alignment (CKA) across temporal lags, while anticipatory turn-taking cues are probed from delayed internal activations using causal LSTM models, from both speaker and listener perspectives. We find strong representational synchronization under no noise conditions, peaking near zero lag and degrading with noise, and we show that internal states encode anticipatory information that supports turn-taking prediction ahead of time.
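To make the lagged-synchronization measurement concrete, here is a minimal sketch of CKA evaluated across temporal lags. It rests on several assumptions not stated in the abstract: it uses the linear variant of CKA, treats each time step as one sample, and defines a positive lag as "model B trails model A". The function names `linear_cka` and `lagged_cka` are hypothetical, and the activations are synthetic random data rather than anything extracted from Moshi.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    # Center each feature so the score reflects covariance structure only.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def lagged_cka(a, b, max_lag):
    """CKA between a[t] and b[t + lag] for every lag in [-max_lag, max_lag]."""
    n = min(len(a), len(b))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xa, xb = a[: n - lag], b[lag:n]
        else:
            xa, xb = a[-lag:n], b[: n + lag]
        scores[lag] = linear_cka(xa, xb)
    return scores


# Synthetic stand-ins for two models' hidden states (assumption: 500 time
# steps, 16 features). b is a rotated, slightly noisy copy of a, so the two
# streams are aligned at lag 0 by construction.
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 16))
rotation, _ = np.linalg.qr(rng.standard_normal((16, 16)))
b = a @ rotation + 0.1 * rng.standard_normal((500, 16))

scores = lagged_cka(a, b, max_lag=5)
best_lag = max(scores, key=scores.get)
```

Because linear CKA is invariant to orthogonal transformations of the feature space, the rotated copy still scores close to 1 at lag 0, while shifted versions of this uncorrelated synthetic signal score much lower. Real model activations are temporally smooth, so the fall-off around the peak would likely be more gradual.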
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD)
Cite as: arXiv:2605.20356 [cs.CL] (or arXiv:2605.20356v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.20356 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Pablo Riera Dr
[v1] Tue, 19 May 2026 18:11:03 UTC (1,605 KB)
— Originally published at arxiv.org