MLLP-VRAIN UPV system for the IWSLT 2026 Simultaneous Speech Translation task
Quick Answer
The MLLP-VRAIN group builds its IWSLT 2026 Simultaneous Speech Translation system on the Parakeet and Qwen 3.5 models, improving on last year's submission by +5.82 XCOMET-XL on the MCIF En→De test set.
Quick Take
The submission is a cascaded system for long-form SimulST that relies on adaptive "black-box" policies, with relaxations of those policies explored to improve the quality-latency trade-off. In the new context track, combining ASR word-boosting with offline pre-translated exemplars adds a further +1.03.
Key Points
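- Uses the Parakeet and Qwen 3.5 models in a robust, cascaded system for long-form SimulST.
- Participates in all language directions, and in the new context track for En→{De, It, Zh}.
- Achieves a +5.82 XCOMET-XL improvement over last year on the MCIF En→De test set.
- Context track processing, based on ASR word-boosting and offline pre-translated exemplars, improves performance by a further +1.03.
- Applies adaptive "black-box" policies and explores relaxations of them for better quality-latency trade-offs.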
Article Excerpt
From source RSS / original summary
arXiv:2606.17255v1
Abstract: This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2026 Simultaneous Speech Translation track. Our submission utilizes the recently released Parakeet and Qwen 3.5 models to create a robust, cascaded solution for long-form SimulST through the use of adaptive "black-box" policies. We explore relaxations of these policies to achieve better quality-latency trade-offs.
Compared to last year, we participate in all language directions. In addition, for the En→{De, It, Zh} directions we also participate in this year's new context track, employing a combination of ASR word-boosting and a mechanism of offline pre-translated exemplars to guide generation and enrich our system with domain-specific context. Finally, we provide a detailed latency analysis of our system.
Compared to last year, results on the MCIF En→De test set show a substantial quality improvement of +5.82 XCOMET-XL. Our context track processing further improves performance by +1.03.