Syll: Open-Source Personal Automation with Cross-Surface Execution
Quick Answer
Syll is an open-source multimodal personal automation agent that integrates APIs, CLI, and GUI, enabling users to teach and audit agent behavior across diverse interfaces.
Quick Take
Syll compiles direct user demonstrations into reusable skills and turns agent execution into multimodal evidence that users can inspect. It has been validated on applications such as Adobe Photoshop and macOS Finder.
Key Points
- Syll unifies API tools, CLI execution, and visual GUI control in a modular runtime.
- Users can teach procedures through demonstrations, which Syll converts into reusable skills.
- Agent execution generates multimodal evidence for user inspection and control.
- Memory, skills, and routines are externalized as editable local artifacts.
- Validated on production applications like Adobe Photoshop and Stardew Valley.
Article Content
From source RSS / original summary
arXiv:2606.07594v1 Announce Type: new
Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet many systems remain tuned to a single interface and offer limited support for user teaching and auditability.
We present Syll, an open-source, self-hosted multimodal agent harness that unifies API tools, CLI execution, and visual GUI control in a modular runtime, enabling agents to coordinate computer use across heterogeneous interfaces while streamlining how users and agents exchange information.
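To make the idea of a modular runtime spanning several interfaces concrete, here is a minimal Python sketch. It is not Syll's actual code or API: the `Step` and `Runtime` classes, the surface labels `"api"`, `"cli"`, and `"gui"`, and the stub handlers are all hypothetical names chosen for illustration. It only shows one plausible way a single plan could be dispatched to per-surface handlers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    """One unit of work in a plan (hypothetical structure, not Syll's)."""
    surface: str  # assumed labels: "api", "cli", or "gui"
    action: str   # free-form description of what to do


class Runtime:
    """Toy modular runtime: one pluggable handler per interface surface."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}

    def register(self, surface: str, handler: Callable[[str], str]) -> None:
        # Adding a new surface only requires registering another handler.
        self._handlers[surface] = handler

    def run(self, steps: List[Step]) -> List[str]:
        results: List[str] = []
        for step in steps:
            handler = self._handlers.get(step.surface)
            if handler is None:
                raise ValueError(f"no handler registered for surface {step.surface!r}")
            results.append(handler(step.action))
        return results


def build_demo_runtime() -> Runtime:
    """Build a runtime with stub handlers that only echo the action.

    Real handlers would call an HTTP API, spawn a shell command, or drive
    a GUI; those parts are deliberately left out of this sketch.
    """
    runtime = Runtime()
    runtime.register("api", lambda action: f"api:{action}")
    runtime.register("cli", lambda action: f"cli:{action}")
    runtime.register("gui", lambda action: f"gui:{action}")
    return runtime
```

With these stubs, a plan such as `[Step("api", "fetch"), Step("gui", "click")]` passed to `build_demo_runtime().run(...)` is routed step by step to the matching handler, and an unregistered surface raises `ValueError` rather than being silently skipped.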
At the core of Syll is a bidirectional user-agent interaction layer: users teach procedures through direct demonstration, which Syll compiles into reusable skills; agent execution is translated back into multimodal evidence -- logs, keyframes, and approval checkpoints -- for inspection and control. Syll further externalizes memory, skills, routines, and governance as editable local artifacts, supporting straightforward inspection, extension, and downstream development.
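The teach-then-audit loop described above can be sketched in a few lines of Python. This is an illustrative assumption, not the paper's implementation: the functions `compile_skill`, `save_skill`, and `replay`, the JSON layout, and the `needs_approval` flag are all hypothetical. The sketch covers only a textual evidence log and approval checkpoints; keyframe capture is omitted.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def compile_skill(name: str, demo_events: List[Dict]) -> Dict:
    """Turn recorded demonstration events into a replayable skill (toy version).

    Only consecutive duplicate events are dropped; nothing else is inferred.
    """
    steps: List[Dict] = []
    for event in demo_events:
        if not steps or steps[-1] != event:
            steps.append(event)
    return {"name": name, "steps": steps}


def save_skill(skill: Dict, directory: Path) -> Path:
    """Write the skill as an editable local JSON artifact and return its path."""
    path = directory / f"{skill['name']}.json"
    path.write_text(json.dumps(skill, indent=2), encoding="utf-8")
    return path


def replay(skill: Dict, approve: Callable[[Dict], bool]) -> List[Dict]:
    """Replay a skill and return an evidence log with one entry per step.

    Steps flagged with the hypothetical `needs_approval` key act as
    checkpoints: replay stops if the `approve` callback rejects the step.
    Actually performing the action is out of scope for this sketch.
    """
    evidence: List[Dict] = []
    for index, step in enumerate(skill["steps"]):
        if step.get("needs_approval") and not approve(step):
            evidence.append({"step": index, "status": "rejected", "action": step["action"]})
            break
        evidence.append({"step": index, "status": "done", "action": step["action"]})
    return evidence
```

Because the skill is stored as plain JSON on disk, a user could open the file, edit a step, and replay it again, which is the kind of inspection and extension that externalized local artifacts are meant to support.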
Our implementation has been validated on production desktop applications including Adobe Photoshop, Adobe Audition, Stardew Valley, macOS Finder and others. We report mechanism-oriented studies that validate multimodal routing, teachable GUI replay, and persistent local artifacts. We hope Syll can serve as a practical open-source foundation for personal automation that users can teach, inspect, and continuously extend.