Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
Quick Take
Crafter is a multi-agent harness that generates editable scientific figures from diverse inputs. It outperforms both standalone generators and an agentic baseline on PaperBanana-Bench and CraftBench. A companion system, CraftEditor, converts raster outputs into editable SVGs.
Key Points
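- Crafter generalizes across figure types and input conditions without architectural changes.
- CraftEditor converts raster outputs into editable SVGs, so figures can be revised locally.
- Experiments show Crafter substantially outperforms both standalone generators and the agentic baseline on PaperBanana-Bench and CraftBench; ablations confirm each component's independent contribution.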
- CraftBench includes three figure types and four input conditions with human quality annotation.
- Code and benchmark available at https://github.com/HaozheZhao/Crafter.
Article Content
From source RSS / original summary (arXiv:2605.30611v1)
Abstract: Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation. Existing automated systems each target a single figure type under text-only input, leaving the diversity of types and conditions researchers actually use unaddressed; their raster outputs further cannot be locally revised.
Because scientific figures are structured compositions of discrete semantic components, the localized errors generators produce on such layouts demand not a stronger backbone but a harness. We instantiate this harness in two complementary systems: Crafter, a multi-agent harness for figure generation that generalizes across figure types and input conditions without architectural changes, and CraftEditor, which applies the same pattern to convert raster outputs into editable SVGs.
Moreover, we introduce CraftBench, a benchmark spanning three figure types and four input conditions with human quality annotation. Experiments show that Crafter substantially outperforms both standalone generators and the agentic baseline on PaperBanana-Bench and CraftBench, with ablations confirming each component's independent contribution; CraftEditor faithfully converts outputs into editable SVGs that surpass all baselines. Our code and benchmark are available at https://github.com/HaozheZhao/Crafter.