Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

arXiv cs.AI·Jiateng Liu, Bingxuan Li, Zhenhailong Wang, Rushi Wang, Kaiwen Hong, Cheng Qian, Jiayu Liu, Denghui Zhang, Katherine Driggs-Campbell, Manling Li, Heng Ji

6/6/2026

·~2 min·6/6/2026·en·2

Quick Answer

The Brick-Composer framework enhances MLLMs' assembly skills, improving brick selection accuracy by over 300% and raising assembly success rates from under 1% to 15%.

Quick Take

Utilizing BC-Bench for evaluation, the Qwen-3-8B model demonstrates the potential for effective brick assembly through targeted learning strategies.

Key Points

Brick assembly is framed as a sequential decision-making problem with two main subtasks.
Current MLLMs struggle with fine-grained brick selection and precise pose estimation.
Brick-Composer integrates Human Design Sparks, World Feedback, and Synthetic Experience for training.
After training, Qwen-3-8B can correctly compose 42% of steps for complete objects.
BC-Bench is introduced as the first benchmark for evaluating MLLMs in brick assembly.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Source Excerpt

arXiv:2606. 05445v1 Announce Type: new Abstract: We dream of AI agents that can read arbitrary designs and construct real-world objects from reusable building blocks. As a first step toward this vision, we study whether multimodal (MLLMs) possess the visual grounding and spatial reasoning capabilities required for brick assembly.

We formulate brick assembly as a sequential decision-making problem, where each step involves two subtasks: brick selection, identifying the target brick from candidate components, and brick pose estimation, predicting where and how the selected brick should be placed. …

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·Sumit Verma, Pritam Prasun, Pritish Kumar

1d ago

FeaturedOriginal

RAIL Guard: Closing the Evaluation-to-Remediation Gap in Responsible AI for Agents

AI Summary

RAIL Guard introduces a closed-loop AI pipeline for large language models (LLMs) that evaluates outputs across eight dimensions and iteratively remediates failures, achieving 96.9% convergence compared to 49.1% for traditional block-and-retry methods. The system reduces unsafe agent executions by 33% without impacting task completion and is available as open-source SDKs.

#LLM #Agent #Open Source #Policy

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Quick Answer

Quick Take

Key Points

Paper Resources

Source Excerpt

Want this in your inbox every morning?

More from arXiv cs.AI

RAIL Guard: Closing the Evaluation-to-Remediation Gap in Responsible AI for Agents

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Powered Agentic System

The Emerging Paradigm of Geospatial Foundation Models: From Pre-Training to Agentic Reasoning

Quick Answer

Quick Take

Key Points

Paper Resources

Source Excerpt

Want this in your inbox every morning?

More from arXiv cs.AI

RAIL Guard: Closing the Evaluation-to-Remediation Gap in Responsible AI for LLM Agents

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Large Language Model Powered Agentic System

The Emerging Paradigm of Geospatial Foundation Models: From Pre-Training to Agentic Reasoning

RAIL Guard: Closing the Evaluation-to-Remediation Gap in Responsible AI for Agents

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Powered Agentic System