GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning

arXiv cs.AI · Saku Peltonen, August Bøgh Rønberg, Andreas Plesner, Roger Wattenhofer


6/1/2026 · ~1 min read · en

Quick Take

GraphARC is a benchmark for abstract reasoning on graph-structured data, and evaluating state-of-the-art language models on it reveals clear limitations. Models can answer questions about graph properties, but they often fail to solve the full transformation task, revealing a comprehension-execution gap, and their performance degrades further as graph size increases. The benchmark offers a new testbed for developing graph foundation models.

Key Points

GraphARC generalizes the few-shot transformation learning paradigm of the Abstraction and Reasoning Corpus.
Tasks involve inferring transformation rules from input-output pairs and applying them to new test graphs.
State-of-the-art models struggle with full graph transformation tasks, indicating a comprehension-execution gap.
Performance declines on larger graph instances, exposing scaling barriers.
GraphARC combines aspects of node classification, link prediction, and graph generation in a single framework.
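The task format in these points (infer a rule from a few input-output pairs, then apply it to a new test graph) can be illustrated with a toy brute-force solver. This is a minimal sketch built on assumptions. The `Graph` encoding (colored nodes plus undirected edges), the three candidate rules, and the helper names `infer_rule` and `solve` are all hypothetical and are not taken from GraphARC. Searching a small fixed list of rules is only a stand-in for the open-ended rule inference the benchmark asks models to perform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Edge = Tuple[int, int]


@dataclass
class Graph:
    # Hypothetical encoding: node id -> color label, plus undirected edges (u < v).
    nodes: Dict[int, str]
    edges: FrozenSet[Edge]


def degrees(g: Graph) -> Dict[int, int]:
    deg = {v: 0 for v in g.nodes}
    for u, v in g.edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical candidate rules: a local recoloring, a global recoloring, a structural edit.
def leaves_red(g: Graph) -> Graph:
    deg = degrees(g)
    return Graph({v: "red" if deg[v] == 1 else c for v, c in g.nodes.items()}, g.edges)


def max_degree_blue(g: Graph) -> Graph:
    deg = degrees(g)
    top = max(deg.values(), default=0)
    return Graph({v: "blue" if deg[v] == top else c for v, c in g.nodes.items()}, g.edges)


def prune_leaves(g: Graph) -> Graph:
    deg = degrees(g)
    kept = {v: c for v, c in g.nodes.items() if deg[v] != 1}
    return Graph(kept, frozenset(e for e in g.edges if e[0] in kept and e[1] in kept))


RULES: Dict[str, Callable[[Graph], Graph]] = {
    "leaves_red": leaves_red,
    "max_degree_blue": max_degree_blue,
    "prune_leaves": prune_leaves,
}

Pair = Tuple[Graph, Graph]


def infer_rule(train: List[Pair]) -> Optional[str]:
    # Return the first candidate rule consistent with every demonstration pair.
    for name, rule in RULES.items():
        if all(rule(x) == y for x, y in train):
            return name
    return None


def solve(train: List[Pair], test_input: Graph) -> Optional[Graph]:
    # Apply the inferred rule to the unseen test graph (None if no rule fits).
    name = infer_rule(train)
    return None if name is None else RULES[name](test_input)


def path(n: int) -> Graph:
    return Graph({i: "gray" for i in range(n)}, frozenset((i, i + 1) for i in range(n - 1)))


def star(n: int) -> Graph:
    return Graph({i: "gray" for i in range(n)}, frozenset((0, i) for i in range(1, n)))
```

Suppose two demonstrations are built from `path(3)` and `path(5)` under `leaves_red`. From these, `solve` recolors the three leaves of `star(4)` red and leaves the center gray, even though no star appeared among the demonstrations.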

Article Content

From source RSS / original summary

arXiv:2605.31031v1 · Announce Type: new

Abstract: Relational reasoning lies at the heart of intelligence, but existing benchmarks are typically confined to formats such as grids or text. We introduce GraphARC, a benchmark for abstract reasoning on graph-structured data. GraphARC generalizes the few-shot transformation learning paradigm of the Abstraction and Reasoning Corpus (ARC).

Each task requires inferring a transformation rule from a few input-output pairs and applying it to a new test graph, covering local, global, and hierarchical graph transformations. Unlike grid-based ARC, GraphARC instances can be generated at scale across diverse graph families and sizes, enabling systematic evaluation of generalization abilities. We evaluate state-of-the-art language models on GraphARC and observe clear limitations.
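Because instances are generated programmatically, a single rule can be instantiated across many graph families and sizes. The sketch below is illustrative only. The four families, the hypothetical "color degree-1 nodes red" rule, the seeded node relabeling, and the function names are assumptions; none of them describe GraphARC's actual generator.

```python
import random
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]


# Hypothetical graph families: each maps a size n to an undirected edge set on nodes 0..n-1.
def path_edges(n: int) -> Set[Edge]:
    return {(i, i + 1) for i in range(n - 1)}


def cycle_edges(n: int) -> Set[Edge]:  # assumes n >= 3
    return path_edges(n) | {(0, n - 1)}


def star_edges(n: int) -> Set[Edge]:
    return {(0, i) for i in range(1, n)}


def complete_edges(n: int) -> Set[Edge]:
    return set(combinations(range(n), 2))


FAMILIES: Dict[str, Callable[[int], Set[Edge]]] = {
    "path": path_edges,
    "cycle": cycle_edges,
    "star": star_edges,
    "complete": complete_edges,
}


def relabel(edges: Set[Edge], n: int, rng: random.Random) -> Set[Edge]:
    # Randomly permute node ids so instances of one family do not share a fixed layout.
    perm = list(range(n))
    rng.shuffle(perm)
    return {tuple(sorted((perm[u], perm[v]))) for u, v in edges}


def leaves_red(n: int, edges: Set[Edge]) -> Dict[int, str]:
    # Hypothetical local rule: degree-1 nodes become red, all others stay gray.
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {v: "red" if deg[v] == 1 else "gray" for v in range(n)}


def generate(sizes: List[int], seeds_per_cell: int = 2) -> List[dict]:
    # One instance per (family, size, seed) cell, so the pool grows multiplicatively.
    instances = []
    for family, make_edges in FAMILIES.items():
        for n in sizes:
            for seed in range(seeds_per_cell):
                rng = random.Random(f"{family}-{n}-{seed}")  # str seeds are deterministic
                edges = relabel(make_edges(n), n, rng)
                instances.append({"family": family, "n": n, "edges": edges,
                                  "output": leaves_red(n, edges)})
    return instances
```

Calling `generate([4, 8, 16])` yields 4 families × 3 sizes × 2 seeds = 24 instances. Relabeling preserves degrees, so every path instance has exactly two red nodes and every star instance has n - 1. Adding larger sizes to the list is all it takes to probe how a solver copes as graphs grow.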

Models can answer questions about graph properties but often fail to solve the full graph transformation task, revealing a comprehension-execution gap. Performance further degrades on larger instances, exposing scaling barriers. More broadly, by combining aspects of node classification, link prediction, and graph generation within a single framework, GraphARC provides a promising testbed for future graph foundation models.
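Scoring a predicted output graph at several granularities shows how partial correctness can coexist with an unsolved task. In this hedged sketch, node-label accuracy stands in for the node-classification view, edge F1 for the link-prediction view, and exact match of the whole graph for the graph-generation view. These metrics and function names are illustrative assumptions, not GraphARC's official scoring.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def norm(edges: Iterable[Edge]) -> Set[Edge]:
    # Treat edges as undirected by storing each as (smaller id, larger id).
    return {(min(u, v), max(u, v)) for u, v in edges}


def node_accuracy(pred: Dict[int, str], gold: Dict[int, str]) -> float:
    # Fraction of gold nodes whose predicted label matches (node-classification view).
    if not gold:
        return 1.0
    return sum(pred.get(v) == label for v, label in gold.items()) / len(gold)


def edge_f1(pred: Iterable[Edge], gold: Iterable[Edge]) -> float:
    # F1 over undirected edge sets (link-prediction view).
    p, g = norm(pred), norm(gold)
    if not p and not g:
        return 1.0
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def exact_match(pred_nodes: Dict[int, str], pred_edges: Iterable[Edge],
                gold_nodes: Dict[int, str], gold_edges: Iterable[Edge]) -> bool:
    # Whole-graph view: every label and every edge must be right.
    return pred_nodes == gold_nodes and norm(pred_edges) == norm(gold_edges)
```

Consider a gold path 0-1-2-3 whose two end nodes are red. A prediction that misses one red node and one edge still scores node accuracy 0.75 and edge F1 of about 0.8, yet it fails exact match.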

