TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation
Quick Answer
TouchThinker introduces a tactile-language framework to enhance commonsense reasoning in open-world settings, featuring the TouchThinker-1M dataset with 1 million samples across 415 objects and 8 scenarios.
Quick Take
Beyond the million-scale dataset, the paper proposes an action-aware modeling mechanism intended to improve tactile representation efficiency, and it reports competitive performance against state-of-the-art models across multiple datasets.
Key Points
- TouchThinker-1M dataset includes 1 million samples from diverse sources.
- TouchThinker-1M covers 415 objects, 8 scenarios, and 7 sensor types.
- Action-aware modeling enhances representation efficiency for tactile signals.
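- TouchThinker achieves competitive performance against state-of-the-art models across multiple datasets; the paper also introduces TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks.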
- Code and dataset will be publicly available on GitHub.
Article Content
From source RSS / original summary
arXiv:2606.11637v1
Abstract: Touch is a key modality for embodied agents to understand the physical world.
Although recent work has incorporated tactile signals into language systems for tactile commonsense reasoning, scaling such systems to realistic open-world settings remains challenging due to two key bottlenecks: (1) current tactile reasoning datasets remain limited in format and scale, providing insufficient supervision for reasoning from tactile observations to physical commonsense and hindering the learning of transferable tactile commonsense; (2) tactile signals are inherently redundant and action-specific, yet existing methods often overlook these properties, resulting in inefficient representations with limited semantic expressiveness.
To address these limitations, we propose TouchThinker, a tactile-language framework that scales tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset covering 415 objects, 8 scenarios, and 7 sensor types, providing a solid data foundation for open-world generalization.
We further introduce TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks. Then, we propose an action-aware modeling mechanism to improve tactile representation efficiency and enable efficient reasoning. Experimental results demonstrate that TouchThinker achieves competitive performance against state-of-the-art models across multiple datasets. Our code and dataset will be made available at: https://github.com/lvkailin0118/TouchThinker.
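The abstract does not say how the action-aware mechanism works, so the following is only an illustration of the general problem it names: tactile streams that are redundant and whose redundancy depends on the action being performed. It is not the paper's method. The frame format, the action names, the thresholds, and the function names are all hypothetical, chosen just to show how frames that barely differ from the last kept frame could be skipped with an action-dependent threshold.

```python
from typing import Dict, List, Sequence

# Hypothetical frame format: one flattened tactile reading per time step.
Frame = Sequence[float]

# Hypothetical per-action thresholds (not from the paper).
# A larger threshold treats more frames as redundant.
ACTION_THRESHOLDS: Dict[str, float] = {"press": 0.05, "slide": 0.25}


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def prune_redundant_frames(
    frames: List[Frame], action: str, default_threshold: float = 0.1
) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    threshold = ACTION_THRESHOLDS.get(action, default_threshold)
    kept = [0]  # always keep the first frame as the reference
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Toy sequence in which the signal rises by 0.1 per step.
frames = [[0.0], [0.1], [0.2], [0.3], [0.4]]
press_kept = prune_redundant_frames(frames, "press")
slide_kept = prune_redundant_frames(frames, "slide")
print(press_kept, slide_kept)
```

With these toy values, the stricter "press" threshold keeps every frame while the looser "slide" threshold keeps only two, which shows how the same stream can yield representations of different lengths depending on the action.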