BRepCLIP: Contrastive Multimodal Pretraining on BRep Primitives for CAD Understanding
Quick Answer
BRepCLIP introduces a novel framework for aligning boundary representations (BReps) of CAD models with language and image embeddings through contrastive pretraining, achieving a 40.4% improvement in Top-1 retrieval over OpenShape on ABC.
Quick Take
Beyond retrieval, BRepCLIP improves zero-shot classification on FabWave by 15% in Top-1 score, supporting the case for structure-aware pretraining in CAD understanding.
Key Points
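- BRepCLIP is presented as the first framework to align BRep geometry with language and image embeddings through contrastive pretraining.
- It represents each CAD object as a sequence of face and edge tokens, with separate discrete vocabularies for surface and curve geometry.
- Improves Top-1 retrieval over OpenShape by 40.4% on the ABC dataset.
- Improves zero-shot Top-1 classification by 15% on the FabWave dataset.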
- Demonstrates the importance of structure-aware pretraining for CAD.
Article Content
arXiv:2606.05515v1. Abstract: Learning representations of CAD models is a largely open problem. While 3D representation learning has flourished around point clouds and meshes, the native format of CAD, boundary representations (BReps), which encode exact parametric surfaces, curves, and their topology, has received little attention as a representation learning substrate.
We introduce BRepCLIP, the first framework to align BRep geometry with language and image embeddings through contrastive pretraining. We model each CAD object as a sequence of face and edge tokens with separate discrete vocabularies for surface and curve geometry, augmented with spatial and semantic descriptors that capture surface types (e.g., cylindrical, torus, NURBS) and curve primitives (e.g., line, arc, B-spline).
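To make the token sequence concrete, here is a minimal sketch of how faces and edges might be packed into tokens. It is not the authors' tokenizer: the vocabulary sizes, the type lists, the `BRepToken` fields, and the use of a centroid as the spatial descriptor are all assumptions made for illustration, and the step that maps geometry to a discrete `geom_id` is left out because the abstract does not describe it.

```python
from dataclasses import dataclass

# Hypothetical type lists; the abstract names these types only as examples.
SURFACE_TYPES = ["plane", "cylindrical", "torus", "nurbs"]
CURVE_TYPES = ["line", "arc", "bspline"]

# Assumed sizes of the two separate geometry vocabularies.
SURFACE_VOCAB_SIZE = 1024
CURVE_VOCAB_SIZE = 512


@dataclass(frozen=True)
class BRepToken:
    kind: str        # "face" or "edge"
    geom_id: int     # index into the surface or curve vocabulary
    type_id: int     # semantic descriptor: surface type or curve primitive
    centroid: tuple  # spatial descriptor (assumed: a normalised centre point)


def make_token(kind, geom_id, type_name, centroid):
    """Check one primitive against its own vocabulary and build a token."""
    if kind == "face":
        vocab_size, types = SURFACE_VOCAB_SIZE, SURFACE_TYPES
    elif kind == "edge":
        vocab_size, types = CURVE_VOCAB_SIZE, CURVE_TYPES
    else:
        raise ValueError(f"unknown primitive kind: {kind!r}")
    if not 0 <= geom_id < vocab_size:
        raise ValueError(f"geom_id {geom_id} outside the {kind} vocabulary")
    return BRepToken(kind, geom_id, types.index(type_name), tuple(centroid))


def tokenize(faces, edges):
    """Turn (geom_id, type_name, centroid) tuples into one face-then-edge sequence."""
    tokens = [make_token("face", *f) for f in faces]
    tokens += [make_token("edge", *e) for e in edges]
    return tokens
```

Keeping the face and edge vocabularies separate means a `geom_id` of 5 refers to different geometry depending on `kind`, which is why each token carries both fields.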
A transformer encoder aggregates these tokens into a global BRep embedding, aligned with CLIP's text and image encoders via a joint contrastive objective. BRepCLIP generates more discriminative and semantically grounded embeddings than existing point-based alternatives, improving Top-1 retrieval over OpenShape by 40.4%, 22.0%, and 23.9% on ABC, CADParser, and Automate, respectively, and improving zero-shot classification on FabWave by 15% in Top-1 score.
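The abstract does not spell out the joint contrastive objective. One plausible reading, sketched below, is a CLIP-style symmetric InfoNCE loss applied to BRep-text and BRep-image pairs and summed. The equal weighting of the two terms, the temperature of 0.07, and all function names are assumptions, and the sketch uses plain Python lists rather than a tensor library.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def clip_style_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of paired embeddings."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def joint_loss(brep, text, image, temperature=0.07):
    """Assumed joint objective: BRep-text plus BRep-image terms, equally weighted."""
    return clip_style_loss(brep, text, temperature) + clip_style_loss(brep, image, temperature)
```

With matching pairs on the diagonal the loss approaches zero, and it grows when a BRep embedding sits closer to another object's text or image embedding than to its own.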
We further demonstrate its utility as a CAD-aware similarity metric for evaluating text and image-conditioned CAD generation, establishing the importance of structure-aware pretraining for multimodal CAD understanding. Project page is available at https://muhammadusama100.github.io/BrepClip2026/
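As a rough illustration of how embeddings in a shared space could serve as a similarity metric and feed a Top-1 retrieval score, the sketch below uses cosine similarity. The abstract does not state which similarity function or evaluation protocol is used, so both functions are assumptions, and the embeddings are taken to be plain lists of floats produced elsewhere.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def top1_accuracy(queries, gallery):
    """Fraction of queries whose most similar gallery item shares their index."""
    hits = 0
    for i, query in enumerate(queries):
        scores = [cosine_similarity(query, item) for item in gallery]
        if scores.index(max(scores)) == i:
            hits += 1
    return hits / len(queries)
```

Under these assumptions, a generated CAD model would be scored by calling `cosine_similarity` on its BRep embedding and the embedding of the text or image prompt, with higher values indicating closer agreement.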