📄 Abstract
Assembly hinges on reliably forming connections between parts, yet most
robotic approaches plan assembly sequences and part poses while treating
connectors as an afterthought. Connections are the critical "last mile" of
assembly execution: task planning may sequence operations and motion planning
may position parts, but the precise establishment of physical connections
ultimately determines assembly success or failure. In this paper, we treat
connections as first-class primitives in the assembly representation, including
connector types, specifications, quantities, and placement locations. Drawing
inspiration from how humans learn assembly tasks through step-by-step
instruction manuals, we present Manual2Skill++, a vision-language framework
that automatically extracts structured connection information from assembly
manuals. We encode assembly tasks as hierarchical graphs where nodes represent
parts and sub-assemblies, and edges explicitly model connection relationships
between components. A large-scale vision-language model parses symbolic
diagrams and annotations in manuals to instantiate these graphs, leveraging the
rich connection knowledge embedded in human-designed instructions. We curate a
dataset containing over 20 assembly tasks with diverse connector types to
validate our representation extraction approach, and evaluate the complete task
understanding-to-execution pipeline across four complex assembly scenarios in
simulation, spanning furniture, toys, and manufacturing components with
real-world correspondence.
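The hierarchical graph described above can be illustrated with a minimal Python sketch. It assumes a simple tree: leaves are parts, internal nodes are sub-assemblies, and each internal node stores the connection edges that join its children. Each edge carries the four connection attributes the abstract lists: connector type, specification, quantity, and placement locations. All class, field, and function names (Connection, AssemblyNode, assembly_order, connector_bill) are hypothetical and do not come from the paper. Using a post-order traversal as the build order is an illustrative choice, not the authors' planner.

```python
from dataclasses import dataclass, field

@dataclass
class Connection:
    """Edge joining two components (hypothetical schema, not the paper's)."""
    a: str                 # first component: part or sub-assembly name
    b: str                 # second component
    connector_type: str    # e.g. "screw", "dowel", "cam_lock"
    spec: str              # e.g. "M4x20"; free-form text in this sketch
    quantity: int          # number of connectors this joint needs
    placements: list[tuple[float, float, float]] = field(default_factory=list)

@dataclass
class AssemblyNode:
    """Node: a single part (leaf) or a sub-assembly (has children)."""
    name: str
    children: list["AssemblyNode"] = field(default_factory=list)
    connections: list[Connection] = field(default_factory=list)  # joins among children

    def is_part(self) -> bool:
        return not self.children

def assembly_order(node: AssemblyNode) -> list[str]:
    """Post-order traversal: every sub-assembly is built before its parent."""
    order: list[str] = []
    for child in node.children:
        order.extend(assembly_order(child))
    if not node.is_part():
        order.append(node.name)
    return order

def connector_bill(node: AssemblyNode) -> dict[tuple[str, str], int]:
    """Total connectors per (type, spec) across the whole hierarchy."""
    totals: dict[tuple[str, str], int] = {}
    stack = [node]
    while stack:
        current = stack.pop()
        for c in current.connections:
            key = (c.connector_type, c.spec)
            totals[key] = totals.get(key, 0) + c.quantity
        stack.extend(current.children)
    return totals

# Usage: a toy stool. A frame sub-assembly (two legs and a crossbar) is joined to a seat.
leg1, leg2, bar, seat = (AssemblyNode(n) for n in ["leg1", "leg2", "crossbar", "seat"])
frame = AssemblyNode("frame", [leg1, leg2, bar], [
    Connection("leg1", "crossbar", "dowel", "8x30mm", 2),
    Connection("leg2", "crossbar", "dowel", "8x30mm", 2),
])
stool = AssemblyNode("stool", [frame, seat], [
    Connection("frame", "seat", "screw", "M4x20", 4),
])
print(assembly_order(stool))  # ['frame', 'stool']
print(connector_bill(stool))
```

In this sketch, each connection is stored on the sub-assembly node where it is made. Each edge therefore belongs to exactly one assembly step, and connector requirements can be read off per step or summed over the whole task.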
Authors (12)
Chenrui Tie
Shengxiang Sun
Yudi Lin
Yanbo Wang
Zhongrui Li
Zhouhan Zhong
and 6 more
Submitted: October 18, 2025
Key Contributions
Presents Manual2Skill++, a vision-language framework that enables general robotic assembly by treating connections between parts as first-class primitives. It automatically extracts structured connector information from instruction manuals and encodes assembly tasks as hierarchical graphs, improving reliability in the critical 'last mile' of assembly execution.
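Extracting structured connector information from a manual page produces machine-readable records, which must be checked before they enter the graph. The summary does not specify the model's output format. The Python sketch below therefore assumes a hypothetical JSON schema, with one entry per connection holding two part names, a connector type, a spec, and a quantity. It validates that entry against a known part list. The field names, the RAW example, and parse_step are all illustrative assumptions, not the paper's interface.

```python
import json

# Hypothetical JSON that a vision-language model might be prompted to emit for one manual step.
RAW = """
{"step": 3,
 "connections": [
   {"parts": ["side_panel_L", "shelf"], "connector": "cam_lock", "spec": "15mm", "quantity": 2},
   {"parts": ["side_panel_R", "shelf"], "connector": "cam_lock", "spec": "15mm", "quantity": 2}
 ]}
"""

# Required fields and their expected Python types (assumed schema).
REQUIRED = {"parts": list, "connector": str, "spec": str, "quantity": int}

def parse_step(raw: str, known_parts: set[str]) -> list[dict]:
    """Parse one step's output and reject malformed or inconsistent connections."""
    data = json.loads(raw)
    edges = []
    for i, conn in enumerate(data.get("connections", [])):
        for key, typ in REQUIRED.items():
            if not isinstance(conn.get(key), typ):
                raise ValueError(f"connection {i}: field {key!r} missing or not {typ.__name__}")
        if len(conn["parts"]) != 2:
            raise ValueError(f"connection {i}: expected exactly two parts")
        unknown = set(conn["parts"]) - known_parts
        if unknown:
            raise ValueError(f"connection {i}: unknown parts {sorted(unknown)}")
        if conn["quantity"] < 1:
            raise ValueError(f"connection {i}: quantity must be positive")
        edges.append(conn)
    return edges

parts = {"side_panel_L", "side_panel_R", "shelf"}
edges = parse_step(RAW, parts)
print(len(edges), sum(e["quantity"] for e in edges))  # 2 4
```

Rejecting unknown part names and non-positive quantities at parse time keeps an extraction error from silently propagating into the assembly graph and, later, into execution.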
Business Value
Could enable more flexible, automated manufacturing by reducing reliance on manual labor for complex assembly tasks, with potential gains in product quality and consistency.