Today's Robotics & Embodied AI Research Top Papers

Wednesday, November 5, 2025
Introduces SigmaCollab, a dataset for human-AI collaboration research featuring multimodal streams like egocentric video and tracking data. Enables study of AI agents guiding humans in physical tasks, advancing embodied AI and human-robot interaction.
Proposes a mobile robotic multi-view photometric stereo system for 3D acquisition. Addresses limitations of fixed setups, enabling 3D reconstruction for mobile robotics applications by adapting MVPS to movable platforms.
Introduces FIOC-WM, a framework learning structured representations of objects and interactions for world models. Captures environment dynamics with disentangled modules, improving robustness and transferability for object-centric RL agents.
Proposes Maestro, a framework augmenting off-the-shelf VLMs with robotics modules for generalist robots. Leverages VLM capabilities and curated modules to reduce data needs, enabling zero-shot control in physical tasks.
Instantiates the SCoBots framework for interpretable, neurosymbolic RL agents addressing shortcut learning. Decomposes tasks into interpretable representations, enabling generalization from raw pixel states for robotics and embodied AI.
Enables off-the-shelf VLMs like GPT-4 to control humanoid agents by augmenting their generalization with specialized modules. Addresses data scarcity for humanoid robotics, bridging the gap between language models and physical interaction.
Proposes URDF-Anything, an automatic framework for reconstructing articulated objects using 3D multimodal LLMs. Creates digital twins for robotic simulation by generating URDF models from point clouds and text inputs.
Addresses data scarcity for dexterous manipulation by generating feasible and trainable tasks. Proposes GenDexHand for creating specialized simulation environments, overcoming limitations of existing methods for embodied AI.
arxiv_cv

Mobile Robotic Multi-View Photometric Stereo

Abstract: Multi-View Photometric Stereo (MVPS) is a popular method for fine-detailed 3D acquisition of an object from images. Despite its outstanding results on diverse material objects, a typical MVPS experimental setup requires a well-calibrated li...
#3D Reconstruction#Robotics Perception#Photometric Stereo#Sensor Fusion#Machine Learning for Robotics
17 hours ago
70%
arxiv_cv

An unscented Kalman filter method for real time input-parameter-state estimation

Abstract: The input-parameter-state estimation capabilities of a novel unscented Kalman filter is examined herein on both linear and nonlinear systems. The unknown input is estimated in two stages within each time step. Firstly, the predicted dynamic...
#State Estimation#System Identification#Control Theory#Kalman Filtering#Real-time Systems
17 hours ago
65%
arxiv_cv

Light Future: Multimodal Action Frame Prediction via InstructPix2Pix

Abstract: Predicting future motion trajectories is a critical capability across domains such as robotics, autonomous systems, and human activity forecasting, enabling safer and more intelligent decision-making. This paper proposes a novel, efficient,...
#Robotics#Predictive Modeling#Multimodal AI#Computer Vision#Human-Robot Interaction
17 hours ago
91%
arxiv_cv

SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration

Abstract: We introduce SigmaCollab, a dataset enabling research on physically situated human-AI collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performin...
#Human-AI Interaction#Human-Robot Collaboration#Mixed Reality#Embodied AI#Dataset Creation
17 hours ago
95%
arxiv_cv

From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics

Abstract: Video Understanding, Scene Interpretation and Commonsense Reasoning are highly challenging tasks enabling the interpretation of visual information, allowing agents to perceive, interact with and make rational decisions in its environment. L...
#Robotics#Artificial Intelligence#Computer Vision#Natural Language Processing#Edge Computing
17 hours ago
90%
arxiv_cv

A Step Toward World Models: A Survey on Robotic Manipulation

Abstract: Autonomous agents are increasingly expected to operate in complex, dynamic, and uncertain environments, performing tasks such as manipulation, navigation, and decision-making. Achieving these capabilities requires agents to understand the u...
#Robotics#Artificial Intelligence#Autonomous Systems#Machine Learning#Cognitive Robotics#Perception and Action
17 hours ago
90%
arxiv_cv

Keeping it Local, Tiny and Real: Automated Report Generation on Edge Computing Devices for Mechatronic-Based Cognitive Systems

Abstract: Recent advancements in Deep Learning enable hardware-based cognitive systems, that is, mechatronic systems in general and robotics in particular with integrated Artificial Intelligence, to interact with dynamic and unstructured environments...
#Robotics#Edge AI#Cognitive Systems#Mechatronics#Natural Language Generation#Data Privacy#Human-Robot Interaction
17 hours ago
80%
arxiv_cv

iFlyBot-VLA Technical Report

Abstract: We introduce iFlyBot-VLA, a large-scale Vision-Language-Action (VLA) model trained under a novel framework. The main contributions are listed as follows: (1) a latent action model thoroughly trained on large-scale human and robotic manipula...
#Robotics#Embodied AI#Vision-Language Models#Robotic Control#Human-Robot Collaboration
17 hours ago
93%
arxiv_cv

StrengthSense: A Dataset of IMU Signals Capturing Everyday Strength-Demanding Activities

Abstract: Tracking strength-demanding activities with wearable sensors like IMUs is crucial for monitoring muscular strength, endurance, and power. However, there is a lack of comprehensive datasets capturing these activities. To fill this gap, we in...
#Human Activity Recognition#Wearable Computing#Biomedical Engineering#Robotics (Human-Robot Interaction)#Machine Learning#Dataset Development
17 hours ago
88%
arxiv_ml

Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework

Abstract: Bimanual robotic manipulation is an emerging and critical topic in the robotics community. Previous works primarily rely on integrated control models that take the perceptions and states of both arms as inputs to directly predict their acti...
#Robotics#Control Systems#Machine Learning for Robotics#Human-Robot Interaction (indirectly)#Automation
17 hours ago
95%
arxiv_ml

Multiscale spatiotemporal heterogeneity analysis of bike-sharing system's self-loop phenomenon: Evidence from Shanghai

Abstract: Bike-sharing is an environmentally friendly shared mobility mode, but its self-loop phenomenon, where bikes are returned to the same station after several time usage, significantly impacts equity in accessing its services. Therefore, this s...
#Urban Mobility#Transportation Systems#Spatial Analysis#Machine Learning for Urban Planning#Shared Mobility
17 hours ago
60%
arxiv_ml

Constrained Optimal Fuel Consumption of HEVs under Observational Noise

Abstract: In our prior work, we investigated the minimum fuel consumption of a hybrid electric vehicle (HEV) under a state-of-charge (SOC) balance constraint, assuming perfect SOC measurements and accurate reference speed profiles. The constrained op...
#Hybrid Electric Vehicles#Energy Management#Reinforcement Learning#Control Systems#Robustness in Control
17 hours ago
70%
arxiv_ml

Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning

Abstract: We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncertainty. We reframe open radio access netw...
#Future Network Intelligence (6G)#AI for Network Control#Reinforcement Learning#Generative AI#Robotics Control#World Models
17 hours ago
80%
arxiv_ml

TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System

Abstract: Large-scale data has driven breakthroughs in robotics, from language models to vision-language-action models in bimanual manipulation. However, humanoid robotics lacks equally effective data collection frameworks. Existing humanoid teleoper...
#Robotics#Humanoid Robots#Data Collection#Reinforcement Learning#Human-Robot Interaction
17 hours ago
94%
arxiv_ml

Human-Machine Ritual: Synergic Performance through Real-Time Motion Recognition

Abstract: We introduce a lightweight, real-time motion recognition system that enables synergic human-machine performance through wearable IMU sensor data, MiniRocket time-series classification, and responsive multimedia control. By mapping dancer-sp...
#Human-Machine Interaction#Motion Recognition#Wearable Computing#Creative AI#Performance Art
17 hours ago
90%
arxiv_ai

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

Abstract: The Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) task requires agents to navigate previously unseen 3D environments using natural language instructions, without any scene-specific training. A critical challen...
#Robotics Navigation#Embodied AI#Natural Language Understanding#Reinforcement Learning#Zero-shot Learning
17 hours ago
90%
arxiv_ai

End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection

Abstract: Achieving human-like dexterous manipulation remains a major challenge for general-purpose robots. While Vision-Language-Action (VLA) models show potential in learning skills from demonstrations, their scalability is limited by scarce high-q...
#Human-Robot Collaboration for Data Collection#Learning Dexterous Manipulation Skills#Improving Scalability of Robot Learning#Vision-Language Models for Robotics
17 hours ago
92%
arxiv_ai

MO-SeGMan: Rearrangement Planning Framework for Multi Objective Sequential and Guided Manipulation in Constrained Environments

Abstract: In this work, we introduce MO-SeGMan, a Multi-Objective Sequential and Guided Manipulation planner for highly constrained rearrangement problems. MO-SeGMan generates object placement sequences that minimize both replanning per object and ro...
#Robotic Manipulation#Motion Planning#Optimization#Robotics
17 hours ago
96%
arxiv_ai

Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World

Abstract: Humanoid agents often struggle to handle flexible and diverse interactions in open environments. A common solution is to collect massive datasets to train a highly capable model, but this approach can be prohibitively expensive. In this pap...
#Embodied AI#Humanoid Robotics#Vision-Language Models#Robot Control#Zero-Shot Learning
17 hours ago
90%
arxiv_ai

Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots

Abstract: Today's best-explored routes towards generalist robots center on collecting ever larger "observations-in actions-out" robotics datasets to train large end-to-end models, copying a recipe that has worked for vision-language models (VLMs). We...
#Generalist Robots#Embodied AI#Vision-Language Models#Robotics Control#Zero-Shot Learning
17 hours ago
92%
arxiv_ai

A High-Throughput Spiking Neural Network Processor Enabling Synaptic Delay Emulation

Abstract: Synaptic delay has attracted significant attention in neural network dynamics for integrating and processing complex spatiotemporal information. This paper introduces a high-throughput Spiking Neural Network (SNN) processor that supports sy...
#Neuromorphic Computing#Hardware Accelerators for AI#Edge AI#Efficient Neural Networks
17 hours ago
70%
arxiv_ai

Digital Twin based Automatic Reconfiguration of Robotic Systems in Smart Environments

Abstract: Robotic systems have become integral to smart environments, enabling applications ranging from urban surveillance and automated agriculture to industrial automation. However, their effective operation in dynamic settings - such as smart cit...
#Dynamic Reconfiguration of Robotic Systems#Digital Twin Technology for Robotics#Autonomous Adaptation in Smart Environments#Improving Robot Robustness in Dynamic Settings
17 hours ago
89%
arxiv_ai

Lifted Successor Generation in Numeric Planning

Abstract: Most planners ground numeric planning tasks, given in a first-order-like language, into a ground task representation. However, this can lead to an exponential blowup in task representation size, which occurs in practice for hard-to-ground t...
#Automated Planning#AI Planning under Uncertainty#Symbolic AI#State Representation in Planning
17 hours ago
75%
arxiv_ai

URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model

Abstract: Constructing accurate digital twins of articulated objects is essential for robotic simulation training and embodied AI world model building, yet historically requires painstaking manual modeling or multi-stage pipelines. In this work, we p...
#3D Reconstruction#Robotics Simulation#Embodied AI#Multimodal AI#Generative Models
17 hours ago
88%
arxiv_ai

GenDexHand: Generative Simulation for Dexterous Hands

Abstract: Data scarcity remains a fundamental bottleneck for embodied intelligence. Existing approaches use large language models (LLMs) to automate gripper-based simulation generation, but they transfer poorly to dexterous manipulation, which demand...
#Embodied AI#Robotics Simulation#Data Augmentation#Dexterous Manipulation#Generative Models
17 hours ago
90%
arxiv_ai

FoldPath: End-to-End Object-Centric Motion Generation via Modulated Implicit Paths

Abstract: Object-Centric Motion Generation (OCMG) is instrumental in advancing automated manufacturing processes, particularly in domains requiring high-precision expert robotic motions, such as spray painting and welding. To realize effective automa...
#Robotic Motion Generation#Deep Learning for Robotics#Automated Manufacturing#Trajectory Optimization
17 hours ago
97%
arxiv_cv

Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process

Abstract: Vision-language-action (VLA) models aim to understand natural language instructions and visual observations and to execute corresponding actions as an embodied agent. Recent work integrates future images into the understanding-acting loop, ...
#Embodied AI#Robotics#Multimodal Learning#Generative Models#Reinforcement Learning#Vision-Language Models
1 day ago
90%
arxiv_cv

Finite element-based space-time total variation-type regularization of the inverse problem in electrocardiographic imaging

Abstract: Reconstructing cardiac electrical activity from body surface electric potential measurements results in the severely ill-posed inverse problem in electrocardiography. Many different regularization approaches have been proposed to improve nu...
#Inverse Problems#Medical Imaging#Electrophysiology#Numerical Methods#Signal Processing
1 day ago
75%
arxiv_cv

mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework

Abstract: Collaborative perception significantly enhances individual vehicle perception performance through the exchange of sensory information among agents. However, real-world deployment faces challenges due to bandwidth constraints and inevitable ...
#Multi-Agent Systems#Cooperative Perception#Robotics#Sensor Fusion#Autonomous Driving
1 day ago
96%
arxiv_cv

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

Abstract: Spatial reasoning in 3D space is central to human cognition and indispensable for embodied tasks such as navigation and manipulation. However, state-of-the-art vision-language models (VLMs) struggle frequently with tasks as simple as antici...
#Embodied AI#Spatial Reasoning#Vision-Language Models#Robotics#AI for 3D Environments
1 day ago
92%
