Your AI Papers Research Assistant
Today's Robotics & Embodied AI Research Top Papers
Wednesday, November 5, 2025
📊 Read Full Intelligence Reports:
Introduces SigmaCollab, a dataset for human-AI collaboration research featuring multimodal streams like egocentric video and tracking data. Enables study of AI agents guiding humans in physical tasks, advancing embodied AI and human-robot interaction.
Proposes a mobile robotic multi-view photometric stereo system for 3D acquisition. Addresses limitations of fixed setups, enabling 3D reconstruction for mobile robotics applications by adapting MVPS to movable platforms.
Introduces FIOC-WM, a framework learning structured representations of objects and interactions for world models. Captures environment dynamics with disentangled modules, improving robustness and transferability for object-centric RL agents.
Proposes Maestro, a framework augmenting off-the-shelf VLMs with robotics modules for generalist robots. Leverages VLM capabilities and curated modules to reduce data needs, enabling zero-shot control in physical tasks.
Instantiates the SCoBots framework for interpretable, neurosymbolic RL agents addressing shortcut learning. Decomposes tasks into interpretable representations, enabling generalization from raw pixel states for robotics and embodied AI.
Enables off-the-shelf VLMs like GPT-4 to control humanoid agents by augmenting their generalization with specialized modules. Addresses data scarcity for humanoid robotics, bridging the gap between language models and physical interaction.
Proposes URDF-Anything, an automatic framework for reconstructing articulated objects using 3D multimodal LLMs. Creates digital twins for robotic simulation by generating URDF models from point clouds and text inputs.
Addresses data scarcity for dexterous manipulation by generating feasible and trainable tasks. Proposes GenDexHand for creating specialized simulation environments, overcoming limitations of existing methods for embodied AI.
arxiv_cv
Mobile Robotic Multi-View Photometric Stereo
Abstract: Multi-View Photometric Stereo (MVPS) is a popular method for fine-detailed 3D
acquisition of an object from images. Despite its outstanding results on
diverse material objects, a typical MVPS experimental setup requires a
well-calibrated li...
arxiv_cv
An unscented Kalman filter method for real time input-parameter-state estimation
Abstract: The input-parameter-state estimation capabilities of a novel unscented Kalman
filter is examined herein on both linear and nonlinear systems. The unknown
input is estimated in two stages within each time step. Firstly, the predicted
dynamic...
arxiv_cv
Light Future: Multimodal Action Frame Prediction via InstructPix2Pix
Abstract: Predicting future motion trajectories is a critical capability across domains
such as robotics, autonomous systems, and human activity forecasting, enabling
safer and more intelligent decision-making. This paper proposes a novel,
efficient,...
arxiv_cv
SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration
Abstract: We introduce SigmaCollab, a dataset enabling research on physically situated
human-AI collaboration. The dataset consists of a set of 85 sessions in which
untrained participants were guided by a mixed-reality assistive AI agent in
performin...
arxiv_cv
From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics
Abstract: Video Understanding, Scene Interpretation and Commonsense Reasoning are
highly challenging tasks enabling the interpretation of visual information,
allowing agents to perceive, interact with and make rational decisions in its
environment. L...
arxiv_cv
A Step Toward World Models: A Survey on Robotic Manipulation
Abstract: Autonomous agents are increasingly expected to operate in complex, dynamic,
and uncertain environments, performing tasks such as manipulation, navigation,
and decision-making. Achieving these capabilities requires agents to understand
the u...
arxiv_cv
Keeping it Local, Tiny and Real: Automated Report Generation on Edge Computing Devices for Mechatronic-Based Cognitive Systems
Abstract: Recent advancements in Deep Learning enable hardware-based cognitive systems,
that is, mechatronic systems in general and robotics in particular with
integrated Artificial Intelligence, to interact with dynamic and unstructured
environments...
arxiv_cv
iFlyBot-VLA Technical Report
Abstract: We introduce iFlyBot-VLA, a large-scale Vision-Language-Action (VLA) model
trained under a novel framework. The main contributions are listed as follows:
(1) a latent action model thoroughly trained on large-scale human and robotic
manipula...
arxiv_cv
StrengthSense: A Dataset of IMU Signals Capturing Everyday Strength-Demanding Activities
Abstract: Tracking strength-demanding activities with wearable sensors like IMUs is
crucial for monitoring muscular strength, endurance, and power. However, there
is a lack of comprehensive datasets capturing these activities. To fill this
gap, we in...
arxiv_ml
Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
Abstract: Bimanual robotic manipulation is an emerging and critical topic in the
robotics community. Previous works primarily rely on integrated control models
that take the perceptions and states of both arms as inputs to directly predict
their acti...
arxiv_ml
Multiscale spatiotemporal heterogeneity analysis of bike-sharing system's self-loop phenomenon: Evidence from Shanghai
Abstract: Bike-sharing is an environmentally friendly shared mobility mode, but its
self-loop phenomenon, where bikes are returned to the same station after
several time usage, significantly impacts equity in accessing its services.
Therefore, this s...
arxiv_ml
Constrained Optimal Fuel Consumption of HEVs under Observational Noise
Abstract: In our prior work, we investigated the minimum fuel consumption of a hybrid
electric vehicle (HEV) under a state-of-charge (SOC) balance constraint,
assuming perfect SOC measurements and accurate reference speed profiles. The
constrained op...
arxiv_ml
Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning
Abstract: We argue that sixth-generation (6G) intelligence is not fluent token
prediction but the capacity to imagine and choose -- to simulate future
scenarios, weigh trade-offs, and act with calibrated uncertainty. We reframe
open radio access netw...
arxiv_ml
TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
Abstract: Large-scale data has driven breakthroughs in robotics, from language models
to vision-language-action models in bimanual manipulation. However, humanoid
robotics lacks equally effective data collection frameworks. Existing humanoid
teleoper...
arxiv_ml
Human-Machine Ritual: Synergic Performance through Real-Time Motion Recognition
Abstract: We introduce a lightweight, real-time motion recognition system that enables
synergic human-machine performance through wearable IMU sensor data, MiniRocket
time-series classification, and responsive multimedia control. By mapping
dancer-sp...
arxiv_ai
STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization
Abstract: The Zero-shot Vision-and-Language Navigation in Continuous Environments
(VLN-CE) task requires agents to navigate previously unseen 3D environments
using natural language instructions, without any scene-specific training. A
critical challen...
arxiv_ai
End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection
Abstract: Achieving human-like dexterous manipulation remains a major challenge for
general-purpose robots. While Vision-Language-Action (VLA) models show
potential in learning skills from demonstrations, their scalability is limited
by scarce high-q...
arxiv_ai
MO-SeGMan: Rearrangement Planning Framework for Multi Objective Sequential and Guided Manipulation in Constrained Environments
Abstract: In this work, we introduce MO-SeGMan, a Multi-Objective Sequential and Guided
Manipulation planner for highly constrained rearrangement problems. MO-SeGMan
generates object placement sequences that minimize both replanning per object
and ro...
arxiv_ai
Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
Abstract: Humanoid agents often struggle to handle flexible and diverse interactions in
open environments. A common solution is to collect massive datasets to train a
highly capable model, but this approach can be prohibitively expensive. In this
pap...
arxiv_ai
Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots
Abstract: Today's best-explored routes towards generalist robots center on collecting
ever larger "observations-in actions-out" robotics datasets to train large
end-to-end models, copying a recipe that has worked for vision-language models
(VLMs). We...
arxiv_ai
A High-Throughput Spiking Neural Network Processor Enabling Synaptic Delay Emulation
Abstract: Synaptic delay has attracted significant attention in neural network dynamics
for integrating and processing complex spatiotemporal information. This paper
introduces a high-throughput Spiking Neural Network (SNN) processor that
supports sy...
arxiv_ai
Digital Twin based Automatic Reconfiguration of Robotic Systems in Smart Environments
Abstract: Robotic systems have become integral to smart environments, enabling
applications ranging from urban surveillance and automated agriculture to
industrial automation. However, their effective operation in dynamic settings -
such as smart cit...
arxiv_ai
Lifted Successor Generation in Numeric Planning
Abstract: Most planners ground numeric planning tasks, given in a first-order-like
language, into a ground task representation. However, this can lead to an
exponential blowup in task representation size, which occurs in practice for
hard-to-ground t...
arxiv_ai
URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model
Abstract: Constructing accurate digital twins of articulated objects is essential for
robotic simulation training and embodied AI world model building, yet
historically requires painstaking manual modeling or multi-stage pipelines. In
this work, we p...
arxiv_ai
GenDexHand: Generative Simulation for Dexterous Hands
Abstract: Data scarcity remains a fundamental bottleneck for embodied intelligence.
Existing approaches use large language models (LLMs) to automate gripper-based
simulation generation, but they transfer poorly to dexterous manipulation,
which demand...
arxiv_ai
FoldPath: End-to-End Object-Centric Motion Generation via Modulated Implicit Paths
Abstract: Object-Centric Motion Generation (OCMG) is instrumental in advancing
automated manufacturing processes, particularly in domains requiring
high-precision expert robotic motions, such as spray painting and welding. To
realize effective automa...
arxiv_cv
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
Abstract: Vision-language-action (VLA) models aim to understand natural language
instructions and visual observations and to execute corresponding actions as an
embodied agent. Recent work integrates future images into the
understanding-acting loop, ...
arxiv_cv
Finite element-based space-time total variation-type regularization of the inverse problem in electrocardiographic imaging
Abstract: Reconstructing cardiac electrical activity from body surface electric
potential measurements results in the severely ill-posed inverse problem in
electrocardiography. Many different regularization approaches have been
proposed to improve nu...
arxiv_cv
mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework
Abstract: Collaborative perception significantly enhances individual vehicle perception
performance through the exchange of sensory information among agents. However,
real-world deployment faces challenges due to bandwidth constraints and
inevitable ...
arxiv_cv
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
Abstract: Spatial reasoning in 3D space is central to human cognition and indispensable
for embodied tasks such as navigation and manipulation. However,
state-of-the-art vision-language models (VLMs) struggle frequently with tasks
as simple as antici...