Abstract
Realistic and interactive surgical simulation has the potential to facilitate
crucial applications, such as medical professional training and autonomous
surgical agent training. In the natural visual domain, world models have
enabled action-controlled data generation, demonstrating the potential to train
autonomous agents in interactive simulated environments when large-scale real
data acquisition is infeasible. However, such work in the surgical domain has
been limited to simplified computer simulations and lacks realism. Furthermore,
the existing world-model literature has predominantly dealt with action-labeled
data, limiting its applicability to real-world surgical data, where obtaining
action annotations is prohibitively expensive. Inspired by the recent success of
Genie in leveraging unlabeled video game data to infer latent actions and
enable action-controlled data generation, we propose the first surgical vision
world model. The proposed model can generate action-controllable surgical data,
and its architecture design is validated with extensive experiments on the
unlabeled SurgToolLoc-2022 dataset. Code and implementation details are
available at https://github.com/bhattarailab/Surgical-Vision-World-Model
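To make the Genie-style idea referenced in the abstract concrete, below is a minimal sketch of inferring discrete latent actions from unlabeled consecutive video frames and using them to condition a dynamics model. All module names, layer sizes, the codebook size, and the training loss here are illustrative assumptions for exposition, not the architecture or hyperparameters used in the paper.

```python
# Minimal latent-action world-model sketch (assumed design, not the paper's).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentActionModel(nn.Module):
    """Infers a discrete latent action from a pair of consecutive frames."""

    def __init__(self, frame_dim=64 * 64 * 3, action_codes=8, embed_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * frame_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )
        # Small codebook of latent actions (vector-quantization style).
        self.codebook = nn.Embedding(action_codes, embed_dim)

    def forward(self, frame_t, frame_tp1):
        z = self.encoder(torch.cat([frame_t, frame_tp1], dim=-1))
        # Nearest codebook entry serves as the inferred latent action.
        dists = ((z.unsqueeze(1) - self.codebook.weight.unsqueeze(0)) ** 2).sum(-1)
        action_idx = dists.argmin(dim=-1)
        action_emb = self.codebook(action_idx)
        # Straight-through estimator so gradients reach the encoder.
        action_emb = z + (action_emb - z).detach()
        return action_idx, action_emb


class DynamicsModel(nn.Module):
    """Predicts the next frame from the current frame and a latent action."""

    def __init__(self, frame_dim=64 * 64 * 3, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + embed_dim, 256), nn.ReLU(), nn.Linear(256, frame_dim)
        )

    def forward(self, frame_t, action_emb):
        return self.net(torch.cat([frame_t, action_emb], dim=-1))


if __name__ == "__main__":
    lam, dyn = LatentActionModel(), DynamicsModel()
    frame_t = torch.rand(4, 64 * 64 * 3)    # batch of flattened frames
    frame_tp1 = torch.rand(4, 64 * 64 * 3)  # the frames that follow them
    action_idx, action_emb = lam(frame_t, frame_tp1)
    pred = dyn(frame_t, action_emb)
    # Training signal: reconstruct the next frame from frame_t + latent action,
    # so no action labels are ever required.
    loss = F.mse_loss(pred, frame_tp1)
    print(action_idx.tolist(), loss.item())
```

At generation time, one would discard the latent action model's encoder and instead feed user-chosen codebook indices to the dynamics model, yielding action-controllable rollouts from unlabeled training data.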