SteerVLM introduces a lightweight activation steering module for Vision-Language Models (VLMs) that enables fine-grained, inference-time control over outputs without modifying model weights. It learns from latent embeddings to dynamically adjust activations, preserving performance on off-target tasks and requiring only 0.14% of the original VLM's parameters. This offers a robust and efficient way to steer VLM behavior.
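The summary above gives only the high-level idea, so here is a minimal, hypothetical PyTorch sketch of what inference-time activation steering can look like: a tiny trainable module reads a layer's hidden states and adds a learned offset via a forward hook, leaving the frozen VLM's weights untouched. The names (SteeringModule, attach_steering), the bottleneck size, and the gating scalar are illustrative assumptions, not SteerVLM's actual architecture.

```python
# Hypothetical sketch of hook-based activation steering (not the paper's exact design).
import torch
import torch.nn as nn

class SteeringModule(nn.Module):
    """Tiny adapter that maps hidden states to an additive activation offset."""
    def __init__(self, hidden_dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)
        self.gate = nn.Parameter(torch.zeros(1))  # learned steering strength, starts at no-op

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Offset is scaled by the gate so steering can be soft or strong.
        return self.gate * self.up(torch.tanh(self.down(h)))

def attach_steering(layer: nn.Module, steer: SteeringModule):
    """Register a forward hook that adds the steering offset to the layer's output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + steer(hidden)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Toy usage with a stand-in "layer"; with a real VLM you would hook a chosen
# transformer block and train only the SteeringModule's (few) parameters.
if __name__ == "__main__":
    hidden_dim = 512
    layer = nn.Linear(hidden_dim, hidden_dim)   # placeholder for a VLM block
    steer = SteeringModule(hidden_dim)
    handle = attach_steering(layer, steer)

    x = torch.randn(2, 10, hidden_dim)          # (batch, tokens, hidden)
    with torch.no_grad():
        y = layer(x)                            # output now includes the steering offset
    handle.remove()
    print(y.shape)
```

Because the adjustment is injected through a hook at inference time, the base model's weights never change, which is consistent with the parameter-efficiency and off-target preservation claims above.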
This enables more controllable and reliable multimodal AI applications, such as image generation tools that precisely follow user prompts or visual assistants that adapt their responses to nuanced instructions.