Introduces Butter-Bench, a benchmark that evaluates the 'practical intelligence' of LLM-controlled robots operating in the physical world. The benchmark shows that although LLMs excel at analytical tasks, humans still significantly outperform them on embodied tasks, particularly multi-step spatial planning and social understanding.
Gives developers and researchers a tool for assessing and improving the real-world capabilities of LLM-powered robots, which could help accelerate the development of more capable and reliable autonomous systems.