MORE addresses robotic planning for mobile manipulation in large-scale environments by grounding language in scene graphs, employing instance differentiation, and pruning task-relevant subgraphs in order to minimize hallucinations.

Autonomous long-horizon mobile manipulation presents significant challenges, including dynamic environments, unexplored spaces, and the need for robust error recovery. Although recent approaches employ foundation models for scene-level reasoning and planning, their effectiveness diminishes in large-scale settings with numerous objects. To address these limitations, we propose MORE, a novel framework that augments language models for zero-shot planning in mobile manipulation rearrangement tasks. MORE utilizes scene graphs for structured environment representation, incorporates instance-level differentiation, and introduces an active filtering mechanism to extract task-relevant subgraphs, thereby constraining the planning space and enhancing reliability. Our method supports both indoor and outdoor scenarios and demonstrates superior performance on 81 tasks from the BEHAVIOR-1K benchmark, outperforming existing foundation model-based methods and exhibiting strong generalization to complex real-world tasks.

Technical Approach

Overview of our approach
MORE: Starting in an unexplored environment, we continuously construct a hierarchical scene graph of the environment based on RGB-D data and semantic segmentation. We first build a dense occupancy map and then extract a navigational Voronoi graph. This graph is separated at room borders based on door locations, then clustered and segmented into rooms and outdoor regions. The resulting scene graph is converted to natural language descriptions. To generate a bounded planning problem, we first filter the observed objects by task relevance and then use an LLM as a task planner on the resulting subgraph. The LLM orchestrates navigation, manipulation, and exploration subpolicies, which in turn update the scene representation.
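To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of two of the steps described above: converting a hierarchical scene graph into a natural-language description for the LLM, and pruning it to a task-relevant subgraph while keeping instance-level differentiation. All class and function names are illustrative, and the relevance judgment, which MORE obtains from an LLM, is stood in for by a given set of categories.

```python
# Illustrative sketch of scene-graph description and task-relevance
# filtering; names and data layout are assumptions, not MORE's API.
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # room -> list of (instance_id, category), e.g. ("apple_2", "apple");
    # instance ids keep multiple objects of one category distinguishable.
    rooms: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def to_description(self) -> str:
        """Flatten the graph into text an LLM planner can consume."""
        lines = []
        for room, objects in self.rooms.items():
            names = ", ".join(inst for inst, _ in objects)
            lines.append(f"{room} contains: {names}")
        return "\n".join(lines)

    def filter_relevant(self, relevant: set[str]) -> "SceneGraph":
        """Keep only instances whose category is task-relevant.

        In MORE this judgment comes from an LLM given the task
        description; here it is passed in as a plain set.
        """
        sub = SceneGraph()
        for room, objects in self.rooms.items():
            kept = [(i, c) for i, c in objects if c in relevant]
            if kept:
                sub.rooms[room] = kept
        return sub


# Example: a task like "put an apple in the fridge" only needs
# apples and the fridge, so the mug and sofa are pruned away.
graph = SceneGraph(rooms={
    "kitchen": [("fridge_1", "fridge"), ("apple_1", "apple"),
                ("mug_1", "mug")],
    "living_room": [("sofa_1", "sofa"), ("apple_2", "apple")],
})
subgraph = graph.filter_relevant({"apple", "fridge"})
```

Planning then proceeds on `subgraph.to_description()` rather than the full graph, which bounds the planning space the LLM has to reason over.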

Code

This work is released under the CC BY-NC-SA license. A software implementation of this project is available on GitHub.

Publications

If you find our work useful, please consider citing our paper:

Mohammad Mohammadi, Daniel Honerkamp, Martin Büchner, Matteo Cassinelli, Tim Welschehold, Fabien Despinoy, Igor Gilitschenski, and Abhinav Valada
MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
Under review, 2025.

(PDF) (BibTeX)

Authors

Mohammad Mohammadi

University of Freiburg, University of Toronto

Daniel Honerkamp

University of Freiburg

Martin Büchner

University of Freiburg

Matteo Cassinelli

Toyota Motor Europe

Tim Welschehold

University of Freiburg

Fabien Despinoy

Toyota Motor Europe

Igor Gilitschenski

University of Toronto

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by Toyota Motor Europe, an academic grant from NVIDIA, and the BrainLinks-BrainTools center of the University of Freiburg.