MORE addresses robotic planning for mobile manipulation in large-scale environments by grounding language in scene graphs, employing instance differentiation, and pruning task-relevant subgraphs in order to minimize hallucinations.

Autonomous long-horizon mobile manipulation presents significant challenges, including dynamic environments, unexplored spaces, and the need for robust error recovery. Although recent approaches employ foundation models for scene-level reasoning and planning, their effectiveness diminishes in large-scale settings with numerous objects. To address these limitations, we propose MORE, a novel framework that augments language models for zero-shot planning in mobile manipulation rearrangement tasks. MORE utilizes scene graphs for structured environment representation, incorporates instance-level differentiation, and introduces an active filtering mechanism to extract task-relevant subgraphs, thereby constraining the planning space and enhancing reliability. Our method supports both indoor and outdoor scenarios and demonstrates superior performance on 81 tasks from the BEHAVIOR-1K benchmark, outperforming existing foundation model-based methods and exhibiting strong generalization to complex real-world tasks.

Technical Approach

Overview of our approach
MORE: Starting in an unexplored environment, we continuously construct a hierarchical scene graph of the environment based on RGB-D data and semantic segmentation. We first build a dense occupancy map and then extract a navigational Voronoi graph. This graph is separated at room borders based on door locations, then clustered and segmented into rooms and outdoor regions. The resulting scene graph is converted to natural language descriptions. To generate a bounded planning problem, we first filter the observed objects by task relevance and then use an LLM as a task planner on the resulting subgraph. The LLM orchestrates navigation, manipulation, and exploration subpolicies, which in turn update the scene representation.
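To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of two of the steps described above: converting a hierarchical scene graph into a natural-language description for the LLM, and pruning it to a task-relevant subgraph while keeping instance-level differentiation. All class and function names are illustrative, and the relevance judgment, which MORE obtains from an LLM, is stood in for by a given set of categories.

```python
# Illustrative sketch of scene-graph description and task-relevance
# filtering; names and data layout are assumptions, not MORE's API.
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # room -> list of (instance_id, category), e.g. ("apple_2", "apple");
    # instance ids keep multiple objects of one category distinguishable.
    rooms: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def to_description(self) -> str:
        """Flatten the graph into text an LLM planner can consume."""
        lines = []
        for room, objects in self.rooms.items():
            names = ", ".join(inst for inst, _ in objects)
            lines.append(f"{room} contains: {names}")
        return "\n".join(lines)

    def filter_relevant(self, relevant: set[str]) -> "SceneGraph":
        """Keep only instances whose category is task-relevant.

        In MORE this judgment comes from an LLM given the task
        description; here it is passed in as a plain set.
        """
        sub = SceneGraph()
        for room, objects in self.rooms.items():
            kept = [(i, c) for i, c in objects if c in relevant]
            if kept:
                sub.rooms[room] = kept
        return sub


# Example: a task like "put an apple in the fridge" only needs
# apples and the fridge, so the mug and sofa are pruned away.
graph = SceneGraph(rooms={
    "kitchen": [("fridge_1", "fridge"), ("apple_1", "apple"),
                ("mug_1", "mug")],
    "living_room": [("sofa_1", "sofa"), ("apple_2", "apple")],
})
subgraph = graph.filter_relevant({"apple", "fridge"})
```

Planning then proceeds on `subgraph.to_description()` rather than the full graph, which bounds the planning space the LLM has to reason over.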

Code

This work is released under the CC BY-NC-SA license. A software implementation of this project is available on GitHub.

Publications

If you find our work useful, please consider citing our paper:

Mohammad Mohammadi, Daniel Honerkamp, Martin Büchner, Matteo Cassinelli, Tim Welschehold, Fabien Despinoy, Igor Gilitschenski, and Abhinav Valada
MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
Under review, 2025.

(PDF) (BibTeX)

Authors

Mohammad Mohammadi

University of Freiburg, University of Toronto

Daniel Honerkamp

University of Freiburg

Martin Büchner

University of Freiburg

Matteo Cassinelli

Toyota Motor Europe

Tim Welschehold

University of Freiburg

Fabien Despinoy

Toyota Motor Europe

Igor Gilitschenski

University of Toronto

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by Toyota Motor Europe, an academic grant from NVIDIA, and the BrainLinks-BrainTools center of the University of Freiburg.