Robotics: Science and Systems

Learning to Evolve: Multi-modal Interactive Fields for Robust Humanoid Navigation in Dynamic Environments

Peifeng Jiang¹, Hong Liu¹,*, Wenshuai Wang¹, Jin Jin², Xia Li³

¹State Key Laboratory of General Artificial Intelligence, Peking University, Shenzhen Graduate School
²Oxford Robotics Institute, University of Oxford
³Institute for Machine Learning, Department of Computer Science, ETH Zurich

Teaser figure: Multi-modal Interactive Fields couple appearance, spatial memory, and geometry for robust humanoid navigation.

Video

Humanoid Navigation and Adaptation Demo

Main System Demo

A Unitree G1 executes language-conditioned navigation, detects scene changes, updates its memory, and verifies interaction-pose safety (IPS).

Suggested Additional Clips

assets/videos/adaptation.mp4: dynamic scene graph update after object relocation.
assets/videos/interaction.mp4: mesh recovery and interaction-pose safety verification.

Abstract

Robust Scene Memory for Humanoid Robots

Safe manipulation-oriented navigation for humanoid robots requires scene memory that remains reliable under locomotion-induced perceptual distortion, environmental changes, and interaction-level geometric safety constraints. MIF integrates confidence-aware semantic 3D Gaussian Splatting, discrepancy-triggered spatial memory updates, and task-driven geometric reconstruction in a closed-loop perception-adaptation pipeline. On a Unitree G1 humanoid in a real dynamic office, MIF improves relocation success from 12% to 94% compared with static scene-graph memory, while reducing semantic memory footprint by 91.4%.

Method

Three Coupled Fields

MIF treats scene memory as a locally revisable system representation, grounding language queries into spatial memory and interaction-ready geometry.

Method overview figure: MIF framework pipeline.

Appearance Field

Builds a confidence-aware semantic 3DGS representation and suppresses gait-corrupted primitives during rendering and graph construction.
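
As a rough illustration of confidence-based suppression, the sketch below drops primitives whose confidence falls under a cutoff before they are used downstream. The `Primitive` class, the `[0, 1]` confidence range, and the `0.5` threshold are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Primitive:
    """Hypothetical stand-in for one semantic Gaussian primitive."""
    label: str
    confidence: float  # assumed to lie in [0, 1]


def suppress_low_confidence(primitives, threshold=0.5):
    """Keep only primitives whose confidence reaches the threshold."""
    return [p for p in primitives if p.confidence >= threshold]


# Toy scene: the second "chair" primitive stands in for a gait-corrupted one.
scene = [
    Primitive("chair", 0.92),
    Primitive("chair", 0.18),
    Primitive("desk", 0.75),
]
kept = suppress_low_confidence(scene, threshold=0.5)
print([p.label for p in kept])
```

A hard threshold is only the simplest option; a system could equally down-weight low-confidence primitives instead of discarding them.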

Spatial Field

Maintains topological scene memory and triggers local updates when persistent multi-modal discrepancies indicate relocated, removed, or newly introduced objects.
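
One way to read "persistent discrepancies" is as a debounce: a node is only revised after its discrepancy score stays high for several consecutive observations. The sketch below assumes a scalar discrepancy per node, and the class name, threshold, and patience value are hypothetical choices for illustration, not the authors' formulation.

```python
from collections import defaultdict


class DiscrepancyTrigger:
    """Fires a local update after `patience` consecutive high discrepancies."""

    def __init__(self, threshold=0.5, patience=3):
        self.threshold = threshold  # assumed scalar discrepancy cutoff
        self.patience = patience    # assumed number of consecutive hits
        self._streak = defaultdict(int)

    def observe(self, node_id, discrepancy):
        """Return True when node_id should be locally updated."""
        if discrepancy > self.threshold:
            self._streak[node_id] += 1
        else:
            self._streak[node_id] = 0  # a single agreement resets the streak
        if self._streak[node_id] >= self.patience:
            self._streak[node_id] = 0  # reset after firing
            return True
        return False


trigger = DiscrepancyTrigger(threshold=0.5, patience=3)
readings = [0.9, 0.2, 0.8, 0.7, 0.9]  # toy discrepancy scores for one node
fired = [trigger.observe("mug", d) for d in readings]
print(fired)
```

Because the streak is tracked per node, a transient mismatch on one object does not force an update elsewhere in the memory.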

Geometry Field

Recovers object-centric meshes on demand and verifies terminal humanoid poses through interaction-pose safety checks.
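
A minimal sketch of a terminal-pose check is given below, assuming the robot is approximated by a disc in the ground plane and the recovered mesh by its vertex list. The function names, the `0.3` m radius, and the `0.05` m margin are hypothetical, and a real check would measure distance to mesh faces and consider the full body rather than vertices alone.

```python
import math


def is_pose_safe(pose_xy, mesh_vertices, robot_radius=0.3, margin=0.05):
    """True if every mesh vertex lies outside the robot disc plus margin."""
    required = robot_radius + margin
    return all(
        math.dist(pose_xy, (vx, vy)) >= required
        for vx, vy, _vz in mesh_vertices
    )


def first_safe_pose(candidates, mesh_vertices, **kwargs):
    """Return the first candidate pose that passes the check, else None."""
    for pose in candidates:
        if is_pose_safe(pose, mesh_vertices, **kwargs):
            return pose
    return None


# Toy object: top face of a 0.4 m x 0.4 m box at height 0.8 m.
mesh_vertices = [
    (0.0, 0.0, 0.8),
    (0.4, 0.0, 0.8),
    (0.4, 0.4, 0.8),
    (0.0, 0.4, 0.8),
]
candidates = [(0.5, 0.2), (0.8, 0.2)]  # toy standing positions, in meters
chosen = first_safe_pose(candidates, mesh_vertices)
print(chosen)
```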

Results

Real-World Dynamic Office Evaluation

The full ROS1 system runs on a centralized RTX 4090 workstation and communicates with a Unitree G1 humanoid during navigation and interaction trials.

94% IPS success

91.4% semantic memory reduction

0% observed collision rate

0.12 m mean terminal error

IPS Success

Dense geometry makes interaction poses safer

Memory Reduction

Feature distillation keeps memory practical

Navigation Error

Stable tracking for humanoid navigation

Collision Rate

Local memory updates avoid obsolete collision-prone paths

Gallery

Paper Figures

Supplementary qualitative figures are shown as compact panels so large source images do not dominate the page.

Real Robot Experiments

Real-world Unitree G1 patrol, navigation, replanning, and object retrieval sequences.

Reconstructed Object Meshes

Object meshes generated by the Geometry Field.

Watertight mesh compared with sparse point cloud.

Mesh vs Sparse Point Cloud

Continuous watertight geometry exposes collision constraints missed by sparse centroids.
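
The toy comparison below shows why a single centroid can hide a collision that dense surface samples reveal. The table dimensions, robot position, and radius are made-up values for illustration only, and the sampled grid stands in for a watertight mesh surface.

```python
import math


def clearance(point, samples, robot_radius):
    """Nearest obstacle sample distance minus robot radius (negative = overlap)."""
    nearest = min(math.dist(point, s) for s in samples)
    return nearest - robot_radius


# Hypothetical 1.0 m x 0.6 m table top centered at the origin.
centroid = [(0.0, 0.0)]
surface = [(x / 10.0, y / 10.0) for x in range(-5, 6) for y in range(-3, 4)]

robot_xy = (0.7, 0.0)
robot_radius = 0.3

centroid_clearance = clearance(robot_xy, centroid, robot_radius)
surface_clearance = clearance(robot_xy, surface, robot_radius)

print("centroid says safe:", centroid_clearance > 0)
print("surface says safe:", surface_clearance > 0)
```

Measured against the centroid, the robot appears to have room to spare, while the table edge at x = 0.5 m already overlaps the robot disc.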

Appearance Field Quality

Denoised mapping quality and appearance-field memory update.

Citation

BibTeX

@inproceedings{jiang2026mif,
  title={Learning to Evolve: Multi-modal Interactive Fields for Robust Humanoid Navigation in Dynamic Environments},
  author={Jiang, Peifeng and Liu, Hong and Wang, Wenshuai and Jin, Jin and Li, Xia},
  booktitle={Robotics: Science and Systems},
  year={2026}
}