LEGS

Incrementally Building Room-Scale Language-Embedded Gaussian Splats (LEGS) with a Mobile Robot

1 The AUTOLab at UC Berkeley

2 The Toyota Research Institute

* Denotes equal contribution

IROS 2024 (Oral)

Overview

Building semantic 3D maps can be valuable for searching offices, warehouses, stores, and homes for objects of interest. We present a multi-camera mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS), a detailed 3D scene representation that encodes both appearance and semantics. LEGS is trained online as the robot traverses its environment, enabling localization of open-vocabulary object queries. We evaluate LEGS on three room-scale scenes, querying random objects in each scene to assess how well the system captures semantic meaning. Compared with LERF on these three scenes, LEGS achieves comparable object query success rates while training over 3.5x faster. Qualitative results suggest that a multi-camera setup and incremental bundle adjustment improve visual reconstruction quality under constrained robot trajectories. Experimental results suggest that LEGS can localize objects with up to 66% accuracy across three large indoor environments and can produce high-fidelity Gaussian Splats online by integrating bundle adjustment updates.
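One way to picture how an online trainer can integrate bundle adjustment updates is a keyframe buffer whose stored camera poses are overwritten whenever bundle adjustment refines them, so that later training iterations render against the corrected poses. The sketch below is a hypothetical illustration, not the LEGS implementation: the class names `Keyframe` and `KeyframeBuffer`, the camera names, and the uniform random sampling of training frames are all assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Keyframe:
    """Hypothetical training frame from one of the robot's cameras."""
    frame_id: int
    camera_name: str       # hypothetical camera label, e.g. "left"
    image: np.ndarray      # H x W x 3 RGB image
    pose: np.ndarray       # 4 x 4 camera-to-world transform


class KeyframeBuffer:
    """Hypothetical buffer of frames used to train a splat online.

    Frames from all cameras are added as they arrive; when bundle
    adjustment refines some poses, those frames are updated in place so
    subsequent training batches use the refined poses.
    """

    def __init__(self, seed: int = 0):
        self.frames: dict[int, Keyframe] = {}
        self.rng = np.random.default_rng(seed)

    def add(self, kf: Keyframe) -> None:
        self.frames[kf.frame_id] = kf

    def apply_ba_update(self, new_poses: dict[int, np.ndarray]) -> int:
        """Overwrite poses refined by bundle adjustment.

        Poses for frame ids not in the buffer are ignored.
        Returns the number of frames whose pose was updated.
        """
        updated = 0
        for fid, pose in new_poses.items():
            if fid in self.frames:
                self.frames[fid].pose = pose
                updated += 1
        return updated

    def sample(self, batch_size: int) -> list[Keyframe]:
        """Uniformly sample distinct frames for one training step (assumed policy)."""
        ids = list(self.frames)
        chosen = self.rng.choice(ids, size=min(batch_size, len(ids)), replace=False)
        return [self.frames[int(i)] for i in chosen]
```

For example, after adding frames from three hypothetical cameras, `apply_ba_update({1: refined_pose, 99: np.eye(4)})` updates only frame 1 and returns 1, because frame 99 was never added. Keeping pose updates separate from sampling lets the trainer keep optimizing while bundle adjustment runs asynchronously.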

Large-scale language-embedded Gaussian splatting setup. The Gaussian splat 3D reconstruction was used to render a novel view of a large-scale environment. Given open-vocabulary queries, LEGS localizes the desired objects, as shown by the heatmap activations.
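The heatmap activations come from scoring language embeddings in the scene against a text query. This page does not give the scoring rule, so the sketch below assumes a LERF-style relevancy score: each Gaussian's embedding is compared with the query and with a set of canonical negative phrase embeddings, and its score is the minimum, over negatives, of the pairwise softmax probability assigned to the query. The function names, the softmax `scale` of 10.0, and taking the highest-scoring Gaussian center as the object location are assumptions. In a real system the query and canonical vectors would come from a CLIP text encoder; here they are plain arrays.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def relevancy(features: np.ndarray, query: np.ndarray,
              canonicals: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """LERF-style relevancy for N embeddings (assumed scoring rule).

    features:   (N, D) per-Gaussian language embeddings
    query:      (D,)   query text embedding
    canonicals: (K, D) canonical negative phrase embeddings
    Returns an (N,) array of scores in (0, 1).
    """
    f, q, c = normalize(features), normalize(query), normalize(canonicals)
    sim_q = f @ q        # (N,)   cosine similarity to the query
    sim_c = f @ c.T      # (N, K) cosine similarity to each negative
    # Pairwise softmax over {query, negative_k}:
    # exp(s*q) / (exp(s*q) + exp(s*c)) == 1 / (1 + exp(s*(c - q)))
    p_query = 1.0 / (1.0 + np.exp(scale * (sim_c - sim_q[:, None])))
    # Take the most pessimistic comparison across the negatives.
    return p_query.min(axis=1)


def localize(centers: np.ndarray, features: np.ndarray, query: np.ndarray,
             canonicals: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the 3D center of the highest-relevancy Gaussian and all scores."""
    scores = relevancy(features, query, canonicals)
    return centers[int(np.argmax(scores))], scores
```

The per-Gaussian `scores` are what a heatmap visualization would color; `localize` simply reports the single best center. A more robust variant might average the centers of the top-scoring Gaussians instead of taking one argmax.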

Citation

If you use this work or find it helpful, please consider citing:

@article{yu2024language,
title={Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot},
author={Yu, Justin and Hari, Kush and Srinivas, Kishore and El-Refai, Karim and Rashid, Adam 
and Kim, Chung Min and Kerr, Justin and Cheng, Richard and Irshad, Muhammad Zubair 
and Balakrishna, Ashwin and Kollar, Thomas and Goldberg, Ken},
journal={arXiv preprint arXiv:2409.18108},
year={2024}
} 

Acknowledgements

This work was supported by the Toyota Research Institute (TRI).

For questions, please contact yujustin@berkeley.edu or kush_hari@berkeley.edu.