Tracking and manipulating irregularly shaped, previously unseen objects in dynamic environments is important for robotic applications in manufacturing, assembly, and logistics. Recently introduced Gaussian Splats can efficiently model the geometry of such objects. We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics, self-supervised visual features, and object grouping features into a compact representation. The embedded visual features enable online, persistent object tracking and manipulation by dynamically updating the POGS as objects move, without the need for scene recapture or retraining. After an initial multi-view scene capture and training phase, POGS uses a single stereo camera to identify and track multiple objects online, supporting grasping, reorientation, and natural language queries. We evaluate POGS through physical robot experiments on two tasks: (1) sequential pick-and-place, where a human rearranges irregular objects into different poses between grasps, and (2) in-gripper tool visual servoing, where the robot tracks a target while a human both moves the target and perturbs the tool in the robot's gripper.
This work was supported by the Toyota Research Institute (TRI).
yujustin@berkeley.edu, kush_hari@berkeley.edu