Commercial plant phenotyping systems using fixed cameras cannot perceive many plant details due to leaf occlusion. In this paper, we present Botany-Bot, a system for building detailed “annotated digital twins” of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmented Gaussian Splat models. We also present robot algorithms for manipulating leaves to take high-resolution indexable images of occluded details such as stem buds and the undersides/oversides of leaves. Results from experiments suggest that Botany-Bot can segment leaves with 90.8% accuracy, detect leaves with 86.2% accuracy, lift/push leaves with 77.9% accuracy, and take detailed overside/underside images with 77.3% accuracy.
Botany-Bot uses a lightbox, two fixed cameras, and a digital turntable to obtain a plant scan. To obtain multi-view camera poses, we place an ArUco marker on the turntable and calibrate the camera-to-turntable pose for each angle through which the turntable moves. Next, we place the plant on top of the turntable and repeat the same angles, which results in a multi-view posed capture. We use two ZED 2 stereo cameras oriented vertically, for a total of 4 elevation angles, and rotate the turntable to evenly spaced radial angles. Every plant also has an ArUco marker, which we use to save a relative pose between the plant and the turntable by composing the camera-to-turntable pose with the camera-to-plant pose.
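For illustration, here is a minimal sketch of this marker-based pose estimation, assuming OpenCV's ArUco module (the `ArucoDetector` API of OpenCV ≥ 4.7). The marker size, dictionary, and function names are illustrative, not the paper's actual code:

```python
# Hedged sketch of ArUco-based camera-to-marker pose estimation.
# MARKER_LEN, the dictionary choice, and the helper below are assumptions.
import cv2
import numpy as np

MARKER_LEN = 0.05  # marker side length in meters (assumed)
# Marker corners in the marker frame, matching detectMarkers' corner order
# (top-left, top-right, bottom-right, bottom-left).
OBJ_PTS = np.array([
    [-MARKER_LEN / 2,  MARKER_LEN / 2, 0],
    [ MARKER_LEN / 2,  MARKER_LEN / 2, 0],
    [ MARKER_LEN / 2, -MARKER_LEN / 2, 0],
    [-MARKER_LEN / 2, -MARKER_LEN / 2, 0],
], dtype=np.float32)

def marker_pose(image, K, dist):
    """Return the 4x4 camera-to-marker transform, or None if not detected."""
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
    if ids is None:
        return None
    ok, rvec, tvec = cv2.solvePnP(OBJ_PTS, corners[0][0], K, dist)
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)
    T[:3, 3] = tvec.ravel()
    return T  # T_cam_marker

# The plant-to-turntable pose is then the composition of the two observations:
# T_turntable_plant = inv(T_cam_turntable) @ T_cam_plant
```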
The rotating-turntable multi-view capture breaks the core assumption in NeRF and 3DGS that the scene remains static during capture in two ways: 1) the background around the object stays static relative to the camera while the object rotates, and 2) lighting on the surface of the object is not 3D-consistent. To alleviate 1), we preprocess the input data by automatically masking the potted plant with Segment Anything 2 (SAM 2). During radiance field construction, we do not compute the standard loss functions on pixels lying outside this mask. We also implement an extra L1 loss between the potted plant's mask and the accumulation in the Gaussian Splatting reconstruction, which allows us to delete spurious geometry in the scene; we refer to this loss as an alpha loss. We use GARField for segmenting the various parts of the plants. See the Gaussian Splatting reconstructions of the plants below, and click on parts to see the resulting segmentation.
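A minimal sketch of these two losses, assuming a PyTorch-style rasterizer that returns a per-pixel accumulation (alpha) map; the function and variable names are illustrative and the actual implementation may differ:

```python
# Hedged sketch of the masked RGB loss and the "alpha loss" described above.
import torch

def masked_losses(render, accumulation, gt_image, plant_mask):
    """render, gt_image: (H, W, 3); accumulation, plant_mask: (H, W) in [0, 1]."""
    mask = plant_mask.unsqueeze(-1)
    # Photometric loss computed only on pixels inside the SAM 2 plant mask,
    # so the static background does not corrupt the reconstruction.
    rgb_loss = (mask * (render - gt_image).abs()).sum() / (3 * mask.sum()).clamp(min=1.0)
    # Alpha loss: L1 between the rendered accumulation and the plant mask,
    # which penalizes spurious "floater" geometry outside the plant.
    alpha_loss = (accumulation - plant_mask).abs().mean()
    return rgb_loss, alpha_loss
```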
These 3D segmented reconstructions are rendered in-browser! If you think that's cool, check out Viser!
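Getting a scene into the browser with Viser takes only a few lines. This is a generic example with random data, not the project's actual viewer code, and the `server.scene` namespace assumes a recent Viser version (older releases expose `server.add_point_cloud` directly):

```python
# Minimal Viser example: serve a random point cloud at http://localhost:8080.
import time
import numpy as np
import viser

server = viser.ViserServer()  # starts a local web server
points = np.random.uniform(-0.5, 0.5, size=(10_000, 3))
colors = np.random.uniform(0.0, 1.0, size=(10_000, 3))
server.scene.add_point_cloud("/demo_cloud", points=points, colors=colors, point_size=0.005)

while True:  # keep the server alive so clients can connect
    time.sleep(1.0)
```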
Certain plant regions, such as leaf undersides, can be under-reconstructed in the resulting 3D reconstruction while being crucial for detecting issues such as pest infestations or diseases. To address this, Botany-Bot uses robot interaction with a custom end-effector to lift or push down each leaf toward a static camera, capturing high-resolution underside/overside images.
To achieve this, we use three task primitives for manipulating a target leaf with the robot arm, its inspection tool, and the turntable:
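As a rough illustration of such a primitive, here is a hedged sketch of a leaf lift; the `robot` and `turntable` interfaces, pose conventions, and all offsets are hypothetical stand-ins, not Botany-Bot's actual API:

```python
# Hypothetical sketch of a leaf-lift primitive. All names and values are
# illustrative assumptions.
import numpy as np

def lift_leaf(robot, turntable, leaf_pose_world, camera_yaw, lift_dist=0.03):
    """Rotate the leaf toward the static camera, slide under it, and lift."""
    # 1) Rotate the turntable so the target leaf faces the inspection camera.
    leaf_yaw = np.arctan2(leaf_pose_world[1, 3], leaf_pose_world[0, 3])
    turntable.rotate_to(camera_yaw - leaf_yaw)

    # 2) Approach from slightly below the leaf to avoid brushing it aside.
    approach = leaf_pose_world.copy()
    approach[2, 3] -= 0.02  # 2 cm clearance below the leaf (assumed)
    robot.move_to(approach)

    # 3) Lift straight up by lift_dist to expose the underside to the camera.
    target = approach.copy()
    target[2, 3] += 0.02 + lift_dist
    robot.move_to(target, speed=0.05)  # slow, gentle motion
```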
We evaluate the following three metrics for 3D reconstruction:
and these two metrics for autonomous robot leaf inspection:
Across 68 safely accessible leaves on 8 plants, the robot successfully lifts/pushes 53 leaves. When the leaf is not sufficiently aligned with the camera, we mark this as a failure even if the lift/push motion executes correctly. In one notable case (Croton), the robot successfully pushes down a leaf, but the leaf breaks; we do not include this leaf in the results. Besides breakage, there are three main failure cases for lifting/pushing:
The robot may fail to interact with a leaf properly if the gripper accidentally catches a lower leaf during its motion, bends the stem, and causes the leaf to rotate out of the way. Even if the robot makes proper contact with the leaf initially, the leaf may slip out of the way depending on how it is attached to the stem. Any pose registration error only exacerbates these problems. Solving this would require some form of closed-loop visual servoing to detect such errors.
Out of the 53 leaves that are pushed/lifted, their oversides/undersides are visible in 41 cases. For observing leaf oversides/undersides, the biggest challenges are singulating the leaf and choosing the correct distance to lift/push down the leaf. Most failure cases (6/12) are due to a nearby leaf that gets lifted up/pushed down and blocks the underside of the target leaf from the camera view. Collision-free, contact-aware motion planning with the 3D plant model would be required to carefully “burrow” between leaves. Another failure mode (5/12) is the leaf being lifted/pushed, but not enough to expose its underside/overside. This is because the lift/push distance selection is a naive implementation; the distance should depend on the leaf's position relative to the camera image center, as leaves lower in the camera view should be lifted up by a larger amount, while leaves higher in the view should be pushed down by a larger amount. Another solution could be a closed-loop motion that takes an image just before losing leaf contact. Lastly, we notice that the gripper does not always orient parallel to the leaf surface; improving the plant-specific tuning of this parameter could help.
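The image-centered distance heuristic suggested above could be written compactly as follows; the gain and clamp values are illustrative, not tuned parameters from the paper:

```python
# Hedged sketch: map a leaf's vertical pixel position to a lift/push distance.
def lift_push_distance(leaf_v, image_height, gain=0.08, max_dist=0.06):
    """Return a signed distance in meters: positive = lift up, negative = push down.

    leaf_v: leaf centroid row in pixels (0 = top of image).
    """
    # Normalized offset from the image center in [-1, 1]; positive when the
    # leaf sits below center, so it should be lifted by a larger amount.
    offset = (leaf_v - image_height / 2) / (image_height / 2)
    return max(-max_dist, min(max_dist, gain * offset))
```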
If you use this work or find it helpful, please consider citing:
@inproceedings{adebola2025botanybot,
title={Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats},
author={Simeon Adebola and Chung Min Kim and Justin Kerr and Shuangyu Xie and Prithvi Akella and Jose Luis Susa Rincon and Eugen Solowjow and Ken Goldberg},
booktitle={2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2025},
}