SHIV


Project maintained by BerkeleyAutomation

Abstract

Online learning-from-demonstration algorithms, such as DAgger, can learn policies for problems where the system dynamics and the cost function are unknown. During learning, however, they burden supervisors, who must respond to a query each time the robot encounters a new state while executing its current best policy. Algorithms such as MMD-IL reduce supervisor burden by filtering out queries with insufficient discrepancy in distribution and by maintaining multiple policies. We introduce the SHIV algorithm (Svm-based reduction in Human InterVention), which converges to a single policy and reduces supervisor burden in non-stationary, high-dimensional state distributions. To facilitate scaling and outlier rejection, filtering is based on distance to an approximate level-set boundary defined by a One-Class support vector machine. We report on experiments in three contexts: 1) a driving simulator with a 27,936-dimensional visual feature space, 2) a push-grasping-in-clutter simulation with a 22-dimensional state space, and 3) physical surgical needle insertion with a 16-dimensional state space. Results suggest that SHIV can efficiently learn policies with equivalent performance while requiring up to 70% fewer queries.
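The query-filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's OneClassSVM as the One-Class SVM, and the function names (fit_novelty_detector, needs_query), the toy 2-dimensional states, and the nu and gamma values are hypothetical choices made only for illustration. The supervisor is queried only when a state falls outside the estimated level-set boundary of previously labeled states.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def fit_novelty_detector(labeled_states, nu=0.1, gamma=0.5):
    """Fit a One-Class SVM on states the supervisor has already labeled.

    nu and gamma are illustrative values, not taken from the paper.
    """
    detector = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    detector.fit(labeled_states)
    return detector


def needs_query(detector, state):
    """Return True if the state lies outside the learned boundary.

    decision_function gives a signed score relative to the boundary:
    negative means the state is treated as novel, so the supervisor is asked.
    """
    score = detector.decision_function(np.asarray(state).reshape(1, -1))[0]
    return bool(score < 0.0)


# Toy data: 200 previously labeled states clustered around the origin.
rng = np.random.default_rng(0)
labeled_states = rng.normal(0.0, 1.0, size=(200, 2))
detector = fit_novelty_detector(labeled_states)

familiar = needs_query(detector, [0.0, 0.0])   # inside the dense region
novel = needs_query(detector, [10.0, 10.0])    # far from all labeled states
print("query at familiar state:", familiar)
print("query at novel state:", novel)
```

In a DAgger-style loop, a check like needs_query would gate each supervisor request during policy rollouts, and the detector would be refit as newly labeled states are added.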

Documents

SHIV: Reducing Supervisor Burden using Support Vectors for Efficient Learning from Demonstrations in High Dimensional State Spaces. Michael Laskey, Sam Staszak, Wesley Hsieh, Jeffrey Mahler, Florian Pokorny, Anca Dragan, Ken Goldberg. IEEE International Conference on Robotics and Automation, 2016 (Under Review). [PDF].

Theoretical Analysis of SHIV. Michael Laskey, Jeffrey Mahler, Florian Pokorny, Anca Dragan, Ken Goldberg. [PDF].

Video for Grasping in Clutter

Authors and Contributors

This is an ongoing project at UC Berkeley with active contributions from:
Michael Laskey, Florian Pokorny, Jeff Mahler, Wesley Hsieh, Anca Dragan and Ken Goldberg

We recently extended this approach to the grasping-in-clutter domain using a hierarchy of supervisors. A preprint can be found here.

Past contributors include:
Sam Staszak

Support or Contact

Please contact Michael Laskey (laskeymd@berkeley.edu) for code requests or further information.