Online learning-from-demonstration algorithms, such as DAgger, can learn policies for problems where the system dynamics and the cost function are unknown. However, during learning, they impose a burden on supervisors, who must respond to queries each time the robot encounters new states while executing its current best policy. Algorithms such as MMD-IL reduce supervisor burden by filtering queries with insufficient discrepancy in distribution and maintaining multiple policies. We introduce the SHIV algorithm (Svm-based reduction in Human InterVention), which converges to a single policy and reduces supervisor burden in non-stationary, high-dimensional state distributions. To facilitate scaling and outlier rejection, filtering is based on distance to an approximate level set boundary defined by a One-Class support vector machine. We report on experiments in three contexts: 1) a driving simulator with a 27,936-dimensional visual feature space, 2) a simulation of push-grasping in clutter with a 22-dimensional state space, and 3) physical surgical needle insertion with a 16-dimensional state space. Results suggest that SHIV can efficiently learn policies with equivalent performance while requiring up to 70% fewer queries.
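The query-filtering idea can be illustrated with a minimal sketch. It is not the SHIV implementation: to stay dependency-free, it swaps the One-Class SVM for a simpler uniform-weight RBF kernel score (a Parzen-window estimate) and thresholds it at a quantile of the training scores. The class `LevelSetFilter`, the helper `collect_queries`, and the parameters `gamma` and `nu` are hypothetical names chosen for illustration. The sketch assumes that a state falling outside the estimated level set is the one that triggers a supervisor query.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


class LevelSetFilter:
    """Hypothetical stand-in for a One-Class SVM novelty detector.

    Assumption: uniform kernel weights (a Parzen-window score) replace the
    learned SVM coefficients, and the offset rho is set so that roughly a
    fraction nu of the training states score below it.
    """

    def __init__(self, gamma=1.0, nu=0.1):
        self.gamma = gamma
        self.nu = nu
        self.states = []
        self.rho = 0.0

    def _score(self, x):
        # Mean kernel similarity between x and the stored training states.
        total = sum(rbf(s, x, self.gamma) for s in self.states)
        return total / len(self.states)

    def fit(self, states):
        self.states = [tuple(s) for s in states]
        scores = sorted(self._score(s) for s in self.states)
        idx = min(int(self.nu * len(scores)), len(scores) - 1)
        self.rho = scores[idx]
        return self

    def decision(self, x):
        """Positive inside the estimated level set, negative outside."""
        return self._score(x) - self.rho

    def should_query(self, x):
        # Assumption: only states outside the level set are sent to the supervisor.
        return self.decision(x) < 0.0


def collect_queries(visited, labeled, gamma=1.0, nu=0.1):
    """Return the visited states that would be sent to the supervisor."""
    level_set = LevelSetFilter(gamma=gamma, nu=nu).fit(labeled)
    return [s for s in visited if level_set.should_query(s)]


# Toy data: labeled states cluster near the origin; two visited states lie far away.
random.seed(0)
labeled = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(200)]
visited = [(0.0, 0.0), (0.1, -0.2), (4.0, 4.0), (-5.0, 3.0)]
queries = collect_queries(visited, labeled)
```

In this toy setup, only the two far-away states end up in `queries`, so the supervisor is not asked about states that resemble those already labeled. A real One-Class SVM would learn sparse, non-uniform weights over the training states, which is what allows it to scale and to reject outliers in the training data.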
SHIV: Reducing Supervisor Burden using Support Vectors for Efficient Learning from Demonstrations in High Dimensional State Spaces. Michael Laskey, Sam Staszak, Wesley Hsieh, Jeffrey Mahler, Florian Pokorny, Anca Dragan, Ken Goldberg. IEEE International Conference on Robotics and Automation, 2016 (Under Review). [PDF].
Theoretical Analysis of SHIV. Michael Laskey, Jeffrey Mahler, Florian Pokorny, Anca Dragan, Ken Goldberg. [PDF].
This is an ongoing project at UC Berkeley with active contributions from:
Michael Laskey, Florian Pokorny, Jeff Mahler, Wesley Hsieh, Anca Dragan and Ken Goldberg
We recently extended this approach to the grasping in clutter domain using a hierarchy of supervisors. A preprint can be found here.
Past contributors include:
Sam Staszak
Please contact Michael Laskey (laskeymd@berkeley.edu) for code requests or further information.