First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and long convergence times to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks, we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP, and Maros-Mészáros problems.
We consider optimizing quadratic programs (QPs) of the form

$$\begin{array}{ll} \underset{x}{\text{minimize}} & \tfrac{1}{2} x^\top P x + q^\top x \\ \text{subject to} & l \le Ax \le u, \end{array}$$

where $x \in \mathbb{R}^n$ is the decision variable, $P \succeq 0$, and $A \in \mathbb{R}^{m \times n}$ encodes $m$ constraints with lower and upper bounds $l$ and $u$.
These problems arise frequently in finance, machine learning, and robotic control. For example, quadratic programming is central to Model Predictive Control (MPC) in robotics: for a problem with linear dynamics and a quadratic cost, the optimal control sequence is obtained by solving a QP at each time step, as sketched below.
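As an illustration, here is a minimal sketch of how such an MPC problem condenses into the QP form above. The double-integrator dynamics, cost weights, and horizon are assumed example values, not taken from the paper:

```python
import numpy as np

dt, N = 0.1, 10                              # step size and horizon (assumed)
A = np.array([[1.0, dt], [0.0, 1.0]])        # double-integrator dynamics
B = np.array([[0.5 * dt**2], [dt]])
Q = np.diag([1.0, 0.1])                      # state cost weight
R = np.array([[0.01]])                       # input cost weight
x0 = np.array([1.0, 0.0])                    # measured initial state

nx, nu = A.shape[0], B.shape[1]
# Prediction matrices: x_{k+1} = A^{k+1} x0 + sum_{j<=k} A^{k-j} B u_j.
Sx = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
Su = np.zeros((N * nx, N * nu))
for k in range(N):
    for j in range(k + 1):
        Su[k*nx:(k+1)*nx, j*nu:(j+1)*nu] = np.linalg.matrix_power(A, k - j) @ B

# Condensing the cost (1/2) sum_k x_k' Q x_k + u_k' R u_k over u = (u_0..u_{N-1})
# yields exactly the QP data (P, q); input bounds become l <= I u <= u.
Qbar, Rbar = np.kron(np.eye(N), Q), np.kron(np.eye(N), R)
P = Su.T @ Qbar @ Su + Rbar                  # QP Hessian
q = Su.T @ Qbar @ (Sx @ x0)                  # QP linear term
```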
The current state-of-the-art solver for QPs is the Operator Splitting Quadratic Program (OSQP) solver, which is based on the Alternating Direction Method of Multipliers (ADMM). OSQP first factorizes the KKT matrix derived from the optimality conditions, and then iteratively applies a series of updates scaled by the step-size parameter $\rho$. The choice of $\rho$ strongly affects the number of iterations to convergence.
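For concreteness, the following sketch solves a small QP with the osqp Python package, fixing $\rho$ by hand and disabling OSQP's built-in adaptation so the effect of the chosen step size on iteration count is visible. The problem data are illustrative:

```python
import numpy as np
import osqp
from scipy import sparse

# A small illustrative QP in the form above (data chosen for the example).
P = sparse.csc_matrix([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = sparse.csc_matrix([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

prob = osqp.OSQP()
prob.setup(P, q, A, l, u, rho=0.1, adaptive_rho=False, verbose=False)
res = prob.solve()
print(res.x, res.info.iter)  # solution and iterations used at this rho
```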
In order to automatically select $\rho$, we train a policy with RL that adapts $\rho$ between ADMM iterations so as to minimize the number of iterations to convergence.
In both the handcrafted and RL cases, the policy is a function $\pi$ that maps the solver's state (e.g., the current primal and dual residuals) to a new value of $\rho$.
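As a concrete instance of this interface, here is a minimal sketch in the flavor of OSQP's built-in heuristic, which scales $\rho$ by the ratio of primal to dual residuals (OSQP's exact normalizations differ); an RL policy replaces the body of this function with a learned mapping:

```python
import numpy as np

def scalar_rho_policy(rho: float, r_prim: float, r_dual: float) -> float:
    """Scale rho up when the primal residual dominates, down otherwise."""
    scale = np.sqrt(r_prim / max(r_dual, 1e-12))
    return float(np.clip(rho * scale, 1e-6, 1e6))  # keep rho in a sane range
```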
An RL policy trained to produce this single scalar $\rho$ already outperforms the handwritten heuristic in OSQP. However, a scalar policy does not consider variations in how $\rho$ interacts with individual constraints: OSQP internally maintains a vector $\rho \in \mathbb{R}^m$ with one entry per constraint, and different constraints can benefit from different step sizes.
Instead, we consider a reformulation of the vectorized environment as a multi-agent, partially-observed MDP. Given a QP with $m$ constraints, each constraint is treated as an agent that observes per-constraint solver statistics and sets its own $\rho_i$, with a single shared policy applied across all constraints. This lets one policy generalize across QPs of varying dimension and structure.
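The sketch below shows one way such a shared per-constraint policy can be realized as a small network applied row-wise. The feature set, layer sizes, and output squashing are assumptions for illustration, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class PerConstraintRhoPolicy(nn.Module):
    """One small MLP applied independently to every constraint's features."""

    def __init__(self, n_features: int = 3, hidden: int = 48):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (m, n_features) -- one row of solver statistics per constraint
        # (e.g. residual terms and the current rho_i; assumed features).
        log_rho = self.net(obs).squeeze(-1)            # (m,)
        return torch.exp(log_rho.clamp(-6.0, 6.0))     # keep each rho_i positive

m = 100                          # constraints in some QP
policy = PerConstraintRhoPolicy()
rho = policy(torch.randn(m, 3))  # (m,): one rho per constraint, shared weights
```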
Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg. Accelerating Quadratic Optimization with Reinforcement Learning. Proc. Conference on Neural Information Processing Systems (NeurIPS), 2021.
@inproceedings{ichnowski2021rlqp,
  title={Accelerating Quadratic Optimization with Reinforcement Learning},
  author={Ichnowski, Jeffrey and Jain, Paras and Stellato, Bartolomeo and
          Banjac, Goran and Luo, Michael and Borrelli, Francesco and
          Gonzalez, Joseph E. and Stoica, Ion and Goldberg, Ken},
  booktitle={Proc. Conference on Neural Information Processing Systems (NeurIPS)},
  year={2021}
}