Recent advances in deep reinforcement learning (RL) make it possible to solve complex, high-dimensional problems in robotics. However, training an RL policy effectively requires exploring states and actions that may be unsafe for the robot. A recent paper by Google Research therefore introduces an RL framework for learning legged locomotion while satisfying safety constraints during training.
The framework consists of two policies: a “safe recovery policy” that recovers the robot from near-unsafe states, and a “learner policy” that performs the desired control task. The effectiveness of the algorithm is demonstrated on three locomotion tasks: efficient gait, catwalk, and two-leg balance. For the efficient gait and catwalk tasks, a policy is obtained with no falls and without any manual resets. The two-leg balance task is trained with only four falls.
The paper shows that it is possible to learn legged locomotion skills autonomously and safely in the real world.
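
The two-policy structure described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional "tilt" state, the threshold values, the simple switching rule, the toy dynamics, and both stand-in policies. It only shows the general idea of handing control to a recovery policy whenever the state is near-unsafe and letting the learner act otherwise.

```python
from typing import List, Tuple

# Hypothetical thresholds, chosen only for this toy example.
TILT_LIMIT = 0.5    # beyond this tilt (rad) the robot counts as fallen
TRIGGER_TILT = 0.3  # beyond this tilt the state counts as near-unsafe


def learner_policy(tilt: float) -> float:
    """Stand-in for the task policy: always pushes the tilt upward."""
    return 0.1


def recovery_policy(tilt: float) -> float:
    """Stand-in for the safe recovery policy: halves the current tilt."""
    return -0.5 * tilt


def is_near_unsafe(tilt: float) -> bool:
    """Assumed safety trigger: a plain threshold on the tilt magnitude."""
    return abs(tilt) >= TRIGGER_TILT


def select_action(tilt: float) -> Tuple[float, str]:
    """Use the recovery policy near unsafe states, the learner otherwise."""
    if is_near_unsafe(tilt):
        return recovery_policy(tilt), "recovery"
    return learner_policy(tilt), "learner"


def run_episode(steps: int = 50) -> List[Tuple[float, str]]:
    """Roll out the switching controller and record (tilt, active policy)."""
    tilt = 0.0
    history = []
    for _ in range(steps):
        action, mode = select_action(tilt)
        tilt += action  # toy dynamics: the action changes the tilt directly
        history.append((tilt, mode))
    return history


if __name__ == "__main__":
    history = run_episode()
    falls = sum(abs(tilt) >= TILT_LIMIT for tilt, _ in history)
    recoveries = sum(mode == "recovery" for _, mode in history)
    print(f"falls: {falls}, recovery steps: {recoveries}")
```

In this sketch the learner is free to explore until the trigger fires, which is what keeps the toy robot away from the fall limit. A real system would need a trigger that accounts for the robot's dynamics rather than a fixed threshold on one state variable.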