Unsupervised skill discovery holds the promise of improving the sample efficiency of Reinforcement Learning, by learning a set of reusable skills through reward-free interaction with the environment. These skills can be later recombined to tackle multiple downstream tasks more efficiently. In practice, however, learning to use and recombine these skills can be extremely hard for an agent trying to solve downstream tasks, especially in complex domains. We propose Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks. DUSDi decomposes skills (e.g. learning to drive) into disentangled components (e.g. controlling speed, steering, and headlights), where each skill component only affects one factor of the state space. Importantly, these skill components can be concurrently composed to generate low-level actions, and efficiently chained to tackle downstream tasks through hierarchical Reinforcement Learning. DUSDi defines a novel mutual-information-based objective to enforce disentanglement between the influences of different skill components, and utilizes value factorization to optimize this objective efficiently. Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills, and significantly outperforms previous skill discovery methods when it comes to applying the learned skills to solve downstream tasks.
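To give a rough feel for the kind of objective involved (a minimal sketch, not the exact formulation from the paper), each skill component z_i can be rewarded for being predictable from its own state factor s_i while remaining unpredictable from the other factors, using learned discriminators. The networks `q_own` and `q_other`, the discrete skill space, and the weight `lam` below are illustrative assumptions.

```python
import torch

# Hypothetical per-component discriminators: q_own[i] predicts z_i from its own
# state factor s_i; q_other[i] predicts z_i from the remaining factors s_{-i}.
# The reward follows the standard variational lower bound on mutual information
# used in skill discovery, log q(z_i | s) - log p(z_i), applied per component.

def disentangled_skill_reward(state_factors, z, q_own, q_other,
                              n_skill_values, lam=1.0):
    """Sketch of a per-component disentanglement reward.

    state_factors: list of N tensors, one per state factor, each (batch, d_i)
    z:             (batch, N) long tensor of discrete skill components
    q_own/q_other: lists of N classifiers producing logits over skill values
    """
    log_p_z = -torch.log(torch.tensor(float(n_skill_values)))  # uniform prior
    rewards = []
    for i, s_i in enumerate(state_factors):
        # Encourage I(S_i; Z_i): z_i should be decodable from its own factor.
        logp_own = torch.log_softmax(q_own[i](s_i), dim=-1)
        r_own = logp_own.gather(-1, z[:, i:i + 1]).squeeze(-1) - log_p_z
        # Discourage influence on other factors: z_i should NOT be decodable
        # from the concatenation of the remaining state factors.
        s_rest = torch.cat(
            [s for j, s in enumerate(state_factors) if j != i], dim=-1)
        logp_other = torch.log_softmax(q_other[i](s_rest), dim=-1)
        r_other = logp_other.gather(-1, z[:, i:i + 1]).squeeze(-1) - log_p_z
        rewards.append(r_own - lam * r_other)
    return torch.stack(rewards, dim=-1).sum(-1)  # intrinsic reward, (batch,)
```

In the paper, the resulting per-component rewards are optimized efficiently through value factorization; see the paper for the exact objective and optimization details.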
We first examine DUSDi skills by randomly sampling a skill vector z and checking whether the skill policy induces diverse behaviors. Here are some results.
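A minimal sketch of this evaluation loop, assuming a trained skill-conditioned policy `skill_policy`, a Gym-style environment `env`, and a discrete skill space (all hypothetical names and sizes):

```python
import numpy as np

def rollout(env, skill_policy, z, horizon=200):
    """Roll out the frozen skill policy conditioned on a fixed skill vector z."""
    obs, _ = env.reset()
    states = [obs]
    for _ in range(horizon):
        action = skill_policy(obs, z)   # low-level action from (obs, z)
        obs, _, terminated, truncated, _ = env.step(action)
        states.append(obs)
        if terminated or truncated:
            break
    return states

# Sample a handful of random skill vectors and inspect the resulting behaviors.
# env and skill_policy are assumed to be created elsewhere.
n_components, n_values = 5, 4           # assumed skill-space dimensions
for _ in range(8):
    z = np.random.randint(n_values, size=n_components)
    trajectory = rollout(env, skill_policy, z)
```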
However, a more interesting way to visualize DUSDi skills is to randomly perturb a single skill component (dimension) and observe the overall effect on the system's behavior: if the skills are disentangled, we expect only one factor of the environment state to change, as sketched below.
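Reusing the hypothetical `rollout` helper from above, this amounts to holding all skill components fixed while resampling a single dimension and comparing the resulting state factors:

```python
base_z = np.random.randint(n_values, size=n_components)  # reference skill
component = 2                                            # dimension to perturb

trajectories = []
for value in range(n_values):
    z = base_z.copy()
    z[component] = value            # change only one skill component
    trajectories.append(rollout(env, skill_policy, z))

# If the skills are disentangled, only the state factor tied to `component`
# should differ across these rollouts; all other factors should evolve alike.
```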
These disentangled skills can then be used as the low-level policy in a hierarchical RL setting, achieving significantly better downstream task learning performance than previous state-of-the-art methods. Please check out our paper and code for a detailed description of our method and experiments.
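As a rough sketch of how the learned skills plug into hierarchical RL (names such as `high_level_policy` and `skill_horizon` are illustrative, not from the paper): the task policy picks a skill vector every few steps, and the frozen skill policy decodes it into low-level actions.

```python
def hierarchical_episode(env, high_level_policy, skill_policy,
                         skill_horizon=10, max_steps=1000):
    """One downstream-task episode using a frozen low-level skill policy."""
    obs, _ = env.reset()
    total_reward, step = 0.0, 0
    while step < max_steps:
        z = high_level_policy(obs)              # high level chooses a skill vector
        for _ in range(skill_horizon):          # execute it for a few steps
            action = skill_policy(obs, z)       # frozen, pretrained skill policy
            obs, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            step += 1
            if terminated or truncated or step >= max_steps:
                return total_reward
    return total_reward
```

Because each skill component controls a single state factor, the high-level policy can recombine components concurrently rather than searching over monolithic skills, which is what makes downstream learning efficient.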