Disentangled Unsupervised Skill Discovery for
Efficient Hierarchical Reinforcement Learning

The University of Texas at Austin
NeurIPS 2024

Overview

Unsupervised skill discovery holds the promise of improving the sample efficiency of Reinforcement Learning, by learning a set of reusable skills through reward-free interaction with the environment. These skills can be later recombined to tackle multiple downstream tasks more efficiently. In practice, however, learning to use and recombine these skills can be extremely hard for an agent trying to solve downstream tasks, especially in complex domains. We propose Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks. DUSDi decomposes skills (e.g. learning to drive) into disentangled components (e.g. controlling speed, steering, and headlights), where each skill component only affects one factor of the state space. Importantly, these skill components can be concurrently composed to generate low-level actions, and efficiently chained to tackle downstream tasks through hierarchical Reinforcement Learning. DUSDi defines a novel mutual-information-based objective to enforce disentanglement between the influences of different skill components, and utilizes value factorization to optimize this objective efficiently. Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills, and significantly outperforms previous skill discovery methods when it comes to applying the learned skills to solve downstream tasks.
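
The exact objective and its value-factorization-based optimization are defined in the paper; as a rough illustration only, here is a minimal PyTorch-style sketch of the kind of per-factor, discriminator-based intrinsic reward that such a mutual-information objective suggests (each state factor should reveal its own skill component and nothing about the others). The names (FactorDiscriminator, discs, lam) and the exact form of the penalty are assumptions for this sketch, not our released implementation.

```python
import torch
import torch.nn as nn


class FactorDiscriminator(nn.Module):
    """Predicts skill component z_j from state factor s_i (a variational MI estimator)."""

    def __init__(self, factor_dim, n_skill_values, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(factor_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skill_values),
        )

    def log_prob(self, s_i, z_j):
        logits = self.net(s_i)
        return torch.distributions.Categorical(logits=logits).log_prob(z_j)


def disentangled_intrinsic_reward(discs, state_factors, skill_components, lam=1.0):
    """discs[i][j] predicts z_j from s_i. Reward the diagonal terms (factor i should
    reveal its own skill component z_i) and penalize the off-diagonal terms
    (factor i should carry no information about the other components)."""
    n = len(state_factors)
    reward = 0.0
    for i in range(n):
        reward = reward + discs[i][i].log_prob(state_factors[i], skill_components[i])
        for j in range(n):
            if j != i:
                reward = reward - lam * discs[i][j].log_prob(
                    state_factors[i], skill_components[j]
                )
    return reward
```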

Skills Visualization

Random Skills Visualization

We first examine DUSDi skills by randomly sampling a skill vector z and checking whether the skill policy induces diverse behaviors. Here are some results.
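
For concreteness, here is a minimal sketch of such a random-skill rollout, assuming discrete skill components, a Gym-style environment, and a skill_policy conditioned on the skill vector z; these interfaces are illustrative, not the exact API of our codebase.

```python
import numpy as np


def rollout_random_skill(env, skill_policy, n_components=4, n_values=8, horizon=200):
    """Sample one value per skill dimension and execute the frozen skill policy."""
    z = np.random.randint(n_values, size=n_components)  # one discrete value per component
    obs = env.reset()
    frames = [env.render()]
    for _ in range(horizon):
        action = skill_policy.act(obs, z)  # z is held fixed for the entire rollout
        obs, _, done, _ = env.step(action)
        frames.append(env.render())
        if done:
            break
    return frames
```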

[Videos: random skill rollouts in iGibson]

[Videos: random skill rollouts in DMC-Walker]

[Videos: random skill rollouts in Particle]

Disentangled Skills Visualization

However, a more interesting way to visualize DUSDi skills is to perturb a single skill component (dimension) while keeping the others fixed and observe the effect on the system's behavior: if the skills are disentangled, only the corresponding factor of the environment state should change.
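
A minimal sketch of this perturbation procedure, assuming discrete skill components and the same hypothetical rollout interface as above:

```python
import numpy as np


def perturb_component(z, index, n_values, rng=np.random):
    """Resample a single skill dimension while keeping all other dimensions fixed."""
    z_new = np.array(z, copy=True)
    other_values = [v for v in range(n_values) if v != z[index]]
    z_new[index] = rng.choice(other_values)
    return z_new

# Example: re-run the rollout with only dimension 0 changed; if the skills are
# disentangled, only the state factor tied to dimension 0 should behave differently.
# z_perturbed = perturb_component(z, index=0, n_values=8)
```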

In Particle, perturbing one skill dimension causes only one particle to change its interaction with the landmark (in this case, the grey agent).

Perturbing another skill dimension causes a different agent to drastically change its interaction with the landmark (in this case, the orange agent).

Of course, we can also perturb two skill dimensions simultaneously so that two agents react (in this case, the blue and the red agents).


In iGibson (Fetch), perturbing one skill dimension causes the robot to change its position while keeping its camera direction and end-effector direction fixed.


In iGibson (Tiago), changing the first skill component causes the left hand's wiping region to change (marked in red).


Changing another skill component causes the right arm to shift its wiping region (marked in blue).


Changing yet another skill component causes the robot to change its position.

Experiments

These disentangled skills can then be used as the low-level policy in a hierarchical RL setting, achieving significantly better downstream task learning performance than previous state-of-the-art methods. Please check out our paper and code for a detailed description of our method and experiments.
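
For intuition, here is a hedged sketch of that hierarchical setup: a high-level task policy outputs a (factored) skill vector z every k environment steps, and the frozen skill policy turns (obs, z) into low-level actions in between. The interfaces, names, and the choice of k are illustrative, not our exact implementation.

```python
def hierarchical_episode(env, task_policy, skill_policy, k=10, max_steps=1000):
    """Chain skills for a downstream task: the task policy picks z, the skill policy executes it."""
    obs = env.reset()
    total_reward, t = 0.0, 0
    while t < max_steps:
        z = task_policy.act(obs)                   # high-level action = factored skill vector
        for _ in range(k):
            action = skill_policy.act(obs, z)      # low-level action from the learned skills
            obs, reward, done, _ = env.step(action)
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                return total_reward
    return total_reward
```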