Today's deep reinforcement learning algorithms produce black-box policies that are often difficult for a person to interpret and trust. We introduce a policy distillation algorithm, based on the CN2 rule-mining algorithm, that distills a deep policy into a rule-based decision system. At the core of our approach is the observation that an RL process does not just learn a policy, a mapping from states to individual actions, but also produces extra meta-information, such as lists of visited states or action probabilities. This meta-information can, for example, indicate whether more than one action is near-optimal in a certain state. We exploit knowledge of these equally good actions to distill the policy into fewer rules, which improves interpretability, while ensuring that the performance of the distilled policy still matches that of the original policy. This guarantees that we do not provide an explanation for a degenerate or over-simplified policy. We demonstrate the applicability of our algorithm on the Mario AI benchmark, a complex task that requires modern deep reinforcement learning algorithms. The explanations we produce capture the learned policy in only a few rules, and they can be further refined and tailored by the user with a two-step process that we introduce in this paper.
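The core idea of exploiting equally good actions to shrink the rule list can be sketched in a few lines of Python. This is an illustrative toy, not the paper's actual CN2-based procedure: the helper names, the `epsilon` threshold for "near-optimal", and the hand-written candidate predicates are all assumptions made for the example.

```python
# Illustrative sketch: when several actions are near-optimal in a state,
# one rule can cover many states, so the distilled policy needs fewer rules.

def near_optimal_actions(q_values, epsilon=0.05):
    """Actions whose value is within epsilon of the best action's value."""
    best = max(q_values)
    return {a for a, q in enumerate(q_values) if best - q <= epsilon}

def distill(state_q, predicates, epsilon=0.05):
    """Greedy rule extraction: each rule is a (predicate-name, action) pair.

    A candidate predicate may cover many states; it becomes a rule only if
    some single action is near-optimal for *every* state it covers.
    """
    uncovered = set(state_q)
    rules = []
    for name, pred in predicates:
        covered = {s for s in uncovered if pred(s)}
        if not covered:
            continue
        # Intersect the equally-good action sets of all covered states.
        shared = set.intersection(
            *(near_optimal_actions(state_q[s], epsilon) for s in covered))
        if shared:
            rules.append((name, min(shared)))  # pick any shared action
            uncovered -= covered
    return rules

# Toy example: 3 states, 2 actions; states 0 and 1 share a near-optimal action,
# so two rules suffice instead of one rule per state.
state_q = {0: [1.0, 0.97], 1: [0.5, 1.0], 2: [1.0, 0.2]}
predicates = [("s<2", lambda s: s < 2), ("any", lambda s: True)]
rules = distill(state_q, predicates)
# rules: [("s<2", 1), ("any", 0)]
```

A distiller restricted to strictly optimal actions could not merge states 0 and 1 under one rule here; the slack provided by the near-optimal set is what allows the shorter rule list.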
Original language: English
Title of host publication: Proceedings of the 1st TAILOR workshop at ECAI 2020
Number of pages: 16
Publication status: Published - 4 Sep 2020