Forum for Artificial Intelligence
-
POB 2.402 and Zoom

Open-Ended Discovery via Setter-Solver Games

Michael Dennis
Research Scientist at Google DeepMind

Abstract: There is immense value in using RL in virtual environments to improve tool use, computer use, math, and coding. Even in physical settings, the Genie models have paved a path for training within designable virtual environments. This presents an opportunity for setter-solver algorithms, like those used in Unsupervised Environment Design (UED), to drive efficient learning and transfer and to lead to open-ended discovery. In this talk, we address several challenges in generalising these setter-solver auto-curricula beyond the traditional self-play that was critical for AlphaGo but struggles outside the symmetric two-player zero-sum setting. We present a setter-solver algorithm (PAIRED) fit for asymmetric settings where the setter may be able to create unsolvable tasks for the solver. Moreover, we present an approach (Rational Policy Gradient) to generalise arbitrary setter-solver algorithms to general-sum games while preventing self-sabotaging behaviours -- those that go against the agents' incentives in the underlying game -- thus preserving agent rationality. More broadly, we discuss some initial approaches for human-AI coordination around discoveries: one for humans to understand AI discoveries through designing puzzles for human experts, and one for AIs to understand humans' objectives by identifying the best questions to ask to learn which problems we care about.
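To make the asymmetric setter-solver idea concrete, here is a minimal sketch of the regret objective behind PAIRED. This is an illustrative toy, not the authors' implementation: the function names (`paired_regret`, `rewards`) and the use of episode returns as inputs are assumptions for the example. The key idea from the abstract is that rewarding the setter with the gap between a second (antagonist) solver's return and the protagonist's return removes the incentive to propose unsolvable tasks.

```python
# Illustrative sketch of the PAIRED regret objective (hypothetical names,
# not the published implementation). PAIRED pairs the environment setter
# with two solvers: a protagonist and an antagonist. The setter is
# rewarded with the regret -- the antagonist's best return minus the
# protagonist's average return -- so it is pushed toward tasks that are
# solvable (the antagonist can succeed) yet still hard for the protagonist.

def paired_regret(antagonist_returns, protagonist_returns):
    """Estimated regret: best antagonist return minus mean protagonist return."""
    return max(antagonist_returns) - sum(protagonist_returns) / len(protagonist_returns)

def rewards(antagonist_returns, protagonist_returns):
    """Zero-sum reward split between setter and protagonist solver."""
    regret = paired_regret(antagonist_returns, protagonist_returns)
    return {
        "setter": regret,        # setter maximises regret
        "protagonist": -regret,  # protagonist minimises it by solving tasks
    }

# An unsolvable task yields zero return for both solvers, hence zero
# regret and no reward for the setter:
print(rewards([0.0], [0.0]))
# A solvable-but-hard task yields positive regret:
print(rewards([1.0, 0.8], [0.2, 0.4]))
```

Note the contrast with naive minimax (rewarding the setter with the negative of the solver's return), under which the setter could score maximally just by generating impossible tasks.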

Speaker Bio: I am interested in the intersection between Problem Specification and Open-Ended Complexity -- studying the boundary between what complexity must be described and what can be generated artificially. To this end, we have formalized the problem of Unsupervised Environment Design (UED), which aims to build complex and challenging environments automatically to promote efficient learning and transfer. This framework has deep connections to decision theory, which allows us to make guarantees about how the resulting policies would perform in human-designed environments without ever having trained on them. I am currently a Research Scientist on Google DeepMind's Open-Endedness team. I was previously a Ph.D. student at the Center for Human-Compatible AI (CHAI), advised by Stuart Russell. Prior to my research in AI, I conducted research in computer science theory and computational geometry.