Forum for Artificial Intelligence
-
GDC 6.302 and via Zoom

Deterministic Policy Gradients in the Era of Soft RL: Mixture of Actors and Adaptive Ensembles

Guni Sharon
Associate Professor, Department of Computer Science and Engineering, Texas A&M University

Abstract: In modern continuous-control reinforcement learning, soft (stochastic) policy gradients have become the dominant choice—especially when training expressive policy classes such as Gaussian Mixture Models (GMMs). In this talk, I revisit this trend and show that deterministic policy gradients can, in many cases, be more effective for optimizing mixture-based actors. I begin by comparing deterministic and soft policy gradients in the context of GMM actors, highlighting their conceptual differences, optimization characteristics, and empirical behavior.
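For readers less familiar with the distinction, the two gradient estimators being compared take the following standard forms (the notation below is generic background, not taken from the talk): the deterministic policy gradient differentiates the critic through a deterministic action map, while the soft (maximum-entropy) gradient differentiates through a reparameterized sample and an entropy term.

```latex
% Deterministic policy gradient for an actor a = \mu_\theta(s):
\nabla_\theta J_{\mathrm{DPG}}(\theta)
  = \mathbb{E}_{s \sim \rho}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q(s, a)\big|_{a = \mu_\theta(s)}
    \right]

% Soft (maximum-entropy) policy gradient for a stochastic actor
% with reparameterized action a_\theta(s, \epsilon) and temperature \alpha:
\nabla_\theta J_{\mathrm{soft}}(\theta)
  = \mathbb{E}_{s \sim \rho,\; \epsilon}\!\left[
      \nabla_\theta \big( Q(s, a_\theta(s, \epsilon))
        - \alpha \log \pi_\theta(a_\theta(s, \epsilon) \mid s) \big)
    \right]
```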

Building on this comparison, I introduce a method for incorporating entropy maximization directly into deterministic GMM gradients. This is achieved using a variational approximation of the KL divergence between mixture components, enabling principled entropy-driven exploration. While this approach yields strong performance—outperforming stochastic GMM optimization in multiple dense-reward MuJoCo tasks—it also introduces several additional hyperparameters that demand careful tuning for stability and efficiency.
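As background, the sketch below illustrates one standard way such a variational approximation can be built from pairwise KL divergences between Gaussian components; it is not the speaker's implementation, and all shapes, names, and the usage pattern are illustrative assumptions.

```python
# Minimal sketch (assumed, not the speaker's code): variational approximation of
# the entropy of a diagonal-Gaussian mixture via pairwise component KLs,
# H(pi) ~= sum_i w_i * ( H(N_i) - log sum_j w_j exp(-KL(N_i || N_j)) ).
import math
import torch

def gaussian_kl_diag(mu_p, logstd_p, mu_q, logstd_q):
    """KL(N_p || N_q) for diagonal Gaussians; inputs broadcast over (..., action_dim)."""
    var_p, var_q = (2 * logstd_p).exp(), (2 * logstd_q).exp()
    kl = logstd_q - logstd_p + (var_p + (mu_p - mu_q) ** 2) / (2 * var_q) - 0.5
    return kl.sum(dim=-1)

def variational_gmm_entropy(weights, mus, logstds):
    """
    weights: (K,) mixture weights; mus, logstds: (K, action_dim).
    Returns a differentiable scalar approximating the mixture entropy,
    usable as an exploration bonus added to a deterministic actor loss.
    """
    K, D = mus.shape
    comp_entropy = 0.5 * D * (1.0 + math.log(2 * math.pi)) + logstds.sum(dim=-1)  # (K,)
    # Pairwise KL(component_i || component_j), shape (K, K).
    kl = gaussian_kl_diag(
        mus.unsqueeze(1), logstds.unsqueeze(1),
        mus.unsqueeze(0), logstds.unsqueeze(0),
    )
    inner = torch.logsumexp(weights.log().unsqueeze(0) - kl, dim=1)  # (K,)
    return (weights * (comp_entropy - inner)).sum()

# Example: 3-component mixture over a 2-D action space.
w = torch.softmax(torch.randn(3), dim=0)
mu = torch.randn(3, 2, requires_grad=True)
logstd = torch.zeros(3, 2, requires_grad=True)
bonus = variational_gmm_entropy(w, mu, logstd)
bonus.backward()  # gradients flow to the component parameters
```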

To address these practical challenges, I next present a deterministic actor-ensemble algorithm designed to retain the benefits of deterministic gradients while mitigating the tuning burden. The algorithm relies on two key mechanisms: (1) dual-randomized actor selection, which preserves diversity without requiring explicit regularization, and (2) adaptive pruning, which automatically removes redundant or low-performing actors using critic-guided value and action-similarity criteria.
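To give a concrete flavor of critic-guided pruning, the following sketch drops an actor when its critic-evaluated value lags the best actor by a margin, or when its actions nearly duplicate those of a better actor that is already kept. This is an illustrative assumption of how such criteria could be combined, not the algorithm presented in the talk; the thresholds, network interfaces, and batch are hypothetical.

```python
# Minimal, assumed sketch of critic-guided ensemble pruning (not the speaker's algorithm).
import torch

def prune_actors(actors, critic, states, value_margin=0.1, action_sim_tol=0.05):
    """
    actors: list of deterministic policies mapping states (B, S) -> actions (B, A).
    critic: Q-network mapping (states, actions) -> values of shape (B,) or (B, 1).
    Returns the sorted indices of the actors to keep.
    """
    with torch.no_grad():
        actions = [pi(states) for pi in actors]
        values = torch.stack([critic(states, a).mean() for a in actions])  # (N,)
    best = values.max()
    keep = []
    # Visit actors from highest to lowest critic-estimated value.
    for i in sorted(range(len(actors)), key=lambda i: -values[i].item()):
        # Value criterion: drop actors whose estimated value lags the best too far.
        if values[i] < best - value_margin * best.abs():
            continue
        # Action-similarity criterion: drop near-duplicates of an already kept actor.
        redundant = any(
            (actions[i] - actions[j]).abs().mean() < action_sim_tol for j in keep
        )
        if not redundant:
            keep.append(i)
    return sorted(keep)
```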

Together, these mechanisms offer a practical, tuning-light framework for deterministic policy gradient optimization. The deterministic actor-ensemble algorithm achieves competitive or superior performance across dense-reward MuJoCo tasks and strong results on sparse-reward Fetch benchmarks relative to deterministic baselines, while also improving computational efficiency over fixed-size ensembles. This talk demonstrates that deterministic gradients—when combined with principled entropy design and practical ensemble management—constitute a powerful and underappreciated alternative to today’s soft gradient methods.

Speaker Bio: Dr. Guni Sharon is an Associate Professor in the Department of Computer Science and Engineering at Texas A&M University. He earned his B.Sc., M.Sc., and Ph.D. from Ben-Gurion University and subsequently served as a Postdoctoral Fellow at the University of Texas at Austin.
His research focuses on Reinforcement Learning, Multi-Agent Systems, Algorithmic Game Theory, and Combinatorial Search, with an emphasis on deploying AI-driven methods in safety-critical transportation and infrastructure systems. His work bridges foundational algorithmic research with real-world impact in intelligent transportation and autonomous systems. Dr. Sharon’s contributions have been recognized with several prestigious honors, including the NSF CAREER Award (2023), the Bergmann Memorial Research Award (2024), and the Dean of Engineering Excellence Award (2024). He has also received multiple Best Paper Awards at leading AI venues, including the Journal of Artificial Intelligence Research, the AAAI Conference on Artificial Intelligence, and the International Symposium on Combinatorial Search. His research has been featured in invited talks at major international conferences and covered by both national and international media.