Keywords: Reinforcement Learning, Markov Decision Processes, Contextual Bandits, Learning Theory, Federated Learning, Multi-agent, General Function Approximation, Machin Learning
TL;DR: We propose provably efficient algorithms for multi-agent reinforcement learning with general function approximation, where agents cooperate through asynchronous communications with a central server to learn a shared environment.
Abstract: We study multi-agent reinforcement learning (RL) where agents cooperate through asynchronous communications with a central server to learn a shared environment. Our first focus is on the case of multi-agent contextual bandits with general function approximation, for which we introduce the Async-NLin-UCB algorithm. This algorithm is proven to achieve a regret of $\tilde{O} (\sqrt{T \dim_E(\mathcal{F}) \log N(\mathcal{F})})$ and a communication complexity of $\tilde{O} (M^2 \dim_E(\mathcal{F}))$, where $M$ is the total number of agents and $T$ is the number of rounds, while $\dim_E(\mathcal{F})$ and $N(\mathcal{F})$ are the Eluder dimension and the covering number of function space $\mathcal{F}$ respectively. We then progress to the more intricate framework of multi-agent RL with general function approximation, and present the Async-NLSVI-UCB algorithm. This algorithm enjoys a regret of $\tilde{O} (H^2 \sqrt{K \dim_E(\mathcal{F}) \log N(\mathcal{F})})$ and a communication complexity of $\tilde{O} (H M^2 \dim_E(\mathcal{F}))$, where $H$ is the horizon length and $K$ the number of episodes. Our findings showcase the provable efficiency of both algorithms in fostering collaborative learning within nonlinear environments, and they achieve this with minimal communication overhead.
Primary Area: Learning theory
Submission Number: 20701
Loading