Asynchronous Multi-Agent Reinforcement Learning with General Function Approximation

15 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: Reinforcement Learning, Markov Decision Processes, Contextual Bandits, Learning Theory, Federated Learning, Multi-agent, General Function Approximation, Machine Learning
TL;DR: We propose provably efficient algorithms for multi-agent reinforcement learning with general function approximation, where agents cooperate through asynchronous communications with a central server to learn a shared environment.
Abstract: We study multi-agent reinforcement learning (RL) in which agents cooperate through asynchronous communication with a central server to learn a shared environment. Our first focus is the case of multi-agent contextual bandits with general function approximation, for which we introduce the Async-NLin-UCB algorithm. This algorithm provably achieves a regret of $\tilde{O} (\sqrt{T \dim_E(\mathcal{F}) \log N(\mathcal{F})})$ and a communication complexity of $\tilde{O} (M^2 \dim_E(\mathcal{F}))$, where $M$ is the total number of agents, $T$ is the number of rounds, and $\dim_E(\mathcal{F})$ and $N(\mathcal{F})$ are the Eluder dimension and the covering number of the function class $\mathcal{F}$, respectively. We then progress to the more intricate setting of multi-agent RL with general function approximation and present the Async-NLSVI-UCB algorithm. This algorithm enjoys a regret of $\tilde{O} (H^2 \sqrt{K \dim_E(\mathcal{F}) \log N(\mathcal{F})})$ and a communication complexity of $\tilde{O} (H M^2 \dim_E(\mathcal{F}))$, where $H$ is the horizon length and $K$ is the number of episodes. Our results show that both algorithms provably enable collaborative learning in nonlinear environments while incurring only minimal communication overhead.
Primary Area: Learning theory
Submission Number: 20701