DeepRL
This project aims to achieve distributed and highly available reinforcement learning.
Communication Learning in Multi-Agent Cooperation
To address the significant waste of communication resources caused by centralized training in multi-agent cooperation, this research introduces NOAC, a training method for distributed communication learning [1].
By incorporating the concept of neighborhood consistency, we design an encoding network that allows each device to train using only data from its neighborhood, effectively reducing the amount of transmitted data.
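A minimal sketch of the neighborhood-restricted data flow described above. The function name, data layout, and adjacency representation are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def neighborhood_batch(observations, adjacency, agent):
    """Build one agent's local training input from its neighborhood only.

    observations: dict mapping agent id -> observation vector (np.ndarray)
    adjacency:    dict mapping agent id -> set of direct-neighbor ids
    agent:        the agent whose local batch we assemble
    """
    # The agent always uses its own observation ...
    local = [observations[agent]]
    # ... plus the observations (messages) of its direct neighbors only,
    # so no global state ever needs to be transmitted to a central server.
    for nbr in sorted(adjacency[agent]):
        local.append(observations[nbr])
    return np.stack(local)

# Illustrative topology: agent 1 sits between agents 0 and 2.
obs = {0: np.zeros(4), 1: np.ones(4), 2: np.full(4, 2.0)}
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(neighborhood_batch(obs, adj, 1).shape)  # agent 1 sees itself + 2 neighbors
```

In this sketch, the per-agent batch size scales with neighborhood degree rather than with the total number of agents, which is the source of the transmission savings claimed above.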
Furthermore, to address the non-stationarity of training in multi-agent reinforcement learning, the research leverages the game-theoretic concept of best response to design a pseudo-pre-action mechanism. By incorporating pseudo-pre-actions into communication, NOAC can deploy model training across multiple edge servers for distributed training. NOAC achieves performance comparable to centralized training while substantially reducing data transmission during the training process.
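The best-response idea behind the pseudo-pre-action mechanism can be sketched as follows: each agent announces the action it intends to take, and neighbors update against those announced actions as if they were fixed, which removes one source of non-stationarity. The tabular Q-value representation and the function name here are simplifying assumptions for illustration only:

```python
def best_response_with_pseudo_actions(q_table, my_state, neighbor_pseudo_actions):
    """Choose this agent's action as a best response to the pseudo-pre-actions
    its neighbors communicated, treating those actions as fixed for this update.

    q_table: dict mapping (state, my_action, neighbor_action_tuple) -> value
    """
    fixed_neighbors = tuple(neighbor_pseudo_actions)
    # Enumerate only this agent's own actions; neighbors' actions are pinned
    # to the pseudo-pre-actions they announced, so the target is stationary
    # from this agent's point of view.
    candidates = {a: q for (s, a, nbrs), q in q_table.items()
                  if s == my_state and nbrs == fixed_neighbors}
    return max(candidates, key=candidates.get)

# Illustrative table: in state "s", given the neighbor's announced action 1,
# the agent's own action 1 has the highest value.
q = {("s", 0, (1,)): 0.2,
     ("s", 1, (1,)): 0.9,
     ("s", 0, (0,)): 1.0}
print(best_response_with_pseudo_actions(q, "s", [1]))  # prints 1
```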
Offline-to-Online Reinforcement Learning
To-do:
- Optimistic in-sample learning based on state guidance.
- Alleviating the discrepancy between offline RL and online RL.
Reference
[1] Dai, H., Wu, J., Brinkmann, A., & Wang, Y. (2023, September). Neighborhood-Oriented Decentralized Learning Communication in Multi-Agent System. In International Conference on Artificial Neural Networks (pp. 490-502). Cham: Springer Nature Switzerland.