B.Eng. Thesis · February 2026
Scaling MARL-CPC: Achieving Decentralized Coordination in Multi-Agent Environments
Symbol Emergence Systems Lab, Kyoto University · Advisor: Prof. Tadahiro Taniguchi
MARL-CPC is a variational take on Collective Predictive Coding that gives multi-agent reinforcement-learning agents a reward-independent reason to communicate: each agent learns to send messages that let the group mutually predict one another’s observations, so communication emerges even in non-cooperative settings.
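The sketch below shows roughly what a variational, reward-independent communication objective can look like. It computes an ELBO-style loss for one shared message. Several parts are assumptions and do not describe the thesis's actual architecture: the diagonal Gaussian message posterior, the standard normal prior, the squared-error (Gaussian) reconstruction, and the hypothetical linear decoders `decode`/`decoders`. Mutual prediction is modeled loosely here: every agent tries to reconstruct its own observation from the same message.

```python
import math
from typing import List, Sequence


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over message dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def decode(message: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Hypothetical linear decoder: predicted observation = W @ message."""
    return [sum(w * z for w, z in zip(row, message)) for row in weights]


def cpc_loss(
    message: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
    observations: Sequence[Sequence[float]],
    decoders: Sequence[List[List[float]]],
    beta: float = 1.0,
) -> float:
    """ELBO-style CPC loss (assumed form): reconstruction error of every agent's
    observation from the shared message, plus a beta-weighted KL regularizer."""
    recon = 0.0
    for obs, weights in zip(observations, decoders):
        pred = decode(message, weights)
        # Squared error equals a Gaussian negative log-likelihood up to constants.
        recon += sum((o - p) ** 2 for o, p in zip(obs, pred))
    return recon + beta * kl_to_standard_normal(mu, logvar)
```

Passing the posterior mean as `message` gives a deterministic check. During training, one would normally sample the message with the reparameterization trick so that gradients flow into `mu` and `logvar`.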
Prior work showed this works for two agents exchanging a single message. My thesis extends the framework to multi-round message passing and investigates how it scales to 3–5 agents. It compares two CPC loss strategies: Final Round, which applies the CPC loss only to the last round of messages, and Every Round, which accumulates the CPC loss across all rounds.
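The two strategies differ only in how per-round CPC losses are combined. Here is a minimal sketch, assuming each round produces one scalar loss. The message-update rule and the "disagreement" round loss in the usage below are hypothetical toy stand-ins and are not the thesis's model.

```python
from typing import Callable, List, Sequence


def aggregate_cpc_loss(per_round_losses: Sequence[float], strategy: str) -> float:
    """Combine per-round CPC losses.

    "final": Final Round, only the last round's loss is kept.
    "every": Every Round, losses from all rounds are summed.
    """
    if not per_round_losses:
        raise ValueError("at least one message-passing round is required")
    if strategy == "final":
        return per_round_losses[-1]
    if strategy == "every":
        return sum(per_round_losses)
    raise ValueError(f"unknown strategy: {strategy!r}")


def run_message_passing(
    messages: List[float],
    update: Callable[[List[float]], List[float]],
    round_loss: Callable[[List[float]], float],
    num_rounds: int,
) -> List[float]:
    """Run num_rounds of message passing and record a loss after each round.

    `update` and `round_loss` are hypothetical stand-ins for the agents'
    message update and the per-round CPC loss.
    """
    per_round = []
    for _ in range(num_rounds):
        messages = update(messages)
        per_round.append(round_loss(messages))
    return per_round


def toy_update(msgs: List[float]) -> List[float]:
    """Toy rule: each agent moves its message halfway toward the group mean."""
    mean = sum(msgs) / len(msgs)
    return [m + 0.5 * (mean - m) for m in msgs]


def toy_round_loss(msgs: List[float]) -> float:
    """Toy loss: mean squared deviation of messages from the group mean."""
    mean = sum(msgs) / len(msgs)
    return sum((m - mean) ** 2 for m in msgs) / len(msgs)


losses = run_message_passing([0.0, 4.0, 8.0], toy_update, toy_round_loss, num_rounds=3)
final_loss = aggregate_cpc_loss(losses, "final")
every_loss = aggregate_cpc_loss(losses, "every")
```

In an autograd framework such as PyTorch, summing the round losses gives the parameters used in earlier rounds a direct gradient from their own round's loss. Under Final Round, those parameters receive only the signal that is backpropagated from the last round. This is one way to read "dense per-round learning signal."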
On a non-cooperative Bandit coordination task, Every Round's advantage over both Final Round and prior work grew substantially as the number of agents increased. This is evidence that a dense, per-round learning signal is essential for scaling CPC-based emergent communication. The full training and evaluation pipeline was implemented in PyTorch.