Despite the impact of DEC-MDPs over the past decade, scaling to large problem domains has been difficult to achieve.
The scale-up problem is exacerbated in DEC-MDPs with
continuous states, which are critical in domains involving
time; the latest algorithm (M-DPFP) does not scale-up beyond two agents and a handful of unordered tasks per agent.
This paper is focused on meeting this challenge in continuous resource DEC-MDPs with two predominant contributions. First, it introduces a novel continuous time model for
multi-agent planning problems that exploits transition independence in domains with graphical agent dependencies
and temporal constraints. More importantly, it presents a
new, iterative, locally optimal algorithm called SPAC that
is a combination of the following key ideas: (1) defining
a novel augmented CT-MDP such that solving this singleagent continuous time MDP provably provides an automatic
best response to neighboring agents’ policies; (2) fast convolution to efficiently generate such augmented MDPs; (3)
new enhanced lazy approximation algorithm to solve these
augmented MDPs; (4) intelligent seeding of initial policies
in the iterative process; (5) exploiting graph structure of
reward dependencies to exploit local interactions for scalability. Our experiments show SPAC not only finds solutions
substantially faster than M-DPFP with comparable quality,
but also scales well to large teams of agents.