Teamwork in Distributed POMDPs: Execution-time Coordination Under Model Uncertainty

Citation:

Jun-young Kwak, Rong Yang, Zhengyu Yin, Matthew E. Taylor, and Milind Tambe. 2011. “Teamwork in Distributed POMDPs: Execution-time Coordination Under Model Uncertainty .” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) .

Abstract:

Despite their NEXP-complete policy generation complexity [1], Distributed Partially Observable Markov Decision Problems (DEC-POMDPs) have become a popular paradigm for multiagent teamwork [2, 6, 8]. DEC-POMDPs are able to quantitatively express observational and action uncertainty, and yet optimally plan communications and domain actions. This paper focuses on teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) in DEC-POMDPs. In many domains, we only have an approximate model of agent observation or transition functions. To address this challenge we rely on execution-centric frameworks [7, 11, 12], which simplify planning in DEC-POMDPs (e.g., by assuming costfree communication at plan-time), and shift coordination reasoning to execution time. Specifically, during planning, these frameworks have a standard single-agent POMDP planner [4] to plan a policy for the team of agents by assuming zero-cost communication. Then, at execution-time, agents model other agents’ beliefs and actions, reason about when to communicate with teammates, reason about what action to take if not communicating, etc. Unfortunately, past work in execution-centric approaches [7, 11, 12] also assumes a correct world model, and the presence of model uncertainty exposes key weaknesses that result in erroneous plans and additional inefficiency due to reasoning over incorrect world models at every decision epoch. This paper provides two sets of contributions. The first is a new execution-centric framework for DEC-POMDPs called MODERN (MOdel uncertainty in Dec-pomdp Execution-time ReasoNing). MODERN is the first execution-centric framework for DECPOMDPs explicitly motivated by model uncertainty. It is based on three key ideas: (i) it maintains an exponentially smaller model of other agents’ beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, thus limiting communication to key trigger points; and (iii) it simplifies its decision-theoretic reasoning about communication over the pruned model and uses a systematic markup, encouraging extra communication and reducing uncertainty among team members at trigger points. This paper’s second set of contributions are in opening up model uncertainty as a new research direction for DEC-POMDPs and emphasizing the similarity of this problem to the Belief-DesireIntention (BDI) model for teamwork [5, 9]. In particular, BDI teamwork models also assume inaccurate mapping between realworld problems and domain models. As a result, they emphasize robustness via execution-time reasoning about coordination [9]. Given some of the successes of prior BDI research in teamwork, we leverage insights from BDI in designing MODERN.
See also: 2011