Despite their NEXP-complete policy generation complexity ,
Distributed Partially Observable Markov Decision Problems
(DEC-POMDPs) have become a popular paradigm for multiagent
teamwork [2, 6, 8]. DEC-POMDPs are able to quantitatively express observational and action uncertainty, and yet optimally plan
communications and domain actions.
This paper focuses on teamwork under model uncertainty (i.e.,
potentially inaccurate transition and observation functions) in
DEC-POMDPs. In many domains, we only have an approximate
model of agent observation or transition functions. To address this
challenge we rely on execution-centric frameworks [7, 11, 12],
which simplify planning in DEC-POMDPs (e.g., by assuming costfree communication at plan-time), and shift coordination reasoning
to execution time. Specifically, during planning, these frameworks
have a standard single-agent POMDP planner  to plan a policy for the team of agents by assuming zero-cost communication.
Then, at execution-time, agents model other agents’ beliefs and actions, reason about when to communicate with teammates, reason
about what action to take if not communicating, etc. Unfortunately,
past work in execution-centric approaches [7, 11, 12] also assumes
a correct world model, and the presence of model uncertainty exposes key weaknesses that result in erroneous plans and additional
inefficiency due to reasoning over incorrect world models at every
This paper provides two sets of contributions. The first is a
new execution-centric framework for DEC-POMDPs called MODERN (MOdel uncertainty in Dec-pomdp Execution-time ReasoNing). MODERN is the first execution-centric framework for DECPOMDPs explicitly motivated by model uncertainty. It is based on three key ideas: (i) it maintains an exponentially smaller model of
other agents’ beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model
via bounded pruning; (ii) it reduces execution-time computation by
exploiting BDI theories of teamwork, thus limiting communication
to key trigger points; and (iii) it simplifies its decision-theoretic
reasoning about communication over the pruned model and uses a
systematic markup, encouraging extra communication and reducing uncertainty among team members at trigger points.
This paper’s second set of contributions are in opening up model
uncertainty as a new research direction for DEC-POMDPs and
emphasizing the similarity of this problem to the Belief-DesireIntention (BDI) model for teamwork [5, 9]. In particular, BDI
teamwork models also assume inaccurate mapping between realworld problems and domain models. As a result, they emphasize
robustness via execution-time reasoning about coordination .
Given some of the successes of prior BDI research in teamwork,
we leverage insights from BDI in designing MODERN.
Stackelberg games have recently gained significant attention
for resource allocation decisions in security settings. One
critical assumption of traditional Stackelberg models is that
all players are perfectly rational and that the followers perfectly observe the leader’s strategy. However, in real-world
security settings, security agencies must deal with human adversaries who may not always follow the utility maximizing
rational strategy. Accounting for these likely deviations is
important since they may adversely affect the leader’s (security agency’s) utility. In fact, a number of behavioral gametheoretic models have begun to emerge for these domains.
Two such models in particular are COBRA (Combined Observability and Bounded Rationality Assumption) and BRQR
(Best Response to Quantal Response), which have both been
shown to outperform game-theoretic optimal models against
human adversaries within a security setting based on Los Angeles International Airport (LAX). Under perfect observation
conditions, BRQR has been shown to be the leading contender for addressing human adversaries. In this work we
explore these models under limited observation conditions.
Due to human anchoring biases, BRQR’s performance may
suffer under limited observation conditions. An anchoring
bias is when, given no information about the occurrence of
a discrete set of events, humans will tend to assign an equal
weight to the occurrence of each event (a uniform distribution). This study makes three main contributions: (i) we
incorporate an anchoring bias into BRQR to improve performance under limited observation; (ii) we explore finding
appropriate parameter settings for BRQR under limited observation; (iii) we compare BRQR’s performance versus COBRA under limited observation conditions.
Multi-agent autonomous reasoning systems have emerged as a promising planning technique for addressing satellite defense problems. The main challenge is to extend and scale up the capabilities of current and
emerging reasoning and planning methods to handle the characteristics of the satellite defense problem. This
paper focuses on some key critical research issues that need to be addressed in order to perform automated
planning and execution fitted to the specific nature of response to ASAT attacks, and provides MAARS, a new
autonomous reasoning framework for satellite defense. As the core of MAARS, we present MODERN, a new
execution-centric method for DEC-POMDPs explicitly motivated by model uncertainty. There are two key
innovative features in MODERN: (i) it maintains an exponentially smaller model of other agents’ beliefs and
actions than in previous work and then further reduces the computation-time and space expense of this model
via bounded pruning; and (ii) it reduces execution-time computation by exploiting BDI theories of teamwork,
and limits communication reasoning to key trigger points. We demonstrate a proof of concept of MAARS in
the simplified ASAT mitigation scenario. We then show initial evaluation results of MAARS in ASAT domains
that are critical in advancing the state-of-the-art in providing autonomous reasoning to delve into unperceived
models as well as deal with exponential explosion of the computational complexity of current algorithms.
Despite their worst-case NEXP-complete planning
complexity, DEC-POMDPs remain a popular framework for
multiagent teamwork. This paper introduces effective teamwork under model uncertainty (i.e., potentially inaccurate
transition and observation functions) as a novel challenge for
DEC-POMDPs and presents MODERN, the first executioncentric framework for DEC-POMDPs explicitly motivated by
addressing such model uncertainty. MODERN’s shift of coordination reasoning from planning-time to execution-time avoids
the high cost of computing optimal plans whose promised
quality may not be realized in practice. There are three
key ideas in MODERN: (i) it maintains an exponentially
smaller model of other agents’ beliefs and actions than in
previous work and then further reduces the computationtime and space expense of this model via bounded pruning;
(ii) it reduces execution-time computation by exploiting BDI
theories of teamwork, and limits communication to key trigger
points; and (iii) it limits its decision-theoretic reasoning about
communication to trigger points and uses a systematic markup
to encourage extra communication at these points — thus
reducing uncertainty among team members at trigger points.
We empirically show that MODERN is substantially faster
than existing DEC-POMDP execution-centric methods while
achieving significantly higher reward.
This paper addresses the problem of detecting suspicious behavior from a collection of individuals events,
where no single event is enough to decide whether
his/her behavior is suspicious, but the combination
of multiple events enables reasoning. We establish a
Bayesian framework for evaluating multiple events and
show that the current approaches lack modeling behavior history included in the estimation whether a trace of
events is generated by a suspicious agent. We propose
a heuristic for evaluating events according to the behavior of the agent in the past. The proposed approach,
tested on an airport domain, outperforms the current approaches.
Recent years have seen a rise of interest in the deployment of multiagent systems in energy domains that inherently have uncertain
and dynamic environments with limited resources. In such domains, the key challenge is to minimize the energy consumption
while satisfying the comfort level of occupants in the buildings under uncertainty (regarding agent negotiation actions). As human
agents begin to interact with complex building systems as a collaborative team, it becomes crucial that the resulting multiagent
teams reason about coordination under such uncertainty to optimize
multiple metrics, which have not been systematically considered in
previous literature. This paper presents a novel multiagent system
based on distributed coordination reasoning under uncertainty for
sustainability called SAVES. There are three key ideas in SAVES:
(i) it explicitly considers uncertainty while reasoning about coordination in a distributed manner relying on MDPs; (ii) human behaviors and their occupancy preferences are incorporated into planning
and modeled as part of the system; and (iii) the influence of various
control strategies for multiagent teams is evaluated on an existing
university building as the practical research testbed with actual energy consumption data. We empirically show the preliminary results that our intelligent control strategies substantially reduce the
overall energy consumption in the actual simulation testbed compared to the existing control means while achieving comparable
average satisfaction level of occupants.
The primary consumers of building energy are heating, cooling, ventilation, and lighting systems, which
maintain occupant comfort, and electronics and appliances that enable occupant functionality. The optimization of
building energy is therefore a complex problem highly dependent on unique building and environmental conditions as well
as on time dependent operational factors. To provide computational support for this optimization, this paper presents and
implements a multi-agent comfort and energy simulation (MACES) to model alternative management and control of
building systems and occupants. Human and device agents are used to explore current trends in energy consumption and
management of a university test bed building. Reactive and predictive control strategies are then imposed on device agents
in an attempt to reduce building energy consumption while maintaining occupant comfort. Finally, occupant agents are
motivated by simulation feedback to accept more energy conscious scheduling through multi-agent negotiations. Initial
results of the MACES demonstrate potential energy savings of 17% while maintaining a high level of occupant comfort.
This work is intended to demonstrate a simulation tool, which is implementable in the actual test bed site and compatible
with real-world input to instigate and motivate more energy conscious control and occupant behaviors.
This paper discusses some of the recent cooperative multiagent systems work in the TEAMCORE lab at the University of Southern California.
Based in part on an invited talk at the CARE 2010 workshop, we highlight
how and why execution-time reasoning has been supplementing, or replacing,
planning-time reasoning in such systems.
The last five years have witnessed the successful application of game theory in reasoning about complex security problems [Basilico et al. 2009; Korzhyk et al. 2010; Dickerson et al. 2010; Jakob et al. 2010; Paruchuri et al. 2008; Pita et al. 2009; Pita et al. 2010; Kiekintveld et al. 2009; Jain et al. 2010]. Stackelberg games have been widely used to model patrolling or monitoring problems in security. In a Stackelberg security game, the defender commits to a strategy and the adversary makes its decision with knowledge of the leader’s commitment. Two systems applying Stackelberg game models to assist with randomized resource allocation decisions are currently in use by the Los Angeles International Airport (LAX) [Pita et al. 2008] and the Federal Air Marshals Service (FAMS) [Tsai et al. 2009]. Two new applications called GUARDS (Game-theoretic Unpredictable and Randomly Deployed Security) [Pita et al. 2011] and PROTECT (Port Resilience Operational / Tactical Enforcement to Combat Terrorism) are under development for the Transportation Security Administration (TSA) and the United States Coast Guard respectively. Both are based on Stackelberg games. In contrast with previous applications at LAX and FAMS, which focused on one-off tailored applications and one security activity (e.g., canine patrol, checkpoints, or covering flights) per application, both GUARDS and PROTECT face new challenging issues due to the potential large scale deployment. This includes reasoning about hundreds of heterogeneous security activities, reasoning over diverse potential threats, and developing a system designed for hundreds of end-users. In this article we will highlight several of the main issues that have arisen. We begin with an overview of the new applications and then discuss these issues in turn.
Entry control is an important security measure that prevents undesired persons from entering secure areas.
The advanced risk analysis presented in this paper makes it possible to distinguish between acceptable and
unacceptable entries, based on several entry sensors, such as fingerprint readers, and intelligent methods that
learn behavior from previous entries. We have extended the intelligent layer in two ways: first, by adding
a meta-learning layer that combines the output of specific intelligent modules, and second, by constructing
a Bayesian network to integrate the predictions of the learning and meta-learning modules. The obtained
results represent an important improvement in detecting security attacks.
Distributed Constraint Optimization (DCOP) is a popular framework for cooperative multi-agent decision making. DCOP is NPhard, so an important line of work focuses on developing fast incomplete solution algorithms for large-scale applications. One of
the few incomplete algorithms to provide bounds on solution quality is k-size optimality, which defines a local optimality criterion
based on the size of the group of deviating agents. Unfortunately,
the lack of a general-purpose algorithm and the commitment to
forming groups based solely on group size has limited the use of
This paper introduces t-distance optimality which departs from
k-size optimality by using graph distance as an alternative criteria
for selecting groups of deviating agents. This throws open a new research direction into the tradeoffs between different group selection
and coordination mechanisms for incomplete DCOP algorithms.
We derive theoretical quality bounds for t-distance optimality that
improve known bounds for k-size optimality. In addition, we develop a new efficient asynchronous local search algorithm for finding both k-size and t-distance optimal solutions — allowing these
concepts to be deployed in real applications. Indeed, empirical results show that this algorithm significantly outperforms the only existing algorithm for finding general k-size optimal solutions, which
is also synchronous. Finally, we compare the algorithmic performance of k-size and t-distance optimality using this algorithm. We
find that t-distance consistently converges to higher-quality solutions in the long run, but results are mixed on convergence speed;
we identify cases where k-size and t-distance converge faster.
Distributed constraint optimization (DCOP) is a useful framework for
cooperative multiagent coordination. DCOP focuses on optimizing a single team objective. However, in many domains, agents must satisfy constraints on resources consumed locally while optimizing the team goal.
Yet, these resource constraints may need to be kept private. Designing
DCOP algorithms for these domains requires managing complex trade-offs
in completeness, scalability, privacy and efficiency.
This article defines the multiply-constrained DCOP (MC-DCOP) framework and provides complete (globally optimal) and incomplete (locally optimal) algorithms for solving MC-DCOP problems. Complete algorithms find the best allocation of scarce resources while optimizing the team objective, while incomplete algorithms are more scalable. The algorithms use
four main techniques: (i) transforming constraints to maintain privacy;
(ii) dynamically setting upper bounds on resource consumption; (iii) identifying the extent to which the local graph structure allows agents to compute exact bounds; and (iv) using a virtual assignment to flag problems
rendered unsatisfiable by resource constraints. Proofs of correctness are
presented for all algorithms. Experimental results illustrate the strengths
and weaknesses of both the complete and incomplete algorithms.
A growing number of security applications are being developed and deployed to explicitly reduce risk from adversaries’ actions. However, there are many challenges when attempting to
evaluate such systems, both in the lab and in the real world. Traditional evaluations used by
computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant.
The primary contribution of this paper is to provide a preliminary framework which can guide
the evaluation of such systems and to apply the framework to the evaluation of ARMOR (a system deployed at LAX since August 2007). This framework helps to determine what evaluations
could, and should, be run in order to measure a system’s overall utility. A secondary contribution of this paper is to help familiarize our community with some of the difficulties inherent in
evaluating deployed applications, focusing on those in security domains.
Law enforcement agencies frequently must allocate limited resources
to protect targets embedded in a network, such as important buildings in a city road network. Since intelligent attackers may observe
and exploit patterns in the allocation, it is crucial that the allocations be randomized. We cast this problem as an attacker-defender
Stackelberg game: the defender’s goal is to obtain an optimal mixed
strategy for allocating resources. The defender’s strategy space is
exponential in the number of resources, and the attacker’s exponential in the network size. Existing algorithms are therefore useless
for all but the smallest networks.
We present a solution approach based on two key ideas: (i) a
polynomial-sized game model obtained via an approximation of
the strategy space, solved efficiently using a linear program; (ii)
two efficient techniques that map solutions from the approximate
game to the original, with proofs of correctness under certain assumptions. We present in-depth experimental results, including an
evaluation on part of the Mumbai road network.
Protecting targets against potential attacks is an important problem for security forces worldwide. The general setting we study
is as follows: An attacker assigns different values to reaching (and
damaging or destroying) one of multiple targets. A defender is able
to allocate resources (such as patrol cars or canine units) to capture
the attacker before he reaches a target. In many of these situations, the domain has structure that is naturally modeled as a graph.
For example, city maps can be modeled with intersections as nodes
and roads as edges, where nodes are targets for attackers. In order to prevent attacks, security forces can schedule checkpoints on
edges (e.g., roads) to detect intruders. For instance, in response to
the devastating terrorist attacks in 2008 , Mumbai police deploy
randomized checkpoints as one countermeasure to prevent future
attacks . The strategy for placing these checkpoints must necessarily be decided in advance of attack attempts, should account for
targets of differing importance, and should anticipate an intelligent
adversary who can observe the strategy prior to attacking.
In light of these requirements, game-theoretic approaches have
been developed to assist in generating randomized security strategies in several real-world domains, including applications in use
by the Los Angeles International Airport  and the Federal Air
Marshals Service . To account for the attacker’s ability to observe deployment patterns, these methods model the problem as a
Stackelberg game and solve for an optimal probability distribution
over the possible deployments to ensure unpredictability. Novel solvers for classes of security games have recently been
developed [3, 11, 4]. However, these solvers take time at least
polynomial in the number of actions of both players. In our setting,
every path from an entry point to a target is an attacker action, and
every set of r or fewer edges is a defender action. (r is the maximum number of checkpoints.) Since the attacker’s actions grow
exponentially with the size of the network, and the defender’s actions grow exponentially with r, existing methods quickly become
too slow when applied to large real-world domains. Therefore, our
goal is to develop faster methods for these settings and evaluate
them theoretically and empirically.
The Networked Distributed POMDPs (ND-POMDPs) can model multiagent systems in uncertain domains and has
begun to scale-up the number of agents. However, prior work in ND-POMDPs has failed to address communication. Without
communication, the size of a local policy at each agent within the ND-POMDPs grows exponentially in the time horizon. To
overcome this problem, we extend existing algorithms so that agents periodically communicate their observation and action
histories with each other. After communication, agents can start from new synchronized belief state. Thus, we can avoid the
exponential growth in the size of local policies at agents. Furthermore, we introduce an idea that is similar to the Point-based
Value Iteration algorithm to approximate the value function with a fixed number of representative points. Our experimental
results show that we can obtain much longer policies than existing algorithms as long as the interval between communications is
The field of ―intelligent agents and multiagent systems‖ is maturing; no longer is it a special topic
to be introduced to graduate students after years of training in computer science and many
introductory courses in artificial intelligence. Instead, the time is ripe to introduce agents and
multiagents directly to undergraduate students, whether majoring in computer science or not. This
chapter focuses on exactly this challenge, drawing on the co-authors‘ experience of teaching
several such undergraduate courses on agents and multiagents, over the last three years at two
different universities. The chapter outlines three key issues that must be addressed. The first issue
is facilitating students‘ intuitive understanding of fundamental concepts of multiagent systems;
we illustrate uses of science fiction materials and classroom games to not only provide students
with the necessary intuitive understanding but with the excitement and motivation for studying
multiagent systems. The second is in selecting the right material — either science-fiction material
or games — for providing students the necessary motivation and intuition; we outline several
criteria that have been useful in selecting such material. The third issue is in educating students
about the fundamental philosophical, ethical and social issues surrounding agents and multiagent
systems: we outline course materials and classroom activities that allow students to obtain this
―big picture‖ futuristic vision of our science. We conclude with feedback received, lessons
learned and impact on both the computer science students and non computer-science students.
Recently there has been significant interest in applications of gametheoretic analysis to analyze security resource allocation decisions. Two examples
of deployed systems based on this line of research are the ARMOR system in use
at the Los Angeles International Airport , and the IRIS system used by the
Federal Air Marshals Service . Game analysis always begins by developing
a model of the domain, often based on inputs from domain experts or historical
data. These models inevitably contain significant uncertainty—especially in security domains where intelligence about adversary capabilities and preferences is
very difficult to gather. In this work we focus on developing new models and algorithms that capture this uncertainty using continuous payoff distributions. These
models are richer and more powerful than previous approaches that are limited to
small finite Bayesian game models. We present the first algorithms for approximating equilibrium solutions in these games, and study these algorithms empirically. Our results show dramatic improvements over existing techniques, even in
cases where there is very limited uncertainty about an adversaries’ payoffs.
Algorithms to solve security games, an important class of Stackelberg games, have seen successful real-world deployment by LAX
police and the Federal Air Marshal Service. These algorithms provide randomized schedules to optimally allocate limited security
resources for infrastructure protection. Unfortunately, these stateof-the-art algorithms fail to scale-up or to provide a correct solution for massive security games with arbitrary scheduling constraints. This paper provides ASPEN, a branch-and-price algorithm to overcome this limitation based on two key contributions:
(i) A column-generation approach that exploits an innovative compact network flow representation, avoiding a combinatorial explosion of schedule allocations; (ii) A branch-and-bound approach
with novel upper-bound generation via a fast algorithm for solving under-constrained security games. ASPEN is the first known
method for efficiently solving real-world-sized security games with
arbitrary schedules. This work contributes to a very new area of
work that applies techniques used in large-scale optimization to
game-theoretic problems—an exciting new avenue with the potential to greatly expand the reach of game theory.
Game theoretic methods for making resource allocation decision
in security domains have attracted growing attention from both researchers and security practitioners, including deployed applications at both the LAX airport and the Federal Air Marshals Service. We develop a new class of security games designed to model
decisions faced by the Transportation Security Administration and
other agencies in protecting airports, ports, and other critical infrastructure. Our model allows for a more diverse set of security
activities for the defensive resources than previous work, which
has generally focused on interchangeable resources that can only
defend against possible attacks in one way. Here, we are concerned
in particular with the possibility that adversaries can circumvent
specific security activities if they are aware of common security
measures. The model we propose takes this capability into account and generates more unpredictable, diverse security policies
as a result—without resorting to an external value for entropy or
Solving these games is a significant computational challenge, and
existing algorithms are not capable of solving realistic games. We
introduce a new method that exploits common structure in these
problems to reduce the size of the game representation and enable
faster solution algorithm. These algorithms are able to scale to
make larger games than existing solvers, as we show in our experimental results.