Matthew Brown, Emma Bowring, Shira Epstein, Mufaddal Jhaveri, Rajiv Maheswaran, Parag Mallick, Shannon Mumenthaler, Michelle Povinelli, and Milind Tambe. 2011. “Applying Multi-Agent Techniques to Cancer Modeling .” In Workshop on Multiagent Sequential Decision Making in Uncertain Domains(MSDM) at AAMAS 2011 .Abstract
Each year, cancer is responsible for 13% of all deaths worldwide.
In the United States, that percentage increases to 25%, translating
to an estimated 569,490 deaths in 2010 . Despite significant
advances in the fight against cancer, these statistics make clear the
need for additional research into new treatments. As such, there has
been growing interest in the use of computer simulations as a tool
to aid cancer researchers. We propose an innovative multi-agent
approach in which healthy cells and cancerous cells are modeled
as opposing teams of agents using a decentralized Markov decision
process (DEC-MDP). We then describe changes made to traditional
DEC-MDP algorithms in order to better handle the complexity and
scale of our domain. We conclude by presenting and analyzing
preliminary simulation results. This paper is intended to introduce
the cancer modeling domain to the multi-agent community with
the hope of fostering a discussion about the opportunities and challenges it presents. Given the complexity of the domain, we do not
claim our approach to be a definitive solution but rather a first step
toward the larger goal of creating realistic simulations of cancer.
Game theory is fast becoming a vital tool for reasoning about complex real-world security problems, including critical infrastructure
protection. The game models for these applications are constructed
using expert analysis and historical data to estimate the values of
key parameters, including the preferences and capabilities of terrorists. In many cases, it would be natural to represent uncertainty over
these parameters using continuous distributions (such as uniform
intervals or Gaussians). However, existing solution algorithms are
limited to considering a small, finite number of possible attacker
types with different payoffs. We introduce a general model of infinite Bayesian Stackelberg security games that allows payoffs to be
represented using continuous payoff distributions. We then develop
several techniques for finding approximate solutions for this class
of games, and show empirically that our methods offer dramatic
improvements over the current state of the art, providing new ways
to improve the robustness of security game models.
Continuous state DEC-MDPs are critical for agent
teams in domains involving resources such as time,
but scaling them up is a significant challenge. To
meet this challenge, we first introduce a novel
continuous-time DEC-MDP model that exploits
transition independence in domains with temporal
constraints. More importantly, we present a new locally optimal algorithm called SPAC. Compared to
the best previous algorithm, SPAC finds solutions
of comparable quality substantially faster; SPAC
also scales to larger teams of agents.
A significant body of work exists on effectively allowing multiple agents to coordinate to achieve a shared goal. In particular, a growing body of work in the Distributed
Constraint Optimization (DCOP) framework enables such coordination with different
amounts of teamwork. Such algorithms can implicitly or explicitly trade-off improved
solution quality with increased communication and computation requirements. However, the DCOP framework is limited to planning problems; DCOP agents must have
complete and accurate knowledge about the reward function at plan time.
We extend the DCOP framework, defining the Distributed Coordination of Exploration and Exploitation (DCEE) problem class to address real-world problems, such as
ad-hoc wireless network optimization, via multiple novel algorithms. DCEE algorithms
differ from DCOP algorithms in that they (1) are limited to a finite number of actions
in a single trial, (2) attempt to maximize the on-line, rather than final, reward, (3) are
unable to exhaustively explore all possible actions, and (4) may have knowledge about
the distribution of rewards in the environment, but not the rewards themselves. Thus, a
DCEE problem is not a type of planning problem, as DCEE algorithms must carefully
balance and coordinate multiple agents’ exploration and exploitation.
Two classes of algorithms are introduced: static estimation algorithms perform simple calculations that allow agents to either stay or explore, and balanced exploration
algorithms use knowledge about the distribution of the rewards and the time remaining
in an experiment to decide whether to stay, explore, or (in some algorithms) backtrack to
a previous location. These two classes of DCEE algorithms are compared in simulation
and on physical robots in a complex mobile ad-hoc wireless network setting. Contrary to our expectations, we found that increasing teamwork in DCEE algorithms may lower
team performance. In contrast, agents running DCOP algorithms improve their reward
as teamwork increases. We term this previously unknown phenomenon the team uncertainty penalty, analyze it in both simulation and on robots, and present techniques to
ameliorate the penalty.
In response to the Mumbai attacks of 2008, the Mumbai police
have started to schedule a limited number of inspection checkpoints
on the road network throughout the city. Algorithms for similar
security-related scheduling problems have been proposed in recent
literature, but security scheduling in networked domains when targets have varying importance remains an open problem at large.
In this paper, we cast the network security problem as an attackerdefender zero-sum game. The strategy spaces for both players are
exponentially large, so this requires the development of novel, scalable techniques.
We first show that existing algorithms for approximate solutions
can be arbitrarily bad in general settings. We present RUGGED
(Randomization in Urban Graphs by Generating strategies for Enemy and Defender), the first scalable optimal solution technique for
such network security games. Our technique is based on a double
oracle approach and thus does not require the enumeration of the
entire strategy space for either of the players. It scales up to realistic
problem sizes, as is shown by our evaluation of maps of southern
Mumbai obtained from GIS data.
In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions being influenced by surrounding people’s emotions. While the overall effect is agreed upon, the underlying mechanism of the spread of emotions has seen little quantification and application to computational agents despite extensive evidence of its impacts in everyday life. In this paper, we examine computational models of emotional contagion by implementing two models ( and ) that draw from two separate lines of contagion research: thermodynamics-based and epidemiological-based. We first perform sensitivity tests on each model in an evacuation simulation, ESCAPES, showing both models to be reasonably robust to parameter variations with certain exceptions. We then compare their ability to reproduce a real crowd panic scene in simulation, showing that the thermodynamics-style model () produces superior results due to the ill-suited contagion mechanism at the core of epidemiological models. We also identify that a graduated effect of fear and proximity-based contagion effects are key to producing the superior results. We then reproduce the methodology on a second video, showing that the same results hold, implying generality of the conclusions reached in the first scene.
In creating an evacuation simulation for training and planning, realistic agents that reproduce known phenomenon are required. Evacuation simulation in the airport domain requires additional features
beyond most simulations, including the unique behaviors of firsttime visitors who have incomplete knowledge of the area and families that do not necessarily adhere to often-assumed pedestrian
behaviors. Evacuation simulations not customized for the airport
domain do not incorporate the factors important to it, leading to
inaccuracies when applied to it.
In this paper, we describe ESCAPES, a multiagent evacuation
simulation tool that incorporates four key features: (i) different
agent types; (ii) emotional interactions; (iii) informational interactions; (iv) behavioral interactions. Our simulator reproduces phenomena observed in existing studies on evacuation scenarios and
the features we incorporate substantially impact escape time. We
use ESCAPES to model the International Terminal at Los Angeles
International Airport (LAX) and receive high praise from security
Security and sustainability are two critical global challenges
that involve the interaction of many intelligent actors. Game theory provides a sound mathematical framework to model such interactions, and
computational game theory in particular has a promising role to play in
helping to address key aspects of these challenges. Indeed, in the domain
of security, we have already taken some encouraging steps by successfully
applying game-theoretic algorithms to real-world security problems: our
algorithms are in use by agencies such as the US coast guard, the Federal
Air Marshals Service, the LAX police and the Transportation Security
Administration. While these applications of game-theoretic algorithms
have advanced the state of the art, this paper lays out some key challenges as we continue to expand the use of these algorithms in real-world
domains. One such challenge in particular is that classical game theory
makes a set of assumptions of the players, which may not be consistent with real-world scenarios, especially when humans are involved. To
actually model human behavior within game-theoretic framework, it is
important to address the new challenges that arise due to the presence of
human players: (i) human bounded rationality; (ii) limited observations
and imperfect strategy execution; (iii) large action spaces. We present
initial solutions to these challenges in context of security games. For sustainability, we lay out our initial efforts and plans, and key challenges
related to human behavior in the loop.
Building on research previously reported at AAMAS conferences,
this paper describes an innovative application of a novel gametheoretic approach for a national scale security deployment. Working with the United States Transportation Security Administration
(TSA), we have developed a new application called GUARDS to
assist in resource allocation tasks for airport protection at over 400
United States airports. In contrast with previous efforts such as ARMOR and IRIS, which focused on one-off tailored applications and
one security activity (e.g. canine patrol or checkpoints) per application, GUARDS faces three key issues: (i) reasoning about hundreds
of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of
end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic
framework that allows for heterogeneous defender activities and
compact modeling of a large number of threats; (ii) developing
an efficient solution technique based on general purpose Stackelberg game solvers; (iii) taking a partially centralized approach for
knowledge acquisition and development of the system. In doing so
we develop a software scheduling assistant, GUARDS, designed to
reason over two agents — the TSA and a potential adversary —
and allocate the TSA’s limited resources across hundreds of security activities in order to provide protection within airports.
The scheduling assistant has been delivered to the TSA and is
currently under evaluation and testing for scheduling practices at
an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide. In this paper we discuss the design choices and challenges
encountered during the implementation of GUARDS. GUARDS
represents promising potential for transitioning years of academic
research into a nationally deployed system.
We describe an innovative application of a novel
game-theoretic approach for a national scale security deployment. Working with the United States
Transportation Security Administration (TSA), we
have developed a new application called GUARDS
to allocate the TSA’s limited resources across hundreds of security activities to provide protection
at over 400 United States airports. Similar security applications (e.g., ARMOR and IRIS) have focused on one-off tailored applications and one security activity (e.g. checkpoints) per application,
GUARDS on the other hand faces three new key
issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse
potential threats; (iii) developing a system designed
for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our
key ideas are: (i) creating a new game-theoretic
framework that allows for heterogeneous defender
activities and compact modeling of a large number
of threats; (ii) developing an efficient solution technique based on general purpose Stackelberg game
solvers; (iii) taking a partially centralized approach
for knowledge acquisition. The scheduling assistant has been delivered to the TSA and is currently
undergoing evaluation for scheduling practices at
an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide.
Security games refer to a special class of attacker-defender
Stackelberg games. In these non zero-sum games, the attacker’s utility of attacking a target decreases as the defender
allocates more resources to protect it (and vice versa for the
defender). The defender (leader) first commits to a mixed
strategy, assuming the attacker (follower) decides on a pure
strategy after observing the defender’s strategy. This models the situation where an attacker conducts surveillance to
learn the defender’s mixed strategy and then launches an
attack on a single target. Given that the defender has limited resources, she must design her mixed-strategy optimally
against the adversaries’ response to maximize effectiveness.
One leading family of algorithms to compute such mixed
strategies are DOBSS and its successors [3, 5], which are
used in the deployed ARMOR  and IRIS  applications. One key set of assumptions these systems make is about
how attackers choose strategies based on their knowledge
of the security strategy. Typically, such systems apply the
standard game-theoretic assumption that attackers are perfectly rational. This is a reasonable proxy for the worst case
of a highly intelligent attacker, but it can lead to a defense
strategy that is not robust against attackers using different
decision procedures, and it fails to exploit known weaknesses in human decision-making. Indeed, it is widely accepted
that standard game-theoretic assumptions of perfect rationality are not ideal for predicting the behavior of humans in
multi-agent decision problems . Thus, integrating more realistic models of human decision-making has become necessary in solving real-world security problems.
The current leading contender that accounts for human
behavior in security games is COBRA , which assumes
that adversaries can deviate to −optimal strategies and
that they have an anchoring bias when interpreting a probability distribution. It remains an open question whether
other models yield better solutions than COBRA against
human adversaries. The literature has introduced a multitude of candidate models, but there is an important empirical question of which model best represents the salient
features of human behavior in applied security contexts.
We address these open questions by developing three new
algorithms to generate defender strategies in security games,
based on using two fundamental theories of human behavior to predict an attacker’s decision: Prospect Theory (PT)
 and Quantal Response Equilibrium (QRE) . PT is a
noble-prize-winning theory, which describes human decision
making as a process of maximizing ‘prospect’. ‘Prospect’
is defined as the weighted sum of the benefit of all possible outcomes for each action. QRE suggests that instead of
strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy
increases as the cost of such an error decreases.
Recent real-world deployments of Stackelberg security games make it critical that we address human adversaries’ bounded rationality in computing optimal strategies. To that end, this paper
provides three key contributions: (i) new efficient
algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response
Equilibrium; (ii) the most comprehensive experiment to date studying the effectiveness of different
models against human subjects for security games;
and (iii) new techniques for generating representative payoff structures for behavioral experiments in
generic classes of games. Our results with human
subjects show that our new techniques outperform
the leading contender for modeling human behavior in security games.
By using robots as routers, a team of networked robots can provide a communication substrate to establish a wireless mesh network. The mobile mesh network can autonomously optimize its configuration, increasing performance. One of the main sources of radio signal fading
in such a network is multi-path propagation, which can be mitigated by moving the senders or
the receivers on the distance of the order of a wavelength. In this paper, we measure the performance gain when robots are allowed to make such small movements and find that it may be as
much as 270%. Our main contribution is the design of a system that allows robots to cooperate
and improve the real-world network throughput via a practical solution. We model the problem
of which robots to move as a distributed constraint optimization problem (DCOP). Our study
includes four local metrics to estimate global throughput.
Stackelberg games have been widely used to model patrolling
or monitoring problems in security. In a Stackelberg security
game, the defender commits to a strategy and the adversary
makes its decision with knowledge of the leader’s commitment. Algorithms for computing the defender’s optimal strategy are used in deployed decision-support tools in use by
the Los Angeles International Airport (LAX), the Federal Air
Marshals Service, and the Transportation Security Administration (TSA). Those algorithms take into account various resource usage constraints defined by human users. However,
those constraints may lead to poor (even infeasible) solutions
due to users’ insufficient information and bounded rationality. A mixed-initiative approach, in which human users and
software assistants (agents) collaborate to make security decisions, is needed. Efficient human-agent interaction process
leads to models with higher overall solution quality. This paper preliminarily analyzes the needs and challenges for such
a mixed-initiative approach.
In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions being influenced by surrounding
people’s emotions. While the overall effect is agreed upon, the underlying mechanism of the spread of emotions has seen little quantification and application to
In this paper, we explore computational models of emotional contagion by implementing two models (Bosse et al., Durupinar et al.) and augmenting them to
better model real world observations. Our additions include examining the impact
of physical proximity and authority figures. We show that these additions provide
substantial improvements to the qualitative trends of emotion spreading, more in
line with expectations than either of the two previous models. We also evaluate
their impact on evacuation safety in an evacuation simulation, ESCAPES, showing substantial differences in predicted safety based on the contagion model.
k- and t-optimality algorithms [9, 6] provide solutions to
DCOPs that are optimal in regions characterized by its size
and distance respectively. Moreover, they provide quality
guarantees on their solutions. Here we generalise the k- and
t-optimal framework to introduce C-optimality, a flexible
framework that provides reward-independent quality guarantees for optima in regions characterised by any arbitrary
criterion. Therefore, C-optimality allows us to explore the
space of criteria (beyond size and distance) looking for those
that lead to better solution qualities. We benefit from this
larger space of criteria to propose a new criterion, the socalled size-bounded-distance criterion, which outperforms kand t-optimality.
The fastest known algorithm for solving General Bayesian Stackelberg games with a finite set of follower (adversary) types have seen
direct practical use at the LAX airport for over 3 years; and currently, an (albeit non-Bayesian) algorithm for solving these games
is also being used for scheduling air marshals on limited sectors
of international flights by the US Federal Air Marshals Service.
These algorithms find optimal randomized security schedules to allocate limited security resources to protect targets. As we scale up
to larger domains, including the full set of flights covered by the
Federal Air Marshals, it is critical to develop newer algorithms that
scale-up significantly beyond the limits of the current state-of-theart of Bayesian Stackelberg solvers. In this paper, we present a
novel technique based on a hierarchical decomposition and branch
and bound search over the follower type space, which may be applied to different Stackelberg game solvers. We have applied this
technique to different solvers, resulting in: (i) A new exact algorithm called HBGS that is orders of magnitude faster than the best
known previous Bayesian solver for general Stackelberg games;
(ii) A new exact algorithm called HBSA which extends the fastest
known previous security game solver towards the Bayesian case;
and (iii) Approximation versions of HBGS and HBSA that show
significant improvements over these newer algorithms with only 1-
2% sacrifice in the practical solution quality.
Given the real-world deployments of attacker-defender Stackelberg security games, robustness to deviations from expected
attacker behaviors has now emerged as a critically important
issue. This paper provides four key contributions in this context. First, it identifies a fundamentally problematic aspect of
current algorithms for security games. It shows that there are
many situations where these algorithms face multiple equilibria, and they arbitrarily select one that may hand the defender
a significant disadvantage, particularly if the attacker deviates
from its equilibrium strategies due to unknown constraints.
Second, for important subclasses of security games, it identifies situations where we will face such multiple equilibria.
Third, to address these problematic situations, it presents two equilibrium refinement algorithms that can optimize the
defender’s utility if the attacker deviates from equilibrium strategies. Finally, it experimentally illustrates that the refinement approach achieved significant robustness in consideration of attackers’ deviation due to unknown constraints.
Distributed constraint optimization (DCOP) is a promising
approach to coordination, scheduling and task allocation in multi agent
networks. DCOP is NP- hard , so an important line of work focuses on
developing fast incomplete solution algorithms that can provide guarantees on the quality of their local optimal solutions.
Region optimality  is a promising approach along this line: it provides
quality guarantees for region optimal solutions, namely solutions that are
optimal in a specific region of the DCOP. Region optimality generalises
k- and t-optimality [7, 4] by allowing to explore the space of criteria that
define regions to look for solutions with better quality guarantees.
Unfortunately, previous work in region-optimal quality guarantees fail to
exploit any a-priori knowledge of the reward structure of the problem.
This paper addresses this shortcoming by defining reward-dependent region optimal quality guarantees that exploit two different levels of knowledge about rewards, namely: (i) a ratio between the least minimum reward to the maximum reward among relations; and (ii) the minimum
and maximum rewards per relation.
Attacker-defender Stackelberg games have become a popular
game-theoretic approach for security with deployments for
LAX Police, the FAMS and the TSA. Unfortunately, most of
the existing solution approaches do not model two key uncertainties of the real-world: there may be noise in the defender’s
execution of the suggested mixed strategy and/or the observations made by an attacker can be noisy. In this paper, we provide a framework to model these uncertainties, and demonstrate that previous strategies perform poorly in such uncertain settings. We also provide RECON, a novel algorithm that
computes strategies for the defender that are robust to such
uncertainties, and provide heuristics that further improve RECON’s efficiency