2008
J. Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “ARMOR Security for Los Angeles International Airport.” In AAAI Intelligent Systems Demonstrations.
Abstract: Security at major locations of economic or political importance is a key concern around the world, particularly given
the threat of terrorism. Limited security resources prevent
full security coverage at all times, which allows adversaries
to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols.
Hence, randomized patrolling or monitoring is important, but
randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end,
this demonstration showcases a promising transition of the
latest in multi-agent algorithms into a deployed application.
In particular, it exhibits a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes)
that casts this patrolling/monitoring problem as a Bayesian
Stackelberg game, allowing the agent to appropriately weigh
the different actions in randomization, as well as uncertainty
over adversary types. ARMOR combines two key features:
(i) It uses the fastest known solver for Bayesian Stackelberg
games called DOBSS, where the dominant mixed strategies
enable randomization; (ii) Its mixed-initiative based interface
allows users to occasionally adjust or override the automated
schedule based on their local constraints. ARMOR has been
successfully deployed since August 2007 at the Los Angeles
International Airport (LAX) to randomize checkpoints on the
roadways entering the airport and canine patrol routes within
the airport terminals.
Manish Jain, J. Pita, Milind Tambe, Fernando Ordonez, Praveen Paruchuri, and Sarit Kraus. 2008. “Bayesian Stackelberg Games and their Application for Security at Los Angeles International Airport.” In SIGecom Exchanges.
Abstract: Many multiagent settings are appropriately modeled as Stackelberg games [Fudenberg
and Tirole 1991; Paruchuri et al. 2007], where a leader commits to a strategy first, and
then a follower selfishly optimizes its own reward, considering the action chosen by the
leader. Stackelberg games are commonly used to model attacker-defender scenarios in security domains [Brown et al. 2006] as well as in patrolling [Paruchuri et al. 2007; Paruchuri
et al. 2008]. For example, security personnel patrolling an infrastructure commit to a patrolling strategy first, before their adversaries act taking this committed strategy into account. Indeed, Stackelberg games are being used at the Los Angeles International Airport
to schedule security checkpoints and canine patrols [Murr 2007; Paruchuri et al. 2008; Pita
et al. 2008a]. They could potentially be used in network routing, pricing in transportation
systems and many other situations [Korilis et al. 1997; Cardinal et al. 2005].
Although the follower in a Stackelberg game is allowed to observe the leader’s strategy
before choosing its own strategy, there is often an advantage for the leader over the case
where both players must choose their moves simultaneously. To see the advantage of being
the leader in a Stackelberg game, consider the game with the payoff as shown in Table I.
The leader is the row player and the follower is the column player. The only pure-strategy
Nash equilibrium for this game is when the leader plays a and the follower plays c, which
gives the leader a payoff of 2. However, if the leader commits to a mixed strategy of
playing a and b with equal (0.5) probability, then the follower will play d, leading to an
expected payoff for the leader of 3.5.
(Table I, the referenced payoff table with follower actions c and d, is not reproduced in this listing.)
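The arithmetic in this example can be checked directly. The sketch below assumes one concrete payoff table consistent with the numbers quoted above (the paper's actual Table I is not reproduced in this listing); each cell holds (leader, follower) payoffs:

```python
# Hypothetical (leader, follower) payoffs consistent with the figures quoted
# in the abstract; the paper's actual Table I is not reproduced here.
payoffs = {('a', 'c'): (2, 1), ('a', 'd'): (4, 0),
           ('b', 'c'): (1, 0), ('b', 'd'): (3, 2)}

def follower_best_response(p_a):
    """Follower's best action when the leader plays a with probability p_a."""
    def expected(j):
        return p_a * payoffs[('a', j)][1] + (1 - p_a) * payoffs[('b', j)][1]
    return max(('c', 'd'), key=expected)

def leader_value(p_a):
    """Leader's expected payoff, given that the follower best-responds."""
    j = follower_best_response(p_a)
    return p_a * payoffs[('a', j)][0] + (1 - p_a) * payoffs[('b', j)][0]

print(leader_value(1.0))  # pure strategy a: follower plays c, leader gets 2
print(leader_value(0.5))  # 50/50 commitment: follower plays d, leader gets 3.5
```

With these payoffs, the pure Nash play yields the leader 2, while committing to the 50/50 mixture shifts the follower's best response from c to d and raises the leader's expected payoff to 3.5, as the abstract states.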
J. Pita, Manish Jain, Craig Western, Christopher Portway, Milind Tambe, Fernando Ordonez, Sarit Kraus, and Praveen Paruchuri. 2008. “Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport.” In International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
Abstract: Security at major locations of economic or political importance is
a key concern around the world, particularly given the threat of
terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit
patterns in selective patrolling or monitoring, e.g. they can plan
an attack avoiding existing patrols. Hence, randomized patrolling
or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and
benefits. To this end, this paper describes a promising transition
of the latest in multi-agent algorithms – in fact, an algorithm that
represents a culmination of research presented at AAMAS – into a
deployed application. In particular, it describes a software assistant
agent called ARMOR (Assistant for Randomized Monitoring over
Routes) that casts this patrolling/monitoring problem as a Bayesian
Stackelberg game, allowing the agent to appropriately weigh the
different actions in randomization, as well as uncertainty over adversary types. ARMOR combines three key features: (i) It uses
the fastest known solver for Bayesian Stackelberg games called
DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their
local constraints; (iii) It alerts the users if mixed-initiative overrides
appear to degrade the overall desired randomization. ARMOR has
been successfully deployed since August 2007 at the Los Angeles
International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport
terminals. This paper examines the information, design choices,
challenges, and evaluation that went into designing ARMOR.
Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “Efficient Algorithms to solve Bayesian Stackelberg Games for Security Applications.” In National Conference on Artificial Intelligence (AAAI).
Abstract: In a class of games known as Stackelberg games, one agent
(the leader) must commit to a strategy that can be observed by
the other agent (the adversary/follower) before the adversary
chooses its own strategy. We consider Bayesian Stackelberg
games, in which the leader is uncertain about the type of the
adversary it may face. Such games are important in security domains, where, for example, a security agent (leader)
must commit to a strategy of patrolling certain areas, and an
adversary (follower) can observe this strategy over time before choosing where to attack. We present here two different MIP formulations: ASAP (providing approximate policies with controlled randomization) and DOBSS (providing
optimal policies) for Bayesian Stackelberg games. DOBSS is
currently the fastest optimal procedure for Bayesian Stackelberg games and is in use by police at the Los Angeles International Airport (LAX) to schedule their activities.
M. Tasaki, Y. Yabu, Y. Iwanari, M. Yokoo, M. Tambe, J. Marecki, and P. Varakantham. 2008. “Introducing Communication in Dis-POMDPs with Locality of Interaction.” In IEEE International Conference on Intelligent Agent Technology (IAT).
Abstract: While Distributed POMDPs have become popular for
modeling multiagent systems in uncertain domains, it is the
Networked Distributed POMDPs (ND-POMDPs) model —
a model tailored to real agent networks — that has begun
to scale up the number of agents. However, prior work
in ND-POMDPs has failed to address communication, a
shortcoming that has the side-effect of limiting the planning
horizon. In particular, without communication, the size of
a local policy at each agent within the ND-POMDPs grows
exponentially in the time horizon. To overcome this problem, we extend existing algorithms (LID-JESP and SPIDER) so that agents periodically communicate their observation and action histories with each other. After communication, agents can start from new synchronized belief
states. While introducing communication avoids the exponential growth in the size of local policies at agents, the key idea is to avoid an exponential number of synchronized belief states after communication. To this end, we
introduce an idea that is similar to the Point-based Value Iteration (PBVI) algorithm and approximate the value function with a fixed number of representative points and their
α vectors. Our experimental results show that we can obtain much longer policies than existing algorithms as long
as the interval between communications is small.
Emma Bowring, Jonathan P. Pearce, Christopher Portway, Manish Jain, and Milind Tambe. 2008. “On K-Optimal Distributed Constraint Optimization Algorithms: New Bounds and Algorithms.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Abstract: Distributed constraint optimization (DCOP) is a promising approach
to coordination, scheduling and task allocation in multiagent networks. In large-scale or low-bandwidth networks, finding the global
optimum is often impractical. K-optimality is a promising new approach: for the first time it provides a set of locally optimal algorithms with quality guarantees as a fraction of the global optimum. Unfortunately, previous work in k-optimality did not address domains
where we may have prior knowledge of reward structure; and it
failed to provide quality guarantees or algorithms for domains with
hard constraints (such as agents’ local resource constraints). This
paper addresses these shortcomings with three key contributions.
It provides: (i) improved lower-bounds on k-optima quality incorporating available prior knowledge of reward structure; (ii) lower
bounds on k-optima quality for problems with hard constraints; and
(iii) k-optimal algorithms for solving DCOPs with hard constraints
and detailed experimental results on large-scale networks.
A. Freedy, O. Sert, E. Freedy, G. Weltman, J. McDonough, Milind Tambe, and Tapana Gupta. 2008. “Multiagent adjustable autonomy framework (MAAF) for multirobot, multihuman teams.” In International Symposium on Collaborative Technologies (CTS).
Abstract: This paper describes the ongoing development of a
Multiagent Adjustable Autonomy Framework (MAAF) for
multi-robot, multi-human teams performing tactical
maneuvers. The challenge being addressed in this SBIR
Phase I R&D project is how to exploit fully the unique
capabilities of heterogeneous teams composed of a
mixture of Robots, Agents or Persons (RAPs): that is, how
to improve the safety, efficiency, reliability and cost of
achieving mission goals while maintaining dynamic
adaptation to the unique limitations and contingencies of
a real-world operating environment. Our response to this
challenge is the creation of a new infrastructure that will
facilitate cooperative and collaborative performance of
human and robots as equal team partners through the
application of advances in goal-oriented, multiagent
planning and coordination technology. At the heart of
our approach is the USC Teamcore Group’s Machinetta,
a state-of-the-art robot proxy framework with adjustable
autonomy. Machinetta facilitates robot-human role
allocation decisions and collaborative sharing of team
tasks in the non-deterministic and unpredictable military
environment through the use of a domain-independent
teamwork model that supports flexible teamwork. This
paper presents our innovative proxy architecture and its
constituent algorithms, and also describes our initial
demonstration of technical feasibility in a realistic
simulation scenario.
Janusz Marecki, Tapana Gupta, Pradeep Varakantham, Milind Tambe, and Makoto Yokoo. 2008. “Not All Agents Are Equal: Scaling up Distributed POMDPs for Agent Networks.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Abstract: Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents
have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS, increasing the number of agents
in distributed POMDPs for the first time into double digits. FANS
is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions:
(i) Not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics
to automatically vary the FSM size in different agents for scaleup; (ii) FANS illustrates efficient integration of its FSM-based policy search within algorithms that exploit agent network structure;
(iii) FANS provides significant speedups in policy evaluation and
heuristic computations within the network algorithms by exploiting the FSMs for dynamic programming. Experimental results
show not only orders of magnitude improvements over previous
best known algorithms for smaller-scale domains (with similar solution quality), but also a scale-up into double digits in terms of
numbers of agents.
Janusz Marecki. 2008. “Planning with Continuous Resources in Agent Systems.”
Abstract: My research concentrates on developing reasoning techniques for intelligent, autonomous agent systems. In particular, I focus on planning techniques for both single and multi-agent systems acting in uncertain domains. In modeling these domains, I consider two types of uncertainty: (i) the outcomes of agent actions are uncertain and (ii) the amount of resources consumed by agent actions is uncertain and only characterized by continuous probability density functions. Such rich domains, ranging from Mars rover exploration to unmanned aerial surveillance to automated disaster rescue operations, are commonly modeled as continuous resource Markov decision processes (MDPs) that can then be solved in order to construct policies for agents acting in these domains. This thesis addresses two major unresolved problems in continuous resource MDPs. First, they are very difficult to solve, and existing algorithms are either fast, but make additional restrictive assumptions about the model, or do not introduce these assumptions but are very inefficient. Second, the continuous resource MDP framework is not directly applicable to multi-agent systems, and current approaches all discretize resource levels or assume deterministic resource consumption, which automatically invalidates the formal solution quality guarantees. The goal of my thesis is to fundamentally alter this landscape with three contributions:
I first introduce CPH, a fast analytic algorithm for solving continuous resource MDPs. CPH solves the planning problems at hand by first approximating with a desired accuracy the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs more efficiently than its closest competitor, and allows for a systematic trade-off of solution quality for speed. Second, to improve the anytime performance of CPH and other continuous resource MDP solvers, I introduce the DPFP algorithm. Rather than using value iteration to solve the problem at hand, DPFP performs a forward search in the corresponding dual space of cumulative distribution functions. In doing so, DPFP discriminates in its policy generation effort, providing only approximate policies for regions of the state space reachable with low probability, yet it bounds the error that such approximation entails. Third, I introduce CR-DEC-MDP, a framework for planning with continuous resources in multi-agent systems, and propose two algorithms for solving CR-DEC-MDPs: The first algorithm (VFP) emphasizes scalability. It performs a series of policy iterations in order to quickly find a locally optimal policy. In contrast, the second algorithm (M-DPFP) stresses optimality; it allows for a systematic trade-off of solution quality for speed by using the concept of DPFP in a multiagent setting. My results show up to three orders of magnitude speedups in solving single agent planning problems and up to one order of magnitude speedup in solving multi-agent planning problems. Furthermore, I demonstrate the practical use of one of my algorithms in a large-scale disaster simulation where it allows for a more efficient rescue operation.
Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “Playing games with security: An efficient exact algorithm for Bayesian Stackelberg Games.” In International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
Abstract: In a class of games known as Stackelberg games, one agent (the
leader) must commit to a strategy that can be observed by the other
agent (the follower or adversary) before the adversary chooses its
own strategy. We consider Bayesian Stackelberg games, in which
the leader is uncertain about the types of adversary it may face.
Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling
certain areas, and a robber (follower) has a chance to observe this
strategy over time before choosing its own strategy of where to attack. This paper presents an efficient exact algorithm for finding the
optimal strategy for the leader to commit to in these games. This
algorithm, DOBSS, is based on a novel and compact mixed-integer
linear programming formulation. Compared to the most efficient
algorithm known previously for this problem, DOBSS is not only
faster, but also leads to higher quality solutions, and does not suffer from problems of infeasibility that were faced by this previous
algorithm. Note that DOBSS is at the heart of the ARMOR system
that is currently being tested for security scheduling at the Los Angeles International Airport.
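To illustrate the kind of optimization DOBSS performs, consider the single-adversary-type, two-action special case: the optimal mixed strategy to commit to can be found by enumerating the follower's possible best responses and, for each, maximizing the leader's linear payoff over the interval of leader strategies on which that response is optimal. The sketch below is not DOBSS itself (DOBSS is a mixed-integer linear program handling Bayesian types); the payoff table is a hypothetical one consistent with the commitment example quoted in the SIGecom abstract earlier on this page.

```python
# Hypothetical payoff table (an assumption, consistent with the quoted example):
# leader actions a, b; follower actions c, d.
L = {('a', 'c'): 2, ('a', 'd'): 4, ('b', 'c'): 1, ('b', 'd'): 3}
F = {('a', 'c'): 1, ('a', 'd'): 0, ('b', 'c'): 0, ('b', 'd'): 2}

def leader_value(p, j):
    """Leader's expected payoff playing a w.p. p when the follower plays j."""
    return p * L[('a', j)] + (1 - p) * L[('b', j)]

best = None
for j in ('c', 'd'):
    # Interval of p on which j is the follower's best response: each rival
    # action k imposes one linear inequality in p.
    lo, hi = 0.0, 1.0
    for k in ('c', 'd'):
        if k == j:
            continue
        # p*F[a,j] + (1-p)*F[b,j] >= p*F[a,k] + (1-p)*F[b,k]
        coef = F[('a', j)] - F[('b', j)] - F[('a', k)] + F[('b', k)]
        rhs = F[('b', k)] - F[('b', j)]
        if coef > 0:
            lo = max(lo, rhs / coef)
        elif coef < 0:
            hi = min(hi, rhs / coef)
        elif rhs > 0:          # inequality unsatisfiable for any p
            lo, hi = 1.0, 0.0
    if lo <= hi:
        # The objective is linear in p, so it peaks at an interval endpoint.
        for p in (lo, hi):
            v = leader_value(p, j)
            if best is None or v > best[0]:
                best = (v, p, j)

print(best)  # optimal commitment: play a with p = 2/3, follower plays d, value 11/3
```

At p = 2/3 the follower is indifferent between c and d; as is standard in Stackelberg analysis, ties are assumed broken in the leader's favor, which the per-response enumeration implements implicitly.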
Nathan Schurr, Janusz Marecki, and Milind Tambe. 2008. “RIAACT: A robust approach to adjustable autonomy for human-multiagent teams.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Abstract: When human-multiagent teams act in real-time uncertain domains,
adjustable autonomy (dynamic transferring of decisions between
human and agents) raises three key challenges. First, the human
and agents may differ significantly in their worldviews, leading to
inconsistencies in their decisions. Second, these human-multiagent
teams must operate and plan in real time against deadlines, with uncertain durations of human actions. Third, adjustable autonomy in
teams is an inherently distributed and complex problem that cannot be solved optimally and completely online. To address these
challenges, our paper presents a solution for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT).
RIAACT incorporates models of the resolution of inconsistencies,
continuous-time planning techniques, and a hybrid method to address
coordination complexity. These contributions have been realized in
a disaster response simulation system.
Manish Jain, Fernando Ordonez, James Pita, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “Robust Solutions in Stackelberg Games: Addressing Boundedly Rational Human Preference Models.” In Association for the Advancement of Artificial Intelligence 4th Multidisciplinary Workshop on Advances in Preference Handling.
Abstract: Stackelberg games represent an important class of games in
which one player, the leader, commits to a strategy and the
remaining players, the followers, make their decision with
knowledge of the leader’s commitment. Existing algorithms
for Bayesian Stackelberg games find optimal solutions while
modeling uncertainty over follower types with an a priori
probability distribution. Unfortunately, in real-world applications, the leader may also face uncertainty over the follower’s response, which makes the optimality guarantees of
these algorithms fail. Such uncertainty arises because the
follower’s specific preferences or the follower’s observations
of the leader’s strategy may not align with the rational strategy, and it is not amenable to a priori probability distributions. These conditions especially hold when dealing with
human subjects. To address these uncertainties while providing quality guarantees, we propose three new robust algorithms based on mixed-integer linear programs (MILPs) for
Bayesian Stackelberg games. A key result of this paper is
a detailed experimental analysis that demonstrates that these
new MILPs deal better with human responses: a conclusion
based on 800 games with 57 human subjects as followers. We
also provide run-time results on these MILPs.
Jonathan P. Pearce, Milind Tambe, and Rajiv T. Maheswaran. 2008. “Solving multiagent networks using distributed constraint optimization.” AI Magazine 29(3), pp. 47–66.
Abstract: In many cooperative multiagent domains, the effect of local
interactions between agents can be compactly represented as
a network structure. Given that agents are spread across such
a network, agents directly interact only with a small group
of neighbors. A distributed constraint optimization problem
(DCOP) is a useful framework to reason about such networks
of agents. Given agents’ inability to communicate and collaborate in large groups in such networks, we focus on an
approach called k-optimality for solving DCOPs. In this approach, agents form groups of one or more agents until no
group of k or fewer agents can possibly improve the DCOP
solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal.
The article provides an overview of three key results related
to k-optimality. The first set of results are worst-case guarantees on the solution quality of k-optima in a DCOP. These
guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure,
for agents to use in situations where the cost of coordination
between agents must be weighed against the quality of the
solution reached. The second set of results are upper bounds
on the number of k-optima that can exist in a DCOP. These
results are useful in domains where a DCOP must generate a
set of solutions rather than a single solution. Finally, we sketch
algorithms for k-optimality and provide some experimental
results for 1-, 2- and 3-optimal algorithms for several types
of DCOPs.
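The group-deviation test that defines k-optimality above can be made concrete. The sketch below checks k-optimality by brute force on a toy two-agent DCOP whose domains and reward table are invented for illustration:

```python
import itertools

# A toy DCOP, invented for illustration: two agents, one binary constraint.
domains = {'x1': [0, 1], 'x2': [0, 1]}
reward = {(0, 0): 5, (0, 1): 0, (1, 0): 0, (1, 1): 10}

def quality(assign):
    return reward[(assign['x1'], assign['x2'])]

def is_k_optimal(assign, k):
    """True if no group of at most k agents can jointly deviate and improve."""
    base = quality(assign)
    agents = list(domains)
    for size in range(1, k + 1):
        for group in itertools.combinations(agents, size):
            for vals in itertools.product(*(domains[a] for a in group)):
                trial = dict(assign, **dict(zip(group, vals)))
                if quality(trial) > base:
                    return False
    return True

a = {'x1': 0, 'x2': 0}     # quality 5
print(is_k_optimal(a, 1))  # True: neither agent gains by deviating alone
print(is_k_optimal(a, 2))  # False: jointly moving to (1, 1) yields 10
```

The assignment (0, 0) is 1-optimal but not 2-optimal, matching the idea of the worst-case guarantees above: larger k rules out more local optima. Brute-force checking is exponential in group size; the algorithms surveyed in the article reach k-optima without such enumeration.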
Janusz Marecki and Milind Tambe. 2008. “Towards Faster Planning with Continuous Resources in Stochastic Domains.” In Twenty-Third AAAI Conference on Artificial Intelligence.
Abstract: Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be
characterized by probability distributions. While Markov Decision Processes (MDPs) with a state space of continuous and
discrete variables are popular for modeling these domains,
current algorithms for such MDPs can exhibit poor performance with a scale-up in their state space. To remedy that,
we propose an algorithm called DPFP. DPFP’s key contribution is its exploitation of the dual space of cumulative distribution functions. This dual formulation is key to DPFP’s novel
combination of three features. First, it enables DPFP’s membership in a class of algorithms that perform forward search
in a large (possibly infinite) policy space. Second, it provides
a new and efficient approach for varying the policy generation effort based on the likelihood of reaching different regions of the MDP state space. Third, it yields a bound on
the error produced by such approximations. These three features conspire to allow DPFP’s superior performance and systematic trade-off of optimality for speed. Our experimental
evaluation shows that, when run stand-alone, DPFP outperforms other algorithms in terms of its any-time performance,
whereas when run as a hybrid, it allows for a significant
speedup of a leading continuous resource MDP solver.
Milind Tambe, Anne Balsamo, and Emma Bowring. 2008. “Using science fiction in teaching Artificial Intelligence.” In AAAI Spring Symposium 2008.
Abstract: Many factors are blamed for the decreasing enrollments in
computer science and engineering programs in the U.S., including the dot-com economic bust and the increase in the use
of “off-shore” programming labor. One major factor is also
the lack of bold new vision and excitement about computer
science, which thus results in a view of computer science as
a field wedded to routine programming. To address this concern, we have focused on science fiction as a means to generate excitement about Artificial Intelligence, and thus in turn
in Computer Science and Engineering. In particular, since
the Fall of 2006, we have used science fiction in teaching
Artificial Intelligence to undergraduate students at the University of Southern California (USC), in teaching activities
ranging from an undergraduate upper-division class in computer science to a semester-long freshman seminar for non-engineering students to micro-seminars during the welcome
week. As an interdisciplinary team of scholar/instructors, our
goal has been to use science fiction not only in motivating
students to learn about AI, but also to use science fiction in
understanding fundamental issues that arise at the intersection of technology and culture, as well as to provide students
with a more creative and well-rounded course that provides
a big-picture view of computer science. This paper outlines
the courses taught using this theme, provides an overview of
our classroom teaching techniques in using science fiction,
and discusses some of the lectures in more detail as exemplars. We conclude with feedback received, lessons learned
and impact on both the computer science students and non-computer-science (and non-engineering) students.