2009
Janusz Marecki and Milind Tambe. 2009. “
Planning with Continuous Resources for Agent Teams .” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Abstract: Many problems of multiagent planning under uncertainty require
distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are
well-suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed
and quality guarantees, or avoid discretization only by limiting agents’
action choices or interactions (e.g. assumption of transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources
for agent teams, with three key features: (i) it maintains the agent
team interaction graph to identify and prune the suboptimal policies
and to allow the agents to be transition dependent, (ii) it operates
in a continuous space of probability functions to provide the error
bound on the solution quality and finally (iii) it focuses the search
for policies on the most relevant parts of this search space to allow
for a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high quality solutions and exhibits
superior performance when compared with a discretization-based
approach. We also show that M-DPFP is applicable to solving
problems that are beyond the scope of existing approaches.
2009_4_teamcore_m_dpfp.pdf J. Pita, M. Jain, C. Kiekintveld, H. Bellamane, J. Tsai, M. Tambe, and F. Ordonez. 2009. “
Security Applications: Lessons of Real-World Deployment .” ACM SIGecom Exchanges, 8, 2.
Abstract: Game theory has played an important role in security decisions. Recent work using
Stackelberg games [Fudenberg and Tirole 1991] to model security domains has been
particularly influential [Basilico et al. 2009; Kiekintveld et al. 2009; Paruchuri et al.
2008; Pita et al. 2008; Pita et al. 2009]. In a Stackelberg game, a leader (in this case
the defender) acts first and commits to a randomized security policy. The follower
(attacker) optimizes its reward considering the strategy chosen by the leader. These
games are well-suited to representing the problem security forces face in allocating
limited resources, such as officers, canine units, and checkpoints. In particular, the
fact that the attacker is able to observe the policy reflects the way real terrorist
organizations plan attacks using extensive surveillance and long planning cycles.
Stackelberg game models are not just theoretical models; they are at the heart of
deployed decision-support software now in use by the Los Angeles World Airport
(LAWA) police and the United States Federal Air Marshals Service (FAMS). A new
application is under development for the Transportation Security Administration
(TSA), also using game-theoretic analysis. Moving from theoretical analysis to
applying game theory in real applications posed many new challenges, and there
remain many open questions to be solved in this exciting area of work. In this
article we will highlight several of the main issues that have come up, including
(i) developing efficient algorithms to solve large-scale Stackelberg Security Games,
(ii) evaluating deployed security systems, (iii) knowledge acquisition from security
experts to specify the game models, and (iv) handling mixed-initiative interactions.
We begin with an overview of the deployed systems and then discuss these issues
in turn.
2009_14_teamcore_usc1.pdf Emma Bowring, Zhengyu Yin, Rob Zinkov, and Milind Tambe. 2009. “
Sensitivity analysis for distributed optimization with resource constraints .” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Abstract: Previous work in multiagent coordination has addressed the challenge of planning in domains where agents must optimize a global
goal, while satisfying local resource constraints. However, the imposition of resource constraints naturally raises the question of whether the agents could significantly improve their team performance
if a few more resources were made available. Sensitivity analysis aims to answer that question. This paper focuses on sensitivity
analysis in the context of the distributed coordination framework,
Multiply-Constrained DCOP (MC-DCOP). There are three main
challenges in performing sensitivity analysis: (i) to perform it in a
distributed fashion, (ii) to avoid re-solving an NP-hard MC-DCOP
optimization from scratch, and (iii) to avoid considering unproductive uses for extra resources. To meet these challenges, this paper
presents three types of locally optimal algorithms: link analysis,
local reoptimization and local constraint propagation. These algorithms are distributed and avoid redundant computation by ascertaining just the effects of local perturbations on the original problem. Deploying our algorithms on a large number of MC-DCOP
problems revealed several results. While our cheapest algorithm
successfully identified quality improvements for a few problems,
our more complex techniques were necessary to identify the best
uses for additional resources. Furthermore, we identified two heuristics that can help identify a priori which agents might benefit most
from additional resources: density rank, which works well when
nodes received identical resources, and remaining resource rank,
which works well when nodes received resources based on the
number of neighbors they had.
2009_5_teamcore_bowring_aamas09.pdf Jason Tsai, Zhengyu Yin, Jun-young Kwak, David Kempe, Christopher Kiekintveld, and Milind Tambe. 2009. “
Strategic Security Placement in Network Domains with Applications to Transit Security .” In IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications.
Abstract: Deterministic placement of security personnel creates serious vulnerabilities for any organization attempting to prevent intrusion. Recent work in use
at the Los Angeles International Airport (LAX) and
in progress with the United States Federal Air Marshal Service (FAMS) has applied game-theoretic
analysis to the problem by modeling it as a Stackelberg game wherein security forces are the leaders
that commit to a strategy that is observed and countered by attackers.
In this work, we explore efficient techniques for
performing the same analysis on games with a
graph structure, wherein an attacker must follow
a path from an entry point to a target. If we
frame these problems in the straightforward manner with leader actions being sets of edges that can
be guarded and follower actions being paths from
entry to targets, the size of the game increases exponentially, quickly reaching memory limitations
when using general Stackelberg solvers.
In this work, we propose a novel linear program
that is able to solve this type of problem efficiently.
While it provides exact solutions for games where
only one checkpoint is allowed, it is an approximation in the general case. Finally, we compare the
performance of this and other methods by generating optimal policies for the Seoul Metropolitan
Subway in Seoul, South Korea.
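For the single-checkpoint case above, a small zero-sum style sketch conveys the flavor of such a linear program: the defender mixes over which single edge to guard and maximizes the worst-case probability of intercepting an entry-to-target path. The tiny graph, the zero-sum simplification, and the use of scipy below are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch (not the paper's LP): the defender places ONE checkpoint on
# an edge drawn from a mixed strategy x; the attacker walks a path from an entry
# node to a target and is intercepted iff the path crosses the checked edge.
# Zero-sum simplification: maximize the minimum interception probability.
from scipy.optimize import linprog

edges = ["e1", "e2", "e3", "e4"]                    # hypothetical graph edges
paths = [["e1", "e3"], ["e2", "e3"], ["e2", "e4"]]  # hypothetical entry-to-target paths

n = len(edges)
idx = {e: i for i, e in enumerate(edges)}

# Variables: x_0..x_{n-1} (checkpoint distribution) and t (worst-case interception).
# Maximize t, i.e. minimize -t.
c = [0.0] * n + [-1.0]

# For every attacker path p:  t - sum_{e in p} x_e <= 0.
A_ub, b_ub = [], []
for p in paths:
    row = [0.0] * (n + 1)
    for e in p:
        row[idx[e]] = -1.0
    row[n] = 1.0
    A_ub.append(row)
    b_ub.append(0.0)

# The checkpoint probabilities sum to one.
A_eq, b_eq = [[1.0] * n + [0.0]], [1.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * (n + 1))
print("checkpoint distribution:", dict(zip(edges, res.x[:n].round(3))))
print("worst-case interception probability:", round(res.x[n], 3))
```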
2009_16_teamcore_tsai.pdf Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. 2009. “
Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains .” In IJCAI 2009 Workshop on Distributed Constraint Reasoning (DCR 2009) .
Abstract: Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated
multi-agent interactions. This paper introduces a class of problems in which
agents must maximize their on-line reward, a decomposable function dependent
on pairs of agents’ decisions. Unlike previous work, agents must both learn the
reward function and exploit it on-line, critical properties for a class of physically motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework
and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.
2009_15_teamcore_dcr09_taylor.pdf James Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2009. “
Using Game Theory for Los Angeles Airport Security .” AI Magazine 30 (1), Pp. 43-57.
Abstract: Security at major locations of economic or political importance is a key concern around the world, particularly given
the threat of terrorism. Limited security resources prevent
full security coverage at all times, which allows adversaries
to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To
this end, this paper describes a promising transition of the latest in multi-agent algorithms into a deployed application. In
particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes)
that casts this patrolling/monitoring problem as a Bayesian
Stackelberg game, allowing the agent to appropriately weigh
the different actions in randomization, as well as uncertainty
over adversary types. ARMOR combines two key features:
(i) It uses the fastest known solver for Bayesian Stackelberg
games called DOBSS, where the dominant mixed strategies
enable randomization; (ii) Its mixed-initiative based interface
allows users to occasionally adjust or override the automated
schedule based on their local constraints. ARMOR has been
successfully deployed since August 2007 at the Los Angeles
International Airport (LAX) to randomize checkpoints on the
roadways entering the airport and canine patrol routes within
the airport terminals. This paper examines the information,
design choices, challenges, and evaluation that went into designing ARMOR.
2009_9_teamcore_ai_magazine.pdf P. Paruchuri, J. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus. 2009. “
Coordinating randomized policies for increasing security of agent systems.” Journal of Information Technology and Management (ITM), 10, 1, Pp. 67-79.
Abstract: We consider the problem of providing decision support to a patrolling or security service in an adversarial domain. The idea is to create patrols that can achieve a high level of coverage or reward while taking into account the presence of an adversary. We assume that the adversary can learn or observe the patrolling strategy and use this to its advantage. We follow two different approaches depending on what is known about the adversary. If there is no information about the adversary, we use a Markov Decision Process (MDP) to represent patrols and identify randomized solutions that minimize the information available to the adversary. This led to the development of the CRLP and BRLP algorithms for policy randomization of MDPs. Second, when there is partial information about the adversary, we decide on efficient patrols by solving a Bayesian Stackelberg game. Here, the leader decides first on a patrolling strategy, and then an adversary, of possibly many adversary types, selects its best response for the given patrol. We provide two efficient MIP formulations named DOBSS and ASAP to solve this NP-hard problem. Our experimental results show the efficiency of these algorithms and illustrate how these techniques provide optimal and secure patrolling policies. Note that DOBSS is at the heart of the ARMOR system that is currently deployed at the Los Angeles International Airport (LAX) for randomizing checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. Key words: Multiagent Systems, Decision Theory, Game Theory, Security, Randomized Policies
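The randomization-versus-reward trade-off that CRLP and BRLP manage can be illustrated crudely by blending a deterministic MDP policy with the uniform random policy and keeping the most random blend whose expected value still clears a reward threshold. The toy MDP, the baseline policy, and the blending scheme below are illustrative assumptions, not the paper's LP formulations.

```python
# Crude illustration of randomized MDP patrolling (not CRLP/BRLP themselves):
# blend a deterministic baseline policy with the uniform random policy and keep
# the most random blend whose expected discounted reward meets a threshold.
import numpy as np

n_s, n_a, gamma = 3, 2, 0.9
# P[a][s][s'] transition probabilities and R[s][a] rewards for an invented MDP.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],   # action 0
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.8, 0.0, 0.2]],   # action 1
])
R = np.array([[1.0, 0.2],
              [0.0, 0.5],
              [0.3, 1.0]])

def evaluate(policy):
    """Expected discounted value from state 0 of a stochastic policy[s][a]."""
    P_pi, r_pi = np.zeros((n_s, n_s)), np.zeros(n_s)
    for s in range(n_s):
        for a in range(n_a):
            P_pi[s] += policy[s, a] * P[a, s]
            r_pi[s] += policy[s, a] * R[s, a]
    return np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)[0]

# Deterministic baseline (myopically greedy here, standing in for an optimal policy).
baseline = np.zeros((n_s, n_a))
baseline[np.arange(n_s), R.argmax(axis=1)] = 1.0
uniform = np.full((n_s, n_a), 1.0 / n_a)

threshold = 0.9 * evaluate(baseline)       # tolerate a 10% loss in expected reward
for beta in np.linspace(1.0, 0.0, 11):     # beta is the weight on the uniform policy
    blended = (1 - beta) * baseline + beta * uniform
    if evaluate(blended) >= threshold:
        print(f"most random acceptable blend: beta={beta:.1f}, value={evaluate(blended):.2f}")
        break
```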
2009_1_teamcore_waid_journal.pdf
2008
J. Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “
ARMOR Security for Los Angeles International Airport .” In AAAI Intelligent Systems Demonstrations.
Abstract: Security at major locations of economic or political importance is a key concern around the world, particularly given
the threat of terrorism. Limited security resources prevent
full security coverage at all times, which allows adversaries
to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols.
Hence, randomized patrolling or monitoring is important, but
randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end,
this demonstration showcases a promising transition of the
latest in multi-agent algorithms into a deployed application.
In particular, it exhibits a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes)
that casts this patrolling/monitoring problem as a Bayesian
Stackelberg game, allowing the agent to appropriately weigh
the different actions in randomization, as well as uncertainty
over adversary types. ARMOR combines two key features:
(i) It uses the fastest known solver for Bayesian Stackelberg
games called DOBSS, where the dominant mixed strategies
enable randomization; (ii) Its mixed-initiative based interface
allows users to occasionally adjust or override the automated
schedule based on their local constraints. ARMOR has been
successfully deployed since August 2007 at the Los Angeles
International Airport (LAX) to randomize checkpoints on the
roadways entering the airport and canine patrol routes within
the airport terminals.
2008_9_teamcore_aaai_demo_08.pdf Manish Jain, J. Pita, Milind Tambe, Fernando Ordonez, Praveen Paruchuri, and Sarit Kraus. 2008. “
Bayesian Stackelberg Games and their Application for Security at Los Angeles International Airport .” In SIGecom Exchanges.
Abstract: Many multiagent settings are appropriately modeled as Stackelberg games [Fudenberg
and Tirole 1991; Paruchuri et al. 2007], where a leader commits to a strategy first, and
then a follower selfishly optimizes its own reward, considering the action chosen by the
leader. Stackelberg games are commonly used to model attacker-defender scenarios in security domains [Brown et al. 2006] as well as in patrolling [Paruchuri et al. 2007; Paruchuri
et al. 2008]. For example, security personnel patrolling an infrastructure commit to a patrolling strategy first, before their adversaries act taking this committed strategy into account. Indeed, Stackelberg games are being used at the Los Angeles International Airport
to schedule security checkpoints and canine patrols [Murr 2007; Paruchuri et al. 2008; Pita
et al. 2008a]. They could potentially be used in network routing, pricing in transportation
systems and many other situations [Korilis et al. 1997; Cardinal et al. 2005].
Although the follower in a Stackelberg game is allowed to observe the leader’s strategy
before choosing its own strategy, there is often an advantage for the leader over the case
where both players must choose their moves simultaneously. To see the advantage of being
the leader in a Stackelberg game, consider the game with the payoff as shown in Table I.
The leader is the row player and the follower is the column player. The only pure-strategy
Nash equilibrium for this game is when the leader plays a and the follower plays c which
gives the leader a payoff of 2. However, if the leader commits to a mixed strategy of
playing a and b with equal (0.5) probability, then the follower will play d, leading to an
expected payoff for the leader of 3.5.
(Table I: leader actions a and b as rows, follower actions c and d as columns; payoff entries not reproduced here.)
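The commitment advantage described above is easy to reproduce mechanically: fix a leader mixed strategy, let the follower pick its best column, and read off the leader's expected payoff. The payoff matrices below are hypothetical stand-ins chosen to match the numbers quoted in the abstract (2 at the pure equilibrium, 3.5 under commitment); they are not claimed to be the paper's Table I.

```python
# Minimal sketch of a follower's best response to a committed leader mixed strategy.
# The payoff matrices are hypothetical stand-ins consistent with the numbers in the
# abstract; they are not claimed to reproduce the paper's Table I.

# Rows: leader actions a, b.  Columns: follower actions c, d.
leader_payoff   = [[2, 4],
                   [1, 3]]
follower_payoff = [[1, 0],
                   [0, 2]]

def follower_best_response(x):
    """Return the follower column maximizing its expected payoff under leader mix x
    (ties broken in the leader's favor, the usual strong-Stackelberg assumption)."""
    def follower_value(j):
        return sum(x[i] * follower_payoff[i][j] for i in range(len(x)))
    def leader_value(j):
        return sum(x[i] * leader_payoff[i][j] for i in range(len(x)))
    return max(range(len(follower_payoff[0])),
               key=lambda j: (follower_value(j), leader_value(j)))

x = [0.5, 0.5]                 # leader commits to a and b with equal probability
j = follower_best_response(x)
leader_ev = sum(x[i] * leader_payoff[i][j] for i in range(len(x)))
print("follower plays:", "cd"[j])             # -> d
print("leader expected payoff:", leader_ev)   # -> 3.5, versus 2 at the pure equilibrium
```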
2008_12_teamcore_sigecom_article.pdf J. Pita, Manish Jain, Craig Western, Christopher Portway, Milind Tambe, Fernando Ordonez, Sarit Kraus, and Praveen Paruchuri. 2008. “
Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport .” In International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
Abstract: Security at major locations of economic or political importance is
a key concern around the world, particularly given the threat of
terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit
patterns in selective patrolling or monitoring, e.g. they can plan
an attack avoiding existing patrols. Hence, randomized patrolling
or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and
benefits. To this end, this paper describes a promising transition
of the latest in multi-agent algorithms – in fact, an algorithm that
represents a culmination of research presented at AAMAS – into a
deployed application. In particular, it describes a software assistant
agent called ARMOR (Assistant for Randomized Monitoring over
Routes) that casts this patrolling/monitoring problem as a Bayesian
Stackelberg game, allowing the agent to appropriately weigh the
different actions in randomization, as well as uncertainty over adversary types. ARMOR combines three key features: (i) It uses
the fastest known solver for Bayesian Stackelberg games called
DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their
local constraints; (iii) It alerts the users if mixed-initiative overrides
appear to degrade the overall desired randomization. ARMOR has
been successfully deployed since August 2007 at the Los Angeles
International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport
terminals. This paper examines the information, design choices,
challenges, and evaluation that went into designing ARMOR.
2008_7_teamcore_amasind2008final.pdf Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “
Efficient Algorithms to solve Bayesian Stackelberg Games for Security Applications .” In National Conference on Artificial Intelligence (AAAI).
Abstract: In a class of games known as Stackelberg games, one agent
(the leader) must commit to a strategy that can be observed by
the other agent (the adversary/follower) before the adversary
chooses its own strategy. We consider Bayesian Stackelberg
games, in which the leader is uncertain about the type of the
adversary it may face. Such games are important in security domains, where, for example, a security agent (leader)
must commit to a strategy of patrolling certain areas, and an
adversary (follower) can observe this strategy over time before choosing where to attack. We present here two different MIP-formulations, ASAP (providing approximate policies with controlled randomization) and DOBSS (providing
optimal policies) for Bayesian Stackelberg games. DOBSS is
currently the fastest optimal procedure for Bayesian Stackelberg games and is in use by police at the Los Angeles International Airport(LAX) to schedule their activities.
2008_8_teamcore_nectar08.pdf M. Tasaki, Y. Yabu, Y. Iwanari, M. Yokoo, M. Tambe, J. Marecki, and P. Varakantham. 2008. “
Introducing Communication in Dis-POMDPs with Locality of Interaction .” In IEEE International conference on Intelligent Agent Technology (IAT).
Abstract: While Distributed POMDPs have become popular for
modeling multiagent systems in uncertain domains, it is the
Networked Distributed POMDPs (ND-POMDPs) model —
a model tailored to real agent networks — that has begun
to scale up the number of agents. However, prior work
in ND-POMDPs has failed to address communication, a
shortcoming that has the side-effect of limiting the planning
horizon. In particular, without communication, the size of
a local policy at each agent within the ND-POMDPs grows
exponentially in the time horizon. To overcome this problem, we extend existing algorithms (LID-JESP and SPIDER) so that agents periodically communicate their observation and action histories with each other. After communication, agents can start from new synchronized belief
states. While introducing communication avoids
the exponential growth in the size of local policies at agents,
the key idea is to avoid an exponential number of synchronized belief states after communication. To this end, we
introduce an idea similar to the Point-based Value Iteration (PBVI) algorithm and approximate the value function with a fixed number of representative points and their
α vectors. Our experimental results show that we can obtain much longer policies than existing algorithms as long
as the interval between communications is small.
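As a rough sketch of the point-based idea referred to above (not the paper's algorithm itself), the value function over synchronized beliefs can be kept as a small set of alpha vectors and evaluated only at a fixed set of representative belief points. The states, belief points, and vectors below are invented.

```python
# Sketch of a point-based value representation: the value function over beliefs
# is approximated by a finite set of alpha vectors, and backups are performed
# only at a fixed set of representative belief points.  All numbers are invented.
import numpy as np

# Representative (synchronized) belief points kept after communication.
belief_points = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1/3, 1/3, 1/3],
])

# Each alpha vector gives the value, per state, of following some policy.
alpha_vectors = np.array([
    [10.0, 0.0, 2.0],
    [1.0, 8.0, 1.0],
    [3.0, 3.0, 6.0],
])

def value(belief):
    """Value of a belief = max over alpha vectors of their dot product with it."""
    return float(np.max(alpha_vectors @ belief))

for b in belief_points:
    print(b, "->", round(value(b), 3))
```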
2008_13_teamcore_iat2008_0703.pdf Emma Bowring, Jonathan P. Pearce, Christopher Portway, Manish Jain, and Milind Tambe. 2008. “
On K-Optimal Distributed Constraint Optimization Algorithms: New Bounds and Algorithms .” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Abstract: Distributed constraint optimization (DCOP) is a promising approach
to coordination, scheduling and task allocation in multiagent networks. In large-scale or low-bandwidth networks, finding the global
optimum is often impractical. K-optimality is a promising new approach: for the first time, it provides a set of locally optimal algorithms with quality guarantees as a fraction of the global optimum. Unfortunately, previous work in k-optimality did not address domains
where we may have prior knowledge of reward structure; and it
failed to provide quality guarantees or algorithms for domains with
hard constraints (such as agents’ local resource constraints). This
paper addresses these shortcomings with three key contributions.
It provides: (i) improved lower-bounds on k-optima quality incorporating available prior knowledge of reward structure; (ii) lower
bounds on k-optima quality for problems with hard constraints; and
(iii) k-optimal algorithms for solving DCOPs with hard constraints
and detailed experimental results on large-scale networks.
2008_1_teamcore_final.pdf A. Freedy, O. Sert, E. Freedy, G. Weltman, J. Mcdonough, Milind Tambe, and Tapana Gupta. 2008. “
Multiagent adjustable autonomy framework (MAAF) for multirobot, multihuman teams .” In International symposium on collaborative technologies (CTS).
Abstract: This paper describes the ongoing development of a
Multiagent Adjustable Autonomy Framework (MAAF) for
multi-robot, multi-human teams performing tactical
maneuvers. The challenge being addressed in this SBIR
Phase I R&D project is how to exploit fully the unique
capabilities of heterogeneous teams composed of a
mixture of Robots, Agents or Persons (RAPs): that is, how
to improve the safety, efficiency, reliability and cost of
achieving mission goals while maintaining dynamic
adaptation to the unique limitations and contingencies of
a real-world operating environment. Our response to this
challenge is the creation of a new infrastructure that will
facilitate cooperative and collaborative performance of
human and robots as equal team partners through the
application of advances in goal-oriented, multiagent
planning and coordination technology. At the heart of
our approach is the USC Teamcore Group’s Machinetta,
a state-of-the-art robot proxy framework with adjustable
autonomy. Machinetta facilitates robot-human role
allocation decisions and collaborative sharing of team
tasks in the non-deterministic and unpredictable military
environment through the use of a domain-independent
teamwork model that supports flexible teamwork. This
paper presents our innovative proxy architecture and its
constituent algorithms, and also describes our initial
demonstration of technical feasibility in a realistic
simulation scenario.
2008_11_teamcore_maaf_cts2008paper.pdf Janusz Marecki, Tapana Gupta, Pradeep Varakantham, Milind Tambe, and Makoto Yokoo. 2008. “
Not All Agents Are Equal: Scaling up Distributed POMDPs for Agent Networks .” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Abstract: Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents
have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS, increasing the number of agents
in distributed POMDPs for the first time into double digits. FANS
is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions:
(i) Not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics
to automatically vary the FSM size in different agents for scale-up; (ii) FANS illustrates efficient integration of its FSM-based policy search within algorithms that exploit agent network structure;
(iii) FANS provides significant speedups in policy evaluation and
heuristic computations within the network algorithms by exploiting the FSMs for dynamic programming. Experimental results
show not only orders of magnitude improvements over previous
best known algorithms for smaller-scale domains (with similar solution quality), but also a scale-up into double digits in terms of
numbers of agents.
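The finite state machine policies that FANS searches over can be pictured as a handful of controller nodes, each prescribing an action, with observation-driven transitions between nodes; varying the number of nodes varies the expressivity of the policy. The actions, observations, and transitions in the sketch below are invented for illustration.

```python
# Sketch of a finite-state-machine (FSM) policy: each controller node prescribes
# an action, and the received observation moves the controller to its next node.
# The node count is the knob FANS-style heuristics would vary per agent.
# Actions, observations, and transitions here are invented for illustration.

fsm_policy = {
    # node: (action, {observation: next_node})
    "n0": ("listen",     {"hear-left": "n1", "hear-right": "n2"}),
    "n1": ("open-right", {"hear-left": "n0", "hear-right": "n0"}),
    "n2": ("open-left",  {"hear-left": "n0", "hear-right": "n0"}),
}

def run_policy(observations, start="n0"):
    """Replay a sequence of observations through the FSM; return the actions taken."""
    node, actions = start, []
    for obs in observations:
        action, transitions = fsm_policy[node]
        actions.append(action)
        node = transitions[obs]
    return actions

print(run_policy(["hear-left", "hear-left", "hear-right"]))
# -> ['listen', 'open-right', 'listen']
```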
2008_5_teamcore_fans.pdf Janusz Marecki. 2008. “
Planning with Continuous Resources in Agent Systems ”.
Abstract: My research concentrates on developing reasoning techniques for intelligent, autonomous agent systems. In particular, I focus on planning techniques for both single and multi-agent systems acting in uncertain domains. In modeling these domains, I consider two types of uncertainty: (i) the outcomes of agent actions are uncertain, and (ii) the amount of resources consumed by agent actions is uncertain and only characterized by continuous probability density functions. Such rich domains, which range from Mars rover exploration to unmanned aerial surveillance to automated disaster rescue operations, are commonly modeled as continuous resource Markov decision processes (MDPs) that can then be solved in order to construct policies for agents acting in these domains. This thesis addresses two major unresolved problems in continuous resource MDPs. First, they are very difficult to solve, and existing algorithms are either fast but make additional restrictive assumptions about the model, or do not introduce these assumptions but are very inefficient. Second, the continuous resource MDP framework is not directly applicable to multi-agent systems, and current approaches all discretize resource levels or assume deterministic resource consumption, which automatically invalidates formal solution quality guarantees. The goal of my thesis is to fundamentally alter this landscape with three contributions:
I first introduce CPH, a fast analytic algorithm for solving continuous resource MDPs. CPH solves the planning problems at hand by first approximating, with a desired accuracy, the probability distributions over resource consumption with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs more efficiently than its closest competitor, and allows for a systematic trade-off of solution quality for speed. Second, to improve the anytime performance of CPH and other continuous resource MDP solvers, I introduce the DPFP algorithm. Rather than using value iteration to solve the problem at hand, DPFP performs a forward search in the corresponding dual space of cumulative distribution functions. In doing so, DPFP discriminates in its policy generation effort, providing only approximate policies for regions of the state space reachable with low probability, yet it bounds the error that such approximation entails. Third, I introduce CR-DEC-MDP, a framework for planning with continuous resources in multi-agent systems, and propose two algorithms for solving CR-DEC-MDPs: the first algorithm (VFP) emphasizes scalability. It performs a series of policy iterations in order to quickly find a locally optimal policy. In contrast, the second algorithm (M-DPFP) stresses optimality; it allows for a systematic trade-off of solution quality for speed by using the concept of DPFP in a multiagent setting. My results show up to three orders of magnitude speedups in solving single-agent planning problems and up to one order of magnitude speedup in solving multi-agent planning problems. Furthermore, I demonstrate the practical use of one of my algorithms in a large-scale disaster simulation, where it allows for a more efficient rescue operation.
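A very simple instance of the phase-type approximation that CPH relies on is moment-matching an Erlang distribution, a chain of identical exponential phases, to the mean and variance of a resource-consumption distribution. The statistics below are hypothetical, and the two-moment fit is only meant to illustrate the "exponential building blocks" idea.

```python
# Sketch of approximating a continuous resource-consumption distribution with a
# phase-type distribution built from exponential phases.  The simplest such
# family is the Erlang(k, rate) distribution (k identical exponential phases);
# matching its mean (k / rate) and variance (k / rate^2) to the target's first
# two moments gives k and rate.  This only illustrates the building-block idea.

def fit_erlang(mean, variance):
    k = max(1, round(mean ** 2 / variance))   # number of exponential phases
    rate = k / mean                           # rate shared by all phases
    return k, rate

# Hypothetical resource-consumption statistics for one action.
k, rate = fit_erlang(mean=4.0, variance=2.0)
print(f"Erlang approximation: {k} phases at rate {rate:.2f} per phase")
# The fit reproduces mean = k / rate = 4.0 and variance = k / rate**2 = 2.0.
```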
2008_15_teamcore_marecki_j4015754173.pdf Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “
Playing games with security: An efficient exact algorithm for Bayesian Stackelberg Games .” In International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
Abstract: In a class of games known as Stackelberg games, one agent (the
leader) must commit to a strategy that can be observed by the other
agent (the follower or adversary) before the adversary chooses its
own strategy. We consider Bayesian Stackelberg games, in which
the leader is uncertain about the types of adversary it may face.
Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling
certain areas, and a robber (follower) has a chance to observe this
strategy over time before choosing its own strategy of where to attack. This paper presents an efficient exact algorithm for finding the
optimal strategy for the leader to commit to in these games. This
algorithm, DOBSS, is based on a novel and compact mixed-integer
linear programming formulation. Compared to the most efficient
algorithm known previously for this problem, DOBSS is not only
faster, but also leads to higher quality solutions, and does not suffer from problems of infeasibility that were faced by this previous
algorithm. Note that DOBSS is at the heart of the ARMOR system
that is currently being tested for security scheduling at the Los Angeles International Airport.
2008_4_teamcore_aamas_2008.pdf Nathan Schurr, Janusz Marecki, and Milind Tambe. 2008. “
RIAACT: A robust approach to adjustable autonomy for human-multiagent teams .” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Abstract: When human-multiagent teams act in real-time uncertain domains,
adjustable autonomy (dynamic transferring of decisions between
human and agents) raises three key challenges. First, the human
and agents may differ significantly in their worldviews, leading to
inconsistencies in their decisions. Second, these human-multiagent
teams must operate and plan in real time, under deadlines, with uncertain durations of human actions. Third, adjustable autonomy in
teams is an inherently distributed and complex problem that cannot be solved optimally and completely online. To address these
challenges, our paper presents a solution for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT).
RIAACT incorporates models of the resolution of inconsistencies,
continuous time planning techniques, and a hybrid method to address
coordination complexity. These contributions have been realized in
a disaster response simulation system.
2008_6_teamcore_riaact.pdf Manish Jain, Fernando Ordonez, James Pita, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “
Robust Solutions in Stackelberg Games: Addressing Boundedly Rational Human Preference Models .” In Association for the Advancement of Artificial Intelligence 4th Multidisciplinary Workshop on Advances in Preference Handling.
Abstract: Stackelberg games represent an important class of games in
which one player, the leader, commits to a strategy and the
remaining players, the followers, make their decision with
knowledge of the leader’s commitment. Existing algorithms
for Bayesian Stackelberg games find optimal solutions while
modeling uncertainty over follower types with an a-priori
probability distribution. Unfortunately, in real-world applications, the leader may also face uncertainty over the follower’s response, which makes the optimality guarantees of
these algorithms fail. Such uncertainty arises because the
follower’s specific preferences or the follower’s observations
of the leader’s strategy may not align with the rational strategy, and it is not amenable to a-priori probability distributions. These conditions especially hold when dealing with
human subjects. To address these uncertainties while providing quality guarantees, we propose three new robust algorithms based on mixed-integer linear programs (MILPs) for
Bayesian Stackelberg games. A key result of this paper is
a detailed experimental analysis that demonstrates that these
new MILPs deal better with human responses: a conclusion
based on 800 games with 57 human subjects as followers. We
also provide run-time results on these MILPs.
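One simple way to picture the robustness concern raised above is to evaluate a leader strategy against the worst follower action whose payoff is within some epsilon of the follower's optimum, rather than against the exact best response. The sketch below illustrates only that evaluation, with hypothetical payoffs, and is not one of the paper's MILPs.

```python
# Illustration of the robustness concern (not one of the paper's MILPs): rather
# than assuming the follower plays its exact best response, evaluate the leader's
# mixed strategy against the worst follower action whose expected payoff is
# within epsilon of the follower's optimum.  All payoffs are hypothetical.

leader_payoff   = [[5, 1],
                   [2, 4]]
follower_payoff = [[2.9, 3.0],
                   [2.0, 1.0]]

def expected(payoff, x, j):
    return sum(x[i] * payoff[i][j] for i in range(len(x)))

def robust_leader_value(x, epsilon):
    cols = range(len(follower_payoff[0]))
    best = max(expected(follower_payoff, x, j) for j in cols)
    # Follower actions the leader cannot rule out under epsilon-rationality.
    plausible = [j for j in cols if expected(follower_payoff, x, j) >= best - epsilon]
    return min(expected(leader_payoff, x, j) for j in plausible)

x = [0.7, 0.3]
print(robust_leader_value(x, epsilon=0.0))   # perfectly rational follower -> about 4.1
print(robust_leader_value(x, epsilon=0.5))   # boundedly rational follower -> about 1.9
```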
2008_14_teamcore_aaai_preferences_2008.pdf Jonathan P. Pearce, Milind Tambe, and Rajiv T. Maheswaran. 2008. “
Solving multiagent networks using distributed constraint optimization .” AI Magazine 29 (3), Pp. 47-66.
Abstract: In many cooperative multiagent domains, the effect of local
interactions between agents can be compactly represented as
a network structure. Given that agents are spread across such
a network, agents directly interact only with a small group
of neighbors. A distributed constraint optimization problem
(DCOP) is a useful framework to reason about such networks
of agents. Given agents’ inability to communicate and collaborate in large groups in such networks, we focus on an
approach called k-optimality for solving DCOPs. In this approach, agents form groups of one or more agents until no
group of k or fewer agents can possibly improve the DCOP
solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal.
The article provides an overview of three key results related
to k-optimality. The first set of results are worst-case guarantees on the solution quality of k-optima in a DCOP. These
guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure,
for agents to use in situations where the cost of coordination
between agents must be weighed against the quality of the
solution reached. The second set of results are upper bounds
on the number of k-optima that can exist in a DCOP. These
results are useful in domains where a DCOP must generate a
set of solutions rather than a single solution. Finally, we sketch
algorithms for k-optimality and provide some experimental
results for 1-, 2- and 3-optimal algorithms for several types
of DCOPs.
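For the k = 1 case of the definition above, an assignment is k-optimal exactly when no single agent can raise the total reward by unilaterally changing its own value; the following sketch checks that condition on a made-up three-variable DCOP.

```python
# Sketch of the k-optimality condition for k = 1: an assignment is 1-optimal if
# no single agent can raise the total reward by changing only its own value.
# The constraint graph and reward tables below are made-up toy data.

domain = [0, 1]                          # each variable takes value 0 or 1
# Binary constraints: (i, j) -> reward table indexed by (value_i, value_j).
constraints = {
    (0, 1): [[5, 0], [0, 3]],
    (1, 2): [[5, 0], [0, 3]],
}

def total_reward(assignment):
    return sum(table[assignment[i]][assignment[j]]
               for (i, j), table in constraints.items())

def is_1_optimal(assignment):
    base = total_reward(assignment)
    for var in assignment:
        for v in domain:
            if v == assignment[var]:
                continue
            candidate = dict(assignment)
            candidate[var] = v
            if total_reward(candidate) > base:
                return False
    return True

print(is_1_optimal({0: 1, 1: 1, 2: 1}))   # True: reward 6, no single change helps
print(is_1_optimal({0: 0, 1: 0, 2: 0}))   # True: reward 10, also the global optimum
print(is_1_optimal({0: 0, 1: 1, 2: 1}))   # False: a single agent can gain by switching
```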
2008_2_teamcore_k_optimality_ai_mag_2.pdf