Publications

2009
Janusz Marecki and Milind Tambe. 2009. “Planning with Continuous Resources for Agent Teams.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Many problems of multiagent planning under uncertainty require distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are well-suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed and quality guarantees, or avoid discretization only by limiting agents’ action choices or interactions (e.g. assumption of transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources for agent teams, with three key features: (i) it maintains the agent team interaction graph to identify and prune suboptimal policies and to allow the agents to be transition dependent, (ii) it operates in a continuous space of probability functions to provide an error bound on the solution quality and finally (iii) it focuses the search for policies on the most relevant parts of this search space to allow for a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high-quality solutions and exhibits superior performance when compared with a discretization-based approach. We also show that M-DPFP is applicable to solving problems that are beyond the scope of existing approaches.
2009_4_teamcore_m_dpfp.pdf
J. Pita, M. Jain, C. Kiekintveld, H. Bellamane, J. Tsai, M. Tambe, and F. Ordonez. 2009. “Security Applications: Lessons of Real-World Deployment.” ACM SIGecom Exchanges, 8, 2.
Game theory has played an important role in security decisions. Recent work using Stackelberg games [Fudenberg and Tirole 1991] to model security domains has been particularly influential [Basilico et al. 2009; Kiekintveld et al. 2009; Paruchuri et al. 2008; Pita et al. 2008; Pita et al. 2009]. In a Stackelberg game, a leader (in this case the defender) acts first and commits to a randomized security policy. The follower (attacker) optimizes its reward considering the strategy chosen by the leader. These games are well-suited to representing the problem security forces face in allocating limited resources, such as officers, canine units, and checkpoints. In particular, the fact that the attacker is able to observe the policy reflects the way real terrorist organizations plan attacks using extensive surveillance and long planning cycles. Stackelberg game models are not just theoretical models; they are at the heart of deployed decision-support software now in use by the Los Angeles World Airports (LAWA) police and the United States Federal Air Marshals Service (FAMS). A new application is under development for the Transportation Security Administration (TSA), also using game-theoretic analysis. Moving from theoretical analysis to applying game theory in real applications posed many new challenges, and there remain many open questions to be solved in this exciting area of work. In this article we will highlight several of the main issues that have come up, including (i) developing efficient algorithms to solve large-scale Stackelberg Security Games, (ii) evaluating deployed security systems, (iii) knowledge acquisition from security experts to specify the game models, and (iv) handling mixed-initiative interactions. We begin with an overview of the deployed systems and then discuss these issues in turn.
2009_14_teamcore_usc1.pdf
Emma Bowring, Zhengyu Yin, Rob Zinkov, and Milind Tambe. 2009. “Sensitivity analysis for distributed optimization with resource constraints.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Previous work in multiagent coordination has addressed the challenge of planning in domains where agents must optimize a global goal, while satisfying local resource constraints. However, the imposition of resource constraints naturally raises the question of whether the agents could significantly improve their team performance if a few more resources were made available. Sensitivity analysis aims to answer that question. This paper focuses on sensitivity analysis in the context of the distributed coordination framework, Multiply-Constrained DCOP (MC-DCOP). There are three main challenges in performing sensitivity analysis: (i) to perform it in a distributed fashion, (ii) to avoid re-solving an NP-hard MC-DCOP optimization from scratch, and (iii) to avoid considering unproductive uses for extra resources. To meet these challenges, this paper presents three types of locally optimal algorithms: link analysis, local reoptimization and local constraint propagation. These algorithms are distributed and avoid redundant computation by ascertaining just the effects of local perturbations on the original problem. Deploying our algorithms on a large number of MC-DCOP problems revealed several results. While our cheapest algorithm successfully identified quality improvements for a few problems, our more complex techniques were necessary to identify the best uses for additional resources. Furthermore, we identified two heuristics that can help identify a priori which agents might benefit most from additional resources: density rank, which works well when nodes received identical resources, and remaining resource rank, which works well when nodes received resources based on the number of neighbors they had.
2009_5_teamcore_bowring_aamas09.pdf
Jason Tsai, Zhengyu Yin, Jun-young Kwak, David Kempe, Christopher Kiekintveld, and Milind Tambe. 2009. “Strategic Security Placement in Network Domains with Applications to Transit Security.” In IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications.
Deterministic placement of security personnel creates serious vulnerabilities for any organization attempting to prevent intrusion. Recent work in use at the Los Angeles International Airport (LAX) and in progress with the United States Federal Air Marshal Service (FAMS) has applied game-theoretic analysis to the problem by modeling it as a Stackelberg game wherein security forces are the leaders that commit to a strategy that is observed and countered by attackers. In this work, we explore efficient techniques for performing the same analysis on games with a graph structure, wherein an attacker must follow a path from an entry point to a target. If we frame these problems in the straightforward manner with leader actions being sets of edges that can be guarded and follower actions being paths from entry to targets, the size of the game increases exponentially, quickly reaching memory limitations when using general Stackelberg solvers. Here, we propose a novel linear program that solves this type of problem efficiently. While it provides exact solutions for games where only one checkpoint is allowed, it is an approximation in the general case. Finally, we compare the performance of this and other methods by generating optimal policies for the Seoul Metropolitan Subway in Seoul, South Korea.
2009_16_teamcore_tsai.pdf
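The single-checkpoint case described in the abstract above can be viewed, in a simplified form, as a zero-sum game between a defender who randomizes one checkpoint over edges and an attacker who picks a path from an entry point to a target. The sketch below illustrates only that simplified view, not the paper's formulation: the toy graph, the hard-coded path set, the use of scipy, and the assumption that all targets are equally valuable (so the defender just maximizes the worst-case interception probability) are assumptions made here for illustration.

```python
# Hedged sketch: single-checkpoint placement on a toy graph, posed as a small LP.
# Graph, paths, and the equal-value-targets assumption are illustrative only.
import numpy as np
from scipy.optimize import linprog

edges = ["e1", "e2", "e3", "e4"]
# Attacker paths from the entry point to a target, given as sets of edges.
paths = [{"e1", "e3"}, {"e1", "e4"}, {"e2", "e4"}]

# Variables: one probability per edge, plus t = worst-case interception probability.
# Maximize t  <=>  minimize -t.
n = len(edges)
c = np.zeros(n + 1)
c[-1] = -1.0

# For each path P: t - sum_{e in P} x_e <= 0  (interception prob on P is at least t).
A_ub = np.zeros((len(paths), n + 1))
for r, path in enumerate(paths):
    for i, e in enumerate(edges):
        if e in path:
            A_ub[r, i] = -1.0
    A_ub[r, -1] = 1.0
b_ub = np.zeros(len(paths))

A_eq = np.ones((1, n + 1))
A_eq[0, -1] = 0.0                      # checkpoint probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * (n + 1)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(dict(zip(edges, res.x[:n].round(3))), "worst-case interception:", round(-res.fun, 3))
```

On this toy instance one optimal placement splits the checkpoint probability evenly between the two edges that each cover two of the three paths, guaranteeing interception with probability 0.5 whichever path the attacker takes (the solver may return a different but equally good solution).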
Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. 2009. “Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains.” In IJCAI 2009 Workshop on Distributed Constraint Reasoning (DCR 2009).
Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.
2009_15_teamcore_dcr09_taylor.pdf
James Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2009. “Using Game Theory for Los Angeles Airport Security.” AI Magazine 30 (1), Pp. 43-57.
Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR.
2009_9_teamcore_ai_magazine.pdf
P. Paruchuri, J. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus. 2009. “Coordinating randomized policies for increasing security of agent systems.” Journal of Information Technology and Management (ITM), 10, 1, Pp. 67-79.
We consider the problem of providing decision support to a patrolling or security service in an adversarial domain. The idea is to create patrols that can achieve a high level of coverage or reward while taking into account the presence of an adversary. We assume that the adversary can learn or observe the patrolling strategy and use this to its advantage. We follow two different approaches depending on what is known about the adversary. If there is no information about the adversary, we use a Markov Decision Process (MDP) to represent patrols and identify randomized solutions that minimize the information available to the adversary. This led to the development of the algorithms CRLP and BRLP for policy randomization of MDPs. Second, when there is partial information about the adversary, we decide on efficient patrols by solving a Bayesian Stackelberg game. Here, the leader decides first on a patrolling strategy and then an adversary, one of possibly many adversary types, selects its best response for the given patrol. We provide two efficient MIP formulations named DOBSS and ASAP to solve this NP-hard problem. Our experimental results show the efficiency of these algorithms and illustrate how these techniques provide optimal and secure patrolling policies. Note that DOBSS is at the heart of the ARMOR system that is currently deployed at the Los Angeles International Airport (LAX) for randomizing checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. Key words: Multiagent Systems, Decision Theory, Game Theory, Security, Randomized Policies
2009_1_teamcore_waid_journal.pdf
2008
J. Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “ARMOR Security for Los Angeles International Airport.” In AAAI Intelligent Systems Demonstrations.
Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this demonstration showcases a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it exhibits a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals.
2008_9_teamcore_aaai_demo_08.pdf
Manish Jain, J. Pita, Milind Tambe, Fernando Ordonez, Praveen Paruchuri, and Sarit Kraus. 2008. “Bayesian Stackelberg Games and their Application for Security at Los Angeles International Airport.” In SIGecom Exchanges.
Many multiagent settings are appropriately modeled as Stackelberg games [Fudenberg and Tirole 1991; Paruchuri et al. 2007], where a leader commits to a strategy first, and then a follower selfishly optimizes its own reward, considering the action chosen by the leader. Stackelberg games are commonly used to model attacker-defender scenarios in security domains [Brown et al. 2006] as well as in patrolling [Paruchuri et al. 2007; Paruchuri et al. 2008]. For example, security personnel patrolling an infrastructure commit to a patrolling strategy first, before their adversaries act taking this committed strategy into account. Indeed, Stackelberg games are being used at the Los Angeles International Airport to schedule security checkpoints and canine patrols [Murr 2007; Paruchuri et al. 2008; Pita et al. 2008a]. They could potentially be used in network routing, pricing in transportation systems and many other situations [Korilis et al. 1997; Cardinal et al. 2005]. Although the follower in a Stackelberg game is allowed to observe the leader’s strategy before choosing its own strategy, there is often an advantage for the leader over the case where both players must choose their moves simultaneously. To see the advantage of being the leader in a Stackelberg game, consider the game with the payoff as shown in Table I. The leader is the row player and the follower is the column player. The only pure-strategy Nash equilibrium for this game is when the leader plays a and the follower plays c which gives the leader a payoff of 2. However, if the leader commits to a mixed strategy of playing a and b with equal (0.5) probability, then the follower will play d, leading to an expected payoff for the leader of 3.5.
2008_12_teamcore_sigecom_article.pdf
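To make the leader-advantage arithmetic in the abstract above concrete, here is a minimal sketch that computes the follower's best response to a committed leader mixed strategy and the leader's resulting expected payoff. The payoff entries below are assumptions chosen only to be consistent with the numbers quoted in the abstract (the pure-strategy Nash equilibrium gives the leader 2, while committing to a 50/50 mix yields 3.5); they are not taken from the paper's Table I.

```python
# Hedged sketch of the 2x2 Stackelberg example discussed in the abstract above.
# payoffs[leader_action][follower_action] = (leader_payoff, follower_payoff)
# These values are assumptions consistent with the abstract's quoted numbers.
payoffs = {
    "a": {"c": (2, 1), "d": (4, 0)},
    "b": {"c": (1, 0), "d": (3, 2)},
}

def follower_best_response(leader_mix):
    """Follower picks the column maximizing its expected payoff against the mix."""
    def follower_value(col):
        return sum(p * payoffs[row][col][1] for row, p in leader_mix.items())
    return max(payoffs["a"], key=follower_value)

def leader_value(leader_mix):
    """Leader's expected payoff once the follower best-responds to the commitment."""
    col = follower_best_response(leader_mix)
    return sum(p * payoffs[row][col][0] for row, p in leader_mix.items())

print(leader_value({"a": 1.0, "b": 0.0}))   # pure Nash outcome (a, c): prints 2.0
print(leader_value({"a": 0.5, "b": 0.5}))   # 50/50 commitment: follower plays d, prints 3.5
```

Running it reproduces the advantage of committing first: 2.0 for the pure Nash outcome and 3.5 for the 50/50 commitment.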
J. Pita, Manish Jain, Craig Western, Christopher Portway, Milind Tambe, Fernando Ordonez, Sarit Kraus, and Praveen Paruchuri. 2008. “Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport.” In International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms – in fact, an algorithm that represents a culmination of research presented at AAMAS – into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines three key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints; (iii) It alerts the users if mixed-initiative overrides appear to degrade the overall desired randomization. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR.
2008_7_teamcore_amasind2008final.pdf
Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “Efficient Algorithms to solve Bayesian Stackelberg Games for Security Applications.” In National Conference on Artificial Intelligence (AAAI).
In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the adversary/follower) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the type of the adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and an adversary (follower) can observe this strategy over time before choosing where to attack. We present here two different MIP formulations, ASAP (providing approximate policies with controlled randomization) and DOBSS (providing optimal policies), for Bayesian Stackelberg games. DOBSS is currently the fastest optimal procedure for Bayesian Stackelberg games and is in use by police at the Los Angeles International Airport (LAX) to schedule their activities.
2008_8_teamcore_nectar08.pdf
M. Tasaki, Y. Yabu, Y. Iwanari, M. Yokoo, M. Tambe, J. Marecki, and P. Varakantham. 2008. “Introducing Communication in Dis-POMDPs with Locality of Interaction.” In IEEE International Conference on Intelligent Agent Technology (IAT).
While Distributed POMDPs have become popular for modeling multiagent systems in uncertain domains, it is the Networked Distributed POMDPs (ND-POMDPs) model, a model tailored to real agent networks, that has begun to scale up the number of agents. However, prior work in ND-POMDPs has failed to address communication, a shortcoming that has the side-effect of limiting the planning horizon. In particular, without communication, the size of a local policy at each agent within the ND-POMDPs grows exponentially in the time horizon. To overcome this problem, we extend existing algorithms (LID-JESP and SPIDER) so that agents periodically communicate their observation and action histories with each other. After communication, agents can start from new synchronized belief states. While introducing communication allows us to avoid the exponential growth in the size of local policies at agents, the key idea is to avoid an exponential number of synchronized belief states after communication. To this end, we introduce an idea that is similar to the Point-based Value Iteration (PBVI) algorithm and approximate the value function with a fixed number of representative points and their α vectors. Our experimental results show that we can obtain much longer policies than existing algorithms as long as the interval between communications is small.
2008_13_teamcore_iat2008_0703.pdf
Emma Bowring, Jonathan P. Pearce, Christopher Portway, Manish Jain, and Milind Tambe. 2008. “On K-Optimal Distributed Constraint Optimization Algorithms: New Bounds and Algorithms.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Distributed constraint optimization (DCOP) is a promising approach to coordination, scheduling and task allocation in multiagent networks. In large-scale or low-bandwidth networks, finding the global optimum is often impractical. K-optimality is a promising new approach: for the first time it provides a set of locally optimal algorithms with quality guarantees as a fraction of the global optimum. Unfortunately, previous work in k-optimality did not address domains where we may have prior knowledge of reward structure, and it failed to provide quality guarantees or algorithms for domains with hard constraints (such as agents’ local resource constraints). This paper addresses these shortcomings with three key contributions. It provides: (i) improved lower bounds on k-optima quality incorporating available prior knowledge of reward structure; (ii) lower bounds on k-optima quality for problems with hard constraints; and (iii) k-optimal algorithms for solving DCOPs with hard constraints and detailed experimental results on large-scale networks.
2008_1_teamcore_final.pdf
A. Freedy, O. Sert, E. Freedy, G. Weltman, J. Mcdonough, Milind Tambe, and Tapana Gupta. 2008. “Multiagent adjustable autonomy framework (MAAF) for multirobot, multihuman teams.” In International Symposium on Collaborative Technologies (CTS).
This paper describes the ongoing development of a Multiagent Adjustable Autonomy Framework (MAAF) for multi-robot, multi-human teams performing tactical maneuvers. The challenge being addressed in this SBIR Phase I R&D project is how to exploit fully the unique capabilities of heterogeneous teams composed of a mixture of Robots, Agents or Persons (RAPs): that is, how to improve the safety, efficiency, reliability and cost of achieving mission goals while maintaining dynamic adaptation to the unique limitations and contingencies of a real-world operating environment. Our response to this challenge is the creation of a new infrastructure that will facilitate cooperative and collaborative performance of human and robots as equal team partners through the application of advances in goal-oriented, multiagent planning and coordination technology. At the heart of our approach is the USC Teamcore Group’s Machinetta, a state-of-the-art robot proxy framework with adjustable autonomy. Machinetta facilitates robot-human role allocation decisions and collaborative sharing of team tasks in the non-deterministic and unpredictable military environment through the use of a domain-independent teamwork model that supports flexible teamwork. This paper presents our innovative proxy architecture and its constituent algorithms, and also describes our initial demonstration of technical feasibility in a realistic simulation scenario.
2008_11_teamcore_maaf_cts2008paper.pdf
Janusz Marecki, Tapana Gupta, Pradeep Varakantham, Milind Tambe, and Makoto Yokoo. 2008. “Not All Agents Are Equal: Scaling up Distributed POMDPs for Agent Networks.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS, increasing the number of agents in distributed POMDPs for the first time into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) Not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up; (ii) FANS illustrates efficient integration of its FSM-based policy search within algorithms that exploit agent network structure; (iii) FANS provides significant speedups in policy evaluation and heuristic computations within the network algorithms by exploiting the FSMs for dynamic programming. Experimental results show not only orders of magnitude improvements over previous best known algorithms for smaller-scale domains (with similar solution quality), but also a scale-up into double digits in terms of numbers of agents.
2008_5_teamcore_fans.pdf
Janusz Marecki. 2008. “Planning with Continuous Resources in Agent Systems.”

My research concentrates on developing reasoning techniques for intelligent, autonomous agent systems. In particular, I focus on planning techniques for both single and multi-agent systems acting in uncertain domains. In modeling these domains, I consider two types of uncertainty: (i) the outcomes of agent actions are uncertain and (ii) the amount of resources consumed by agent actions is uncertain and only characterized by continuous probability density functions. Such rich domains, which range from Mars rover exploration to unmanned aerial surveillance to automated disaster rescue operations, are commonly modeled as continuous resource Markov decision processes (MDPs) that can then be solved in order to construct policies for agents acting in these domains. This thesis addresses two major unresolved problems in continuous resource MDPs. First, they are very difficult to solve and existing algorithms are either fast, but make additional restrictive assumptions about the model, or do not introduce these assumptions but are very inefficient. Second, the continuous resource MDP framework is not directly applicable to multi-agent systems, and current approaches all discretize resource levels or assume deterministic resource consumption, which automatically invalidates the formal solution quality guarantees. The goal of my thesis is to fundamentally alter this landscape with three contributions:

I first introduce CPH, a fast analytic algorithm for solving continuous resource MDPs. CPH solves the planning problems at hand by first approximating, to a desired accuracy, the probability distributions over resource consumption with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs more efficiently than its closest competitor, and allows for a systematic trade-off of solution quality for speed. Second, to improve the anytime performance of CPH and other continuous resource MDP solvers, I introduce the DPFP algorithm. Rather than using value iteration to solve the problem at hand, DPFP performs a forward search in the corresponding dual space of cumulative distribution functions. In doing so, DPFP discriminates in its policy generation effort, providing only approximate policies for regions of the state space reachable with low probability, yet it bounds the error that such approximation entails. Third, I introduce CR-DEC-MDP, a framework for planning with continuous resources in multi-agent systems, and propose two algorithms for solving CR-DEC-MDPs. The first algorithm (VFP) emphasizes scalability; it performs a series of policy iterations in order to quickly find a locally optimal policy. In contrast, the second algorithm (M-DPFP) stresses optimality; it allows for a systematic trade-off of solution quality for speed by using the concept of DPFP in a multiagent setting. My results show up to three orders of magnitude speedups in solving single agent planning problems and up to one order of magnitude speedup in solving multi-agent planning problems. Furthermore, I demonstrate the practical use of one of my algorithms in a large-scale disaster simulation where it allows for a more efficient rescue operation.

2008_15_teamcore_marecki_j4015754173.pdf
Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “Playing games with security: An efficient exact algorithm for Bayesian Stackelberg Games.” In International Joint Conference on Autonomous Agents and Multiagent Systems, 2008.
In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the follower or adversary) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the types of adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and a robber (follower) has a chance to observe this strategy over time before choosing its own strategy of where to attack. This paper presents an efficient exact algorithm for finding the optimal strategy for the leader to commit to in these games. This algorithm, DOBSS, is based on a novel and compact mixed-integer linear programming formulation. Compared to the most efficient algorithm known previously for this problem, DOBSS is not only faster, but also leads to higher quality solutions, and does not suffer from problems of infeasibility that were faced by this previous algorithm. Note that DOBSS is at the heart of the ARMOR system that is currently being tested for security scheduling at the Los Angeles International Airport.
2008_4_teamcore_aamas_2008.pdf
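As a rough illustration of what "the optimal strategy for the leader to commit to" means in the abstract above, the sketch below solves a plain Stackelberg game with a single adversary type by the classic approach of solving one small LP per candidate follower response and keeping the best feasible one. This is not DOBSS, which handles multiple Bayesian adversary types in a single compact mixed-integer linear program; the payoff matrices and the use of scipy are assumptions made only for this illustration.

```python
# Hedged sketch: optimal leader commitment for a single-adversary-type Stackelberg
# game via the "one LP per follower response" baseline (not the DOBSS MILP).
import numpy as np
from scipy.optimize import linprog

R_leader = np.array([[2.0, 4.0],
                     [1.0, 3.0]])      # rows: leader actions, cols: follower actions
R_follower = np.array([[1.0, 0.0],
                       [0.0, 2.0]])    # illustrative assumed payoffs

def optimal_commitment(R_leader, R_follower):
    """Best leader mixed strategy, assuming the follower best-responds to it."""
    n_rows, n_cols = R_leader.shape
    best = (None, -np.inf, None)
    for j in range(n_cols):                       # guess the follower's best response j
        c = -R_leader[:, j]                       # maximize leader payoff against column j
        # Feasibility: j must really be a best response to the leader's mix x,
        # i.e. for every j': sum_i x_i * (R_f(i, j') - R_f(i, j)) <= 0.
        A_ub = (R_follower - R_follower[:, [j]]).T
        b_ub = np.zeros(n_cols)
        A_eq, b_eq = np.ones((1, n_rows)), np.array([1.0])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, 1)] * n_rows)
        if res.success and -res.fun > best[1]:
            best = (res.x, -res.fun, j)
    return best

x, value, j = optimal_commitment(R_leader, R_follower)
print(x.round(3), round(value, 3), j)   # roughly [0.667 0.333], 3.667, follower column 1
```

For these assumed payoffs the optimal commitment is roughly a (2/3, 1/3) mix with leader value about 3.67; the follower is then indifferent between its two responses and, as is standard for strong Stackelberg equilibria, is assumed to break ties in the leader's favor.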
Nathan Schurr, Janusz Marecki, and Milind Tambe. 2008. “RIAACT: A robust approach to adjustable autonomy for human-multiagent teams.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
When human-multiagent teams act in real-time uncertain domains, adjustable autonomy (dynamic transferring of decisions between human and agents) raises three key challenges. First, the human and agents may differ significantly in their worldviews, leading to inconsistencies in their decisions. Second, these human-multiagent teams must operate and plan in real time under deadlines, with uncertain durations of human actions. Third, adjustable autonomy in teams is an inherently distributed and complex problem that cannot be solved optimally and completely online. To address these challenges, our paper presents a solution for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT). RIAACT incorporates models of the resolution of inconsistencies, continuous time planning techniques, and a hybrid method to address coordination complexity. These contributions have been realized in a disaster response simulation system.
2008_6_teamcore_riaact.pdf
Manish Jain, Fernando Ordonez, James Pita, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “Robust Solutions in Stackelberg Games: Addressing Boundedly Rational Human Preference Models.” In Association for the Advancement of Artificial Intelligence 4th Multidisciplinary Workshop on Advances in Preference Handling.
Stackelberg games represent an important class of games in which one player, the leader, commits to a strategy and the remaining players, the followers, make their decision with knowledge of the leader’s commitment. Existing algorithms for Bayesian Stackelberg games find optimal solutions while modeling uncertainty over follower types with an a priori probability distribution. Unfortunately, in real-world applications, the leader may also face uncertainty over the follower’s response, which causes the optimality guarantees of these algorithms to fail. Such uncertainty arises because the follower’s specific preferences or the follower’s observations of the leader’s strategy may not align with the rational strategy, and it is not amenable to a priori probability distributions. These conditions especially hold when dealing with human subjects. To address these uncertainties while providing quality guarantees, we propose three new robust algorithms based on mixed-integer linear programs (MILPs) for Bayesian Stackelberg games. A key result of this paper is a detailed experimental analysis that demonstrates that these new MILPs deal better with human responses: a conclusion based on 800 games with 57 human subjects as followers. We also provide run-time results on these MILPs.
2008_14_teamcore_aaai_preferences_2008.pdf
Jonathan P. Pearce, Milind Tambe, and Rajiv T. Maheswaran. 2008. “Solving multiagent networks using distributed constraint optimization.” AI Magazine 29 (3), Pp. 47-66.
In many cooperative multiagent domains, the effect of local interactions between agents can be compactly represented as a network structure. Given that agents are spread across such a network, agents directly interact only with a small group of neighbors. A distributed constraint optimization problem (DCOP) is a useful framework to reason about such networks of agents. Given agents’ inability to communicate and collaborate in large groups in such networks, we focus on an approach called k-optimality for solving DCOPs. In this approach, agents form groups of one or more agents until no group of k or fewer agents can possibly improve the DCOP solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal. The article provides an overview of three key results related to k-optimality. The first set of results are worst-case guarantees on the solution quality of k-optima in a DCOP. These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. The second set of results are upper bounds on the number of k-optima that can exist in a DCOP. These results are useful in domains where a DCOP must generate a set of solutions rather than a single solution. Finally, we sketch algorithms for k-optimality and provide some experimental results for 1-, 2- and 3-optimal algorithms for several types of DCOPs.
2008_2_teamcore_k_optimality_ai_mag_2.pdf
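To make the definition of k-optimality in the abstract above concrete, here is a small brute-force checker: an assignment is k-optimal if no group of k or fewer agents can improve the total reward by jointly changing their values while all other agents stay fixed. The three-agent chain, binary domains, and reward numbers below are illustrative assumptions, not an example from the article.

```python
# Hedged sketch: brute-force k-optimality check on a toy binary DCOP.
# The constraint graph and payoffs are illustrative assumptions.
from itertools import combinations, product

# Pairwise rewards: reward[(i, j)][(vi, vj)] for the edges of a small chain 0-1-2.
reward = {
    (0, 1): {(0, 0): 5, (0, 1): 0, (1, 0): 0, (1, 1): 3},
    (1, 2): {(0, 0): 5, (0, 1): 0, (1, 0): 0, (1, 1): 3},
}

def total_reward(assignment):
    return sum(tbl[(assignment[i], assignment[j])] for (i, j), tbl in reward.items())

def is_k_optimal(assignment, k, domain=(0, 1)):
    """True if no group of k or fewer agents can improve the reward by deviating."""
    agents = sorted(assignment)
    for size in range(1, k + 1):
        for group in combinations(agents, size):
            for values in product(domain, repeat=size):
                candidate = dict(assignment)
                candidate.update(zip(group, values))
                if total_reward(candidate) > total_reward(assignment):
                    return False
    return True

a = {0: 1, 1: 1, 2: 1}   # both constraints pay 3, total reward 6
print(total_reward(a), is_k_optimal(a, 1), is_k_optimal(a, 3))   # 6 True False
```

On this instance the all-ones assignment is 1-optimal (no single agent gains by switching alone, and in fact no pair does either) but not 3-optimal, since all three agents switching to zero together would raise the reward from 6 to 10.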
