Publications

2009
Manish Jain, Matthew Taylor, Milind Tambe, and Makoto Yokoo. 2009. “DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks.” In International Joint Conference on Artificial Intelligence (IJCAI).
Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.
2009_10_teamcore_09ijcailandroids.pdf
James Pita, Manish Jain, Fernando Ordóñez, Milind Tambe, Sarit Kraus, and Reuma Magori-Cohen. 2009. “Effective Solutions for Real-World Stackelberg Games: When Agents Must Deal with Human Uncertainties.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
How do we build multiagent algorithms for agent interactions with human adversaries? Stackelberg games are natural models for many important applications that involve human interaction, such as oligopolistic markets and security domains. In Stackelberg games, one player, the leader, commits to a strategy and the follower makes its decision with knowledge of the leader’s commitment. Existing algorithms for Stackelberg games efficiently find optimal solutions (leader strategy), but they critically assume that the follower plays optimally. Unfortunately, in real-world applications, agents face human followers who — because of their bounded rationality and limited observation of the leader strategy — may deviate from their expected optimal response. Not taking into account these likely deviations when dealing with human adversaries can cause an unacceptable degradation in the leader’s reward, particularly in security applications where these algorithms have seen real-world deployment. To address this crucial problem, this paper introduces three new mixed-integer linear programs (MILPs) for Stackelberg games to consider human followers, incorporating: (i) novel anchoring theories on human perception of probability distributions and (ii) robustness approaches for MILPs to address human imprecision. Since these new approaches consider human followers, traditional proofs of correctness or optimality are insufficient; instead, it is necessary to rely on empirical validation. To that end, this paper considers two settings based on real deployed security systems, and compares 6 different approaches (three new and three previous approaches), in 4 different observability conditions, involving 98 human subjects playing 1360 games in total. The final conclusion was that a model which incorporates both the ideas of robustness and anchoring achieves statistically significantly better rewards and also maintains equivalent or faster solution speeds compared to existing approaches.
2009_2_teamcore_cobra.pdf
Pradeep Varakantham, Jun-young Kwak, Matthew Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. 2009. “Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping.” In International Conference on Automated Planning and Scheduling (ICAPS).
Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single-agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.
2009_12_teamcore_icaps09_tremor.pdf
Nathan Schurr, Janusz Marecki, and Milind Tambe. 2009. “Improving Adjustable Autonomy Strategies for Time-Critical Domains.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
As agents begin to perform complex tasks alongside humans as collaborative teammates, it becomes crucial that the resulting human-multiagent teams adapt to time-critical domains. In such domains, adjustable autonomy has proven useful by allowing for a dynamic transfer of control of decision making between humans and agents. However, existing adjustable autonomy algorithms commonly discretize time, which not only results in high algorithm runtimes but also translates into inaccurate transfer of control policies. In addition, existing techniques fail to address decision-making inconsistencies often encountered in human-multiagent decision making. To address these limitations, we present a novel approach for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT) that makes three contributions: First, we apply a continuous-time planning paradigm to adjustable autonomy, resulting in high-accuracy transfer of control policies. Second, our new adjustable autonomy framework both models and plans for the resolution of inconsistencies between human and agent decisions. Third, we introduce a new model, the Interruptible Action Time-dependent Markov Decision Problem (IA-TMDP), which allows for actions to be interrupted at any point in continuous time. We show how to solve IA-TMDPs efficiently and leverage them to plan for the resolution of inconsistencies in RIAACT. Furthermore, these contributions have been realized and evaluated in a complex disaster response simulation system.
2009_6_teamcore_schurr_aamas09.pdf
Jason Tsai, Shyamsunder Rathi, Christopher Kiekintveld, Fernando Ordonez, and Milind Tambe. 2009. “IRIS - A Tool for Strategic Security Allocation in Transportation Networks.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems - Industry Track.
Security is a concern of major importance to governments and companies throughout the world. With limited resources, complete coverage of potential points of attack is not possible. Deterministic allocation of available law enforcement agents introduces predictable vulnerabilities that can be exploited by adversaries. Strategic randomization is a game theoretic alternative that we implement in the Intelligent Randomization In Scheduling (IRIS) system, a software scheduling assistant for the Federal Air Marshals (FAMs) that provide law enforcement aboard U.S. commercial flights. In IRIS, we model the problem as a Stackelberg game, with FAMS as leaders that commit to a flight coverage schedule and terrorists as followers that attempt to attack a flight. The FAMS domain presents three challenges unique to transportation network security that we address in the implementation of IRIS. First, with tens of thousands of commercial flights per day, the size of the Stackelberg game we need to solve is tremendous. We use ERASER-C, the fastest known algorithm for solving this class of Stackelberg games. Second, creating the game itself becomes a challenge due to the number of payoffs we must enter for these large games. To address this, we create an attribute-based preference elicitation system to determine reward values. Third, the complex scheduling constraints in transportation networks make it computationally prohibitive to model the game by explicitly modeling all combinations of valid schedules. Instead, we model the leader’s strategy space by incorporating a representation of the underlying scheduling constraints. The scheduling assistant has been delivered to the FAMS and is currently undergoing testing and review for possible incorporation into their scheduling practices. In this paper, we discuss the design choices and challenges encountered during the implementation of IRIS.
2009_3_teamcore_aamas_09_industry.pdf
Zhengyu Yin, Christopher Kiekintveld, Atul Kumar, and Milind Tambe. 2009. “Local Optimal Solutions for DCOP: New Criteria, Bound, and Algorithm.” In AAMAS 2009 Workshop on Optimisation in Multi-Agent Systems (OptMas).
Distributed constraint optimization (DCOP) is a popular formalism for modeling cooperative multi-agent systems. In large-scale networks, finding a global optimum using complete algorithms is often impractical, which leads to the study of incomplete algorithms. Traditionally, incomplete algorithms can only find locally optimal solutions with no quality guarantees. Recent work on k-size-optimality has established bounds on solution quality, but size is not the only criterion for forming local optimization groups. In addition, there is only one algorithm for computing solutions for arbitrary k and it is quite inefficient. We introduce t-distance-optimality, which offers an alternative way to specify optimization groups. We establish bounds for this criterion that are often tighter than those for k-optimality. We then introduce an asynchronous local search algorithm for t-distance-optimality. We implement and evaluate the algorithm for both t- and k-optimality, which offers significant improvements over KOPT, the only existing algorithm for k-size-optimality. Our experiments show that t-distance-optimality converges more quickly and to better solutions than k-size-optimality in scale-free graphs, but k-size-optimality has advantages for random graphs.
2009_17_teamcore_topt_yin.pdf
Janusz Marecki and Milind Tambe. 2009. “Planning with Continuous Resources for Agent Teams.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Many problems of multiagent planning under uncertainty require distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are well-suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed and quality guarantees, or avoid discretization only by limiting agents’ action choices or interactions (e.g. assumption of transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources for agent teams, with three key features: (i) it maintains the agent team interaction graph to identify and prune suboptimal policies and to allow the agents to be transition dependent, (ii) it operates in a continuous space of probability functions to provide an error bound on the solution quality, and finally (iii) it focuses the search for policies on the most relevant parts of this search space to allow for a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high quality solutions and exhibits superior performance when compared with a discretization-based approach. We also show that M-DPFP is applicable to solving problems that are beyond the scope of existing approaches.
2009_4_teamcore_m_dpfp.pdf
J. Pita, M. Jain, C. Kiekintveld, H. Bellamane, J. Tsai, M. Tambe, and F. Ordonez. 2009. “Security Applications: Lessons of Real-World Deployment.” ACM SIGecom Exchanges, 8, 2.
Game theory has played an important role in security decisions. Recent work using Stackelberg games [Fudenberg and Tirole 1991] to model security domains has been particularly influential [Basilico et al. 2009; Kiekintveld et al. 2009; Paruchuri et al. 2008; Pita et al. 2008; Pita et al. 2009]. In a Stackelberg game, a leader (in this case the defender) acts first and commits to a randomized security policy. The follower (attacker) optimizes its reward considering the strategy chosen by the leader. These games are well-suited to representing the problem security forces face in allocating limited resources, such as officers, canine units, and checkpoints. In particular, the fact that the attacker is able to observe the policy reflects the way real terrorist organizations plan attacks using extensive surveillance and long planning cycles. Stackelberg game models are not just theoretical models; they are at the heart of deployed decision-support software now in use by the Los Angeles World Airports (LAWA) police and the United States Federal Air Marshals Service (FAMS). A new application is under development for the Transportation Security Administration (TSA), also using game-theoretic analysis. Moving from theoretical analysis to applying game theory in real applications posed many new challenges, and there remain many open questions to be solved in this exciting area of work. In this article we will highlight several of the main issues that have come up, including (i) developing efficient algorithms to solve large-scale Stackelberg Security Games, (ii) evaluating deployed security systems, (iii) knowledge acquisition from security experts to specify the game models, and (iv) handling mixed-initiative interactions. We begin with an overview of the deployed systems and then discuss these issues in turn.
2009_14_teamcore_usc1.pdf
Emma Bowring, Zhengyu Yin, Rob Zinkov, and Milind Tambe. 2009. “Sensitivity Analysis for Distributed Optimization with Resource Constraints.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Previous work in multiagent coordination has addressed the challenge of planning in domains where agents must optimize a global goal, while satisfying local resource constraints. However, the imposition of resource constraints naturally raises the question of whether the agents could significantly improve their team performance if a few more resources were made available. Sensitivity analysis aims to answer that question. This paper focuses on sensitivity analysis in the context of the distributed coordination framework, Multiply-Constrained DCOP (MC-DCOP). There are three main challenges in performing sensitivity analysis: (i) to perform it in a distributed fashion, (ii) to avoid re-solving an NP-hard MC-DCOP optimization from scratch, and (iii) to avoid considering unproductive uses for extra resources. To meet these challenges, this paper presents three types of locally optimal algorithms: link analysis, local reoptimization and local constraint propagation. These algorithms are distributed and avoid redundant computation by ascertaining just the effects of local perturbations on the original problem. Deploying our algorithms on a large number of MC-DCOP problems revealed several results. While our cheapest algorithm successfully identified quality improvements for a few problems, our more complex techniques were necessary to identify the best uses for additional resources. Furthermore, we identified two heuristics that can help identify a priori which agents might benefit most from additional resources: density rank, which works well when nodes received identical resources, and remaining resource rank, which works well when nodes received resources based on the number of neighbors they had.
2009_5_teamcore_bowring_aamas09.pdf
Jason Tsai, Zhengyu Yin, Jun-young Kwak, David Kempe, Christopher Kiekintveld, and Milind Tambe. 2009. “Strategic Security Placement in Network Domains with Applications to Transit Security.” In IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications.
Deterministic placement of security personnel creates serious vulnerabilities for any organization attempting to prevent intrusion. Recent work in use at the Los Angeles International Airport (LAX) and in progress with the United States Federal Air Marshal Service (FAMS) has applied game-theoretic analysis to the problem by modeling it as a Stackelberg game wherein security forces are the leaders that commit to a strategy that is observed and countered by attackers. In this work, we explore efficient techniques for performing the same analysis on games with a graph structure, wherein an attacker must follow a path from an entry point to a target. If we frame these problems in the straightforward manner with leader actions being sets of edges that can be guarded and follower actions being paths from entry to targets, the size of the game increases exponentially, quickly reaching memory limitations when using general Stackelberg solvers. In this work, we propose a novel linear program that is able to solve this type of problem efficiently. While it provides exact solutions for games where only one checkpoint is allowed, it is an approximation in the general case. Finally, we compare the performance of this and other methods by generating optimal policies for the Seoul Metropolitan Subway in Seoul, South Korea.
2009_16_teamcore_tsai.pdf
Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. 2009. “Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains.” In IJCAI 2009 Workshop on Distributed Constraint Reasoning (DCR 2009).
Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.
2009_15_teamcore_dcr09_taylor.pdf
James Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2009. “Using Game Theory for Los Angeles Airport Security.” AI Magazine 30 (1), Pp. 43-57.
Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR.
2009_9_teamcore_ai_magazine.pdf
P. Paruchuri, J. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus. 2009. “Coordinating Randomized Policies for Increasing Security of Agent Systems.” Journal of Information Technology and Management (ITM), 10, 1, Pp. 67-79.
We consider the problem of providing decision support to a patrolling or security service in an adversarial domain. The idea is to create patrols that can achieve a high level of coverage or reward while taking into account the presence of an adversary. We assume that the adversary can learn or observe the patrolling strategy and use this to its advantage. We follow two different approaches depending on what is known about the adversary. First, if there is no information about the adversary, we use a Markov Decision Process (MDP) to represent patrols and identify randomized solutions that minimize the information available to the adversary. This led to the development of the algorithms CRLP and BRLP, for policy randomization of MDPs. Second, when there is partial information about the adversary, we decide on efficient patrols by solving a Bayesian Stackelberg game. Here, the leader decides first on a patrolling strategy and then an adversary, of possibly many adversary types, selects its best response for the given patrol. We provide two efficient MIP formulations named DOBSS and ASAP to solve this NP-hard problem. Our experimental results show the efficiency of these algorithms and illustrate how these techniques provide optimal and secure patrolling policies. Note that DOBSS is at the heart of the ARMOR system that is currently deployed at the Los Angeles International airport (LAX) for randomizing checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. Key words: Multiagent Systems, Decision Theory, Game Theory, Security, Randomized Policies
2009_1_teamcore_waid_journal.pdf
2008
J. Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “ARMOR Security for Los Angeles International Airport.” In AAAI Intelligent Systems Demonstrations.
Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this demonstration showcases a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it exhibits a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals.
2008_9_teamcore_aaai_demo_08.pdf
Manish Jain, J. Pita, Milind Tambe, Fernando Ordonez, Praveen Paruchuri, and Sarit Kraus. 2008. “Bayesian Stackelberg Games and their Application for Security at Los Angeles International Airport.” ACM SIGecom Exchanges.
Many multiagent settings are appropriately modeled as Stackelberg games [Fudenberg and Tirole 1991; Paruchuri et al. 2007], where a leader commits to a strategy first, and then a follower selfishly optimizes its own reward, considering the action chosen by the leader. Stackelberg games are commonly used to model attacker-defender scenarios in security domains [Brown et al. 2006] as well as in patrolling [Paruchuri et al. 2007; Paruchuri et al. 2008]. For example, security personnel patrolling an infrastructure commit to a patrolling strategy first, before their adversaries act taking this committed strategy into account. Indeed, Stackelberg games are being used at the Los Angeles International Airport to schedule security checkpoints and canine patrols [Murr 2007; Paruchuri et al. 2008; Pita et al. 2008a]. They could potentially be used in network routing, pricing in transportation systems and many other situations [Korilis et al. 1997; Cardinal et al. 2005]. Although the follower in a Stackelberg game is allowed to observe the leader’s strategy before choosing its own strategy, there is often an advantage for the leader over the case where both players must choose their moves simultaneously. To see the advantage of being the leader in a Stackelberg game, consider the game with the payoff as shown in Table I. The leader is the row player and the follower is the column player. The only pure-strategy Nash equilibrium for this game is when the leader plays a and the follower plays c which gives the leader a payoff of 2. However, if the leader commits to a mixed strategy of playing a and b with equal (0.5) probability, then the follower will play d, leading to an expected payoff for the leader of 3.5.
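The commitment example above refers to Table I of the article, which is not reproduced in this listing. The payoff matrix below is an illustrative stand-in, an assumption chosen only to be consistent with the numbers quoted in the abstract rather than the article’s actual table; the leader is the row player and each cell gives (leader payoff, follower payoff).
\begin{array}{c|cc}
    & c       & d       \\ \hline
  a & (2,\,1) & (4,\,0) \\
  b & (1,\,0) & (3,\,2)
\end{array}
% Illustrative values only (not taken from the article's Table I).
% Check: against pure strategy a, the follower prefers c (1 > 0), so the pure
% Nash equilibrium (a, c) gives the leader 2. Under the commitment 0.5a + 0.5b,
% the follower expects 0.5(1) + 0.5(0) = 0.5 from c and 0.5(0) + 0.5(2) = 1 from d,
% so it plays d; the leader then expects 0.5(4) + 0.5(3) = 3.5.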
2008_12_teamcore_sigecom_article.pdf
J. Pita, Manish Jain, Craig Western, Christopher Portway, Milind Tambe, Fernando Ordonez, Sarit Kraus, and Praveen Paruchuri. 2008. “Deployed ARMOR Protection: The Application of a Game Theoretic Model for Security at the Los Angeles International Airport.” In International Joint Conference on Autonomous Agents and Multiagent Systems.
Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms – in fact, an algorithm that represents a culmination of research presented at AAMAS – into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines three key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints; (iii) It alerts the users if mixed-initiative overrides appear to degrade the overall desired randomization. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR.
2008_7_teamcore_amasind2008final.pdf
Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “Efficient Algorithms to Solve Bayesian Stackelberg Games for Security Applications.” In National Conference on Artificial Intelligence (AAAI).
In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the adversary/follower) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the type of the adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and an adversary (follower) can observe this strategy over time before choosing where to attack. We present here two different MIP formulations, ASAP (providing approximate policies with controlled randomization) and DOBSS (providing optimal policies), for Bayesian Stackelberg games. DOBSS is currently the fastest optimal procedure for Bayesian Stackelberg games and is in use by police at the Los Angeles International Airport (LAX) to schedule their activities.
2008_8_teamcore_nectar08.pdf
M. Tasaki, Y. Yabu, Y. Iwanari, M. Yokoo, M. Tambe, J. Marecki, and P. Varakantham. 2008. “Introducing Communication in Dis-POMDPs with Locality of Interaction.” In IEEE International Conference on Intelligent Agent Technology (IAT).
While Distributed POMDPs have become popular for modeling multiagent systems in uncertain domains, it is the Networked Distributed POMDPs (ND-POMDPs) model — a model tailored to real agent networks — that has begun to scale up the number of agents. However, prior work in ND-POMDPs has failed to address communication, a shortcoming that has the side-effect of limiting the planning horizon. In particular, without communication, the size of a local policy at each agent within the ND-POMDPs grows exponentially in the time horizon. To overcome this problem, we extend existing algorithms (LID-JESP and SPIDER) so that agents periodically communicate their observation and action histories with each other. After communication, agents can start from new synchronized belief states. While introducing communication avoids the exponential growth in the size of local policies at agents, the key idea is to avoid an exponential number of synchronized belief states after communication. To this end, we introduce an idea that is similar to the Point-based Value Iteration (PBVI) algorithm and approximate the value function with a fixed number of representative points and their α vectors. Our experimental results show that we can obtain much longer policies than existing algorithms as long as the interval between communications is small.
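As general PBVI background rather than the paper’s specific construction, the α-vector approximation mentioned above can be read as follows: the value function over belief states b is represented by a finite set Γ of α-vectors, one per representative belief point, and evaluated as
\[
  V(b) \;\approx\; \max_{\alpha \in \Gamma} \sum_{s} \alpha(s)\, b(s).
\]
Here s ranges over (joint) states; holding the number of representative points, and hence the size of Γ, fixed is what the abstract credits with avoiding an exponential number of synchronized belief states after communication.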
2008_13_teamcore_iat2008_0703.pdf
Emma Bowring, Jonathan P. Pearce, Christopher Portway, Manish Jain, and Milind Tambe. 2008. “On K-Optimal Distributed Constraint Optimization Algorithms: New Bounds and Algorithms.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems.
Distributed constraint optimization (DCOP) is a promising approach to coordination, scheduling and task allocation in multi-agent networks. In large-scale or low-bandwidth networks, finding the global optimum is often impractical. K-optimality is a promising new approach: for the first time it provides us with a set of locally optimal algorithms with quality guarantees as a fraction of the global optimum. Unfortunately, previous work in k-optimality did not address domains where we may have prior knowledge of reward structure, and it failed to provide quality guarantees or algorithms for domains with hard constraints (such as agents’ local resource constraints). This paper addresses these shortcomings with three key contributions. It provides: (i) improved lower bounds on k-optima quality incorporating available prior knowledge of reward structure; (ii) lower bounds on k-optima quality for problems with hard constraints; and (iii) k-optimal algorithms for solving DCOPs with hard constraints and detailed experimental results on large-scale networks.
2008_1_teamcore_final.pdf
A. Freedy, O. Sert, E. Freedy, G. Weltman, J. Mcdonough, Milind Tambe, and Tapana Gupta. 2008. “Multiagent Adjustable Autonomy Framework (MAAF) for Multirobot, Multihuman Teams.” In International Symposium on Collaborative Technologies (CTS).
This paper describes the ongoing development of a Multiagent Adjustable Autonomy Framework (MAAF) for multi-robot, multi-human teams performing tactical maneuvers. The challenge being addressed in this SBIR Phase I R&D project is how to exploit fully the unique capabilities of heterogeneous teams composed of a mixture of Robots, Agents or Persons (RAPs): that is, how to improve the safety, efficiency, reliability and cost of achieving mission goals while maintaining dynamic adaptation to the unique limitations and contingencies of a real-world operating environment. Our response to this challenge is the creation of a new infrastructure that will facilitate cooperative and collaborative performance of human and robots as equal team partners through the application of advances in goal-oriented, multiagent planning and coordination technology. At the heart of our approach is the USC Teamcore Group’s Machinetta, a state-of-the-art robot proxy framework with adjustable autonomy. Machinetta facilitates robot-human role allocation decisions and collaborative sharing of team tasks in the non-deterministic and unpredictable military environment through the use of a domain-independent teamwork model that supports flexible teamwork. This paper presents our innovative proxy architecture and its constituent algorithms, and also describes our initial demonstration of technical feasibility in a realistic simulation scenario.
2008_11_teamcore_maaf_cts2008paper.pdf
