Matthew E. Taylor, Chris Kiekintveld, Craig Western, and Milind Tambe. 2009. “Is There a Chink in Your ARMOR? Towards Robust Evaluations for Deployed Security Systems.” In IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications (QRASA).
A growing number of security applications, designed to reduce risk from adversaries’ actions, are being developed and deployed. However, there are many challenges when attempting to evaluate such systems, both in the lab and in the real world. Traditional evaluations used by computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant. The primary contribution of this paper is to provide a preliminary framework which can guide the evaluation of such systems and to apply the framework to the evaluation of ARMOR (a system deployed at LAX since August 2007). This framework helps determine what experiments could, and should, be run in order to measure a system’s overall utility. A secondary contribution of this paper is to help familiarize our community with some of the difficulties inherent in evaluating deployed applications, focusing on those in security domains.
Jason Tsai, Emma Bowring, Shira Epstein, Natalie Fridman, Prakhar Garg, Gal Kaminka, Andrew Ogden, Milind Tambe, and Matthew Taylor. 2009. “Agent-based Evacuation Modeling: Simulating the Los Angeles International Airport.” In Workshop on Emergency Management: Incident, Resource, and Supply Chain Management (EMWS09).
In the aftermath of a terrorist attack on a large public venue such as an airport, a train station, or a theme park, rapid but safe evacuation is critical. For example, multiple IED explosions at the Los Angeles International Airport (LAX) would require the evacuation of thousands of travelers and airport staff, while mitigating the risks from possible secondary explosions. In such cases, it is crucial to obtain information on the fastest evacuation routes, the time needed to evacuate people, the lag between the disaster and the arrival of a bomb squad or other emergency personnel, and what tactics and procedures authorities can use to avoid stampedes, confusion, and loss of life. By understanding possible scenarios in simulation beforehand, emergency personnel can be trained to respond effectively in the event of an actual evacuation. Unfortunately, conducting live exercises for evacuating thousands of people is generally impossible: it would be time-consuming and would result in tremendous lost revenue. Moreover, a staged evacuation would necessarily miss crucial aspects of the required response (e.g., fear, confusion) on the part of both emergency personnel and crowd evacuees, or the exercise would be considered unethical. Smaller-scale evacuation exercises miss the most important feature of an evacuation: its massive scale. Simulations provide one attractive answer. Evacuation simulations allow us to meet the required training, evacuation planning, and tactics development goals; by running large numbers of “faster than real-time” simulations, we can obtain data from large numbers of scenarios that would be near-impossible to gather in live exercises. This information can be used by policy-makers to predetermine optimal solutions to time-critical situations such as those involving injuries, IEDs, or active shooters.
Additionally, real-time simulations can be created for officers on the ground, who may only see a handful of real emergencies in their careers and thus could benefit immensely from repeated scenario-style training tools. In building these simulations, we bring to bear over two decades of experience in agent-based simulations, including battlefield simulations of battalions of virtual helicopter pilots or teams of virtual fighter pilots for DARPA’s Synthetic Theater of War program and disaster response simulations in collaboration with the Los Angeles Fire Department.
J. Pita, M. Jain, C. Western, P. Paruchuri, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus. 2009. “ARMOR Software: A game theoretic approach to airport security.” In Protecting Airline Passengers in the Age of Terrorism, edited by P. Seidenstat. Praeger Publishers.
Protecting national infrastructure, such as airports, is a challenging task for police and security agencies around the world; a challenge that is exacerbated by the threat of terrorism. Protecting these important locations includes, but is not limited to, tasks such as monitoring all entrances or inbound roads and checking inbound traffic. However, limited resources imply that it is typically impossible to provide full security coverage at all times. Furthermore, adversaries can observe security arrangements over time and exploit any predictable patterns to their advantage. Randomizing schedules for patrolling, checking, or monitoring is thus an important tool in the police arsenal to avoid the vulnerability that comes with predictability. This chapter focuses on a deployed software assistant agent that can aid police or other security agencies in randomizing their security schedules. We face at least three key challenges in building such a software assistant. First, the assistant must provide quality guarantees in randomization by appropriately weighing the costs and benefits of the different options available. For example, if an attack on one part of an infrastructure will cause economic damage while an attack on another could potentially cost human lives, we must weigh the two options differently – giving higher weight (probability) to guarding the latter. Second, the assistant must address the uncertainty in information that security forces have about the adversary. Third, the assistant must enable a mixed-initiative interaction with potential users rather than dictating a schedule; the assistant may be unaware of users’ real-world constraints and hence users must be able to shape the schedule development. We have addressed these challenges in a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes). Based on game-theoretic principles, ARMOR combines three key features to address each of the challenges outlined above.
Game theory is a well-established foundational principle within multi-agent systems to reason about multiple agents each pursuing their own interests (Fudenberg & Tirole 1991). We build on these game theoretic foundations to reason about two agents – the police force and their adversary – in providing a method of randomization. In particular, the main contribution of our work is mapping the problem of security scheduling as a Bayesian Stackelberg game (Conitzer & Sandholm 2006) and solving it within our software system using the fastest optimal algorithm for such games (Paruchuri et al. 2008), addressing the first two challenges. While a Bayesian game allows us to address uncertainty over adversary types, by optimally solving such Bayesian Stackelberg games (which yield optimal randomized strategies as solutions), ARMOR provides quality guarantees on the schedules generated. The algorithm used builds on several years of research regarding multi-agent systems and security (Paruchuri et al. 2005; 2006; 2007). ARMOR employs an algorithm that is a logical culmination of this line of research; in particular, ARMOR relies on an optimal algorithm called DOBSS (Decomposed Optimal Bayesian Stackelberg Solver) (Paruchuri et al. 2008). The third challenge is addressed by ARMOR’s use of a mixed-initiative based interface, where users are allowed to graphically enter different constraints to shape the schedule generated. ARMOR is thus a collaborative assistant that iterates over generated schedules rather than a rigid one-shot scheduler. ARMOR also alerts users in case overrides may potentially deteriorate schedule quality. ARMOR therefore represents a very promising transition of multi-agent research into a deployed application.
ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to assist the Los Angeles World Airport (LAWA) police in randomized scheduling of checkpoints, and since November 2007 for generating randomized patrolling schedules for canine units. In particular, it assists police in determining where to randomly set up checkpoints and where to randomly allocate canines to terminals. Indeed, February 2008 marked the successful end of the six-month trial period of ARMOR deployment at LAX. The feedback from police at the end of this six-month period was extremely positive; ARMOR will continue to be deployed at LAX and expand to other police activities at LAX.
Emma Bowring and Milind Tambe. 2009. “Bridging the Gap: Introducing Agents and Multiagent Systems to Undergraduate Students.” In Educational Uses of Multi-Agent Systems (EDUMAS).
The field of “intelligent agents and multiagent systems” is maturing; no longer is it a special topic to be introduced to graduate students after years of training in computer science and many introductory courses in Artificial Intelligence. Instead, the time is ripe to face the challenge of introducing agents and multiagent systems directly to undergraduate students, whether majoring in computer science or not. This paper focuses on exactly this challenge, drawing on the co-authors’ experience of teaching several such undergraduate courses on agents and multiagent systems, over the last three years at two different universities. The paper outlines three key issues that must be addressed. The first issue is facilitating students’ intuitive understanding of fundamental concepts of multiagent systems; we illustrate uses of science fiction materials and classroom games to provide students not only with the necessary intuitive understanding but also with the excitement and motivation for studying multiagent systems. The second is in selecting the right material — either science-fiction material or games — for providing students the necessary motivation and intuition; we outline several criteria that have been useful in selecting such material. The third issue is in educating students about the fundamental philosophical, ethical and social issues surrounding agents and multiagent systems: we outline course materials and classroom activities that allow students to obtain this “big picture” futuristic vision of our science. We conclude with feedback received, lessons learned and impact on both the computer science students and non computer-science students.
Christopher Kiekintveld, Manish Jain, Jason Tsai, James Pita, Fernando Ordóñez, and Milind Tambe. 2009. “Computing Optimal Randomized Resource Allocations for Massive Security Games.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Predictable allocations of security resources such as police officers, canine units, or checkpoints are vulnerable to exploitation by attackers. Recent work has applied game-theoretic methods to find optimal randomized security policies, including a fielded application at the Los Angeles International Airport (LAX). This approach has promising applications in many similar domains, including police patrolling for subway and bus systems, randomized baggage screening, and scheduling for the Federal Air Marshal Service (FAMS) on commercial flights. However, the existing methods scale poorly when the security policy requires coordination of many resources, which is central to many of these potential applications. We develop new models and algorithms that scale to much more complex instances of security games. The key idea is to use a compact model of security games, which allows exponential improvements in both memory and runtime relative to the best known algorithms for solving general Stackelberg games. We develop even faster algorithms for security games under payoff restrictions that are natural in many security domains. Finally, we introduce additional realistic scheduling constraints while retaining comparable performance improvements. The empirical evaluation comprises both random data and realistic instances of the FAMS and LAX problems. Our new methods scale to problems several orders of magnitude larger than the fastest known algorithms can handle.
Manish Jain, Matthew Taylor, Milind Tambe, and Makoto Yokoo. 2009. “DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks.” In International Joint Conference on Artificial Intelligence (IJCAI).
Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.
James Pita, Manish Jain, Fernando Ordóñez, Milind Tambe, Sarit Kraus, and Reuma Magori-Cohen. 2009. “Effective Solutions for Real-World Stackelberg Games: When Agents Must Deal with Human Uncertainties.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
How do we build multiagent algorithms for agent interactions with human adversaries? Stackelberg games are natural models for many important applications that involve human interaction, such as oligopolistic markets and security domains. In Stackelberg games, one player, the leader, commits to a strategy and the follower makes their decision with knowledge of the leader’s commitment. Existing algorithms for Stackelberg games efficiently find optimal solutions (leader strategy), but they critically assume that the follower plays optimally. Unfortunately, in real-world applications, agents face human followers who — because of their bounded rationality and limited observation of the leader strategy — may deviate from their expected optimal response. Not taking into account these likely deviations when dealing with human adversaries can cause an unacceptable degradation in the leader’s reward, particularly in security applications where these algorithms have seen real-world deployment. To address this crucial problem, this paper introduces three new mixed-integer linear programs (MILPs) for Stackelberg games to consider human followers, incorporating: (i) novel anchoring theories on human perception of probability distributions and (ii) robustness approaches for MILPs to address human imprecision. Since these new approaches consider human followers, traditional proofs of correctness or optimality are insufficient; instead, it is necessary to rely on empirical validation. To that end, this paper considers two settings based on real deployed security systems, and compares 6 different approaches (three new with three previous approaches), in 4 different observability conditions, involving 98 human subjects playing 1360 games in total. The final conclusion was that a model which incorporates both the ideas of robustness and anchoring achieves statistically significant better rewards and also maintains equivalent or faster solution speeds compared to existing approaches.
Pradeep Varakantham, Jun-young Kwak, Matthew Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. 2009. “Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping.” In International Conference on Automated Planning and Scheduling.
Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single-agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.
Nathan Schurr, Janusz Marecki, and Milind Tambe. 2009. “Improving Adjustable Autonomy Strategies for Time-Critical Domains.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
As agents begin to perform complex tasks alongside humans as collaborative teammates, it becomes crucial that the resulting human-multiagent teams adapt to time-critical domains. In such domains, adjustable autonomy has proven useful by allowing for a dynamic transfer of control of decision making between human and agents. However, existing adjustable autonomy algorithms commonly discretize time, which not only results in high algorithm runtimes but also translates into inaccurate transfer-of-control policies. In addition, existing techniques fail to address decision making inconsistencies often encountered in human-multiagent decision making. To address these limitations, we present a novel approach for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT) that makes three contributions. First, we apply a continuous-time planning paradigm to adjustable autonomy, resulting in high-accuracy transfer-of-control policies. Second, our new adjustable autonomy framework both models and plans for resolving inconsistencies between human and agent decisions. Third, we introduce a new model, the Interruptible Action Time-dependent Markov Decision Problem (IA-TMDP), which allows actions to be interrupted at any point in continuous time. We show how to solve IA-TMDPs efficiently and leverage them to plan for resolving inconsistencies in RIAACT. Furthermore, these contributions have been realized and evaluated in a complex disaster response simulation system.
Jason Tsai, Shyamsunder Rathi, Christopher Kiekintveld, Fernando Ordonez, and Milind Tambe. 2009. “IRIS - A Tool for Strategic Security Allocation in Transportation Networks.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems - Industry Track.
Security is a concern of major importance to governments and companies throughout the world. With limited resources, complete coverage of potential points of attack is not possible. Deterministic allocation of available law enforcement agents introduces predictable vulnerabilities that can be exploited by adversaries. Strategic randomization is a game theoretic alternative that we implement in the Intelligent Randomization In Scheduling (IRIS) system, a software scheduling assistant for the Federal Air Marshals (FAMs) who provide law enforcement aboard U.S. commercial flights. In IRIS, we model the problem as a Stackelberg game, with FAMS as leaders that commit to a flight coverage schedule and terrorists as followers that attempt to attack a flight. The FAMS domain presents three challenges unique to transportation network security that we address in the implementation of IRIS. First, with tens of thousands of commercial flights per day, the size of the Stackelberg game we need to solve is tremendous. We use ERASER-C, the fastest known algorithm for solving this class of Stackelberg games. Second, creating the game itself becomes a challenge due to the number of payoffs we must enter for these large games. To address this, we create an attribute-based preference elicitation system to determine reward values. Third, the complex scheduling constraints in transportation networks make it computationally prohibitive to model the game by explicitly modeling all combinations of valid schedules. Instead, we model the leader’s strategy space by incorporating a representation of the underlying scheduling constraints. The scheduling assistant has been delivered to the FAMS and is currently undergoing testing and review for possible incorporation into their scheduling practices. In this paper, we discuss the design choices and challenges encountered during the implementation of IRIS.
Zhengyu Yin, Christopher Kiekintveld, Atul Kumar, and Milind Tambe. 2009. “Local Optimal Solutions for DCOP: New Criteria, Bound, and Algorithm.” In AAMAS 2009 Workshop on Optimisation in Multi-Agent Systems (OptMas).
Distributed constraint optimization (DCOP) is a popular formalism for modeling cooperative multi-agent systems. In large-scale networks, finding a global optimum using complete algorithms is often impractical, which has led to the study of incomplete algorithms. Traditionally, incomplete algorithms can only find locally optimal solutions with no quality guarantees. Recent work on k-size-optimality has established bounds on solution quality, but size is not the only criterion for forming local optimization groups. In addition, there is only one algorithm for computing solutions for arbitrary k, and it is quite inefficient. We introduce t-distance-optimality, which offers an alternative way to specify optimization groups. We establish bounds for this criterion that are often tighter than those for k-optimality. We then introduce an asynchronous local search algorithm for t-distance-optimality. We implement and evaluate the algorithm for both t- and k-optimality, offering significant improvements over KOPT – the only existing algorithm for k-size-optimality. Our experiments show that t-distance-optimality converges more quickly and to better solutions than k-size-optimality in scale-free graphs, but k-size-optimality has advantages for random graphs.
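To make the notion of local optimality concrete, the sketch below runs a 1-opt local search (k-size-optimality with k = 1) on a toy DCOP: binary variables, one agent per variable, and an "agreement" utility that pays each edge's weight when its two endpoints match. The graph, weights, and utility are illustrative assumptions, not the paper's benchmarks.

```python
# Toy DCOP: 4 binary variables, weighted edges pay w when endpoints agree.
edges = {(0, 1): 5, (1, 2): 3, (2, 3): 4, (0, 3): 2}
n = 4

def neighbors(i):
    """Neighbors of agent i with the connecting edge weight."""
    out = []
    for (a, b), w in edges.items():
        if a == i:
            out.append((b, w))
        elif b == i:
            out.append((a, w))
    return out

def utility(assign):
    """Global utility: sum of weights of agreeing edges."""
    return sum(w for (a, b), w in edges.items() if assign[a] == assign[b])

def one_opt(assign):
    """Each agent best-responds to its neighbors until no single agent
    can improve alone, i.e. the assignment is 1-optimal."""
    changed = True
    while changed:
        changed = False
        for i in range(n):
            score = {v: sum(w for j, w in neighbors(i) if assign[j] == v)
                     for v in (0, 1)}
            best = max(score, key=score.get)
            if score[best] > score[assign[i]]:  # strict gain only
                assign[i] = best
                changed = True
    return assign

solution = one_opt([0, 1, 0, 1])
```

A 1-optimal solution like this can still be far from the global optimum on harder instances; larger k (or larger t in the t-distance criterion) enlarges the groups that jointly reoptimize, tightening the quality bound at higher coordination cost.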
Janusz Marecki and Milind Tambe. 2009. “Planning with Continuous Resources for Agent Teams.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Many problems of multiagent planning under uncertainty require distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are well-suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed and quality guarantees, or avoid discretization only by limiting agents’ action choices or interactions (e.g. the assumption of transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources for agent teams, with three key features: (i) it maintains the agent team interaction graph to identify and prune suboptimal policies and to allow the agents to be transition dependent, (ii) it operates in a continuous space of probability functions to provide an error bound on the solution quality, and finally (iii) it focuses the search for policies on the most relevant parts of this search space to allow for a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high-quality solutions and exhibits superior performance when compared with a discretization-based approach. We also show that M-DPFP is applicable to solving problems that are beyond the scope of existing approaches.
J. Pita, M. Jain, C. Kiekintveld, H. Bellamane, J. Tsai, M. Tambe, and F. Ordonez. 2009. “Security Applications: Lessons of Real-World Deployment.” ACM SIGecom Exchanges, 8, 2.
Game theory has played an important role in security decisions. Recent work using Stackelberg games [Fudenberg and Tirole 1991] to model security domains has been particularly influential [Basilico et al. 2009; Kiekintveld et al. 2009; Paruchuri et al. 2008; Pita et al. 2008; Pita et al. 2009]. In a Stackelberg game, a leader (in this case the defender) acts first and commits to a randomized security policy. The follower (attacker) optimizes its reward considering the strategy chosen by the leader. These games are well-suited to representing the problem security forces face in allocating limited resources, such as officers, canine units, and checkpoints. In particular, the fact that the attacker is able to observe the policy reflects the way real terrorist organizations plan attacks using extensive surveillance and long planning cycles. Stackelberg game models are not just theoretical models; they are at the heart of deployed decision-support software now in use by the Los Angeles World Airport (LAWA) police and the United States Federal Air Marshals Service (FAMS). A new application is under development for the Transportation Security Administration (TSA), also using game-theoretic analysis. Moving from theoretical analysis to applying game theory in real applications posed many new challenges, and there remain many open questions to be solved in this exciting area of work. In this article we will highlight several of the main issues that have come up, including (i) developing efficient algorithms to solve large-scale Stackelberg Security Games, (ii) evaluating deployed security systems, (iii) knowledge acquisition from security experts to specify the game models, and (iv) handling mixed-initiative interactions. We begin with an overview of the deployed systems and then discuss these issues in turn.
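The commit-then-best-respond structure described above can be illustrated in a few lines. The sketch below grid-searches the defender's mixed strategy in a hypothetical two-target security game with one resource; the follower best-responds to the observed coverage, and ties are broken in the defender's favor (the strong Stackelberg convention). All payoff numbers are invented for illustration and are not from any deployed system.

```python
# Hypothetical 2-target security game. One defender resource covers target 0
# with probability p and target 1 with probability 1 - p. Payoff tuples are
# (if target is covered, if uncovered) when that target is attacked.
def_payoff = [(5, -10), (3, -4)]
atk_payoff = [(-5, 10), (-3, 4)]

def best_commitment(steps=1000):
    """Brute-force search over the leader's commitment; the follower
    best-responds to whatever mixed strategy the leader commits to."""
    best_value, best_p = float("-inf"), None
    for k in range(steps + 1):
        p = k / steps
        cov = [p, 1.0 - p]
        # attacker's expected payoff for attacking each target
        atk = [c * a[0] + (1.0 - c) * a[1] for c, a in zip(cov, atk_payoff)]
        top = max(atk)
        # strong Stackelberg: among best responses, assume defender-favorable
        tied = [i for i, v in enumerate(atk) if abs(v - top) < 1e-9]
        value = max(cov[i] * def_payoff[i][0]
                    + (1.0 - cov[i]) * def_payoff[i][1] for i in tied)
        if value > best_value:
            best_value, best_p = value, p
    return best_value, best_p

value, p = best_commitment()
```

Deployed systems replace this brute-force grid with mathematical-programming formulations (e.g. DOBSS for the Bayesian case, ERASER-C for massive games), but the underlying leader-follower structure is the same.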
Emma Bowring, Zhengyu Yin, Rob Zinkov, and Milind Tambe. 2009. “Sensitivity analysis for distributed optimization with resource constraints.” In The Eighth International Conference on Autonomous Agents and Multiagent Systems.
Previous work in multiagent coordination has addressed the challenge of planning in domains where agents must optimize a global goal, while satisfying local resource constraints. However, the imposition of resource constraints naturally raises the question of whether the agents could significantly improve their team performance if a few more resources were made available. Sensitivity analysis aims to answer that question. This paper focuses on sensitivity analysis in the context of the distributed coordination framework, Multiply-Constrained DCOP (MC-DCOP). There are three main challenges in performing sensitivity analysis: (i) to perform it in a distributed fashion, (ii) to avoid re-solving an NP-hard MC-DCOP optimization from scratch, and (iii) to avoid considering unproductive uses for extra resources. To meet these challenges, this paper presents three types of locally optimal algorithms: link analysis, local reoptimization and local constraint propagation. These algorithms are distributed and avoid redundant computation by ascertaining just the effects of local perturbations on the original problem. Deploying our algorithms on a large number of MC-DCOP problems revealed several results. While our cheapest algorithm successfully identified quality improvements for a few problems, our more complex techniques were necessary to identify the best uses for additional resources. Furthermore, we identified two heuristics that can help identify a priori which agents might benefit most from additional resources: density rank, which works well when nodes received identical resources, and remaining resource rank, which works well when nodes received resources based on the number of neighbors they had.
Jason Tsai, Zhengyu Yin, Jun-young Kwak, David Kempe, Christopher Kiekintveld, and Milind Tambe. 2009. “Strategic Security Placement in Network Domains with Applications to Transit Security.” In IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications.
Deterministic placement of security personnel creates serious vulnerabilities for any organization attempting to prevent intrusion. Recent work in use at the Los Angeles International Airport (LAX) and in progress with the United States Federal Air Marshal Service (FAMS) has applied game-theoretic analysis to the problem by modeling it as a Stackelberg game wherein security forces are the leaders that commit to a strategy that is observed and countered by attackers. In this work, we explore efficient techniques for performing the same analysis on games with a graph structure, wherein an attacker must follow a path from an entry point to a target. If we frame these problems in the straightforward manner, with leader actions being sets of edges that can be guarded and follower actions being paths from entry points to targets, the size of the game increases exponentially, quickly reaching memory limitations when using general Stackelberg solvers. Here, we propose a novel linear program that is able to solve this type of problem efficiently. While it provides exact solutions for games where only one checkpoint is allowed, it is an approximation in the general case. Finally, we compare the performance of this and other methods by generating optimal policies for the Seoul Metropolitan Subway in Seoul, South Korea.
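The follower side of the graph model above can be sketched directly: given marginal checkpoint probabilities on the edges, the attacker picks the entry-to-target path that maximizes its probability of evading all checkpoints. The tiny graph and probabilities below are invented for illustration; a real solver would also optimize the defender's side rather than evaluate a fixed placement.

```python
# Illustrative network-security instance: directed graph plus per-edge
# marginal checkpoint probabilities (all numbers are assumptions).
graph = {"entry": ["a", "b"], "a": ["target"], "b": ["target"], "target": []}
check_prob = {("entry", "a"): 0.6, ("a", "target"): 0.0,
              ("entry", "b"): 0.3, ("b", "target"): 0.3}

def paths(node, goal, seen=()):
    """Enumerate simple paths from node to goal (fine for tiny graphs)."""
    if node == goal:
        yield seen + (node,)
        return
    for nxt in graph[node]:
        if nxt not in seen:
            yield from paths(nxt, goal, seen + (node,))

def best_attack_path():
    """Attacker's best response: the path maximizing evasion probability,
    treating checkpoints on distinct edges as independent."""
    best = (-1.0, None)
    for p in paths("entry", "target"):
        evade = 1.0
        for edge in zip(p, p[1:]):
            evade *= 1.0 - check_prob[edge]
        if evade > best[0]:
            best = (evade, p)
    return best

evade, path = best_attack_path()
```

Enumerating all paths is exactly what blows up on real networks; the linear program proposed in the paper avoids this by working over edges rather than explicit path sets.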
Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. 2009. “Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains.” In IJCAI 2009 Workshop on Distributed Constraint Reasoning (DCR 2009).
Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.
James Pita, Manish Jain, Fernando Ordonez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2009. “Using Game Theory for Los Angeles Airport Security.” AI Magazine 30 (1), Pp. 43-57.
Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR.
P. Paruchuri, J. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus. 2009. “Coordinating randomized policies for increasing security of agent systems.” Journal of Information Technology and Management (ITM), 10, 1, Pp. 67-79.
We consider the problem of providing decision support to a patrolling or security service in an adversarial domain. The idea is to create patrols that can achieve a high level of coverage or reward while taking into account the presence of an adversary. We assume that the adversary can learn or observe the patrolling strategy and use this to its advantage. We follow two different approaches depending on what is known about the adversary. If there is no information about the adversary, we use a Markov Decision Process (MDP) to represent patrols and identify randomized solutions that minimize the information available to the adversary. This led to the development of the algorithms CRLP and BRLP for policy randomization of MDPs. Second, when there is partial information about the adversary, we decide on efficient patrols by solving a Bayesian Stackelberg game. Here, the leader decides first on a patrolling strategy and then an adversary, drawn from possibly many adversary types, selects its best response for the given patrol. We provide two efficient MIP formulations named DOBSS and ASAP to solve this NP-hard problem. Our experimental results show the efficiency of these algorithms and illustrate how these techniques provide optimal and secure patrolling policies. Note that DOBSS is at the heart of the ARMOR system that is currently deployed at the Los Angeles International Airport (LAX) for randomizing checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. Keywords: Multiagent Systems, Decision Theory, Game Theory, Security, Randomized Policies