Fei Fang, Peter Stone, and Milind Tambe. 2015. “When Security Games Go Green: Designing Defender Strategies to Prevent Poaching and Illegal Fishing.” In International Joint Conference on Artificial Intelligence (IJCAI).
Building on the successful applications of Stackelberg Security Games (SSGs) to protect infrastructure, researchers have begun focusing on applying game theory to green security domains such as protection of endangered animals and fish stocks. Previous efforts in these domains optimize defender strategies based on the standard Stackelberg assumption that the adversaries become fully aware of the defender’s strategy before taking action. Unfortunately, this assumption is inappropriate since adversaries in green security domains often lack the resources to fully track the defender strategy. This paper (i) introduces Green Security Games (GSGs), a novel game model for green security domains with a generalized Stackelberg assumption; (ii) provides algorithms to plan effective sequential defender strategies — such planning was absent in previous work; (iii) proposes a novel approach to learn adversary models that further improves defender performance; and (iv) provides detailed experimental analysis of proposed approaches.
Y. Vorobeychik, B. An, M. Tambe, and S. Singh. 6/2014. “Computing Solutions in Infinite-Horizon Discounted Adversarial Patrolling Game.” In International Conference on Automated Planning and Scheduling (ICAPS).
Stackelberg games form the core of a number of tools deployed for computing optimal patrolling strategies in adversarial domains, such as the US Federal Air Marshal Service and the US Coast Guard. In traditional Stackelberg security game models the attacker knows only the probability that each target is covered by the defender, but is oblivious to the detailed timing of the coverage schedule. In many real-world situations, however, the attacker can observe the current location of the defender and can exploit this knowledge to reason about the defender’s future moves. We show that this general modeling framework can be captured using adversarial patrolling games (APGs), in which the defender sequentially moves between targets, with moves constrained by a graph, while the attacker can observe the defender’s current location and his (stochastic) policy concerning future moves. We offer a very general model of infinite-horizon discounted adversarial patrolling games. Our first contribution is to show that defender policies that condition only on the previous defense move (i.e., Markov stationary policies) can be arbitrarily suboptimal for general APGs. We then offer a mixed-integer nonlinear programming (MINLP) formulation for computing optimal randomized policies for the defender that can condition on history of bounded, but arbitrary, length, as well as a mixed-integer linear programming (MILP) formulation to approximate these, with provable quality guarantees. Additionally, we present a non-linear programming (NLP) formulation for solving zero-sum APGs. We show experimentally that the MILP significantly outperforms the MINLP formulation, and is, in turn, significantly outperformed by the NLP specialized to zero-sum games.
Benjamin Ford, Debarun Kar, Francesco M. Delle Fave, Rong Yang, and Milind Tambe. 5/2014. “PAWS: Adaptive Game-theoretic Patrolling for Wildlife Protection (Demonstration).” In Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Endangered species around the world are in danger of extinction from poaching. From the start of the 20th century, the African rhino population has dropped over 98% [4] and the global tiger population has dropped over 95% [5], resulting in multiple species extinctions in both groups. Species extinctions have negative consequences on local ecosystems, economies, and communities. To protect these species, countries have set up conservation agencies and national parks, such as Uganda’s Queen Elizabeth National Park (QENP). However, a common lack of funding for these agencies results in a lack of law enforcement resources to protect these large, rural areas. As an example of the scale of disparity, one wildlife crime study in 2007 reported an actual coverage density of one ranger per 167 square kilometers [2]. Because of the hazards involved (e.g., armed poachers, wild animals), rangers patrol in groups, further increasing the amount of area they are responsible for patrolling. Security game research has typically been concerned with combating terrorism, and this field has indeed benefited from a range of successfully deployed applications [1, 6]. These applications have enabled security agencies to make more efficient use of their limited resources. In this previous research, adversary data has been absent during the development of these solutions, and thus, it has been difficult to make accurate adversary behavior models during algorithm development. In a domain such as wildlife crime, interactions with the adversary are frequent and repeated, thus enabling conservation agencies to collect data. This presence of data enables security game researchers to begin developing algorithms that incorporate this data into, potentially, more accurate behavior models and consequently better security solutions. Developed in conjunction with staff at QENP, the Protection Assistant for Wildlife Security (PAWS) generates optimized defender strategies for use by park rangers [7]. 
Due to the repeated nature of wildlife crime, PAWS is able to leverage crime event data - a previously unrealized capability in security games research. Thus, PAWS implements a novel adaptive algorithm that processes crime event data, builds multiple human behavior models, and, based on those models, predicts where adversaries will attack next. These predictions are then used to generate a patrol strategy for the rangers (i.e., a set of patrol waypoints) that can be viewed on a GPS unit. Against this background, the demonstration presented in this paper introduces two contributions. First, we present the PAWS system which incorporates the algorithm in [7] into a scheduling system and a GPS visualizer. Second, we present a software interface to run a number of human subject experiments (HSE) to evaluate and improve the efficacy of PAWS before its deployment in QENP. By conducting these HSEs, we can (i) test the PAWS algorithms with repeated interactions with humans, thus providing a more realistic testing environment than in its previous simulations; (ii) generate data that can be used to initialize PAWS’s human behavior models for deployment, and (iii) compare the current PAWS algorithms’ performance to alternatives and determine if additional improvements are needed prior to deployment. To provide proper context for the presentation, this paper also presents a brief overview of the PAWS system data flow and its adaptive algorithms. The demonstration will engage audience members by having them participate in the HSEs and using the GPS unit to visualize a patrol schedule in QENP.
Matthew Brown, William B. Haskell, and Milind Tambe. 2014. “Addressing Scalability and Robustness in Security Games with Multiple Boundedly Rational Adversaries.” In Conference on Decision and Game Theory for Security (GameSec).
Boundedly rational human adversaries pose a serious challenge to security because they deviate from the classical assumption of perfect rationality. An emerging trend in security game research addresses this challenge by using behavioral models such as quantal response (QR) and subjective utility quantal response (SUQR). These models improve the quality of the defender’s strategy by more accurately modeling the decisions made by real human adversaries. Work on incorporating human behavioral models into security games has typically followed two threads. The first thread, scalability, seeks to develop efficient algorithms to design patrols for large-scale domains that protect against a single adversary. However, this thread cannot handle the common situation of multiple adversary types with heterogeneous behavioral models. Having multiple adversary types introduces considerable uncertainty into the defender’s planning problem. The second thread, robustness, uses either Bayesian or maximin approaches to handle this uncertainty caused by multiple adversary types. However, the robust approach has so far not been able to scale up to complex, large-scale security games. Thus, each of these two threads alone fails to work in key real world security games. Our present work addresses this shortcoming and merges these two research threads to yield a scalable and robust algorithm, MIDAS (MaxImin Defense Against SUQR), for generating game-theoretic patrols to defend against multiple boundedly rational human adversaries. Given the size of the defender’s optimization problem, the key component of MIDAS is incremental cut and strategy generation using a master/slave optimization approach. Innovations in MIDAS include (i) a maximin mixed-integer linear programming formulation in the master and (ii) a compact transition graph formulation in the slave. Additionally, we provide a theoretical analysis of our new model and report its performance in simulations. 
In collaboration with the United States Coast Guard (USCG), we consider the problem of defending fishery stocks from illegal fishing in the Gulf of Mexico and use MIDAS to handle heterogeneity in adversary types (i.e., illegal fishermen) in order to construct robust patrol strategies for USCG assets.
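For context on the SUQR model that MIDAS defends against: SUQR scores each target by a weighted subjective utility of its coverage probability, reward, and penalty, and converts the scores into attack probabilities with a logit choice rule. A minimal sketch, with illustrative weights rather than values fitted from data:

```python
import math

def suqr_attack_probs(coverage, rewards, penalties, w=(-9.0, 0.5, 0.5)):
    """Logit choice over subjective utilities SU_i = w1*x_i + w2*R_i + w3*P_i.
    The weights w are illustrative; deployed models fit them from data."""
    w1, w2, w3 = w
    su = [w1 * x + w2 * r + w3 * p
          for x, r, p in zip(coverage, rewards, penalties)]
    m = max(su)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in su]
    z = sum(exps)
    return [e / z for e in exps]
```

With a negative coverage weight, less-patrolled targets attract more attacks, which is the qualitative behavior these behavioral models capture.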
L. S. Marcolino, B. Kolev, S. Price, S. P. Veetil, D. Gerber, J. Musil, and M. Tambe. 2014. “Aggregating Opinions to Design Energy-Efficient Buildings.” In 8th Multidisciplinary Workshop on Advances in Preference Handling (M-PREF 2014).
In this research-in-progress paper we present a new real world domain for studying the aggregation of different opinions: early stage architectural design of buildings. This is an important real world application, not only because building design and construction is one of the world’s largest industries measured by global expenditures, but also because the early stage design decision making has a significant impact on the energy consumption of buildings. We present a mapping between the domain of architecture and engineering research and that of the agent models present in the literature. We study the importance of forming diverse teams when aggregating the opinions of different agents for architectural design, and also the effect of having agents optimizing for different factors of a multi-objective optimization design problem. We find that a diverse team of agents is able to provide a higher number of top ranked solutions for the early stage designer to choose from. Finally, we present the next steps for a deeper exploration of our questions.
Jun-young Kwak, Debarun Kar, William Haskell, Pradeep Varakantham, and Milind Tambe. 2014. “Building THINC: User Incentivization and Meeting Rescheduling for Energy Savings.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
This paper presents THINC, an agent developed for saving energy in real-world commercial buildings. While previous work has presented techniques for computing energy-efficient schedules, it fails to address two issues, centered on human users, that are essential in real-world agent deployments: (i) incentivizing users for their energy saving activities and (ii) interacting with users to reschedule key “energy-consuming” meetings in a timely fashion, while handling the uncertainty in such interactions. THINC addresses these shortcomings by providing four new major contributions. First, THINC computes fair division of credits from energy savings. For this fair division, THINC provides novel algorithmic advances for efficient computation of Shapley value. Second, THINC includes a novel robust algorithm to optimally reschedule identified key meetings addressing user interaction uncertainty. Third, THINC provides an end-to-end integration within a single agent of energy efficient scheduling, rescheduling and credit allocation. Finally, we deploy THINC in the real-world as a pilot project at one of the main libraries at the University of Southern California and present results illustrating the benefits in saving energy.
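The Shapley value that THINC uses for credit allocation averages each user’s marginal contribution over all join orders. The paper’s contribution is computing it efficiently; the definition itself can be sketched by brute-force enumeration, feasible only for a handful of users (the user names below are hypothetical):

```python
import itertools

def shapley_values(players, value):
    """Exact Shapley value by averaging marginal contributions over every
    ordering of the players; value() maps a frozenset coalition to its worth."""
    phi = {p: 0.0 for p in players}
    perms = list(itertools.permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: phi[p] / len(perms) for p in phi}
```

For an additive savings function the Shapley value reduces to each user’s own savings, a quick sanity check for any implementation.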
A. Jiang, M. Jain, and M. Tambe. 2014. “Computational Game Theory for Security and Sustainability.” Journal of Information Processing (JIP) (invited article), 22, 2, Pp. 176-185.
Security is a critical concern around the world that arises in protecting our ports, airports, transportation and other critical national infrastructure from adversaries, in protecting our wildlife and forests from poachers and smugglers, and in curtailing the illegal flow of weapons, drugs and money; and it arises in problems ranging from physical to cyber-physical systems. In all of these problems, we have limited security resources which prevent full security coverage at all times; instead, security resources must be deployed intelligently taking into account differences in priorities of targets requiring security coverage, the responses of the attackers to the security posture, and potential uncertainty over the types, capabilities, knowledge and priorities of attackers faced. Game theory, which studies interactions among multiple self-interested agents, is well-suited to the adversarial reasoning required for security resource allocation and scheduling problems. Casting the problem as a Bayesian Stackelberg game, we have developed new algorithms for efficiently solving such games that provide randomized patrolling or inspection strategies. These algorithms have led to some initial successes in this challenging problem arena, leading to advances over previous approaches in security scheduling and allocation, e.g., by addressing key weaknesses of predictability of human schedulers.
These algorithms are now deployed in multiple applications: ARMOR has been deployed at the Los Angeles International Airport (LAX) since 2007 to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals [17]; IRIS, a game-theoretic scheduler for randomized deployment of the US Federal Air Marshals (FAMS) requiring significant scale-up in underlying algorithms, has been in use since 2009 [17]; PROTECT, which schedules the US Coast Guard’s randomized patrolling of ports using a new set of algorithms based on modeling boundedly rational human attackers, has been deployed in the port of Boston since April 2011, has been in use at the port of New York since February 2012 [34], and is headed for nationwide deployment; another application for deploying escort boats to protect ferries has been deployed by the US Coast Guard since April 2013 [10]; GUARDS is under evaluation for national deployment by the US Transportation Security Administration (TSA) [32]; and TRUSTS [43] has been evaluated in field trials by the Los Angeles Sheriff’s Department (LASD) in the LA Metro system, with a nationwide deployment now being evaluated at TSA. These initial successes point the way to major future applications in a wide range of security domains, with major research challenges in scaling up our game-theoretic algorithms, in addressing human adversaries’ bounded rationality and uncertainties in action execution and observation, as well as in multiagent learning. This paper will provide an overview of the models and algorithms, key research challenges and a brief description of our successful deployments.
Milind Tambe, Albert Jiang, Bo An, and Manish Jain. 2014. “Computational game theory for security: Progress and challenges.” In AAAI Spring Symposium on Applied Computational Game Theory.
The goal of this paper is to (re)introduce a real-world challenge problem for researchers in multiagent systems and beyond, where our collective efforts may have a significant impact on activities in the real world. The challenge is in applying game theory for security: our goal is not only to introduce the research challenges for algorithmic and behavioral game theory in service of this problem, but also to provide initial exemplars of successes of deployed systems, and to describe the challenges introduced by these deployments of computational game theory in the field. We also wish to provide an overview of key open research challenges and pointers for getting started in this research.
Haifeng Xu, Fei Fang, Albert Xin Jiang, Vincent Conitzer, Shaddin Dughmi, and Milind Tambe. 2014. “Computing Minimax Strategy for Discretized Spatio-Temporal Zero-Sum Security Games.” In International Joint Workshop on Optimization in Multi-Agent Systems and Distributed Constraint Reasoning (OPTMAS-DCR), in conjunction with AAMAS 2014. Paris, France.
Among the many deployment areas of Stackelberg security games, a major area involves games played out in space and time, including applications where multiple mobile defender resources protect multiple mobile targets. Previous algorithms for such spatio-temporal security games fail to scale up, and little is known about the computational complexity properties of these problems. This paper provides a novel oracle-based algorithmic framework for a systematic study of different problem variants of computing optimal (minimax) strategies in spatio-temporal security games. Our framework enables efficient computation of a minimax strategy when the problem admits a polynomial-time oracle. Furthermore, for the cases in which efficient oracles are difficult to find, we propose approximations or prove hardness results.
Chao Zhang, Albert Xin Jiang, Martin B. Short, Jeffrey P. Brantingham, and Milind Tambe. 2014. “Defending Against Opportunistic Criminals: New Game-Theoretic Frameworks and Algorithms.” In Conference on Decision and Game Theory for Security (GameSec) 2014.
This paper introduces a new game-theoretic framework and algorithms for addressing opportunistic crime. The Stackelberg Security Game (SSG), which models highly strategic and resourceful adversaries, has become an important computational framework within multiagent systems. Unfortunately, SSG is ill-suited as a framework for handling opportunistic crimes, which are committed by criminals who are less strategic in planning attacks and more flexible in executing them than SSG assumes. Yet, opportunistic crime is what is commonly seen in most urban settings. We therefore introduce the Opportunistic Security Game (OSG), a computational framework to recommend deployment strategies for defenders to control opportunistic crimes. Our first contribution in OSG is a novel model for opportunistic adversaries, who (i) opportunistically and repeatedly seek targets; (ii) react to real-time information at execution time rather than planning attacks in advance; and (iii) have limited observation of defender strategies. Our second contribution to OSG is a new exact algorithm, EOSG, to optimize defender strategies given our opportunistic adversaries. Our third contribution is the development of a fast heuristic algorithm to solve large-scale OSG problems, exploiting a compact representation. We use urban transportation systems as a critical motivating domain, and provide detailed experimental results based on a real-world system.
Leandro Soriano Marcolino, Chao Zhang, Albert Xin Jiang, and Milind Tambe. 2014. “A Detailed Analysis of a Multi-agent Diverse Team.” In Coordination, Organizations, Institutions and Norms in Agent Systems IX. Springer-Verlag Lecture Notes in AI, edited by T. Balke, A. Chopra, F. Dignum, and B. van Riemsdijk.
In an open system we can have many different kinds of agents. However, it is a challenge to decide which agents to pick when forming multi-agent teams. In some scenarios, agents coordinate by voting continuously. When forming such teams, should we focus on the diversity of the team or on the strength of each member? Can a team of diverse (and weak) agents outperform a uniform team of strong agents? We propose a new model to address these questions. Our key contributions include: (i) we show that a diverse team can overcome a uniform team and we give the necessary conditions for it to happen; (ii) we present optimal voting rules for a diverse team; (iii) we perform synthetic experiments that demonstrate that both diversity and strength contribute to the performance of a team; (iv) we show experiments that demonstrate the usefulness of our model in one of the most difficult challenges for Artificial Intelligence: Computer Go.
A.X. Jiang, L. S. Marcolino, A. D. Procaccia, T. Sandholm, N. Shah, and M. Tambe. 2014. “Diverse Randomized Agents Vote to Win.” In 28th Neural Information Processing Systems Conference (NIPS).
We investigate the power of voting among diverse, randomized software agents. With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. This model allows us to reason about a collection of agents with different biases (determined by the first-stage noise models), which, furthermore, apply randomized algorithms to evaluate alternatives and produce votes (captured by the second-stage noise models). We analytically demonstrate that a uniform team, consisting of multiple instances of any single agent, must make a significant number of mistakes, whereas a diverse team converges to perfection as the number of agents grows. Our experiments, which pit teams of computer Go agents against strong agents, provide evidence for the effectiveness of voting when agents are diverse.
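The aggregation step common to these voting-team papers is plurality over the agents’ proposed moves. A minimal sketch, with a hypothetical agent interface (each agent is a callable returning a move for the given state):

```python
from collections import Counter

def team_vote(agents, state):
    """Plurality aggregation: each agent proposes a move for the state;
    the team plays the most common proposal, breaking ties in favor of
    the earliest-proposed move."""
    votes = [agent(state) for agent in agents]
    counts = Counter(votes)
    top = max(counts.values())
    for v in votes:
        if counts[v] == top:
            return v
```

Randomized agents (as in the paper’s model) simply make each callable stochastic; the aggregation rule is unchanged.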
Francesco M. Delle Fave, Eric Shieh, Manish Jain, Albert Xin Jiang, Heather Rosoff, Milind Tambe, and John P. Sullivan. 2014. “Efficient Solutions for Joint Activity Based Security Games: Fast Algorithms, Results and a Field Experiment on a Transit System.” Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS).
In recent years, several security agencies have been deploying scheduling systems based on algorithmic advances in Stackelberg security games (SSGs). Unfortunately, none of the existing algorithms can scale up to domains where benefits are accrued from multiple defender resources performing jointly coordinated activities. Yet in many domains, including port patrolling where SSGs are in use, enabling multiple defender resources to perform jointly coordinated activities would significantly enhance the effectiveness of the patrols. To address this challenge, this paper presents four contributions. First, we present Smart (Security games with Multiple coordinated Activities and Resources that are Time-dependent), a novel SSG model that explicitly represents jointly coordinated activities between a defender’s resources. Second, we present two branch-and-price algorithms: SmartO, an optimal algorithm, and SmartH, a heuristic approach, to solve Smart instances. The two algorithms present three novel features: (i) a novel approach to generate individual defender strategies by ordering the search space during column generation using insights from the Traveling Salesman Problem (TSP); (ii) exploitation of iterative modification of rewards of multiple defender resources to generate coordinated strategies; and (iii) generation of tight upper bounds for pruning using the structure of the problem. Third, we present an extensive empirical and theoretical analysis of both SmartO and SmartH. Fourth, we describe a large-scale real-world experiment whereby we run the first head-to-head comparison between game-theoretic schedules generated using SmartH and schedules generated by humans in a one-day patrol exercise over one train line of the Los Angeles Metro System. Our results show that game-theoretic schedules were evaluated to be superior to those generated by humans.
Matthew Brown, Bo An, Christopher Kiekintveld, Fernando Ordonez, and Milind Tambe. 2014. “An Extended Study on Multi-Objective Security Games.” Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 28, 1, Pp. 31-71.
The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. In such domains, decision makers have multiple competing objectives they must consider, which may take different forms that are not readily comparable, including safety, cost, and public perception. Thus, it can be difficult to know how to weigh the different objectives when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSG). Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier, which can be generated by solving a sequence of constrained single-objective optimization problems (CSOPs). The Pareto frontier allows the decision maker to analyze the tradeoffs that exist between the multiple objectives. Our contributions include: (i) an algorithm, Iterative-ε-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP; (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain the MILP; (iv) an approximate approach for solving a CSOP built off those same heuristics, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation, detailed experimental evaluation of the proposed approaches and heuristics, as well as a discussion on techniques for visualizing the Pareto frontier.
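To illustrate the ε-constraint idea behind this family of algorithms: repeatedly maximize one objective subject to a lower bound on the other, tightening the bound by ε each pass and collecting the optima. A toy sketch over an enumerated strategy set (the actual algorithm solves MILPs rather than enumerating points):

```python
def epsilon_constraint_frontier(points, eps):
    """points: (f1, f2) payoff pairs of candidate strategies. Sweep a lower
    bound b on f2 upward in steps of eps; at each step keep the point that
    maximizes f1 among those with f2 >= b."""
    lo = min(f2 for _, f2 in points)
    hi = max(f2 for _, f2 in points)
    frontier = []
    b = lo
    while b <= hi + 1e-12:
        feasible = [p for p in points if p[1] >= b]
        best = max(feasible, key=lambda p: p[0])
        if best not in frontier:
            frontier.append(best)
        b += eps
    return frontier
```

The distinct optima collected this way approximate the Pareto frontier, and a finer ε recovers a denser approximation.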
F. M. Delle Fave, A.X. Jiang, Z. Yin, C. Zhang, M. Tambe, S. Kraus, and J.P. Sullivan. 2014. “Game-theoretic Security Patrolling with Dynamic Execution Uncertainty and a Case Study on a Real Transit System.” Journal of Artificial Intelligence Research, 50, Pp. 321-367.
Attacker-Defender Stackelberg security games (SSGs) have emerged as an important research area in multi-agent systems. However, existing SSG models yield fixed, static schedules which fail in dynamic domains where defenders face execution uncertainty, i.e., in domains where defenders may face unanticipated disruptions of their schedules. A concrete example is an application involving checking fares on trains, where a defender’s schedule is frequently interrupted by fare evaders, making static schedules useless. To address this shortcoming, this paper provides four main contributions. First, we present a novel general Bayesian Stackelberg game model for security resource allocation in dynamic uncertain domains. In this new model, execution uncertainty is handled by using a Markov decision process (MDP) for generating defender policies. Second, we study the problem of computing a Stackelberg equilibrium for this game and exploit problem structure to reduce it to a polynomial-sized optimization problem. Shifting to evaluation, our third contribution shows in simulation that our MDP-based policies overcome the failures of previous SSG algorithms. In so doing, we can now build a complete system that handles schedule interruptions and, consequently, conduct some of the first controlled experiments on SSGs in the field. Hence, as our final contribution, we present results from a real-world experiment on Metro trains in Los Angeles validating our MDP-based model and, most importantly, concretely measuring the benefits of SSGs for security resource allocation.
J. Tsai, T. Nguyen, N. Weller, and M. Tambe. 2014. “Game-Theoretic Target Selection in Contagion-based Domains.” The Computer Journal, 57, 6, Pp. 893-905.
Many strategic actions carry a ‘contagious’ component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain. Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, nonlocal impact. To address this new class of security games, we combine recent research in influence blocking maximization with a double oracle approach and create novel heuristic oracles to generate mixed strategies for a real-world leadership network from Afghanistan, synthetic leadership networks, and scale-free graphs. We find that leadership networks that exhibit highly interconnected clusters can be solved equally well by our heuristic methods, but our more sophisticated heuristics outperform simpler ones in less interconnected scale-free graphs.
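The contagion in this domain is typically modeled as an independent cascade, the same spread process that influence blocking maximization reasons about. A Monte Carlo estimate of expected spread can be sketched as follows (the graph, propagation probability, and trial count are all hypothetical):

```python
import random

def expected_spread(adj, seeds, p=0.1, trials=2000, rng=None):
    """Monte Carlo estimate of expected influence under an independent
    cascade: each newly active node gets one chance to activate each
    inactive neighbor with probability p."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, []):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials
```

The game-theoretic layer in the paper then chooses which nodes each side seeds, with the defender trying to block the attacker’s expected spread.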
L. S. Marcolino, H. Xu, A.X. Jiang, M. Tambe, and E. Bowring. 2014. “Give a Hard Problem to a Diverse Team: Exploring Large Action Spaces.” In 28th Conference on Artificial Intelligence (AAAI 2014). Québec, Canada.
Recent work has shown that diverse teams can outperform a uniform team made of copies of the best agent. However, there are fundamental questions that were not asked before. When should we use diverse or uniform teams? How does the performance change as the action space or the teams get larger? Hence, we present a new model of diversity for teams that is more general than previous models. We prove that the performance of a diverse team improves as the size of the action space gets larger. Concerning the size of the diverse team, we show that the performance converges exponentially fast to the optimal one as we increase the number of agents. We present synthetic experiments that allow us to gain further insights: even though a diverse team outperforms a uniform team when the size of the action space increases, the uniform team will eventually again play better than the diverse team for a large enough action space. We verify our predictions in a system of Go-playing agents, where we show a diverse team that improves in performance as the board size increases and eventually overcomes a uniform team.
Rong Yang. 2014. “Human Adversaries in Security Games: Integrating Models of Bounded Rationality and Fast Algorithms.”
Security is a world-wide concern in a diverse set of settings, such as protecting ports, airports and other critical infrastructure, interdicting the illegal flow of drugs, weapons and money, preventing illegal poaching/hunting of endangered species and fish, suppressing crime in urban areas and securing cyberspace. Unfortunately, with limited security resources, not all the potential targets can be protected at all times. Game-theoretic approaches — in the form of “security games” — have recently gained significant interest from researchers as a tool for analyzing real-world security resource allocation problems, leading to multiple deployed systems in day-to-day use to enhance the security of US ports, airports and transportation infrastructure. One of the key challenges that remains open in enhancing current security game applications and enabling new ones originates from the perfect rationality assumption of the adversaries — an assumption that may not hold in the real world due to the bounded rationality of human adversaries and hence could potentially reduce the effectiveness of solutions offered. My thesis focuses on addressing human decision-making in security games. It seeks to bridge the gap between two important subfields of game theory: algorithmic game theory and behavioral game theory. The former focuses on efficient computation of equilibrium solution concepts, and the latter develops models to predict the behaviors of human players in various game settings. More specifically, I provide: (i) the answer to the question of which of the existing models best represents the salient features of the security problems, by empirically exploring different human behavioral models from the literature; (ii) algorithms to efficiently compute the resource allocation strategies for the security agencies considering these new models of the adversaries; (iii) real-world deployed systems that range from security of ports to wildlife security.
Yundi Qian, William B. Haskell, Albert Xin Jiang, and Milind Tambe. 2014. “Online Learning and Planning in Resource Conservation Games.” In AAMAS 2014 Workshop on Adaptive Learning Agents (ALA).
Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. Fortunately, extractors carry out frequent illegal extractions, so protectors can learn the richness of resources by observing the extractor’s behavior. This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game and transform this repeated game into a POMDP, which cannot be solved by the latest general POMDP solvers due to its exponential state space; (ii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iii) for a specific class of our game, we speed up the GMOP algorithm without sacrificing solution quality, as well as provide a heuristic that trades off solution quality for lower computational cost; (iv) we explore the continuous utility scenario where the POMDP becomes a continuous-state POMDP, and provide a solution in special cases.
Yundi Qian, William B. Haskell, Albert Xin Jiang, and Milind Tambe. 2014. “Online Planning for Optimal Protector Strategies in Resource Conservation Games.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014).
Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. Fortunately, extractors carry out frequent illegal extractions, so protectors can learn about the richness of resources by observing the extractor’s behavior. This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game; (ii) we transform this repeated game into a POMDP by adopting a fixed model for the adversary’s behavior, which cannot be solved by the latest general POMDP solvers due to its exponential state space; (iii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iv) for a specific class of our game, we can speed up the GMOP algorithm without sacrificing solution quality, as well as provide a heuristic that trades off solution quality for lower computational cost.
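GMOP itself interleaves Gibbs sampling over the hidden resource states with Monte Carlo tree search, which is beyond a short sketch. As a deliberately simplified illustration of the underlying idea — learning where resources are rich by observing extractions and patrolling accordingly — here is a posterior-sampling sketch (this is not the paper’s algorithm; the site names and counts are hypothetical):

```python
import random

def plan_patrol(obs_counts, rng=None):
    """Posterior-sampling sketch: for each site, keep a Beta(a+1, b+1)
    posterior over the chance an extractor strikes there, sample a
    richness estimate per site, and patrol the site with the largest draw.
    obs_counts: {site: (times_extraction_observed, times_not_observed)}."""
    rng = rng or random.Random(0)
    draws = {site: rng.betavariate(a + 1, b + 1)
             for site, (a, b) in obs_counts.items()}
    return max(draws, key=draws.get)
```

A site where extractions are observed often receives a high sampled estimate and is patrolled more, while the randomness of the draws preserves some exploration of uncertain sites.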