2019

Teamcore research published in 2019
Kai Wang, Bryan Wilder, Sze-Chuan Suen, Bistra Dilkina, and Milind Tambe. 2019. “Improving GP-UCB Algorithm by Harnessing Decomposed Feedback.” In The 4th Workshop on Data Science for Social Good at ECML-PKDD, 2019.Abstract
Gaussian processes (GPs) have been widely applied to machine learning and nonparametric approximation. Given existing observations, a GP allows the decision maker to update a posterior belief over the unknown underlying function. Usually, observations from a complex system come with noise and decomposed feedback from intermediate layers. For example, the decomposed feedback could be the components that constitute the final objective value, or the various feedback gotten from sensors. Previous literature has shown that GPs can successfully deal with noise, but has neglected decomposed feedback. We therefore propose a decomposed GP regression algorithm to incorporate this feedback, leading to less average root-mean-squared error with respect to the target function, especially when the samples are scarce. We also introduce a decomposed GP-UCB algorithm to solve the resulting bandit problem with decomposed feedback. We prove that our algorithm converges to the optimal solution and preserves the noregret property. To demonstrate the wide applicability of this work, we execute our algorithm on two disparate social problems: infectious disease control and weather monitoring. The numerical results show that our method provides significant improvement against previous methods that do not utilize these feedback, showcasing the advantage of considering decomposed feedback.
Kai Wang, Aditya Mate, Bryan Wilder, Andrew Perrault, and Milind Tambe. 2019. “Using Graph Convolutional Networks to Learn Interdiction Games .” In AI for Social Good workshop (AI4SG) at International Joint Conference on Artificial Intelligence (IJCAI) 2019.Abstract
Illegal smuggling is one of the most important issues across countries, where more than $10 billion a year of illegal wildlife trafficking is conducted within transnational criminal networks. Governments have tried to deploy inspections at checkpoints to stop illegal smuggling, though the effect is quite limited due to vast protection areas but limited human resources. We study these problems from the perspective of network interdiction games with a boundedly rational attacker. In this paper, we aim to improve the efficiency of the limited number of checkpoints. The problem involves two main stages: i) a predictive stage to predict the attacker’s behavior based on the historical interdiction; ii) a prescriptive stage to optimally allocate limited checkpoints to interdict the attacker. In this paper, we propose a novel boundedly rational model which resolves the issue of exponentially many attacker strategies by making memoryless assumption about the attacker’s behavior. We show that the attacker’s behavior can be reduced to an absorbing Markov chain, where the success probability of reaching any target can be computed analytically, thus optimized via any gradient-based optimization technique. We incorporate graph convolutional neural networks with K-hops look-ahead to model the attacker’s reasoning. Our proposed model provides a new perspective to study the boundedly rationality in traditional interdiction games with graph structure. This novel model possesses nice analytical properties and scales up very well by avoiding enumerating all paths in the graph.
Bryan Wilder, Jackson A. Killian, Amit Sharma, Vinod Choudhary, Bistra Dilkina, and Milind Tambe. 2019. “Integrating optimization and learning to prescribe interventions for tuberculosis patients .” In 10th International Workshop on Optimization in Multiagent Systems (OptMAS).Abstract
Creating impact in real-world settings requires agents which navigate the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal of the agent, which is to make the best decisions possible. We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. We then provide an application of such techniques to a real problem of societal importance: improving interventions in tuberculosis treatment. Using data on 17,000 Indian patients provided by the NGO Everwell, we consider the problem of predicting which patients are likely to miss doses of medication in the near future and optimizing interventions by health workers to avert such treatment failures. We find the decisionfocused learning improves the number of successful interventions by approximately 15% compared to standard machine learning approaches, demonstrating that aligning the goals of learning and decision making can yield substantial benefits in a socially critical application.
Edward A. Cranford, Cleotilde Gonzalez, Palvi Aggarwal, Sarah Cooney, Milind Tambe, and Christian Lebiere. 2019. “Towards Personalized Deceptive Signaling for Cyber Defense Using Cognitive Models .” In International Conference on Cognitive Modeling (ICCM), 2019.Abstract
Recent research in cybersecurity has begun to develop active defense strategies using game-theoretic optimization of the allocation of limited defenses combined with deceptive signaling. While effective, the algorithms are optimized against perfectly rational adversaries. In a laboratory experiment, we pit humans against the defense algorithm in an online game designed to simulate an insider attack scenario. Humans attack far more often than predicted under perfect rationality. Optimizing against human bounded rationality is vitally important. We propose a cognitive model based on instancebased learning theory and built in ACT-R that accurately predicts human performance and biases in the game. We show that the algorithm does not defend well, largely due to its static nature and lack of adaptation to the particular individual’s actions. Thus, we propose an adaptive method of signaling that uses the cognitive model to trace an individual’s experience in real time, in order to optimize defenses. We discuss the results and implications of personalized defense. Keywords: cyber deception; cognitive models; instance-based learning; knowledge-tracing; model-tracing
Elizabeth Bondi. 2019. “Visionary Security: Using Uncertain Real-Time Information in Signaling Games .” In International Joint Conference on Artificial Intelligence (IJCAI) 2019 (Doctoral Consortium).Abstract
In important domains from natural resource conservation to public safety, real-time information is becoming increasingly important. Strategic deployment of security cameras and mobile sensors such as drones can provide real-time updates on illegal activities. To help plan for such strategic deployments of sensors and human patrollers, as well as warning signals to ward off adversaries, the defender-attacker security games framework can be used. [Zhang et al., 2019] has shown that real-time data (e.g., human view from a helicopter) may be used in conjunction with security game models to interdict criminals. Other recent work relies on real-time information from sensors that can notify the patroller when an opponent is detected [Basilico et al., 2017; Xu et al., 2018]. Despite considering real-time information in all cases, these works do not consider the combined situation of uncertainty in real-time information in addition to strategically signaling to adversaries. In this thesis, we will not only address this gap, but also improve the overall security result by considering security game models and computer vision algorithms together. A major aspect of this work is in applying it to real-world challenges, such as conservation. Although it applies to many environmental challenges, such as protecting forests and avoiding illegal mining, we will focus particularly on reducing poaching of endangered wildlife as an example. To reduce poaching, human patrollers typically search for snares and poaching activity as they patrol, as well as intervene if poaching activity is found. Drones are useful patrolling aids due to their ability to cover additional ground, but they must interpret their environments, notify nearby human patrollers for intervention, and send potentially deceptive signals to the adversary to deter poaching. Rather than treating these as separate tasks, models must coordinate to handle challenges found in real-world conservation scenarios (Fig. 1). We will determine the success of this work both in simulated experiments and through work with conservation agencies such as Air Shepherd to implement the system in the real world.
Alan Tsang, Bryan Wilder, Eric Rice, Milind Tambe, and Yair Zick. 2019. “Group-Fairness in Influence Maximization .” In International Joint Conference on Artificial Intelligence (IJCAI) 2019.Abstract
Influence maximization is a widely used model for information dissemination in social networks. Recent work has employed such interventions across a wide range of social problems, spanning public health, substance abuse, and international development (to name a few examples). A critical but understudied question is whether the benefits of such interventions are fairly distributed across different groups in the population; e.g., avoiding discrimination with respect to sensitive attributes such as race or gender. Drawing on legal and game-theoretic concepts, we introduce formal definitions of fairness in influence maximization. We provide an algorithmic framework to find solutions which satisfy fairness constraints, and in the process improve the state of the art for general multi-objective submodular maximization problems. Experimental results on real data from an HIV prevention intervention for homeless youth show that standard influence maximization techniques oftentimes neglect smaller groups which contribute less to overall utility, resulting in a disparity which our proposed algorithms substantially reduce.
Sarah Cooney, Kai Wang, Elizabeth Bondi, Thanh Nguyen, Phebe Vayanos, Hailey Winetrobe, Edward A. Cranford, Cleotilde Gonzalez, Christian Lebiere, and Milind Tambe. 2019. “Learning to Signal in the Goldilocks Zone: Improving Adversary Compliance in Security Games .” In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, (ECML-PKDD), 2019.Abstract
Many real-world security scenarios can be modeled via a game-theoretic framework known as a security game in which there is a defender trying to protect potential targets from an attacker. Recent work in security games has shown that deceptive signaling by the defender can convince an attacker to withdraw his attack. For instance, a warning message to commuters indicating speed enforcement is in progress ahead might lead to them driving more slowly, even if it turns out no enforcement is in progress. However, the results of this work are limited by the unrealistic assumption that the attackers will behave with perfect rationality, meaning they always choose an action that gives them the best expected reward. We address the problem of training boundedly rational (human) attackers to comply with signals via repeated interaction with signaling without incurring a loss to the defender, and offer the four following contributions: (i) We learn new decision tree and neural network-based models of attacker compliance with signaling. (ii) Based on these machine learning models of a boundedly rational attacker’s response to signaling, we develop a theory of signaling in the Goldilocks zone, a balance of signaling and deception that increases attacker compliance and improves defender utility. (iii) We present game-theoretic algorithms to solve for signaling schemes based on the learned models of attacker compliance with signaling. (iv) We conduct extensive human subject experiments using an online game. The game simulates the scenario of an inside attacker trying to steal sensitive information from company computers, and results show that our algorithms based on learned models of attacker behavior lead to better attacker compliance and improved defender utility compared to the state-of-the-art algorithm for rational attackers with signaling.
Qingyu Guo, Jiarui Gan, Fei Fang, Long Tran-Thanh, Milind Tambe, and Bo An. 2019. “On the Inducibility of Stackelberg Equilibrium for Security Games .” In AAAI conference on Artificial Intelligence (AAAI-19).Abstract
Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy, we first formally define the utility guarantee of a defender strategy and provide examples to show that the utility of SSE can be higher than its utility guarantee. Second, inspired by the analysis of leader’s payoff by Von Stengel and Zamir (2004), we provide the solution concept called the inducible Stackelberg equilibrium (ISE), which owns the highest utility guarantee and always exists. Third, we show the conditions when ISE coincides with SSE and the fact that in general case, SSE can be extremely worse with respect to utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing ISE, with which our experiment
Bryan Wilder, Bistra Dilkina, and Milind Tambe. 2019. “Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization .” In AAAI conference on Artificial Intelligence (AAAI-19).Abstract
Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal, which is to make the best decisions possible. Hand-tuning the loss function to align with optimization is a difficult and error-prone process (which is often skipped entirely). We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce highquality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. Experimental results across a variety of domains show that decisionfocused learning often leads to improved optimization performance compared to traditional methods. We find that standard measures of accuracy are not a reliable proxy for a predictive model’s utility in optimization, and our method’s ability to specify the true goal as the model’s training objective yields substantial dividends across a range of decision problems.
Shahrzad Gholami, Amulya Yadav, Long Tran-Thanh, Bistra Dilkina, and Milind Tambe. 2019. “Don’t Put All Your Strategies in One Basket: Playing Green Security Games with Imperfect Prior Knowledge .” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19).Abstract
Security efforts for wildlife monitoring and protection of endangered species (e.g., elephants, rhinos, etc.) are constrained by limited resources available to law enforcement agencies. Recent progress in Green Security Games (GSGs) has led to patrol planning algorithms for strategic allocation of limited patrollers to deter adversaries in environmental settings. Unfortunately, previous approaches to these problems suffer from several limitations. Most notably, (i) previous work in GSG literature relies on exploitation of error-prone machine learning (ML) models of poachers’ behavior trained on (spatially) biased historical data; and (ii) online learning approaches for repeated security games (similar to GSGs) do not account for spatio-temporal scheduling constraints while planning patrols, potentially causing significant shortcomings in the effectiveness of the planned patrols. Thus, this paper makes the following novel contributions: (I) We propose MINION-sm, a novel online learning algorithm for GSGs which does not rely on any prior error-prone model of attacker behavior, instead, it builds an implicit model of the attacker on-the-fly while simultaneously generating schedulingconstraint-aware patrols. MINION-sm achieves a sublinear regret against an optimal hindsight patrol strategy. (II) We also propose MINION, a hybrid approach where our MINION-sm model and an ML model (based on historical data) are considered as two patrol planning experts and we obtain a balance between them based on their observed empirical performance. (III) We show that our online learning algorithms significantly outperform existing state-of-theart solvers for GSGs.
Nitin Kamra, Umang Gupta, Kai Wang, Fei Fang, Yan Liu, and Milind Tambe. 2019. “Deep Fictitious Play for Games with Continuous Action Spaces .” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19).Abstract
Fictitious play has been a classic algorithm to solve two-player adversarial games with discrete action spaces. In this work we develop an approximate extension of fictitious play to two-player games with high-dimensional continuous action spaces. We use generative neural networks to approximate players’ best responses while also learning a differentiable approximate model to the players’ rewards given their actions. Both these networks are trained jointly with gradient-based optimization to emulate fictitious play. We explore our approach in zero-sum games, non zero-sum games and security game domains.
Elizabeth Bondi. 2019. “Bridging the Gap Between High-Level Reasoning in Strategic Agent Coordination and Low-Level Agent Development .” In International Conference on Autonomous Agents and Multiagent Systems (Doctoral Consortium) (AAMAS-19).Abstract
Recent advances in fields such as computer vision and natural language processing have paved the way for developing agents capable of automatically interpreting their surrounding environment. Concurrently, advances in artificial intelligence have made the coordination of many such agents possible. However, there is little work considering both the low-level reasoning that allows agents to interpret their environment, such as deep learning techniques, and the high-level reasoning that coordinates such agents. By considering both together, we can better handle real-world scenarios, for example by planning at a high level with low-level uncertainty in mind, or even by improving low-level processing by using high level reasoning to place the agent in the best scenario for success.
Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, and Milind Tambe. 2019. “Using Game Theory in Real Time in the Real World: A Conservation Case Study (Demonstration) .” In International Conference on Autonomous Agents and Multiagent Systems (Demonstration) (AAMAS-19).Abstract
In the real world, real-time data are now widely available, especially in security domains. Security cameras, aerial imagery, and even social media keep defenders informed when protecting important events, locations, and people. Further, advances in artificial intelligence have led to tools that can interpret these data automatically. Game theoretic models, for example, have shown great success in security. However, most of them ignore real-time information. In this paper, we demonstrate the potential to use real-time information from imagery to better inform our decisions in game theoretic models for security. As a concrete example, a conservation group called Air Shepherd uses conservation drones equipped with thermal infrared cameras to locate poachers at night and alert park rangers. They have also used lights aboard the drones, or signaled, to warn poachers of their presence, which often deters the poachers. We propose a system that (i) allocates drones and humans strategically throughout a protected area, (ii) detects poachers in the thermal infrared videos recorded by the conservation drones flying through the protected area in the predetermined location, and (iii) recommends moving to the location and/or signaling to the poacher that a patroller is nearby depending on real-time detections. View the demonstration.
Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, and Milind Tambe. 2019. “Broken Signals in Security Games: Coordinating Mobile Patrollers and Sensors in the Real World (Extended Abstract).” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19).Abstract
Mobile sensors, e.g., unmanned aerial vehicles (UAVs), are becoming increasingly important in security domains and can be used for tasks such as searching for poachers in conservation areas. Such mobile sensors augment human patrollers by assisting in surveillance and in signaling potentially deceptive information to adversaries, and their coordinated deployment could be modeled via the well-known security games framework. Unfortunately, real-world uncertainty in the sensor’s detection of adversaries and adversaries’ observation of the sensor’s signals present major challenges in the sensors’ use. This leads to significant detriments in security performance. We first discuss the current shortcomings in more detail, and then propose a novel game model that incorporates uncertainty with sensors. The defender strategy in this game model will consist of three interdependent stages: an allocation stage, a signaling stage, and a reaction stage
Sarah Cooney, Phebe Vayanos, Thanh H. Nguyen, Cleotilde Gonzalez, Christian Lebiere, Edward A. Cranford, and Milind Tambe. 2019. “Warning Time: Optimizing Strategic Signaling for Security Against Boundedly Rational Adversaries.” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19).Abstract
Defender-attacker Stackelberg security games (SSGs) have been
applied for solving many real-world security problems. Recent work
in SSGs has incorporated a deceptive signaling scheme into the
SSG model, where the defender strategically reveals information
about her defensive strategy to the attacker, in order to inuence
the attacker’s decision making for the defender’s own benet. In
this work, we study the problem of signaling in security games
against a boundedly rational attacker.

Pages