# Publications

2019
Elizabeth Bondi. 2019. “Bridging the Gap Between High-Level Reasoning in Strategic Agent Coordination and Low-Level Agent Development .” In International Conference on Autonomous Agents and Multiagent Systems (Doctoral Consortium) (AAMAS-19).Abstract
Recent advances in fields such as computer vision and natural language processing have paved the way for developing agents capable of automatically interpreting their surrounding environment. Concurrently, advances in artificial intelligence have made the coordination of many such agents possible. However, there is little work considering both the low-level reasoning that allows agents to interpret their environment, such as deep learning techniques, and the high-level reasoning that coordinates such agents. By considering both together, we can better handle real-world scenarios, for example by planning at a high level with low-level uncertainty in mind, or even by improving low-level processing by using high level reasoning to place the agent in the best scenario for success.
Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, and Milind Tambe. 2019. “Broken Signals in Security Games: Coordinating Mobile Patrollers and Sensors in the Real World (Extended Abstract).” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19).Abstract
Mobile sensors, e.g., unmanned aerial vehicles (UAVs), are becoming increasingly important in security domains and can be used for tasks such as searching for poachers in conservation areas. Such mobile sensors augment human patrollers by assisting in surveillance and in signaling potentially deceptive information to adversaries, and their coordinated deployment could be modeled via the well-known security games framework. Unfortunately, real-world uncertainty in the sensor’s detection of adversaries and adversaries’ observation of the sensor’s signals present major challenges in the sensors’ use. This leads to significant detriments in security performance. We first discuss the current shortcomings in more detail, and then propose a novel game model that incorporates uncertainty with sensors. The defender strategy in this game model will consist of three interdependent stages: an allocation stage, a signaling stage, and a reaction stage
Omkar Thakoor, Milind Tambe, Phebe Vayanos, Haifeng Xu, Christopher Kiekintveld, and Fei Feng. 2019. “Cyber Camouflage Games for Strategic Deception.” In Conference on Decision and Game Theory for Security, 2019.Abstract
The rapid increase in cybercrime, causing a reported annual economic
loss of $600 billion (Lewis, 2018), has prompted a critical need for effective cyber defense. Strategic criminals conduct network reconnaissance prior to executing attacks to avoid detection and establish situational awareness via scanning and fingerprinting tools. Cyber deception attempts to foil these reconnaissance efforts by camouflaging network and system attributes to disguise valuable information. Game-theoretic models can identify decisions about strategically deceiving attackers, subject to domain constraints. For effectively deploying an optimal deceptive strategy, modeling the objectives and the abilities of the attackers, is a key challenge. To address this challenge, we present Cyber Camouflage Games (CCG), a general-sum game model that captures attackers which can be diversely equipped and motivated. We show that computing the optimal defender strategy is NP-hard even in the special case of unconstrained CCGs, and present an efficient approximate solution for it. We further provide an MILP formulation accelerated with cut-augmentation for the general constrained problem. Finally, we provide experimental evidence that our solution methods are efficient and effective. Andrew Perrault, Bryan Wilder, Eric Ewing, Aditya Mate, Bistra Dilkina, and Milind Tambe. 2019. “Decision-Focused Learning of Adversary Behavior in Security Games.” In GAIW: Games, Agents and Incentives Workshop at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19).Abstract Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversary’s response to defense on a limited set of targets, we study the problem of defending against the same adversary on a larger set of targets from the same distribution. We give a theoretical justification for why standard two-stage learning approaches, where a model of the adversary is trained for predictive accuracy and then optimized against, may fail to maximize the defender’s expected utility in this setting. We develop a decision-focused learning approach, where the adversary behavior model is optimized for decision quality, and show empirically that it achieves higher defender expected utility than the two-stage approach when there is limited training data and a large number of target features. Nitin Kamra, Umang Gupta, Kai Wang, Fei Fang, Yan Liu, and Milind Tambe. 2019. “Deep Fictitious Play for Games with Continuous Action Spaces .” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19).Abstract Fictitious play has been a classic algorithm to solve two-player adversarial games with discrete action spaces. In this work we develop an approximate extension of fictitious play to two-player games with high-dimensional continuous action spaces. We use generative neural networks to approximate players’ best responses while also learning a differentiable approximate model to the players’ rewards given their actions. Both these networks are trained jointly with gradient-based optimization to emulate fictitious play. We explore our approach in zero-sum games, non zero-sum games and security game domains. Nitin Kamra, Umang Gupta, Kai Wang, Fei Fang, Yan Liu, and Milind Tambe. 2019. “DeepFP for Finding Nash Equilibrium in Continuous Action Spaces .” In Conference on Decision and Game Theory for Security, 2019.Abstract Finding Nash equilibrium in continuous action spaces is a challenging problem and has applications in domains such as protecting geographic areas from potential attackers. We present DeepFP, an approximate extension of fictitious play in continuous action spaces. DeepFP represents players’ approximate best responses via generative neural networks which are highly expressive implicit density approximators. It additionally uses a game-model network which approximates the players’ expected payoffs given their actions, and trains the networks end-to-end in a model-based learning regime. Further, DeepFP allows using domain-specific oracles if available and can hence exploit techniques such as mathematical programming to compute best responses for structured games. We demonstrate stable convergence to Nash equilibrium on several classic games and also apply DeepFP to a large forest security domain with a novel defender best response oracle. We show that DeepFP learns strategies robust to adversarial Shahrzad Gholami, Amulya Yadav, Long Tran-Thanh, Bistra Dilkina, and Milind Tambe. 2019. “Don’t Put All Your Strategies in One Basket: Playing Green Security Games with Imperfect Prior Knowledge .” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19).Abstract Security efforts for wildlife monitoring and protection of endangered species (e.g., elephants, rhinos, etc.) are constrained by limited resources available to law enforcement agencies. Recent progress in Green Security Games (GSGs) has led to patrol planning algorithms for strategic allocation of limited patrollers to deter adversaries in environmental settings. Unfortunately, previous approaches to these problems suffer from several limitations. Most notably, (i) previous work in GSG literature relies on exploitation of error-prone machine learning (ML) models of poachers’ behavior trained on (spatially) biased historical data; and (ii) online learning approaches for repeated security games (similar to GSGs) do not account for spatio-temporal scheduling constraints while planning patrols, potentially causing significant shortcomings in the effectiveness of the planned patrols. Thus, this paper makes the following novel contributions: (I) We propose MINION-sm, a novel online learning algorithm for GSGs which does not rely on any prior error-prone model of attacker behavior, instead, it builds an implicit model of the attacker on-the-fly while simultaneously generating schedulingconstraint-aware patrols. MINION-sm achieves a sublinear regret against an optimal hindsight patrol strategy. (II) We also propose MINION, a hybrid approach where our MINION-sm model and an ML model (based on historical data) are considered as two patrol planning experts and we obtain a balance between them based on their observed empirical performance. (III) We show that our online learning algorithms significantly outperform existing state-of-theart solvers for GSGs. Omkar Thakoor, Milind Tambe, Phebe Vayanos, Haifeng Xu, and Christopher Kiekintveld. 2019. “General-Sum Cyber Deception Games under Partial Attacker Valuation Information.” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19).Abstract The rapid increase in cybercrime, causing a reported annual economic loss of$600 billion [20], has prompted a critical need for
effective cyber defense. Strategic criminals conduct network reconnaissance prior to executing attacks to avoid detection and establish
situational awareness via scanning and fingerprinting tools. Cyber
deception attempts to foil these reconnaissance efforts; by disguising network and system attributes, among several other techniques.
Cyber Deception Games (CDG) is a game-theoretic model for optimizing strategic deception, and can apply to various deception
methods. Recently introduced initial model for CDGs assumes zerosum payoffs, implying directly conflicting attacker motives, and
perfect defender knowledge on attacker preferences. These unrealistic assumptions are fundamental limitations of the initial zero-sum
model, which we address by proposing a general-sum model that
can also handle uncertainty in the defender’s knowledge.
Alan Tsang, Bryan Wilder, Eric Rice, Milind Tambe, and Yair Zick. 2019. “Group-Fairness in Influence Maximization .” In International Joint Conference on Artificial Intelligence (IJCAI) 2019.Abstract
Influence maximization is a widely used model for information dissemination in social networks. Recent work has employed such interventions across a wide range of social problems, spanning public health, substance abuse, and international development (to name a few examples). A critical but understudied question is whether the benefits of such interventions are fairly distributed across different groups in the population; e.g., avoiding discrimination with respect to sensitive attributes such as race or gender. Drawing on legal and game-theoretic concepts, we introduce formal definitions of fairness in influence maximization. We provide an algorithmic framework to find solutions which satisfy fairness constraints, and in the process improve the state of the art for general multi-objective submodular maximization problems. Experimental results on real data from an HIV prevention intervention for homeless youth show that standard influence maximization techniques oftentimes neglect smaller groups which contribute less to overall utility, resulting in a disparity which our proposed algorithms substantially reduce.
Kai Wang, Bryan Wilder, Sze-Chuan Suen, Bistra Dilkina, and Milind Tambe. 2019. “Improving GP-UCB Algorithm by Harnessing Decomposed Feedback.” In The 4th Workshop on Data Science for Social Good at ECML-PKDD, 2019.Abstract
Gaussian processes (GPs) have been widely applied to machine learning and nonparametric approximation. Given existing observations, a GP allows the decision maker to update a posterior belief over the unknown underlying function. Usually, observations from a complex system come with noise and decomposed feedback from intermediate layers. For example, the decomposed feedback could be the components that constitute the final objective value, or the various feedback gotten from sensors. Previous literature has shown that GPs can successfully deal with noise, but has neglected decomposed feedback. We therefore propose a decomposed GP regression algorithm to incorporate this feedback, leading to less average root-mean-squared error with respect to the target function, especially when the samples are scarce. We also introduce a decomposed GP-UCB algorithm to solve the resulting bandit problem with decomposed feedback. We prove that our algorithm converges to the optimal solution and preserves the noregret property. To demonstrate the wide applicability of this work, we execute our algorithm on two disparate social problems: infectious disease control and weather monitoring. The numerical results show that our method provides significant improvement against previous methods that do not utilize these feedback, showcasing the advantage of considering decomposed feedback.
Qingyu Guo, Jiarui Gan, Fei Fang, Long Tran-Thanh, Milind Tambe, and Bo An. 2019. “On the Inducibility of Stackelberg Equilibrium for Security Games .” In AAAI conference on Artificial Intelligence (AAAI-19).Abstract
Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games. As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy, we first formally define the utility guarantee of a defender strategy and provide examples to show that the utility of SSE can be higher than its utility guarantee. Second, inspired by the analysis of leader’s payoff by Von Stengel and Zamir (2004), we provide the solution concept called the inducible Stackelberg equilibrium (ISE), which owns the highest utility guarantee and always exists. Third, we show the conditions when ISE coincides with SSE and the fact that in general case, SSE can be extremely worse with respect to utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing ISE, with which our experiment
Bryan Wilder, Jackson A. Killian, Amit Sharma, Vinod Choudhary, Bistra Dilkina, and Milind Tambe. 2019. “Integrating optimization and learning to prescribe interventions for tuberculosis patients .” In 10th International Workshop on Optimization in Multiagent Systems (OptMAS).Abstract
Creating impact in real-world settings requires agents which navigate the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal of the agent, which is to make the best decisions possible. We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. We then provide an application of such techniques to a real problem of societal importance: improving interventions in tuberculosis treatment. Using data on 17,000 Indian patients provided by the NGO Everwell, we consider the problem of predicting which patients are likely to miss doses of medication in the near future and optimizing interventions by health workers to avert such treatment failures. We find the decisionfocused learning improves the number of successful interventions by approximately 15% compared to standard machine learning approaches, demonstrating that aligning the goals of learning and decision making can yield substantial benefits in a socially critical application.
Sarah Cooney, Kai Wang, Elizabeth Bondi, Thanh Nguyen, Phebe Vayanos, Hailey Winetrobe, Edward A. Cranford, Cleotilde Gonzalez, Christian Lebiere, and Milind Tambe. 2019. “Learning to Signal in the Goldilocks Zone: Improving Adversary Compliance in Security Games .” In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, (ECML-PKDD), 2019.Abstract
Many real-world security scenarios can be modeled via a game-theoretic framework known as a security game in which there is a defender trying to protect potential targets from an attacker. Recent work in security games has shown that deceptive signaling by the defender can convince an attacker to withdraw his attack. For instance, a warning message to commuters indicating speed enforcement is in progress ahead might lead to them driving more slowly, even if it turns out no enforcement is in progress. However, the results of this work are limited by the unrealistic assumption that the attackers will behave with perfect rationality, meaning they always choose an action that gives them the best expected reward. We address the problem of training boundedly rational (human) attackers to comply with signals via repeated interaction with signaling without incurring a loss to the defender, and offer the four following contributions: (i) We learn new decision tree and neural network-based models of attacker compliance with signaling. (ii) Based on these machine learning models of a boundedly rational attacker’s response to signaling, we develop a theory of signaling in the Goldilocks zone, a balance of signaling and deception that increases attacker compliance and improves defender utility. (iii) We present game-theoretic algorithms to solve for signaling schemes based on the learned models of attacker compliance with signaling. (iv) We conduct extensive human subject experiments using an online game. The game simulates the scenario of an inside attacker trying to steal sensitive information from company computers, and results show that our algorithms based on learned models of attacker behavior lead to better attacker compliance and improved defender utility compared to the state-of-the-art algorithm for rational attackers with signaling.
Bryan Wilder, Bistra Dilkina, and Milind Tambe. 2019. “Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization .” In AAAI conference on Artificial Intelligence (AAAI-19).Abstract
Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal, which is to make the best decisions possible. Hand-tuning the loss function to align with optimization is a difficult and error-prone process (which is often skipped entirely). We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce highquality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. Experimental results across a variety of domains show that decisionfocused learning often leads to improved optimization performance compared to traditional methods. We find that standard measures of accuracy are not a reliable proxy for a predictive model’s utility in optimization, and our method’s ability to specify the true goal as the model’s training objective yields substantial dividends across a range of decision problems.
Sarah Cooney, Wendy Gomez, Kai Wang, Jorja Leap, P. Jefferey Brantingham, and Milind Tambe. 2019. “Mobile Game Theory with Street Gangs.” In The 4th Workshop on Data Science for Social Good at ECML-PKDD, 2019.Abstract
Gang violence remains a persistent public safety problem
in Los Angeles. Gang interventionists and community organizers are
turning to proactive peacekeeping, a process of addressing the underlying structures that cause young people to join gangs such as pervasive poverty and marginalization. Given the increasing prevalence and
decreasing cost of mobile technology, there may be opportunities for interventionists to employ technological solutions in their work. However,
before such solutions can be deployed, it is necessary to have accurate
models of the target users—in this case, gang-involved youth. Of particular interest with regard proactive peacekeeping is their propensity for
cooperation. However, given the unique circumstances surrounding the
lives of gang-involved youth, traditional laboratory-based experiments
measuring cooperation are infeasible. In this paper, we present a novel
method of collecting experimental data from gang-involved youth in the
Los Angeles area. We design a mobile application based on the classic Prisoner’s Dilemma model, which has been used to collect almost
3000 data points on cooperation from more than 20 participants. We
present initial results that show despite their unique life circumstances
gang-involved youth cooperate at roughly the same rate as university students in classic studies of cooperation. We conclude by addressing the
implications of this result for future work and proactive peacekeeping
endeavors
Shahrzad Gholami. 2019. “Predicting and Planning against Real-world Adversaries: An End-to-end Pipeline to Combat Illegal Wildlife Poachers on a Global Scale”.Abstract

Security is a global concern and a unifying theme in various security projects is strategic reasoning where the mathematical framework of machine learning and game theory can be integrated
and applied. For example, in the environmental sustainability domain, the problem of protecting
endangered wildlife from attacks (i.e., poachers’ strikes) can be abstracted as a game between
defender(s) and attacker(s). Applying previous research on security games to sustainability domains (denoted as Green Security Games) introduce several novel challenges that I address in
my thesis to create computationally feasible and accurate algorithms in order to model complex
adversarial behavior based on real-world data and to generate optimal defender strategy.
My thesis provides four main contributions to the emerging body of research in using machine
learning and game theory framework for the fundamental challenges existing in the environmental sustainability domain, namely (i) novel spatio-temporal and uncertainty-aware machine
learning models for complex adversarial behavior based on the imperfect real-world data, (ii) the
first large-scale field test evaluation of the machine learning models in the adversarial settings
concerning the environmental sustainability, (iii) a novel multi-expert online learning model for
constrained patrol planning, and (iv) the first game theoretical model to generate optimal defender
strategy against collusive adversaries.

In regard to the first contribution, I developed bounded rationality models for adversaries
based on the real-world data that account for the naturally occurring uncertainty in past attack
evidence collected by defenders. To that end, I proposed two novel predictive behavioral models,
which I improved progressively. The second major contribution of my thesis is a large-scale field
test evaluation of the proposed adversarial behavior model beyond the laboratory. Particularly,
my thesis is motivated by the challenges in wildlife poaching, where I directed the defenders
(i.e., rangers) to the hotspots of adversaries that they would have missed. During these experiments across multiple vast national parks, several snares and snared animals were detected, and
poachers were arrested, potentially more wildlife saved. The algorithm I proposed, that combines
machine learning and game-theoretic patrol planning is planned to be deployed at 600 national
parks around the world in the near future to combat illegal poaching.
The third contribution in my thesis introduces a novel multi-expert online learning model for
constrained and randomized patrol planning, which benefits from several expert planners where
insufficient or imperfect historical records of past attacks are available to learn adversarial behavior. The final contribution of my thesis is developing an optimal solution against collusive
adversaries in security games assuming both rational and boundedly rational adversaries. I conducted human subject experiments on Amazon Mechanical Turk involving 700 human subjects
using a web-based game that simulates collusive security games.

Aida Rahmattalabi, Phebe Vayanos, Anthony Fulginiti, and Milind Tambe. 2019. “Robust Peer-Monitoring on Graphs with an Application to Suicide Prevention in Social Networks .” In International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19).Abstract
We consider the problem of selecting a subset of nodes (individuals)
in a (social) network that can act as monitors capable of “watchingout” for their neighbors (friends) when the availability or performance of the chosen monitors is uncertain. Such problems arise
for example in the context of “Gatekeeper Trainings” for suicide
prevention. We formulate this problem as a two-stage robust optimization problem that aims to maximize the worst-case number of
covered nodes. Our model is capable of incorporating domain specific constraints, e.g., fairness constraints. We propose a practically
tractable approximation scheme and we provide empirical results
that demonstrate the effectiveness of our approach.
Edward A. Cranford, Cleotilde Gonzalez, Palvi Aggarwal, Sarah Cooney, Milind Tambe, and Christian Lebiere. 2019. “Towards Personalized Deceptive Signaling for Cyber Defense Using Cognitive Models .” In International Conference on Cognitive Modeling (ICCM), 2019.Abstract
Recent research in cybersecurity has begun to develop active defense strategies using game-theoretic optimization of the allocation of limited defenses combined with deceptive signaling. While effective, the algorithms are optimized against perfectly rational adversaries. In a laboratory experiment, we pit humans against the defense algorithm in an online game designed to simulate an insider attack scenario. Humans attack far more often than predicted under perfect rationality. Optimizing against human bounded rationality is vitally important. We propose a cognitive model based on instancebased learning theory and built in ACT-R that accurately predicts human performance and biases in the game. We show that the algorithm does not defend well, largely due to its static nature and lack of adaptation to the particular individual’s actions. Thus, we propose an adaptive method of signaling that uses the cognitive model to trace an individual’s experience in real time, in order to optimize defenses. We discuss the results and implications of personalized defense. Keywords: cyber deception; cognitive models; instance-based learning; knowledge-tracing; model-tracing
Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, and Milind Tambe. 2019. “Using Game Theory in Real Time in the Real World: A Conservation Case Study (Demonstration) .” In International Conference on Autonomous Agents and Multiagent Systems (Demonstration) (AAMAS-19).Abstract
In the real world, real-time data are now widely available, especially in security domains. Security cameras, aerial imagery, and even social media keep defenders informed when protecting important events, locations, and people. Further, advances in artificial intelligence have led to tools that can interpret these data automatically. Game theoretic models, for example, have shown great success in security. However, most of them ignore real-time information. In this paper, we demonstrate the potential to use real-time information from imagery to better inform our decisions in game theoretic models for security. As a concrete example, a conservation group called Air Shepherd uses conservation drones equipped with thermal infrared cameras to locate poachers at night and alert park rangers. They have also used lights aboard the drones, or signaled, to warn poachers of their presence, which often deters the poachers. We propose a system that (i) allocates drones and humans strategically throughout a protected area, (ii) detects poachers in the thermal infrared videos recorded by the conservation drones flying through the protected area in the predetermined location, and (iii) recommends moving to the location and/or signaling to the poacher that a patroller is nearby depending on real-time detections. View the demonstration.
Kai Wang, Aditya Mate, Bryan Wilder, Andrew Perrault, and Milind Tambe. 2019. “Using Graph Convolutional Networks to Learn Interdiction Games .” In AI for Social Good workshop (AI4SG) at International Joint Conference on Artificial Intelligence (IJCAI) 2019.Abstract
Illegal smuggling is one of the most important issues across countries, where more than \$10 billion a year of illegal wildlife trafficking is conducted within transnational criminal networks. Governments have tried to deploy inspections at checkpoints to stop illegal smuggling, though the effect is quite limited due to vast protection areas but limited human resources. We study these problems from the perspective of network interdiction games with a boundedly rational attacker. In this paper, we aim to improve the efficiency of the limited number of checkpoints. The problem involves two main stages: i) a predictive stage to predict the attacker’s behavior based on the historical interdiction; ii) a prescriptive stage to optimally allocate limited checkpoints to interdict the attacker. In this paper, we propose a novel boundedly rational model which resolves the issue of exponentially many attacker strategies by making memoryless assumption about the attacker’s behavior. We show that the attacker’s behavior can be reduced to an absorbing Markov chain, where the success probability of reaching any target can be computed analytically, thus optimized via any gradient-based optimization technique. We incorporate graph convolutional neural networks with K-hops look-ahead to model the attacker’s reasoning. Our proposed model provides a new perspective to study the boundedly rationality in traditional interdiction games with graph structure. This novel model possesses nice analytical properties and scales up very well by avoiding enumerating all paths in the graph.