Preventing poaching through ranger patrols is critical for protecting endangered wildlife. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) We propose a novel combinatorial bandit objective that trades off between reward maximization and also accounts for prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective.
Wildlife poaching fuels the multi-billion dollar illegal wildlife trade and pushes countless species to the brink of extinction. To aid rangers in preventing poaching in protected areas around the world, we have developed PAWS, the Protection Assistant for Wildlife Security. We present technical advances in multi-armed bandits and robust sequential decision-making using reinforcement learning, with research questions that emerged from on-the-ground challenges. We also discuss bridging the gap between research and practice, presenting results from field deployment in Cambodia and large-scale deployment through integration with SMART, the leading software system for protected area management used by over 1,000 wildlife parks worldwide.
Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature who controls the parameter values of the adversarial behavior and design an algorithm MIRROR to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
Applications of artificial intelligence for wildlife protection have focused on learning models of poacher behavior based on historical patterns. However, poachers' behaviors are described not only by their historical preferences, but also their reaction to ranger patrols. Past work applying machine learning and game theory to combat poaching have hypothesized that ranger patrols deter poachers, but have been unable to find evidence to identify how or even if deterrence occurs. Here for the first time, we demonstrate a measurable deterrence effect on real-world poaching data. We show that increased patrols in one region deter poaching in the next timestep, but poachers then move to neighboring regions. Our findings offer guidance on how adversaries should be modeled in realistic game-theoretic settings.
Illegal wildlife poaching threatens ecosystems and drives endangered species toward extinction. However, efforts for wildlife protection are constrained by the limited resources of law enforcement agencies. To help combat poaching, the Protection Assistant for Wildlife Security (PAWS) is a machine learning pipeline that has been developed as a data-driven approach to identify areas at high risk of poaching throughout protected areas and compute optimal patrol routes. In this paper, we take an end-to-end approach to the data-to-deployment pipeline for anti-poaching. In doing so, we address challenges including extreme class imbalance (up to 1:200), bias, and uncertainty in wildlife poaching data to enhance PAWS, and we apply our methodology to three national parks with diverse characteristics. (i) We use Gaussian processes to quantify predictive uncertainty, which we exploit to improve robustness of our prescribed patrols and increase detection of snares by an average of 30%. We evaluate our approach on real-world historical poaching data from Murchison Falls and Queen Elizabeth National Parks in Uganda and, for the first time, Srepok Wildlife Sanctuary in Cambodia. (ii) We present the results of large-scale field tests conducted in Murchison Falls and Srepok Wildlife Sanctuary which confirm that the predictive power of PAWS extends promisingly to multiple parks. This paper is part of an effort to expand PAWS to 800 parks around the world through integration with SMART conservation software.
Poachers are engaged in extinction level wholesale slaughter, so it is critical to harness historical data for predicting poachers’ behavior. However, in these domains, data collected about adversarial actions are remarkably imperfect, where reported negative instances of crime may be mislabeled or uncertain. Unfortunately, past attempts to develop predictive and prescriptive models to address this problem suffer from shortcomings from a modeling perspective as well as in the implementability of their techniques. Most notably these models i) neglect the uncertainty in crime data, leading to inaccurate and biased predictions of adversary behavior, ii) use coarse-grained crime analysis and iii) do not provide a convincing evaluation as they only look at a single protected area. Additionally, they iv) proposed time-consuming techniques which cannot be directly integrated into low resource outposts. In this innovative application paper, we (I) introduce iWare-E a novel imperfect-observation aWare Ensemble (iWare-E) technique, which is designed to handle the uncertainty in crime information efficiently. This approach leads to superior accuracy and efficiency for adversary behavior prediction compared to the previous stateof-the-art. We also demonstrate the country-wide efficiency of the models and are the first to (II) evaluate our adversary behavioral model across different protected areas in Uganda, i.e., Murchison Fall and Queen Elizabeth National Park, (totaling about 7500 km2) as well as (III) on fine-grained temporal resolutions. Lastly, (IV) we provide a scalable planning algorithm to design fine-grained patrol routes for the rangers, which achieves up to 150% improvement in number of predicted attacks detected.
Poaching is considered a major driver for the population drop of key species such as tigers, elephants, and rhinos, which can be detrimental to whole ecosystems. While conducting foot patrols is the most commonly used approach in many countries to prevent poaching, such patrols often do not make the best use of the limited patrolling resources. This paper presents PAWS, a game-theoretic application deployed in Southeast Asia for optimizing foot patrols to combat poaching. In this paper, we report on the significant evolution of PAWS from a proposed decision aid introduced in 2014 to a regularly deployed application. We outline key technical advances that lead to PAWS’s regular deployment: (i) incorporating complex topographic features, e.g., ridgelines, in generating patrol routes; (ii) handling uncertainties in species distribution (game theoretic payoffs); (iii) ensuring scalability for patrolling large-scale conservation areas with fine-grained guidance; and (iv) handling complex patrol scheduling constraints.
Worldwide, conservation agencies employ rangers to protect conservation areas from poachers. However, agencies lack the manpower to have rangers effectively patrol these vast areas frequently. While past work has modeled poachers’ behavior so as to aid rangers in planning future patrols, those models’ predictions were not validated by extensive field tests. In this paper, we present a hybrid spatio-temporal model that predicts poaching threat levels and results from a five-month field test of our model in Uganda’s Queen Elizabeth Protected Area (QEPA). To our knowledge, this is the first time that a predictive model has been evaluated through such an extensive field test in this domain. We present two major contributions. First, our hybrid model consists of two components: (i) an ensemble model which can work with the limited data common to this domain and (ii) a spatio-temporal model to boost the ensemble’s predictions when sufficient data are available. When evaluated on real-world historical data from QEPA, our hybrid model achieves significantly better performance than previous approaches with either temporally-aware dynamic Bayesian networks or an ensemble of spatially-aware models. Second, in collaboration with the Wildlife Conservation Society and Uganda Wildlife Authority, we present results from a five-month controlled experiment where rangers patrolled over 450 sq km across QEPA. We demonstrate that our model successfully predicted (1) where snaring activity would occur and (2) where it would not occur; in areas where we predicted a high rate of snaring activity, rangers found more snares and snared animals than in areas of lower predicted activity. These findings demonstrate that (1) our model’s predictions are selective, (2) our model’s superior laboratory performance extends to the real world, and (3) these predictive models can aid rangers in focusing their efforts to prevent wildlife poaching and save animals.
Wildlife poaching presents a serious extinction threat to many animal species. Agencies (“defenders”) focused on protecting such animals need tools that help analyze, model and predict poacher activities, so they can more effectively combat such poaching; such tools could also assist in planning effective defender patrols, building on the previous security games research. To that end, we have built a new predictive anti-poaching tool, CAPTURE (Comprehensive Anti-Poaching tool with Temporal and observation Uncertainty REasoning). CAPTURE provides four main contributions. First, CAPTURE’s modeling of poachers provides significant advances over previous models from behavioral game theory and conservation biology. This accounts for: (i) the defender’s imperfect detection of poaching signs; (ii) complex temporal dependencies in the poacher’s behaviors; (iii) lack of knowledge of numbers of poachers. Second, we provide two new heuristics: parameter separation and target abstraction to reduce the computational complexity in learning the poacher models. Third, we present a new game-theoretic algorithm for computing the defender’s optimal patrolling given the complex poacher model. Finally, we present detailed models and analysis of realworld poaching data collected over 12 years in Queen Elizabeth National Park in Uganda to evaluate our new model’s prediction accuracy. This paper thus presents the largest dataset of real-world defender-adversary interactions analyzed in the security games literature. CAPTURE will be tested in Uganda in early 2016.
Poaching is a serious threat to the conservation of key species and whole ecosystems. While conducting foot patrols is the most commonly used approach in many countries to prevent poaching, such patrols often do not make the best use of limited patrolling resources. To remedy this situation, prior work introduced a novel emerging application called PAWS (Protection Assistant for Wildlife Security); PAWS was proposed as a game-theoretic (“security games”) decision aid to optimize the use of patrolling resources. This paper reports on PAWS’s significant evolution from a proposed decision aid to a regularly deployed application, reporting on the lessons from the first tests in Africa in Spring 2014, through its continued evolution since then, to current regular use in Southeast Asia and plans for future worldwide deployment. In this process, we have worked closely with two NGOs (Panthera and Rimba) and incorporated extensive feedback from professional patrolling teams. We outline key technical advances that lead to PAWS’s regular deployment: (i) incorporating complex topographic features, e.g., ridgelines, in generating patrol routes; (ii) handling uncertainties in species distribution (game theoretic payoffs); (iii) ensuring scalability for patrolling large-scale conservation areas with fine-grained guidance; and (iv) handling complex patrol scheduling constraints.
While human behavior models based on repeated Stackelberg games have been proposed for domains such as “wildlife crime” where there is repeated interaction between the defender and the adversary, there has been no empirical study with human subjects to show the effectiveness of such models. This paper presents an initial study based on extensive human subject experiments with participants on Amazon Mechanical Turk (AMT). Our findings include: (i) attackers may view the defender’s coverage probability in a non-linear fashion; specifically it follows an S-shaped curve, and (ii) there are significant losses in defender utility when strategies generated by existing models are deployed in repeated Stackelberg game settings against human subjects.
Recent research on Green Security Games (GSG), i.e., security games for the protection of wildlife, forest and fisheries, relies on the promise of an abundance of available data in these domains to learn adversary behavioral models and determine game payoffs. This research suggests that adversary behavior models (capturing bounded rationality) can be learned from real-world data on where adversaries have attacked, and that game payoffs can be determined precisely from data on animal densities. However, previous work has, as yet, failed to demonstrate the usefulness of these behavioral models in capturing adversary behaviors based on real-world data in GSGs. Previous work has also been unable to address situations where available data is insufficient to accurately estimate behavioral models or to obtain the required precision in the payoff values. In addressing these limitations, as our first contribution, this paper, for the first time, provides validation of the aforementioned adversary behavioral models based on real-world data from a wildlife park in Uganda. Our second contribution addresses situations where real-world data is not precise enough to determine exact payoffs in GSG, by providing the first algorithm to handle payoff uncertainty in the presence of adversary behavioral models. This algorithm is based on the notion of minimax regret. Furthermore, in scenarios where the data is not even sufficient to learn adversary behaviors, our third contribution is to provide a novel algorithm to address payoff uncertainty assuming a perfectly rational attacker (instead of relying on a behavioral model); this algorithm allows for a significant scaleup for large security games. Finally, to reduce the problems due to paucity of data, given mobile sensors such as Unmanned Aerial Vehicles (UAV), we introduce new payoff elicitation strategies to strategically reduce uncertainty.
Building on the successful applications of Stackelberg Security Games (SSGs) to protect infrastructure, researchers have begun focusing on applying game theory to green security domains such as protection of endangered animals and fish stocks. Previous efforts in these domains optimize defender strategies based on the standard Stackelberg assumption that the adversaries become fully aware of the defender’s strategy before taking action. Unfortunately, this assumption is inappropriate since adversaries in green security domains often lack the resources to fully track the defender strategy. This paper (i) introduces Green Security Games (GSGs), a novel game model for green security domains with a generalized Stackelberg assumption; (ii) provides algorithms to plan effective sequential defender strategies — such planning was absent in previous work; (iii) proposes a novel approach to learn adversary models that further improves defender performance; and (iv) provides detailed experimental analysis of proposed approaches.
Endangered species around the world are in danger of extinction from poaching. From the start of the 20th century, the African rhino population has dropped over 98%  and the global tiger population has dropped over 95% , resulting in multiple species extinctions in both groups. Species extinctions have negative consequences on local ecosystems, economies, and communities. To protect these species, countries have set up conservation agencies and national parks, such as Uganda’s Queen Elizabeth National Park (QENP). However, a common lack of funding for these agencies results in a lack of law enforcement resources to protect these large, rural areas. As an example of the scale of disparity, one wildlife crime study in 2007 reported an actual coverage density of one ranger per 167 square kilometers . Because of the hazards involved (e.g., armed poachers, wild animals), rangers patrol in groups, further increasing the amount of area they are responsible for patrolling. Security game research has typically been concerned with combating terrorism, and this field has indeed benefited from a range of successfully deployed applications [1, 6]. These applications have enabled security agencies to make more efficient use of their limited resources. In this previous research, adversary data has been absent during the development of these solutions, and thus, it has been difficult to make accurate adversary behavior models during algorithm development. In a domain such as wildlife crime, interactions with the adversary are frequent and repeated, thus enabling conservation agencies to collect data. This presence of data enables security game researchers to begin developing algorithms that incorporate this data into, potentially, more accurate behavior models and consequently better security solutions. Developed in conjunction with staff at QENP, the Protection Assistant for Wildlife Security (PAWS) generates optimized defender strategies for use by park rangers . Due to the repeated nature of wildlife crime, PAWS is able to leverage crime event data - a previously unrealized capability in security games research. Thus, PAWS implements a novel adaptive algorithm that processes crime event data, builds multiple human behavior models, and, based on those models, predicts where adversaries will attack next. These predictions are then used to generate a patrol strategy for the rangers (i.e., a set of patrol waypoints) that can be viewed on a GPS unit. Against this background, the demonstration presented in this paper introduces two contributions. First, we present the PAWS system which incorporates the algorithm in  into a scheduling system and a GPS visualizer. Second, we present a software interface to run a number of human subject experiments (HSE) to evaluate and improve the efficacy of PAWS before its deployment in QENP. By conducting these HSEs, we can (i) test the PAWS algorithms with repeated interactions with humans, thus providing a more realistic testing environment than in its previous simulations; (ii) generate data that can be used to initialize PAWS’s human behavior models for deployment, and (iii) compare the current PAWS algorithms’ performance to alternatives and determine if additional improvements are needed prior to deployment. To provide proper context for the presentation, this paper also presents a brief overview of the PAWS system data flow and its adaptive algorithms. The demonstration will engage audience members by having them participate in the HSEs and using the GPS unit to visualize a patrol schedule in QENP.