Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits
Jackson A Killian, Andrew Perrault, and Milind Tambe. 5/2021. “Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits.” In 20th International Conference on Autonomous Agents and Multiagent Systems. multi_action_bandits_aamas_2021_preprint.pdf
Alok Talekar, Sharad Shriram, Nidhin Vaidhiyan, Gaurav Aggarwal, Jiangzhuo Chen, Srini Venkatramanan, Lijing Wang, Aniruddha Adiga, Adam Sadilek, Ashish Tendulkar, Madhav Marathe, Rajesh Sundaresan, and Milind Tambe. 5/2021. “Cohorting to isolate asymptomatic spreaders: An agent-based simulation study on the Mumbai Suburban Railway.” In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) as a Short Paper). Publisher's Version 20210212_talekar_cohort_crc.pdf
Kai Wang, Jeffrey Brantingham, and Milind Tambe. 5/2021. “Learning Opportunistic Adversarial Model on Global Wildlife Trade.” In AAMAS Autonomous Agents for Social Good Workshop (AASG).Abstract
Global illegal wildlife trade threatens biodiversity and acts as a potential crisis of invasive species and disease spread. Despite a wide range of national and international policies and regulations designed to stop illegal wildlife trade, high profit margins and increasing demand drive a vigorous global illicit trade network. In this paper, we aim to build an adversarial model to predict the future wildlife trade based on the historical trade data. We hypothesize that the majority of illegal wildlife trade is opportunistic crime, which is highly correlated to legal wildlife trade. We can therefore leverage the abundant legal wildlife trade data to forecast the future wildlife trade, where a fixed fraction of trade volume will reflect the opportunistic wildlife trade volume. To learn a legal wildlife trade model, we propose to use graph neural networks and meta-learning to handle the network and species dependencies, respectively. Lastly, we suggest to incorporate agent-based models on top of our model to study the evolution from opportunistic to more organized illegal wildlife trade behavior.
Lily Xu, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, and Milind Tambe. 2/2021. “Dual-Mandate Patrols: Multi-Armed Bandits for Green Security.” In Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).Abstract
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
Aida Rahmattalabi *, Shahin Jabbari *, Himabindu Lakkaraju, Phebe Vayanos, Max Izenberg, Ryan Brown, Eric Rice, and Milind Tambe. 2/2021. “Fair Influence Maximization: A Welfare Optimization Approach.” AAAI Conference on Artificial Intelligence 2/2021. Abstract
Several behavioral, social, and public health interventions, such as suicide/HIV prevention or community preparedness against natural disasters, leverage social network information to maximize outreach. Algorithmic influence maximization techniques have been proposed to aid with the choice of “peer leaders” or “influencers” in such interventions. Yet, traditional algorithms for influence maximization have not been designed with these interventions in mind. As a result, they may disproportionately exclude minority communities from the benefits of the intervention. This has motivated research on fair influence maximization. Existing techniques come with two major drawbacks. First, they require committing to a single fairness measure. Second, these measures are typically imposed as strict constraints leading to undesirable properties such as wastage of resources. To address these shortcomings, we pro- vide a principled characterization of the properties that a fair influence maximization algorithm should satisfy. In particular, we propose a framework based on social welfare theory, wherein the cardinal utilities derived by each community are aggregated using the isoelastic social welfare functions. Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion design parameter. We then show under what circumstances our proposed principles can be satisfied by a welfare function. The resulting optimization problem is monotone and submodular and can be solved efficiently with optimality guarantees. Our frame- work encompasses as special cases leximin and proportional fairness. Extensive experiments on synthetic and real world datasets including a case study on landslide risk management demonstrate the efficacy of the proposed framework.
Fair Influence Maximization: A Welfare Optimization Approach
Jackson A Killian, Andrew Perrault, and Milind Tambe. 1/2021. “Beyond “To Act or Not to Act”: Fast Lagrangian Approaches to General Multi-Action Restless Bandits (Short version).” In IJCAI 2020 AI for Social Good workshop. multi-action-bandits-short.pdf
Han-Ching Ou, Haipeng Chen, Shahin Jabbari, and Milind Tambe. 2021. “Active Screening for Recurrent Diseases: A Reinforcement Learning Approach.” In In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). London, UK.Abstract

Active screening is a common approach in controlling the spread of recurring infectious diseases such as tuberculosis and influenza. In this approach,  health workers periodically select a subset of population for screening. However, given the limited number of health workers, only a small subset of the population
can be visited in any given time period. Given the recurrent nature of the disease and rapid spreading, the goal is to minimize the number of infections over a long time horizon.
Active screening can be formalized as a sequential combinatorial optimization over the network of people and their connections.
The main computational challenges in this formalization arise from i) the combinatorial nature of the problem, ii) the need of sequential planning and iii) the uncertainties in the infectiousness states of the population.

Previous works on active screening fail to scale to large time horizon while fully considering the future effect of current interventions. In this paper, we propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN), with several innovative adaptations that are designed to address the above challenges. First, we use graph convolutional networks (GCNs) to represent the Q-function that exploit the node correlations of the underlying contact network. Second, to avoid solving a combinatorial optimization problem in each time period, we decompose the node set selection as a sub-sequence of decisions, and further design a two-level RL framework that solves the problem in a hierarchical way.
Finally, to speed-up the slow convergence of RL which arises from reward sparseness, we incorporate ideas from curriculum learning into our hierarchical RL approach. We evaluate our RL algorithm on several real-world networks. Results show that our RL algorithm can scale up to 10 times the problem size of state-of-the-art (the variant that considers the effect of future interventions but un-scalable) in terms of planning time horizon. Meanwhile, it outperforms state-of-the-art (the variant that scales up but does not consider the effect of future interventions) by up to 33% in solution quality.

Bryan Wilder, Laura Onasch-Vera, Graham Diguiseppi, Robin Petering, Chyna Hill, Amulya Yadav, Eric Rice, and Milind Tambe. 2021. “Clinical trial of an AI-augmented intervention for HIV prevention in youth experiencing homelessness.” In AAAI Conference on Artificial Intelligence.Abstract

Youth experiencing homelessness (YEH) are subject to substantially greater risk of HIV infection, compounded both by their lack of access to stable housing and the disproportionate representation of youth of marginalized racial, ethnic, and gender identity groups among YEH. A key goal for health equity is to improve adoption of protective behaviors in this population. One promising strategy for intervention is to recruit peer leaders from the population of YEH to promote behaviors such as condom usage and regular HIV testing to their social contacts. This raises a computational question: which youth should be selected as peer leaders to maximize the overall impact of the intervention? We developed an artificial intelligence system to optimize such social network interventions in a community health setting. We conducted a clinical trial enrolling 713 YEH at drop-in centers in a large US city. The clinical trial compared interventions planned with the algorithm to those where the highest-degree nodes in the youths' social network were recruited as peer leaders (the standard method in public health) and to an observation-only control group. Results from the clinical trial show that youth in the AI group experience statistically significant reductions in key risk behaviors for HIV transmission, while those in the other groups do not. This provides, to our knowledge, the first empirical validation of the usage of AI methods to optimize social network interventions for health. We conclude by discussing lessons learned over the course of the project which may inform future attempts to use AI in community-level interventions.

Nikhil Behari, Elizabeth Bondi, Christopher D. Golden, Hervet J. Randriamady, and Milind Tambe. 2021. “Satellite-Based Food Market Detection for Micronutrient Deficiency Prediction.” 5th International Workshop on Health Intelligence (W3PHIAI-21) at AAAI. w3phiai_21_workshop_paper.pdf
Anika Puri and Elizabeth Bondi. 2021. “Space, Time, and Counts: Improved Human vs Animal Detection in Thermal Infrared Drone Videos for Prevention of Wildlife Poaching” Fragile Earth (FEED) Workshop at KDD 2021. feed_kdd_2021_final.pdf
Bryan Wilder, Michael Mina, and Milind Tambe. 2021. “Tracking disease outbreaks from sparse data with Bayesian inference.” In AAAI Conference on Artificial Intelligence.Abstract

The COVID-19 pandemic provides new motivation for a classic problem in epidemiology: estimating the empirical rate of transmission during an outbreak (formally, the time-varying reproduction number) from case counts. While standard methods exist, they work best at coarse-grained national or state scales with abundant data, and struggle to accommodate the partial observability and sparse data common at finer scales (e.g., individual schools or towns). For example, case counts may be sparse when only a small fraction of infections are caught by a testing program. Or, whether an infected individual tests positive may depend on the kind of test and the point in time when they are tested. We propose a Bayesian framework which accommodates partial observability in a principled manner. Our model places a Gaussian process prior over the unknown reproduction number at each time step and models observations sampled from the distribution of a specific testing program. For example, our framework can accommodate a variety of kinds of tests (viral RNA, antibody, antigen, etc.) and sampling schemes (e.g., longitudinal or cross-sectional screening). Inference in this framework is complicated by the presence of tens or hundreds of thousands of discrete latent variables. To address this challenge, we propose an efficient stochastic variational inference method which relies on a novel gradient estimator for the variational objective. Experimental results for an example motivated by COVID-19 show that our method produces an accurate and well-calibrated posterior, while standard methods for estimating the reproduction number can fail badly.

Xinrun Wang, Tarun Nair, Haoyang Li, Rueben Wong, Nachiket Kelkar, Srinivas Vaidyanathan, Rajat Nayak, Bo An, Jagdish Krishnaswamy, and Milind Tambe. 12/6/2020. “Efficient Reservoir Management throughDeep Reinforcement Learning.” In NeuriPS2020 workshop on AI for Earth Sciences. google_dam_arxiv.pdf
Kai Wang, Bryan Wilder, Andrew Perrault, and Milind Tambe. 12/5/2020. “Automatically Learning Compact Quality-aware Surrogates for Optimization Problems.” In NeurIPS 2020 (spotlight). Vancouver, Canada.Abstract
Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may also sometimes fail to improve solution quality due to  non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task and a convex portfolio optimization task.
Aditya Mate*, Jackson A. Killian*, Haifeng Xu, Andrew Perrault, and Milind Tambe. 12/5/2020. “Collapsing Bandits and Their Application to Public Health Interventions.” In Advances in Neural and Information Processing Systems (NeurIPS) 12/5/2020. Vancouver, Canada. Publisher's VersionAbstract
We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the “good” state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which health workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either “forward” or “reverse” threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients’ adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques, while achieving similar performance.
Aviva Prins, Aditya Mate, Jackson A Killian, Rediet Abebe, and Milind Tambe. 12/2020. “Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation.” NeurIPS 2020 Workshops: Challenges of Real World Reinforcement Learning, Machine Learning in Public Health (Best Lightning Paper), Machine Learning for Health (Best on Theme), Machine Learning for the Developing World. human_in_the_loop_rmab_short.pdf
Ankit Bhardwaj*, Han Ching Ou*, Haipeng Chen, Shahin Jabbari, Milind Tambe, Rahul Panicker, and Alpan Raval. 11/2020. “Robust Lock-Down Optimization for COVID-19 Policy Guidance.” In AAAI Fall Symposium. robust_lock-down_optimization_for_covid-19_policy_guidance.pdf
Daniel B. Larremore, Bryan Wilder, Evan Lester, Soraya Shehata, James M. Burke, James A. Hay, Milind Tambe, Michael J. Mina, and Roy Parker. 11/2020. “Test sensitivity is secondary to frequency and turnaround time for COVID-19 screening.” Science Advances. Publisher's VersionAbstract
The COVID-19 pandemic has created a public health crisis. Because SARS-CoV-2 can spread from individuals with pre-symptomatic, symptomatic, and asymptomatic infections, the re-opening of societies and the control of virus spread will be facilitated by robust population screening, for which virus testing will often be central. After infection, individuals undergo a period of incubation during which viral titers are usually too low to detect, followed by an exponential viral growth, leading to a peak viral load and infectiousness, and ending with declining viral levels and clearance. Given the pattern of viral load kinetics, we model the effectiveness of repeated population screening considering test sensitivities, frequency, and sample-to-answer reporting time. These results demonstrate that effective screening depends largely on frequency of testing and the speed of reporting, and is only marginally improved by high test sensitivity. We therefore conclude that screening should prioritize accessibility, frequency, and sample-to-answer time; analytical limits of detection should be secondary.
Omkar Thakoor, Shahin Jabbari, Palvi Aggarwal, Cleotilde Gonzalez, Milind Tambe, and Phebe Vayanos. 10/2020. “Exploiting Bounded Rationality in Risk-based Cyber Camouflage Games.” Conference on Decision and Game Theory for Security.Abstract
Recent works have growingly shown that Cyber deception can effectively impede the reconnaissance efforts of intelligent cyber attackers. Recently proposed models to optimize a deceptive defense based on camouflaging network and system attributes, have shown effective numerical results on simulated data. However, these models possess a fundamental drawback due to the assumption that an attempted attack is always successful — as a direct consequence of the deceptive strategies being deployed, the attacker runs a significant risk that the attack fails. Further, this risk or uncertainty in the rewards magnifies the boundedly rational behavior in humans which the previous models do not handle. To that end, we present Risk-based Cyber Camouflage Games — a general-sum game model that captures the uncertainty in the attack's success. In case of the rational attackers, we show that optimal defender strategy computation is NP-hard even in the zero-sum case.We provide an MILP formulation for the general problem with constraints on cost and feasibility, along with a pseudo-polynomial time algorithm for the special unconstrained setting. Second, for risk-averse attackers, we present a solution based on Prospect theoretic modeling along with a robust variant that minimizes regret. Third, we propose a solution that does not rely on the attacker behavior model or past data, and effective for the broad setting of strictly competitive games where previous solutions against bounded rationality prove ineffective. Finally, we provide numerical results that our solutions effectively lower the defender loss.
Palvi Aggarwal, Omkar Thakoor, Aditya Mate, Milind Tambe, Edward A. Cranford, Christian Lebiere, and Cleotilde Gonzalez. 10/2020. “An Exploratory Study of a Masking Strategy of Cyberdeception Using CyberVAN.” In 64th Human Factors and Ergonomics Society (HFES) Annual Conference.Abstract
During the network reconnaissance process, attackers scan the network to gather information before launching an attack. This is a good chance for defenders to use deception and disrupt the attacker’s learning process. In this paper, we present an exploratory experiment to test the effectiveness of a masking strategy (compared to a random masking strategy) to reduce the utility of attackers. A total of 30 human participants (in the role of attackers) are randomly assigned to one of the two experimental conditions: Optimal or Random (15 in each condition). Attackers appeared to be more successful in launching attacks in the optimal condition compared to the random condition but the total score of attackers was not different from the random masking strategy. Most importantly, we found a generalized tendency to act according to the certainty bias (or risk aversion). These observations will help to improve the current state-of-the-art masking algorithms of cyberdefense.
An Exploratory Study of a Masking Strategy of Cyberdeception Using CyberVAN
Jackson A. Killian, Aditya Mate, Haifeng Xu, Andrew Perrault, and Milind Tambe. 10/2020. “Personalized Adherence Management in TB: Using AI to Schedule Targeted Interventions.” 51st Union World Conference on Lung Health - Accepted Oral Abstract.