Lily Xu. 10/24/2021. “Learning, Optimization, and Planning Under Uncertainty for Wildlife Conservation.” INFORMS Doing Good with Good OR.

Wildlife poaching fuels the multi-billion dollar illegal wildlife trade and pushes countless species to the brink of extinction. To aid rangers in preventing poaching in protected areas around the world, we have developed PAWS, the Protection Assistant for Wildlife Security. We present technical advances in multi-armed bandits and robust sequential decision-making using reinforcement learning, with research questions that emerged from on-the-ground challenges. We also discuss bridging the gap between research and practice, presenting results from field deployment in Cambodia and large-scale deployment through integration with SMART, the leading software system for protected area management used by over 1,000 wildlife parks worldwide.

Ramesha Karunasena, Mohammad Sarparajul Ambiya, Arunesh Sinha, Ruchit Nagar, Saachi Dalal, Divy Thakkar, Dhyanesh Narayanan, and Milind Tambe. 10/5/2021. “Measuring Data Collection Diligence for Community Healthcare.” In ACM conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO '21).
Lily Xu. 8/21/2021. “Learning and Planning Under Uncertainty for Green Security.” 30th International Joint Conference on Artificial Intelligence (IJCAI).
Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, and Milind Tambe. 8/2021. “Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare.” In International Joint Conference on Artificial Intelligence (IJCAI).
In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive, and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle index based Q-Learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from literature and also on the maternal healthcare dataset.
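The Whittle index based Q-learning mechanism this abstract describes can be caricatured in a few lines. The sketch below is an illustrative simplification, not the paper's algorithm: the transition model, the unit reward for engagement, and the index approximation Q(s, act) − Q(s, passive) are all assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 5, 2                       # arms (beneficiaries) and intervention budget
# Hypothetical transition model: P[arm, state, action] = P(next state is "engaged")
P = rng.uniform(0.2, 0.7, size=(N, 2, 2))
P[:, :, 1] = np.minimum(P[:, :, 1] + 0.2, 1.0)   # intervening helps engagement

Q = np.zeros((N, 2, 2))           # independent tabular Q-values per arm
state = np.zeros(N, dtype=int)
alpha, gamma = 0.1, 0.9
arms = np.arange(N)

for t in range(5000):
    # Whittle-style index: estimated advantage of acting over staying passive
    index = Q[arms, state, 1] - Q[arms, state, 0]
    action = np.zeros(N, dtype=int)
    action[np.argsort(index)[-K:]] = 1            # act on the K highest indices
    nxt = (rng.random(N) < P[arms, state, action]).astype(int)
    reward = nxt.astype(float)                    # reward 1 while engaged
    # per-arm tabular Q-learning update
    td = reward + gamma * Q[arms, nxt].max(axis=1) - Q[arms, state, action]
    Q[arms, state, action] += alpha * td
    state = nxt
```

The point of the index structure is that each arm is learned independently, so the per-round planning step is just a top-K selection rather than a combinatorial optimization over all arms.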
Jackson A Killian, Arpita Biswas, Sanket Shah, and Milind Tambe. 8/2021. “Q-Learning Lagrange Policies for Multi-Action Restless Bandits.” Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Lily Xu, Andrew Perrault, Fei Fang, Haipeng Chen, and Milind Tambe. 7/27/2021. “Robust Reinforcement Learning Under Minimax Regret for Green Security.” Conference on Uncertainty in Artificial Intelligence (UAI).
Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature who controls the parameter values of the adversarial behavior and design an algorithm MIRROR to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.
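The minimax regret criterion at the heart of this paper can be illustrated on a toy restricted game. The payoff matrix below is made up purely for illustration; in MIRROR, the defender policies and nature's parameter values are not fixed in advance but are generated by the two reinforcement learning oracles.

```python
import numpy as np

# Hypothetical values: value[i, j] = defender policy i against nature parameter j
value = np.array([[5.0, 2.0, 1.0],
                  [3.0, 3.0, 2.0],
                  [1.0, 4.0, 4.0]])

best_per_param = value.max(axis=0)          # best achievable under each parameter
regret = best_per_param - value             # regret[i, j] of policy i
worst_regret = regret.max(axis=1)           # each policy's worst case over nature
robust_policy = int(worst_regret.argmin())  # pure minimax-regret choice
```

Note that policy 1 is never the best response to any single parameter setting, yet it is the minimax-regret choice: robustness here means hedging across nature's possibilities rather than optimizing for any one of them.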
Haipeng Chen, Wei Qiu, Han-Ching Ou, Bo An, and Milind Tambe. 7/25/2021. “Contingency-Aware Influence Maximization: A Reinforcement Learning Approach.” In Conference on Uncertainty in Artificial Intelligence.
Bryan Wilder. 7/15/2021. “AI for Population Health: Melding Data and Algorithms on Networks.” PhD Thesis, Computer Science, Harvard University.
Edward Cranford, Cleotilde Gonzalez, Palvi Aggarwal, Milind Tambe, Sarah Cooney, and Christian Lebiere. 7/7/2021. “Towards a Cognitive Theory of Cyber Deception.” Cognitive Science.
Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, and Himabindu Lakkaraju. 7/1/2021. “Towards the Unification and Robustness of Perturbation and Gradient Based Explanations.” Proceedings of the 38th International Conference on Machine Learning. Virtual Only.
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad, which is a gradient based method, and a variant of LIME, which is a perturbation based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real world datasets.
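The paper's central claim, that SmoothGrad and the LIME variant converge to the same explanation in expectation, is easy to check numerically in the simplest setting of a linear black box, where both should recover the model's weight vector. Everything below, including treating the model as black box via finite differences, is an illustrative assumption, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([2.0, -1.0, 0.5])
f = lambda x: x @ w                     # "black box": a linear model
x0 = np.array([1.0, 0.0, 3.0])          # instance to explain

# SmoothGrad: average finite-difference gradients over Gaussian perturbations
sigma, n, eps = 0.5, 2000, 1e-4
samples = x0 + sigma * rng.standard_normal((n, 3))
grads = np.stack([
    [(f(s + eps * e) - f(s - eps * e)) / (2 * eps) for e in np.eye(3)]
    for s in samples
])
smoothgrad = grads.mean(axis=0)

# LIME variant: least-squares linear surrogate fit on the same perturbations
X = samples - x0
y = f(samples) - f(x0)
lime_w, *_ = np.linalg.lstsq(X, y, rcond=None)
```

For a linear model both explanations match `w` almost exactly; for nonlinear models the paper's equivalence holds only in the large-sample limit, and the gap at small `n` is what the finite sample complexity bounds quantify.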
Elizabeth Bondi, Catherine Ressijac, and Peter Boucher. 6/25/2021. “Preliminary Detection of Rhino Middens for Understanding Rhino Behavior.” CVPR 2021 Workshop on Computer Vision for Animal Behavior Tracking and Modeling.
Eric Rice, Bryan Wilder, Laura Onasch-Vera, Graham Diguiseppi, Robin Petering, Chyna Hill, Amulya Yadav, Sung-Jae Lee, and Milind Tambe. 6/21/2021. “A Peer-Led, Artificial Intelligence-Augmented Social Network Intervention to Prevent HIV among Youth Experiencing Homelessness.” To appear in the Journal of Acquired Immune Deficiency Syndrome (JAIDS).
Youth experiencing homelessness (YEH) are at elevated risk for HIV/AIDS and disproportionately identify as racial, ethnic, sexual, and gender minorities. We developed a new peer change agent (PCA) HIV prevention intervention with three arms: (1) an arm using an Artificial Intelligence (AI) planning algorithm to select PCAs; (2) a popularity arm, the standard PCA approach, operationalized as highest degree centrality (DC); and (3) an observation-only comparison group.
Arpita Biswas and Suvam Mukherjee. 5/19/2021. “Ensuring Fairness under Prior Probability Shifts.” In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES).
Elizabeth Bondi*, Lily Xu*, Diana Acosta-Navas, and Jackson A. Killian. 5/19/2021. “Envisioning Communities: A Participatory Approach Towards AI for Social Good.” In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES).
Research in artificial intelligence (AI) for social good presupposes some definition of social good, but potential definitions have been seldom suggested and never agreed upon. The normative question of what AI for social good research should be "for" is not thoughtfully elaborated, or is frequently addressed with a utilitarian outlook that prioritizes the needs of the majority over those who have been historically marginalized, brushing aside realities of injustice and inequity. We argue that AI for social good ought to be assessed by the communities that the AI system will impact, using as a guide the capabilities approach, a framework to measure the ability of different policies to improve human welfare equity. Furthermore, we lay out how AI research has the potential to catalyze social progress by expanding and equalizing capabilities. We show how the capabilities approach aligns with a participatory approach for the design and implementation of AI for social good research in a framework we introduce called PACT, in which community members affected should be brought in as partners and their input prioritized throughout the project. We conclude by providing an incomplete set of guiding questions for carrying out such participatory AI research in a way that elicits and respects a community's own definition of social good.
Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, and Milind Tambe. 5/7/2021. “Learning Index Policies for Restless Bandits with Application to Maternal Healthcare (Extended abstract).” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Aditya Mate, Andrew Perrault, and Milind Tambe. 5/7/2021. “Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits.” In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). London, UK.
Community Health Workers (CHWs) form an important component of health-care systems globally, especially in low-resource settings. CHWs are often tasked with monitoring the health of and intervening on their patient cohort. Previous work has developed several classes of Restless Multi-Armed Bandits (RMABs) that are computationally tractable and indexable, a condition that guarantees asymptotic optimality, for solving such health monitoring and intervention problems (HMIPs).
However, existing solutions to HMIPs fail to account for the risk-sensitivity considerations of CHWs in the planning stage and may end up ignoring some patients completely because they are deemed less valuable to intervene on.
These solutions also rely on patients accurately reporting their adherence state when intervened upon. To tackle these issues, our contributions in this paper are as follows:
(1) We develop an RMAB solution to HMIPs that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations.
(2) We prove theoretical guarantees on the asymptotic optimality of our algorithm for any arbitrary reward function. Additionally, we show that for the specific reward function considered in previous work, our theoretical conditions are stronger than the state-of-the-art guarantees.
(3) We show the applicability of these new results for addressing the three issues pertaining to: risk-sensitive planning, equitable allocation and reliance on perfect observations as highlighted above. We evaluate these techniques on both simulated as well as real data from a prevalent CHW task of monitoring adherence of tuberculosis patients to their prescribed medication in Mumbai, India and show improved performance over the state-of-the-art. The simulation code is available at:
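The belief-state bookkeeping underlying this kind of RMAB planner can be sketched compactly. Everything here is a placeholder for the paper's machinery: the two-state adherence chain, the "least confident first" selection heuristic, and the assumption that an intervention both reveals and restores adherence are deliberate simplifications.

```python
import numpy as np

# Hypothetical 2-state adherence model: P[s] = P(adhering next | adhering now = s)
P = np.array([0.4, 0.9])

def propagate(b):
    """Belief of adherence after one unobserved (passive) step."""
    return b * P[1] + (1 - b) * P[0]

def plan(beliefs, k):
    """Intervene on the k patients we are least confident are adhering."""
    return np.argsort(beliefs)[:k]

beliefs = np.array([0.9, 0.5, 0.2, 0.7])
chosen = plan(beliefs, 2)        # observe and act on these patients
beliefs = propagate(beliefs)     # unobserved patients' beliefs drift
beliefs[chosen] = 1.0            # simplification: intervention restores adherence
```

A monotone increasing reward in the belief state, as in the abstract, would then score each patient via a function of `beliefs` rather than the belief itself, which is how risk-sensitive preferences (e.g., a concave reward penalizing low-belief patients heavily) enter the planning objective.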
Aravind Venugopal, Elizabeth Bondi, Harshavardhan Kamarthi, Keval Dholakia, Balaraman Ravindran, and Milind Tambe. 5/5/2021. “Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty.” In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Lily Xu, Andrew Perrault, Fei Fang, Haipeng Chen, and Milind Tambe. 5/5/2021. “Robustness in Green Security: Minimax Regret Optimality with Reinforcement Learning.” AAMAS Workshop on Autonomous Agents for Social Good.
Siddharth Nishtala, Lovish Madaan, Aditya Mate, Harshavardhan Kamarthi, Anirudh Grama, Divy Thakkar, Dhyanesh Narayanan, Suresh Chaudhary, Neha Madhiwala, Ramesh Padhmanabhan, Aparna Hegde, Pradeep Varakantham, Balaraman Ravindran, and Milind Tambe. 5/5/2021. “Selective Intervention Planning using Restless Multi-Armed Bandits to Improve Maternal and Child Health Outcomes.” In AAMAS Workshop on Autonomous Agents for Social Good.
Feiran Jia, Aditya Mate, Zun Li, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Michael Wellman, and Yevgeniy Vorobeychik. 5/1/2021. “A Game-Theoretic Approach for Hierarchical Policy-Making.” 2nd International (Virtual) Workshop on Autonomous Agents for Social Good (AASG 2021).
We present the design and analysis of a multi-level game-theoretic model of hierarchical policy-making, inspired by policy responses to the COVID-19 pandemic. Our model captures the potentially mismatched priorities among a hierarchy of policy-makers (e.g., federal, state, and local governments) with respect to two main cost components that have opposite dependence on the policy strength, such as post-intervention infection rates and the cost of policy implementation. Our model further includes a crucial third factor in decisions: a cost of non-compliance with the policy-maker immediately above in the hierarchy, such as non-compliance of state with federal policies. Our first contribution is a closed-form approximation of a recently published agent-based model to compute the number of infections for any implemented policy. Second, we present a novel equilibrium selection criterion that addresses common issues with equilibrium multiplicity in our setting. Third, we propose a hierarchical algorithm based on best response dynamics for computing an approximate equilibrium of the hierarchical policy-making game consistent with our solution concept. Finally, we present an empirical investigation of equilibrium policy strategies in this game as a function of game parameters, such as the degree of centralization and disagreements about policy priorities among the agents, the extent of free riding as well as fairness in the distribution of costs.
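The best-response structure of such a hierarchy can be caricatured as a two-level game on a grid, where the lower level best-responds to the policy strength set above it. The quadratic cost coefficients below are arbitrary stand-ins for the model's infection, implementation, and non-compliance terms; the actual paper uses a multi-level hierarchy and an agent-based infection approximation.

```python
import numpy as np

# Illustrative quadratic costs: infections fall with policy strength in [0, 1],
# while implementation and non-compliance costs rise with it.
def state_cost(ps, pf):
    return 4 * (1 - ps) ** 2 + ps ** 2 + 2 * (ps - pf) ** 2

def federal_cost(pf, ps):
    return 6 * (1 - ps) ** 2 + 0.5 * pf ** 2

grid = np.linspace(0, 1, 101)

def best_response_state(pf):
    """State picks the strength minimizing its own cost, given federal policy."""
    return grid[np.argmin([state_cost(ps, pf) for ps in grid])]

# Federal level anticipates the state's best response (Stackelberg-style)
pf_star = grid[np.argmin([federal_cost(pf, best_response_state(pf))
                          for pf in grid])]
ps_star = best_response_state(pf_star)
```

Even in this two-player toy, the non-compliance term pulls the state's policy toward the federal one without forcing full compliance, which is the qualitative tension the hierarchical best-response dynamics explore at scale.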