Publications

2023
Kai Wang*, Lily Xu*, Aparna Taneja, and Milind Tambe. 2/14/2023. “Optimistic Whittle Index Policy: Online Learning for Restless Bandits.” AAAI Conference on Artificial Intelligence (AAAI). arXiv link. Abstract:
Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear $O(H \sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions in $T$ episodes with a constant horizon $H$. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
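For intuition, the sketch below (illustrative only, not the paper's UCWhittle implementation) shows the core subroutine such a policy relies on: computing the Whittle index of a single two-state arm by binary search over the passive-action subsidy. The transition matrix `P`, the reward-equals-state convention, and the search bounds are assumptions made for this example; UCWhittle instead selects optimistic transitions from UCB confidence sets via a bilinear program.

```python
# Minimal Whittle-index sketch for one two-state restless arm.
# P[s, a] = Pr(next state = 1 | state s, action a); reward is the current state.
import numpy as np

def q_values(P, lam, state, gamma=0.95, iters=500):
    """Q(s, a) for one arm, with subsidy lam added to the passive action a=0."""
    V = np.zeros(2)
    for _ in range(iters):
        Q = np.zeros((2, 2))
        for s in range(2):
            for a in range(2):
                subsidy = lam if a == 0 else 0.0
                Q[s, a] = s + subsidy + gamma * (
                    P[s, a] * V[1] + (1 - P[s, a]) * V[0])
        V = Q.max(axis=1)
    return Q[state]

def whittle_index(P, state, lo=-1.0, hi=1.0, tol=1e-4):
    """Smallest subsidy making the passive action optimal in `state`
    (assumes indexability, so the preference is monotone in lam)."""
    while hi - lo > tol:
        lam = (lo + hi) / 2
        q = q_values(P, lam, state)
        if q[0] >= q[1]:   # passive already preferred: try a smaller subsidy
            hi = lam
        else:
            lo = lam
    return (lo + hi) / 2

P = np.array([[0.3, 0.7],
              [0.6, 0.9]])
print(whittle_index(P, state=0))
```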
Kai Wang*, Shresth Verma*, Aditya Mate, Sanket Shah, Aparna Taneja, Neha Madhiwalla, Aparna Hegde, and Milind Tambe. 2/14/2023. “Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health.” AAAI Conference on Artificial Intelligence (AAAI). Abstract:
This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.
10767.wangk_full.pdf
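The training-versus-evaluation mismatch the paper above addresses can be seen in miniature below: a toy predict-then-optimize task (selecting the k highest-value arms) where the model is trained by differentiating the realized decision quality through a softmax relaxation rather than by minimizing predictive error. Everything here (the linear model, the soft top-k with temperature `tau`) is a hypothetical stand-in, not the paper's Whittle-index differentiation.

```python
# Toy decision-focused training: maximize the value of the chosen arms directly.
import torch

torch.manual_seed(0)
n, k, tau = 20, 3, 0.1
features = torch.randn(n, 5)
true_w = torch.randn(5)
true_vals = features @ true_w        # ground-truth arm values (known at train time)

def decision_quality(pred_vals):
    # soft top-k: softmax weights approximate the indicator of the k chosen arms
    weights = torch.softmax(pred_vals / tau, dim=0) * k
    return (weights * true_vals).sum()

model = torch.nn.Linear(5, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for step in range(500):
    pred = model(features).squeeze(-1)
    loss = -decision_quality(pred)   # decision-focused objective
    opt.zero_grad(); loss.backward(); opt.step()
# A two-stage baseline would instead minimize (pred - true_vals).pow(2).mean().
```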
Shresth Verma, Gargi Singh, Aditya Mate, Paritosh Verma, Sruthi Gorantala, Neha Madhiwalla, Aparna Hegde, Divy Thakkar, Manish Jain, Milind Tambe, and Aparna Taneja. 2/10/2023. “Increasing Impact of Mobile Health Programs: SAHELI for Maternal and Child Care.” In Innovative Applications of Artificial Intelligence (IAAI). iaai_2023_armman_rmab_deployment_5.pdf
Jackson A. Killian*, Arpita Biswas*, Lily Xu*, Shresth Verma*, Vineet Nair, Aparna Taneja, Aparna Hegde, Neha Madhiwalla, Paula Rodriguez Diaz, Sonja Johnson-Yu, and Milind Tambe. 2/9/2023. “Robust Planning over Restless Groups: Engagement Interventions for a Large-Scale Maternal Telehealth Program.” In AAAI Conference on Artificial Intelligence. armman_groups.pdf
Paula Rodriguez Diaz, Jackson A Killian, Lily Xu, Arun Sai Suggala, Aparna Taneja, and Milind Tambe. 2/7/2023. “Flexible Budgets in Restless Bandits: A Primal-Dual Algorithm for Efficient Budget Allocation.” In AAAI Conference on Artificial Intelligence (AAAI). aaai23_frmab_cr.pdf
Jackson A. Killian, Arshika Lalan, Aditya Mate, Manish Jain, Aparna Taneja, and Milind Tambe. 2/2/2023. “Adherence Bandits.” AAAI AI4SG-23 Workshop. adherence_bandits_ai4sg23_workshop.pdf
Paritosh Verma, Shresth Verma, Aditya Mate, Aparna Taneja, and Milind Tambe. 2/2/2023. “Decision-Focused Evaluation: Analyzing Performance of Deployed Restless Multi-Arm Bandits.” In AAAI 2023 workshop on AI for Social Good (AI4SG). ai4sg_workshop_aaai_23_submission.pdf
Shresth Verma, Gargi Singh, Aditya Mate, Neha Madhiwalla, Aparna Hegde, Divy Thakkar, Manish Jain, Milind Tambe, and Aparna Taneja. 2/2/2023. “SAHELI for Mobile Health Programs in Maternal and Child Care: Further Analysis.” In AAAI 2023 workshop on AI for Social Good (AI4SG). ai4sg_aaai_2023_saheli_further_analysis_1.pdf
E. Cranford, H. Ou, C. Gonzalez, M. Tambe, and C. Lebiere. 1/1/2023. “Accounting for Uncertainty in Deceptive Signaling for Cybersecurity.” In Hawaii International Conference on System Sciences. cranford_etal_hicss2023_0085.pdf
Elizabeth Bondi-Kelly, Haipeng Chen, Christopher Golden, Nikhil Behari, and Milind Tambe. 2023. “Predicting Micronutrient Deficiency with Publicly Available Satellite Data.” AI Magazine (to appear). mnd_conference_paper_ai_magazine.pdf
Sonja Johnson-Yu, Jessie Finocchiaro, Kai Wang, Yevgeniy Vorobeychik, Arunesh Sinha, Aparna Taneja, and Milind Tambe. 2023. “Characterizing and Improving the Robustness of Predict-Then-Optimize Frameworks.” In Conference on Decision and Game Theory for Security. Avignon: Springer. robust_dfl_gamesec_camera-ready.pdf
2022
Shresth Verma, Aditya Mate, Kai Wang, Aparna Taneja, and Milind Tambe. 12/10/2022. “Case Study: Applying Decision Focused Learning in the Real World.” In NeurIPS 2022 workshop on Trustworthy and Socially Responsible Machine Learning. neurips_2022_trustworthy_ai_workshop_3.pdf
Sanket Shah, Kai Wang, Bryan Wilder, Andrew Perrault, and Milind Tambe. 12/2022. “Decision-Focused Learning without Differentiable Optimization: Learning Locally Optimized Decision Losses.” In Conference on Neural Information Processing Systems (NeurIPS). Vol. 36. New Orleans. Abstract:
Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better on that specific task. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past work has largely gotten around this issue by handcrafting task-specific surrogates to the original optimization problem that provide informative gradients when differentiated through. However, the need to handcraft surrogates for each new task limits the usability of DFL. In addition, there are often no guarantees about the convexity of the resulting surrogates and, as a result, training a predictive model using them can lead to inferior local optima. In this paper, we do away with surrogates altogether and instead learn loss functions that capture task-specific information. To the best of our knowledge, ours is the first approach that entirely replaces the optimization component of decision-focused learning with a loss that is automatically learned. Our approach (a) only requires access to a black-box oracle that can solve the optimization problem and is thus generalizable, and (b) can be convex by construction and so can be easily optimized over. We evaluate our approach on three resource allocation problems from the literature and find that our approach outperforms learning without taking into account task-structure in all three domains, and even hand-crafted surrogates from the literature.
lodl_-_neurips.pdf
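A minimal sketch of the learned-loss idea under simplifying assumptions: around each instance's true parameters, sample perturbed predictions, score them with the black-box solver, and fit a diagonal quadratic loss with nonnegative weights so the surrogate is convex by construction. The `solve_and_score` oracle and the argmax example below are hypothetical; the paper studies several richer loss parameterizations.

```python
# Fit a local, convex-by-construction decision loss from black-box solver calls.
import numpy as np

def fit_local_quadratic_loss(y_true, solve_and_score, n_samples=200, scale=0.1):
    d = y_true.shape[0]
    base = solve_and_score(y_true)             # quality of a perfect prediction
    deltas = np.random.randn(n_samples, d) * scale
    regrets = np.array([base - solve_and_score(y_true + dlt) for dlt in deltas])
    # least-squares fit of regret ~ sum_i h_i * delta_i^2; clip to keep convexity
    h, *_ = np.linalg.lstsq(deltas**2, regrets, rcond=None)
    return np.maximum(h, 0.0)

# Example oracle: choose the argmax item, earn its true value.
y = np.array([1.0, 0.2, 0.5])
quality = lambda yhat: y[np.argmax(yhat)]
h = fit_local_quadratic_loss(y, quality)
# A predictor trained with loss sum_i h[i] * (pred_i - y_i)^2 is penalized most
# along directions where mispredictions actually change the decision.
print(h)
```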
Aditya Mate. 10/16/2022. “Optimization and Planning of Limited Resources for Assisting Non-Profits in Improving Maternal and Child Health.” INFORMS Doing Good with Good OR. Abstract:
The maternal mortality rate in India is appalling, largely fueled by lack of access to preventive care information, especially in low-resource households. We partner with the non-profit ARMMAN, which aims to use mobile health technologies to improve maternal and child health outcomes.
To assist ARMMAN and similar non-profits, we develop a Restless Multi-Armed Bandit (RMAB) based solution to help improve access to critical health information via increased engagement of beneficiaries with their program. We address fundamental research challenges that crop up along the way and present technical advances in RMABs and planning algorithms for limited-resource allocation. Transcending the boundaries of typical laboratory research, we also deploy our models in the field and present results from a first-of-its-kind pilot test employing and evaluating RMABs in a real-world public health application.
informs-final-essay_1.pdf
Zun Li, Feiran Jia, Aditya Mate, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, and Yevgeniy Vorobeychik. 8/2022. “Solving Structured Hierarchical Games Using Differential Backward Induction.” In Conference on Uncertainty in Artificial Intelligence (UAI). Eindhoven, Netherlands. Abstract:
From large-scale organizations to decentralized political systems, hierarchical strategic decision making is commonplace. We introduce a novel class of structured hierarchical games (SHGs) that formally capture such hierarchical strategic interactions. In an SHG, each player is a node in a tree, and strategic choices of players are sequenced from root to leaves, with root moving first, followed by its children, then followed by their children, and so on until the leaves. A player’s utility in an SHG depends on its own decision, and on the choices of its parent and all the tree leaves. SHGs thus generalize simultaneous-move games, as well as Stackelberg games with many followers. We leverage the structure of both the sequence of player moves as well as payoff dependence to develop a gradient-based backpropagation-style algorithm, which we call Differential Backward Induction (DBI), for approximating equilibria of SHGs. We provide a sufficient condition for convergence of DBI and demonstrate its efficacy in finding approximate equilibrium solutions to several SHG models of hierarchical policy-making problems.
solving-structured-games-uai2022.pdf
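A two-player miniature of the gradient-through-best-response idea underlying DBI (the paper handles general trees; the quadratic utilities here are invented for illustration): the follower's response is computed by an unrolled inner gradient ascent, and the leader backpropagates its own utility through that unrolled computation.

```python
# Leader-follower gradient play with the follower's response unrolled.
import torch

def follower_response(x, steps=50, lr=0.1):
    y = torch.zeros(1)
    for _ in range(steps):
        y = y + lr * (x - y)   # ascent on follower utility u_f = -(y - x)^2 / 2
    return y                   # differentiable in x because the loop is unrolled

x = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([x], lr=0.05)
for _ in range(200):
    y = follower_response(x)
    leader_utility = -(x - 2.0) ** 2 - 0.5 * y ** 2   # leader also cares about y
    loss = -leader_utility.sum()
    loss.backward()
    opt.step(); opt.zero_grad()
print(x.item(), follower_response(x).item())   # converges near x = 4/3, y = x
```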
Jackson A. Killian, Lily Xu, Arpita Biswas, and Milind Tambe. 8/2022. “Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning.” In Uncertainty in Artificial Intelligence (UAI). Abstract:
We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret-robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary "λ-network" in tandem with policy networks per arm, greatly reducing sample complexity, with guarantees on convergence. DDLPO, of general interest, implements our reward-maximizing agent oracle. We then tackle the challenging regret-maximizing nature oracle, a non-stationary RL challenge, by formulating it as a multi-agent RL problem between a policy optimizer and adversarial nature. This formulation is of general interest; we solve it for RMABs by creating a multi-agent extension of DDLPO with a shared critic. We show our approaches work well in three experimental domains.
killian_uai_2022_restless_uncertain.pdf killian_uai_2022_restless_uncertain-supp.pdf
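The double oracle framework the paper builds on is easiest to see on a plain zero-sum matrix game, as in the generic sketch below (the paper replaces the LP and best-response steps with learned DDLPO oracles over RMAB policies): each side repeatedly adds a best response to the opponent's current equilibrium mixture until the restricted game stabilizes.

```python
# Generic double oracle on a zero-sum matrix game; restricted games solved by LP.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Row player's maximin mixed strategy and value for payoff matrix A."""
    m, n = A.shape
    # variables [x_1..x_m, v]; maximize v s.t. A^T x >= v, sum(x) = 1, x >= 0
    c = np.zeros(m + 1); c[-1] = -1.0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=np.hstack([np.ones((1, m)), np.zeros((1, 1))]),
                  b_eq=[1.0], bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

def double_oracle(A, iters=50):
    rows, cols = [0], [0]                       # start with one action each
    for _ in range(iters):
        sub = A[np.ix_(rows, cols)]
        x, _ = solve_zero_sum(sub)              # row mixture on restricted game
        y, _ = solve_zero_sum(-sub.T)           # column mixture (its own view)
        br_row = int(np.argmax(A[:, cols] @ y)) # best responses over full sets
        br_col = int(np.argmin(x @ A[rows, :]))
        if br_row in rows and br_col in cols:
            return x, y, rows, cols             # no new responses: equilibrium
        if br_row not in rows: rows.append(br_row)
        if br_col not in cols: cols.append(br_col)
    return x, y, rows, cols

A = np.random.default_rng(0).normal(size=(6, 6))
x, y, rows, cols = double_oracle(A)
```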
Lily Xu, Arpita Biswas, Fei Fang, and Milind Tambe. 7/23/2022. “Ranked Prioritization of Groups in Combinatorial Bandit Allocation.” International Joint Conference on Artificial Intelligence (IJCAI) 31. Vienna, Austria. arXiv link. Abstract:
Preventing poaching through ranger patrols is critical for protecting endangered wildlife. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) We propose a novel combinatorial bandit objective that trades off between reward maximization and also accounts for prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective.
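A stripped-down illustration of a combinatorial UCB loop with a weighted linear objective in the spirit of RankedCUCB (the weights, species mixes, and confidence radius below are invented for the example; the paper's algorithm and regret analysis differ in detail):

```python
# Each round, pull the k locations with the highest priority-weighted UCBs.
import numpy as np

rng = np.random.default_rng(1)
n, k, T = 10, 3, 2000
true_mu = rng.uniform(0.2, 0.8, n)           # unknown per-location patrol effect
species_frac = rng.dirichlet(np.ones(3), n)  # location -> species mix
w = np.array([1.0, 2.0, 4.0])                # higher weight = higher priority
weight = species_frac @ w                    # per-location objective weight

counts = np.ones(n)                          # one pseudo-pull each to initialize
means = rng.uniform(size=n)
for t in range(1, T + 1):
    ucb = means + np.sqrt(2 * np.log(t) / counts)
    chosen = np.argsort(weight * ucb)[-k:]   # maximize weighted linear objective
    rewards = rng.binomial(1, true_mu[chosen])
    means[chosen] += (rewards - means[chosen]) / (counts[chosen] + 1)
    counts[chosen] += 1
```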
E. Cranford, S. Jabbari, H-C. Ou, M. Tambe, C. Gonzalez, and C. Lebiere. 7/2022. “Combining Machine Learning and Cognitive Models for Adaptive Phishing Training.” In International Conference on Cognitive Modeling (ICCM). Abstract:
Organizations typically use simulation campaigns to train employees to detect phishing emails, but these campaigns are non-personalized and fail to account for human experiential learning and adaptivity. We propose a method to improve the effectiveness of training by combining cognitive modeling with machine learning methods. We frame the problem as one of scheduling and use the restless multi-armed bandit (RMAB) framework to select which users to target for intervention at each trial, while using a cognitive model of phishing susceptibility to inform the parameters of the RMAB. We compare the effectiveness of the RMAB solution to two purely cognitive approaches in a series of simulation studies using the cognitive model as simulated participants. Both approaches show improvement compared to random selection, and we highlight the pros and cons of each approach. We discuss the implications of these findings and future research that aims to combine the benefits of both methods for a more effective solution.
Vineet Nair, Kritika Prakash, Michael Wilbur, Aparna Taneja, Corrine Namblard, Oyindamola Adeyemo, Abhishek Dubey, Abiodun Adereni, Milind Tambe, and Ayan Mukhopadhyay. 7/2022. “ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria.” In International Joint Conference on AI (IJCAI). Abstract:
More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in underdeveloped countries with low vaccination uptake. One of the United Nations’ sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria, to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and hopefully pave the way for other data-driven programs to improve health outcomes in Nigeria.
adviser_ai_driven_vaccination_intervention_optimiser_for_increasing_vaccine.pdf
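A toy version of the kind of assignment ILP described above, with assumed numbers (the paper's formulation, budget structure, and heuristic are richer): binary variables assign at most one intervention per beneficiary, subject to per-intervention capacity, maximizing total estimated success probability. The sketch uses the PuLP modeling library.

```python
# Intervention-assignment ILP sketch: x[i][j] = 1 iff person i gets intervention j.
import pulp

p = [[0.6, 0.3], [0.4, 0.5], [0.7, 0.2]]   # p[i][j]: success probability estimates
capacity = [1, 2]                          # available units of each intervention

prob = pulp.LpProblem("adviser_sketch", pulp.LpMaximize)
x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary") for j in range(2)]
     for i in range(3)]
prob += pulp.lpSum(p[i][j] * x[i][j] for i in range(3) for j in range(2))
for i in range(3):
    prob += pulp.lpSum(x[i]) <= 1          # at most one intervention per person
for j in range(2):
    prob += pulp.lpSum(x[i][j] for i in range(3)) <= capacity[j]
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(i, j) for i in range(3) for j in range(2) if x[i][j].value() == 1])
```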
Adam Żychowski, Jacek Mańdziuk, Elizabeth Bondi, Aravind Venugopal, Milind Tambe, and Balaraman Ravindran. 7/2022. “Evolutionary Approach to Security Games with Signaling.” International Joint Conference on AI (IJCAI). Publisher's Version
