Publications

2022
Aditya Mate, Arpita Biswas, Christoph Siebenbrunner, Susobhan Ghosh, and Milind Tambe. 5/2022. “Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS). streamingbandits-camready-full.pdf
Kai Wang, Lily Xu, Andrew Perrault, Michael K. Reiter, and Milind Tambe. 2/22/2022. “Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games.” In AAAI Conference on Artificial Intelligence. Abstract
A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best response as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but make limiting assumptions on the followers' objectives and the equilibrium reached by followers, e.g., uniqueness, optimism, or pessimism. To overcome these limitations, we run gradient descent to update the leader's strategy by differentiating through the equilibrium reached by followers. Our approach generalizes to any stochastic equilibrium selection procedure that chooses from multiple equilibria, where we compute the stochastic gradient by back-propagating through a sampled Nash equilibrium using the solution to a partial differential equation to establish the unbiasedness of the stochastic gradient. Using the unbiased gradient estimate, we implement the gradient-based approach to solve three Stackelberg problems with multiple followers. Our approach consistently outperforms existing baselines to achieve higher utility for the leader.
stackelberg_games_multiple_followers_Wang_AAAI2022.pdf
Elizabeth Bondi, Haipeng Chen, Christopher Golden, Nikhil Behari, and Milind Tambe. 2/20/2022. “Micronutrient Deficiency Prediction via Publicly Available Satellite Data.” In Innovative Applications of Artificial Intelligence (IAAI). mnd_conference_paper_iaai_2.pdf
Susobhan Ghosh, Pradeep Varakantham, Aniket Bhatkhande, Tamanna Ahmad, Anish Andheria, Wenjun Li, Aparna Taneja, Divy Thakkar, and Milind Tambe. 2/15/2022. “Facilitating Human-Wildlife Cohabitation through Conflict Prediction.” Innovative Applications of Artificial Intelligence. iaai_wct.pdf
Haipeng Chen, Susobhan Ghosh, Gregory Fan, Nikhil Behari, Arpita Biswas, Mollie Williams, Nancy E. Oriol, and Milind Tambe. 2/15/2022. “Using Public Data to Predict Demand for Mobile Health Clinics.” In The 34th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI). 22iaai-family-van.pdf
Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, and Milind Tambe. 2/2022. “Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health.” In AAAI Conference on Artificial Intelligence. Vancouver, Canada. Abstract
The widespread availability of cell phones has enabled nonprofits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing ∼ 30% engagement drops. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real world public health settings. We are transitioning our RMAB system to the NGO for real-world use.
aaai_rmab_armman_camready.pdf
Han-Ching Ou*, Christoph Siebenbrunner*, Jackson Killian, Meredith B Brooks, David Kempe, Yevgeniy Vorobeychik, and Milind Tambe. 2022. “Networked Restless Multi-Armed Bandits for Mobile Interventions.” In 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022). Online. aamas_2022_network_bandit.pdf
2021
Haipeng Chen, Susobhan Ghosh, Gregory Fan, Nikhil Behari, Arpita Biswas, Mollie Williams, Nancy E. Oriol, and Milind Tambe. 12/2021. “Demand Prediction of Mobile Clinics Using Public Data.” In MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop. demand_prediction_for_mobile_health_clinics_using_public_data.pdf
Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, and Milind Tambe. 12/2021. “Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning.” In NeurIPS 2021 (spotlight). Abstract
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.
decision_focused_rl_wang_neurips2021_update.pdf
Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, and Milind Tambe. 12/2021. “Restless Bandits in the Field: Real-World Study for Improving Maternal and Child Health Outcomes.” In MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop. Abstract

The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work in assisting non-profits employing automated messaging programs to deliver timely preventive care information to new and expecting mothers during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries tend to drop out. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing 30% engagement drops. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real world public health settings. We are transitioning our system to the NGO for real-world use.

neurips-workshop-mlph-restlessbandits.pdf
Lily Xu. 10/24/2021. “Learning, Optimization, and Planning Under Uncertainty for Wildlife Conservation.” INFORMS Doing Good with Good OR. Abstract

In collaboration with conservation NGOs, our project helps plan effective ranger patrols to protect endangered animals from poaching. Algorithmically, the problem is to optimize limited resources to maximize the number of snares confiscated. Given limited and incomplete data, we leverage linear programming, multi-armed bandits, and game theory to handle uncertainty about poacher behavior. Our approaches are supported with theorems, experiments, and real-world field tests. Our system is being integrated into existing conservation software to become available to 1,000 protected areas worldwide.

paws_informs.pdf
Ramesha Karunasena, Mohammad Sarparajul Ambiya, Arunesh Sinha, Ruchit Nagar, Saachi Dalal, Divy Thakkar, Dhyanesh Narayanan, and Milind Tambe. 10/5/2021. “Measuring Data Collection Diligence for Community Healthcare.” In ACM conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO '21). eaamo21-12_taps_output.pdf
Lily Xu. 8/21/2021. “Learning and Planning Under Uncertainty for Green Security.” 30th International Joint Conference on Artificial Intelligence (IJCAI). xu-dc-ijcai21.pdf
Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, and Milind Tambe. 8/2021. “Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare.” In International Joint Conference on Artificial Intelligence (IJCAI). Abstract
In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive, and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle index based Q-Learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from literature and also on the maternal healthcare dataset.
wiql_ijcai21.pdf
Jackson A Killian, Arpita Biswas, Sanket Shah, and Milind Tambe. 8/2021. “Q-Learning Lagrange Policies for Multi-Action Restless Bandits.” Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. maiql_kdd21_main_teamcore.pdf maiql_kdd21_online_appendix.pdf
Lily Xu, Andrew Perrault, Fei Fang, Haipeng Chen, and Milind Tambe. 7/27/2021. “Robust Reinforcement Learning Under Minimax Regret for Green Security.” Conference on Uncertainty in Artificial Intelligence. xu_uai21_robust_rl.pdf
Haipeng Chen, Wei Qiu, Han-Ching Ou, Bo An, and Milind Tambe. 7/25/2021. “Contingency-Aware Influence Maximization: A Reinforcement Learning Approach.” In Conference on Uncertainty in Artificial Intelligence. uai21.pdf
Bryan Wilder. 7/15/2021. “AI for Population Health: Melding Data and Algorithms on Networks.” PhD Thesis, Computer Science, Harvard University. ai_for_population_health-_melding_data_and_algorithms_on_networks.pdf
Edward Cranford, Cleotilde Gonzalez, Palvi Aggarwal, Milind Tambe, Sarah Cooney, and Christian Lebiere. 7/7/2021. “Towards a Cognitive Theory of Cyber Deception.” Cognitive Science. Publisher's Version cogs.13013.pdf
Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, and Himabindu Lakkaraju. 7/1/2021. “Towards the Unification and Robustness of Perturbation and Gradient Based Explanations.” Proceedings of the 38th International Conference on Machine Learning. Virtual Only. Publisher's Version. Abstract
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad which is a gradient based method, and a variant of LIME which is a perturbation based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real world datasets.
