Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, and Milind Tambe. 12/2021. “Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning.” In NeurIPS 2021 (spotlight).
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy-gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.
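The low-rank approximation mentioned in the second contribution can be illustrated in isolation. The sketch below is illustrative only (the matrix shapes, rank, and noise level are invented, not taken from the paper): a noisy sample-based derivative matrix that is approximately low rank is compressed with a truncated SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample-based derivative matrix: n_samples x n_policy_params.
# Built as a rank-5 signal plus sampling noise, standing in for noisy
# per-sample derivatives of a high-dimensional policy.
G = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 2000))
G += 0.01 * rng.standard_normal((200, 2000))  # sampling noise

# Truncated SVD keeps only the k dominant directions of the derivative.
k = 5
U, s, Vt = np.linalg.svd(G, full_matrices=False)
G_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # rank-k reconstruction

# Relative Frobenius error of the low-rank approximation.
rel_err = np.linalg.norm(G - G_k) / np.linalg.norm(G)
```

Because the signal is genuinely low rank, the rank-k factors store far fewer numbers than the full matrix while losing almost nothing of the derivative.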
Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, and Milind Tambe. 12/2021. “Restless Bandits in the Field: Real-World Study for Improving Maternal and Child Health Outcomes.” In MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop.

The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work in assisting non-profits employing automated messaging programs to deliver timely preventive care information to new and expecting mothers during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries tend to drop out. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandit (RMAB) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is the evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by 30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our system to the NGO for real-world use.

Ramesha Karunasena, Mohammad Sarparajul Ambiya, Arunesh Sinha, Ruchit Nagar, Saachi Dalal, Divy Thakkar, Dhyanesh Narayanan, and Milind Tambe. 10/5/2021. “Measuring Data Collection Diligence for Community Healthcare.” In ACM conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO '21).
Lily Xu. 8/21/2021. “Learning and Planning Under Uncertainty for Green Security.” 30th International Joint Conference on Artificial Intelligence (IJCAI).
Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, and Milind Tambe. 8/2021. “Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare.” In International Joint Conference on Artificial Intelligence (IJCAI).
In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive, and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle index based Q-Learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from literature and also on the maternal healthcare dataset.
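The Whittle-index-based Q-learning idea in the abstract above can be sketched as follows. This is a hedged toy illustration, not the authors' algorithm: the two-state transition model, the reward (1 for reaching the engaged state), and all hyperparameters are invented. Each arm learns per-state Q-values, the index of a state is estimated as the gap Q(s, act) − Q(s, passive), and the limited intervention budget is spent on the arms with the largest estimated indices.

```python
import numpy as np

def whittle_q_learning(n_arms=10, n_states=2, budget=3, horizon=3000,
                       alpha=0.1, gamma=0.9, seed=0):
    """Toy Whittle-index-style Q-learning for a binary-state RMAB (illustrative)."""
    rng = np.random.default_rng(seed)
    arms = np.arange(n_arms)

    # Hypothetical dynamics: P[arm, s, a] = P(next state is 1 | state s, action a);
    # intervening (a=1) raises the chance of the good state by 0.2.
    P = rng.uniform(0.1, 0.7, size=(n_arms, n_states, 2))
    P[:, :, 1] = np.clip(P[:, :, 0] + 0.2, 0.0, 1.0)

    Q = np.zeros((n_arms, n_states, 2))
    state = rng.integers(n_states, size=n_arms)

    for t in range(horizon):
        # Estimated Whittle index: advantage of acting over staying passive.
        index = Q[arms, state, 1] - Q[arms, state, 0]
        eps = max(0.1, 1.0 / (t + 1))  # epsilon-greedy exploration over arms
        if rng.random() < eps:
            pulled = rng.choice(n_arms, size=budget, replace=False)
        else:
            pulled = np.argsort(index)[-budget:]  # top-budget arms by index
        action = np.zeros(n_arms, dtype=int)
        action[pulled] = 1

        # Sample transitions; reward 1 for landing in the engaged state (s=1).
        nxt = (rng.random(n_arms) < P[arms, state, action]).astype(int)
        reward = nxt.astype(float)

        # Standard per-arm Q-learning update (arms decoupled, as in Whittle-style methods).
        td = reward + gamma * Q[arms, nxt].max(axis=1) - Q[arms, state, action]
        Q[arms, state, action] += alpha * td
        state = nxt
    return Q
```

The decoupling of the budgeted problem into independent per-arm value estimates, ranked by an index, is what keeps this tractable as the number of beneficiaries grows.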
Jackson A Killian, Arpita Biswas, Sanket Shah, and Milind Tambe. 8/2021. “Q-Learning Lagrange Policies for Multi-Action Restless Bandits.” Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Lily Xu, Andrew Perrault, Fei Fang, Haipeng Chen, and Milind Tambe. 7/27/2021. “Robust Reinforcement Learning Under Minimax Regret for Green Security.” Conference on Uncertainty in Artificial Intelligence.
Haipeng Chen, Wei Qiu, Han-Ching Ou, Bo An, and Milind Tambe. 7/25/2021. “Contingency-Aware Influence Maximization: A Reinforcement Learning Approach.” In Conference on Uncertainty in Artificial Intelligence.
Bryan Wilder. 7/15/2021. “AI for Population Health: Melding Data and Algorithms on Networks.” PhD Thesis, Computer Science, Harvard University.
Edward Cranford, Cleotilde Gonzalez, Palvi Aggarwal, Milind Tambe, Sarah Cooney, and Christian Lebiere. 7/7/2021. “Towards a Cognitive Theory of Cyber Deception.” Cognitive Science.
Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, and Himabindu Lakkaraju. 7/1/2021. “Towards the Unification and Robustness of Perturbation and Gradient Based Explanations.” Proceedings of the 38th International Conference on Machine Learning. Virtual Only.
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad, which is a gradient-based method, and a variant of LIME, which is a perturbation-based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real world datasets.
Elizabeth Bondi, Catherine Ressijac, and Peter Boucher. 6/25/2021. “Preliminary Detection of Rhino Middens for Understanding Rhino Behavior.” CVPR 2021 Workshop on Computer Vision for Animal Behavior Tracking and Modeling.
Eric Rice, Bryan Wilder, Laura Onasch-Vera, Graham Diguiseppi, Robin Petering, Chyna Hill, Amulya Yadav, Sung-Jae Lee, and Milind Tambe. 6/21/2021. “A Peer-Led, Artificial Intelligence-Augmented Social Network Intervention to Prevent HIV among Youth Experiencing Homelessness.” To appear in the Journal of Acquired Immune Deficiency Syndromes (JAIDS).
Youth experiencing homelessness (YEH) are at elevated risk for HIV/AIDS and disproportionately identify as racial, ethnic, sexual, and gender minorities. We developed a new peer change agent (PCA) HIV prevention intervention with three arms: (1) an arm using an Artificial Intelligence (AI) planning algorithm to select PCAs; (2) a popularity arm, the standard PCA approach, operationalized as highest degree centrality (DC); and (3) an observation-only comparison group.
Arpita Biswas and Suvam Mukherjee. 5/19/2021. “Ensuring Fairness under Prior Probability Shifts.” In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES).
Elizabeth Bondi*, Lily Xu*, Diana Acosta-Navas, and Jackson A. Killian. 5/19/2021. “Envisioning Communities: A Participatory Approach Towards AI for Social Good.” In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES).
Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, and Milind Tambe. 5/7/2021. “Learning Index Policies for Restless Bandits with Application to Maternal Healthcare (Extended abstract).” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Aditya Mate, Andrew Perrault, and Milind Tambe. 5/7/2021. “Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits.” In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). London, UK.
Community Health Workers (CHWs) form an important component of health-care systems globally, especially in low-resource settings. CHWs are often tasked with monitoring the health of and intervening on their patient cohort. Previous work has developed several classes of Restless Multi-Armed Bandits (RMABs) that are computationally tractable and indexable, a condition that guarantees asymptotic optimality, for solving such health monitoring and intervention problems (HMIPs).
However, existing solutions to HMIPs fail to account for the risk-sensitivity considerations of CHWs in the planning stage and may ignore some patients entirely because they are deemed less valuable to intervene on.
Additionally, these solutions rely on patients accurately reporting their state of adherence when intervened upon. Towards tackling these issues, our contributions in this paper are as follows:
(1) We develop an RMAB solution to HMIPs that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations.
(2) We prove theoretical guarantees on the asymptotic optimality of our algorithm for any arbitrary reward function. Additionally, we show that for the specific reward function considered in previous work, our theoretical conditions are stronger than the state-of-the-art guarantees.
(3) We show the applicability of these new results for addressing the three issues pertaining to: risk-sensitive planning, equitable allocation and reliance on perfect observations as highlighted above. We evaluate these techniques on both simulated as well as real data from a prevalent CHW task of monitoring adherence of tuberculosis patients to their prescribed medication in Mumbai, India and show improved performance over the state-of-the-art. The simulation code is available at: https://github.com/AdityaMate/risk-aware-bandits.
Aravind Venugopal, Elizabeth Bondi, Harshavardhan Kamarthi, Keval Dholakia, Balaraman Ravindran, and Milind Tambe. 5/5/2021. “Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty.” In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Siddharth Nishtala, Lovish Madaan, Aditya Mate, Harshavardhan Kamarthi, Anirudh Grama, Divy Thakkar, Dhyanesh Narayanan, Suresh Chaudhary, Neha Madhiwalla, Ramesh Padhmanabhan, Aparna Hegde, Pradeep Varakantham, Balaraman Ravindran, and Milind Tambe. 5/5/2021. “Selective Intervention Planning using Restless Multi-Armed Bandits to Improve Maternal and Child Health Outcomes.” In AAMAS workshop on Autonomous Agents for Social Good.
Feiran Jia, Aditya Mate, Zun Li, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Michael Wellman, and Yevgeniy Vorobeychik. 5/1/2021. “A Game-Theoretic Approach for Hierarchical Policy-Making.” 2nd International (Virtual) Workshop on Autonomous Agents for Social Good (AASG 2021).
We present the design and analysis of a multi-level game-theoretic model of hierarchical policy-making, inspired by policy responses to the COVID-19 pandemic. Our model captures the potentially mismatched priorities among a hierarchy of policy-makers (e.g., federal, state, and local governments) with respect to two main cost components that have opposite dependence on the policy strength, such as post-intervention infection rates and the cost of policy implementation. Our model further includes a crucial third factor in decisions: a cost of non-compliance with the policy-maker immediately above in the hierarchy, such as non-compliance of state with federal policies. Our first contribution is a closed-form approximation of a recently published agent-based model to compute the number of infections for any implemented policy. Second, we present a novel equilibrium selection criterion that addresses common issues with equilibrium multiplicity in our setting. Third, we propose a hierarchical algorithm based on best response dynamics for computing an approximate equilibrium of the hierarchical policy-making game consistent with our solution concept. Finally, we present an empirical investigation of equilibrium policy strategies in this game as a function of game parameters, such as the degree of centralization and disagreements about policy priorities among the agents, the extent of free riding as well as fairness in the distribution of costs.