AI for Social Work, Public Health, and Medical Decision Making

2022
Jackson A. Killian, Lily Xu, Arpita Biswas, and Milind Tambe. 8/2022. “Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning.” In Uncertainty in Artificial Intelligence (UAI). Abstract
We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret-robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary "λ-network" in tandem with policy networks per arm, greatly reducing sample complexity, with guarantees on convergence. DDLPO, of general interest, implements our reward-maximizing agent oracle. We then tackle the challenging regret-maximizing nature oracle, a non-stationary RL challenge, by formulating it as a multi-agent RL problem between a policy optimizer and adversarial nature. This formulation is of general interest; we solve it for RMABs by creating a multi-agent extension of DDLPO with a shared critic. We show our approaches work well in three experimental domains.
killian_uai_2022_restless_uncertain.pdf killian_uai_2022_restless_uncertain-supp.pdf
Vineet Nair, Kritika Prakash, Michael Wilbur, Aparna Taneja, Corrine Namblard, Oyindamola Adeyemo, Abhishek Dubey, Abiodun Adereni, Milind Tambe, and Ayan Mukhopadhyay. 7/2022. “ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria.” In International Joint Conference on AI (IJCAI). Abstract
More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations’ sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria, to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and, hopefully, pave the way for other data-driven programs to improve health outcomes in Nigeria.
adviser_ai_driven_vaccination_intervention_optimiser_for_increasing_vaccine.pdf
Aditya Mate, Arpita Biswas, Christoph Siebenbrunner, Susobhan Ghosh, and Milind Tambe. 5/2022. “Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS). streamingbandits-camready-full.pdf
Han-Ching Ou*, Christoph Siebenbrunner*, Jackson Killian, Meredith B Brooks, David Kempe, Yevgeniy Vorobeychik, and Milind Tambe. 5/2022. “Networked Restless Multi-Armed Bandits for Mobile Interventions.” In 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022). Online. aamas_2022_network_bandit.pdf
Han-Ching Ou. 3/31/2022. “Sequential Network Planning Problems for Public Health Applications.” PhD Thesis, Computer Science, Harvard University. Abstract

In the past decade, breakthroughs in multiple sub-areas of Artificial Intelligence (AI) have made new applications possible in various domains. One typical yet essential example is the public health domain. There are many challenges for humans in our never-ending battle with diseases. Among them, problems that involve harnessing network-structured data and planning for the future, such as disease control or resource allocation, are in pressing need of effective solutions. Unfortunately, some of them are too complicated or too large for humans to solve optimally. This thesis tackles these challenging sequential network planning problems for the public health domain by advancing the state of the art to a new level of effectiveness.

In particular, my thesis provides three main contributions to overcome the emerging challenges when applying sequential network planning problems in the public health domain, namely (1) a novel sequential network-based screening/contact tracing framework under uncertainty, (2) a novel sequential network-based mobile interventions framework, and (3) theoretical analysis, algorithmic solutions, and empirical experiments that show superior performance over previous approaches both theoretically and empirically.

More concretely, the first part of this thesis studies the active screening problem as an emerging application for disease prevention. I introduce a new approach to modeling multi-round network-based screening/contact tracing under uncertainty. Based on the well-known network SIS model in computational epidemiology, which is applicable to many diseases, I propose a model of the multi-agent active screening problem (ACTS) and prove its NP-hardness. I further propose the REMEDY (REcurrent screening Multi-round Efficient DYnamic agent) algorithm for solving this problem. Offering a trade-off between running time and solution quality, REMEDY has two variants, Full- and Fast-REMEDY. It is a Frank-Wolfe-style gradient descent algorithm realized by compacting the representation of belief states to represent uncertainty. As shown in the experiments, Full- and Fast-REMEDY are not only superior to all previous approaches in controlling diseases; they are also robust to varying levels of missing information in the social graph and to budget changes, thus enabling the use of our agent to improve current practice in real-world screening contexts.

The second part of this thesis focuses on the scalability of the ACTS problem with respect to the time horizon. Although Full-REMEDY provides excellent solution quality, it fails to scale to large time horizons while fully considering the future effect of current interventions. Thus, I propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN). Due to the nature of the ACTS problem, several challenges emerge that traditional RL cannot handle, including (1) the combinatorial nature of the problem, (2) the need for sequential planning, and (3) the uncertainties in the infectiousness states of the population. I design several innovative adaptations in my RL approach to address these challenges. This part explains why and how these adaptations are made.

For the third part, I introduce a novel sequential network-based mobile interventions framework. It is a restless multi-armed bandit (RMAB) model with network pulling effects. In the proposed model, arms are partially recharging and connected through a graph. Pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. Such network effects may arise due to regular population movements (such as commuting between home and work) for mobile intervention applications. In my thesis, I show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods. I also propose a new solution approach for networked RMABs that exploits concavity properties arising under natural assumptions on the structure of intervention effects. In addition, I show the optimality of this method in idealized settings and demonstrate that it empirically outperforms state-of-the-art baselines.

han_ching_ou_phd_thesis_and_dissertation.pdf
Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, and Milind Tambe. 2/2022. “Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health.” In AAAI Conference on Artificial Intelligence. Vancouver, Canada. Abstract
The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by ∼30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our RMAB system to the NGO for real-world use.
aaai_rmab_armman_camready.pdf
2021
Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, and Milind Tambe. 12/2021. “Restless Bandits in the Field: Real-World Study for Improving Maternal and Child Health Outcomes.” In MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop. Abstract

The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work in assisting non-profits employing automated messaging programs to deliver timely preventive care information to new and expecting mothers during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries tend to drop out. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by 30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our system to the NGO for real-world use.

neurips-workshop-mlph-restlessbandits.pdf
Haipeng Chen, Wei Qiu, Han-Ching Ou, Bo An, and Milind Tambe. 7/25/2021. “Contingency-Aware Influence Maximization: A Reinforcement Learning Approach.” In Conference on Uncertainty in Artificial Intelligence. uai21.pdf
Aditya Mate, Andrew Perrault, and Milind Tambe. 5/7/2021. “Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits.” In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). London, UK. Abstract
Community Health Workers (CHWs) form an important component of health-care systems globally, especially in low-resource settings. CHWs are often tasked with monitoring the health of and intervening on their patient cohort. Previous work has developed several classes of Restless Multi-Armed Bandits (RMABs) that are computationally tractable and indexable, a condition that guarantees asymptotic optimality, for solving such health monitoring and intervention problems (HMIPs).
However, existing solutions to HMIPs fail to account for risk-sensitivity considerations of CHWs in the planning stage and may run the danger of ignoring some patients completely because they are deemed less valuable to intervene on.
Additionally, these also rely on patients reporting their state of adherence accurately when intervened upon. Towards tackling these issues, our contributions in this paper are as follows: 
(1) We develop an RMAB solution to HMIPs that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations.
(2) We prove theoretical guarantees on the asymptotic optimality of our algorithm for any arbitrary reward function. Additionally, we show that for the specific reward function considered in previous work, our theoretical conditions are stronger than the state-of-the-art guarantees.
(3) We show the applicability of these new results for addressing the three issues pertaining to: risk-sensitive planning, equitable allocation and reliance on perfect observations as highlighted above. We evaluate these techniques on both simulated as well as real data from a prevalent CHW task of monitoring adherence of tuberculosis patients to their prescribed medication in Mumbai, India and show improved performance over the state-of-the-art. The simulation code is available at: https://github.com/AdityaMate/risk-aware-bandits.
Risk-Aware-Bandits.pdf
Jackson A Killian, Andrew Perrault, and Milind Tambe. 5/2021. “Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits.” In 20th International Conference on Autonomous Agents and Multiagent Systems. multi_action_bandits_aamas_2021_preprint.pdf
2020
Aditya Mate*, Jackson A. Killian*, Haifeng Xu, Andrew Perrault, and Milind Tambe. 12/5/2020. “Collapsing Bandits and Their Application to Public Health Interventions.” In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada. Abstract
We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the “good” state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which health workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either “forward” or “reverse” threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients’ adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques, while achieving similar performance.
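The "collapsing" dynamics described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm and does not compute a Whittle index; it only tracks the belief that a single two-state arm is in the "good" state. The function names and the parameters `p01` and `p11` (assumed probabilities of moving to the good state from the bad and good states, respectively, while the arm is passive) are hypothetical and chosen for illustration.

```python
def passive_belief(belief, p01, p11):
    """One unobserved step: belief that the arm is in the good state.

    belief: current probability of the good state.
    p01: assumed P(good next | bad now) while passive (hypothetical name).
    p11: assumed P(good next | good now) while passive (hypothetical name).
    """
    return belief * p11 + (1.0 - belief) * p01


def belief_after_passive_steps(observed_state, steps, p01, p11):
    """Belief after `steps` passive rounds following a full observation.

    Playing the arm reveals its state (1 = good, 0 = bad), which collapses
    the belief to exactly 0 or 1; each passive round then lets uncertainty
    evolve through the Markov chain.
    """
    belief = float(observed_state)  # uncertainty collapses on observation
    for _ in range(steps):
        belief = passive_belief(belief, p01, p11)
    return belief


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for t in range(5):
        print(t, belief_after_passive_steps(1, t, p01=0.2, p11=0.9))
```

Under these assumed parameters the belief starts at 1 immediately after the arm is observed in the good state and drifts toward the chain's stationary probability the longer the arm stays passive, which is the uncertainty a planner has to weigh when spending a limited budget of actions.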
collapsing_bandits_full_paper_camready.pdf
Ankit Bhardwaj*, Han Ching Ou*, Haipeng Chen, Shahin Jabbari, Milind Tambe, Rahul Panicker, and Alpan Raval. 11/2020. “Robust Lock-Down Optimization for COVID-19 Policy Guidance.” In AAAI Fall Symposium. robust_lock-down_optimization_for_covid-19_policy_guidance.pdf
Bryan Wilder, Marie Charpignon, Jackson A Killian, Han-Ching Ou, Aditya Mate, Shahin Jabbari, Andrew Perrault, Angel Desai, Milind Tambe, and Maimuna S. Majumder. 9/24/2020. “Modeling between-population variation in COVID-19 dynamics in Hubei, Lombardy, and New York City.” Proceedings of the National Academy of Sciences. pnas_full.pdf
Jackson A. Killian, Marie Charpignon, Bryan Wilder, Andrew Perrault, Milind Tambe, and Maimuna S. Majumder. 8/24/2020. “Evaluating COVID-19 Lockdown and Business-Sector-Specific Reopening Policies for Three US States.” In KDD 2020 Workshop on Humanitarian Mapping. Abstract
Background: The United States has been particularly hard-hit by COVID-19, accounting for approximately 30% of all global cases and deaths from the disease that have been reported as of May 20, 2020. We extended our agent-based model for COVID-19 transmission to study the effect of alternative lockdown and reopening policies on disease dynamics in Georgia, Florida, and Mississippi. Specifically, for each state we simulated the spread of the disease had the state enforced its lockdown approximately one week earlier than it did. We also simulated Georgia's reopening plan under various levels of physical distancing if enacted in each state, making projections until June 15, 2020.

Methods: We used an agent-based SEIR model that uses population-specific age distribution, household structure, contact patterns, and comorbidity rates to perform tailored simulations for each region. The model was first calibrated to each state using publicly available COVID-19 death data as of April 23, then implemented to simulate given lockdown or reopening policies.

Results: Our model estimated that imposing lockdowns one week earlier could have resulted in hundreds fewer COVID-19-related deaths in the context of all three states. These estimates quantify the effect of early action, a key metric to weigh in developing prospective policies to combat a potential second wave of infection in each of these states. Further, when simulating Georgia’s plan to reopen select businesses as of April 27, our model found that a reopening policy that includes physical distancing to ensure no more than 25% of pre-lockdown contact rates at reopened businesses could allow limited economic activity to resume in any of the three states, while also eventually flattening the curve of COVID-19-related deaths by June 15, 2020.
covid_19_us_states.pdf
Aniruddha Adiga, Lijing Wang, Adam Sadilek, Ashish Tendulkar, Srinivasan Venkatramanan, Anil Vullikanti, Gaurav Aggarwal, Alok Talekar, Xue Ben, Jiangzhuo Chen, Bryan Lewis, Samarth Swarup, Milind Tambe, and Madhav Marathe. 6/5/2020. “Interplay of global multi-scale human mobility, social distancing, government interventions, and COVID-19 dynamics.” merrxiv.pdf
Han-Ching Ou, Arunesh Sinha, Sze-Chuan Suen, Andrew Perrault, Alpan Raval, and Milind Tambe. 5/9/2020. “Who and When to Screen: Multi-Round Active Screening for Network Recurrent Infectious Diseases Under Uncertainty.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS-20). who_and_when_to_screen.pdf
Aditya Mate, Jackson A. Killian, Bryan Wilder, Marie Charpignon, Ananya Awasthi, Milind Tambe, and Maimuna S. Majumder. 4/13/2020. “Evaluating COVID-19 Lockdown Policies For India: A Preliminary Modeling Assessment for Individual States.” SSRN. Abstract
Background: On March 24, India ordered a 3-week nationwide lockdown in an effort to control the spread of COVID-19. While the lockdown has been effective, our model suggests that completely ending the lockdown after three weeks could have considerable adverse public health ramifications. We extend our individual-level model for COVID-19 transmission [1] to study the disease dynamics in India at the state level for Maharashtra and Uttar Pradesh to estimate the effect of further lockdown policies in each region. Specifically, we test policies which alternate between total lockdown and simple physical distancing to find "middle ground" policies that can provide social and economic relief as well as salutary population-level health effects.

Methods: We use an agent-based SEIR model that uses population-specific age distribution, household structure, contact patterns, and comorbidity rates to perform tailored simulations for each region. The model is first calibrated to each region using publicly available COVID-19 death data, then implemented to simulate a range of policies. We also compute the basic reproduction number R0 and case documentation rate for both regions.

Results: After the initial lockdown, our simulations demonstrate that even policies that enforce strict physical distancing while returning to normal activity could lead to widespread outbreaks in both states. However, "middle ground" policies that alternate weekly between total lockdown and physical distancing may lead to much lower rates of infection while simultaneously permitting some return to normalcy.
ssrn-covid_lockdown_policies_india.pdf
2019
Jackson Killian, Bryan Wilder, Amit Sharma, Vinod Choudhary, Bistra Dilkina, and Milind Tambe. 8/4/2019. “Learning to Prescribe Interventions for Tuberculosis Patients Using Digital Adherence Data.” In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). Abstract
Digital Adherence Technologies (DATs) are an increasingly popular method for verifying patient adherence to many medications. We analyze data from one city served by 99DOTS, a phone-call-based DAT deployed for Tuberculosis (TB) treatment in India, where nearly 3 million people are afflicted with the disease each year. The data contains nearly 17,000 patients and 2.1M dose records. We lay the groundwork for learning from this real-world data, including a method for avoiding the effects of unobserved interventions in training data used for machine learning. We then construct a deep learning model, demonstrate its interpretability, and show how it can be adapted and trained in three different clinical scenarios to better target and improve patient care. In the real-time risk prediction setting, our model could be used to proactively intervene with 21% more patients and before 76% more missed doses than current heuristic baselines. For outcome prediction, our model performs 40% better than baseline methods, allowing cities to target more resources to clinics with a heavier burden of patients at risk of failure. Finally, we present a case study demonstrating how our model can be trained in an end-to-end decision-focused learning setting to achieve 15% better solution quality in an example decision problem faced by health workers.
killian-kdd-2019.pdf
Aida Rahmattalabi, Anamika Barman Adhikari, Phebe Vayanos, Milind Tambe, Eric Rice, and Robin Baker. 2019. “Social Network Based Substance Abuse Prevention via Network Modification (A Preliminary Study).” In Strategic Reasoning for Societal Challenges (SRSC) Workshop at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19). Abstract
Substance use and abuse is a significant public health problem in the United States. Group-based intervention programs offer a promising means of preventing and reducing substance abuse. While effective, unfortunately, inappropriate intervention groups can result in an increase in deviant behaviors among participants, a process known as deviancy training. This paper investigates the problem of optimizing the social influence related to the deviant behavior via careful construction of the intervention groups. We propose a Mixed Integer Optimization formulation that decides on the intervention groups to be formed, captures the impact of the intervention groups on the structure of the social network, and models the impact of these changes on behavior propagation. In addition, we propose a scalable hybrid meta-heuristic algorithm that combines Mixed Integer Programming and Large Neighborhood Search to find near-optimal network partitions. Our algorithm is packaged in the form of GUIDE, an AI-based decision aid that recommends intervention groups. Being the first quantitative decision aid of this kind, GUIDE is able to assist practitioners, in particular social workers, in three key areas: (a) GUIDE proposes near-optimal solutions that are shown, via extensive simulations, to significantly improve over the traditional qualitative practices for forming intervention groups; (b) GUIDE is able to identify circumstances when an intervention will lead to deviancy training, thus saving time, money, and effort; (c) GUIDE can evaluate current strategies of group formation and discard strategies that will lead to deviancy training. In developing GUIDE, we are primarily interested in substance use interventions among homeless youth as a high-risk and vulnerable population. GUIDE is developed in collaboration with Urban Peak, a homeless-youth-serving organization in Denver, CO, and is under preparation for deployment.
2019_5_teamcore_aida_aamas2019_workshop_substance_abuse.pdf
2018
Lily Hu, Bryan Wilder, Amulya Yadav, Eric Rice, and Milind Tambe. 2018. “Activating the 'Breakfast Club': Modeling Influence Spread in Natural-World Social Networks.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18). Abstract
While reigning models of diffusion have privileged the structure of a given social network as the key to informational exchange, real human interactions do not appear to take place on a single graph of connections. Using data collected from a pilot study of the spread of HIV awareness in social networks of homeless youth, we show that health information did not diffuse in the field according to the processes outlined by dominant models. Since physical network diffusion scenarios often diverge from their more well-studied counterparts on digital networks, we propose an alternative Activation Jump Model (AJM) that describes information diffusion on physical networks from a multi-agent team perspective. Our model exhibits two main differentiating features from leading cascade and threshold models of influence spread: 1) The structural composition of a seed set team impacts each individual node’s influencing behavior, and 2) an influencing node may spread information to non-neighbors. We show that the AJM significantly outperforms existing models in its fit to the observed node-level influence data on the youth networks. We then prove theoretical results, showing that the AJM exhibits many well-behaved properties shared by dominant models. Our results suggest that the AJM presents a flexible and more accurate model of network diffusion that may better inform influence maximization in the field.
2018_23_teamcore_aamas_ajm.pdf
