We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret--robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary "λ-network" in tandem with policy networks per arm, greatly reducing sample complexity, with guarantees on convergence. DDLPO, of general interest, implements our reward-maximizing agent oracle. We then tackle the challenging regret-maximizing nature oracle, a non-stationary RL challenge, by formulating it as a multi-agent RL problem between a policy optimizer and adversarial nature. This formulation is of general interest---we solve it for RMABs by creating a multi-agent extension of DDLPO with a shared critic. We show our approaches work well in three experimental domains.
More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developedcountries with low vaccination uptake.One ofthe United Nations’ sustainable development goals(SDG 3) aims to end preventable deaths of new-borns and children under five years of age.Wefocus on Nigeria, where the rate of infant mortal-ity is appalling. We collaborate with HelpMum, alarge non-profit organization in Nigeria to design and optimize the allocation of heterogeneous healthinterventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Ouroptimization formulation is intractable in practice. We present a heuristic approach that enables us tosolve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method.Finally, we show that the proposed approach out-performs baseline methods in terms of vaccinationuptake through experimental evaluation. HelpMum is currently planning a pilot program based on ourapproach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the countryand hopefully, pave the way for other data-drivenprograms to improve health outcomes in Nigeria.
In the past decade, breakthroughs of Artificial Intelligence (AI) in its multiple sub-area have made new applications in various domains possible. One typical yet essential example is the public health domain. There are many challenges for humans in our never-ending battle with diseases. Among them, problems involving harnessing data with network structures and future planning, such as disease control or resource allocation, demand effective solutions significantly. However, unfortunately, some of them are too complicated or unscalable for humans to solve optimally. This thesis tackles these challenging sequential network planning problems for the public health domain by advancing the state-of-the-art to a new level of effectiveness.
In particular, My thesis provides three main contributions to overcome the emerging challenges when applying sequential network planning problems in the public health domain, namely (1) a novel sequential network-based screening/contact tracing framework under uncertainty, (2) a novel sequential network-based mobile interventions framework, (3) theoretical analysis, algorithmic solutions and empirical experiments that shows superior performance compared to previous approaches both theoretically and empirically.
More concretely, the first part of this thesis studies the active screening problem as an emerging application for disease prevention. I introduce a new approach to modeling multi-round network-based screening/contact tracing under uncertainty. Based on the well-known network SIS model in computational epidemiology, which is applicable for many diseases, I propose a model of the multi-agent active screening problem (ACTS) and prove its NP-hardness. I further proposed the REMEDY (REcurrent screening Multi-round Efficient DYnamic agent) algorithm for solving this problem. With a time and solution quality trade-off, REMEDY has two variants, Full- and Fast-REMEDY. It is a Frank-Wolfe-style gradient descent algorithm realized by compacting the representation of belief states to represent uncertainty. As shown in the experiment conducted, Full- and Fast-REMEDY are not only being superior in controlling diseases to all the previous approaches; they are also robust to varying levels of missing information in the social graph and budget change, thus enabling the use of our agent to improve the current practice of real-world screening contexts.
The second part of this thesis focuses on the scalability issue for the time horizon for the ACTS problem. Although Full-REMEDY provides excellent solution qualities, it fails to scale to large time horizons while fully considering the future effect of current interventions. Thus, I proposed a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN). Due to the nature of the ACTS problem, several challenges that the traditional RL can not handle have emerged, including (1) the combinatorial nature of the problem, (2) the need for sequential planning, and (3) the uncertainties in the infectiousness states of the population. I design several innovative adaptations in my RL approach to address the above challenges. I will introduce why and how these adaptations are made in this part.
For the third part, I introduce a novel sequential network-based mobile interventions framework. It is a restless multi-armed bandits (RMABs) with network pulling effects. In the proposed model, arms are partially recharging and connected through a graph. Pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. Such network effect may arise due to regular population movements (such as commuting between home and work) for mobile intervention applications. In my thesis, I show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods. I also propose a new solution approach for the networked RMABs by exploiting concavity properties that arise under natural assumptions on the structure of intervention effects. In addition, I show the optimality of such a method in idealized settings and demonstrate that it empirically outperforms state-of-the-art baselines.
The widespread availability of cell phones has enabled nonprofits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing ∼ 30% engagement drops. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real world public health settings. We are transitioning our RMAB system to the NGO for real-world use.
The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work in assisting non-profits employing automated messaging programs to deliver timely preventive care information to new and expecting mothers during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries tend to drop out. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing 30% engagement drops. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real world public health settings. We are transitioning our system to the NGO for real-world use.
Community Health Workers (CHWs) form an important component of health-care systems globally, especially in low-resource settings. CHWs are often tasked with monitoring the health of and intervening on their patient cohort. Previous work has developed several classes of Restless Multi-Armed Bandits (RMABs) that are computationally tractable and indexable, a condition that guarantees asymptotic optimality, for solving such health monitoring and intervention problems (HMIPs). However, existing solutions to HMIPs fail to account for risk-sensitivity considerations of CHWs in the planning stage and may run the danger of ignoring some patients completely because they are deemed less valuable to intervene on. Additionally, these also rely on patients reporting their state of adherence accurately when intervened upon. Towards tackling these issues, our contributions in this paper are as follows: (1) We develop an RMAB solution to HMIPs that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations. (2) We prove theoretical guarantees on the asymptotic optimality of our algorithm for any arbitrary reward function. Additionally, we show that for the specific reward function considered in previous work, our theoretical conditions are stronger than the state-of-the-art guarantees. (3) We show the applicability of these new results for addressing the three issues pertaining to: risk-sensitive planning, equitable allocation and reliance on perfect observations as highlighted above. We evaluate these techniques on both simulated as well as real data from a prevalent CHW task of monitoring adherence of tuberculosis patients to their prescribed medication in Mumbai, India and show improved performance over the state-of-the-art. The simulation code is available at: https://github.com/AdityaMate/risk-aware-bandits.
We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the “good” state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which health workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either “forward” or “reverse” threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients’ adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques, while achieving similar performance.
Background: The United States has been particularly hard-hit by COVID-19, accounting for approximately 30% of all global cases and deaths from the disease that have been reported as of May 20, 2020. We extended our agent-based model for COVID-19 transmission to study the effect of alternative lockdown and reopening policies on disease dynamics in Georgia, Florida, and Mississippi. Specifically, for each state we simulated the spread of the disease had the state enforced its lockdown approximately one week earlier than it did. We also simulated Georgia's reopening plan under various levels of physical distancing if enacted in each state, making projections until June 15, 2020.
Methods: We used an agent-based SEIR model that uses population-specific age distribution, household structure, contact patterns, and comorbidity rates to perform tailored simulations for each region. The model was first calibrated to each state using publicly available COVID-19 death data as of April 23, then implemented to simulate given lockdown or reopening policies.
Results: Our model estimated that imposing lockdowns one week earlier could have resulted in hundreds fewer COVID-19-related deaths in the context of all three states. These estimates quantify the effect of early action, a key metric to weigh in developing prospective policies to combat a potential second wave of infection in each of these states. Further, when simulating Georgia’s plan to reopen select businesses as of April 27, our model found that a reopening policy that includes physical distancing to ensure no more than 25% of pre-lockdown contact rates at reopened businesses could allow limited economic activity to resume in any of the three states, while also eventually flattening the curve of COVID-19-related deaths by June 15, 2020.
Background: On March 24, India ordered a 3-week nationwide lockdown in an effort to control the spread of COVID-19. While the lockdown has been effective, our model suggests that completely ending the lockdown after three weeks could have considerable adverse public health ramifications. We extend our individual-level model for COVID-19 transmission  to study the disease dynamics in India at the state level for Maharashtra and Uttar Pradesh to estimate the effect of further lockdown policies in each region. Specifically, we test policies which alternate between total lockdown and simple physical distancing to find "middle ground" policies that can provide social and economic relief as well as salutary population-level health effects.
Methods: We use an agent-based SEIR model that uses population-specific age distribution, household structure, contact patterns, and comorbidity rates to perform tailored simulations for each region. The model is first calibrated to each region using publicly available COVID-19 death data, then implemented to simulate a range of policies. We also compute the basic reproduction number R0 and case documentation rate for both regions.
Results: After the initial lockdown, our simulations demonstrate that even policies that enforce strict physical distancing while returning to normal activity could lead to widespread outbreaks in both states. However, "middle ground" policies that alternate weekly between total lockdown and physical distancing may lead to much lower rates of infection while simultaneously permitting some return to normalcy.
Digital Adherence Technologies (DATs) are an increasingly popular method for verifying patient adherence to many medications. We analyze data from one city served by 99DOTS, a phone-callbased DAT deployed for Tuberculosis (TB) treatment in India where nearly 3 million people are afflicted with the disease each year. The data contains nearly 17,000 patients and 2.1M dose records. We lay the groundwork for learning from this real-world data, including a method for avoiding the effects of unobserved interventions in training data used for machine learning. We then construct a deep learning model, demonstrate its interpretability, and show how it can be adapted and trained in three different clinical scenarios to better target and improve patient care. In the real-time risk prediction setting our model could be used to proactively intervene with 21% more patients and before 76% more missed doses than current heuristic baselines. For outcome prediction, our model performs 40% better than baseline methods, allowing cities to target more resources to clinics with a heavier burden of patients at risk of failure. Finally, we present a case study demonstrating how our model can be trained in an end-to-end decision focused learning setting to achieve 15% better solution quality in an example decision problem faced by health workers.
Substance use and abuse is a significant public health problem in the United States. Group-based intervention programs offer a promising means of preventing and reducing substance abuse. While effective, unfortunately, inappropriate intervention groups can result in an increase in deviant behaviors among participants, a process known as deviancy training. This paper investigates the problem of optimizing the social influence related to the deviant behavior via careful construction of the intervention groups. We propose a Mixed Integer Optimization formulation that decides on the intervention groups to be formed, captures the impact of the intervention groups on the structure of the social network, and models the impact of these changes on behavior propagation. In addition, we propose a scalable hybrid meta-heuristic algorithm that combines Mixed Integer Programming and Large Neighborhood Search to find nearoptimal network partitions. Our algorithm is packaged in the form of GUIDE, an AI-based decision aid that recommends intervention groups. Being the first quantitative decision aid of this kind, GUIDE is able to assist practitioners, in particular social workers, in three key areas: (a) GUIDE proposes near-optimal solutions that are shown, via extensive simulations, to significantly improve over the traditional qualitative practices for forming intervention groups; (b) GUIDE is able to identify circumstances when an intervention will lead to deviancy training, thus saving time, money, and effort; (c) GUIDE can evaluate current strategies of group formation and discard strategies that will lead to deviancy training. In developing GUIDE, we are primarily interested in substance use interventions among homeless youth as a high risk and vulnerable population. GUIDE is developed in collaboration with Urban Peak, a homelessyouth serving organization in Denver, CO, and is under preparation for deployment.
While reigning models of diffusion have privileged the structure of a given social network as the key to informational exchange, real human interactions do not appear to take place on a single graph of connections. Using data collected from a pilot study of the spread of HIV awareness in social networks of homeless youth, we show that health information did not diffuse in the field according to the processes outlined by dominant models. Since physical network diffusion scenarios often diverge from their more well-studied counterparts on digital networks, we propose an alternative Activation Jump Model (AJM) that describes information diffusion on physical networks from a multi-agent team perspective. Our model exhibits two main differentiating features from leading cascade and threshold models of influence spread: 1) The structural composition of a seed set team impacts each individual node’s influencing behavior, and 2) an influencing node may spread information to non-neighbors. We show that the AJM significantly outperforms existing models in its fit to the observed node-level influence data on the youth networks. We then prove theoretical results, showing that the AJM exhibits many well-behaved properties shared by dominant models. Our results suggest that the AJM presents a flexible and more accurate model of network diffusion that may better inform influence maximization in the field.