Prevention of Tuberculosis via Prediction and Multi-agent Planning




Tuberculosis (TB), an infectious disease primarily of the lungs, remains one of the top 10 causes of death worldwide and disproportionately affects under-resourced countries and communities. Though the disease is curable, patients in treatment face myriad challenges: regimens require daily antibiotics for a minimum of six months, side effects make it difficult to work, and patients often face social stigma. The disease also presents challenges at the policy level, as governments must build accurate models of disease spread from incomplete information in order to decide where to spend and deploy limited health resources.




One of the biggest challenges to ending TB is ensuring medication adherence. When patients miss too many antibiotic doses, they risk reinfection or development of multi-drug resistant strains of TB. This project focuses on the development of decision support systems for TB health workers who need to plan intervention schedules targeted toward keeping patients adherent to their antibiotics. The earlier workers can detect risk of non-adherence, the sooner they can deliver preventative interventions, keeping patients from missing critical doses.
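To illustrate the kind of risk signal such a decision support system can compute, here is a hypothetical sketch that turns a patient's binary dose history into a non-adherence risk score. The features and the hand-set logistic weights are purely illustrative; a deployed system would learn these from data:

```python
import math

def adherence_features(dose_log):
    """Extract simple features from a binary dose log (1 = dose taken, 0 = missed).

    Features (all hypothetical choices for illustration):
      - miss rate over the full history
      - miss rate over the last 7 days
      - length of the current streak of missed doses
    """
    miss_rate = sum(1 - d for d in dose_log) / len(dose_log)
    recent = dose_log[-7:]
    recent_miss_rate = sum(1 - d for d in recent) / len(recent)
    streak = 0
    for d in reversed(dose_log):
        if d != 0:
            break
        streak += 1
    return [miss_rate, recent_miss_rate, streak]

def risk_score(dose_log, weights=(2.0, 3.0, 0.8), bias=-2.0):
    """Logistic risk score in [0, 1]; weights are illustrative, not learned."""
    z = bias + sum(w * f for w, f in zip(weights, adherence_features(dose_log)))
    return 1.0 / (1.0 + math.exp(-z))
```

A patient who has missed their last several doses scores far higher than a fully adherent one, which is exactly the signal a health worker would use to prioritize early outreach.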

Through ongoing collaborations with the healthcare technology company Everwell and the city of Mumbai, India, we developed a machine learning system trained on millions of daily adherence logs from the 99DOTS digital adherence monitoring system. In simulation, the system detects patients' missed doses almost twice as early as baseline methods. The work is described in the following video:





Collapsing Bandits and Their Application to Public Health Interventions (NeurIPS'20)

Photo Credit: Pippa Ranger

In this work, we develop a multi-agent system to tackle situations in which the daily adherence logs are not automatically available and health workers must call patients to gather the adherence information as well as simultaneously deliver interventions to patients to boost their future adherence. In such scenarios, the health workers must plan their interventions while balancing the exploration/exploitation tradeoff effectively so as to maximize the overall health outcomes for their patient cohort.

We propose and develop Collapsing Bandits, a new sub-class of the Restless Multi-Armed Bandits (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. We use this framework to develop a planning algorithm that recommends to the health workers which patients would benefit most from intervention each day. We show that our algorithm achieves a three-order-of-magnitude speedup compared to previous RMAB approaches without sacrificing performance, making our approach feasible to use with limited computational resources on real-world problem sizes (e.g., 100s of patients).
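The belief dynamics at the heart of Collapsing Bandits can be sketched in a few lines. This is a minimal illustration with assumed transition probabilities (p01, p11), not the paper's full Whittle-index algorithm:

```python
def evolve_belief(b, p01, p11):
    """One passive step for a binary-state Markov chain.
    b is the belief P(state = 1, i.e., adhering); p01 and p11 are the
    transition probabilities P(next = 1 | current = 0) and P(next = 1 | current = 1)."""
    return (1 - b) * p01 + b * p11

def step(b, played, observed_state=None, p01=0.1, p11=0.9):
    """Playing an arm collapses all uncertainty to the observed state;
    a passive arm's belief simply drifts under the chain's dynamics."""
    if played:
        b = float(observed_state)  # full observation: belief collapses
    return evolve_belief(b, p01, p11)
```

Left passive, the belief converges to the chain's stationary point p01 / (1 - p11 + p01); playing the arm resets it to a known state, which is the structure the Whittle-index derivation exploits.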


Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits (AAMAS'21) [paper]

Fig. 1: The multi-action bandit planner delivers interventions with varied costs and effects across many patients while respecting a budget.

In previous work (NeurIPS'20), we used restless bandits to schedule adherence support interventions, but traditional restless bandits allow the planner only a single intervention type. In general, workers have many options for intervening (e.g., call, text, visit), each with its own cost and patient-dependent effect. The key challenge, then, is how to act optimally (i.e., maximize patient medication adherence) while staying under budget. We extend to the little-studied multi-action restless bandit setting. Using techniques from Lagrangian relaxation and linear programming, we develop an algorithm with provable guarantees and improved best-case run-time scaling compared to existing baselines. In a simulated TB medication adherence intervention task, our algorithm produces state-of-the-art intervention policies in a fraction of the time required by baselines, taking a critical step toward long-term planning that weighs the cost-benefit tradeoffs of multiple interventions, at the real-world problem sizes we see in TB and other public health challenges.
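The core Lagrangian-relaxation idea can be sketched as follows: relax the shared budget constraint with a multiplier, let each arm independently pick its best action given that multiplier, and binary-search the multiplier until the budget binds. This is an illustrative simplification of the approach, not the paper's algorithm; the function name and all values/costs are hypothetical:

```python
def plan_actions(values, costs, budget, iters=50):
    """values[i][a], costs[i][a]: estimated value and cost of taking action a
    on arm i (action 0 = do nothing, with cost 0). Binary-search the Lagrange
    multiplier lam so that the per-arm argmax of (value - lam * cost)
    yields a joint action fitting the budget."""
    lo, hi = 0.0, max(max(v) for v in values) + 1.0
    best = None
    for _ in range(iters):
        lam = (lo + hi) / 2
        # Decoupled per-arm problem: each arm picks its best action given lam.
        acts = [max(range(len(v)), key=lambda a: v[a] - lam * c[a])
                for v, c in zip(values, costs)]
        total = sum(c[a] for c, a in zip(costs, acts))
        if total <= budget:
            best, hi = acts, lam  # feasible: try a smaller penalty
        else:
            lo = lam              # over budget: penalize cost more
    return best
```

With three patients whose interventions have values 5, 1, and 3 (unit cost each) and a budget of 2, the search settles on treating the first and third patients, matching the intuitive allocation.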


Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits (AAMAS'21)


Risk-aware planning results in fewer patients with less than 5% adherence, and more with greater than 95% adherence

This work also builds on our earlier Collapsing Bandits framework (NeurIPS'20), which proposes an algorithm that recommends, each day, which patients the health workers should intervene on. We address multiple challenges related to real-world deployment: 1) the previous algorithm may be insensitive to equitable resource allocation, meaning it may deem some patients "sub-optimal" to intervene on (e.g., less likely to respond to an intervention than another patient) and choose to ignore them completely; 2) it assumes the worker can perfectly observe the patient's adherence; and 3) it assumes the planner is risk-neutral, while real-world health workers may be risk-averse. To address these concerns, we propose a risk-aware algorithm that admits a more general, non-linear reward function, which can be suitably shaped to capture the planner's considerations, and that also accommodates imperfect observations. We also develop new theory to prove optimality guarantees for our approach. The end result is an algorithm that caters to real-world considerations, giving the planner the flexibility to specify the objectives that are meaningful in their setting.
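A toy illustration of how a monotone, non-linear reward changes planning behavior: with a linear (risk-neutral) reward, the planner simply chases the largest expected adherence gain, while a shaped reward with larger marginal gains at low belief steers attention toward the worst-off patients. The exponent and helper names below are illustrative choices, not from the paper:

```python
def linear_reward(b):
    """Risk-neutral: reward is proportional to the adherence belief."""
    return b

def risk_averse_reward(b, k=3):
    """Monotone increasing in the belief b, but with larger marginal gains
    at low belief, so lifting the worst-off patients is worth more.
    The exponent k is an illustrative shaping choice."""
    return 1 - (1 - b) ** k

def whom_to_help(beliefs, uplifts, reward):
    """Pick the patient whose expected belief gain yields the most reward gain."""
    gains = [reward(min(b + u, 1.0)) - reward(b)
             for b, u in zip(beliefs, uplifts)]
    return max(range(len(beliefs)), key=gains.__getitem__)
```

For a low-adherence patient who responds only weakly to intervention and a high-adherence patient who responds strongly, the linear planner picks the latter and the risk-averse planner picks the former, which is precisely the equity concern the shaped reward addresses.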




Hand-writing on a sign with "Zero TB death"

While individuals with symptoms of infectious disease may seek treatment themselves (passive screening), public health campaigns may fund active screening programs in which health workers seek out undiagnosed cases by canvassing at-risk individuals. Individuals at particularly high risk, namely those who have come into contact with infected individuals, may be identified through interviews with patients and encouraged to seek screening and treatment.

In an ongoing project in collaboration with the Wadhwani AI Institute in Mumbai, India, we are developing an algorithm to help inform how contact tracing might be optimized, given a limited budget for disease screening and knowledge of the contact network between individuals in a community.  Who should be screened if some individuals are confirmed to be infected while others are merely suspected to have disease?  How does this change over time, with the graph structure, and disease progression?


Contact Tracing: Planning Who and When to Screen on the Contact Network [paper] [video]

Active screening provides a powerful yet expensive means of controlling disease spread that passive screening cannot achieve, owing to the delay before patients seek care and are cured. With a limited number of valuable health workers, we need to plan their visits smartly.

contact tracing on network

In this work, we built a novel active screening model based on the network SIS model commonly used in the public health literature. The problem requires sequential planning over combinatorial actions and is at least NP-hard. We developed the algorithm REMEDY with two variants: FULL-REMEDY considers the effect of future actions and finds a policy with high solution quality, while FAST-REMEDY scales linearly in the network's size. Both leverage the information gained from passive screening along with the graph structure to provide high-quality solutions. In simulations using a real-world contact network and disease parameters, both variants outperform baselines used in common practice by a large margin. The details are described in the paper and video linked above.
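One round of a network SIS model with active screening can be sketched as follows. This is a minimal illustration under assumed parameters (infection probability beta, recovery probability gamma), not the REMEDY algorithm itself:

```python
import random

def sis_step(graph, infected, beta=0.3, gamma=0.2, screened=frozenset(),
             rng=random):
    """One round of a network SIS model with screening.

    graph: adjacency dict {node: neighbor list}. Each infected node infects
    each susceptible neighbor with probability beta and recovers (back to
    susceptible) with probability gamma; screened nodes are cured this round.
    All parameters are illustrative, not fit to real disease data."""
    new_infected = set()
    for u in infected:
        if u in screened:
            continue  # active screening cures this node
        if rng.random() >= gamma:
            new_infected.add(u)        # stays infected
        for v in graph.get(u, ()):
            if v not in infected and rng.random() < beta:
                new_infected.add(v)    # transmission along a contact edge
    return new_infected - set(screened)
```

Planning then amounts to choosing, each round, which budget-limited set of nodes to screen so that repeated applications of this step keep the infected set small.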


Contact Tracing: Scaling Up with Multi-Agent Reinforcement Learning

While the prior work provides high-quality solutions for short-term planning, it fails to scale to long time horizons while fully considering the future effects of current interventions. In this work, we address this challenge with a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN). We propose an innovative two-level multi-agent framework that solves the problem hierarchically, and we speed up RL's slow convergence by incorporating ideas from curriculum learning. The approach scales to 10 times the planning horizon of FULL-REMEDY and outperforms FAST-REMEDY by up to 33% in solution quality.



Treatable infectious diseases are a critical challenge for public health. While treatment regimens may exist, and even be offered at low cost to patients, often individuals do not recognize the need to seek treatment or delay doing so, thereby increasing transmission risk to others.  Outreach and education campaigns can encourage undiagnosed patients to seek treatment. However, such programs must be carefully targeted to appropriate demographics (for the specific disease) in resource-constrained settings.

In prior work, we developed an algorithm to optimally allocate limited outreach resources among demographic groups in the population. The algorithm uses a novel multi-agent model of disease spread which both captures the underlying population dynamics and is amenable to optimization. Our algorithm extends, with provable guarantees, to a stochastic setting where we have only a distribution over parameters such as the contact pattern between agents. We evaluate our algorithm on two instances where this distribution is inferred from real world data: tuberculosis in India and gonorrhea in the United States.

Child next to an x-ray image



While the medical literature offers a rich source of information about disease progression, screening, and treatment, the spread of disease through a population is a complex process that requires simulation to understand. We develop individual-level simulations of infectious and non-infectious disease to probabilistically infer who will acquire disease, to project population disease metrics (such as prevalence and incidence) over long time horizons, and to evaluate the effectiveness (and cost) of interventions. These simulations draw on detailed knowledge about disease progression, patient behavior, and treatment outcomes from the medical literature and datasets. In prior projects, we have developed individual-level simulations of tuberculosis, and efforts to model chronic disease in elderly persons are ongoing. Ongoing work examines how different models (with different runtimes, fidelity to the real world, and noise) can be used in combination to quickly and accurately identify optimal policies for disease control given individual heterogeneity.
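A minimal individual-level simulation can be sketched as a per-person state machine advanced in discrete time. The states and transition probabilities below are placeholders for illustration, not clinical estimates:

```python
import random

# Hypothetical per-step transition probabilities (placeholders, not clinical data).
TRANSITIONS = {
    "susceptible": [("latent", 0.02)],
    "latent":      [("active", 0.01)],
    "active":      [("treated", 0.30)],
    "treated":     [],  # absorbing state in this toy model
}

def simulate_cohort(n=1000, steps=120, rng=None):
    """Advance n individuals through the state machine for the given number
    of steps; return the fraction with active disease (point prevalence)
    at each step."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    states = ["susceptible"] * n
    prevalence = []
    for _ in range(steps):
        for i, s in enumerate(states):
            for nxt, p in TRANSITIONS[s]:
                if rng.random() < p:
                    states[i] = nxt
                    break
        prevalence.append(states.count("active") / n)
    return prevalence
```

Richer versions of this loop attach per-individual heterogeneity (age, comorbidity, behavior) and intervention effects, which is where the cost-effectiveness questions above come in.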


Aditya Mate, Andrew Perrault, and Milind Tambe. 5/7/2021. “Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits.” In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). London, UK.
Community Health Workers (CHWs) form an important component of health-care systems globally, especially in low-resource settings. CHWs are often tasked with monitoring the health of and intervening on their patient cohort. Previous work has developed several classes of Restless Multi-Armed Bandits (RMABs) that are computationally tractable and indexable, a condition that guarantees asymptotic optimality, for solving such health monitoring and intervention problems (HMIPs).
However, existing solutions to HMIPs fail to account for risk-sensitivity considerations of CHWs in the planning stage and may run the danger of ignoring some patients completely because they are deemed less valuable to intervene on.
Additionally, existing solutions rely on patients accurately reporting their state of adherence when intervened upon. Toward tackling these issues, our contributions in this paper are as follows:
(1) We develop an RMAB solution to HMIPs that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations.
(2) We prove theoretical guarantees on the asymptotic optimality of our algorithm for any arbitrary reward function. Additionally, we show that for the specific reward function considered in previous work, our theoretical conditions are stronger than the state-of-the-art guarantees.
(3) We show the applicability of these new results for addressing the three issues pertaining to: risk-sensitive planning, equitable allocation and reliance on perfect observations as highlighted above. We evaluate these techniques on both simulated as well as real data from a prevalent CHW task of monitoring adherence of tuberculosis patients to their prescribed medication in Mumbai, India and show improved performance over the state-of-the-art. The simulation code is available at:
Aditya Mate*, Jackson A. Killian*, Haifeng Xu, Andrew Perrault, and Milind Tambe. 12/5/2020. “Collapsing Bandits and Their Application to Public Health Interventions.” In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada.
We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the “good” state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which health workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either “forward” or “reverse” threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients’ adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques, while achieving similar performance.
Jackson Killian, Bryan Wilder, Amit Sharma, Vinod Choudhary, Bistra Dilkina, and Milind Tambe. 8/4/2019. “Learning to Prescribe Interventions for Tuberculosis Patients Using Digital Adherence Data.” In The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Digital Adherence Technologies (DATs) are an increasingly popular method for verifying patient adherence to many medications.
We analyze data from one city served by 99DOTS, a phone-call-based DAT deployed for Tuberculosis (TB) treatment in India, where nearly 3 million people are afflicted with the disease each year. The data contains nearly 17,000 patients and 2.1M dose records. We lay the groundwork for learning from this real-world data, including a method for avoiding the effects of unobserved interventions in training data used for machine learning. We then construct a deep learning model, demonstrate its interpretability, and show how it can be adapted and trained in three different clinical scenarios to better target and improve patient care. In the real-time risk prediction setting, our model could be used to proactively intervene with 21% more patients, and before 76% more missed doses, than current heuristic baselines. For outcome prediction, our model performs 40% better than baseline methods, allowing cities to target more resources to clinics with a heavier burden of patients at risk of failure. Finally, we present a case study demonstrating how our model can be trained in an end-to-end decision-focused learning setting to achieve 15% better solution quality in an example decision problem faced by health workers.


Contact us about being a sponsor for this project