# Publications by Year: 2023

2023
Kai Wang*, Lily Xu*, Aparna Taneja, and Milind Tambe. 2/14/2023. “Optimistic Whittle Index Policy: Online Learning for Restless Bandits.” AAAI Conference on Artificial Intelligence (AAAI).Abstract
Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear $O(H \sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions in $T$ episodes with a constant horizon $H$. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed via sampling from a real-world maternal and childcare dataset.
Kai Wang*, Shresth Verma*, Aditya Mate, Sanket Shah, Aparna Taneja, Neha Madhiwalla, Aparna Hegde, and Milind Tambe. 2/14/2023. “Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health.” AAAI Conference on Artificial Intelligence (AAAI).Abstract
This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.
Shresth Verma, Gargi Singh, Aditya Mate, Paritosh Verma, Sruthi Gorantala, Neha Madhiwalla, Aparna Hegde, Divy Thakkar, Manish Jain, Milind Tambe, and Aparna Taneja. 2/10/2023. “Increasing Impact of Mobile Health Programs: SAHELI for Maternal and ChildCare.” In Innovative Applications of Artificial Intelligence (IAAI).
Paula Rodriguez Diaz, Jackson A Killian, Lily Xu, Arun Sai Suggala, Aparna Taneja, and Milind Tambe. 2/7/2023. “Flexible Budgets in Restless Bandits:A Primal-Dual Algorithm for Efficient Budget Allocation.” In AAAI Conference on Artificial Intelligence (AAAI).
Jackson A. Killian*, Lily Xu*, Arpita Biswas*, Shresth Verma*, Vineet Nair, Aparna Taneja, Aparna Hegde, Neha Madhiwalla, Paula Rodriguez Diaz, Sonja Johnson-Yu, and Milind Tambe. 2/2023. “Robust Planning over Restless Groups: Engagement Interventions for a Large-Scale Maternal Telehealth Program.” In AAAI Conference on Artificial Intelligence.
Arpita Biswas, Jackson A. Killian, Paula Rodriguez Diaz, Susobhan Ghosh, and Milind Tambe. 2023. “Fairness forWorkers Who Pull the Arms: An Index Based Policyfor Allocation of Restless Bandit Tasks.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).Abstract

Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning of stochastic processes under resource constraints. This planning problem has previously been modeled as restless multi-armed bandits (RMAB), where each arm is an interventiondependent Markov Decision Process. However, the existing literature assumes all intervention resources belong to a single uniform pool, limiting their applicability to real-world settings where interventions are carried out by a set of workers, each with their own costs, budgets, and intervention effects. In this work, we consider a novel RMAB setting, called multi-worker restless bandits (MWRMAB) with heterogeneous workers. The goal is to plan an intervention schedule that maximizes the expected reward while satisfying budget constraints on each worker as well as fairness in terms of the load assigned to each worker. Our contributions are two-fold: (1) we provide a multi-worker extension of the Whittle index to tackle heterogeneous costs and per-worker budget and (2) we develop an index-based scheduling policy to achieve fairness. Further, we evaluate our method on various cost structures and show that our method significantly outperforms other baselines in terms of fairness without sacrificing much in reward accumulated.