Zun Li, Feiran Jia, Aditya Mate, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, and Yevgeniy Vorobeychik. 8/2022. “Solving Structured Hierarchical Games Using Differential Backward Induction.” In Conference on Uncertainty in Artificial Intelligence (UAI). Eindhoven, Netherlands.Abstract
From large-scale organizations to decentralized political systems, hierarchical strategic decision making is commonplace. We introduce a novel class of structured hierarchical games (SHGs) that formally capture such hierarchical strategic interactions. In an SHG, each player is a node in a tree, and strategic choices of players are sequenced from root to leaves, with root moving first, followed by its children, then followed by their children, and so on until the leaves. A player’s utility in an SHG depends on its own decision, and on the choices of its parent and all the tree leaves. SHGs thus generalize simultaneous-move games, as well as Stackelberg games with many followers. We leverage the structure of both the sequence of player moves as well as payoff dependence to develop a gradientbased back propagation-style algorithm, which we call Differential Backward Induction (DBI), for approximating equilibria of SHGs. We provide a sufficient condition for convergence of DBI and demonstrate its efficacy in finding approximate equilibrium solutions to several SHG models of hierarchical policy-making problems.
Jackson A. Killian, Lily Xu, Arpita Biswas, and Milind Tambe. 8/2022. “Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning.” In Uncertainty in Artificial Intelligence (UAI).Abstract
We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret--robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary "λ-network" in tandem with policy networks per arm, greatly reducing sample complexity, with guarantees on convergence. DDLPO, of general interest, implements our reward-maximizing agent oracle. We then tackle the challenging regret-maximizing nature oracle, a non-stationary RL challenge, by formulating it as a multi-agent RL problem between a policy optimizer and adversarial nature. This formulation is of general interest---we solve it for RMABs by creating a multi-agent extension of DDLPO with a shared critic. We show our approaches work well in three experimental domains.
killian_uai_2022_restless_uncertain.pdf killian_uai_2022_restless_uncertain-supp.pdf
Lily Xu, Arpita Biswas, Fei Fang, and Milind Tambe. 7/23/2022. “Ranked Prioritization of Groups in Combinatorial Bandit Allocation.” International Joint Conference on Artificial Intelligence (IJCAI) 31. Vienna, Austria. arXiv linkAbstract
Preventing poaching through ranger patrols is critical for protecting endangered wildlife. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees. When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) We propose a novel combinatorial bandit objective that trades off between reward maximization and also accounts for prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective.
E. Cranford, S. Jabbari, H-C. Ou, M. Tambe, C. Gonzalez, and C. Lebiere. 7/2022. “Combining Machine Learning and Cognitive Models for Adaptive Phishing Training.” In Internation Conference on Cognitive Modeling (ICCM).Abstract
Organizations typically use simulation campaigns to train employees to detect phishing emails but are non-personalized and fail to account for human experiential learning and adaptivity. We propose a method to improve the effectiveness of training by combining cognitive modeling with machine learning methods. We frame the problem as one of scheduling and use the restless multi-armed bandit (RMAB) framework to select which users to target for intervention at each trial, while using a cognitive model of phishing susceptibility to inform the parameters of the RMAB. We compare the effectiveness of the RMAB solution to two purely cognitive approaches in a series of simulation studies using the cognitive model as simulated participants. Both approaches show improvement compared to random selection and we highlight the pros and cons of each approach. We discuss the implications of these findings and future research that aims to combine the benefits of both methods for a more effective solution.
Combining Machine Learning and Cognitive Models for Adaptive Phishing Training
Vineet Nair, Kritika Prakash, Michael Wilbur, Aparna Taneja, Corrine Namblard, Oyindamola Adeyemo, Abhishek Dubey, Abiodun Adereni, Milind Tambe, and Ayan Mukhopadhyay. 7/2022. “ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria.” In International Joint Conference on AI (IJCAI) 7/2022. Abstract
More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations’ sustainable development goals (SDG 3) aims to end preventable deaths of new-borns and children under five years of age. We focus on Nigeria, where the rate of infant mortal-ity is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach out-performs baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and hopefully, pave the way for other data-driven programs to improve health outcomes in Nigeria.
Adam Żychowski, Jacek Mańdziuk, Elizabeth Bondi, Aravind Venugopal, Milind Tambe, and Balaraman Ravindran. 7/2022. “Evolutionary Approach to Security Games with Signaling.” International Joint Conference on AI (IJCAI) 7/2022. Publisher's Version
Aditya Mate, Arpita Biswas, Christoph Siebenbrunner, Susobhan Ghosh, and Milind Tambe. 5/2022. “Efficient Algorithms for Finite Horizon and Streaming RestlessMulti-Armed Bandit Problems.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS). streamingbandits-camready-full.pdf
Han-Ching Ou*, Christoph Siebenbrunner*, Jackson Killian, Meredith B Brooks, David Kempe, Yevgeniy Vorobeychik, and Milind Tambe. 5/2022. “Networked Restless Multi-Armed Bandits for Mobile Interventions.” In 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022). Online. aamas_2022_network_bandit.pdf
Aida Rahmattalabi. 4/26/2022. “Towards Trustworthy and Data-Driven Social Interventions.” PhD Thesis, Computer Science, University of Southern California.Abstract
This thesis examines social interventions conducted to address societal challenges such as homelessness,
substance abuse or suicide. In most of these applications, it is challenging to purposefully
collect data. Hence, we need to rely on social (e.g., social network data) or observational data (e.g.,
administrative data) to guide our decisions. Problematically, these datasets are prone to different
statistical or societal biases. When optimized and evaluated on these data, ostensibly impartial
algorithms may result in disparate impacts across different groups. In addition, these domains
are plagued by limited resources and/or limited data which create a computational challenge with
respect to improving the delivery of these interventions. In this thesis, I investigate the interplay
of fairness and these computational challenges which I present in two parts. In the first part, I
introduce the problem of fairness in social network-based interventions where I propose to use
social network data to enhance interventions that rely on individual’s social connectedness such
as HIV/suicide prevention or community preparedness against natural disasters. I demonstrate
how biases in the social network can manifest as disparate outcomes across groups and describe
my approach to mitigate such unfairness. In the second part, I focus on fairness challenges when
data is observational. Motivated by the homelessness crisis in the U.S., I study the problem of
learning fair resource allocation policies using observational data where I develop a methodology
that handles selection bias in the data. I conclude with a critique on the fairness metrics proposed
in the literature, both causal and observational (statistical), and I present a novel causal view
that addresses the shortcomings of existing approaches. In particular, my findings shed new light
on well-known impossibility results from the fair machine learning literature.
Han-Ching Ou. 3/31/2022. “Sequential Network Planning Problems for Public Health Applications.” PhD Thesis, Computer Science, Harvard University.Abstract

In the past decade, breakthroughs of Artificial Intelligence (AI) in its multiple sub-area have made new applications in various domains possible. One typical yet essential example is the public health domain. There are many challenges for humans in our never-ending battle with diseases. Among them, problems involving harnessing data with network structures and future planning, such as disease control or resource allocation, demand effective solutions significantly. However, unfortunately, some of them are too complicated or unscalable for humans to solve optimally. This thesis tackles these challenging sequential network planning problems for the public health domain by advancing the state-of-the-art to a new level of effectiveness.

In particular, My thesis provides three main contributions to overcome the emerging challenges when applying sequential network planning problems in the public health domain, namely (1) a novel sequential network-based screening/contact tracing framework under uncertainty, (2) a novel sequential network-based mobile interventions framework, (3) theoretical analysis, algorithmic solutions and empirical experiments that shows superior performance compared to previous approaches both theoretically and empirically.

More concretely, the first part of this thesis studies the active screening problem as an emerging application for disease prevention. I introduce a new approach to modeling multi-round network-based screening/contact tracing under uncertainty. Based on the well-known network SIS model in computational epidemiology, which is applicable for many diseases, I propose a model of the multi-agent active screening problem (ACTS) and prove its NP-hardness. I further proposed the REMEDY (REcurrent screening Multi-round Efficient DYnamic agent) algorithm for solving this problem. With a time and solution quality trade-off, REMEDY has two variants, Full- and Fast-REMEDY. It is a Frank-Wolfe-style gradient descent algorithm realized by compacting the representation of belief states to represent uncertainty. As shown in the experiment conducted, Full- and Fast-REMEDY are not only being superior in controlling diseases to all the previous approaches; they are also robust to varying levels of missing
information in the social graph and budget change, thus enabling
the use of our agent to improve the current practice of real-world
screening contexts.

The second part of this thesis focuses on the scalability issue for the time horizon for the ACTS problem. Although Full-REMEDY provides excellent solution qualities, it fails to scale to large time horizons while fully considering the future effect of current interventions. Thus, I proposed a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN). Due to the nature of the ACTS problem, several challenges that the traditional RL can not handle have emerged, including (1) the combinatorial nature of the problem, (2) the need for sequential planning, and (3) the uncertainties in the infectiousness states of the population. I design several innovative adaptations in my RL approach to address the above challenges. I will introduce why and how these adaptations are made in this part.

For the third part, I introduce a novel sequential network-based mobile interventions framework. It is a restless multi-armed bandits (RMABs) with network pulling effects. In the proposed model, arms are partially recharging and connected through a graph. Pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. Such network effect may arise due to regular population movements (such as commuting between home and work) for mobile intervention applications. In my thesis, I show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods. I also propose a new solution approach for the networked RMABs by exploiting concavity properties that arise under natural assumptions on the structure of intervention effects. In addition, I show the optimality of such a method in idealized settings and demonstrate that it empirically outperforms state-of-the-art baselines.

Kai Wang, Lily Xu, Andrew Perrault, Michael K. Reiter, and Milind Tambe. 2/22/2022. “Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games.” In AAAI Conference on Artificial Intelligence.Abstract
A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best response as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but make limiting assumptions on the followers' objectives and the equilibrium reached by followers, e.g., uniqueness, optimism, or pessimism. To overcome these limitations, we run gradient descent to update the leader's strategy by differentiating through the equilibrium reached by followers. Our approach generalizes to any stochastic equilibrium selection procedure that chooses from multiple equilibria, where we compute the stochastic gradient by back-propagating through a sampled Nash equilibrium using the solution to a partial differential equation to establish the unbiasedness of the stochastic gradient. Using the unbiased gradient estimate, we implement the gradient-based approach to solve three Stackelberg problems with multiple followers. Our approach consistently outperforms existing baselines to achieve higher utility for the leader.
Elizabeth Bondi, Haipeng Chen, Christopher Golden, Nikhil Behari, and Milind Tambe. 2/20/2022. “Micronutrient Deficiency Prediction via Publicly Available Satellite Data.” In Innovative Applications of Artificial Intelligence (IAAI). mnd_conference_paper_iaai_2.pdf
Susobhan Ghosh, Pradeep Varakantham, Aniket Bhatkhande, Tamanna Ahmad, Anish Andheria, Wenjun Li, Aparna Taneja, Divy Thakkar, and Milind Tambe. 2/15/2022. “Facilitating Human-Wildlife Cohabitation through Conflict Prediction.” Innovative Applications of Artificial Intelligence. iaai_wct.pdf
Haipeng Chen, Susobhan Ghosh, Gregory Fan, Nikhil Behari, Arpita Biswas, Mollie Williams, Nancy E. Oriol, and Milind Tambe. 2/15/2022. “Using Public Data to Predict Demand for Mobile Health Clinics.” In The 34th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI) . 22iaai-family-van.pdf
Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, and Milind Tambe. 2/2022. “Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health.” In AAAI Conference on Artificial Intelligence. Vancouver, Canada.Abstract
The widespread availability of cell phones has enabled nonprofits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing ∼ 30% engagement drops. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real world public health settings. We are transitioning our RMAB system to the NGO for real-world use.
Elizabeth Bondi, Haipeng Chen, Christopher D. Golden, Nikhil Behari, and Milind Tambe. 2022. “Micronutrient Deficiency Prediction via Publicly Available Satellite Data.” In IAAI. mnd_conference_paper_iaai.pdf
Elizabeth Carolyn Bondi-Kelly. 2022. “Translating AI to Impact: Uncertainty and Human-Agent Interactions in Multi-Agent Systems for Public Health and Conservation”. bondi-kelly_phd_dissertation.pdf