%0 Conference Proceedings %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2024 %T Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning %A Sanket Shah %A Arun Suggala %A Tambe, Milind %A Aparna Taneja %X

The declining participation of beneficiaries over time is a key concern in public health programs. A popular strategy for improving retention is to have health workers `intervene' on beneficiaries at risk of dropping out. However, the availability and time of these health workers are limited resources. As a result, there has been a line of research on optimizing these limited intervention resources using Restless Multi-Armed Bandits (RMABs). The key technical barrier to using this framework in practice lies in estimating the beneficiaries' RMAB parameters from historical data. Recent research on Decision-Focused Learning (DFL) has shown that estimating parameters that maximize beneficiaries' cumulative returns, rather than predictive accuracy, is essential to good performance.

Unfortunately, these gains come at a high computational cost because of the need to solve and evaluate the RMAB in each DFL training step. Consequently, past approaches may not be sustainable for the NGOs that manage such programs in the long run, given that they operate under resource constraints. In this paper, we provide a principled way to exploit the structure of RMABs to speed up DFL by decoupling intervention planning for different beneficiaries. We use real-world data from an Indian NGO, ARMMAN, to show that our approach is up to two orders of magnitude faster than the state-of-the-art approach while also yielding superior model performance. This computational efficiency gives NGOs the ability to deploy such solutions to serve potentially millions of mothers, ultimately advancing progress toward UNSDG 3.1.

%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %C Auckland, New Zealand %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2024 %T Improving Mobile Maternal and Child Health Care Programs: Collaborative Bandits for Time Slot Selection %A Soumyabrata Pal %A Tambe, Milind %A Arun Suggala %A Karthikeyan Shanmugam %A Aparna Taneja %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J JMIR Diabetes %D 2024 %T New Approach to Equitable Intervention Planning to Improve Engagement and Outcomes in a Digital Health Program: Simulation Study %A Jackson A. Killian %A Jain, Manish %A Yugang Jia %A Jonathan Amar %A Erich Huang %A Tambe, Milind %B JMIR Diabetes %V 9 %G eng %U https://diabetes.jmir.org/2024/1/e52688/ %0 Conference Paper %B Innovative Applications of Artificial Intelligence (IAAI) %D 2024 %T Improving Health Information Access in the World’s Largest Maternal Mobile Health Program via Bandit Algorithms %A Arshika Lalan %A Paula Rodriguez Diaz %A Panayiotis Danassis %A Amrita Mahale %A Kumar Madhu Sudan %A Aparna Hegde %A Tambe, Milind %A Aparna Taneja %B Innovative Applications of Artificial Intelligence (IAAI) %G eng %0 Conference Proceedings %B AAAI Conference on Artificial Intelligence (AAAI) %D 2024 %T Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize %A Sanket Shah %A Bryan Wilder %A Perrault, Andrew %A Tambe, Milind %X

Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, “How can we use the structure of a decision-making task to tailor ML models for that specific task?” To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches make restrictive assumptions about the form of these losses and their impact on ML model behavior. These assumptions lead to approaches with high computational cost and, when they are violated in practice, poor performance. In this paper, we propose solutions to these issues, avoiding the aforementioned assumptions and utilizing the ML model’s features to increase the sample efficiency of learning loss functions. We empirically show that our method achieves state-of-the-art results in four domains from the literature, often requiring an order of magnitude fewer samples than comparable methods from past work. Moreover, our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.

%B AAAI Conference on Artificial Intelligence (AAAI) %C Vancouver, BC %G eng %0 Journal Article %J AI magazine (to appear) %D 2023 %T Expanding Impact of Mobile Health Programs: SAHELI for Maternal and Child Care %A Shresth Verma %A Gargi Singh %A Mate, Aditya %A Paritosh Verma %A Sruthi Gorantla %A Neha Madhiwalla %A Aparna Hegde %A Divy Thakkar %A Jain, Manish %A Tambe, Milind %A Aparna Taneja %X Underserved communities face critical health challenges due to lack of access to timely and reliable information. Non-governmental organizations are leveraging the widespread use of cellphones to combat these healthcare challenges and spread preventative awareness. The health workers at these organizations reach out individually to beneficiaries; however, such programs still suffer from declining engagement.
We have deployed SAHELI, a system to efficiently utilize the limited availability of health workers for improving maternal and child health in India. SAHELI uses the Restless Multi-armed Bandit (RMAB) framework to identify beneficiaries for outreach. It is the first deployed application for RMABs in public health, and is already in continuous use by our partner NGO, ARMMAN. We have already reached ∼130K beneficiaries with SAHELI, and are on track to serve 1 million beneficiaries by the end of 2023. This scale and impact have been achieved through multiple innovations in the RMAB model and its development, in preparation of real-world data, and in deployment practices; and through careful consideration of responsible AI practices. Specifically, in this paper, we describe our approach to learning from past data to improve the performance of SAHELI’s RMAB model, the real-world challenges faced during deployment and adoption of SAHELI, and the end-to-end pipeline. %B AI magazine (to appear) %G eng %0 Conference Paper %B International Joint Conference on AI (IJCAI) 2023 %D 2023 %T Complex Contagion Influence Maximization: A Reinforcement Learning Approach %A Haipeng Chen %A Bryan Wilder %A Qiu, Wei %A An, Bo %A Eric Rice %A Tambe, Milind %X In influence maximization (IM), the goal is to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e.g., Independent Cascade and Linear Threshold), which assume individual influence cascade probability is independent of the number of neighbors, recent studies by sociologists show that many influence cascades follow a pattern called complex contagion (CC), where influence cascade probability is much higher when more neighbors are influenced. Nonetheless, there are very limited studies on complex contagion influence maximization (CCIM) problems.
This is partly because CC is non-submodular, the solution of which has been an open challenge. In this study, we propose the first reinforcement learning (RL) approach to CCIM. We find that a key obstacle in applying existing RL approaches to CCIM is the reward sparseness issue, which comes from two distinct sources. We then design a new RL algorithm that uses the CCIM problem structure to address the issue. Empirical results show that our approach achieves state-of-the-art performance on 9 real-world networks. %B International Joint Conference on AI (IJCAI) 2023 %G eng %0 Conference Proceedings %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2023 %T Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare %A Panayiotis Danassis %A Shresth Verma %A Jackson A. Killian %A Aparna Taneja %A Tambe, Milind %X The success of many healthcare programs depends on participants' adherence. We consider the problem of scheduling interventions in low-resource settings (e.g., placing timely support calls from health workers) to increase adherence and/or engagement. Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem. Nevertheless, all past RMAB approaches assume that the participants' behaviour follows the Markov property. We demonstrate significant deviations from the Markov assumption on real-world data from a maternal health awareness program run by our partner NGO, ARMMAN. Moreover, we extend RMABs to continuous state spaces, a previously understudied area.
To tackle the generalised non-Markovian RMAB setting, we (i) model each participant's trajectory as a time series, (ii) leverage the power of time-series forecasting models to learn complex patterns and dynamics to predict future states, and (iii) propose the Time-series Arm Ranking Index (TARI) policy, a novel algorithm that selects the RMAB arms that will benefit the most from an intervention, given our future state predictions. We evaluate our approach on both synthetic data and a secondary analysis of real data from ARMMAN, and demonstrate a significant increase in engagement compared to the state-of-the-art, deployed Whittle index solution. This translates to 16.3 hours of additional content listened to, 90.8% more engagement drops prevented, and reaching more than twice as many high dropout-risk beneficiaries. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Proceedings %B Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI) %D 2023 %T Find Rhinos without Finding Rhinos: Active Learning with Multimodal Imagery of South African Rhino Habitats %A Lucia Gordon %A Nikhil Behari %A Samuel Collier %A Elizabeth Bondi-Kelly %A Jackson A. Killian %A Catherine Ressijac %A Peter Boucher %A Andrew Davies %A Tambe, Milind %X

Much of Earth's charismatic megafauna is endangered by human activities, particularly the rhino, which is at risk of extinction due to the poaching crisis in Africa. Monitoring rhinos' movement is crucial to their protection but has unfortunately proven difficult because rhinos are elusive. Therefore, instead of tracking rhinos, we propose the novel approach of mapping communal defecation sites, called middens, which give information about rhinos' spatial behavior valuable to anti-poaching, management, and reintroduction efforts. This paper provides the first-ever mapping of rhino midden locations by building classifiers to detect them using remotely sensed thermal, RGB, and LiDAR imagery in passive and active learning settings. As existing active learning methods perform poorly due to the extreme class imbalance in our dataset, we design MultimodAL, an active learning system employing a ranking technique and multimodality to achieve competitive performance with passive learning models with 94% fewer labels. Our methods could therefore save over 76 hours in labeling time when used on a similarly sized dataset. Unexpectedly, our midden map reveals that rhino middens are not randomly distributed throughout the landscape; rather, they are clustered. Consequently, ranger patrols should be directed to areas with high midden densities to strengthen anti-poaching efforts, in line with UN Target 15.7.

%B Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI) %P 5977-5985 %G eng %U https://www.ijcai.org/proceedings/2023/0663.pdf %0 Conference Paper %B KDD workshop on Data Science for social good 2023 %D 2023 %T Analyzing and Predicting Low-Listenership Trends in a Large-Scale Mobile Health Program: A Preliminary Investigation %A Arshika Lalan %A Shresth Verma %A Kumar Madhu Sudan %A Amrita Mahale %A Aparna Hegde %A Tambe, Milind %A Aparna Taneja %B KDD workshop on Data Science for social good 2023 %G eng %0 Conference Paper %B International Conference on Machine Learning (ICML 2023) %D 2023 %T Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation %A Mate, Aditya %A Bryan Wilder %A Aparna Taneja %A Tambe, Milind %X

We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluating such allocation policies through RCTs proves difficult regardless of the scale of the trial, because the individuals’ outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is a new estimator leveraging a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means: it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semi-synthetic, as well as real case study data, and show improved estimation accuracy across the board.

%B International Conference on Machine Learning (ICML 2023) %C Honolulu, Hawaii %G eng %0 Conference Proceedings %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2023 %T AI-driven Prices for Externalities and Sustainability in Production Markets (Extended Abstract) %A Panayiotis Danassis %A Aris Filos-Ratsikas %A Haipeng Chen %A Tambe, Milind %A Boi Faltings %X Markets do not account for negative externalities; indirect costs that some participants impose on others, such as the cost of over-appropriating a common-pool resource (which diminishes future stock, and thus harvest, for everyone). Quantifying appropriate interventions to market prices has proven to be quite challenging. We propose a practical approach to computing market prices and allocations via a deep reinforcement learning policymaker agent, operating in an environment of other learning agents. Our policymaker allows us to tune the prices with regard to diverse objectives such as sustainability and resource wastefulness, fairness, buyers' and sellers' welfare, etc. As a highlight of our findings, our policymaker is significantly more successful in maintaining resource sustainability, compared to the market equilibrium outcome, in scarce resource environments. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %C London, United Kingdom %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2023 %T Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks %A Arpita Biswas %A Jackson A. Killian %A Paula Rodriguez Diaz %A Susobhan Ghosh %A Tambe, Milind %X

Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning of stochastic processes under resource constraints. This planning problem has previously been modeled as restless multi-armed bandits (RMAB), where each arm is an intervention-dependent Markov Decision Process. However, the existing literature assumes all intervention resources belong to a single uniform pool, limiting their applicability to real-world settings where interventions are carried out by a set of workers, each with their own costs, budgets, and intervention effects. In this work, we consider a novel RMAB setting, called multi-worker restless bandits (MWRMAB), with heterogeneous workers. The goal is to plan an intervention schedule that maximizes the expected reward while satisfying budget constraints on each worker as well as fairness in terms of the load assigned to each worker. Our contributions are two-fold: (1) we provide a multi-worker extension of the Whittle index to tackle heterogeneous costs and per-worker budgets, and (2) we develop an index-based scheduling policy to achieve fairness. Further, we evaluate our method on various cost structures and show that it significantly outperforms other baselines in terms of fairness without sacrificing much accumulated reward.

%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Thesis %B Harvard University %D 2023 %T Integrating Machine Learning and Optimization with Applications in Public Health and Sustainability %A Kai Wang %B Harvard University %G eng %9 PhD thesis %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2023 %T Restless Multi-Armed Bandits for Maternal and Child Health: Results from Decision-Focused Learning %A Shresth Verma %A Mate, Aditya %A Kai Wang %A Neha Madhiwala %A Aparna Hegde %A Aparna Taneja %A Tambe, Milind %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2023 %T Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits %A Abheek Ghosh %A Dheeraj Nagraj %A Jain, Manish %A Tambe, Milind %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B AAMAS 2023 workshop on Autonomous Agents for Social Good %D 2023 %T Preliminary Results in Low-Listenership Prediction in One of the Largest Mobile Health Programs in the World %A Sanket Shah %A Shresth Verma %A Amrita Mahale %A Kumar Madhu Sudan %A Aparna Hegde %A Aparna Taneja %A Tambe, Milind %B AAMAS 2023 workshop on Autonomous Agents for Social Good %G eng %U http://panosd.eu/aasg2023/papers/AASG_2023____Kilkari_Paper.pdf %0 Thesis %B Computer Science %D 2023 %T Actualizing Impact of AI in Public
Health: Optimization of Scarce Health Intervention Resources in the Real World %A Mate, Aditya %B Computer Science %G eng %9 PhD Thesis %0 Thesis %D 2023 %T Activity Allocation in an Under-Resourced World: Toward Improving Engagement with Public Health Programs via Restless Bandits %A Jackson A. Killian %G eng %9 PhD thesis %0 Journal Article %J AAAI Conference on Artificial Intelligence (AAAI) %D 2023 %T Optimistic Whittle Index Policy: Online Learning for Restless Bandits %A Kai Wang* %A Lily Xu* %A Aparna Taneja %A Tambe, Milind %X Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear $O(H \sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions in $T$ episodes with a constant horizon~$H$. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
%B AAAI Conference on Artificial Intelligence (AAAI) %G eng %U https://arxiv.org/pdf/2205.15372.pdf %0 Journal Article %J AAAI Conference on Artificial Intelligence (AAAI) %D 2023 %T Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health %A Kai Wang* %A Shresth Verma* %A Mate, Aditya %A Sanket Shah %A Aparna Taneja %A Neha Madhiwalla %A Aparna Hegde %A Tambe, Milind %X This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes. 
%B AAAI Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B Innovative Applications of Artificial Intelligence (IAAI) %D 2023 %T Increasing Impact of Mobile Health Programs: SAHELI for Maternal and Child Care %A Shresth Verma %A Gargi Singh %A Mate, Aditya %A Paritosh Verma %A Sruthi Gorantala %A Neha Madhiwalla %A Aparna Hegde %A Divy Thakkar %A Jain, Manish %A Tambe, Milind %A Aparna Taneja %B Innovative Applications of Artificial Intelligence (IAAI) %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence %D 2023 %T Robust Planning over Restless Groups: Engagement Interventions for a Large-Scale Maternal Telehealth Program %A Jackson A. Killian* %A Arpita Biswas* %A Lily Xu* %A Shresth Verma* %A Vineet Nair %A Aparna Taneja %A Aparna Hegde %A Neha Madhiwalla %A Paula Rodriguez Diaz %A Sonja Johnson-Yu %A Tambe, Milind %B AAAI Conference on Artificial Intelligence %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence (AAAI) %D 2023 %T Flexible Budgets in Restless Bandits: A Primal-Dual Algorithm for Efficient Budget Allocation %A Paula Rodriguez Diaz %A Jackson A Killian %A Lily Xu %A Arun Sai Suggala %A Aparna Taneja %A Tambe, Milind %B AAAI Conference on Artificial Intelligence (AAAI) %G eng %0 Generic %D 2023 %T Adherence Bandits %A Jackson A.
Killian %A Arshika Lalan %A Mate, Aditya %A Jain, Manish %A Aparna Taneja %A Tambe, Milind %B AAAI AI4SG-23 Workshop %G eng %0 Conference Paper %B AAAI 2023 workshop on AI for Social Good (AI4SG) %D 2023 %T Decision-Focused Evaluation: Analyzing Performance of Deployed Restless Multi-Arm Bandits %A Paritosh Verma %A Shresth Verma %A Mate, Aditya %A Aparna Taneja %A Tambe, Milind %B AAAI 2023 workshop on AI for Social Good (AI4SG) %G eng %0 Conference Paper %B AAAI 2023 workshop on AI for Social Good (AI4SG) %D 2023 %T SAHELI for Mobile Health Programs in Maternal and Child Care: Further Analysis %A Shresth Verma %A Gargi Singh %A Mate, Aditya %A Neha Madhiwalla %A Aparna Hegde %A Divy Thakkar %A Jain, Manish %A Tambe, Milind %A Aparna Taneja %B AAAI 2023 workshop on AI for Social Good (AI4SG) %G eng %0 Conference Paper %B Hawaii International Conference on System Sciences %D 2023 %T Accounting for Uncertainty in Deceptive Signaling for Cybersecurity %A E. Cranford %A H. Ou %A Gonzalez, C. %A M. Tambe %A C.
Lebiere %B Hawaii International Conference on System Sciences %G eng %0 Journal Article %J AI magazine (to appear) %D 2023 %T Predicting Micronutrient Deficiency with Publicly Available Satellite Data %A Elizabeth Bondi-Kelly %A Haipeng Chen %A Golden, Christopher %A Nikhil Behari %A Tambe, Milind %B AI magazine (to appear) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security %D 2023 %T Characterizing and Improving the Robustness of Predict-Then-Optimize Frameworks %A Sonja Johnson-Yu %A Jessie Finocchiaro %A Kai Wang %A Vorobeychik, Yevgeniy %A Sinha, Arunesh %A Aparna Taneja %A Tambe, Milind %B Conference on Decision and Game Theory for Security %I Springer %C Avignon %G eng %0 Conference Paper %B NeurIPS 2022 workshop on Trustworthy and Socially Responsible Machine Learning %D 2022 %T Case Study: Applying Decision-Focused Learning in the Real World %A Shresth Verma %A Mate, Aditya %A Kai Wang %A Aparna Taneja %A Tambe, Milind %B NeurIPS 2022 workshop on Trustworthy and Socially Responsible Machine Learning %G eng %0 Conference Paper %B Conference on Neural Information Processing Systems (NeurIPS) %D 2022 %T Decision-Focused Learning without Differentiable Optimization: Learning Locally Optimized Decision Losses %A Sanket Shah %A Kai Wang %A Bryan Wilder %A Perrault, Andrew %A Tambe, Milind %X Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better on that specific task. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past work has largely gotten around this issue by handcrafting task-specific surrogates to the original optimization problem that provide informative gradients when differentiated through.
However, the need to handcraft surrogates for each new task limits the usability of DFL. In addition, there are often no guarantees about the convexity of the resulting surrogates and, as a result, training a predictive model using them can lead to inferior local optima. In this paper, we do away with surrogates altogether and instead learn loss functions that capture task-specific information. To the best of our knowledge, ours is the first approach that entirely replaces the optimization component of decision-focused learning with a loss that is automatically learned. Our approach (a) only requires access to a black-box oracle that can solve the optimization problem and is thus generalizable, and (b) can be convex by construction and so can be easily optimized over. We evaluate our approach on three resource allocation problems from the literature and find that it outperforms both learning that ignores task structure and hand-crafted surrogates, in all three domains. %B Conference on Neural Information Processing Systems (NeurIPS) %C New Orleans %V 36 %G eng %0 Journal Article %J INFORMS Doing Good with Good OR %D 2022 %T Optimization and Planning of Limited Resources for Assisting Non-Profits in Improving Maternal and Child Health %A Mate, Aditya %X

The maternal mortality rate in India is appalling, largely fueled by lack of access to preventive care information, especially in low-resource households. We partner with the non-profit ARMMAN, which aims to use mobile health technologies to improve maternal and child health outcomes.


To assist ARMMAN and similar non-profits, we develop a Restless Multi-Armed Bandit (RMAB) based solution to help improve the accessibility of critical health information via increased engagement of beneficiaries with the program. We address fundamental research challenges that crop up along the way and present technical advances in RMABs and planning algorithms for limited-resource allocation. Transcending the boundaries of typical laboratory research, we also deploy our models in the field, and present results from a first-of-its-kind pilot test employing and evaluating RMABs in a real-world public health application.

%B INFORMS Doing Good with Good OR %G eng %0 Conference Paper %B Conference on Uncertainty in Artificial Intelligence (UAI) %D 2022 %T Solving Structured Hierarchical Games Using Differential Backward Induction %A Zun Li %A Feiran Jia %A Mate, Aditya %A Shahin Jabbari %A Mithun Chakraborty %A Tambe, Milind %A Vorobeychik, Yevgeniy %X From large-scale organizations to decentralized political systems, hierarchical strategic decision making is commonplace. We introduce a novel class of structured hierarchical games (SHGs) that formally capture such hierarchical strategic interactions. In an SHG, each player is a node in a tree, and strategic choices of players are sequenced from root to leaves, with root moving first, followed by its children, then followed by their children, and so on until the leaves. A player’s utility in an SHG depends on its own decision, and on the choices of its parent and all the tree leaves. SHGs thus generalize simultaneous-move games, as well as Stackelberg games with many followers. We leverage the structure of both the sequence of player moves as well as payoff dependence to develop a gradientbased back propagation-style algorithm, which we call Differential Backward Induction (DBI), for approximating equilibria of SHGs. We provide a sufficient condition for convergence of DBI and demonstrate its efficacy in finding approximate equilibrium solutions to several SHG models of hierarchical policy-making problems. %B Conference on Uncertainty in Artificial Intelligence (UAI) %C Eindhoven, Netherlands %G eng %0 Conference Paper %B Uncertainty in Artificial Intelligence (UAI) %D 2022 %T Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning %A Jackson A. Killian %A Lily Xu %A Arpita Biswas %A Tambe, Milind %X We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). 
Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret-robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary "λ-network" in tandem with policy networks per arm, greatly reducing sample complexity, with guarantees on convergence. DDLPO, of general interest, implements our reward-maximizing agent oracle. We then tackle the challenging regret-maximizing nature oracle, a non-stationary RL challenge, by formulating it as a multi-agent RL problem between a policy optimizer and adversarial nature. This formulation is of general interest; we solve it for RMABs by creating a multi-agent extension of DDLPO with a shared critic. We show our approaches work well in three experimental domains. %B Uncertainty in Artificial Intelligence (UAI) %G eng %0 Conference Proceedings %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2022 %T Ranked Prioritization of Groups in Combinatorial Bandit Allocation %A Lily Xu %A Arpita Biswas %A Fang, Fei %A Tambe, Milind %X Preventing poaching through ranger patrols is critical for protecting endangered wildlife. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to differing degrees.
When some species are more vulnerable, we ought to offer more protection to these animals; unfortunately, existing combinatorial bandit approaches do not offer a way to prioritize important species. To bridge this gap, (1) we propose a novel combinatorial bandit objective that trades off reward maximization against prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm to select combinatorial actions that optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically that RankedCUCB leads to up to 38% improvement in outcomes for endangered species using real-world wildlife conservation data. Along with adapting to other challenges such as preventing illegal logging and overfishing, our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective. %B International Joint Conference on Artificial Intelligence (IJCAI) %C Vienna, Austria %V 31 %G eng %U https://arxiv.org/pdf/2205.05659.pdf %0 Conference Paper %B International Conference on Cognitive Modeling (ICCM) %D 2022 %T Combining Machine Learning and Cognitive Models for Adaptive Phishing Training %A E. Cranford %A S. Jabbari %A H-C. Ou %A M. Tambe %A Gonzalez, C. %A C. Lebiere %X Organizations typically use simulation campaigns to train employees to detect phishing emails, but these campaigns are non-personalized and fail to account for human experiential learning and adaptivity. We propose a method to improve the effectiveness of training by combining cognitive modeling with machine learning methods. 
We frame the problem as one of scheduling and use the restless multi-armed bandit (RMAB) framework to select which users to target for intervention at each trial, while using a cognitive model of phishing susceptibility to inform the parameters of the RMAB. We compare the effectiveness of the RMAB solution to two purely cognitive approaches in a series of simulation studies using the cognitive model as simulated participants. Both approaches show improvement compared to random selection, and we highlight the pros and cons of each approach. We discuss the implications of these findings and future research that aims to combine the benefits of both methods for a more effective solution. %B International Conference on Cognitive Modeling (ICCM) %G eng %0 Conference Paper %B International Joint Conference on AI (IJCAI) 2022 %D 2022 %T ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria %A Vineet Nair %A Kritika Prakash %A Michael Wilbur %A Aparna Taneja %A Corrine Namblard %A Oyindamola Adeyemo %A Abhishek Dubey %A Abiodun Adereni %A Tambe, Milind %A Mukhopadhyay, Ayan %X More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations’ sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria, to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. 
Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and, hopefully, pave the way for other data-driven programs to improve health outcomes in Nigeria. %B International Joint Conference on AI (IJCAI) 2022 %G eng %0 Conference Proceedings %B International Joint Conference on AI (IJCAI) 2022 %D 2022 %T Evolutionary Approach to Security Games with Signaling %A Adam Żychowski %A Jacek Mańdziuk %A Bondi, Elizabeth %A Aravind Venugopal %A Tambe, Milind %A Balaraman Ravindran %B International Joint Conference on AI (IJCAI) 2022 %G eng %U https://arxiv.org/abs/2204.14173 %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2022 %T Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems %A Mate, Aditya %A Arpita Biswas %A Christoph Siebenbrunner %A Susobhan Ghosh %A Tambe, Milind %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022) %D 2022 %T Networked Restless Multi-Armed Bandits for Mobile Interventions %A Han-Ching Ou* %A Christoph Siebenbrunner* %A Jackson Killian %A Meredith B Brooks %A David Kempe %A Vorobeychik, Yevgeniy %A Tambe, Milind %B 21st International 
Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022) %C Online %G eng %0 Thesis %B PhD Thesis, Computer Science, University of Southern California %D 2022 %T Towards Trustworthy and Data-Driven Social Interventions %A Rahmattalabi, Aida %X This thesis examines social interventions conducted to address societal challenges such as homelessness,
substance abuse or suicide. In most of these applications, it is challenging to purposefully
collect data. Hence, we need to rely on social (e.g., social network data) or observational data (e.g.,
administrative data) to guide our decisions. Problematically, these datasets are prone to different
statistical or societal biases. When optimized and evaluated on these data, ostensibly impartial
algorithms may result in disparate impacts across different groups. In addition, these domains
are plagued by limited resources and/or limited data which create a computational challenge with
respect to improving the delivery of these interventions. In this thesis, I investigate the interplay
of fairness and these computational challenges which I present in two parts. In the first part, I
introduce the problem of fairness in social network-based interventions where I propose to use
social network data to enhance interventions that rely on individuals’ social connectedness, such
as HIV/suicide prevention or community preparedness against natural disasters. I demonstrate
how biases in the social network can manifest as disparate outcomes across groups and describe
my approach to mitigate such unfairness. In the second part, I focus on fairness challenges when
data is observational. Motivated by the homelessness crisis in the U.S., I study the problem of
learning fair resource allocation policies using observational data where I develop a methodology
that handles selection bias in the data. I conclude with a critique of the fairness metrics proposed
in the literature, both causal and observational (statistical), and I present a novel causal view
that addresses the shortcomings of existing approaches. In particular, my findings shed new light
on well-known impossibility results from the fair machine learning literature. %B PhD Thesis, Computer Science, University of Southern California %G eng %9 PhD thesis %0 Thesis %B PhD Thesis, Computer Science, Harvard University %D 2022 %T Sequential Network Planning Problems for Public Health Applications %A Ou, Han-Ching %X

In the past decade, breakthroughs across multiple sub-areas of Artificial Intelligence (AI) have made new applications in various domains possible. One typical yet essential example is the public health domain. Humanity faces many challenges in its never-ending battle with disease. Among them, problems that involve harnessing data with network structure and planning for the future, such as disease control or resource allocation, urgently demand effective solutions. Unfortunately, some of these problems are too complicated or too large for humans to solve optimally. This thesis tackles these challenging sequential network planning problems in the public health domain by advancing the state of the art to a new level of effectiveness.

In particular, my thesis provides three main contributions that address emerging challenges in applying sequential network planning to the public health domain: (1) a novel sequential network-based screening/contact-tracing framework under uncertainty, (2) a novel sequential network-based mobile intervention framework, and (3) theoretical analysis, algorithmic solutions, and empirical experiments demonstrating superior performance over previous approaches.

More concretely, the first part of this thesis studies the active screening problem as an emerging application for disease prevention. I introduce a new approach to modeling multi-round network-based screening/contact tracing under uncertainty. Based on the well-known network SIS model in computational epidemiology, which is applicable to many diseases, I propose a model of the multi-agent active screening problem (ACTS) and prove its NP-hardness. I further propose the REMEDY (REcurrent screening Multi-round Efficient DYnamic agent) algorithm for solving this problem. Trading off runtime against solution quality, REMEDY has two variants, Full- and Fast-REMEDY; both are Frank-Wolfe-style gradient descent algorithms that compact the representation of belief states to capture uncertainty. In our experiments, Full- and Fast-REMEDY are not only superior to all previous approaches in controlling disease; they are also robust to varying levels of missing information in the social graph and to budget changes, enabling our agent to improve the current practice of real-world screening.

The second part of this thesis focuses on scaling the ACTS problem to long time horizons. Although Full-REMEDY provides excellent solution quality, it fails to scale to large time horizons while fully considering the future effects of current interventions. I therefore propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQNs). The ACTS problem poses several challenges that traditional RL cannot handle, including (1) the combinatorial nature of the problem, (2) the need for sequential planning, and (3) uncertainty in the infectiousness states of the population. I design several innovative adaptations of my RL approach to address these challenges, and in this part I explain why and how each adaptation is made.

For the third part, I introduce a novel sequential network-based mobile intervention framework: a restless multi-armed bandit (RMAB) with network effects from pulling arms. In the proposed model, arms are partially recharging and connected through a graph. Pulling one arm also improves the state of neighboring arms, significantly extending the previously studied setting of fully recharging bandits with no network effects. Such network effects may arise in mobile intervention applications due to regular population movements (such as commuting between home and work). In my thesis, I show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods. I also propose a new solution approach for networked RMABs that exploits concavity properties arising under natural assumptions on the structure of intervention effects. In addition, I show the optimality of this method in idealized settings and demonstrate that it empirically outperforms state-of-the-art baselines.

%B PhD Thesis, Computer Science, Harvard University %G eng %9 PhD thesis %0 Conference Paper %B AAAI Conference on Artificial Intelligence %D 2022 %T Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games %A Kai Wang %A Lily Xu %A Perrault, Andrew %A Michael K. Reiter %A Tambe, Milind %X A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best response as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but make limiting assumptions on the followers' objectives and the equilibrium reached by followers, e.g., uniqueness, optimism, or pessimism. To overcome these limitations, we run gradient descent to update the leader's strategy by differentiating through the equilibrium reached by followers. Our approach generalizes to any stochastic equilibrium selection procedure that chooses from multiple equilibria, where we compute the stochastic gradient by back-propagating through a sampled Nash equilibrium using the solution to a partial differential equation to establish the unbiasedness of the stochastic gradient. Using the unbiased gradient estimate, we implement the gradient-based approach to solve three Stackelberg problems with multiple followers. Our approach consistently outperforms existing baselines to achieve higher utility for the leader. 
%B AAAI Conference on Artificial Intelligence %G eng %0 Conference Paper %B Innovative Applications of Artificial Intelligence (IAAI) %D 2022 %T Micronutrient Deficiency Prediction via Publicly Available Satellite Data %A Bondi, Elizabeth %A Haipeng Chen %A Golden, Christopher %A Nikhil Behari %A Tambe, Milind %B Innovative Applications of Artificial Intelligence (IAAI) %G eng %0 Journal Article %J Innovative Applications of Artificial Intelligence %D 2022 %T Facilitating Human-Wildlife Cohabitation through Conflict Prediction %A Susobhan Ghosh %A Varakantham, Pradeep %A Aniket Bhatkhande %A Tamanna Ahmad %A Anish Andheria %A Wenjun Li %A Aparna Taneja %A Divy Thakkar %A Tambe, Milind %B Innovative Applications of Artificial Intelligence %G eng %0 Conference Paper %B The 34th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI) %D 2022 %T Using Public Data to Predict Demand for Mobile Health Clinics %A Haipeng Chen %A Susobhan Ghosh %A Gregory Fan %A Nikhil Behari %A Arpita Biswas %A Mollie Williams %A Nancy E. Oriol %A Tambe, Milind %B The 34th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI) %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence %D 2022 %T Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health %A Aditya Mate* %A Lovish Madaan* %A Aparna Taneja %A Neha Madhiwalla %A Shresth Verma %A Gargi Singh %A Aparna Hegde %A Varakantham, Pradeep %A Tambe, Milind %X The widespread availability of cell phones has enabled nonprofits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. 
Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by ∼30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our RMAB system to the NGO for real-world use. %B AAAI Conference on Artificial Intelligence %C Vancouver, Canada %G eng %0 Conference Paper %B IAAI %D 2022 %T Micronutrient Deficiency Prediction via Publicly Available Satellite Data %A Bondi, Elizabeth %A Haipeng Chen %A Golden, Christopher D. %A Nikhil Behari %A Tambe, Milind %B IAAI %G eng %0 Thesis %D 2022 %T Translating AI to Impact: Uncertainty and Human-Agent Interactions in Multi-Agent Systems for Public Health and Conservation %A Elizabeth Carolyn Bondi-Kelly %G eng %9 PhD thesis %0 Conference Paper %B In MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop %D 2021 %T Demand prediction of mobile clinics using public data %A Haipeng Chen %A Susobhan Ghosh %A Gregory Fan %A Nikhil Behari %A Arpita Biswas %A Mollie Williams %A Nancy E. 
Oriol %A Tambe, Milind %B In MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop %G eng %0 Conference Paper %B NeurIPS 2021 (spotlight) %D 2021 %T Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning %A Kai Wang %A Sanket Shah %A Haipeng Chen %A Perrault, Andrew %A Doshi-Velez, Finale %A Tambe, Milind %X In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of \emph{sequential} decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. 
We implement both Bellman--based and policy gradient--based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks. %B NeurIPS 2021 (spotlight) %G eng %0 Conference Paper %B MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop %D 2021 %T Restless Bandits in the Field: Real-World Study for Improving Maternal and Child Health Outcomes %A Aditya Mate* %A Lovish Madaan* %A Aparna Taneja %A Neha Madhiwalla %A Shresth Verma %A Gargi Singh %A Aparna Hegde %A Varakantham, Pradeep %A Tambe, Milind %X

The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work in assisting non-profits employing automated messaging programs to deliver timely preventive care information to new and expecting mothers during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries tend to drop out. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by 30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our system to the NGO for real-world use.

%B MLPH: Machine Learning in Public Health NeurIPS 2021 Workshop %G eng %0 Journal Article %J INFORMS Doing Good with Good OR %D 2021 %T Learning, Optimization, and Planning Under Uncertainty for Wildlife Conservation %A Lily Xu %X

Wildlife poaching fuels the multi-billion dollar illegal wildlife trade and pushes countless species to the brink of extinction. To aid rangers in preventing poaching in protected areas around the world, we have developed PAWS, the Protection Assistant for Wildlife Security. We present technical advances in multi-armed bandits and robust sequential decision-making using reinforcement learning, with research questions that emerged from on-the-ground challenges. We also discuss bridging the gap between research and practice, presenting results from field deployment in Cambodia and large-scale deployment through integration with SMART, the leading software system for protected area management used by over 1,000 wildlife parks worldwide.

%B INFORMS Doing Good with Good OR %G eng %0 Conference Paper %B ACM conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO '21) %D 2021 %T Measuring Data Collection Diligence for Community Healthcare %A Ramesha Karunasena %A Mohammad Sarparajul Ambiya %A Sinha, Arunesh %A Ruchit Nagar %A Saachi Dalal %A Divy Thakkar %A Dhyanesh Narayanan %A Tambe, Milind %B ACM conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO '21) %G eng %0 Conference Proceedings %B 30th International Joint Conference on Artificial Intelligence (IJCAI) %D 2021 %T Learning and Planning Under Uncertainty for Green Security %A Lily Xu %B 30th International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2021 %T Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare %A Arpita Biswas %A Gaurav Aggarwal %A Varakantham, Pradeep %A Tambe, Milind %X In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive, and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. 
Moreover, since the transition probabilities are unknown a priori, we propose a Whittle-index-based Q-Learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from the literature and also on the maternal healthcare dataset. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Journal Article %J Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining %D 2021 %T Q-Learning Lagrange Policies for Multi-Action Restless Bandits %A Jackson A Killian %A Arpita Biswas %A Sanket Shah %A Tambe, Milind %B Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining %G eng %0 Conference Proceedings %B Conference on Uncertainty in Artificial Intelligence (UAI) %D 2021 %T Robust Reinforcement Learning Under Minimax Regret for Green Security %A Lily Xu %A Perrault, Andrew %A Fang, Fei %A Haipeng Chen %A Tambe, Milind %X Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature, who controls the parameter values of the adversarial behavior, and design an algorithm MIRROR to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data. 
%B Conference on Uncertainty in Artificial Intelligence (UAI) %G eng %0 Conference Paper %B Conference on Uncertainty in Artificial Intelligence %D 2021 %T Contingency-Aware Influence Maximization: A Reinforcement Learning Approach %A Haipeng Chen %A Qiu, Wei %A Ou, Han-Ching %A An, Bo %A Tambe, Milind %B Conference on Uncertainty in Artificial Intelligence %G eng %0 Thesis %B PhD Thesis, Computer Science, Harvard University %D 2021 %T AI for Population Health: Melding Data and Algorithms on Networks %A Bryan Wilder %B PhD Thesis, Computer Science, Harvard University %G eng %0 Journal Article %J Cognitive Science %D 2021 %T Towards a Cognitive Theory of Cyber Deception %A Cranford, Edward %A Gonzalez, Cleotilde %A Palvi Aggarwal %A Tambe, Milind %A Cooney, Sarah %A Lebiere, Christian %B Cognitive Science %G eng %U https://onlinelibrary.wiley.com/doi/10.1111/cogs.13013 %0 Conference Proceedings %B Proceedings of the 38th International Conference on Machine Learning %D 2021 %T Towards the Unification and Robustness of Perturbation and Gradient Based Explanations %A Sushant Agarwal %A Shahin Jabbari %A Chirag Agarwal %A Sohini Upadhyay %A Zhiwei Steven Wu %A Himabindu Lakkaraju %X As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad, which is a gradient-based method, and a variant of LIME, which is a perturbation-based method. More specifically, we derive explicit closed-form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. 
We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real world datasets. %B Proceedings of the 38th International Conference on Machine Learning %C Virtual Only %G eng %U https://arxiv.org/abs/2102.10618 %0 Generic %D 2021 %T Preliminary Detection of Rhino Middens for Understanding Rhino Behavior %A Bondi, Elizabeth %A Catherine Ressijac %A Peter Boucher %B CVPR 2021 Workshop on Computer Vision for Animal Behavior Tracking and Modeling %G eng %0 Journal Article %J to appear in the Journal of Acquired Immune Deficiency Syndrome (JAIDS) %D 2021 %T A Peer-Led, Artificial Intelligence-Augmented Social Network Intervention to Prevent HIV among Youth Experiencing Homelessness %A Eric Rice %A Bryan Wilder %A Onasch-Vera, Laura %A Graham Diguiseppi %A Petering, Robin %A Chyna Hill %A Amulya Yadav %A Sung-Jae Lee %A Tambe, Milind %X Youth experiencing homelessness (YEH) are at elevated risk for HIV/AIDS and disproportionately identify as racial, ethnic, sexual, and gender minorities. We developed a new peer change agent (PCA) HIV prevention intervention with three arms: (1) an arm using an Artificial Intelligence (AI) planning algorithm to select PCAs; (2) a popularity arm, the standard PCA approach, operationalized as highest degree centrality (DC); and (3) an observation-only comparison group. 
%B to appear in the Journal of Acquired Immune Deficiency Syndrome (JAIDS) %G eng %0 Conference Paper %B AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) %D 2021 %T Ensuring Fairness under Prior Probability Shifts %A Arpita Biswas %A Suvam Mukherjee %B AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) %I AAAI/ACM %8 5/19/2021 %G eng %U https://arxiv.org/abs/2005.03474 %0 Conference Paper %B AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) %D 2021 %T Envisioning Communities: A Participatory Approach Towards AI for Social Good %A Elizabeth Bondi* %A Lily Xu* %A Diana Acosta-Navas %A Jackson A. Killian %X Research in artificial intelligence (AI) for social good presupposes some definition of social good, but potential definitions have been seldom suggested and never agreed upon. The normative question of what AI for social good research should be "for" is not thoughtfully elaborated, or is frequently addressed with a utilitarian outlook that prioritizes the needs of the majority over those who have been historically marginalized, brushing aside realities of injustice and inequity. We argue that AI for social good ought to be assessed by the communities that the AI system will impact, using as a guide the capabilities approach, a framework to measure the ability of different policies to improve human welfare equity. Furthermore, we lay out how AI research has the potential to catalyze social progress by expanding and equalizing capabilities. We show how the capabilities approach aligns with a participatory approach for the design and implementation of AI for social good research in a framework we introduce called PACT, in which community members affected should be brought in as partners and their input prioritized throughout the project. 
We conclude by providing an incomplete set of guiding questions for carrying out such participatory AI research in a way that elicits and respects a community's own definition of social good. %B AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) %G eng %U https://arxiv.org/abs/2105.01774 %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2021 %T Learning Index Policies for Restless Bandits with Application to Maternal Healthcare (Extended abstract) %A Arpita Biswas %A Gaurav Aggarwal %A Varakantham, Pradeep %A Tambe, Milind %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2021 %T Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits %A Mate, Aditya %A Perrault, Andrew %A Tambe, Milind %X Community Health Workers (CHWs) form an important component of health-care systems globally, especially in low-resource settings. CHWs are often tasked with monitoring the health of and intervening on their patient cohort. Previous work has developed several classes of Restless Multi-Armed Bandits (RMABs) that are computationally tractable and indexable, a condition that guarantees asymptotic optimality, for solving such health monitoring and intervention problems (HMIPs).
However, existing solutions to HMIPs fail to account for the risk-sensitivity considerations of CHWs in the planning stage and may ignore some patients entirely because they are deemed less valuable to intervene on.
Additionally, these solutions rely on patients accurately reporting their state of adherence when intervened upon. To tackle these issues, our contributions in this paper are as follows: 
(1) We develop an RMAB solution to HMIPs that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations.
(2) We prove theoretical guarantees on the asymptotic optimality of our algorithm for any arbitrary reward function. Additionally, we show that for the specific reward function considered in previous work, our theoretical conditions are stronger than the state-of-the-art guarantees.
(3) We show the applicability of these new results for addressing the three issues highlighted above: risk-sensitive planning, equitable allocation, and reliance on perfect observations. We evaluate these techniques on both simulated and real data from a prevalent CHW task of monitoring adherence of tuberculosis patients to their prescribed medication in Mumbai, India, and show improved performance over the state-of-the-art. The simulation code is available at: https://github.com/AdityaMate/risk-aware-bandits. %B 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %C London, UK %G eng %0 Conference Paper %B 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2021 %T Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty %A Aravind Venugopal %A Bondi, Elizabeth %A Harshavardhan Kamarthi %A Keval Dholakia %A Balaraman Ravindran %A Tambe, Milind %B 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Proceedings %B AAMAS Workshop on Autonomous Agents for Social Good %D 2021 %T Robustness in Green Security: Minimax Regret Optimality with Reinforcement Learning %A Lily Xu %A Perrault, Andrew %A Fang, Fei %A Haipeng Chen %A Tambe, Milind %B AAMAS Workshop on Autonomous Agents for Social Good %G eng %0 Conference Paper %B AAMAS workshop on Autonomous Agents for social good %D 2021 %T Selective Intervention Planning using Restless Multi-Armed Bandits to Improve Maternal and Child Health Outcomes %A Siddhart Nisthala %A Lovish Madaan %A Mate, Aditya %A Harshavardhan Kamarthi %A Anirudh Grama %A Divy Thakkar %A Dhyanesh Narayanan %A Suresh Chaudhary %A Neha Madhiwala %A Ramesh Padhmanabhan %A Aparna Hegde %A Varakantham, Pradeep %A Balaram Ravindran %A Tambe, Milind %B AAMAS workshop on Autonomous Agents for social good %G eng %0 Journal Article %J 2nd International (Virtual) Workshop on Autonomous 
Agents for Social Good (AASG 2021) %D 2021 %T A Game-Theoretic Approach for Hierarchical Policy-Making %A Feiran Jia %A Mate, Aditya %A Zun Li %A Shahin Jabbari %A Mithun Chakraborty %A Tambe, Milind %A Michael Wellman %A Vorobeychik, Yevgeniy %X We present the design and analysis of a multi-level game-theoretic model of hierarchical policy-making, inspired by policy responses to the COVID-19 pandemic. Our model captures the potentially mismatched priorities among a hierarchy of policy-makers (e.g., federal, state, and local governments) with respect to two main cost components that have opposite dependence on the policy strength, such as post-intervention infection rates and the cost of policy implementation. Our model further includes a crucial third factor in decisions: a cost of non-compliance with the policy-maker immediately above in the hierarchy, such as non-compliance of state with federal policies. Our first contribution is a closed-form approximation of a recently published agent-based model to compute the number of infections for any implemented policy. Second, we present a novel equilibrium selection criterion that addresses common issues with equilibrium multiplicity in our setting. Third, we propose a hierarchical algorithm based on best response dynamics for computing an approximate equilibrium of the hierarchical policy-making game consistent with our solution concept. Finally, we present an empirical investigation of equilibrium policy strategies in this game as a function of game parameters, such as the degree of centralization and disagreements about policy priorities among the agents, the extent of free riding as well as fairness in the distribution of costs. 
%B 2nd International (Virtual) Workshop on Autonomous Agents for Social Good (AASG 2021) %G eng %U https://teamcore.seas.harvard.edu/files/teamcore/files/aasg_2021_paper_9.pdf %0 Conference Paper %B 20th International Conference on Autonomous Agents and Multiagent Systems %D 2021 %T Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits %A Jackson A Killian %A Perrault, Andrew %A Tambe, Milind %B 20th International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Proceedings %B In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) as a Short Paper) %D 2021 %T Cohorting to isolate asymptomatic spreaders: An agent-based simulation study on the Mumbai Suburban Railway %A Alok Talekar %A Sharad Shriram %A Nidhin Vaidhiyan %A Gaurav Aggarwal %A Jiangzhuo Chen %A Srini Venkatramanan %A Lijing Wang %A Aniruddha Adiga %A Adam Sadilek %A Ashish Tendulkar %A Madhav Marathe %A Rajesh Sundaresan %A Tambe, Milind %B In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) as a Short Paper) %G eng %U https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7781320/ %0 Conference Paper %B AAMAS Autonomous Agents for Social Good Workshop (AASG) %D 2021 %T Learning Opportunistic Adversarial Model on Global Wildlife Trade %A Kai Wang %A Jeffrey Brantingham %A Tambe, Milind %X Global illegal wildlife trade threatens biodiversity and acts as a potential crisis of invasive species and disease spread. Despite a wide range of national and international policies and regulations designed to stop illegal wildlife trade, high profit margins and increasing demand drive a vigorous global illicit trade network. In this paper, we aim to build an adversarial model to predict the future wildlife trade based on the historical trade data. We hypothesize that the majority of illegal wildlife trade is opportunistic crime, which is highly correlated to legal wildlife trade. 
We can therefore leverage the abundant legal wildlife trade data to forecast the future wildlife trade, where a fixed fraction of trade volume will reflect the opportunistic wildlife trade volume. To learn a legal wildlife trade model, we propose to use graph neural networks and meta-learning to handle the network and species dependencies, respectively. Lastly, we suggest incorporating agent-based models on top of our model to study the evolution from opportunistic to more organized illegal wildlife trade behavior. %B AAMAS Autonomous Agents for Social Good Workshop (AASG) %G eng %0 Conference Paper %B Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) %D 2021 %T Dual-Mandate Patrols: Multi-Armed Bandits for Green Security %A Lily Xu %A Bondi, Elizabeth %A Fang, Fei %A Perrault, Andrew %A Kai Wang %A Tambe, Milind %X Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. 
We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia. %B Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21) %G eng %0 Conference Proceedings %B AAAI Conference on Artificial Intelligence 2021 %D 2021 %T Fair Influence Maximization: A Welfare Optimization Approach %A Aida Rahmattalabi * %A Shahin Jabbari * %A Himabindu Lakkaraju %A Vayanos, Phebe %A Max Izenberg %A Ryan Brown %A Eric Rice %A Tambe, Milind %X Several behavioral, social, and public health interventions, such as suicide/HIV prevention or community preparedness against natural disasters, leverage social network information to maximize outreach. Algorithmic influence maximization techniques have been proposed to aid with the choice of “peer leaders” or “influencers” in such interventions. Yet, traditional algorithms for influence maximization have not been designed with these interventions in mind. As a result, they may disproportionately exclude minority communities from the benefits of the intervention. This has motivated research on fair influence maximization. Existing techniques come with two major drawbacks. First, they require committing to a single fairness measure. Second, these measures are typically imposed as strict constraints leading to undesirable properties such as wastage of resources. To address these shortcomings, we provide a principled characterization of the properties that a fair influence maximization algorithm should satisfy. In particular, we propose a framework based on social welfare theory, wherein the cardinal utilities derived by each community are aggregated using the isoelastic social welfare functions. Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion design parameter. We then show under what circumstances our proposed principles can be satisfied by a welfare function. 
The resulting optimization problem is monotone and submodular and can be solved efficiently with optimality guarantees. Our framework encompasses as special cases leximin and proportional fairness. Extensive experiments on synthetic and real world datasets including a case study on landslide risk management demonstrate the efficacy of the proposed framework. %B AAAI Conference on Artificial Intelligence 2021 %G eng %0 Conference Paper %B IJCAI 2020 AI for Social Good workshop %D 2021 %T Beyond “To Act or Not to Act”: Fast Lagrangian Approaches to General Multi-Action Restless Bandits (Short version) %A Jackson A Killian %A Perrault, Andrew %A Tambe, Milind %B IJCAI 2020 AI for Social Good workshop %G eng %0 Conference Paper %B In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). London, UK. %D 2021 %T Active Screening for Recurrent Diseases: A Reinforcement Learning Approach %A Ou, Han-Ching %A Haipeng Chen %A Shahin Jabbari %A Tambe, Milind %X

Active screening is a common approach to controlling the spread of recurring infectious diseases such as tuberculosis and influenza. In this approach, health workers periodically select a subset of the population for screening. However, given the limited number of health workers, only a small subset of the population can be visited in any given time period. Given the recurrent nature of the disease and its rapid spread, the goal is to minimize the number of infections over a long time horizon.
Active screening can be formalized as a sequential combinatorial optimization over the network of people and their connections.
The main computational challenges in this formalization arise from i) the combinatorial nature of the problem, ii) the need for sequential planning, and iii) the uncertainties in the infectiousness states of the population.


Previous works on active screening fail to scale to long time horizons while fully considering the future effect of current interventions. In this paper, we propose a novel reinforcement learning (RL) approach based on Deep Q-Networks (DQN), with several innovative adaptations designed to address the above challenges. First, we use graph convolutional networks (GCNs) to represent the Q-function, exploiting the node correlations of the underlying contact network. Second, to avoid solving a combinatorial optimization problem in each time period, we decompose the node-set selection into a sub-sequence of decisions, and further design a two-level RL framework that solves the problem hierarchically.
Finally, to speed up the slow convergence of RL that arises from reward sparseness, we incorporate ideas from curriculum learning into our hierarchical RL approach. We evaluate our RL algorithm on several real-world networks. Results show that, in terms of planning time horizon, our RL algorithm can scale to 10 times the problem size handled by the state-of-the-art variant that considers the effect of future interventions but does not scale. Meanwhile, it outperforms the state-of-the-art variant that scales up but ignores the effect of future interventions by up to 33% in solution quality.

%B In 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). London, UK. %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence %D 2021 %T Clinical trial of an AI-augmented intervention for HIV prevention in youth experiencing homelessness %A Bryan Wilder %A Onasch-Vera, Laura %A Graham Diguiseppi %A Petering, Robin %A Chyna Hill %A Amulya Yadav %A Eric Rice %A Tambe, Milind %X

Youth experiencing homelessness (YEH) are subject to substantially greater risk of HIV infection, compounded both by their lack of access to stable housing and the disproportionate representation of youth of marginalized racial, ethnic, and gender identity groups among YEH. A key goal for health equity is to improve adoption of protective behaviors in this population. One promising strategy for intervention is to recruit peer leaders from the population of YEH to promote behaviors such as condom usage and regular HIV testing to their social contacts. This raises a computational question: which youth should be selected as peer leaders to maximize the overall impact of the intervention? We developed an artificial intelligence system to optimize such social network interventions in a community health setting. We conducted a clinical trial enrolling 713 YEH at drop-in centers in a large US city. The clinical trial compared interventions planned with the algorithm to those where the highest-degree nodes in the youths' social network were recruited as peer leaders (the standard method in public health) and to an observation-only control group. Results from the clinical trial show that youth in the AI group experience statistically significant reductions in key risk behaviors for HIV transmission, while those in the other groups do not. This provides, to our knowledge, the first empirical validation of the usage of AI methods to optimize social network interventions for health. We conclude by discussing lessons learned over the course of the project which may inform future attempts to use AI in community-level interventions.

%B AAAI Conference on Artificial Intelligence %G eng %0 Generic %D 2021 %T Satellite-Based Food Market Detection for Micronutrient Deficiency Prediction %A Nikhil Behari %A Bondi, Elizabeth %A Golden, Christopher D. %A Hervet J. Randriamady %A Tambe, Milind %B 5th International Workshop on Health Intelligence (W3PHIAI-21) at AAAI %G eng %0 Generic %D 2021 %T Space, Time, and Counts: Improved Human vs Animal Detection in Thermal Infrared Drone Videos for Prevention of Wildlife Poaching %A Puri, Anika %A Bondi, Elizabeth %V Fragile Earth (FEED) Workshop at KDD 2021 %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence %D 2021 %T Tracking disease outbreaks from sparse data with Bayesian inference %A Bryan Wilder %A Michael Mina %A Tambe, Milind %X

The COVID-19 pandemic provides new motivation for a classic problem in epidemiology: estimating the empirical rate of transmission during an outbreak (formally, the time-varying reproduction number) from case counts. While standard methods exist, they work best at coarse-grained national or state scales with abundant data, and struggle to accommodate the partial observability and sparse data common at finer scales (e.g., individual schools or towns). For example, case counts may be sparse when only a small fraction of infections are caught by a testing program. Or, whether an infected individual tests positive may depend on the kind of test and the point in time when they are tested. We propose a Bayesian framework which accommodates partial observability in a principled manner. Our model places a Gaussian process prior over the unknown reproduction number at each time step and models observations sampled from the distribution of a specific testing program. For example, our framework can accommodate a variety of kinds of tests (viral RNA, antibody, antigen, etc.) and sampling schemes (e.g., longitudinal or cross-sectional screening). Inference in this framework is complicated by the presence of tens or hundreds of thousands of discrete latent variables. To address this challenge, we propose an efficient stochastic variational inference method which relies on a novel gradient estimator for the variational objective. Experimental results for an example motivated by COVID-19 show that our method produces an accurate and well-calibrated posterior, while standard methods for estimating the reproduction number can fail badly.

%B AAAI Conference on Artificial Intelligence %G eng %0 Generic %D 2020 %T Enhancing Poaching Predictions for Under-Resourced Wildlife Conservation Parks Using Remote Sensing Imagery %A Rachel Guo %A Lily Xu %A Drew Cronin %A Francis Okeke %A Plumptre, Andrew %A Tambe, Milind %G eng %U https://arxiv.org/abs/2011.10666 %0 Conference Paper %B NeuriPS2020 workshop on AI for Earth Sciences %D 2020 %T Efficient Reservoir Management throughDeep Reinforcement Learning %A Xinrun Wang %A Tarun Nair %A Haoyang Li %A Rueben Wong %A Nachiket Kelkar %A Srinivas Vaidyanathan %A Rajat Nayak %A An, Bo %A Jagdish Krishnaswamy %A Tambe, Milind %B NeuriPS2020 workshop on AI for Earth Sciences %G eng %0 Conference Paper %B NeurIPS 2020 (spotlight) %D 2020 %T Automatically Learning Compact Quality-aware Surrogates for Optimization Problems %A Kai Wang %A Bryan Wilder %A Perrault, Andrew %A Tambe, Milind %X Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may also sometimes fail to improve solution quality due to  non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. 
By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task and a convex portfolio optimization task. %B NeurIPS 2020 (spotlight) %C Vancouver, Canada %G eng %0 Conference Paper %B Advances in Neural and Information Processing Systems (NeurIPS) 2020 %D 2020 %T Collapsing Bandits and Their Application to Public Health Interventions %A Aditya Mate* %A Jackson A. Killian* %A Haifeng Xu %A Perrault, Andrew %A Tambe, Milind %X We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus “collapsing” any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the “good” state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which health workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either “forward” or “reverse” threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed form. 
(iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients’ adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques, while achieving similar performance. %B Advances in Neural and Information Processing Systems (NeurIPS) 2020 %C Vancouver, Canada %G eng %U https://arxiv.org/pdf/2007.04432.pdf %0 Journal Article %J NeurIPS 2020 Workshops: Challenges of Real World Reinforcement Learning, Machine Learning in Public Health (Best Lightning Paper), Machine Learning for Health (Best on Theme), Machine Learning for the Developing World %D 2020 %T Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation %A Aviva Prins %A Mate, Aditya %A Jackson A Killian %A Rediet Abebe %A Tambe, Milind %B NeurIPS 2020 Workshops: Challenges of Real World Reinforcement Learning, Machine Learning in Public Health (Best Lightning Paper), Machine Learning for Health (Best on Theme), Machine Learning for the Developing World %G eng %0 Conference Paper %B AAAI Fall Symposium %D 2020 %T Robust Lock-Down Optimization for COVID-19 Policy Guidance %A Ankit Bhardwaj* %A Ou*, Han Ching %A Haipeng Chen %A Shahin Jabbari %A Tambe, Milind %A Rahul Panicker %A Alpan Raval %B AAAI Fall Symposium %G eng %0 Journal Article %J Science Advances %D 2020 %T Test sensitivity is secondary to frequency and turnaround time for COVID-19 screening %A Daniel B. Larremore %A Bryan Wilder %A Lester, Evan %A Shehata, Soraya %A James M. Burke %A James A. Hay %A Tambe, Milind %A Michael J. Mina %A Parker, Roy %X The COVID-19 pandemic has created a public health crisis. 
Because SARS-CoV-2 can spread from individuals with pre-symptomatic, symptomatic, and asymptomatic infections, the re-opening of societies and the control of virus spread will be facilitated by robust population screening, for which virus testing will often be central. After infection, individuals undergo a period of incubation during which viral titers are usually too low to detect, followed by an exponential viral growth, leading to a peak viral load and infectiousness, and ending with declining viral levels and clearance. Given the pattern of viral load kinetics, we model the effectiveness of repeated population screening considering test sensitivities, frequency, and sample-to-answer reporting time. These results demonstrate that effective screening depends largely on frequency of testing and the speed of reporting, and is only marginally improved by high test sensitivity. We therefore conclude that screening should prioritize accessibility, frequency, and sample-to-answer time; analytical limits of detection should be secondary. %B Science Advances %G eng %U https://advances.sciencemag.org/content/advances/early/2020/11/20/sciadv.abd5393.1.full.pdf %0 Conference Proceedings %B Conference on Decision and Game Theory for Security %D 2020 %T Exploiting Bounded Rationality in Risk-based Cyber Camouflage Games %A Thakoor, Omkar %A Shahin Jabbari %A Palvi Aggarwal %A Gonzalez, Cleotilde %A Tambe, Milind %A Vayanos, Phebe %X Recent works have growingly shown that Cyber deception can effectively impede the reconnaissance efforts of intelligent cyber attackers. Recently proposed models to optimize a deceptive defense based on camouflaging network and system attributes, have shown effective numerical results on simulated data. 
However, these models possess a fundamental drawback in assuming that an attempted attack is always successful; as a direct consequence of the deceptive strategies being deployed, the attacker in fact runs a significant risk that the attack fails. Further, this risk or uncertainty in the rewards magnifies the boundedly rational behavior of humans, which the previous models do not handle. To that end, we present Risk-based Cyber Camouflage Games, a general-sum game model that captures the uncertainty in the attack's success. In the case of rational attackers, we show that optimal defender strategy computation is NP-hard even in the zero-sum case. We provide an MILP formulation for the general problem with constraints on cost and feasibility, along with a pseudo-polynomial time algorithm for the special unconstrained setting. Second, for risk-averse attackers, we present a solution based on prospect-theoretic modeling, along with a robust variant that minimizes regret. Third, we propose a solution that does not rely on an attacker behavior model or past data and is effective for the broad setting of strictly competitive games, where previous solutions against bounded rationality prove ineffective. Finally, we provide numerical results showing that our solutions effectively lower the defender's loss. %B Conference on Decision and Game Theory for Security %G eng %0 Conference Paper %B 64th Human Factors and Ergonomics Society (HFES) Annual Conference %D 2020 %T An Exploratory Study of a Masking Strategy of Cyberdeception Using CyberVAN %A Palvi Aggarwal %A Thakoor, Omkar %A Mate, Aditya %A Tambe, Milind %A Edward A. Cranford %A Lebiere, Christian %A Gonzalez, Cleotilde %X During the network reconnaissance process, attackers scan the network to gather information before launching an attack. This is a good chance for defenders to use deception and disrupt the attacker’s learning process. 
In this paper, we present an exploratory experiment to test the effectiveness of an optimal masking strategy (compared to a random masking strategy) in reducing the utility of attackers. A total of 30 human participants (in the role of attackers) were randomly assigned to one of two experimental conditions: Optimal or Random (15 in each condition). Attackers appeared to be more successful in launching attacks in the optimal condition than in the random condition, but their total score did not differ between the two masking strategies. Most importantly, we found a generalized tendency to act according to the certainty bias (or risk aversion). These observations will help to improve the current state-of-the-art masking algorithms of cyberdefense. %B 64th Human Factors and Ergonomics Society (HFES) Annual Conference %G eng %0 Generic %D 2020 %T Personalized Adherence Management in TB: Using AI to Schedule Targeted Interventions %A Jackson A. Killian %A Mate, Aditya %A Haifeng Xu %A Perrault, Andrew %A Tambe, Milind %B 51st Union World Conference on Lung Health - Accepted Oral Abstract %G eng %0 Journal Article %J Proceedings of the National Academy of Sciences %D 2020 %T Modeling between-population variation in COVID-19 dynamics in Hubei, Lombardy, and New York City %A Bryan Wilder %A Marie Charpignon %A Jackson A Killian %A Ou, Han-Ching %A Mate, Aditya %A Shahin Jabbari %A Perrault, Andrew %A Angel Desai %A Tambe, Milind %A Maimuna S. Majumder %B Proceedings of the National Academy of Sciences %G eng %U https://www.pnas.org/content/early/2020/09/23/2010651117 %0 Journal Article %J Journal of the Society for Social Work and Research %D 2020 %T Getting to the root of the problem: A decision-tree analysis for suicide risk among young people experiencing homelessness %A Anthony Fulgianti %A Avi Segal %A Jennifer Wilson %A Chyna Hill %A Tambe, Milind %A Carl Castro %A Eric Rice %X Objective: The assessment and prediction of suicide risk among young people experiencing
homelessness (YEH) has proven difficult. Although a large number of suicide risk factors have
been identified, there is limited guidance about their relative importance and the combinations of
factors (i.e., profiles) that heighten risk. Method: Using survey and social network methods, we
gathered information about 940 YEH and their relationships. We then used a machine learning
approach to construct Classification and Regression Tree models to predict suicidal ideation and
suicide attempts. Results: Thirteen variables were important correlates in the decision tree
models. These included prominent individual risk factors (e.g., trauma, depression), but over half
of them were social network factors (e.g., hard drug use). For suicidal ideation, the model had an
area under the receiver operating characteristic curve (AUC) value of 0.79, with Accuracy of
68%, Sensitivity of 48%, and Specificity of 73%. For suicide attempt, the model had an AUC
value of 0.86, with Accuracy of 71%, Sensitivity of 68%, and Specificity of 72%. Conclusions:
Effective suicide prevention programming should target the syndemic that threatens YEH (i.e.,
co-occurrence of trauma-depression-substance use-violence), including social norms in their
environments. With refinement, our decision trees may be useful aids for suicide risk screening
and guiding targeted intervention. %B Journal of the Society for Social Work and Research %V (to appear) %G eng %0 Conference Paper %B KDD 2020 Workshop on Humanitarian Mapping %D 2020 %T Evaluating COVID-19 Lockdown and Business-Sector-Specific Reopening Policies for Three US States %A Jackson A. Killian %A Marie Charpignon %A Bryan Wilder %A Perrault, Andrew %A Tambe, Milind %A Maimuna S. Majumder %X Background: The United States has been particularly hard-hit by COVID-19, accounting for approximately 30% of all global cases and deaths from the disease that have been reported as of May 20, 2020. We extended our agent-based model for COVID-19 transmission to study the effect of alternative lockdown and reopening policies on disease dynamics in Georgia, Florida, and Mississippi. Specifically, for each state we simulated the spread of the disease had the state enforced its lockdown approximately one week earlier than it did. We also simulated Georgia's reopening plan under various levels of physical distancing if enacted in each state, making projections until June 15, 2020.

Methods: We used an agent-based SEIR model that uses population-specific age distribution, household structure, contact patterns, and comorbidity rates to perform tailored simulations for each region. The model was first calibrated to each state using publicly available COVID-19 death data as of April 23, and then used to simulate the given lockdown or reopening policies.

Results: Our model estimated that imposing lockdowns one week earlier could have resulted in hundreds fewer COVID-19-related deaths in the context of all three states. These estimates quantify the effect of early action, a key metric to weigh in developing prospective policies to combat a potential second wave of infection in each of these states. Further, when simulating Georgia’s plan to reopen select businesses as of April 27, our model found that a reopening policy that includes physical distancing to ensure no more than 25% of pre-lockdown contact rates at reopened businesses could allow limited economic activity to resume in any of the three states, while also eventually flattening the curve of COVID-19-related deaths by June 15, 2020. %B KDD 2020 Workshop on Humanitarian Mapping %G eng %U https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3598744 %0 Generic %D 2020 %T Evaluating COVID-19 Lockdown Policies for India: A Preliminary Modeling Assessment for Individual States %A Mate, Aditya %A Jackson A. Killian %A Bryan Wilder %A Marie Charpignon %A Ananya Awasthi %A Tambe, Milind %A Maimuna S. Majumder %X Background: On March 24, India ordered a 3-week nationwide lockdown in an effort to control the spread of COVID-19. While the lockdown has been effective, our model suggests that completely ending the lockdown after three weeks could have considerable adverse public health ramifications. We extend our individual-level model for COVID-19 transmission [1] to study the disease dynamics in India at the state level for Maharashtra and Uttar Pradesh to estimate the effect of further lockdown policies in each region. Specifically, we test policies which alternate between total lockdown and simple physical distancing to find "middle ground" policies that can provide social and economic relief as well as salutary population-level health effects. 
Methods: We use an agent-based SEIR model that uses population-specific age distribution, household structure, contact patterns, and comorbidity rates to perform tailored simulations for each region. The model is first calibrated to each region using publicly available COVID-19 death data, then implemented to simulate a range of policies. We also compute the basic reproduction number R0 and case documentation rate for both regions. Results: After the initial lockdown, our simulations demonstrate that even policies that enforce strict physical distancing while returning to normal activity could lead to widespread outbreaks in both states. However, "middle ground" policies that alternate weekly between total lockdown and physical distancing may lead to much lower rates of infection while simultaneously permitting some return to normalcy. %B KDD 2020 Workshop on Humanitarian Mapping %G eng %0 Journal Article %J Topics in Cognitive Science %D 2020 %T Toward Personalized Deceptive Signaling for CyberDefense Using Cognitive Models %A Edward A. Cranford %A Gonzalez, Cleotilde %A Palvi Aggarwal %A Cooney, Sarah %A Tambe, Milind %A Lebiere, Christian %X Recent research in cybersecurity has begun to develop active defense strategies using game-theoretic optimization of the allocation of limited defenses combined with deceptive signaling. These algorithms assume rational human behavior. However, human behavior in an online game designed to simulate an insider attack scenario shows that humans, playing the role of attackers, attack far more often than predicted under perfect rationality. We describe an instance-based learning cognitive model, built in ACT-R, that accurately predicts human performance and biases in the game. To improve defenses, we propose an adaptive method of signaling that uses the cognitive model to trace an individual’s experience in real time. We discuss the results and implications of this adaptive signaling method for personalized defense. 
%B Topics in Cognitive Science %V 12 %P 992-1011 %G eng %U http://dx.doi.org/10.1111/tops.12513 %N 3 %0 Conference Paper %B Harvard CRCS workshop on AI for Social Good %D 2020 %T Missed calls, Automated Calls and Health Support: Using AI to improve maternalhealth outcomes by increasing program engagement %A Siddharth Nishtala %A Harshavardhan Kamarthi %A Divy Thakkar %A Dhyanesh Narayanan %A Anirudh Grama %A Aparna Hegde %A Ramesh Padmanabhan %A Neha Madhiwala %A Suresh Chaudhary %A Balaram Ravindran %A Tambe, Milind %B Harvard CRCS workshop on AI for Social Good %G eng %0 Conference Proceedings %B Harvard CRCS Workshop on AI for Social Good %D 2020 %T Game Theory on the Ground: The Effect of Increased Patrols on Deterring Poachers %A Lily Xu %A Perrault, Andrew %A Plumptre, Andrew %A Driciru, Margaret %A Wanyama, Fred %A Rwetsiba, Aggrey %A Tambe, Milind %X Applications of artificial intelligence for wildlife protection have focused on learning models of poacher behavior based on historical patterns. However, poachers' behaviors are described not only by their historical preferences, but also their reaction to ranger patrols. Past work applying machine learning and game theory to combat poaching have hypothesized that ranger patrols deter poachers, but have been unable to find evidence to identify how or even if deterrence occurs. Here for the first time, we demonstrate a measurable deterrence effect on real-world poaching data. We show that increased patrols in one region deter poaching in the next timestep, but poachers then move to neighboring regions. Our findings offer guidance on how adversaries should be modeled in realistic game-theoretic settings. 
%B Harvard CRCS Workshop on AI for Social Good %G eng %U http://arxiv.org/abs/2006.12411 %0 Journal Article %J Social Networks %D 2020 %T Modeling the dynamism of HIV information diffusion in multiplex networks of homeless youth %A Lindsay Young %A Jerome Mayaud %A Suen, Sze-Chuan %A Tambe, Milind %A Eric Rice %B Social Networks %V 63 %P 112-121 %G eng %0 Journal Article %J medRxiv %D 2020 %T Surveillance testing of SARS-CoV-2 %A Larremore, Daniel B %A Bryan Wilder %A Lester, Evan %A Shehata, Soraya %A Burke, James M %A Hay, James A %A Tambe, Milind %A Mina, Michael J %A Parker, Roy %X The COVID-19 pandemic has created a public health crisis. Because SARS-CoV-2 can spread from individuals with pre-symptomatic, symptomatic, and asymptomatic infections, the re-opening of societies and the control of virus spread will be facilitated by robust surveillance, for which virus testing will often be central. After infection, individuals undergo a period of incubation during which viral titers are usually too low to detect, followed by an exponential growth of virus, leading to a peak viral load and infectiousness, and ending with declining viral levels and clearance. Given the pattern of viral load kinetics, we model surveillance effectiveness considering test sensitivities, frequency, and sample-to-answer reporting time. These results demonstrate that effective surveillance, including time to first detection and outbreak control, depends largely on frequency of testing and the speed of reporting, and is only marginally improved by high test sensitivity. We therefore conclude that surveillance should prioritize accessibility, frequency, and sample-to-answer time; analytical limits of detection should be secondary. 
%B medRxiv %I Cold Spring Harbor Laboratory Press %G eng %U https://www.medrxiv.org/content/early/2020/06/25/2020.06.22.20136309 %R 10.1101/2020.06.22.20136309 %0 Conference Paper %B AAAI 2020 Workshop on Health Intelligence, preliminary version %D 2020 %T Fair Influence Maximization: A Welfare Optimization Approach %A Rahmattalabi, Aida %A Shahin Jabbari %A Himabindu Lakkaraju %A Vayanos, Phebe %A Eric Rice %A Tambe, Milind %X Several social interventions (e.g., suicide and HIV prevention) leverage social network information to maximize outreach. Algorithmic influence maximization techniques have been proposed to aid with the choice of “influencers” (often referred to as “peer leaders”) in such interventions. Traditional algorithms for influence maximization have not been designed with social interventions in mind. As a result, they may disproportionately exclude minority communities from the benefits of the intervention. This has motivated research on fair influence maximization. Existing techniques require committing to a single domain-specific fairness measure. This makes it hard for a decision maker to meaningfully compare these notions and their resulting trade-offs across different applications. We address these shortcomings by extending the principles of cardinal welfare to the influence maximization setting, which is underlain by complex connections between members of different communities. We generalize the theory regarding these principles and show under what circumstances these principles can be satisfied by a welfare function. We then propose a family of welfare functions that are governed by a single inequity aversion parameter which allows a decision maker to study task-dependent trade-offs between fairness and total influence and effectively trade off quantities like influence gap by varying this parameter. We use these welfare functions as a fairness notion to rule out undesirable allocations. 
We show that the resulting optimization problem is monotone and submodular and can be solved with optimality guarantees. Finally, we carry out a detailed experimental analysis on synthetic and real social networks and show that high welfare can be achieved without sacrificing the total influence significantly. Interestingly, we can show that there exist welfare functions that empirically satisfy all of the principles. %B AAAI 2020 Workshop on Health Intelligence, preliminary version %G eng %0 Generic %D 2020 %T Interplay of global multi-scale human mobility, social distancing, government interventions, and COVID-19 dynamics %A Aniruddha Adiga %A Lijing Wang %A Adam Sadilek %A Ashish Tendulkar %A Srinivasan Venkatramanan %A Anil Vullikanti %A Gaurav Aggarwal %A Alok Talekar %A Xue Ben %A Jiangzhuo Chen %A Bryan Lewis %A Samarth Swarup %A Tambe, Milind %A Madhav Marathe %G eng %U https://www.medrxiv.org/content/10.1101/2020.06.05.20123760v3 %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-20) %D 2020 %T Who and When to Screen: Multi-Round Active Screening for Network Recurrent Infectious Diseases Under Uncertainty %A Ou, Han-Ching %A Sinha, Arunesh %A Suen, Sze-Chuan %A Perrault, Andrew %A Alpan Raval %A Tambe, Milind %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-20) %G eng %0 Conference Paper %B IEEE International Conference on Data Engineering (ICDE-20) %D 2020 %T Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations %A Lily Xu* %A Gholami*, Shahrzad %A Mc Carthy, Sara %A Dilkina, Bistra %A Plumptre, Andrew %A Tambe, Milind %A Singh, Rohit %A Mustapha Nsubuga %A Mabonga, Joshua %A Driciru, Margaret %A Wanyama, Fred %A Rwetsiba, Aggrey %A Tom Okello %A Eric Enyel %X Illegal wildlife poaching threatens ecosystems and drives endangered species toward extinction.
However, efforts for wildlife protection are constrained by the limited resources of law enforcement agencies. To help combat poaching, the Protection Assistant for Wildlife Security (PAWS) is a machine learning pipeline that has been developed as a data-driven approach to identify areas at high risk of poaching throughout protected areas and compute optimal patrol routes. In this paper, we take an end-to-end approach to the data-to-deployment pipeline for anti-poaching. In doing so, we address challenges including extreme class imbalance (up to 1:200), bias, and uncertainty in wildlife poaching data to enhance PAWS, and we apply our methodology to three national parks with diverse characteristics. (i) We use Gaussian processes to quantify predictive uncertainty, which we exploit to improve robustness of our prescribed patrols and increase detection of snares by an average of 30%. We evaluate our approach on real-world historical poaching data from Murchison Falls and Queen Elizabeth National Parks in Uganda and, for the first time, Srepok Wildlife Sanctuary in Cambodia. (ii) We present the results of large-scale field tests conducted in Murchison Falls and Srepok Wildlife Sanctuary which confirm that the predictive power of PAWS extends promisingly to multiple parks. This paper is part of an effort to expand PAWS to 800 parks around the world through integration with SMART conservation software.  %B IEEE International Conference on Data Engineering (ICDE-20) %G eng %0 Unpublished Work %D 2020 %T Evaluating COVID-19 Lockdown Policies For India: A Preliminary Modeling Assessment for Individual States %A Mate, Aditya %A Jackson A. Killian %A Bryan Wilder %A Marie Charpignon %A Ananya Awasthi %A Tambe, Milind %A Maimuna S. Majumder %X Background: On March 24, India ordered a 3-week nationwide lockdown in an effort to control the spread of COVID-19. 
While the lockdown has been effective, our model suggests that completely ending the lockdown after three weeks could have considerable adverse public health ramifications. We extend our individual-level model for COVID-19 transmission [1] to study the disease dynamics in India at the state level for Maharashtra and Uttar Pradesh to estimate the effect of further lockdown policies in each region. Specifically, we test policies which alternate between total lockdown and simple physical distancing to find "middle ground" policies that can provide social and economic relief as well as salutary population-level health effects.

Methods: We use an agent-based SEIR model that uses population-specific age distribution, household structure, contact patterns, and comorbidity rates to perform tailored simulations for each region. The model is first calibrated to each region using publicly available COVID-19 death data, then implemented to simulate a range of policies. We also compute the basic reproduction number R0 and case documentation rate for both regions.

Results: After the initial lockdown, our simulations demonstrate that even policies that enforce strict physical distancing while returning to normal activity could lead to widespread outbreaks in both states. However, "middle ground" policies that alternate weekly between total lockdown and physical distancing may lead to much lower rates of infection while simultaneously permitting some return to normalcy. %B SSRN %G eng %U https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3575207 %0 Conference Paper %B AAAI Health Intelligence Workshop %D 2020 %T Fairness in Time-Critical Influence Maximization with Applications to Public Health Preventative Interventions %A Aida Rahmattalabi* %A Shahin Jabbari* %A Vayanos, Phebe %A Himabindu Lakkaraju %A Tambe, Milind %B AAAI Health Intelligence Workshop %G eng %0 Conference Paper %B 53rd Hawaii International Conference on System Sciences %D 2020 %T Adaptive Cyber Deception: Cognitively Informed Signaling for Cyber Defense %A Edward A. Cranford %A Palvi Aggarwal %A Gonzalez, Cleotilde %A Cooney, Sarah %A Tambe, Milind %A Lebiere, Christian %B 53rd Hawaii International Conference on System Sciences %P 1885-1894 %G eng %0 Magazine Article %D 2020 %T AI for Social Impact: Learning and Planning in the Data-to-Deployment Pipeline %A Perrault, Andrew %A Fang, Fei %A Sinha, Arunesh %A Tambe, Milind %B AI Magazine %G eng %0 Conference Paper %B Doctoral Consortium at International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %D 2020 %T Balance Between Scalability and Optimality in Network Security Games %A Kai Wang %X Network security games (NSGs) are widely used in security-related domains to model the interaction between the attacker and the defender. However, due to the complex graph structure of the entire network, finding a Nash equilibrium even when the attacker is fully rational is not well-studied yet. No efficient algorithms with valid guarantees are known.
We identify two major issues of NSGs: i) non-linearity and ii) correlation between edges. NSGs with a non-linear objective function are usually hard to optimize, while correlated edges might create exponentially many strategies and impact the scalability. In this paper, we analyze the distortion of linear and non-linear formulations of NSGs with a fully rational attacker. We provide theoretical bounds on these different formulations, which can quantify the approximation ratio between the linear and non-linear assumptions. This result can help us understand how much loss the linearization will incur in exchange for scalability. %B Doctoral Consortium at International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %G eng %0 Conference Paper %B WACV %D 2020 %T BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos %A Bondi, Elizabeth %A Raghav Jain %A Palash Aggrawal %A Saket Anand %A Hannaford, Robert %A Kapoor, Ashish %A Piavis, Jim %A Shah, Shital %A Joppa, Lucas %A Dilkina, Bistra %A Tambe, Milind %B WACV %G eng %0 Conference Paper %B ICML 2020 Workshop on Human Interpretability in Machine Learning, preliminary version %D 2020 %T An Empirical Study of the Trade-Offs Between Interpretability and Fairness %A Shahin Jabbari %A Ou, Han-Ching %A Himabindu Lakkaraju %A Tambe, Milind %X As machine learning models are increasingly being deployed in critical domains such as criminal justice and healthcare, there has been a growing interest in developing algorithms that are interpretable and fair. While there has been a lot of research on each of these topics in isolation, there has been little work on their intersection. In this paper, we present an empirical study for understanding the relationship between model interpretability and fairness.
To this end, we propose a novel evaluation framework and outline appropriate evaluation metrics to determine this relationship across various classes of models in both synthetic and real world datasets. %B ICML 2020 Workshop on Human Interpretability in Machine Learning, preliminary version %G eng %0 Conference Proceedings %B AAAI Conference on Artificial Intelligence %D 2020 %T End-to-End Game-Focused Learning of Adversary Behavior in Security Games %A Perrault, Andrew %A Bryan Wilder %A Ewing, Eric %A Mate, Aditya %A Dilkina, Bistra %A Tambe, Milind %B AAAI Conference on Artificial Intelligence %C New York %G eng %0 Generic %D 2020 %T In the Shadow of Disaster: Finding Shadows to Improve Damage Detection %A Ilkin Bayramli %A Bondi, Elizabeth %A Tambe, Milind %B Harvard CRCS Workshop on AI for Social Good %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems %D 2020 %T Influence maximization in unknown social networks: Learning Policies for Effective Graph Sampling %A Harshavardhan Kamarthi %A Priyesh Vijayan %A Bryan Wilder %A Balaraman Ravindran %A Tambe, Milind %X
A serious challenge when finding influential actors in real-world social networks is the lack of knowledge about the structure of the underlying network. Current state-of-the-art methods rely on hand-crafted sampling algorithms; these methods sample nodes and their neighbours in a carefully constructed order and choose opinion leaders from this discovered network to maximize influence spread in the (unknown) complete network. In this work, we propose a reinforcement learning framework for network discovery that automatically learns useful node and graph representations that encode important structural properties of the network. At training time, the method identifies portions of the network such that the nodes selected from this sampled subgraph can effectively influence nodes in the complete network. These transferable, network-structure-aware policies are made possible by the careful design of the framework, which encodes relevant node and graph signatures driven by an appropriate reward scheme. We experiment with real-world social networks from four different domains and show that the policies learned by our RL agent provide a 10-36% improvement over the current state-of-the-art method.
%B International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Generic %D 2020 %T Mapping for Public Health: Initial Plan for Using Satellite Imagery for Micronutrient Deficiency Prediction %A Bondi, Elizabeth %A Perrault, Andrew %A Fang, Fei %A Benjamin L. Rice %A Golden, Christopher D. %A Tambe, Milind %B KDD 2020 Workshop on Humanitarian Mapping %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence %D 2020 %T MIPaaL: Mixed Integer Program as a Layer %A Aaron Ferber %A Bryan Wilder %A Dilkina, Bistra %A Tambe, Milind %X Machine learning components commonly appear in larger decision-making pipelines; however, the model training process typically focuses only on a loss that measures average accuracy between predicted values and ground truth values. Decision-focused learning explicitly integrates the downstream decision problem when training the predictive model, in order to optimize the quality of decisions induced by the predictions. It has been successfully applied to several limited combinatorial problem classes, such as those that can be expressed as linear programs (LP), and submodular optimization. However, these previous applications have uniformly focused on problems with simple constraints. Here, we enable decision-focused learning for the broad class of problems that can be encoded as a mixed integer linear program (MIP), hence supporting arbitrary linear constraints over discrete and continuous variables. We show how to differentiate through a MIP by employing a cutting planes solution approach, an algorithm that iteratively tightens the continuous relaxation by adding constraints removing fractional solutions. We evaluate our new end-to-end approach on several real world domains and show that it outperforms the standard two phase approaches that treat prediction and optimization separately, as well as a baseline approach of simply applying decision-focused learning to the LP relaxation of the MIP. 
Lastly, we demonstrate generalization performance in several transfer learning tasks. %B AAAI Conference on Artificial Intelligence %G eng %0 Conference Proceedings %B Conference on Uncertainty in Artificial Intelligence (UAI) %D 2020 %T Robust Spatial-Temporal Incident Prediction %A Mukhopadhyay, Ayan %A Kai Wang %A Perrault, Andrew %A Mykel Kochenderfer %A Tambe, Milind %A Vorobeychik, Yevgeniy %B Conference on Uncertainty in Artificial Intelligence (UAI) %C Toronto %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %D 2020 %T Scalable Game-Focused Learning of Adversary Models: Data-to-Decisions in Network Security Games %A Kai Wang %A Perrault, Andrew %A Mate, Aditya %A Tambe, Milind %X Previous approaches to adversary modeling in network security games (NSGs) have been caught in the paradigm of first building a full adversary model, either from expert input or historical attack data, and then solving the game. Motivated by the need to disrupt the multibillion dollar illegal smuggling networks, such as wildlife and drug trafficking, this paper introduces a fundamental shift in learning adversary behavior in NSGs by focusing on the accuracy of the model using the downstream game that will be solved. Further, the paper addresses technical challenges in building such a game-focused learning model by i) applying graph convolutional networks to NSGs to achieve tractability and differentiability and ii) using randomized block updates of the coefficients of the defender's optimization in order to scale the approach to large networks. We show that our game-focused approach yields scalability and higher defender expected utility than models trained for accuracy only. 
%B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %G eng %0 Conference Proceedings %B AAAI Conference on Artificial Intelligence %D 2020 %T Solving Online Threat Screening Games Using Constrained Action Space Reinforcement Learning %A Sanket Shah %A Sinha, Arunesh %A Varakantham, Pradeep %A Perrault, Andrew %A Tambe, Milind %B AAAI Conference on Artificial Intelligence %C New York %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence %D 2020 %T To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability %A Bondi, Elizabeth %A Oh, Hoon %A Haifeng Xu %A Fang, Fei %A Dilkina, Bistra %A Tambe, Milind %B AAAI conference on Artificial Intelligence %G eng %0 Conference Paper %B AAMAS Doctoral Consortium %D 2020 %T Vision for Decisions: Utilizing Uncertain Real-Time Information and Signaling for Conservation %A Bondi, Elizabeth %B AAMAS Doctoral Consortium %G eng %0 Conference Paper %B 64th Human Factors and Ergonomics Society (HFES) Annual Conference %D 2020 %T What Attackers Know and What They Have to Lose: Framing Effects on Cyber-attacker Decision Making %A Edward A. Cranford %A Gonzalez, Cleotilde %A Palvi Aggarwal %A Tambe, Milind %A Lebiere, Christian %B 64th Human Factors and Ergonomics Society (HFES) Annual Conference %G eng %0 Conference Paper %B The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2019 %D 2019 %T Learning to Prescribe Interventions for Tuberculosis Patients Using Digital Adherence Data %A Jackson Killian %A Bryan Wilder %A Amit Sharma %A Vinod Choudhary %A Dilkina, Bistra %A Tambe, Milind %X Digital Adherence Technologies (DATs) are an increasingly popular method for verifying patient adherence to many medications.
We analyze data from one city served by 99DOTS, a phone-call-based DAT deployed for Tuberculosis (TB) treatment in India where nearly 3 million people are afflicted with the disease each year. The data contains nearly 17,000 patients and 2.1M dose records. We lay the groundwork for learning from this real-world data, including a method for avoiding the effects of unobserved interventions in training data used for machine learning. We then construct a deep learning model, demonstrate its interpretability, and show how it can be adapted and trained in three different clinical scenarios to better target and improve patient care. In the real-time risk prediction setting our model could be used to proactively intervene with 21% more patients and before 76% more missed doses than current heuristic baselines. For outcome prediction, our model performs 40% better than baseline methods, allowing cities to target more resources to clinics with a heavier burden of patients at risk of failure. Finally, we present a case study demonstrating how our model can be trained in an end-to-end decision focused learning setting to achieve 15% better solution quality in an example decision problem
faced by health workers. %B The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2019 %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Doctoral Consortium) (AAMAS-19) %D 2019 %T Bridging the Gap Between High-Level Reasoning in Strategic Agent Coordination and Low-Level Agent Development %A Bondi, Elizabeth %X Recent advances in fields such as computer vision and natural language processing have paved the way for developing agents capable of automatically interpreting their surrounding environment. Concurrently, advances in artificial intelligence have made the coordination of many such agents possible. However, there is little work considering both the low-level reasoning that allows agents to interpret their environment, such as deep learning techniques, and the high-level reasoning that coordinates such agents. By considering both together, we can better handle real-world scenarios, for example by planning at a high level with low-level uncertainty in mind, or even by improving low-level processing by using high level reasoning to place the agent in the best scenario for success. %B International Conference on Autonomous Agents and Multiagent Systems (Doctoral Consortium) (AAMAS-19) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %D 2019 %T Broken Signals in Security Games: Coordinating Mobile Patrollers and Sensors in the Real World (Extended Abstract) %A Bondi, Elizabeth %A Oh, Hoon %A Haifeng Xu %A Fang, Fei %A Dilkina, Bistra %A Tambe, Milind %X Mobile sensors, e.g., unmanned aerial vehicles (UAVs), are becoming increasingly important in security domains and can be used for tasks such as searching for poachers in conservation areas. 
Such mobile sensors augment human patrollers by assisting in surveillance and in signaling potentially deceptive information to adversaries, and their coordinated deployment could be modeled via the well-known security games framework. Unfortunately, real-world uncertainty in the sensor’s detection of adversaries and adversaries’ observation of the sensor’s signals present major challenges in the sensors’ use. This leads to significant detriments in security performance. We first discuss the current shortcomings in more detail, and then propose a novel game model that incorporates uncertainty with sensors. The defender strategy in this game model will consist of three interdependent stages: an allocation stage, a signaling stage, and a reaction stage %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security, 2019 %D 2019 %T Cyber Camouflage Games for Strategic Deception %A Thakoor, Omkar %A Tambe, Milind %A Vayanos, Phebe %A Haifeng Xu %A Kiekintveld, Christopher %A Feng, Fei %X  The rapid increase in cybercrime, causing a reported annual economic
loss of $600 billion (Lewis, 2018), has prompted a critical need for effective cyber defense. Strategic criminals conduct network reconnaissance prior to executing attacks to avoid detection and establish situational awareness via scanning and fingerprinting tools. Cyber deception attempts to foil these reconnaissance efforts by camouflaging network and system attributes to disguise valuable information. Game-theoretic models can identify decisions about strategically deceiving attackers, subject to domain constraints. For effectively deploying an optimal deceptive strategy, modeling the objectives and the abilities of the attackers is a key challenge. To address this challenge, we present Cyber Camouflage Games (CCG), a general-sum game model that captures attackers who can be diversely equipped and motivated. We show that computing the optimal defender strategy is NP-hard even in the special case of unconstrained CCGs, and present an efficient approximate solution for it. We further provide an MILP formulation accelerated with cut-augmentation for the general constrained problem. Finally, we provide
experimental evidence that our solution methods are efficient and effective. %B Conference on Decision and Game Theory for Security, 2019 %G eng %0 Conference Paper %B GAIW: Games, Agents and Incentives Workshop at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %D 2019 %T Decision-Focused Learning of Adversary Behavior in Security Games %A Perrault, Andrew %A Bryan Wilder %A Ewing, Eric %A Mate, Aditya %A Dilkina, Bistra %A Tambe, Milind %X Stackelberg security games are a critical tool for maximizing the
utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversary’s response to defense on a limited set of targets, we study the problem of defending against the same adversary on a larger set of targets from the same distribution. We give a theoretical justification for why standard two-stage learning approaches, where a model of the adversary is trained for predictive accuracy and then optimized against, may fail to maximize the defender’s expected utility in this setting. We develop a decision-focused learning approach, where the adversary behavior model is optimized for decision quality, and show empirically that it achieves higher defender expected utility than the two-stage approach when there is limited training data and a
large number of target features. %B GAIW: Games, Agents and Incentives Workshop at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %D 2019 %T Deep Fictitious Play for Games with Continuous Action Spaces %A Kamra, Nitin %A Gupta, Umang %A Kai Wang %A Fang, Fei %A Liu, Yan %A Tambe, Milind %X Fictitious play has been a classic algorithm to solve two-player adversarial games with discrete action spaces. In this work we develop an approximate extension of fictitious play to two-player games with high-dimensional continuous action spaces. We use generative neural networks to approximate players’ best responses while also learning a differentiable approximate model to the players’ rewards given their actions. Both these networks are trained jointly with gradient-based optimization to emulate fictitious play. We explore our approach in zero-sum games, non zero-sum games and security game domains. %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security, 2019 %D 2019 %T DeepFP for Finding Nash Equilibrium in Continuous Action Spaces %A Kamra, Nitin %A Gupta, Umang %A Kai Wang %A Fang, Fei %A Liu, Yan %A Tambe, Milind %X Finding Nash equilibrium in continuous action spaces is a
challenging problem and has applications in domains such as protecting geographic areas from potential attackers. We present DeepFP, an
approximate extension of fictitious play in continuous action spaces.
DeepFP represents players’ approximate best responses via generative
neural networks, which are highly expressive implicit density approximators. It additionally uses a game-model network that approximates
the players’ expected payoffs given their actions, and trains the networks
end-to-end in a model-based learning regime. Further, DeepFP allows using domain-specific oracles if available and can hence exploit techniques
such as mathematical programming to compute best responses for structured games. We demonstrate stable convergence to Nash equilibrium
on several classic games and also apply DeepFP to a large forest security domain with a novel defender best response oracle. We show that
DeepFP learns strategies robust to adversarial  %B Conference on Decision and Game Theory for Security, 2019 %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %D 2019 %T Don’t Put All Your Strategies in One Basket: Playing Green Security Games with Imperfect Prior Knowledge %A Gholami, Shahrzad %A Amulya Yadav %A Long Tran-Thanh %A Dilkina, Bistra %A Tambe, Milind %X Security efforts for wildlife monitoring and protection of endangered species (e.g., elephants, rhinos, etc.) are constrained by limited resources available to law enforcement agencies. Recent progress in Green Security Games (GSGs) has led to patrol planning algorithms for strategic allocation of limited patrollers to deter adversaries in environmental settings. Unfortunately, previous approaches to these problems suffer from several limitations. Most notably, (i) previous work in GSG literature relies on exploitation of error-prone machine learning (ML) models of poachers’ behavior trained on (spatially) biased historical data; and (ii) online learning approaches for repeated security games (similar to GSGs) do not account for spatio-temporal scheduling constraints while planning patrols, potentially causing significant shortcomings in the effectiveness of the planned patrols. Thus, this paper makes the following novel contributions: (I) We propose MINION-sm, a novel online learning algorithm for GSGs which does not rely on any prior error-prone model of attacker behavior, instead, it builds an implicit model of the attacker on-the-fly while simultaneously generating schedulingconstraint-aware patrols. MINION-sm achieves a sublinear regret against an optimal hindsight patrol strategy. (II) We also propose MINION, a hybrid approach where our MINION-sm model and an ML model (based on historical data) are considered as two patrol planning experts and we obtain a balance between them based on their observed empirical performance. 
(III) We show that our online learning algorithms significantly outperform existing state-of-the-art solvers for GSGs. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %D 2019 %T General-Sum Cyber Deception Games under Partial Attacker Valuation Information %A Thakoor, Omkar %A Tambe, Milind %A Vayanos, Phebe %A Haifeng Xu %A Kiekintveld, Christopher %X The rapid increase in cybercrime, causing a reported annual economic loss of $600 billion [20], has prompted a critical need for
effective cyber defense. Strategic criminals conduct network reconnaissance prior to executing attacks to avoid detection and establish
situational awareness via scanning and fingerprinting tools. Cyber
deception attempts to foil these reconnaissance efforts by disguising network and system attributes, among several other techniques. Cyber Deception Games (CDG) is a game-theoretic model for optimizing strategic deception, and can apply to various deception methods. The recently introduced initial model for CDGs assumes zero-sum payoffs, implying directly conflicting attacker motives and perfect defender knowledge of attacker preferences. These unrealistic assumptions are fundamental limitations of the initial zero-sum
model, which we address by proposing a general-sum model that
can also handle uncertainty in the defender’s knowledge. %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) 2019 %D 2019 %T Group-Fairness in Influence Maximization %A Alan Tsang %A Bryan Wilder %A Eric Rice %A Tambe, Milind %A Yair Zick %X Influence maximization is a widely used model for information dissemination in social networks. Recent work has employed such interventions across a wide range of social problems, spanning public health, substance abuse, and international development (to name a few examples). A critical but understudied question is whether the benefits of such interventions are fairly distributed across different groups in the population; e.g., avoiding discrimination with respect to sensitive attributes such as race or gender. Drawing on legal and game-theoretic concepts, we introduce formal definitions of fairness in influence maximization. We provide an algorithmic framework to find solutions which satisfy fairness constraints, and in the process improve the state of the art for general multi-objective submodular maximization problems. Experimental results on real data from an HIV prevention intervention for homeless youth show that standard influence maximization techniques oftentimes neglect smaller groups which contribute less to overall utility, resulting in a disparity which our proposed algorithms substantially reduce. %B International Joint Conference on Artificial Intelligence (IJCAI) 2019 %G eng %0 Conference Paper %B The 4th Workshop on Data Science for Social Good at ECML-PKDD, 2019 %D 2019 %T Improving GP-UCB Algorithm by Harnessing Decomposed Feedback %A Kai Wang %A Bryan Wilder %A Suen, Sze-Chuan %A Dilkina, Bistra %A Tambe, Milind %X Gaussian processes (GPs) have been widely applied to machine learning and nonparametric approximation. 
Given existing observations, a GP allows the decision maker to update a posterior belief over the unknown underlying function. Usually, observations from a complex system come with noise and decomposed feedback from intermediate layers. For example, the decomposed feedback could be the components that constitute the final objective value, or the various feedback obtained from sensors. Previous literature has shown that GPs can successfully deal with noise, but has neglected decomposed feedback. We therefore propose a decomposed GP regression algorithm to incorporate this feedback, leading to a lower average root-mean-squared error with respect to the target function, especially when the samples are scarce. We also introduce a decomposed GP-UCB algorithm to solve the resulting bandit problem with decomposed feedback. We prove that our algorithm converges to the optimal solution and preserves the no-regret property. To demonstrate the wide applicability of this work, we execute our algorithm on two disparate social problems: infectious disease control and weather monitoring. The numerical results show that our method provides significant improvement against previous methods that do not utilize this feedback, showcasing the advantage of considering decomposed feedback. %B The 4th Workshop on Data Science for Social Good at ECML-PKDD, 2019 %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-19) %D 2019 %T On the Inducibility of Stackelberg Equilibrium for Security Games %A Qingyu Guo %A Jiarui Gan %A Fang, Fei %A Long Tran-Thanh %A Tambe, Milind %A An, Bo %X Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games.
As opposed to the weak Stackelberg equilibrium (WSE), the SSE assumes that the follower breaks ties in favor of the leader, and this is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid; it is possible that the defender cannot induce the desired outcome. As a result, many results claimed in the literature may be overly optimistic. To remedy this, we first formally define the utility guarantee of a defender strategy and provide examples to show that the utility of SSE can be higher than its utility guarantee. Second, inspired by the analysis of leader’s payoff by Von Stengel and Zamir (2004), we provide the solution concept called the inducible Stackelberg equilibrium (ISE), which has the highest utility guarantee and always exists. Third, we show the conditions under which the ISE coincides with the SSE, and the fact that, in the general case, the SSE can be far worse with respect to utility guarantee. Moreover, introducing the ISE does not invalidate existing algorithmic results, as the problem of computing an ISE polynomially reduces to that of computing an SSE. We also provide an algorithmic implementation for computing the ISE, with which our experiment %B AAAI conference on Artificial Intelligence (AAAI-19) %G eng %0 Conference Paper %B 10th International Workshop on Optimization in Multiagent Systems (OptMAS) %D 2019 %T Integrating optimization and learning to prescribe interventions for tuberculosis patients %A Bryan Wilder %A Jackson A. Killian %A Amit Sharma %A Vinod Choudhary %A Dilkina, Bistra %A Tambe, Milind %X Creating impact in real-world settings requires agents which navigate the full pipeline from data, to predictive models, to decisions.
These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal of the agent, which is to make the best decisions possible. We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. We then provide an application of such techniques to a real problem of societal importance: improving interventions in tuberculosis treatment. Using data on 17,000 Indian patients provided by the NGO Everwell, we consider the problem of predicting which patients are likely to miss doses of medication in the near future and optimizing interventions by health workers to avert such treatment failures. We find that decision-focused learning improves the number of successful interventions by approximately 15% compared to standard machine learning approaches, demonstrating that aligning the goals of learning and decision making can yield substantial benefits in a socially critical application.
%B 10th International Workshop on Optimization in Multiagent Systems (OptMAS) %G eng %0 Conference Paper %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, (ECML-PKDD), 2019 %D 2019 %T Learning to Signal in the Goldilocks Zone: Improving Adversary Compliance in Security Games %A Cooney, Sarah %A Kai Wang %A Bondi, Elizabeth %A Nguyen, Thanh %A Vayanos, Phebe %A Winetrobe, Hailey %A Edward A. Cranford %A Gonzalez, Cleotilde %A Lebiere, Christian %A Tambe, Milind %X Many real-world security scenarios can be modeled via a game-theoretic framework known as a security game in which there is a defender trying to protect potential targets from an attacker. Recent work in security games has shown that deceptive signaling by the defender can convince an attacker to withdraw his attack. For instance, a warning message to commuters indicating speed enforcement is in progress ahead might lead to them driving more slowly, even if it turns out no enforcement is in progress. However, the results of this work are limited by the unrealistic assumption that the attackers will behave with perfect rationality, meaning they always choose an action that gives them the best expected reward. We address the problem of training boundedly rational (human) attackers to comply with signals via repeated interaction with signaling without incurring a loss to the defender, and offer the four following contributions: (i) We learn new decision tree and neural network-based models of attacker compliance with signaling. (ii) Based on these machine learning models of a boundedly rational attacker’s response to signaling, we develop a theory of signaling in the Goldilocks zone, a balance of signaling and deception that increases attacker compliance and improves defender utility. (iii) We present game-theoretic algorithms to solve for signaling schemes based on the learned models of attacker compliance with signaling. 
(iv) We conduct extensive human subject experiments using an online game. The game simulates the scenario of an inside attacker trying to steal sensitive information from company computers, and results show that our algorithms based on learned models of attacker behavior lead to better attacker compliance and improved defender utility compared to the state-of-the-art algorithm for rational attackers with signaling. %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, (ECML-PKDD), 2019 %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-19) %D 2019 %T Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization %A Bryan Wilder %A Dilkina, Bistra %A Tambe, Milind %X Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal, which is to make the best decisions possible. Hand-tuning the loss function to align with optimization is a difficult and error-prone process (which is often skipped entirely). We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce highquality decisions. Technically, our contribution is a means of integrating common classes of discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. 
The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. Experimental results across a variety of domains show that decision-focused learning often leads to improved optimization performance compared to traditional methods. We find that standard measures of accuracy are not a reliable proxy for a predictive model’s utility in optimization, and our method’s ability to specify the true goal as the model’s training objective yields substantial dividends across a range of decision problems. %B AAAI conference on Artificial Intelligence (AAAI-19) %G eng %0 Conference Paper %B The 4th Workshop on Data Science for Social Good at ECML-PKDD, 2019 %D 2019 %T Mobile Game Theory with Street Gangs %A Cooney, Sarah %A Wendy Gomez %A Kai Wang %A Leap, Jorja %A P. Jefferey Brantingham %A Tambe, Milind %X Gang violence remains a persistent public safety problem
in Los Angeles. Gang interventionists and community organizers are
turning to proactive peacekeeping, a process of addressing the underlying structures that cause young people to join gangs such as pervasive poverty and marginalization. Given the increasing prevalence and
decreasing cost of mobile technology, there may be opportunities for interventionists to employ technological solutions in their work. However,
before such solutions can be deployed, it is necessary to have accurate
models of the target users—in this case, gang-involved youth. Of particular interest with regard to proactive peacekeeping is their propensity for
cooperation. However, given the unique circumstances surrounding the
lives of gang-involved youth, traditional laboratory-based experiments
measuring cooperation are infeasible. In this paper, we present a novel
method of collecting experimental data from gang-involved youth in the
Los Angeles area. We design a mobile application based on the classic Prisoner’s Dilemma model, which has been used to collect almost
3000 data points on cooperation from more than 20 participants. We
present initial results that show that, despite their unique life circumstances,
gang-involved youth cooperate at roughly the same rate as university students in classic studies of cooperation. We conclude by addressing the
implications of this result for future work and proactive peacekeeping
endeavors. %B The 4th Workshop on Data Science for Social Good at ECML-PKDD, 2019 %G eng %0 Thesis %D 2019 %T Predicting and Planning against Real-world Adversaries: An End-to-end Pipeline to Combat Illegal Wildlife Poachers on a Global Scale %A Gholami, Shahrzad %X

Security is a global concern, and a unifying theme in various security projects is strategic reasoning, where the mathematical frameworks of machine learning and game theory can be integrated
and applied. For example, in the environmental sustainability domain, the problem of protecting
endangered wildlife from attacks (i.e., poachers’ strikes) can be abstracted as a game between
defender(s) and attacker(s). Applying previous research on security games to sustainability domains (denoted as Green Security Games) introduces several novel challenges that I address in my thesis to create computationally feasible and accurate algorithms in order to model complex adversarial behavior based on real-world data and to generate optimal defender strategies.
My thesis provides four main contributions to the emerging body of research on using machine learning and game-theoretic frameworks for fundamental challenges in the environmental sustainability domain, namely (i) novel spatio-temporal and uncertainty-aware machine learning models for complex adversarial behavior based on imperfect real-world data, (ii) the
first large-scale field test evaluation of the machine learning models in the adversarial settings
concerning the environmental sustainability, (iii) a novel multi-expert online learning model for
constrained patrol planning, and (iv) the first game theoretical model to generate optimal defender
strategy against collusive adversaries.

In regard to the first contribution, I developed bounded-rationality models for adversaries based on real-world data that account for the naturally occurring uncertainty in past attack
evidence collected by defenders. To that end, I proposed two novel predictive behavioral models,
which I improved progressively. The second major contribution of my thesis is a large-scale field
test evaluation of the proposed adversarial behavior model beyond the laboratory. Particularly,
my thesis is motivated by the challenges in wildlife poaching, where I directed the defenders
(i.e., rangers) to the hotspots of adversaries that they would have missed. During these experiments across multiple vast national parks, several snares and snared animals were detected and poachers were arrested, potentially saving more wildlife. The algorithm I proposed, which combines machine learning and game-theoretic patrol planning, is planned to be deployed at 600 national parks around the world in the near future to combat illegal poaching.
The third contribution in my thesis introduces a novel multi-expert online learning model for
constrained and randomized patrol planning, which benefits from several expert planners when insufficient or imperfect historical records of past attacks are available to learn adversarial behavior. The final contribution of my thesis is developing an optimal solution against collusive
adversaries in security games assuming both rational and boundedly rational adversaries. I conducted human subject experiments on Amazon Mechanical Turk involving 700 human subjects
using a web-based game that simulates collusive security games.

%G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %D 2019 %T Robust Peer-Monitoring on Graphs with an Application to Suicide Prevention in Social Networks %A Rahmattalabi, Aida %A Vayanos, Phebe %A Fulginiti, Anthony %A Tambe, Milind %X We consider the problem of selecting a subset of nodes (individuals)
in a (social) network that can act as monitors capable of “watching out” for their neighbors (friends) when the availability or performance of the chosen monitors is uncertain. Such problems arise
for example in the context of “Gatekeeper Trainings” for suicide
prevention. We formulate this problem as a two-stage robust optimization problem that aims to maximize the worst-case number of
covered nodes. Our model is capable of incorporating domain-specific constraints, e.g., fairness constraints. We propose a practically
tractable approximation scheme and we provide empirical results
that demonstrate the effectiveness of our approach. %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %G eng %0 Conference Paper %B International Conference on Cognitive Modeling (ICCM), 2019 %D 2019 %T Towards Personalized Deceptive Signaling for Cyber Defense Using Cognitive Models %A Edward A. Cranford %A Gonzalez, Cleotilde %A Palvi Aggarwal %A Cooney, Sarah %A Tambe, Milind %A Lebiere, Christian %X Recent research in cybersecurity has begun to develop active defense strategies using game-theoretic optimization of the allocation of limited defenses combined with deceptive signaling. While effective, the algorithms are optimized against perfectly rational adversaries. In a laboratory experiment, we pit humans against the defense algorithm in an online game designed to simulate an insider attack scenario. Humans attack far more often than predicted under perfect rationality. Optimizing against human bounded rationality is vitally important. We propose a cognitive model based on instance-based learning theory and built in ACT-R that accurately predicts human performance and biases in the game. We show that the algorithm does not defend well, largely due to its static nature and lack of adaptation to the particular individual’s actions. Thus, we propose an adaptive method of signaling that uses the cognitive model to trace an individual’s experience in real time, in order to optimize defenses. We discuss the results and implications of personalized defense.
Keywords: cyber deception; cognitive models; instance-based learning; knowledge-tracing; model-tracing %B International Conference on Cognitive Modeling (ICCM), 2019 %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Demonstration) (AAMAS-19) %D 2019 %T Using Game Theory in Real Time in the Real World: A Conservation Case Study (Demonstration) %A Bondi, Elizabeth %A Oh, Hoon %A Haifeng Xu %A Fang, Fei %A Dilkina, Bistra %A Tambe, Milind %X In the real world, real-time data are now widely available, especially in security domains. Security cameras, aerial imagery, and even social media keep defenders informed when protecting important events, locations, and people. Further, advances in artificial intelligence have led to tools that can interpret these data automatically. Game theoretic models, for example, have shown great success in security. However, most of them ignore real-time information. In this paper, we demonstrate the potential to use real-time information from imagery to better inform our decisions in game theoretic models for security. As a concrete example, a conservation group called Air Shepherd uses conservation drones equipped with thermal infrared cameras to locate poachers at night and alert park rangers. They have also used lights aboard the drones, or signaled, to warn poachers of their presence, which often deters the poachers. We propose a system that (i) allocates drones and humans strategically throughout a protected area, (ii) detects poachers in the thermal infrared videos recorded by the conservation drones flying through the protected area in the predetermined location, and (iii) recommends moving to the location and/or signaling to the poacher that a patroller is nearby depending on real-time detections. View the demonstration. 
%B International Conference on Autonomous Agents and Multiagent Systems (Demonstration) (AAMAS-19) %G eng %0 Conference Paper %B AI for Social Good workshop (AI4SG) at International Joint Conference on Artificial Intelligence (IJCAI) 2019 %D 2019 %T Using Graph Convolutional Networks to Learn Interdiction Games %A Kai Wang %A Mate, Aditya %A Bryan Wilder %A Perrault, Andrew %A Tambe, Milind %X Illegal smuggling is one of the most important issues across countries, where more than $10 billion a year of illegal wildlife trafficking is conducted within transnational criminal networks. Governments have tried to deploy inspections at checkpoints to stop illegal smuggling, though the effect is quite limited given vast protection areas and limited human resources. We study these problems from the perspective of network interdiction games with a boundedly rational attacker. In this paper, we aim to improve the efficiency of the limited number of checkpoints. The problem involves two main stages: i) a predictive stage to predict the attacker’s behavior based on the historical interdiction; ii) a prescriptive stage to optimally allocate limited checkpoints to interdict the attacker. In this paper, we propose a novel boundedly rational model which resolves the issue of exponentially many attacker strategies by making a memoryless assumption about the attacker’s behavior. We show that the attacker’s behavior can be reduced to an absorbing Markov chain, where the success probability of reaching any target can be computed analytically, thus optimized via any gradient-based optimization technique. We incorporate graph convolutional neural networks with K-hops look-ahead to model the attacker’s reasoning. Our proposed model provides a new perspective to study bounded rationality in traditional interdiction games with graph structure. This novel model possesses nice analytical properties and scales up very well by avoiding enumerating all paths in the graph.
%B AI for Social Good workshop (AI4SG) at International Joint Conference on Artificial Intelligence (IJCAI) 2019 %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) 2019 (Doctoral Consortium) %D 2019 %T Visionary Security: Using Uncertain Real-Time Information in Signaling Games %A Bondi, Elizabeth %X In important domains from natural resource conservation to public safety, real-time information is becoming increasingly important. Strategic deployment of security cameras and mobile sensors such as drones can provide real-time updates on illegal activities. To help plan for such strategic deployments of sensors and human patrollers, as well as warning signals to ward off adversaries, the defender-attacker security games framework can be used. [Zhang et al., 2019] has shown that real-time data (e.g., human view from a helicopter) may be used in conjunction with security game models to interdict criminals. Other recent work relies on real-time information from sensors that can notify the patroller when an opponent is detected [Basilico et al., 2017; Xu et al., 2018]. Despite considering real-time information in all cases, these works do not consider the combined situation of uncertainty in real-time information in addition to strategically signaling to adversaries. In this thesis, we will not only address this gap, but also improve the overall security result by considering security game models and computer vision algorithms together. A major aspect of this work is in applying it to real-world challenges, such as conservation. Although it applies to many environmental challenges, such as protecting forests and avoiding illegal mining, we will focus particularly on reducing poaching of endangered wildlife as an example. To reduce poaching, human patrollers typically search for snares and poaching activity as they patrol, as well as intervene if poaching activity is found. 
Drones are useful patrolling aids due to their ability to cover additional ground, but they must interpret their environments, notify nearby human patrollers for intervention, and send potentially deceptive signals to the adversary to deter poaching. Rather than treating these as separate tasks, models must coordinate to handle challenges found in real-world conservation scenarios (Fig. 1). We will determine the success of this work both in simulated experiments and through work with conservation agencies such as Air Shepherd to implement the system in the real world. %B International Joint Conference on Artificial Intelligence (IJCAI) 2019 (Doctoral Consortium) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %D 2019 %T Warning Time: Optimizing Strategic Signaling for Security Against Boundedly Rational Adversaries %A Cooney, Sarah %A Vayanos, Phebe %A Thanh H. Nguyen %A Gonzalez, Cleotilde %A Lebiere, Christian %A Edward A. Cranford %A Tambe, Milind %X Defender-attacker Stackelberg security games (SSGs) have been
applied for solving many real-world security problems. Recent work
in SSGs has incorporated a deceptive signaling scheme into the
SSG model, where the defender strategically reveals information
about her defensive strategy to the attacker, in order to influence
the attacker’s decision making for the defender’s own benefit. In
this work, we study the problem of signaling in security games
against a boundedly rational attacker. %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) (AAMAS-19) %G eng %0 Conference Paper %B Joint Workshop on Autonomous Agents for Social Good (AASG) at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %D 2019 %T Who and When to Screen: Multi-Round Active Screening for Recurrent Infectious Diseases Under Uncertainty %A Ou, Han-Ching %A Sinha, Arunesh %A Suen, Sze-Chuan %A Perrault, Andrew %A Tambe, Milind %X Controlling recurrent infectious diseases is a vital yet complicated problem. In this paper, we propose a novel active screening
model (ACTS) and algorithms to facilitate active screening for recurrent diseases (no permanent immunity) under infection uncertainty. Our
contributions are: (1) A new approach to modeling multi-round network-based screening/contact tracing under uncertainty, which is a common
real-life practice in a variety of diseases [10, 30]; (2) Two novel algorithms,
Full- and Fast-REMEDY. Full-REMEDY considers the effect of future actions and finds a policy that provides high solution quality, whereas
Fast-REMEDY scales linearly in the size of the network; (3) We evaluate
Full- and Fast-REMEDY on several real-world datasets which emulate human contact and find that they control diseases better than the
baselines. To the best of our knowledge, this is the first work on multi-round active screening with uncertainty for diseases with no permanent
immunity. %B Joint Workshop on Autonomous Agents for Social Good (AASG) at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %G eng %0 Conference Paper %B Advances in Neural Information Processing Systems (NeurIPS-19), 2019 %D 2019 %T End to End Learning and Optimization on Graphs %A Bryan Wilder %A Ewing, Eric %A Dilkina, Bistra %A Tambe, Milind %X Real-world applications often combine learning and optimization problems on graphs. For instance, our objective may be to cluster the graph in order to detect meaningful communities (or solve other common graph optimization problems such as facility location, maxcut, and so on). However, graphs or related attributes are often only partially observed, introducing learning problems such as link prediction which must be solved prior to optimization. Standard approaches treat learning and optimization entirely separately, while recent machine learning work aims to predict the optimal solution directly from the inputs. Here, we propose an alternative decision-focused learning approach that integrates a differentiable proxy for common graph optimization problems as a layer in learned systems. The main idea is to learn a representation that maps the original optimization problem onto a simpler proxy problem that can be efficiently differentiated through. Experimental results show that our CLUSTERNET system outperforms both pure end-to-end approaches (that directly predict the optimal solution) and standard approaches that entirely separate learning and optimization. Code for our system is available at https://github.com/bwilder0/clusternet. 
%B Advances in Neural Information Processing Systems (NeurIPS-19), 2019 %G eng %0 Conference Paper %D 2019 %T Exploring Algorithmic Fairness in Robust Graph Covering Problems %A Rahmattalabi, Aida %A Vayanos, Phebe %A Fulginiti, Anthony %A Eric Rice %A Bryan Wilder %A Amulya Yadav %A Tambe, Milind %X Fueled by algorithmic advances, AI algorithms are increasingly being deployed in
settings subject to unanticipated challenges with complex social effects. Motivated
by real-world deployment of AI-driven, social-network-based suicide prevention
and landslide risk management interventions, this paper focuses on robust graph
covering problems subject to group fairness constraints. We show that, in the
absence of fairness constraints, state-of-the-art algorithms for the robust graph covering problem result in biased node coverage: they tend to discriminate against individuals
(nodes) based on membership in traditionally marginalized groups. To mitigate
this issue, we propose a novel formulation of the robust graph covering problem
with group fairness constraints and a tractable approximation scheme applicable to
real-world instances. We provide a formal analysis of the price of group fairness
(PoF) for this problem, where we show that uncertainty can lead to greater PoF. We
demonstrate the effectiveness of our approach on several real-world social networks.
Our method yields competitive node coverage while significantly improving group
fairness relative to state-of-the-art methods. %C Advances in Neural Information Processing Systems (NeurIPS-19), 2019 %G eng %0 Conference Paper %B Strategic Reasoning for Societal Challenges (SRSC) Workshop at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %D 2019 %T Social Network Based Substance Abuse Prevention via Network Modification (A Preliminary Study) %A Rahmattalabi, Aida %A Adhikari, Anamika Barman %A Vayanos, Phebe %A Tambe, Milind %A Eric Rice %A Baker, Robin %X Substance use and abuse is a significant public health problem in the
United States. Group-based intervention programs offer a promising
means of preventing and reducing substance abuse. While effective in general, such programs can unfortunately backfire: inappropriate intervention groups can result in an increase in deviant behaviors among participants, a process known as deviancy training. This paper investigates the problem of optimizing the social influence related to deviant behavior via careful
construction of the intervention groups. We propose a Mixed Integer Optimization formulation that decides on the intervention
groups to be formed, captures the impact of the intervention groups
on the structure of the social network, and models the impact of
these changes on behavior propagation. In addition, we propose
a scalable hybrid meta-heuristic algorithm that combines Mixed
Integer Programming and Large Neighborhood Search to find near-optimal network partitions. Our algorithm is packaged in the form
of GUIDE, an AI-based decision aid that recommends intervention groups. Being the first quantitative decision aid of this kind,
GUIDE is able to assist practitioners, in particular social workers, in
three key areas: (a) GUIDE proposes near-optimal solutions that are
shown, via extensive simulations, to significantly improve over the
traditional qualitative practices for forming intervention groups;
(b) GUIDE is able to identify circumstances when an intervention
will lead to deviancy training, thus saving time, money, and effort;
(c) GUIDE can evaluate current strategies of group formation and
discard strategies that will lead to deviancy training. In developing
GUIDE, we are primarily interested in substance use interventions
among homeless youth as a high risk and vulnerable population.
GUIDE is developed in collaboration with Urban Peak, a homeless-youth-serving organization in Denver, CO, and is under preparation
for deployment. %B Strategic Reasoning for Societal Challenges (SRSC) Workshop at International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19) %G eng %0 Conference Paper %D 2019 %T When Players Affect Target Values: Modeling and Solving Dynamic Partially Observable Security Games %A Xinrun Wang %A Tambe, Milind %A Branislav Bošanský %A An, Bo %X Most of the current security models assume that the values of targets/areas are static or the changes (if any) are scheduled and
known to the defender. Unfortunately, such models are not sufficient
for many domains, where actions of the players modify the values of
the targets. Examples include wildlife scenarios, where the attacker can
increase the value of targets by secretly building supporting facilities. To address such security game domains with player-affected values, we first
propose DPOS3G, a novel partially observable stochastic Stackelberg
game where target values are determined by the players’ actions; the defender can only partially observe these targets’ values, while the attacker
can fully observe the targets’ values and the defender’s strategy. Second,
we propose RITA (Reduced game Iterative Transfer Algorithm), which
is based on the heuristic search value iteration algorithm for partially observable stochastic game (PG-HSVI) and introduces three key novelties:
(a) building a reduced game with only key states (derived from partitioning the state space) to reduce the numbers of states and transitions
considered when solving the game; (b) incrementally adding defender’s
actions to further reduce the number of transitions of the game; (c) providing novel heuristics for lower bound initialization of the algorithm.
Third, we conduct extensive experimental evaluations of the algorithms
and the results show that RITA significantly outperforms the baseline
PG-HSVI algorithm in scalability while allowing a trade-off between scalability and solution quality. %C Conference on Decision and Game Theory for Security, 2019 %G eng %0 Thesis %D 2018 %T Game Theoretic Deception and Threat Screening for Cyber Security %A Schlenker, Aaron %X

Protecting an organization’s cyber assets from intrusions and breaches due to attacks by malicious actors is an increasingly challenging and complex problem. Companies and organizations (hereon referred to as the defender) who operate enterprise networks employ numerous protection measures to intercept these attacks, such as Intrusion Detection Systems (IDS), along with dedicated Cyber Emergency Readiness Teams (CERT) composed of cyber analysts tasked with the general protection of an organization’s cyber assets. In order to optimize the use of the defender’s limited resources and protection mechanisms, we can look to game theory, which has been successfully used to handle complex resource allocation problems and has several deployed real-world applications in physical security domains. Applying previous research on security games to cybersecurity domains introduces several novel challenges which I address in my thesis to create models that deceive cyber adversaries and provide the defender with an alert prioritization strategy for IDS. My thesis provides three main contributions to the emerging body of research in using game theory for cyber and physical security, namely (i) the first game-theoretic framework for cyber deception of a defender’s network, (ii) the first game-theoretic framework for cyber alert allocation, and (iii) algorithms for extending these frameworks to general-sum domains.

Regarding the first contribution, I introduce a novel game model called the Cyber Deception Game (CDG), which captures the interaction between the defender and adversary during the reconnaissance phase of a network attack. The CDG model provides the first game-theoretic framework for deception in cybersecurity and allows the defender to devise deceptive strategies that alter system responses on the network. I study two different models of cyber adversaries and provide algorithms to solve CDGs that handle the computational complexities stemming from the adversary’s static view of the defender’s network and the varying differences between adversary models. The second major contribution of my thesis is the first game-theoretic model for cyber alert prioritization for a network defender. This model, the Cyber-alert Allocation Game (CAG), provides an approach which balances intrinsic characteristics of incoming alerts from an IDS with the defender’s analysts that are available to resolve alerts. Additionally, the aforementioned works assume the games are zero-sum, which is not true in many real-world domains. As such, the third contribution in my thesis extends CAGs to general-sum domains. I provide scalable algorithms that have additional applicability to other physical screening domains (e.g., container screening, airport passenger screening).

%G eng %9 PhD thesis %0 Thesis %D 2018 %T Information as a Double-Edged Sword in Strategic Interactions %A Haifeng Xu %X This thesis considers the following question: in systems with self-interested agents (a.k.a.,
games), how does information — i.e., what each agent knows about their environment and other
agents’ preferences — affect their decision making? The study of the role of information in
games has a rich history, and in fact forms the celebrated field of information economics. However, unlike previous descriptive studies, this thesis takes a prescriptive approach and examines computational questions pertaining to the role of information. In particular, it illustrates
the double-edged role of information through two threads of research: (1) how to utilize information to one’s own advantage in strategic interactions; (2) how to mitigate losses resulting from
information leakage to an adversary. In each part, we study the algorithmic foundation of basic
models and also develop efficient solutions to real-world problems arising from physical security
domains. Besides pushing the research frontier, the work of this thesis is also directly impacting
several real-world applications, resulting in delivered software for improving the scheduling of
US federal air marshals and the design of patrolling routes for wildlife conservation.
More concretely, the first part of this thesis studies an intrinsic phenomenon in human endeavors termed persuasion — i.e., the act of exploiting an informational advantage in order to
influence the decisions of others. We start with two real-world motivating examples, illustrating
how security agencies can utilize an informational advantage to influence adversaries’ decisions
and deter potential attacks. Afterwards, we provide a systematic algorithmic study for the foundational economic models underlying these examples. Our analysis not only fully resolves the
complexity of these models, but also leads to new economic insights. We then leverage the insights and algorithmic ideas from our theoretical analysis to develop new models and solutions
for concrete real-world security problems.
The second part of this thesis studies the other side of the double-edged sword, namely, how
to deal with disadvantages due to information leakage. We also start with real-world motivating
examples to illustrate how classified information about security measures may leak to the adversary and cause significant loss to security agencies. We then propose different models to capture
information leakage based on how much the security agency is aware of the leakage situation,
and provide thorough algorithmic analysis for these models. Finally, we return to the real-world
problems and design computationally efficient algorithms tailored to each security domain. %G eng %9 PhD thesis %0 Thesis %D 2018 %T Hierarchical Planning in Security Games; A Game Theoretic Approach to Strategic, Tactical and Operational Decision Making %A Mc Carthy, Sara Marie %X In the presence of an intelligent adversary, game theoretic models such as security games, have proven to be effective tools for mitigating risks from exploitable gaps in protection and security protocols, as they model the strategic interaction between an adversary and defender, and allow the defender to plan the use of scarce or limited resources in the face of such an adversary. However, standard security game models have limited expressivity in the types of planning they allow the defender to perform, as they look only at the deployment and allocation of a fixed set of security resources. This ignores two very important planning problems which concern the strategic design of the security system and resources to deploy as well as the usability and implementation of the security protocols. When these problems appear in real world systems, significant losses in utility and efficiency of security protocols can occur if they are not dealt with in a principled way. To address these limitations, in this thesis I introduce a new hierarchical structure of planning problems for security games, dividing the problem into three levels of planning (i) Strategic Planning, which considers long term planning horizons, and decisions related to game design which constrain the possible defender strategies, (ii) Tactical Planning, which considers shorter term horizons, dealing with the deployment of resources, and selection of defender strategies subject to strategic level constraints and (iii) Operational Planning, dealing with implementation of strategies in real world setting. First, focusing on Strategic Planning, I address the design problem of selecting a set of resource and schedule types. 
I introduce a new yet fundamental problem, the Simultaneous Optimization of Resource Teams and Tactics (SORT), which models the coupled problem of both strategic and tactical planning, optimizing over both game design with respect to selection of resource types, as well as their actual deployment in the field. I provide algorithms for efficiently solving the SORT problem, which use hierarchical relaxations of the optimization problem to compute these strategic-level investment decisions. I show that this more expressive model allows the defender to perform more fine-grained decision making that results in significant gains in utility. Second, motivated by the relevance and hardness of security games with resource heterogeneity, I also address challenges in tactical planning by providing a framework for computing adaptive strategies with heterogeneous resources. Lastly, I look at the problem of operational planning, which has never been formally studied in the security game literature. I propose a new solution concept of operationalizable strategies, which randomize over an optimally chosen subset of pure strategies whose cardinality is selected by the defender. I show hardness of computing such operationalizable strategies and provide an algorithm for computing ε-optimal equilibria which are operationalizable. In all of these problems, I am motivated by real-world challenges and by developing solution methods that are usable in the real world. As such, much of this work has been in collaboration with organizations such as Panthera, WWF, and other non-governmental organizations (NGOs), to help protect the national parks and wildlife against deforestation and poaching, and the TSA, to protect critical infrastructure such as our airports from terrorist attacks. Because of this, in addressing these three levels of planning, I develop solutions which are not only novel and academically interesting, but also deployable with a real-world impact. 
%G eng %9 PhD thesis %0 Conference Paper %B In COMPASS ’18: ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS), June 20–22, 2018, %D 2018 %T AirSim-W: A Simulation Environment for Wildlife Conservation with UAVs %A Bondi, Elizabeth %A Dey, Debadeepta %A Kapoor, Ashish %A Piavis, Jim %A Shah, Shital %A Fang, Fei %A Dilkina, Bistra %A Hannaford, Robert %A Iyer, Arvind %A Joppa, Lucas %A Tambe, Milind %X Increases in poaching levels have led to the use of unmanned aerial vehicles (UAVs or drones) to count animals, locate animals in parks, and even find poachers. Finding poachers is often done at night through the use of long wave thermal infrared cameras mounted on these UAVs. Unfortunately, monitoring the live video stream from the conservation UAVs all night is an arduous task. In order to assist in this monitoring task, new techniques in computer vision have been developed. This work is based on a dataset which took approximately six months to label. However, further improvement in detection and future testing of autonomous flight require not only more labeled training data, but also an environment where algorithms can be safely tested. In order to meet both goals efficiently, we present AirSim-W, a simulation environment that has been designed specifically for the domain of wildlife conservation. This includes (i) creation of an African savanna environment in Unreal Engine, (ii) integration of a new thermal infrared model based on radiometry, (iii) API code expansions to follow objects of interest or fly in zig-zag patterns to generate simulated training data, and (iv) demonstrated detection improvement using simulated data generated by AirSim-W. With these additional simulation features, AirSim-W will be directly useful for wildlife conservation research. %B In COMPASS ’18: ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS), June 20–22, 2018, %C Menlo Park and San Jose, CA, USA. 
ACM, New York, NY, USA %G eng %0 Thesis %D 2018 %T Artificial Intelligence for Low-Resource Communities: Influence Maximization in an Uncertain World %A Amulya Yadav %X The potential of Artificial Intelligence (AI) to tackle challenging problems that afflict society is enormous, particularly in the areas of healthcare, conservation and public safety and security. Many problems in these domains involve harnessing social networks of under-served communities to enable positive change, e.g., using social networks of homeless youth to raise awareness about Human Immunodeficiency Virus (HIV) and other STDs. Unfortunately, most of these real-world problems are characterized by uncertainties about social network structure and influence models, and previous research in AI fails to sufficiently address these uncertainties, as it makes several unrealistic simplifying assumptions for these domains. This thesis addresses these shortcomings by advancing the state-of-the-art to a new generation of algorithms for interventions in social networks. In particular, this thesis describes the design and development of new influence maximization algorithms which can handle various uncertainties that commonly exist in real-world social networks (e.g., uncertainty in social network structure, evolving network state, and availability of nodes to get influenced). These algorithms utilize techniques from sequential planning problems and social network theory to develop new kinds of AI algorithms. Further, this thesis also demonstrates the real-world impact of these algorithms by describing their deployment in three pilot studies to spread awareness about HIV among actual homeless youth in Los Angeles. This represents one of the first-ever deployments of computer-science-based influence maximization algorithms in this domain. Our results show that our AI algorithms improved upon the state-of-the-art by ∼160% in the real world. 
We discuss research and implementation challenges faced in deploying these algorithms, and lessons that can be gleaned for future deployment of such algorithms. The positive results from these deployments illustrate the enormous potential of AI in addressing societally relevant problems. %G eng %9 PhD thesis %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI-18) (Doctoral Consortium) %D 2018 %T AI for Conservation: Aerial Monitoring to Learn and Plan Against Illegal Actors %A Bondi, Elizabeth %X Conservation of our planet’s natural resources is of the utmost importance and requires constant innovation. This project focuses on innovation for one aspect of conservation: the reduction of wildlife poaching. Park rangers patrol parks to decrease poaching by searching for poachers and animal snares left by poachers. Multiple strategies exist to aid in these patrols, including adversary behavior prediction and planning optimal ranger patrol strategies. These research efforts suffer from a key shortcoming: they fail to integrate real-time data, and rely on historical data collected during ranger patrols. Recent advances in unmanned aerial vehicle (UAV) technology have made UAVs viable tools to aid in park ranger patrols. There is now an opportunity to augment the input for these strategies in real time using computer vision, by (i) automatically detecting both animals and poachers in UAV videos, (ii) using these detections to learn future poaching locations and to plan UAV patrol routes in real time, and (iii) using poaching location predictions to determine where to fly for the next patrol. In other words, detection is done on real-time data captured aboard a UAV. Detection will then be used to learn adversaries’ behaviors, or where poaching may occur in the future, in future work. This will then be used to plan where to fly in the long term, such as the next mission. 
Finally, planning where to fly next during the current flight will depend on the long term plan and the real-time detections. This proposed system directly improves wildlife security. Through our collaboration with Air Shepherd, a program of the Charles A. and Anne Morrow Lindbergh Foundation, we have already begun deploying poacher detection prototypes in Africa and will deploy further advances there in the future (Fig. 1). Furthermore, this also applies to similar surveillance tasks, such as locating people after natural disasters. %B International Joint Conference on Artificial Intelligence (IJCAI-18) (Doctoral Consortium) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %D 2018 %T Controlling Elections through Social Influence %A Bryan Wilder %A Vorobeychik, Yevgeniy %X Election control considers the problem of an adversary who attempts to tamper with a voting process, in order to either ensure that their favored candidate wins (constructive control) or another candidate loses (destructive control). As online social networks have become significant sources of information for potential voters, a new tool in an attacker’s arsenal is to effect control by harnessing social influence, for example, by spreading fake news and other forms of misinformation through online social media. We consider the computational problem of election control via social influence, studying the conditions under which finding good adversarial strategies is computationally feasible. We consider two objectives for the adversary in both the constructive and destructive control settings: probability and margin of victory (POV and MOV, respectively). We present several strong negative results, showing, for example, that the problem of maximizing POV is inapproximable for any constant factor. 
On the other hand, we present approximation algorithms which provide somewhat weaker approximation guarantees, such as bicriteria approximations for the POV objective and constant-factor approximations for MOV. Finally, we present mixed integer programming formulations for these problems. Experimental results show that our approximation algorithms often find near-optimal control strategies, indicating that election control through social influence is a salient threat to election integrity. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %G eng %0 Conference Paper %D 2018 %T Deceiving Cyber Adversaries: A Game Theoretic Approach %A Schlenker, Aaron %A Thakoor, Omkar %A Haifeng Xu %A Fang, Fei %A Tambe, Milind %A Long Tran-Thanh %A Vayanos, Phebe %A Vorobeychik, Yevgeniy %X An important way cyber adversaries find vulnerabilities in modern networks is through reconnaissance, in which they attempt to identify configuration specifics of network hosts. To increase uncertainty of adversarial reconnaissance, the network administrator (henceforth, defender) can introduce deception into responses to network scans, such as obscuring certain system characteristics. We introduce a novel game-theoretic model of deceptive interactions of this kind between a defender and a cyber attacker, which we call the Cyber Deception Game. We consider both a powerful (rational) attacker, who is aware of the defender’s exact deception strategy, and a naive attacker who is not. We show that computing the optimal deception strategy is NP-hard for both types of attackers. For the case with a powerful attacker, we provide a mixed-integer linear program solution as well as a fast and effective greedy algorithm. Similarly, we provide complexity results and propose exact and heuristic approaches when the attacker is naive. Our extensive experimental analysis demonstrates the effectiveness of our approaches. 
%G eng %0 Conference Paper %B International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research %D 2018 %T Designing Fair, Efficient, and Interpretable Policies for Prioritizing Homeless Youth for Housing Resources %A Azizi, Mohammad Javad %A Vayanos, Phebe %A Bryan Wilder %A Eric Rice %A Tambe, Milind %X We consider the problem of designing fair, efficient, and interpretable policies for prioritizing heterogeneous homeless youth on a waiting list for scarce housing resources of different types. We focus on point-based policies that use features of the housing resources (e.g., permanent supportive housing, rapid rehousing) and the youth (e.g., age, history of substance use) to maximize the probability that the youth will have a safe and stable exit from the housing program. The policies can be used to prioritize waitlisted youth each time a housing resource is procured. Our framework provides the policy-maker the flexibility to select both their desired structure for the policy and their desired fairness requirements. Our approach can thus explicitly trade off interpretability and efficiency while ensuring that fairness constraints are met. We propose a flexible data-driven mixed-integer optimization formulation for designing the policy, along with an approximate formulation which can be solved efficiently for broad classes of interpretable policies using Benders’ decomposition. We evaluate our framework using real-world data from the United States homeless youth housing system. We show that our framework results in policies that are more fair than the current policy in place and than classical interpretable machine learning approaches while achieving a similar (or higher) level of overall efficiency. 
%B International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-18) %D 2018 %T Equilibrium Computation and Robust Optimization in Zero Sum Games with Submodular Structure %A Bryan Wilder %X We define a class of zero-sum games with combinatorial structure, where the best response problem of one player is to maximize a submodular function. For example, this class includes security games played on networks, as well as the problem of robustly optimizing a submodular function over the worst case from a set of scenarios. The challenge in computing equilibria is that both players’ strategy spaces can be exponentially large. Accordingly, previous algorithms have worst-case exponential runtime and indeed fail to scale up on practical instances. We provide a pseudopolynomial-time algorithm which obtains a guaranteed (1 − 1/e)²-approximate mixed strategy for the maximizing player. Our algorithm only requires access to a weakened version of a best response oracle for the minimizing player which runs in polynomial time. Experimental results for network security games and a robust budget allocation problem confirm that our algorithm delivers near-optimal solutions and scales to much larger instances than was previously possible. %B AAAI conference on Artificial Intelligence (AAAI-18) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %D 2018 %T Equilibrium Refinement in Security Games with Arbitrary Scheduling Constraints %A Kai Wang %A Qingyu Guo %A Vayanos, Phebe %A Tambe, Milind %A An, Bo %X Significant research effort in security games has focused on devising strategies that perform well even when the attacker deviates from optimal (rational) behavior. In most of these frameworks, a price needs to be paid to ensure robustness against this unpredictability. 
However, equilibrium refinement is an attractive alternative that boosts solution robustness at no cost, though it has received comparatively little attention in the security games literature. In this framework, resources are strategically allocated to secure an optimal outcome against a rational adversary while simultaneously protecting other targets to ensure good outcomes against boundedly rational or constrained attackers. Unfortunately, existing approaches for equilibrium refinement in security games cannot effectively address scheduling constraints that arise frequently in real-world applications. In this paper, we aim to fill this gap and make several key contributions. First, we show that existing approaches for equilibrium refinement can fail in the presence of scheduling constraints. Second, we investigate the properties of the best response of the attacker. Third, we leverage these properties to devise novel iterative algorithms to compute the optimally refined equilibrium, with polynomially many calls to an LP oracle for zero-sum games. Finally, we conduct extensive experimental evaluations that showcase i) the superior performance of our approach in the face of a boundedly rational attacker and ii) the attractive scalability properties of our algorithm that can solve realistic-sized instances. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %G eng %0 Conference Paper %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) %D 2018 %T From Empirical Analysis to Public Policy: Evaluating Housing Systems for Homeless Youth %A Chan, Hau %A Eric Rice %A Vayanos, Phebe %A Tambe, Milind %A Morton, Matthew %X There are nearly 2 million homeless youth in the United States each year. Coordinated entry systems are being used to provide homeless youth with housing assistance across the nation.
Despite these efforts, the number of youth still homeless or unstably housed remains very high. Motivated by this fact, we initiate a first study to understand and analyze the current governmental housing systems for homeless youth. In this paper, we aim to provide answers to the following questions: (1) What is the current governmental housing system for assigning homeless youth to different housing assistance? (2) Can we infer the current assignment guidelines of the local housing communities? (3) What is the result and outcome of the current assignment process? (4) Can we predict whether the youth will be homeless after receiving the housing assistance? To answer these questions, we first provide an overview of the current housing systems. Next, we use simple and interpretable machine learning tools to infer the decision rules of the local communities and evaluate the outcomes of such assignments. We then determine whether the vulnerability features/rubrics can be used to predict youth’s homelessness status after receiving housing assistance. Finally, we discuss the policy recommendations from our study for the local communities. %B European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) %D 2018 %T Imbalanced Collusive Security Games %A Ou, Han-Ching %A Tambe, Milind %A Dilkina, Bistra %A Vayanos, Phebe %X Collusion among adversaries is a crucial challenge for defenders in many real-world applications. Previous literature has proposed Collusive Security Games (COSG) to model colluding adversaries, along with models and algorithms to generate defender strategies that counter them, often by devising strategies that inhibit collusion [6]. Unfortunately, this previous work focused exclusively on situations with perfectly matched adversaries, i.e., where their rewards were symmetrically distributed.
In the real world, however, defenders often face adversaries whose rewards are asymmetrically distributed. Such inherent asymmetry raises the question of whether human adversaries would attempt to collude in such situations, and whether defender strategies to counter such collusion should focus on inhibiting collusion. To address these open questions, this paper: (i) explores and theoretically analyzes Imbalanced Collusive Security Games (ICOSG), in which defenders face adversaries with asymmetrically distributed rewards; (ii) conducts extensive experiments with three different adversary models involving 1800 real human subjects; and (iii) derives a novel analysis of why boundedly rational attacker models outperform perfectly rational attacker models. The key principle discovered through our experiments is that careful modeling of human bounded rationality reveals a key difference (compared to a model using perfect rationality) in defender strategies for handling colluding adversaries facing symmetric versus asymmetric rewards. Whereas a model based on perfect rationality always attempts to break collusion among adversaries, a bounded rationality model acknowledges the inherent difficulty of breaking such collusion in symmetric situations; it therefore focuses on breaking collusion only in asymmetric situations, and on damage control from collusion in symmetric situations. %B Conference on Decision and Game Theory for Security (GameSec) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) [short paper] %D 2018 %T Inducible Equilibrium for Security Games (Extended Abstract) %A Qingyu Guo %A Jiarui Gan %A Fang, Fei %A Long Tran-Thanh %A Tambe, Milind %A An, Bo %X Strong Stackelberg equilibrium (SSE) is the standard solution concept of Stackelberg security games.
The SSE assumes that the follower breaks ties in favor of the leader; this assumption is widely acknowledged and justified by the assertion that the defender can often induce the attacker to choose a preferred action by making an infinitesimal adjustment to her strategy. Unfortunately, in security games with resource assignment constraints, the assertion might not be valid. To overcome this issue, inspired by the notion of inducibility and the pessimistic Stackelberg equilibrium [20, 21], this paper presents the inducible Stackelberg equilibrium (ISE), which is guaranteed to exist and avoids overoptimism, as the outcome can always be induced with infinitesimal strategy deviation. Experimental evaluation unveils the significant overoptimism and sub-optimality of SSE and thus verifies the advantage of the ISE as an alternative solution concept. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) [short paper] %G eng %0 Conference Paper %B Annual meeting of the Cognitive Science Society (CogSci) %D 2018 %T Learning about Cyber Deception through Simulations: Predictions of Human Decision Making with Deceptive Signals in Stackelberg Security Games %A Edward A. Cranford %A Lebiere, Christian %A Gonzalez, Cleotilde %A Cooney, Sarah %A Vayanos, Phebe %A Tambe, Milind %X To improve cyber defense, researchers have developed algorithms to allocate limited defense resources optimally. Through signaling theory, we have learned that it is possible to trick the human mind when using deceptive signals. The present work is an initial step towards developing a psychological theory of cyber deception. We use simulations to investigate how humans might make decisions under various conditions of deceptive signals in cyber-attack scenarios. We created an Instance-Based Learning (IBL) model of the attacker’s decisions using the ACT-R cognitive architecture.
We ran simulations against the optimal deceptive signaling algorithm and against four alternative deceptive signal schemes. Our results show that the optimal deceptive algorithm is more effective at reducing the probability of attack and protecting assets than the other signaling conditions, but it is not perfect. These results shed some light on the expected effectiveness of deceptive signals for defense. The implications of these findings are discussed. %B Annual meeting of the Cognitive Science Society (CogSci) %G eng %0 Journal Article %J Cityscape: A Journal of Policy Development and Research %D 2018 %T Linking Homelessness Vulnerability Assessments to Housing Placements and Outcomes for Youth %A Eric Rice %A Holguin, Monique %A Hsu, Hsun-Ta %A Morton, Matthew %A Vayanos, Phebe %A Tambe, Milind %A Chan, Hau %X Youth homelessness has reached a concerning level of prevalence in the United States. Many communities have attempted to address this problem by creating coordinated community responses, typically referred to as Coordinated Entry Systems (CES). In such systems, agencies within a community pool their housing resources in a centralized system. Youth seeking housing are first assessed for eligibility and vulnerability and then linked to appropriate housing resources. The most widely adopted tool for assessing youth vulnerability is the Transition Age Youth-Vulnerability Index-Service Prioritization Decision Assistance Tool (TAY-VI-SPDAT): Next Step Tool (NST) for homeless youth. To date, no evidence has been amassed to support the value of using this tool or its proposed scoring schematic to prioritize housing resources. Similarly, there is little evidence on the outcomes of youth whose placements are determined by the tool.
This article presents the first comprehensive and rigorous evaluation of the connection between vulnerability scores, housing placements, and stability of housing outcomes using data from the Homeless Management Information System (HMIS) collected between 2015 and 2017 from 16 communities across the United States. The two primary aims are (1) to investigate the degree to which communities are using the tool’s recommendations when placing youth into housing programs, and (2) to examine how effectively NST scores distinguish youth in greater need of formal housing interventions from youth who may be able to self-resolve or return to family successfully. High vulnerability scores at intake were associated with higher odds of continued homelessness without housing intervention, suggesting the tool performs well in predicting which youth need to be prioritized for housing services in the context of limited resources. The majority of low-scoring youth appear to return home or self-resolve and remain stably exited from homelessness. Youth placed in permanent supportive housing (PSH) had low recorded returns to homelessness, regardless of their NST score. Youth with vulnerability scores up to 10 who were placed in rapid rehousing (RRH) also had low returns to homelessness, but success was much more variable for higher-scoring youth. %B Cityscape: A Journal of Policy Development and Research %V 20 %G eng %N 3 %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI-18) %D 2018 %T Near Real-Time Detection of Poachers from Drones in AirSim %A Bondi, Elizabeth %A Kapoor, Ashish %A Dey, Debadeepta %A Piavis, James %A Shah, Shital %A Hannaford, Robert %A Iyer, Arvind %A Joppa, Lucas %A Tambe, Milind %X The unrelenting threat of poaching has led to increased development of new technologies to combat it.
One such example is the use of thermal infrared cameras mounted on unmanned aerial vehicles (UAVs or drones) to spot poachers at night and report them to park rangers before they are able to harm any animals. However, monitoring the live video stream from these conservation UAVs all night is an arduous task. Therefore, we discuss SPOT (Systematic POacher deTector), a novel application that augments conservation drones with the ability to automatically detect poachers and animals in near real time [Bondi et al., 2018b]. SPOT illustrates the feasibility of building upon state-of-the-art AI techniques, such as Faster R-CNN, to address the challenges of automatically detecting animals and poachers in infrared images. This paper reports (i) the design of SPOT, (ii) efficient processing techniques to ensure usability in the field, (iii) evaluation of SPOT based on historical videos and a real-world test run by the end-users, Air Shepherd, in the field, and (iv) the use of AirSim for live demonstration of SPOT. The promising results from a field test have led to a plan for larger-scale deployment in a national park in southern Africa. While SPOT is developed for conservation drones, its design and novel techniques have wider application for automated detection from UAV videos. %B International Joint Conference on Artificial Intelligence (IJCAI-18) %G eng %0 Conference Paper %B AAAI/ACM Conference on AI, Ethics, and Society (AIES) %D 2018 %T Partially Generative Neural Networks for Gang Crime Classification with Partial Information %A Seo, Sungyong %A Chan, Hau %A P. Jeffrey Brantingham %A Leap, Jorja %A Vayanos, Phebe %A Tambe, Milind %A Liu, Yan %X More than 1 million homicides, robberies, and aggravated assaults occur in the United States each year. These crimes are often further classified into different types based on the circumstances surrounding the crime (e.g., domestic violence, gang-related).
Despite recent technological advances in AI and machine learning, these additional classification tasks are still done manually by specially trained police officers. In this paper, we provide the first attempt to develop a more automatic system for classifying crimes. In particular, we study the question of classifying whether a given violent crime is gang-related. We introduce a novel Partially Generative Neural Network (PGNN) that is able to accurately classify gang-related crimes both when full information is available and when there is only partial information. Our PGNN is the first generative-classification model that can operate when some features of the test examples are missing. Using a crime event dataset from Los Angeles covering 2014-2016, we experimentally show that our PGNN outperforms all other typically used classifiers for the problem of classifying gang-related violent crimes. %B AAAI/ACM Conference on AI, Ethics, and Society (AIES) %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-18) %D 2018 %T Policy Learning for Continuous Space Security Games using Neural Networks %A Kamra, Nitin %A Gupta, Umang %A Fang, Fei %A Liu, Yan %A Tambe, Milind %X A wealth of algorithms centered around (integer) linear programming have been proposed to compute equilibrium strategies in security games with discrete states and actions. However, in practice many domains possess continuous state and action spaces. In this paper, we consider a continuous space security game model with infinite-size action sets for players and present a novel deep learning-based approach to extend the existing toolkit for solving security games.
Specifically, we present (i) OptGradFP, a novel and general algorithm that searches for the optimal defender strategy in a parameterized continuous search space, and can also be used to learn policies over multiple game states simultaneously; (ii) OptGradFP-NN, a convolutional neural network-based implementation of OptGradFP for continuous space security games. We demonstrate the potential to predict good defender strategies via experiments and analysis of OptGradFP and OptGradFP-NN in discrete and continuous game settings. %B AAAI conference on Artificial Intelligence (AAAI-18) %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-18) %D 2018 %T Preventing Infectious Disease in Dynamic Populations Under Uncertainty %A Bryan Wilder %A Suen, Sze-Chuan %A Tambe, Milind %X Treatable infectious diseases are a critical challenge for public health. Outreach campaigns can encourage undiagnosed patients to seek treatment but must be carefully targeted to make the most efficient use of limited resources. We present an algorithm to optimally allocate limited outreach resources among demographic groups in the population. The algorithm uses a novel multiagent model of disease spread which both captures the underlying population dynamics and is amenable to optimization. Our algorithm extends, with provable guarantees, to a stochastic setting where we have only a distribution over parameters such as the contact pattern between agents. We evaluate our algorithm on two instances where this distribution is inferred from real-world data: tuberculosis in India and gonorrhea in the United States. Our algorithm produces a policy which is predicted to avert an average of at least 8,000 person-years of tuberculosis and 20,000 person-years of gonorrhea annually compared to current policy.
%B AAAI conference on Artificial Intelligence (AAAI-18) %G eng %0 Conference Paper %B 27th International Joint Conference on Artificial Intelligence (IJCAI) %D 2018 %T The Price of Usability: Designing Operationalizable Strategies for Security Games %A Mc Carthy, Sara Marie %A Corine M. Laan %A Kai Wang %A Vayanos, Phebe %A Sinha, Arunesh %A Tambe, Milind %X We consider the problem of allocating scarce security resources among heterogeneous targets to thwart a possible attack. It is well known that deterministic solutions to this problem, being highly predictable, are severely suboptimal. To mitigate this predictability, the game-theoretic security game model was proposed, which randomizes over pure (deterministic) strategies, causing confusion in the adversary. Unfortunately, such mixed strategies typically randomize over a large number of strategies, requiring security personnel to be familiar with numerous protocols, making them hard to operationalize. Motivated by these practical considerations, we propose an easy-to-use approach for computing strategies that are easy to operationalize and that bridge the gap between the static solution and the optimal mixed strategy. These strategies only randomize over an optimally chosen subset of pure strategies whose cardinality is selected by the defender, enabling them to conveniently tune the trade-off between ease of operationalization and efficiency using a single design parameter. We show that the problem of computing such operationalizable strategies is NP-hard, formulate it as a mixed-integer optimization problem, provide an algorithm for computing ε-optimal equilibria, and an efficient heuristic. We evaluate the performance of our approach on the problem of screening for threats at airport checkpoints and show that the Price of Usability, i.e., the loss in optimality to obtain a strategy that is easier to operationalize, is typically not high.
%B 27th International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-18) %D 2018 %T Risk-Sensitive Submodular Optimization %A Bryan Wilder %X The conditional value at risk (CVaR) is a popular risk measure which enables risk-averse decision making under uncertainty. We consider maximizing the CVaR of a continuous submodular function, an extension of submodular set functions to a continuous domain. One example application is allocating a continuous amount of energy to each sensor in a network, with the goal of detecting intrusion or contamination. Previous work allows maximization of the CVaR of a linear or concave function. Continuous submodularity represents a natural set of nonconcave functions with diminishing returns, to which existing techniques do not apply. We give a (1 − 1/e)-approximation algorithm for maximizing the CVaR of a monotone continuous submodular function. This also yields an algorithm for submodular set functions which produces a distribution over feasible sets with guaranteed CVaR. Experimental results in two sensor placement domains confirm that our algorithm substantially outperforms competitive baselines. %B AAAI conference on Artificial Intelligence (AAAI-18) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) %D 2018 %T A Robust Optimization Approach to Designing Near-Optimal Strategies for Constant-Sum Monitoring Games %A Rahmattalabi, Aida %A Vayanos, Phebe %A Tambe, Milind %X We consider the problem of monitoring a set of targets, using scarce monitoring resources (e.g., sensors) that are subject to adversarial attacks. In particular, we propose a constant-sum Stackelberg game in which a defender (leader) chooses among possible monitoring locations, each covering a subset of targets, while taking into account the monitor failures induced by a resource-constrained attacker (follower). 
In contrast to previous Stackelberg security models in which the defender uses mixed strategies, here the defender must commit to pure strategies. This problem is highly intractable as both players’ strategy sets are exponentially large. Thus, we propose a solution methodology that automatically partitions the set of the adversary’s strategies and maps each subset to a coverage policy. These policies are such that they do not overestimate the defender’s payoff. We show that the partitioning problem can be reformulated exactly as a mixed-integer linear program (MILP) of moderate size which can be solved with off-the-shelf solvers. We demonstrate the effectiveness of our proposed approach in various settings. In particular, we illustrate that even with few policies, we are able to closely approximate the optimal solution and outperform the heuristic solutions. %B Conference on Decision and Game Theory for Security (GameSec) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) %D 2018 %T Scaling-up Stackelberg Security Games Applications using Approximations %A Sinha, Arunesh %A Schlenker, Aaron %A Dmello, Donnabell %A Tambe, Milind %X Stackelberg Security Games (SSGs) have been adopted widely for modeling adversarial interactions, wherein scalability of equilibrium computation is an important research problem. While prior research has made progress with regard to scalability, many real-world problems cannot yet be solved satisfactorily under current requirements; these include the deployed Federal Air Marshals (FAMS) application and the threat screening (TSG) problem at airports. We initiate a principled study of approximations in zero-sum SSGs.
Our contribution includes the following: (1) a unified model of SSGs called adversarial randomized allocation (ARA) games, (2) hardness of approximation for zero-sum ARA, as well as for the FAMS and TSG sub-problems, (3) an approximation framework for zero-sum ARA with instantiations for FAMS and TSG using intelligent heuristics, and (4) experiments demonstrating a significant 1000x improvement in runtime with an acceptable loss in solution quality. %B Conference on Decision and Game Theory for Security (GameSec) %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-18) %D 2018 %T Strategic Coordination of Human Patrollers and Mobile Sensors with Signaling for Security Games %A Haifeng Xu %A Kai Wang %A Vayanos, Phebe %A Tambe, Milind %X Traditional security games concern the optimal randomized allocation of human patrollers, who can directly catch attackers or interdict attacks. Motivated by the emerging application of utilizing mobile sensors (e.g., UAVs) for patrolling, in this paper we propose the novel Sensor-Empowered security Game (SEG) model which captures the joint allocation of human patrollers and mobile sensors. Sensors differ from patrollers in that they cannot directly interdict attacks, but they can notify nearby patrollers (if any). Moreover, SEGs incorporate mobile sensors’ natural functionality of strategic signaling. On the technical side, we first prove that solving SEGs is NP-hard even in zero-sum cases. We then develop a scalable algorithm SEGer based on the branch-and-price framework with two key novelties: (1) a novel MILP formulation for the slave problem; (2) an efficient relaxation of the problem for pruning. To further accelerate SEGer, we design a faster combinatorial algorithm for the slave problem, which is provably a constant-approximation to the slave problem in zero-sum cases and serves as a useful heuristic for general-sum SEGs. Our experiments demonstrate the significant benefit of utilizing mobile sensors.
%B AAAI conference on Artificial Intelligence (AAAI-18) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %D 2018 %T Activating the 'Breakfast Club': Modeling Influence Spread in Natural-World Social Networks %A Lily Hu %A Bryan Wilder %A Amulya Yadav %A Eric Rice %A Tambe, Milind %X While reigning models of diffusion have privileged the structure of a given social network as the key to informational exchange, real human interactions do not appear to take place on a single graph of connections. Using data collected from a pilot study of the spread of HIV awareness in social networks of homeless youth, we show that health information did not diffuse in the field according to the processes outlined by dominant models. Since physical network diffusion scenarios often diverge from their more well-studied counterparts on digital networks, we propose an alternative Activation Jump Model (AJM) that describes information diffusion on physical networks from a multi-agent team perspective. Our model exhibits two main differentiating features from leading cascade and threshold models of influence spread: 1) The structural composition of a seed set team impacts each individual node’s influencing behavior, and 2) an influencing node may spread information to non-neighbors. We show that the AJM significantly outperforms existing models in its fit to the observed node-level influence data on the youth networks. We then prove theoretical results, showing that the AJM exhibits many well-behaved properties shared by dominant models. Our results suggest that the AJM presents a flexible and more accurate model of network diffusion that may better inform influence maximization in the field. 
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2018) %D 2018 %T Adversary models account for imperfect crime data: Forecasting and planning against real-world poachers (Corrected Version) %A Gholami, Shahrzad %A Mc Carthy, Sara %A Dilkina, Bistra %A Plumptre, Andrew %A Tambe, Milind %A Driciru, Margaret %A Wanyama, Fred %A Rwetsiba, Aggrey %X Poachers are engaged in extinction-level wholesale slaughter, so it is critical to harness historical data for predicting poachers’ behavior. However, in these domains, data collected about adversarial actions are remarkably imperfect, where reported negative instances of crime may be mislabeled or uncertain. Unfortunately, past attempts to develop predictive and prescriptive models to address this problem suffer from shortcomings from a modeling perspective as well as in the implementability of their techniques. Most notably, these models i) neglect the uncertainty in crime data, leading to inaccurate and biased predictions of adversary behavior, ii) use coarse-grained crime analysis, and iii) do not provide a convincing evaluation as they only look at a single protected area. Additionally, they iv) propose time-consuming techniques which cannot be directly integrated into low-resource outposts. In this innovative application paper, we (I) introduce a novel imperfect-observation aWare Ensemble (iWare-E) technique, which is designed to handle the uncertainty in crime information efficiently. This approach leads to superior accuracy and efficiency for adversary behavior prediction compared to the previous state-of-the-art.
We also demonstrate the country-wide efficiency of the models and are the first to (II) evaluate our adversary behavioral model across different protected areas in Uganda, i.e., Murchison Falls and Queen Elizabeth National Parks (totaling about 7,500 km²), as well as (III) on fine-grained temporal resolutions. Lastly, (IV) we provide a scalable planning algorithm to design fine-grained patrol routes for the rangers, which achieves up to a 150% improvement in the number of predicted attacks detected. %B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2018) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2018 %T Bridging the Gap Between Theory and Practice in Influence Maximization: Raising Awareness about HIV among Homeless Youth %A Amulya Yadav %A Bryan Wilder %A Eric Rice %A Petering, Robin %A Craddock, Jaih %A Yoshioka-Maxwell, Amanda %A Hemler, Mary %A Onasch-Vera, Laura %A Tambe, Milind %A Woo, Darlene %X This paper reports on results obtained by deploying HEALER and DOSIM (two AI agents for social influence maximization) in the real world, which assist service providers in maximizing HIV awareness in real-world homeless-youth social networks. These agents recommend key “seed” nodes in social networks, i.e., homeless youth who would maximize HIV awareness in their real-world social network. While prior research on these agents published promising simulation results from the lab, the usability of these AI agents in the real world was unknown. This paper presents results from three real-world pilot studies involving 173 homeless youth across two different homeless shelters in Los Angeles. The results from these pilot studies illustrate that HEALER and DOSIM outperform the current modus operandi of service providers by ∼160% in terms of information spread about HIV among homeless youth.
%B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %D 2018 %T End-to-End Influence Maximization in the Field %A Bryan Wilder %A Onasch-Vera, Laura %A Hudson, Juliana %A Luna, Jose %A Wilson, Nicole %A Petering, Robin %A Woo, Darlene %A Tambe, Milind %A Eric Rice %X This work aims to overcome the challenges of deploying influence maximization to support community-driven interventions. Influence maximization is a crucial technique used in preventative health interventions, such as HIV prevention amongst homeless youth. Drop-in centers for homeless youth train a subset of youth as peer leaders who will disseminate information about HIV through their social networks. The challenge is to find a small set of peer leaders who will have the greatest possible influence. While many algorithms have been proposed for influence maximization, none can be feasibly deployed by a service provider: existing algorithms require costly surveys of the entire social network of the youth to provide input data, and high performance computing resources to run the algorithm itself. Both requirements are crucial bottlenecks to widespread use of influence maximization in real-world interventions. To address the above challenges, this innovative applications paper introduces the CHANGE agent for influence maximization. CHANGE handles the end-to-end process of influence maximization, from data collection to peer leader selection. Crucially, CHANGE only surveys a fraction of the youth to gather network data and minimizes computational cost while providing comparable performance to previously proposed algorithms. We carried out a pilot study of CHANGE in collaboration with a drop-in center serving homeless youth in a major U.S. city. CHANGE surveyed only 18% of the youth to construct its social network.
However, the peer leaders it selected reached just as many youth as previously field-tested algorithms which surveyed the entire network. This is the first real-world study of a network sampling algorithm for influence maximization. Simulation results on real-world networks also support our claims. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI-18) %D 2018 %T Maximizing Influence in an Unknown Social Network %A Bryan Wilder %A Nicole Immorlica %A Eric Rice %A Tambe, Milind %X In many real world applications of influence maximization, practitioners intervene in a population whose social structure is initially unknown. This poses a multiagent systems challenge to act under uncertainty about how the agents are connected. We formalize this problem by introducing exploratory influence maximization, in which an algorithm queries individual network nodes (agents) to learn their links. The goal is to locate a seed set nearly as influential as the global optimum using very few queries. We show that this problem is intractable for general graphs. However, real world networks typically have community structure, where nodes are arranged in densely connected subgroups. We present the ARISEN algorithm, which leverages community structure to find an influential seed set. Experiments on real world networks of homeless youth, village populations in India, and others demonstrate ARISEN’s strong empirical performance. To formally demonstrate how ARISEN exploits community structure, we prove an approximation guarantee for ARISEN on graphs drawn from the Stochastic Block Model. 
%B AAAI conference on Artificial Intelligence (AAAI-18) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %D 2018 %T Mitigating the Curse of Correlation in Security Games by Entropy Maximization (Extended Abstract) %A Haifeng Xu %A Dughmi, Shaddin %A Tambe, Milind %A Noronha, Venil Loyd %X In Stackelberg security games, a defender seeks to randomly allocate limited security resources to protect critical targets from an attack. In this paper, we study a fundamental, yet underexplored, phenomenon in security games, which we term the Curse of Correlation (CoC). Specifically, we observe that there are inevitable correlations among the protection status of different targets. Such correlation is a crucial concern, especially in spatio-temporal domains like conservation area patrolling, where attackers can surveil patrollers at certain areas and then infer their patrolling routes using such correlations. To mitigate this issue, we propose to design entropy-maximizing defending strategies for spatio-temporal security games, which frequently suffer from CoC. We prove that the problem is #P-hard in general. However, it admits efficient algorithms in well-motivated special settings. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %D 2018 %T Optimizing network structure for preventative health %A Bryan Wilder %A Ou, Han Ching %A de la Haye, Kayla %A Tambe, Milind %X Diseases such as heart disease, stroke, or diabetes affect hundreds of millions of people. Such conditions are strongly impacted by obesity, and establishing healthy lifestyle behaviors is a critical public health challenge with many applications. Changing health behaviors is inherently a multiagent problem since people’s behavior is strongly influenced by those around them. 
Hence, practitioners often attempt to modify the social network of a community by adding or removing edges in ways that will lead to desirable behavior change. To our knowledge, no previous work considers the algorithmic problem of finding the optimal set of edges to add and remove. We propose the RECONNECT algorithm, which efficiently finds high-quality solutions for a range of different network intervention problems. We evaluate RECONNECT in a highly realistic simulated environment based on the Antelope Valley region in California which draws on demographic, social, and health-related data. We find that RECONNECT outperforms an array of baseline policies, in some cases yielding a 150% improvement over the best alternative. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %G eng %0 Journal Article %J Journal of the Society for Social Work and Research, Volume 9, Number 4. %D 2018 %T Piloting the Use of Artificial Intelligence to Enhance HIV Prevention Interventions for Youth Experiencing Homelessness %A Eric Rice %A Yoshioka-Maxwell, Amanda %A Petering, Robin %A Onasch-Vera, Laura %A Craddock, Jaih %A Tambe, Milind %A Amulya Yadav %A Bryan Wilder %A Woo, Darlene %A Winetrobe, Hailey %A Wilson, Nicole %X Youth experiencing homelessness are at risk for HIV and need interventions to prevent risky sex behaviors. We tested the feasibility of using artificial intelligence (AI) to select peer change agents (PCAs) to deliver HIV prevention messages among youth experiencing homelessness. Method: We used a pretest–posttest quasi-experimental design. In the AI condition (n = 62), 11 PCAs were selected via an AI algorithm; in the popularity comparison (n = 55), 11 PCAs were selected 6 months later based on maximum degree centrality (most ties to others in the network). All PCAs were trained to promote HIV testing and condom use among their peers. Participants were clients at a drop-in center in Los Angeles, CA. HIV testing and condom use were assessed via a self-administered, computer-based survey at baseline (n = 117), 1 month (n = 86, 74%), and 3 months (n = 70, 60%). Results: At 3 months, rates of HIV testing increased among participants in the AI condition relative to the comparison group (18.8% vs. 8.1%), as did condom use during anal sex (12.1% vs. 3.3%) and vaginal sex (29.2% vs. 23.7%). 
Conclusions: AI-enhanced PCA intervention is a feasible method for engaging youth experiencing homelessness in HIV prevention. %B Journal of the Society for Social Work and Research, Volume 9, Number 4. %V 9 %G eng %N 4 %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %D 2018 %T Please be an influencer? Contingency Aware Influence Maximization %A Amulya Yadav %A Noothigattu, Ritesh %A Eric Rice %A Onasch-Vera, Laura %A Marcolino, Leandro %A Tambe, Milind %X Most previous work on influence maximization in social networks assumes that the chosen influencers (or seed nodes) can be influenced with certainty (i.e., with no contingencies). In this paper, we focus on using influence maximization in public health domains for assisting low-resource communities, where contingencies are common. It is very difficult in these domains to ensure that the seed nodes are influenced, as influencing them entails contacting/convincing them to attend training sessions, which may not always be possible. Unfortunately, previous state-of-the-art algorithms for influence maximization are unusable in this setting. This paper tackles this challenge via the following four contributions: (i) we propose the Contingency Aware Influence Maximization problem and analyze it theoretically; (ii) we cast this problem as a Partially Observable Markov Decision Process and propose CAIMS (a novel POMDP planner) to solve it, which leverages a natural action space factorization associated with real-world social networks; and (iii) we provide extensive simulation results to compare CAIMS with existing state-of-the-art influence maximization algorithms. Finally, (iv) we provide results from a real-world feasibility trial conducted to evaluate CAIMS, in which key influencers in homeless youth social networks were influenced in order to spread awareness about HIV. 
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS-18) %G eng %0 Conference Proceedings %B Proceedings of the Thirtieth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-18) %D 2018 %T SPOT Poachers in Action: Augmenting Conservation Drones with Automatic Detection in Near Real Time %A Bondi, Elizabeth %A Fang, Fei %A Hamilton, Mark %A Kar, Debarun %A Dmello, Donnabell %A Choi, Jongmoo %A Hannaford, Robert %A Iyer, Arvind %A Lucas Jopp %A Tambe, Milind %A Ram Nevatia %X The unrelenting threat of poaching has led to increased development of new technologies to combat it. One such example is the use of long wave thermal infrared cameras mounted on unmanned aerial vehicles (UAVs or drones) to spot poachers at night and report them to park rangers before they are able to harm animals. However, monitoring the live video stream from these conservation UAVs all night is an arduous task. Therefore, we build SPOT (Systematic POacher deTector), a novel application that augments conservation drones with the ability to automatically detect poachers and animals in near real time. SPOT illustrates the feasibility of building upon state-of-the-art AI techniques, such as Faster RCNN, to address the challenges of automatically detecting animals and poachers in infrared images. This paper reports (i) the design and architecture of SPOT, (ii) a series of efforts towards more robust and faster processing to make SPOT usable in the field and provide detections in near real time, and (iii) evaluation of SPOT based on both historical videos and a real-world test run by the end users in the field. The promising results from the test in the field have led to a plan for larger-scale deployment in a national park in Botswana. While SPOT is developed for conservation drones, its design and novel techniques have wider application for automated detection from UAV videos. 
%B Proceedings of the Thirtieth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-18) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2018 %T Stackelberg Security Games: Looking Beyond a Decade of Success %A Sinha, Arunesh %A Fang, Fei %A An, Bo %A Kiekintveld, Christopher %A Tambe, Milind %X The Stackelberg Security Game (SSG) model has been immensely influential in security research since it was introduced roughly a decade ago. Furthermore, deployed SSG-based applications are one of the most successful examples of game theory applications in the real world. We present a broad survey of recent technical advances in SSG and related literature, and then look to the future by highlighting the new potential applications and open research problems in SSG. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B Advances in Cognitive Systems %D 2017 %T Believe It or Not: Modeling Adversary Belief Formation in Stackelberg Security Games with Varying Information %A Kar, Debarun %A Subhasree Sengupta %A Ece Kamar %A Eric Horvitz %A Tambe, Milind %X There has been a significant amount of research in Stackelberg Security Games (SSG), and a common assumption in that literature is that the adversary perfectly observes the defender’s mixed strategy. However, in real-world settings the adversary can only observe a sequence of defender pure strategies sampled from the actual mixed strategy. Therefore, a key challenge is the modeling of the adversary’s belief formation based on such limited observations. The SSG literature lacks a comparative analysis of these models and a principled study of their strengths and weaknesses. In this paper, we study the following shortcomings of previous work and introduce new models that address these shortcomings. 
First, we address the lack of empirical evaluation or head-to-head comparison of existing models by conducting the first-of-its-kind systematic comparison of existing and newly proposed models on belief data collected from human subjects on Amazon Mechanical Turk. Second, we show that assuming a homogeneous population of adversaries, a common assumption in the literature, is unrealistic based on our experiments, which highlight four heterogeneous groups of adversaries with distinct belief update mechanisms. We present new models that address this shortcoming by clustering and learning these disparate behaviors from data when available. Third, we quantify the value of having historical data on the accuracy of belief prediction. %B Advances in Cognitive Systems %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2017 %T Don’t Bury your Head in Warnings: A Game-Theoretic Approach for Intelligent Allocation of Cyber-security Alerts %A A Schlenker %A Xu, H %A Guirguis, M %A Kiekintveld, C %A Sinha, A %A Tambe, M %A S Sonya %A Balderas, D %A Dunstatter, N %X In recent years, there have been a number of successful cyber attacks on enterprise networks by malicious actors. These attacks generate alerts which must be investigated by cyber analysts to determine if they are an attack. Unfortunately, there are orders of magnitude more alerts than cyber analysts - a trend expected to continue into the future, creating a need to find optimal assignments of the incoming alerts to analysts in the presence of a strategic adversary. 
We address this challenge with the following four contributions: (1) a cyber allocation game (CAG) model for the cyber network protection domain, (2) an NP-hardness proof for computing the optimal strategy for the defender, (3) techniques to find the optimal allocation of experts to alerts in CAG in the general case and key special cases, and (4) heuristics to achieve significant scale-up in CAGs with minimal loss in solution quality. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Journal Article %J Journal of Agents and Multiagent Systems (JAAMAS) (To appear) %D 2017 %T Every Team Deserves a Second Chance: An extended study on predicting team performance %A Marcolino, Leandro Soriano %A Aravind S. Lakshminarayanan %A Nagarajan, Vaishnavh %A Tambe, Milind %X Voting among different agents is a powerful tool in problem solving, and it has been widely applied to improve the performance in finding the correct answer to complex problems. We present a novel benefit of voting that has not been observed before: we can use the voting patterns to assess the performance of a team and predict their final outcome. This prediction can be executed at any moment during problem-solving and it is completely domain independent. Hence, it can be used to identify when a team is failing, allowing an operator to take remedial procedures (such as changing team members, the voting rule, or increasing the allocation of resources). 
We present three main theoretical results: (i) we show a theoretical explanation of why our prediction method works; (ii) contrary to what would be expected based on a simpler explanation using classical voting models, we show that we can make accurate predictions irrespective of the strength (i.e., performance) of the teams, and that in fact, the prediction can work better for diverse teams composed of different agents than uniform teams made of copies of the best agent; (iii) we show that the quality of our prediction increases with the size of the action space. We perform extensive experimentation in two different domains: Computer Go and Ensemble Learning. In Computer Go, we obtain high quality predictions about the final outcome of games. We analyze the prediction accuracy for three different teams with different levels of diversity and strength, and show that the prediction works significantly better for a diverse team. Additionally, we show that our method still works well when trained with games against one adversary, but tested with games against another, showing the generality of the learned functions. Moreover, we evaluate four different board sizes, and experimentally confirm better predictions in larger board sizes. We analyze in detail the learned prediction functions, and how they change according to each team and action space size. In order to show that our method is domain independent, we also present results in Ensemble Learning, where we make online predictions about the performance of a team of classifiers, while they are voting to classify sets of items. We study a set of classical classification algorithms from machine learning, in a data-set of hand-written digits, and we are able to make high-quality predictions about the final performance of two different teams. Since our approach is domain independent, it can be easily applied to a variety of other domains. 
%B Journal of Agents and Multiagent Systems (JAAMAS) (To appear) %G eng %0 Conference Proceedings %B Proc. of AAAI Fall Symposium Series on Cognitive Assistance in Government and Public Sector Applications, 2017 %D 2017 %T Evidence From the Past: AI Decision Aids to Improve Housing Systems for Homeless Youth %A Chan, Hau %A Eric Rice %A Vayanos, Phebe %A Tambe, Milind %A Morton, Matthew %X Could an AI decision aid improve housing systems that assist homeless youth? There are nearly 2 million homeless youth in the United States each year. Coordinated entry systems are being used to provide homeless youth with housing assistance across the nation. Despite these efforts, the number of homeless youth still living on the street remains very high. Motivated by this fact, we initiate a first study to create AI decision aids for improving the current housing systems for homeless youth. First, we determine whether the current rubric for prioritizing youth for housing assistance can be used to predict youth’s homelessness status after receiving housing assistance. We then consider building better AI decision aids and predictive models using other components of the rubric. We believe there is much potential for effective human-machine collaboration in the context of housing allocation. We plan to work with HUD and local communities to develop such systems in the future. %B Proc. of AAAI Fall Symposium Series on Cognitive Assistance in Government and Public Sector Applications, 2017 %G eng %0 Magazine Article %D 2017 %T Keeping it Real: Using Real-World Problems to Teach AI to Diverse Audiences %A Sintov, Nicole %A Kar, Debarun %A Nguyen, Thanh %A Fang, Fei %A Hoffman, Kevin %A Lyet, Arnaud %A Tambe, Milind %X In recent years, AI-based applications have increasingly been used in real-world domains. For example, game theory-based decision aids have been successfully deployed in various security settings to protect ports, airports, and wildlife. 
This paper describes our unique problem-to-project educational approach that used games rooted in real-world issues to teach AI concepts to diverse audiences. Specifically, our educational program began by presenting real-world security issues, and progressively introduced complex AI concepts using lectures, interactive exercises, and ultimately hands-on games to promote learning. We describe our experience in applying this approach to several audiences, including students of an urban public high school, university undergraduates, and security domain experts who protect wildlife. We evaluated our approach based on results from the games and participant surveys. %B AI Magazine (To appear) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) 2017 %D 2017 %T Optimal Patrol Planning for Green Security Games with Black-Box Attackers %A Haifeng Xu %A Ford, Benjamin %A Fang, Fei %A Dilkina, Bistra %A Plumptre, Andrew %A Tambe, Milind %A Driciru, Margaret %A Wanyama, Fred %A Rwetsiba, Aggrey %A Nsubaga, Mustapha %A Mabonga, Joshua %X Motivated by the problem of protecting endangered animals, there has been a surge of interests in optimizing patrol planning for conservation area protection. Previous efforts in these domains have mostly focused on optimizing patrol routes against a specific boundedly rational poacher behavior model that describes poachers’ choices of areas to attack. However, these planning algorithms do not apply to other poaching prediction models, particularly, those complex machine learning models which are recently shown to provide better prediction than traditional bounded-rationality-based models. Moreover, previous patrol planning algorithms do not handle the important concern whereby poachers infer the patrol routes by partially monitoring the rangers’ movements. 
In this paper, we propose OPERA, a general patrol planning framework that: (1) generates optimal implementable patrolling routes against a black-box attacker which can represent a wide range of poaching prediction models; (2) incorporates entropy maximization to ensure that the generated routes are more unpredictable and robust to poachers’ partial monitoring. Our experiments on a real-world dataset from Uganda’s Queen Elizabeth Protected Area (QEPA) show that OPERA results in better defender utility, more efficient coverage of the area and more unpredictability than benchmark algorithms and the past routes used by rangers at QEPA. %B Conference on Decision and Game Theory for Security (GameSec) 2017 %G eng %0 Journal Article %J IBM Journal of Research and Development (To appear) %D 2017 %T Predicting Poaching for Wildlife Protection %A F. Fang %A T. H. Nguyen %A A. Sinha %A S. Gholami %A A. Plumptre %A L. Joppa %A M. Tambe %A M. Driciru %A F. Wanyama %A A. Rwetsiba %A R. Critchlow %A C. M. Beale %X Wildlife species such as tigers and elephants are under the threat of poaching. To combat poaching, conservation agencies (“defenders”) need to (1) anticipate where the poachers are likely to poach and (2) plan effective patrols. We propose an anti-poaching tool CAPTURE (Comprehensive Anti-Poaching tool with Temporal and observation Uncertainty REasoning), which helps the defenders achieve both goals. CAPTURE builds a novel hierarchical model for poacher-patroller interaction. It considers the patroller’s imperfect detection of signs of poaching, the complex temporal dependencies in the poacher's behaviors and the defender’s lack of knowledge of the number of poachers. Further, CAPTURE uses a new game-theoretic algorithm to compute the optimal patrolling strategies and plan effective patrols. This paper investigates the computational challenges that CAPTURE faces. 
First, we present a detailed analysis of parameter separation and target abstraction, two novel approaches used by CAPTURE to efficiently learn the parameters in the hierarchical model. Second, we propose two heuristics – piece-wise linear approximation and greedy planning – to speed up the computation of the optimal patrolling strategies. We discuss in this paper the lessons learned from using CAPTURE to analyze real-world poaching data collected over 12 years in Queen Elizabeth National Park in Uganda. %B IBM Journal of Research and Development (To appear) %G eng %0 Conference Paper %B AAMAS International Workshop on Optimization in Multi-Agent Systems (OPTMAS) %D 2017 %T Robust, dynamic influence maximization %A Bryan Wilder %A Amulya Yadav %A Nicole Immorlica %A Eric Rice %A Tambe, Milind %X This paper focuses on new challenges in influence maximization inspired by non-profits’ use of social networks to effect behavioral change in their target populations. Influence maximization is a multiagent problem where the challenge is to select the most influential agents from a population connected by a social network. Specifically, our work is motivated by the problem of spreading messages about HIV prevention among homeless youth using their social network. We show how to compute solutions which are provably close to optimal when the parameters of the influence process are unknown. We then extend our algorithm to a dynamic setting where information about the network is revealed at each stage. Simulation experiments using real world networks collected by the homeless shelter show the advantages of our approach. 
%B AAMAS International Workshop on Optimization in Multi-Agent Systems (OPTMAS) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2017 %T Staying Ahead of the Game: Adaptive Robust Optimization for Dynamic Allocation of Threat Screening Resources %A Mc Carthy, Sara Marie %A Vayanos, Phebe %A Tambe, Milind %X We consider the problem of dynamically allocating screening resources of different efficacies (e.g., magnetic or X-ray imaging) at checkpoints (e.g., at airports or ports) to successfully avert an attack by one of the screenees. Previously, the Threat Screening Game model was introduced to address this problem under the assumption that screenee arrival times are perfectly known. In reality, arrival times are uncertain, which severely impedes the implementability and performance of this approach. We thus propose a novel framework for dynamic allocation of threat screening resources that explicitly accounts for uncertainty in the screenee arrival times. We model the problem as a multistage robust optimization problem and propose a tractable solution approach using compact linear decision rules combined with robust reformulation and constraint randomization. We perform extensive numerical experiments which showcase that our approach outperforms (a) exact solution methods in terms of tractability, while incurring only a very minor loss in optimality, and (b) methods that ignore uncertainty in terms of both feasibility and optimality. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %D 2017 %T Towards a Game-theoretic Framework for Intelligent Cyber-security Alert Allocation %A A Schlenker %A Xu, H %A Kiekintveld, C %A Sinha, A %A Tambe, M %A Guirguis, M %A S Sonya %A Balderas, D %A Dunstatter, N %X In recent years, there have been a number of successful cyber attacks on enterprise networks by malicious actors. 
These attacks generate alerts which must be investigated by cyber analysts to determine if they are an attack. Unfortunately, there are orders of magnitude more alerts than cyber analysts - a trend expected to continue into the future, creating a need to find optimal assignments of the incoming alerts to analysts in the presence of a strategic adversary. We address this challenge with the following four contributions: (1) a cyber allocation game (CAG) model for the cyber network protection domain, (2) an NP-hardness proof for computing the optimal strategy for the defender, (3) techniques to find the optimal allocation of experts to alerts in CAG in the general case and key special cases, and (4) heuristics to achieve significant scale-up in CAGs with minimal loss in solution quality. %G eng %0 Journal Article %J IBM Journal of Research and Development (To appear) %D 2017 %T Using Social Networks to Raise HIV Awareness Among Homeless Youth %A A. Yadav %A H. Chan %A A.X. Jiang %A Xu, H. %A E. Rice %A R. Petering %A M. Tambe %X Many homeless shelters conduct interventions to raise awareness about HIV (human immunodeficiency virus) among homeless youth. Due to human and financial resource shortages, these shelters need to choose intervention attendees strategically, in order to maximize awareness through the homeless youth social network. In this work, we propose HEALER (hierarchical ensembling based agent which plans for effective reduction in HIV spread), an agent that recommends sequential intervention plans for use by homeless shelters. HEALER's sequential plans (built using knowledge of homeless youth social networks) select intervention participants strategically to maximize influence spread, by solving POMDPs (partially observable Markov decision process) on social networks using heuristic ensemble methods. This paper explores the motivations behind HEALER’s design, and analyzes HEALER’s performance in simulations on real-world networks. 
First, we provide a theoretical analysis of the DIME (dynamic influence maximization under uncertainty) problem, the main computational problem that HEALER solves. HEALER relies on heuristic methods for solving the DIME problem due to its computational hardness. Second, we explain why heuristics used inside HEALER work well on real-world networks. Third, we present results comparing HEALER to baseline algorithms augmented by HEALER’s heuristics. HEALER is currently being tested in real-world pilot studies with homeless youth in Los Angeles. %B IBM Journal of Research and Development (To appear) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2017 %T Cloudy with a Chance of Poaching: Adversary Behavior Modeling and Forecasting with Real-World Poaching Data %A Kar, Debarun %A Ford, Benjamin %A Gholami, Shahrzad %A Fang, Fei %A Plumptre, Andrew %A Tambe, Milind %A Driciru, Margaret %A Wanyama, Fred %A Rwetsiba, Aggrey %X Wildlife conservation organizations task rangers to deter and capture wildlife poachers. Since rangers are responsible for patrolling vast areas, adversary behavior modeling can help more effectively direct future patrols. In this innovative application track paper, we present an adversary behavior modeling system, INTERCEPT (INTERpretable Classification Ensemble to Protect Threatened species), and provide the most extensive evaluation in the AI literature of one of the largest poaching datasets from Queen Elizabeth National Park (QENP) in Uganda, comparing INTERCEPT with its competitors; we also present results from a month-long test of INTERCEPT in the field. We present three major contributions. First, we present a paradigm shift in modeling and forecasting wildlife poacher behavior. Some of the latest work in the AI literature (and in Conservation) has relied on models similar to the Quantal Response model from Behavioral Game Theory for poacher behavior prediction. 
In contrast, INTERCEPT presents a behavior model based on an ensemble of decision trees (i) that more effectively predicts poacher attacks and (ii) that is more interpretable and verifiable. We augment this model to account for spatial correlations and construct an ensemble of the best models, significantly improving performance. Second, we conduct an extensive evaluation on the QENP dataset, comparing 41 models in prediction performance over two years. Third, we present the results of deploying INTERCEPT for a one-month field test in QENP - a first for adversary behavior modeling applications in this domain. This field test has led to finding a poached elephant and more than a dozen snares (including a roll of elephant snares) before they were deployed, potentially saving the lives of multiple animals - including endangered elephants. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B 3rd International Workshop on Social Influence Analysis %D 2017 %T Explanation Systems for Influence Maximization Algorithms %A Amulya Yadav %A Rahmattalabi, Aida %A Ece Kamar %A Vayanos, Phebe %A Tambe, Milind %A Noronha, Venil Loyd %X The field of influence maximization (IM) has made rapid advances, resulting in many sophisticated algorithms for identifying “influential” members in social networks. However, in order to engender trust in IM algorithms, the rationale behind their choice of “influential” nodes needs to be explained to their users. This is a challenging open problem that needs to be solved before these algorithms can be deployed on a large scale. 
This paper attempts to tackle this open problem via four major contributions: (i) we propose a general paradigm for designing explanation systems for IM algorithms by exploiting the tradeoff between explanation accuracy and interpretability; our paradigm treats IM algorithms as black boxes, and is flexible enough to be used with any algorithm; (ii) we utilize this paradigm to build XplainIM, a suite of explanation systems; (iii) we illustrate the usability of XplainIM by explaining solutions of HEALER (a recent IM algorithm) among ∼200 human subjects on Amazon Mechanical Turk (AMT); and (iv) we provide extensive evaluation of our AMT results, which shows the effectiveness of XplainIM. %B 3rd International Workshop on Social Influence Analysis %G eng %0 Conference Paper %B IWAISe-17: 1st International Workshop on A.I. in Security held at the International Joint Conference on Artificial Intelligence %D 2017 %T Handling Continuous Space Security Games with Neural Networks %A Kamra, Nitin %A Fang, Fei %A Kar, Debarun %A Liu, Yan %A Tambe, Milind %X Despite significant research in Security Games, limited efforts have been made to handle game domains with continuous space. Addressing such limitations, in this paper we propose: (i) a continuous space security game model that considers infinite-size action spaces for players; (ii) OptGradFP, a novel and general algorithm that searches for the optimal defender strategy in a parametrized search space; (iii) OptGradFP-NN, a convolutional neural network based implementation of OptGradFP for continuous space security games; (iv) experiments and analysis with OptGradFP-NN. This is the first time that neural networks have been used for security games, and it shows the promise of applying deep learning to complex security games which previous approaches fail to handle. %B IWAISe-17: 1st International Workshop on A.I. 
in Security held at the International Joint Conference on Artificial Intelligence %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %D 2017 %T Influence Maximization in the Field: The Arduous Journey from Emerging to Deployed Application %A Amulya Yadav %A Bryan Wilder %A Eric Rice %A Petering, Robin %A Craddock, Jaih %A Yoshioka-Maxwell, Amanda %A Hemler, Mary %A Onasch-Vera, Laura %A Tambe, Milind %A Woo, Darlene %X This paper focuses on a topic that is insufficiently addressed in the literature, i.e., challenges faced in transitioning agents from an emerging phase in the lab, to a deployed application in the field. Specifically, we focus on challenges faced in transitioning HEALER and DOSIM, two agents for social influence maximization, which assist service providers in maximizing HIV awareness in real-world homeless-youth social networks. These agents recommend key "seed" nodes in social networks, i.e., homeless youth who would maximize HIV awareness in their real-world social network. While prior research on these agents published promising simulation results from the lab, this paper illustrates that transitioning these agents from the lab into the real world is not straightforward, and outlines three major lessons. First, it is important to conduct real-world pilot tests; indeed, due to the health-critical nature of the domain and complex influence spread models used by these agents, it is important to conduct field tests to ensure the real-world usability and effectiveness of these agents. We present results from three real-world pilot studies, involving 173 homeless youth in an American city. These are the first such pilot studies which provide head-to-head comparison of different agents for social influence maximization, including a comparison with a baseline approach. 
Second, we present analyses of these real-world results, illustrating the strengths and weaknesses of different influence maximization approaches we compare. Third, we present research and deployment challenges revealed in conducting these pilot tests, and propose solutions to address them. These challenges and proposed solutions are instructive in assisting the transition of agents focused on social influence maximization from the emerging to the deployed application phase. %B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2017 %T Maximizing Awareness about HIV in Social Networks of Homeless Youth with Limited Information %A Amulya Yadav %A Chan, Hau %A Xin Jiang, Albert %A Haifeng Xu %A Eric Rice %A Tambe, Milind %X This paper presents HEALER, a software agent that recommends sequential intervention plans for use by homeless shelters, who organize these interventions to raise awareness about HIV among homeless youth. HEALER’s sequential plans (built using knowledge of social networks of homeless youth) choose intervention participants strategically to maximize influence spread, while reasoning about uncertainties in the network. While previous work presents influence maximizing techniques to choose intervention participants, they do not address two real-world issues: (i) they completely fail to scale up to real-world sizes; and (ii) they do not handle deviations in execution of intervention plans. HEALER handles these issues via two major contributions: (i) HEALER casts this influence maximization problem as a POMDP and solves it using a novel planner which scales up to previously unsolvable real-world sizes; and (ii) HEALER allows shelter officials to modify its recommendations, and updates its future plans in a deviation-tolerant manner. HEALER was deployed in the real world in Spring 2016 with considerable success. 
%B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Magazine Article %D 2017 %T PAWS: A Deployed Game-Theoretic Application to Combat Poaching %A Fang, Fei %A Thanh H. Nguyen %A Pickles, Rob %A Wai Y. Lam %A Gopalasamy R. Clements %A An, Bo %A Singh, Amandeep %A Brian C. Schwedock %A Tambe, Milind %A Lemieux, Andrew %X Poaching is considered a major driver for the population drop of key species such as tigers, elephants, and rhinos, which can be detrimental to whole ecosystems. While conducting foot patrols is the most commonly used approach in many countries to prevent poaching, such patrols often do not make the best use of the limited patrolling resources. This paper presents PAWS, a game-theoretic application deployed in Southeast Asia for optimizing foot patrols to combat poaching. In this paper, we report on the significant evolution of PAWS from a proposed decision aid introduced in 2014 to a regularly deployed application. We outline key technical advances that lead to PAWS’s regular deployment: (i) incorporating complex topographic features, e.g., ridgelines, in generating patrol routes; (ii) handling uncertainties in species distribution (game theoretic payoffs); (iii) ensuring scalability for patrolling large-scale conservation areas with fine-grained guidance; and (iv) handling complex patrol scheduling constraints. %B AI Magazine 38(1):23-36 %G eng %0 Thesis %D 2017 %T Real-World Evaluation and Deployment of Wildlife Crime Prediction Models %A Ford, Benjamin %X Conservation agencies worldwide must make the most efficient use of their limited resources to protect natural resources from over-harvesting and animals from poaching. Predictive modeling, a tool to increase efficiency, is seeing increased usage in conservation domains such as to protect wildlife from poaching. Many works in this wildlife protection domain, however, fail to train their models on real-world data or test their models in the real world. 
My thesis proposes novel poacher behavior models that are trained on real-world data and are tested via first-of-their-kind tests in the real world. First, I proposed a paradigm shift in traditional adversary behavior modeling techniques from Quantal Response-based models to decision tree-based models. Based on this shift, I proposed an ensemble of spatially-aware decision trees, INTERCEPT, that outperformed the prior state-of-the-art and then also presented results from a one-month pilot field test of the ensemble’s predictions in Uganda’s Queen Elizabeth Protected Area (QEPA). This field test represented the first time that a machine learning-based poacher behavior modeling application was tested in the field. Second, I proposed a hybrid spatio-temporal model that led to further performance improvements. To validate this model, I designed and conducted a large-scale, eight-month field test of this model’s predictions in QEPA. This field test, where rangers patrolled over 450 km in the largest and longest field test of a machine learning-based poacher behavior model to date in this domain, successfully demonstrated the selectiveness of the model’s predictions; the model successfully predicted, with statistical significance, where rangers would find more snaring activity and also where rangers would not find as much snaring activity. I also conducted detailed analysis of the behavior of my predictive model. Third, beyond wildlife poaching, I also provided novel graph-aware models for modeling human adversary behavior in wildlife or other contraband smuggling networks and tested them against human subjects. Lastly, I examined human considerations of deployment in new domains and the importance of easily-interpretable models and results. 
While such interpretability has been a recurring theme in all my thesis work, I also created a game-theoretic inspection strategy application that generated randomized factory inspection schedules and also contained visualization and explanation components for users. %G eng %9 PhD thesis %0 Conference Paper %B The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2017 Applied Data Science Track) %D 2017 %T Taking it for a Test Drive: A Hybrid Spatio-temporal Model for Wildlife Poaching Prediction Evaluated through a Controlled Field Test %A S Gholami %A B Ford %A F Fang %A A Plumptre %A Tambe, M %A M Driciru %A F Wanyama %A A Rwetsiba %A M Nsubaga %A J Mabonga %X Worldwide, conservation agencies employ rangers to protect conservation areas from poachers. However, agencies lack the manpower to have rangers effectively patrol these vast areas frequently. While past work has modeled poachers’ behavior so as to aid rangers in planning future patrols, those models’ predictions were not validated by extensive field tests. In this paper, we present a hybrid spatio-temporal model that predicts poaching threat levels and results from a five-month field test of our model in Uganda’s Queen Elizabeth Protected Area (QEPA). To our knowledge, this is the first time that a predictive model has been evaluated through such an extensive field test in this domain. We present two major contributions. First, our hybrid model consists of two components: (i) an ensemble model which can work with the limited data common to this domain and (ii) a spatio-temporal model to boost the ensemble’s predictions when sufficient data are available. When evaluated on real-world historical data from QEPA, our hybrid model achieves significantly better performance than previous approaches with either temporally-aware dynamic Bayesian networks or an ensemble of spatially-aware models. 
Second, in collaboration with the Wildlife Conservation Society and Uganda Wildlife Authority, we present results from a five-month controlled experiment where rangers patrolled over 450 sq km across QEPA. We demonstrate that our model successfully predicted (1) where snaring activity would occur and (2) where it would not occur; in areas where we predicted a high rate of snaring activity, rangers found more snares and snared animals than in areas of lower predicted activity. These findings demonstrate that (1) our model’s predictions are selective, (2) our model’s superior laboratory performance extends to the real world, and (3) these predictive models can aid rangers in focusing their efforts to prevent wildlife poaching and save animals. %B The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2017 Applied Data Science Track) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %D 2017 %T Uncharted but not Uninfluenced: Influence Maximization with an Uncertain Network %A Bryan Wilder %A Amulya Yadav %A Nicole Immorlica %A Eric Rice %A Tambe, Milind %X This paper focuses on new challenges in influence maximization inspired by non-profits’ use of social networks to effect behavioral change in their target populations. Influence maximization is a multiagent problem where the challenge is to select the most influential agents from a population connected by a social network. Specifically, our work is motivated by the problem of spreading messages about HIV prevention among homeless youth using their social network. We show how to compute solutions which are provably close to optimal when the parameters of the influence process are unknown. We then extend our algorithm to a dynamic setting where information about the network is revealed at each stage. 
Simulation experiments using real-world networks collected by the homeless shelter show the advantages of our approach. %B International Conference on Autonomous Agents and Multi-agent Systems (AAMAS) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) 2017 %D 2017 %T VIOLA: Video Labeling Application for Security Domains %A Bondi, Elizabeth %A Fang, Fei %A Kar, Debarun %A Noronha, Venil %A Dmello, Donnabell %A Tambe, Milind %A Iyer, Arvind %A Hannaford, Robert %X Advances in computational game theory have led to several successfully deployed applications in security domains. These game-theoretic approaches and security applications learn game payoff values or adversary behaviors from annotated input data provided by domain experts and practitioners in the field, or collected through experiments with human subjects. Beyond these traditional methods, unmanned aerial vehicles (UAVs) have become an important surveillance tool used in security domains to collect the required annotated data. However, collecting annotated data from videos taken by UAVs efficiently, and using these data to build datasets that can be used for learning payoffs or adversary behaviors in game-theoretic approaches and security applications, is an under-explored research question. This paper presents VIOLA, a novel labeling application that includes (i) a workload distribution framework to efficiently gather human labels from videos in a secured manner; (ii) a software interface with features designed for labeling videos taken by UAVs in the domain of wildlife security. We also present the evolution of VIOLA and analyze how the changes made in the development process relate to the efficiency of labeling, including when seemingly obvious improvements surprisingly did not lead to increased efficiency. 
VIOLA enables collecting massive amounts of data with detailed information from challenging security videos such as those collected aboard UAVs for wildlife security. VIOLA will lead to the development of a new generation of game-theoretic approaches for security domains, including approaches that integrate deep learning and game theory for real-time detection and response. %B Conference on Decision and Game Theory for Security (GameSec) 2017 %G eng %0 Thesis %D 2017 %T When AI helps Wildlife Conservation: Learning Adversary Behaviors in Green Security Games %A Kar, Debarun %X Whereas previous real-world game-theoretic applications in security focused on protection of critical infrastructure in the absence of past attack data, more recent work has focused on data-driven security and sustainability applications for protecting the environment, including forests, fish and wildlife. One key challenge in such “Green Security Game” (GSG) domains is to model the adversary’s decision making process based on available attack data. This thesis, for the first time, explores the suitability of different adversary behavior modeling approaches in such domains that differ in the type and amount of historical data available. The first contribution is to provide a detailed comparative study, based on actual human subject experiments, of competing adversary behavior models in domains where attack data is available in plenty (e.g., via a large number of sensors). This thesis demonstrates a new human behavior model, SHARP, which mitigates the limitations of previous models in three key ways. First, SHARP reasons based on successes or failures of the adversary’s past actions to model adversary adaptivity. Second, SHARP reasons about similarity between exposed and unexposed areas of the attack surface to handle the adversary’s lack of exposure to enough of the attack surface. 
Finally, SHARP integrates a non-linear probability weighting function to capture the adversary’s true weighting of probabilities. The second contribution relates to domains requiring predictions over a large set of targets by learning from limited (and in some cases, noisy) data. One example dataset on which we demonstrate our approaches to handle such challenges is a real-world poaching dataset collected over a large geographical area at the Queen Elizabeth National Park in Uganda. This data is too sparse to construct a detailed model. The second contribution of this thesis delivers a surprising result by presenting an adversary behavior modeling system, INTERCEPT, which is based on an ensemble of decision trees (i) that effectively learns and predicts poacher attacks based on limited noisy attack data over a large set of targets, and (ii) has fast execution speed. This has led to a successful month-long test of INTERCEPT in the field, a first for adversary behavior modeling applications in the wildlife conservation domain. Finally, for my third contribution, we examine one common assumption in adversary behavior modeling that the adversary perfectly observes the defender’s randomized protection strategy. However, in domains such as wildlife conservation, the adversary only observes a limited sequence of defender patrols and forms beliefs about the defender’s strategy. In the absence of a comparative analysis and a principled study of the strengths and weaknesses of belief models, no informed decision could be made to incorporate belief models in adversary behavior models such as SHARP and INTERCEPT. This thesis provides the first-of-its-kind systematic comparison of existing and newly proposed belief models and demonstrates, based on human subject experiment data, that identifying heterogeneous belief update behavior is essential in making effective predictions. 
We also propose and evaluate customized models for settings that differ in the type of belief data available and quantify the value of having such historical data on the accuracy of belief prediction. %G eng %9 PhD thesis %0 Conference Paper %B ParSocial Workshop 2016 %D 2016 %T Addressing Behavioral Uncertainty in Security Games: An Efficient Robust Strategic Solution for Defender Patrols %A Thanh H. Nguyen %A Sinha, Arunesh %A Tambe, Milind %X Stackelberg Security Games (SSG) have been widely applied for solving real-world security problems — with a significant research emphasis on modeling attackers’ behaviors to handle their bounded rationality. However, access to real-world data (used for learning an accurate behavioral model) is often limited, leading to uncertainty in attackers’ behaviors while modeling. This paper therefore focuses on addressing behavioral uncertainty in SSG with the following main contributions: 1) we present a new uncertainty game model that integrates uncertainty intervals into a behavioral model to capture behavioral uncertainty; and 2) based on this game model, we propose a novel robust algorithm that approximately computes the defender’s optimal strategy in the worst-case scenario of uncertainty. We show that our algorithm guarantees an additive bound on its solution quality. %B ParSocial Workshop 2016 %G eng %0 Conference Paper %B International Conference on Cognitive Modeling (ICCM) %D 2016 %T Adversaries Wising Up: Modeling Heterogeneity and Dynamics of Behavior %A Abbasi, Yasaman Dehghani %A Sintov, Nicole %A Tambe, Milind %A Gonzalez, Cleotilde %A Morrison, Don %A Ben-Asher, Noam %X Security is an important concern worldwide. Stackelberg Security Games have been used successfully in a variety of security applications, to optimally schedule limited defense resources by modeling the interaction between attackers and defenders. 
Prior research has suggested that it is possible to classify adversary behavior into distinct groups of adversaries based on the ways humans explore their decision alternatives. However, despite the widespread use of Stackelberg Security Games, there has been little research on how adversaries adapt to defense strategies over time (i.e., dynamics of behavior). In this paper, we advance this work by showing how adversaries’ behavior changes as they learn the defenders’ behavior over time. Furthermore, we show how behavioral game theory models can be modified to capture learning dynamics using a Bayesian Updating modeling approach. These models perform similarly to a cognitive model known as Instance-Based-Learning to predict learning patterns. %B International Conference on Cognitive Modeling (ICCM) %G eng %0 Thesis %D 2016 %T Combating Adversaries under Uncertainties in Real-world Security Problems: Advanced Game-theoretic Behavioral Models and Robust Algorithms %A Nguyen, Thanh Hong %X Security is a global concern. Real-world security problems range from domains such as the protection of ports, airports, and transportation from terrorists to protecting forests, wildlife, and fisheries from smugglers, poachers, and illegal fishermen. A key challenge in solving these security problems is that security resources are limited; not all targets can be protected all the time. Therefore, security resources must be deployed intelligently, taking into account the responses of adversaries and potential uncertainties over their types, priorities, and knowledge. Stackelberg Security Games (SSG) have drawn a significant amount of interest from security agencies by capturing the strategic interaction between security agencies and human adversaries. SSG-based decision aids are in widespread use (both nationally and internationally) for the protection of assets such as major ports in the US, airport terminals, and wildlife and fisheries. 
My research focuses on addressing uncertainties in SSGs — one recognized area of weakness. My thesis provides innovative techniques and significant advances in addressing these uncertainties in SSGs. First, in many security problems, human adversaries are known to be boundedly rational, and often choose targets with non-highest expected value to attack. I introduce novel behavioral models of adversaries which significantly advance the state-of-the-art in capturing the adversaries’ decision making. More specifically, my new model for predicting poachers’ behavior in wildlife protection is the first game-theoretic model which takes into account key domain challenges including imperfect poaching data and complex temporal dependencies in poachers’ behavior. The superiority of my new models over the existing ones is demonstrated via extensive experiments based on the biggest real-world poaching dataset, collected in a national park in Uganda over 12 years. Second, my research also focuses on developing new robust algorithms which address uncertainties in real-world security problems. I present the first unified maximin-based robust algorithm — a single algorithm — to handle all different types of uncertainties explored in SSGs. Furthermore, I propose a less conservative decision criterion, minimax regret, for generating new, candidate defensive strategies that handle uncertainties in SSGs. In fact, minimax regret and maximin can be used in different security situations which may demand different robust criteria. I then present novel robust algorithms to compute minimax regret for addressing payoff uncertainty. A contribution of particular significance is that my work is deployed in the real world; I have deployed my robust algorithms and behavioral models in the PAWS system, which is currently being used by NGOs (Panthera and Rimba) in a conservation area in Malaysia. 
%G eng %9 PhD thesis %0 Conference Paper %B Artificial Intelligence Journal (AIJ), Elsevier, DOI %D 2016 %T Comparing Human Behavior Models in Stackelberg Security Games: An Extended Study %A Kar, Debarun %A Fang, Fei %A Francesco M. Delle Fave %A Sintov, Nicole %A Tambe, Milind %A Lyet, Arnaud %X Several competing human behavior models have been proposed to model boundedly rational adversaries in repeated Stackelberg Security Games (SSG). However, these existing models fail to address three main issues which are detrimental to defender performance. First, while they attempt to learn adversary behavior models from adversaries’ past actions (“attacks on targets”), they fail to take into account adversaries’ future adaptation based on successes or failures of these past actions. Second, existing algorithms fail to learn a reliable model of the adversary unless there exists sufficient data collected by exposing enough of the attack surface — a situation that often arises in initial rounds of the repeated SSG. Third, current leading models have failed to include probability weighting functions, even though it is well known that human beings’ weighting of probability is typically nonlinear. To address these limitations of existing models, this article provides three main contributions. Our first contribution is a new human behavior model, SHARP, which mitigates these three limitations as follows: (i) SHARP reasons based on success or failure of the adversary’s past actions on exposed portions of the attack surface to model adversary adaptivity; (ii) SHARP reasons about similarity between exposed and unexposed areas of the attack surface, and also incorporates a discounting parameter to mitigate adversary’s lack of exposure to enough of the attack surface; and (iii) SHARP integrates a non-linear probability weighting function to capture the adversary’s true weighting of probability. 
Our second contribution is a first “repeated measures study” – at least in the context of SSGs – of competing human behavior models. This study, where each experiment lasted a period of multiple weeks with individual sets of human subjects on the Amazon Mechanical Turk platform, illustrates the strengths and weaknesses of different models and shows the advantages of SHARP. Our third major contribution is to demonstrate SHARP’s superiority by conducting real-world human subjects experiments at the Bukit Barisan Selatan National Park in Indonesia against wildlife security experts. %B Artificial Intelligence Journal (AIJ), Elsevier, DOI %G eng %U http://dx.doi.org/10.1016/j.artint.2016.08.002 %0 Conference Paper %B 30th AAAI Conference on Artificial Intelligence (AAAI) (Student Abstract) %D 2016 %T Conquering Adversary Behavioral Uncertainty in Security Games: An Efficient Modeling Robust based Algorithm %X Real-world deployed applications of Stackelberg Security Games (Shieh et al. 2012; Basilico, Gatti, and Amigoni 2009; Letchford and Vorobeychik 2011) have led to significant research emphasis on modeling the attacker’s bounded rationality (Yang et al. 2011; Nguyen et al. 2013). One key assumption in behavioral modeling is the availability of a significant amount of data to obtain an accurate prediction. However, in real-world security domains such as wildlife protection, this assumption may be inapplicable due to the limited access to real-world data (Lemieux 2014), leading to uncertainty in the attacker’s behaviors — a key research challenge of security problems. Recent research has focused on addressing uncertainty in behavioral modeling, following two different approaches: 1) one approach assumes a known distribution of multiple attacker types, each follows a certain behavioral model, and attempts to solve the resulting Bayesian games (Yang et al. 
2014); and 2) another considers the existence of multiple attacker types of which behavioral models are perfectly known, but without a known distribution over the types. It then only considers the worst attacker type for the defender (Brown, Haskell, and Tambe 2014). These two approaches have several limitations. First, both still require a sufficient amount of data to precisely estimate either the distribution over attacker types (the former approach) or the model parameters for each individual type (the latter approach). Second, solving the resulting Bayesian games in the former case is computationally expensive. Third, the latter approach tends to be overly conservative as it only focuses on the worst-case attacker type. This paper remedies these shortcomings of state-of-the-art approaches when addressing behavioral uncertainty in SSG by providing three key contributions. First, we present a new game model with uncertainty in which we consider a single behavioral model to capture decision making of the whole attacker population (instead of multiple behavioral models); uncertainty intervals are integrated with the chosen model to capture behavioral uncertainty. The idea of uncertainty intervals is commonly used in the literature (Aghassi and Bertsimas 2006) and has been shown to effectively represent uncertainty in SSG (Kiekintveld, Islam, and Kreinovich 2013). Second, based on this game model, we propose a new efficient robust algorithm that computes the defender’s optimal strategy which is robust to the uncertainty. Overall, the resulting robust optimization problem for computing the defender’s optimal strategy against the worst case of behavioral uncertainty is a non-linear non-convex fractional maximin problem. 
Our algorithm efficiently solves this problem based on the following key insights: 1) it converts the problem into a single maximization problem via a non-linear conversion for fractional terms and the dual of the inner minimization in maximin; 2) a binary search is then applied to remove the fractional terms; and 3) the algorithm explores extreme points of the feasible solution region and uses a piece-wise linear approximation to convert the problem into a Mixed Integer Linear Program (MILP). Our new algorithm provides an O(ε + 1/K)-optimal solution, where ε is the convergence threshold for the binary search and K is the number of segments in the piece-wise linear approximation. %B 30th AAAI Conference on Artificial Intelligence (AAAI) (Student Abstract) %G eng %0 Conference Paper %B Decision and Game Theory for Security (GameSec 2016) %D 2016 %T Data Exfiltration Detection and Prevention: Virtually Distributed POMDPs for Practically Safer Networks %A Mc Carthy, Sara Marie %A Sinha, Arunesh %A Tambe, Milind %A Manadhata, Pratyusa %X We address the challenge of detecting and addressing advanced persistent threats (APTs) in a computer network, focusing in particular on the challenge of detecting data exfiltration over Domain Name System (DNS) queries, where existing detection sensors are imperfect and lead to noisy observations about the network’s security state. Data exfiltration over DNS queries involves unauthorized transfer of sensitive data from an organization to a remote adversary through a DNS data tunnel to a malicious web domain. Given the noisy sensors, previous work has illustrated that standard approaches fail to satisfactorily rise to the challenge of detecting exfiltration attempts. Instead, we propose a decision-theoretic technique that sequentially plans to accumulate evidence under uncertainty while taking into account the cost of deploying such sensors. 
More specifically, we provide a fast scalable POMDP formulation to address the challenge, where the efficiency of the formulation is based on two key contributions: (i) we use a virtually distributed POMDP (VD-POMDP) formulation, motivated by previous work in distributed POMDPs with sparse interactions, where individual policies for different sub-POMDPs are planned separately but their sparse interactions are only resolved at execution time to determine the joint actions to perform; (ii) we allow for abstraction in planning for speedups, and then use a fast MILP to implement the abstraction while resolving any interactions. This allows us to determine optimal sensing strategies, leveraging information from many noisy detectors, and subject to constraints imposed by network topology, forwarding rules and performance costs on the frequency, scope and efficiency of sensing we can perform. %B Decision and Game Theory for Security (GameSec 2016) %G eng %0 Conference Paper %D 2016 %T Deploying PAWS to Combat Poaching: Game-theoretic Patrolling in Areas with Complex Terrains (Demonstration) %A Fang, Fei %A Thanh H. Nguyen %A Pickles, Rob %A Wai Y. Lam %A Gopalasamy R. Clements %A An, Bo %A Singh, Amandeep %A Tambe, Milind %X The conservation of key wildlife species such as tigers and elephants are threatened by poaching activities. In many conservation areas, foot patrols are conducted to prevent poaching but they may not be well-planned to make the best use of the limited patrolling resources. While prior work has introduced PAWS (Protection Assistant for Wildlife Security) as a game-theoretic decision aid to design effective foot patrol strategies to protect wildlife, the patrol routes generated by PAWS may be difficult to follow in areas with complex terrain. Subsequent research has worked on the significant evolution of PAWS, from an emerging application to a regularly deployed software. 
A key advance of the deployed version of PAWS is that it incorporates complex terrain information and generates a strategy consisting of easy-to-follow routes. In this demonstration, we provide 1) a video introducing the PAWS system; 2) an interactive visualization of the patrol routes generated by PAWS in an example area with complex terrain; and 3) a machine-human competition in designing a patrol strategy given complex terrain and animal distribution. %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec 2016) %D 2016 %T Divide to Defend: Collusive Security Games %A Gholami, Shahrzad %A Bryan Wilder %A Brown, Matthew %A Thomas, Dana %A Sintov, Nicole %A Tambe, Milind %X Research on security games has focused on settings where the defender must protect against either a single adversary or multiple, independent adversaries. However, there are a variety of real-world security domains where adversaries may benefit from colluding in their actions against the defender, e.g., wildlife poaching, urban crime and drug trafficking. Given that such adversary collusion may be more detrimental to the defender, she has an incentive to break up collusion by playing off the self-interest of individual adversaries. As we show in this paper, breaking up such collusion is difficult given the bounded rationality of human adversaries; we therefore investigate algorithms for the defender assuming both rational and boundedly rational adversaries.
The contributions of this paper include (i) collusive security games (COSGs), a model for security games involving potential collusion among adversaries, (ii) SPECTRE-R, an algorithm to solve COSGs and break collusion assuming rational adversaries, (iii) observations and analyses of adversary behavior and the underlying factors including bounded rationality, the imbalanced-resource-allocation effect, coverage perception, and individualism/collectivism attitudes within COSGs with data from 700 human subjects, (iv) a learned human behavioral model that incorporates these factors to predict when collusion will occur, and (v) SPECTRE-BR, an enhanced algorithm which optimizes against the learned behavior model to provide demonstrably better-performing defender strategies against human subjects compared to SPECTRE-R. %B Conference on Decision and Game Theory for Security (GameSec 2016) %G eng %0 Journal Article %J Multiagent and Grid Systems (MAGS) Journal %D 2016 %T An Extended Study on Addressing Defender Teamwork while Accounting for Uncertainty in Attacker Defender Games using Iterative Dec-MDPs %A Shieh, Eric %A Xin Jiang, Albert %A Amulya Yadav %A Varakantham, Pradeep %A Tambe, Milind %X Multi-agent teamwork and defender-attacker security games are two areas that are currently receiving significant attention within multi-agent systems research. Unfortunately, despite the need for effective teamwork among multiple defenders, little has been done to harness the teamwork research in security games. The problem that this paper seeks to solve is the coordination of decentralized defender agents in the presence of uncertainty while securing targets against an observing adversary.
To address this problem, we offer the following novel contributions in this paper: (i) New model of security games with defender teams that coordinate under uncertainty; (ii) New algorithm based on column generation that utilizes Decentralized Markov Decision Processes (Dec-MDPs) to generate defender strategies that incorporate uncertainty; (iii) New techniques to handle global events (when one or more agents may leave the system) during defender execution; (iv) Heuristics that help scale up in the number of targets and agents to handle real-world scenarios; (v) Exploration of the robustness of randomized pure strategies. The paper opens the door to a potentially new area combining computational game theory and multi-agent teamwork. %B Multiagent and Grid Systems (MAGS) Journal %G eng %0 Conference Paper %B Symposium on Educational Advances in Artificial Intelligence (EAAI) 2016 %D 2016 %T From the Lab to the Classroom and Beyond: Extending a Game-Based Research Platform for Teaching AI to Diverse Audiences %A Sintov, Nicole %A Kar, Debarun %A Nguyen, Thanh %A Fang, Fei %A Hoffman, Kevin %A Lyet, Arnaud %A Tambe, Milind %X Recent years have seen increasing interest in AI from outside the AI community. This is partly due to applications based on AI that have been used in real-world domains, for example, the successful deployment of game theory-based decision aids in security domains. This paper describes our teaching approach for introducing the AI concepts underlying security games to diverse audiences. We adapted a game-based research platform that served as a testbed for recent research advances in computational game theory into a set of interactive role-playing games. We guided learners in playing these games as part of our teaching strategy, which also included didactic instruction and interactive exercises on broader AI topics. 
We describe our experience in applying this teaching approach to diverse audiences, including students of an urban public high school, university undergraduates, and security domain experts who protect wildlife. We evaluate our approach based on results from the games and participant surveys. %B Symposium on Educational Advances in Artificial Intelligence (EAAI) 2016 %G eng %0 Thesis %D 2016 %T The Future of Counterinsurgency Modeling: Decision Aids for United States Army Commanders %A Andrew Plucker %G eng %9 MS thesis %0 Conference Paper %B Workshop on security and multiagent systems, International conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T A Game Theoretic Approach on Addressing Collusion among Human Adversaries %A Gholami, Shahrzad %A Bryan Wilder %A Brown, Matthew %A Sinha, Arunesh %A Sintov, Nicole %A Tambe, Milind %X Several models have been proposed for Stackelberg security games (SSGs) and protection against perfectly rational and boundedly rational adversaries; however, none of these existing models addressed the collusion mechanism between adversaries. In a large number of studies related to SSGs, there is one leader and one follower in the game such that the leader takes action and the follower responds accordingly. These studies fail to take into account the possibility of the existence of groups of adversaries who can collude and cause synergistic loss to the security agents (defenders). The first contribution of this paper is formulating a new type of Stackelberg security game involving a beneficial collusion mechanism among adversaries. The second contribution of this paper is to develop a parametric human behavior model which is able to capture the bounded rationality of adversaries in this type of collusive game. This model is proposed based on human subject experiments with participants on Amazon Mechanical Turk (AMT).
%B Workshop on security and multiagent systems, International conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B 22nd European Conference on Artificial Intelligence (ECAI) %D 2016 %T Get Me to My GATE On Time: Efficiently Solving General-Sum Bayesian Threat Screening Games %A Schlenker, Aaron %A Brown, Matthew %A Sinha, Arunesh %A Tambe, Milind %A Mehta, Ruta %X Threat Screening Games (TSGs) are used in domains where there is a set of individuals or objects to screen with a limited amount of screening resources available to screen them. TSGs are broadly applicable to domains like airport passenger screening, stadium screening, cargo container screening, etc. Previous work on TSGs focused only on the Bayesian zero-sum case and provided the MGA algorithm to solve these games. In this paper, we solve Bayesian general-sum TSGs, which we prove are NP-hard even when exploiting a compact marginal representation. We also present an algorithm based upon an adversary-type hierarchical tree decomposition and an efficient branch-and-bound search to solve Bayesian general-sum TSGs. With this, we provide four contributions: (1) GATE, the first algorithm for solving Bayesian general-sum TSGs, which uses hierarchical type trees and a novel branch-and-bound search, (2) the Branch-and-Guide approach, which combines branch-and-bound search with the MGA algorithm for the first time, (3) heuristics based on properties of TSGs for accelerated computation of GATE, and (4) experimental results showing the scalability of GATE needed for real-world domains. %B 22nd European Conference on Artificial Intelligence (ECAI) %G eng %0 Thesis %D 2016 %T Handling Attacker's Preference in Security Domains: Robust Optimization and Learning Approaches %A Qian, Yundi %X Stackelberg security games (SSGs) are now established as a powerful tool in security domains.
In order to compute the optimal strategy for the defender in the SSG model, the defender needs to know the attacker’s preferences over targets so that she can predict how the attacker would react under a certain defender strategy. Uncertainty over attacker preferences may cause the defender to suffer significant losses. Motivated by that, my thesis focuses on addressing uncertainty in attacker preferences using robust and learning approaches. In security domains with one-shot attacks, e.g., counter-terrorism domains, the defender is interested in robust approaches that can provide a performance guarantee in the worst case. The first part of my thesis focuses on handling the attacker’s preference uncertainty with robust approaches in these domains. My work considers a new dimension of preference uncertainty that has not been taken into account in the previous literature, the risk preference uncertainty of the attacker, and proposes an algorithm to efficiently compute the defender’s robust strategy against uncertain risk-aware attackers. In security domains with repeated attacks, e.g., the green security domain of protecting natural resources, the attacker “attacks” (illegally extracts natural resources) frequently, so it is possible for the defender to learn the attacker’s preferences from their previous actions. %G eng %9 PhD thesis %0 Conference Paper %B 30th AAAI Conference on Artificial Intelligence (AAAI 2016) %D 2016 %T A Hands-on Musical Experience in AI, Games and Art (Demonstration) %A G. R. Martins %A M. Escarce Junior %A L. S. Marcolino %X AI is typically applied in video games in the creation of artificial opponents, in order to make them strong, realistic or even fallible (for the game to be “enjoyable” by human players). We offer a different perspective: we present the concept of “Art Games”, a view that opens up many possibilities for AI research and applications.
Conference participants will play Jikan to Kukan, an art game where the player dynamically creates the soundtrack with the AI system, while developing her experience in the unconscious world of a character. %B 30th AAAI Conference on Artificial Intelligence (AAAI 2016) %G eng %0 Journal Article %J Games Journal %D 2016 %T Keeping Pace with Criminals: An Extended Study of Designing Patrol Allocation against Adaptive Opportunistic Criminals %A Zhang, Chao %A Gholami, Shahrzad %A Kar, Debarun %A Sinha, Arunesh %A Jain, Manish %A Goyal, Ripple %A Tambe, Milind %X Game-theoretic approaches have recently been used to model the deterrence effect of patrol officers’ assignments on opportunistic crimes in urban areas. One major challenge in this domain is modeling the behavior of opportunistic criminals. Compared to strategic attackers (such as terrorists) who execute a well-laid-out plan, opportunistic criminals are less strategic in planning attacks and more flexible in executing their plans based on their knowledge of patrol officers’ assignments. In this paper, we aim to design an optimal police patrolling strategy against opportunistic criminals in urban areas. Our approach comprises two major parts: learning a model of the opportunistic criminal (and how he or she responds to patrols) and then planning optimal patrols against this learned model. The planning part, by using information about how criminals respond to patrols, takes into account the strategic game interaction between the police and criminals. In more detail, first, we propose two categories of models for modeling opportunistic crimes. The first category of models learns the relationship between defender strategy and crime distribution as a Markov chain. The second category of models represents the interaction of criminals and patrol officers as a Dynamic Bayesian Network (DBN) with the number of criminals as the unobserved hidden states.
To this end, we: (i) apply standard algorithms, such as Expectation Maximization (EM), to learn the parameters of the DBN; (ii) modify the DBN representation to allow for a compact representation of the model, resulting in better learning accuracy and faster learning with the EM algorithm when used for the modified DBN. These modifications exploit the structure of the problem and use independence assumptions to factorize the large joint probability distributions. Next, we propose an iterative learning and planning mechanism that periodically updates the adversary model. We demonstrate the efficiency of our learning algorithms by applying them to a real dataset of criminal activity obtained from the police department of the University of Southern California (USC) situated in Los Angeles, CA, USA. We project a significant reduction in crime rate using our planning strategy as compared to the actual strategy deployed by the police department. We also demonstrate the improvement in crime prevention in simulation when we use our iterative planning and learning mechanism when compared to just learning once and planning. Finally, we introduce web-based software for recommending patrol strategies, which is currently deployed at USC. In the near future, our learning and planning algorithm will be integrated with this software. This work was done in collaboration with the police department of USC. %B Games Journal %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T Learning Adversary Behavior in Security Games: A PAC Model Perspective %A Sinha, Arunesh %A Kar, Debarun %A Tambe, Milind %X Recent applications of Stackelberg Security Games (SSG), from wildlife crime to urban crime, have employed machine learning tools to learn and predict adversary behavior using available data about defender-adversary interactions.
Given these recent developments, this paper commits to an approach of directly learning the response function of the adversary. Using the PAC model, this paper lays a firm theoretical foundation for learning in SSGs and provides utility guarantees when the learned adversary model is used to plan the defender’s strategy. The paper also aims to answer practical questions such as how much more data is needed to improve an adversary model’s accuracy. Additionally, we explain a recently observed phenomenon that prediction accuracy of learned adversary behavior is not enough to discover the utility-maximizing defender strategy. We provide four main contributions: (1) a PAC model of learning adversary response functions in SSGs; (2) PAC-model analysis of the learning of key, existing bounded rationality models in SSGs; (3) an entirely new approach to adversary modeling based on a non-parametric class of response functions with PAC-model analysis; and (4) identification of conditions under which computing the best defender strategy against the learned adversary behavior is indeed the optimal strategy. Finally, we conduct experiments with real-world data from a national park in Uganda, showing the benefit of our new adversary modeling approach and verification of our PAC model predictions. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Thesis %D 2016 %T Modeling Human Bounded Rationality in Opportunistic Security Games %A Abbasi, Yasaman Dehghani %X Security has been an important, worldwide concern over the past decades. Security agencies have been established to prevent different types of crimes in various domains, such as illegal poaching, human trafficking, terrorist attacks on ports and airports, and urban crimes. Unfortunately, in all these domains, security agencies have limited resources and cannot protect all potential targets at all times.
Therefore, it is critical for the security agencies to allocate their limited resources optimally to protect potential targets from the adversary. Recently, game-theoretic decision support systems have been applied to assist defenders (e.g. security agencies) in allocating and scheduling their limited resources. The Stackelberg Security Game (SSG) is an example of a game-theoretic model that has been deployed to assign security resources to potential targets. Indeed, decision-support systems based on SSG models have been successfully implemented to assist real-world security agencies in protecting critical infrastructure such as airports and ports, and in suppressing crime in urban areas. SSG provides an approach for generating randomized protection strategies for the defender using a mathematical representation of the interaction between the defender and the attacker. Therefore, one of the key steps in applying SSG algorithms to real-world security problems is to model the adversary’s decision-making process. Building upon the success of SSG applications, game theory is now being applied to adjacent domains such as Opportunistic Security. In this domain, the defender is faced with adversaries with special characteristics. Opportunistic criminals carry out repeated and frequent illegal activities (attacks); they generally do not conduct extensive surveillance before performing an attack and spend less time and effort planning each attack. To that end, my thesis focuses on modeling opportunistic criminals’ behavior, where modeling the adversary’s decision-making process is particularly crucial to developing efficient patrolling strategies for the defenders. I provide an empirical investigation of adversary behavior in opportunistic crime settings by conducting extensive human subject experiments and analyzing how participants make their decisions, in order to create adversary behavior prediction models to be deployed in many opportunistic crime domains.
More specifically, this thesis provides (i) a comprehensive answer to the question of which of the proposed human bounded rationality models best predicts adversaries’ behavior in the Opportunistic Crime domain, (ii) enhanced human behavior models which outperform existing state-of-the-art models, (iii) a detailed comparison between human behavior models and a well-known Cognitive Science model, the Instance-Based Learning model, (iv) an extensive study on the heterogeneity of adversarial behavior, (v) a thorough study of human behavior changing over time, and (vi) improvements to human behavior models that account for adversaries’ behavior evolving over time. %G eng %9 PhD thesis %0 Conference Paper %B Coordination, Organizations, Institutions and Norms in Agent Systems XI. Springer-Verlag Lecture Notes in AI %D 2016 %T Multi-agent Team Formation for Design Problems %A L. S. Marcolino %A Xu, H. %A D. Gerber %A B. Kolev %A S. Price %A E. Pantazis %A M. Tambe %X Design imposes a novel social choice problem: using a team of voting agents, maximize the number of optimal solutions, allowing a user to then make an aesthetic choice. In an open system of design agents, team formation is fundamental. We present the first model of agent teams for design. For maximum applicability, we envision agents that are queried for a single opinion, and multiple solutions are obtained by multiple iterations. We show that diverse teams composed of agents with different preferences maximize the number of optimal solutions, while uniform teams composed of multiple copies of the best agent are in general suboptimal. Our experiments study the model in bounded time, and we also study a real system, where agents vote to design buildings. %B Coordination, Organizations, Institutions and Norms in Agent Systems XI.
Springer-Verlag Lecture Notes in AI %G eng %0 Conference Paper %B The 17th ACM Conference on Economics and Computation (ACM-EC) %D 2016 %T The Mysteries of Security Games: Equilibrium Computation Becomes Combinatorial Algorithm Design %A Haifeng Xu %X The security game is a basic model for resource allocation in adversarial environments. Here there are two players, a defender and an attacker. The defender wants to allocate her limited resources to defend critical targets and the attacker seeks his most favorable target to attack. In the past decade, there has been a surge of research interest in analyzing and solving security games that are motivated by applications from various domains. Remarkably, these models and their game-theoretic solutions have led to real-world deployments in use by major security agencies like LAX airport, the US Coast Guard and the Federal Air Marshal Service, as well as non-governmental organizations. Across all this research and these applications, equilibrium computation serves as a foundation. This paper examines security games from a theoretical perspective and provides a unified view of various security game models. In particular, each security game can be characterized by a set system E which consists of the defender’s pure strategies; the defender’s best response problem can be viewed as a combinatorial optimization problem over E. Our framework captures most of the basic security game models in the literature, including all the deployed systems; the set system E arising from various domains encodes standard combinatorial problems like bipartite matching, maximum coverage, min-cost flow, packing problems, etc. Our main result shows that equilibrium computation in security games is essentially a combinatorial problem.
In particular, we prove that, for any set system E, the following problems can be reduced to each other in polynomial time: (0) combinatorial optimization over E; (1) computing the minimax equilibrium for zero-sum security games over E; (2) computing the strong Stackelberg equilibrium for security games over E; (3) computing the best or worst (for the defender) Nash equilibrium for security games over E. Therefore, the hardness [polynomial solvability] of any of these problems implies the hardness [polynomial solvability] of all the others. Here, by “games over E” we mean the class of security games with arbitrary payoff structures, but a fixed set E of defender pure strategies. This shows that the complexity of a security game is essentially determined by the set system E. We view drawing these connections as an important conceptual contribution of this paper. %B The 17th ACM Conference on Economics and Computation (ACM-EC) %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence (AAAI) %D 2016 %T One Size Does Not Fit All: A Game-Theoretic Approach for Dynamically and Effectively Screening for Threats %A Brown, Matthew %A Sinha, Arunesh %A Schlenker, Aaron %A Tambe, Milind %X An effective way of preventing attacks in secure areas is to screen for threats (people, objects) before entry, e.g., screening of airport passengers. However, screening every entity at the same level may be both ineffective and undesirable. The challenge then is to find a dynamic approach for randomized screening, allowing for more effective use of limited screening resources, leading to improved security.
We address this challenge with the following contributions: (1) a threat screening game (TSG) model for general screening domains; (2) an NP-hardness proof for computing the optimal strategy of TSGs; (3) a scheme for decomposing TSGs into subgames to improve scalability; (4) a novel algorithm that exploits a compact game representation to efficiently solve TSGs, providing the optimal solution under certain conditions; and (5) an empirical comparison of our proposed algorithm against the current state-of-the-art optimal approach for large-scale game-theoretic resource allocation problems. %B AAAI Conference on Artificial Intelligence (AAAI) %G eng %0 Thesis %D 2016 %T Opportunistic Crime Security Games: Assisting Police to Control Urban Crime Using Real World Data %A Zhang, Chao %X Crime in urban areas plagues every city in all countries. A notable characteristic of urban crime, distinct from organized terrorist attacks, is that most urban crimes are opportunistic in nature, i.e., criminals do not plan their attacks in detail; rather, they seek opportunities for committing crime and are agile in their execution of the crime. In order to deter such crimes, police officers conduct patrols with the aim of preventing crime. However, by observing on the spot the actual presence of patrol units, the criminals can adapt their strategy by seeking crime opportunities in less effectively patrolled locations. The problem of where and how much to patrol is therefore important. My thesis focuses on addressing such opportunistic crime by introducing a new game-theoretic framework and algorithms. I first introduce the Opportunistic Security Game (OSG), a computational framework to recommend deployment strategies for defenders to control opportunistic crimes. I propose a new exact algorithm, EOSG, to optimize defender strategies given our opportunistic adversaries. Then I develop a fast heuristic algorithm to solve large-scale OSG problems, exploiting a compact representation.
The next contribution in my thesis is a Dynamic Bayesian Network (DBN) to learn the OSG model from real-world criminal activity. Standard algorithms such as EM can be applied to learn the parameters. Also, I propose a sequence of modifications that allow for a compact representation of the model, resulting in better learning accuracy and increased learning speed of the EM algorithm. Finally, I propose a game abstraction framework that can handle opportunistic crimes in large-scale urban areas. I propose a planning algorithm that recommends a mixed strategy against opportunistic criminals in this abstraction framework. As part of our collaboration with local police departments, we apply our model to two large-scale urban problems: the USC campus and the city of Nashville. Our approach provides high prediction accuracy on the real datasets; furthermore, we project a significant crime rate reduction using our planning strategy compared to the current police strategy. %G eng %9 PhD thesis %0 Conference Paper %B Decision and Game Theory for Security (GameSec 2016) %D 2016 %T Optimal Allocation of Police Patrol Resources Using a Continuous-Time Crime Model %A Mukhopadhyay, Ayan %A Zhang, Chao %A Vorobeychik, Yevgeniy %A Tambe, Milind %A Pence, Kenneth %A Speer, Paul %X Police departments worldwide are eager to develop better patrolling methods to manage the complex and evolving crime landscape. Surprisingly, the problem of spatial police patrol allocation to optimize expected crime response time has not been systematically addressed in prior research. We develop a bi-level optimization framework to address this problem. Our framework includes novel linear programming patrol response formulations. Benders’ decomposition is then utilized to solve the underlying optimization problem. A key challenge we encounter is that criminals may respond to police patrols, thereby shifting the distribution of crime in space and time.
To address this, we develop a novel iterative Benders’ decomposition approach. Our validation involves a novel spatio-temporal continuous-time model of crime based on survival analysis, which we learn using real crime and police patrol data for Nashville, TN. We demonstrate that our model is more accurate, and much faster, than state-of-the-art alternatives. Using this model in the bi-level optimization framework, we demonstrate that our decision-theoretic approach outperforms alternatives, including actual police patrol policies. %B Decision and Game Theory for Security (GameSec 2016) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T Playing Security Games with No Prior Knowledge %A Haifeng Xu %A Long Tran Thanh %A Nick Jennings %X This paper investigates repeated security games with unknown (to the defender) game payoffs and attacker behaviors. As existing work assumes prior knowledge about either the game payoffs or the attacker’s behaviors, it is not suitable for tackling our problem. Given this, we propose the first efficient defender strategy, based on an adversarial online learning framework, that can provably achieve good performance guarantees without any prior knowledge. In particular, we prove that our algorithm can achieve low performance loss against the best fixed strategy in hindsight (i.e., having full knowledge of the attacker’s moves). In addition, we prove that our algorithm can achieve an efficient competitive ratio against the optimal adaptive defender strategy. We also show that for zero-sum security games, our algorithm achieves efficient results in approximating a number of solution concepts, such as algorithmic equilibria and the minimax value.
Finally, our extensive numerical results demonstrate that, without having any prior information, our algorithm still achieves good performance compared to state-of-the-art algorithms from the literature on security games, such as SUQR [19], which require a significant amount of prior knowledge. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS) %D 2016 %T Protecting the NECTAR of the Ganga River through Game-Theoretic Factory Inspections %A Ford, Benjamin %A Brown, Matthew %A Amulya Yadav %A Singh, Amandeep %A Sinha, Arunesh %A Srivastava, Biplav %A Kiekintveld, Christopher %A Tambe, Milind %X Leather is an integral part of the world economy and a substantial income source for developing countries. Despite government regulations on leather tannery waste emissions, inspection agencies lack adequate enforcement resources, and tanneries’ toxic wastewaters wreak havoc on surrounding ecosystems and communities. Previous works in this domain stop short of generating executable solutions for inspection agencies. We introduce NECTAR, the first security game application to generate environmental compliance inspection schedules. NECTAR’s game model addresses many important real-world constraints: a lack of defender resources is alleviated via a secondary inspection type; imperfect inspections are modeled via a heterogeneous failure rate; and uncertainty, in traveling through a road network and in conducting inspections, is addressed via a Markov Decision Process. To evaluate our model, we conduct a series of simulations and analyze their policy implications.
%B International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T Restless Poachers: Handling Exploration-Exploitation Tradeoffs in Security Domains %A Qian, Yundi %A Zhang, Chao %A Krishnamachari, Bhaskar %A Tambe, Milind %X The success of Stackelberg Security Games (SSGs) in counterterrorism domains has inspired researchers’ interest in applying game-theoretic models to other security domains with frequent interactions between defenders and attackers, e.g., wildlife protection. Previous research optimizes defenders’ strategies by modeling this problem as a repeated Stackelberg game, capturing the special property of this domain: frequent interactions between defenders and attackers. However, this research fails to handle the exploration-exploitation tradeoff in this domain caused by the fact that defenders only have knowledge of attack activities at targets they protect. This paper addresses this shortcoming and provides the following contributions: (i) We formulate the problem as a restless multi-armed bandit (RMAB) model to address this challenge. (ii) To use the Whittle index policy to plan patrol strategies in the RMAB, we provide two sufficient conditions for indexability and an algorithm to numerically evaluate indexability. (iii) Given indexability, we propose a binary-search-based algorithm to find the Whittle index policy efficiently. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T Signaling in Bayesian Stackelberg Games %A Haifeng Xu %A Freeman, Rupert %A Vincent Conitzer %A Dughmi, Shaddin %A Tambe, Milind %X Algorithms for solving Stackelberg games are used in an ever-growing variety of real-world domains.
Previous work has extended this framework to allow the leader to commit not only to a distribution over actions, but also to a scheme for stochastically signaling information about these actions to the follower. This can result in higher utility for the leader. In this paper, we extend this methodology to Bayesian games, in which the leader, the follower, or both have payoff-relevant private information. This leads to novel variants of the model, for example by imposing an incentive compatibility constraint for each type to listen to the signal intended for it. We show that, in contrast to previous hardness results for the case without signaling [5, 16], we can solve unrestricted games in time polynomial in their natural representation. For security games, we obtain hardness results as well as efficient algorithms, depending on the settings. We show the benefits of our approach in experimental evaluations of our algorithms. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B 3rd Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI'16) %D 2016 %T Simultaneous Influencing and Mapping for Health Interventions %A L. S. Marcolino %A A. Lakshminarayanan %A A. Yadav %A M. Tambe %X Influence Maximization is an active topic, but full knowledge of the social network graph has always been assumed. However, the graph may actually be unknown beforehand. For example, when selecting a subset of a homeless population to attend interventions concerning health, we deal with a network that is not fully known. Hence, we introduce the novel problem of simultaneously influencing and mapping (i.e., learning) the graph. 
We study a class of algorithms, where we show that: (i) traditional algorithms may have arbitrarily low performance; (ii) we can effectively influence and map when the independence of objectives hypothesis holds; (iii) when it does not hold, the upper bound for the influence loss converges to 0. We run extensive experiments over four real-life social networks, where we study two alternative models, and obtain significantly better results in both cases than traditional approaches. %B 3rd Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI'16) %G eng %0 Conference Paper %B International conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T SPECTRE: A Game Theoretic Framework for Preventing Collusion in Security Games (Demonstration) %A Gholami, Shahrzad %A Bryan Wilder %A Brown, Matthew %A Sinha, Arunesh %A Sintov, Nicole %A Tambe, Milind %X Several models have been proposed for Stackelberg security games (SSGs) and protection against perfectly rational and boundedly rational adversaries; however, none of these existing models addresses the destructive cooperation mechanism between adversaries. SPECTRE (Strategic Patrol planner to Extinguish Collusive ThREats) takes into account the synergistic destructive collusion among two groups of adversaries in security games. This framework is designed for efficient patrol scheduling for security agents in security games in the presence of collusion, and builds mainly on game-theoretic approaches, optimization techniques, machine learning methods, and theories of human decision making under risk. A major advantage of SPECTRE is that it incorporates real-world data from human-subject experiments with participants on Amazon Mechanical Turk (AMT). 
%B International conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B 25th International Joint Conference on Artificial Intelligence (IJCAI) %D 2016 %T Three Strategies to Success: Learning Adversary Models in Security Games %A Haghtalab, Nika %A Fang, Fei %A Thanh H. Nguyen %A Sinha, Arunesh %A Ariel D. Procaccia %A Tambe, Milind %X State-of-the-art applications of Stackelberg security games — including wildlife protection — offer a wealth of data, which can be used to learn the behavior of the adversary. But existing approaches either make strong assumptions about the structure of the data, or gather new data through online algorithms that are likely to play severely suboptimal strategies. We develop a new approach to learning the parameters of the behavioral model of a boundedly rational attacker (thereby pinpointing a near-optimal strategy), by observing how the attacker responds to only three defender strategies. We also validate our approach using experiments on real and synthetic data. %B 25th International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Book Section %B New Frontiers of Multidisciplinary Research in STEAM-H (Book chapter) (edited by B Toni) %D 2016 %T Towards a Science of Security Games %A Thanh H. Nguyen %A Kar, Debarun %A Brown, Matthew %A Sinha, Arunesh %A Xin Jiang, Albert %A Tambe, Milind %X Security is a critical concern around the world. In many domains from counter-terrorism to sustainability, limited security resources prevent complete security coverage at all times. Instead, these limited resources must be scheduled (or allocated or deployed), while simultaneously taking into account the importance of different targets, the responses of the adversaries to the security posture, and the potential uncertainties in adversary payoffs and observations, etc. Computational game theory can help generate such security schedules. 
Indeed, casting the problem as a Stackelberg game, we have developed new algorithms that are now deployed over multiple years in multiple applications for scheduling of security resources. These applications are leading to real-world use-inspired research in the emerging research area of “security games”. The research challenges posed by these applications include scaling up security games to real-world-sized problems, handling multiple types of uncertainty, and dealing with bounded rationality of human adversaries. %B New Frontiers of Multidisciplinary Research in STEAM-H (Book chapter) (edited by B Toni) %G eng %0 Thesis %D 2016 %T Towards Addressing Spatio-Temporal Aspects in Security Games %A Fang, Fei %X Game theory has been successfully used to handle complex resource allocation and patrolling problems in security and sustainability domains. More specifically, real-world applications have been deployed for different domains based on the framework of security games, where the defender (e.g., security agency) has a limited number of resources to protect a set of targets from an adversary (e.g., terrorist). Whereas the first generation of security games research provided algorithms for optimizing security resources in mostly static settings, my thesis advances the state-of-the-art to a new generation of security games, handling massive games with complex spatio-temporal settings and leading to real-world applications that have fundamentally altered current practices of security resource allocation. Indeed, in many real-world domains, players act in a geographical space over time, and my thesis accordingly expands the frontiers of security games to deal with challenges in domains with spatio-temporal dynamics. My thesis provides the first algorithms and models for advancing key aspects of spatio-temporal challenges in security games, including (i) continuous time; (ii) continuous space; (iii) frequent and repeated attacks; (iv) complex spatial constraints. 
First, focusing on games where actions are taken over continuous time (for example games with moving targets such as ferries and refugee supply lines), I propose a new game model that accurately models the continuous strategy space for the attacker. Based on this model, I provide an efficient algorithm to calculate the defender’s optimal strategy using a compact representation for both the defender and the attacker’s strategy space. Second, for games where actions are taken over continuous space (for example games with forest land as a target), I provide an algorithm computing the optimal distribution of patrol effort. Third, my work addresses challenges with one key dimension of complexity – frequent and repeated attacks. Motivated by the repeated interaction of players in domains such as preventing poaching and illegal fishing, I introduce a novel game model that deals with frequent defender-adversary interactions and provide algorithms to plan effective sequential defender strategies. Furthermore, I handle complex spatial constraints that arise from the problem of designing an optimal patrol strategy given detailed topographical information. My thesis work has led to two applications which have been deployed in the real world and have fundamentally altered previously used tactics, including one used by the US Coast Guard for protecting the Staten Island Ferry in New York City and another deployed in a protected area in Southeast Asia to combat poaching. %G eng %0 Conference Paper %B 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T CAPTURE: A New Predictive Anti-Poaching Tool for Wildlife Protection %A Thanh H. Nguyen %A Sinha, Arunesh %A Gholami, Shahrzad %A Plumptre, Andrew %A Joppa, Lucas %A Tambe, Milind %A Driciru, Margaret %A Wanyama, Fred %A Rwetsiba, Aggrey %A Critchlow, Rob %A Colin Beale %X Wildlife poaching presents a serious extinction threat to many animal species. 
Agencies (“defenders”) focused on protecting such animals need tools that help analyze, model and predict poacher activities, so they can more effectively combat such poaching; such tools could also assist in planning effective defender patrols, building on previous security games research. To that end, we have built a new predictive anti-poaching tool, CAPTURE (Comprehensive Anti-Poaching tool with Temporal and observation Uncertainty REasoning). CAPTURE provides four main contributions. First, CAPTURE’s modeling of poachers provides significant advances over previous models from behavioral game theory and conservation biology. This model accounts for: (i) the defender’s imperfect detection of poaching signs; (ii) complex temporal dependencies in the poacher’s behaviors; (iii) the lack of knowledge of the number of poachers. Second, we provide two new heuristics, parameter separation and target abstraction, to reduce the computational complexity in learning the poacher models. Third, we present a new game-theoretic algorithm for computing the defender’s optimal patrolling given the complex poacher model. Finally, we present detailed models and analysis of real-world poaching data collected over 12 years in Queen Elizabeth National Park in Uganda to evaluate our new model’s prediction accuracy. This paper thus presents the largest dataset of real-world defender-adversary interactions analyzed in the security games literature. CAPTURE will be tested in Uganda in early 2016. %B 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Twenty-Eighth Innovative Applications of Artificial Intelligence Conference %D 2016 %T Deploying PAWS: Field Optimization of the Protection Assistant for Wildlife Security %A Fang, Fei %A Thanh H. Nguyen %A Pickles, Rob %A Wai Y. Lam %A Gopalasamy R. 
Clements %A An, Bo %A Singh, Amandeep %A Tambe, Milind %A Lemieux, Andrew %X Poaching is a serious threat to the conservation of key species and whole ecosystems. While conducting foot patrols is the most commonly used approach in many countries to prevent poaching, such patrols often do not make the best use of limited patrolling resources. To remedy this situation, prior work introduced a novel emerging application called PAWS (Protection Assistant for Wildlife Security); PAWS was proposed as a game-theoretic (“security games”) decision aid to optimize the use of patrolling resources. This paper reports on PAWS’s significant evolution from a proposed decision aid to a regularly deployed application, covering the lessons from the first tests in Africa in Spring 2014, through its continued evolution since then, to current regular use in Southeast Asia and plans for future worldwide deployment. In this process, we have worked closely with two NGOs (Panthera and Rimba) and incorporated extensive feedback from professional patrolling teams. We outline key technical advances that led to PAWS’s regular deployment: (i) incorporating complex topographic features, e.g., ridgelines, in generating patrol routes; (ii) handling uncertainties in species distribution (game theoretic payoffs); (iii) ensuring scalability for patrolling large-scale conservation areas with fine-grained guidance; and (iv) handling complex patrol scheduling constraints. %B Twenty-Eighth Innovative Applications of Artificial Intelligence Conference %G eng %0 Conference Paper %B International conference on Autonomous Agents and Multiagent Systems %D 2016 %T HEALER: POMDP Planning for Scheduling Interventions among Homeless Youth (Demonstration) %A Amulya Yadav %A Ece Kamar %A Grosz, Barbara %A Tambe, Milind %X Adaptive software agents like HEALER have been proposed in the literature recently to recommend intervention plans to homeless shelter officials. 
However, generating networks for HEALER’s input is challenging. Moreover, HEALER’s solutions are often counter-intuitive to people. This demo paper makes two contributions. First, we demonstrate HEALER’s Facebook application, which parses the Facebook contact lists in order to construct an approximate social network for HEALER. Second, we present a software interface to run human-subject experiments (HSEs) to understand human biases in recommendation of intervention plans. We plan to use data collected from these HSEs to build an explanation system for HEALER’s solutions. %B International conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B AAMAS 2016 IDEAS Workshop %D 2016 %T POMDPs for Assisting Homeless Shelters - Computational and Deployment Challenges %A Amulya Yadav %A Chan, Hau %A Jiang, Albert %A Eric Rice %A Ece Kamar %A Grosz, Barbara %A Tambe, Milind %X This paper looks at challenges faced during the ongoing deployment of HEALER, a POMDP-based software agent that recommends sequential intervention plans for use by homeless shelters, who organize these interventions to raise awareness about HIV among homeless youth. HEALER’s sequential plans (built using knowledge of social networks of homeless youth) choose intervention participants strategically to maximize influence spread, while reasoning about uncertainties in the network. In order to compute its plans, HEALER (i) casts this influence maximization problem as a POMDP and solves it using a novel planner which scales up to previously unsolvable real-world sizes; and (ii) constructs social networks of homeless youth at low cost, using a Facebook application. HEALER is currently being deployed in the real world in collaboration with a homeless shelter. Initial feedback from the shelter officials has been positive, but they were surprised by the solutions generated by HEALER, as these solutions are very counterintuitive. 
Therefore, there is a need to justify HEALER’s solutions in a way that mirrors the officials’ intuition. In this paper, we report on progress made towards HEALER’s deployment and detail first steps taken to tackle the issue of explaining HEALER’s solutions. %B AAMAS 2016 IDEAS Workshop %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI) %D 2016 %T Preventing Illegal Logging: Simultaneous Optimization of Resource Teams and Tactics for Security %A Mc Carthy, Sara %A Tambe, Milind %A Kiekintveld, Christopher %A Meredith L. Gore %A Killion, Alex %X Green security – protection of forests, fish and wildlife – is a critical problem in environmental sustainability. We focus on the problem of optimizing the defense of forests against illegal logging, where often we are faced with the challenge of teaming up many different groups, from national police to forest guards to NGOs, each with differing capabilities and costs. This paper introduces a new, yet fundamental problem: Simultaneous Optimization of Resource Teams and Tactics (SORT). SORT contrasts with most previous game-theoretic research for green security – in particular that based on security games – which has solely focused on optimizing patrolling tactics, without consideration of team formation or coordination. We develop new models and scalable algorithms to apply SORT to illegal logging in large forest areas. We evaluate our methods on a variety of synthetic examples, as well as a real-world case study using data from our ongoing collaboration in Madagascar. 
%B AAAI conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T Simultaneous Influencing and Mapping Social Networks (Extended Abstract) %A Marcolino, Leandro Soriano %A Lakshminarayanan, Aravind %A Amulya Yadav %A Tambe, Milind %X Influencing a social network is an important technique, with potential to positively impact society, as we can modify the behavior of a community. For example, we can increase the overall health of a population; Yadav et al. (2015) [4] spread information about HIV prevention in homeless populations. However, although influence maximization has been extensively studied [2, 1], its main motivation is viral marketing, and hence existing works assume that the social network graph is fully known, generally taken from some social media network. Yet the graphs recorded in social media do not really represent all the people and all the connections of a population. Most critically, when performing interventions in real life, we deal with a large degree of missing knowledge. Normally, social agencies have to perform several interviews in order to learn the social network graph [3]. These highly unknown networks, however, are exactly the ones we need to influence in order to have a positive impact in the real world, beyond product advertisement. Additionally, learning a social network graph is very valuable in itself. Agencies need data about a population in order to plan future actions that enhance its well-being and to improve their own practices [3]. As mentioned, however, current work in influence maximization ignores this problem. Each person in a social network actually knows other people, including the ones she cannot directly influence. When we select someone for an intervention (to spread influence), we also have an opportunity to obtain knowledge. 
Therefore, in this work we present for the first time the problem of simultaneously influencing and mapping a social network. We study the performance of the classical influence maximization algorithm in this context, and show that it can be arbitrarily low. Hence, we study a class of algorithms for this problem, performing experiments on four real-life networks of homeless populations. We show that our algorithm is competitive with previous approaches in terms of influence, and is significantly better in terms of mapping. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J European Conference on Artificial Intelligence (ECAI )[short paper] %D 2016 %T Toward Addressing Collusion among Human Adversaries in Security Games %A Gholami, Shahrzad %A Bryan Wilder %A Brown, Matthew %A Thomas, Dana %A Sintov, Nicole %A Tambe, Milind %X Security agencies, including the US Coast Guard, the Federal Air Marshal Service, and the Los Angeles Airport police, have been deploying Stackelberg security games and related algorithms to strategically protect against a single adversary or multiple, independent adversaries. However, there are a variety of real-world security domains where adversaries may benefit from colluding in their actions against the defender. Given the potential negative effect of these collusive actions, the defender has an incentive to break up collusion by playing off the self-interest of individual adversaries. This paper deals with the problem of collusive security games for rational and boundedly rational adversaries. The theoretical results, verified with human-subject experiments, showed that the behavioral model that optimizes against boundedly rational adversaries provides demonstrably better-performing defender strategies against human subjects. 
%B European Conference on Artificial Intelligence (ECAI) [short paper] %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2016 %T Using abstractions to solve opportunistic crime security games at scale %A Zhang, Chao %A Bucarey, Victor %A Mukhopadhyay, Ayan %A Sinha, Arunesh %A Qian, Yundi %A Vorobeychik, Yevgeniy %A Tambe, Milind %X In this paper, we aim to deter urban crime by recommending optimal police patrol strategies against opportunistic criminals in large-scale urban problems. While previous work has tried to learn criminals’ behavior from real world data and generate patrol strategies against opportunistic crimes, it cannot scale up to large-scale urban problems. Our first contribution is a game abstraction framework that can handle opportunistic crimes in large-scale urban areas. In this game abstraction framework, we model the interaction between officers and opportunistic criminals as a game with discrete targets. By merging similar targets, we obtain an abstract game with fewer total targets. We use real world data to learn and plan against opportunistic criminals in this abstract game, and then propagate the results of this abstract game back to the original game. Our second contribution is the layer-generating algorithm used to merge targets as described in the framework above. This algorithm applies a mixed integer linear program (MILP) to merge similar and geographically neighboring targets in the large scale problem. As our third contribution, we propose a planning algorithm that recommends a mixed strategy against opportunistic criminals. Finally, our fourth contribution is a heuristic propagation model to handle the problem of limited data we occasionally encounter in large-scale problems. As part of our collaboration with local police departments, we apply our model in two large-scale urban problems: a university campus and a city. 
Our approach provides high prediction accuracy in the real datasets; furthermore, we project significant crime rate reduction using our planning strategy compared to current police strategy. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2016 %D 2016 %T Using Social Networks to Aid Homeless Shelters: Dynamic Influence Maximization under Uncertainty %A Amulya Yadav %A Chan, Hau %A Xin Jiang, Albert %A Haifeng Xu %A Eric Rice %A Tambe, Milind %X This paper presents HEALER, a software agent that recommends sequential intervention plans for use by homeless shelters, who organize these interventions to raise awareness about HIV among homeless youth. HEALER’s sequential plans (built using knowledge of social networks of homeless youth) choose intervention participants strategically to maximize influence spread, while reasoning about uncertainties in the network. While previous work presents influence-maximization techniques to choose intervention participants, they do not address three real-world issues: (i) they completely fail to scale up to real-world sizes; (ii) they do not handle deviations in execution of intervention plans; (iii) constructing real-world social networks is an expensive process. HEALER handles these issues via four major contributions: (i) HEALER casts this influence maximization problem as a POMDP and solves it using a novel planner which scales up to previously unsolvable real-world sizes; (ii) HEALER allows shelter officials to modify its recommendations, and updates its future plans in a deviation-tolerant manner; (iii) HEALER constructs social networks of homeless youth at low cost, using a Facebook application. Finally, (iv) we show hardness results for the problem that HEALER solves. HEALER will be deployed in the real world in early Spring 2016 and is currently undergoing testing at a homeless shelter. 
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2016 %G eng %0 Journal Article %J Journal of Cybersecurity %D 2015 %T From Physical Security to Cyber Security %A A. Sinha %A T. H. Nguyen %A D. Kar %A Brown, M. %A M. Tambe %A A.X. Jiang %X Security is a critical concern around the world. In many domains from cybersecurity to sustainability, limited security resources prevent complete security coverage at all times. Instead, these limited resources must be scheduled (or allocated or deployed), while simultaneously taking into account the importance of different targets, the responses of the adversaries to the security posture, and the potential uncertainties in adversary payoffs and observations, etc. Computational game theory can help generate such security schedules. Indeed, casting the problem as a Stackelberg game, we have developed new algorithms that are now deployed over multiple years in multiple applications for scheduling of security resources. These applications are leading to real-world use-inspired research in the emerging research area of “security games.” The research challenges posed by these applications include scaling up security games to real-world-sized problems, handling multiple types of uncertainty, and dealing with bounded rationality of human adversaries. In the cybersecurity domain, the interaction between the defender and the adversary is quite complicated, with a high degree of incomplete information and uncertainty. While solutions have been proposed for parts of the problem space in cybersecurity, the need of the hour is a comprehensive understanding of the whole space, including the interaction with the adversary. We highlight the innovations in security games that could be used to tackle the game problem in cybersecurity. %B Journal of Cybersecurity %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI 2015). 
(Doctoral Consortium) %D 2015 %T Multi-agent Team Formation: Solving Complex Problems by Aggregating Opinions %A L. S. Marcolino %X Aggregating the opinions of different agents is a powerful way to find high-quality solutions to complex problems. However, when using agents in this fashion, there are two fundamental open questions. First, given a universe of agents, how to quickly identify which ones should be used to form a team? Second, given a team of agents, what is the best way to aggregate their opinions? Many researchers value diversity when forming teams. LiCalzi and Surucu (2012) and Hong and Page (2004) propose models where the agents know the utility of the solutions, and the team converges to the best solution found by one of its members. Clearly, in complex problems, the utility of solutions would not be available, and agents would have to resort to other methods, such as voting, to reach a common decision. Lamberson and Page (2012) study diversity in the context of forecasts, where the solutions are represented by real numbers and the team takes the average of the opinion of its members. Domains where the possible solutions are discrete, however, are not captured by such a model. I proposed a new model to study teams of agents that vote in discrete solution spaces (Marcolino, Jiang, and Tambe 2013), where I show that a diverse team of weaker agents can overcome a uniform team made of copies of the best agent. However, this phenomenon does not always occur, and it is still necessary to identify when we should use diverse teams and when uniform teams would be more appropriate. Hence, in Marcolino et al. (2014b), I shed new light on this problem by presenting a new, more general model of diversity for teams of voting agents. Using that model, I predict that diverse teams perform better than uniform teams in problems with a large action space. All my predictions are verified in a real system of voting agents, in the Computer Go domain. 
I show that: (i) a team of diverse players gets a higher winning rate than a uniform team made of copies of the best agent; (ii) the diverse team plays increasingly better as the board size increases. Moreover, I also performed an experimental study in the building design domain. This is a fundamental domain in the current scenario, since it is known that the design of a building has a major impact on the consumption of energy throughout its whole lifespan (Lin and Gerber 2014). It is fundamental to design energy-efficient buildings. Meanwhile, it is important to balance other factors, such as construction cost, creating a multi-objective optimization problem. I show that by aggregating the opinions of a team of agents, a higher number of first-ranked solutions on the Pareto frontier is found than when using a single agent. Moreover, my approach eliminates falsely reported first-ranked solutions (Marcolino et al. 2014a; 2015). As mentioned, studying different aggregation rules is also fundamental. In Jiang et al. (2014), I introduce a novel method to extract a ranking from agents, based on the frequency with which actions are played when sampling them multiple times. My method leads to significant improvements in the winning rate in Go games when using the Borda voting rule to aggregate the generated rankings. %B Conference on Artificial Intelligence (AAAI 2015). (Doctoral Consortium) %C Texas, USA %G eng %0 Conference Paper %B International Workshop on Coordination, Organisations, Institutions and Norms (COIN 2015) %D 2015 %T Agent Teams for Design Problems %A L. S. Marcolino %A Xu, H. %A D. Gerber %A B. Kolev %A S. Price %A E. Pantazis %A M. Tambe %X Design imposes a novel social choice problem: using a team of voting agents, maximize the number of optimal solutions; allowing a user to then make an aesthetic choice. In an open system of design agents, team formation is fundamental. We present the first model of agent teams for design. 
For maximum applicability, we envision agents that are queried for a single opinion, and multiple solutions are obtained by multiple iterations. We show that diverse teams composed of agents with different preferences maximize the number of optimal solutions, while uniform teams composed of multiple copies of the best agent are in general suboptimal. Our experiments study the model in bounded time; and we also study a real system, where agents vote to design buildings. %B International Workshop on Coordination, Organisations, Institutions and Norms (COIN 2015) %G eng %0 Conference Paper %B Workshop on Computational Sustainability (AAAI 2015) %D 2015 %T Agents vote for the environment: Designing energy-efficient architecture %A L. S. Marcolino %A D. Gerber %A B. Kolev %A S. Price %A E. Pantazis %A Tian, Y. %A M. Tambe %X Saving energy is a major concern. Hence, it is fundamental to design and construct buildings that are energy-efficient. It is known that the early stage of architectural design has a significant impact on this matter. However, it is complex to create designs that are optimally energy-efficient, and at the same time balance other essential criteria such as economics, space, and safety. One state-of-the-art approach is to create parametric designs, and use a genetic algorithm to optimize across different objectives. We further improve this method by aggregating the solutions of multiple agents. We evaluate diverse teams, composed of different agents, and uniform teams, composed of multiple copies of a single agent. We test our approach across three design cases of increasing complexity, and show that the diverse team provides a significantly larger percentage of optimal solutions than single agents. 
%B Workshop on Computational Sustainability (AAAI 2015) %G eng %0 Thesis %D 2015 %T Balancing Tradeoffs in Security Games: Handling Defenders and Adversaries with Multiple Objectives %A Brown, Matthew %X Stackelberg security games (SSG) have received a significant amount of attention in the literature for modeling the strategic interactions between a defender and an adversary, in which the defender has a limited amount of security resources to protect a set of targets from a potential attack by the adversary. SSGs are at the heart of several significant decision-support applications deployed in real world security domains. All of these applications rely on standard assumptions made in SSGs, including that the defender and the adversary each have a single objective which is to maximize their expected utility. Given the successes and real world impact of previous SSG research, there is a natural desire to push towards increasingly complex security domains, leading to a point where considering only a single objective is no longer appropriate. My thesis focuses on incorporating multiple objectives into SSGs. With multiple conflicting objectives for either the defender or adversary, there is no one solution which maximizes all objectives simultaneously and tradeoffs between the objectives must be made. Thus, my thesis provides two main contributions by addressing the research challenges raised by considering SSGs with (1) multiple defender objectives and (2) multiple adversary objectives. These contributions consist of approaches for modeling, calculating, and analyzing the tradeoffs between objectives in a variety of different settings. First, I consider multiple defender objectives resulting from diverse adversary threats where protecting against each type of threat is treated as a separate objective for the defender. 
Second, I investigate the defender’s need to balance the exploitation of collected data with the exploration of alternative strategies in patrolling domains. Third, I explore the necessary tradeoff between the efficacy and the efficiency of the defender’s strategy in screening domains. Fourth, I examine multiple adversary objectives for heterogeneous populations of boundedly rational adversaries that no longer strictly maximize expected utility. The contributions of my thesis provide the novel game models and algorithmic techniques required to incorporate multiple objectives into SSGs. My research advances the state of the art in SSGs and opens up the model to new types of security domains that could not have been handled previously. As a result, I developed two applications for real-world security domains that either have been or will be tested and evaluated in the field. %G eng %9 PhD thesis %0 Conference Paper %B Conference on Decision and Game Theory for Security %D 2015 %T Beware the Soothsayer: From Attack Prediction Accuracy to Predictive Reliability in Security Games %A Ford, Benjamin %A Nguyen, Thanh %A Tambe, Milind %A Sintov, Nicole %A Delle Fave, Francesco %X Interdicting the flow of illegal goods (such as drugs and ivory) is a major security concern for many countries. The massive scale of these networks, however, forces defenders to make judicious use of their limited resources. While existing solutions model this problem as a Network Security Game (NSG), they do not consider humans’ bounded rationality. Previous human behavior modeling works in Security Games, however, make use of large training datasets that are unrealistic in real-world situations; the ability to effectively test many models is constrained by the time-consuming and complex nature of field deployments.
In addition, there is an implicit assumption in these works that a model’s prediction accuracy strongly correlates with the performance of its corresponding defender strategy (referred to as predictive reliability). If the assumption of predictive reliability does not hold, then this could lead to substantial losses for the defender. In the following paper, we (1) first demonstrate that predictive reliability is indeed strong for previous Stackelberg Security Game experiments. We also run our own set of human subject experiments in such a way that models are restricted to learning on dataset sizes representative of real-world constraints. In the analysis of that data, we demonstrate that (2) predictive reliability is extremely weak for NSGs. Following that discovery, however, we identify (3) key factors that influence predictive reliability results: the training set’s exposed attack surface and graph structure. %B Conference on Decision and Game Theory for Security %G eng %0 Conference Paper %B AAAI conference on Artificial Intelligence (AAAI) %D 2015 %T Combining Compact Representation and Incremental Generation in Large Games with Sequential Strategies %A B. Bosansky %A A. Jiang %A M. Tambe %A C. Kiekintveld %X Many search and security games played on a graph can be modeled as normal-form zero-sum games with strategies consisting of sequences of actions. The size of the strategy space provides a computational challenge when solving these games. This complexity is tackled either by using the compact representation of sequential strategies and linear programming, or by incremental strategy generation of iterative double-oracle methods. In this paper, we present a novel hybrid of these two approaches: the compact-strategy double-oracle (CS-DO) algorithm, which combines the advantages of the compact representation with incremental strategy generation.
We experimentally compare CS-DO with the standard approaches and analyze the impact of the size of the support on the performance of the algorithms. Results show that CS-DO dramatically improves the convergence rate in games with non-trivial support. %B AAAI conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2015 %T Computing Optimal Mixed Strategies for Security Games with Dynamic Payoffs %A Yin, Yue %A Haifeng Xu %A Jiarui Gan %A An, Bo %A Albert X. Jiang %X Security agencies in the real world often need to protect targets with time-dependent values, e.g., tourist sites where the number of travelers changes over time. Since the values of different targets often change asynchronously, the defender can relocate security resources among targets dynamically to make the best use of limited resources. We propose a game-theoretic scheme to develop dynamic, randomized security strategies in consideration of the adversary’s surveillance capability. This differs from previous studies on security games by considering varying target values and continuous strategy spaces of the security agency and the adversary. The main challenge lies in the computational intensiveness due to the continuous, and hence infinite, strategy spaces. We propose an optimal algorithm and an arbitrarily near-optimal algorithm to compute security strategies under different conditions. Experimental results show that both algorithms significantly outperform existing approaches.
%B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B Human-Agent Interaction Design and Models (HAIDM) Workshop at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Conducting Longitudinal Experiments with Behavioral Models in Repeated Stackelberg Security Games on Amazon Mechanical Turk %A Kar, Debarun %A Fang, Fei %A Delle Fave, Francesco %A Sintov, Nicole %A Tambe, Milind %X Recently, there has been an increase in interest in domains involving repeated interactions between defenders and adversaries. This has been modeled as a repeated Stackelberg Security Game (repeated SSG). Although different behavioral models have been proposed for the attackers in these games, human subjects experiments for testing these behavioral models in repeated SSGs have not been conducted previously. This paper presents the first “longitudinal study” – at least in the context of SSGs – of testing human behavior models in repeated SSG settings. We provide the following contributions in this paper. First, in order to test the behavioral models, we design a game that simulates the repeated interactions between the defender and the adversary and deploy it on Amazon Mechanical Turk (AMT). Human subjects are asked to participate in this repeated task in rounds of the game, with a break between consecutive rounds. Second, we develop several approaches to keep the human subjects motivated throughout the course of this longitudinal study so that they participate in all measurement occasions, thereby minimizing attrition. We provide results showing improvements in retention rate due to the implementation of these approaches. Third, we propose a way of choosing representative payoffs that fit real-world scenarios, since conducting these experiments is extremely time-consuming and we can only conduct a limited number of them.
%B Human-Agent Interaction Design and Models (HAIDM) Workshop at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Defender Strategies In Domains Involving Frequent Adversary Interaction (Extended Abstract) %A Fang, Fei %A Stone, Peter %A Tambe, Milind %X Recently, there has been an increase in interest in applying game theoretic approaches to domains involving frequent adversary interactions, such as wildlife and fishery protection. In these domains, the law enforcement agency faces adversaries who repeatedly and frequently carry out illegal activities, and thus, do not have time for extensive surveillance before taking actions. This makes them significantly different from counter-terrorism domains where game-theoretic approaches have been widely deployed. This paper presents a game-theoretic approach to be used by the defender in these Frequent Adversary Interaction (FAI) domains. We provide (i) a novel game model for FAI domains, describing the interaction between the defender and the attackers in a repeated game and (ii) algorithms that plan for the defender strategies to achieve high average expected utility over all rounds. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B CAAD Futures Conference %D 2015 %T Design Agency: Prototyping Multi-Agent Systems in Architecture %A D. J. Gerber %A E. Pantazis %A L. S. Marcolino %X This paper presents research on the prototyping of multi-agent systems for architectural design. It proposes a design exploration methodology at the intersection of architecture, engineering, and computer science. 
The motivation of the work includes exploring bottom-up generative methods coupled with optimizing performance criteria, including geometric complexity and objective functions for environmental, structural and fabrication parameters. The paper presents the development of a research framework and initial experiments to provide design solutions, which simultaneously satisfy complexly coupled and often contradictory objectives. The prototypical experiments and initial algorithms are described through a set of different design cases and agents within this framework; for the generation of façade panels for light control; for emergent design of shell structures; for actual construction of reciprocal frames; and for robotic fabrication. Initial results include multi-agent-derived efficiencies for environmental and fabrication criteria and discussion of future steps for inclusion of human and structural factors. %B CAAD Futures Conference %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Every Team Deserves a Second Chance: An Interactive 9x9 Go Experience (Demonstration) %A L. S. Marcolino %A V. Nagarajan %A M. Tambe %X We show that without using any domain knowledge, we can predict the final performance of a team of voting agents, at any step towards solving a complex problem. This demo allows users to interact with our system, and observe its predictions, while playing 9x9 Go. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI 2015) %D 2015 %T Every team deserves a second chance: Identifying when things go wrong (Student Abstract Version) %A V. Nagarajan %A L. S. Marcolino %A M. Tambe %X We show that without using any domain knowledge, we can predict the final performance of a team of voting agents, at any step towards solving a complex problem.
%B Conference on Artificial Intelligence (AAAI 2015) %C Texas, USA %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Every team deserves a second chance: Identifying when things go wrong %A V. Nagarajan %A L. S. Marcolino %A M. Tambe %X Voting among different agents is a powerful tool in problem solving, and it has been widely applied to improve the performance in finding the correct answer to complex problems. We present a novel benefit of voting, that has not been observed before: we can use the voting patterns to assess the performance of a team and predict their final outcome. This prediction can be executed at any moment during problem-solving and it is completely domain independent. We present a theoretical explanation of why our prediction method works. Further, contrary to what would be expected based on a simpler explanation using classical voting models, we argue that we can make accurate predictions irrespective of the strength (i.e., performance) of the teams, and that in fact, the prediction can work better for diverse teams composed of different agents than uniform teams made of copies of the best agent. We perform experiments in the Computer Go domain, where we obtain a high accuracy in predicting the final outcome of the games. We analyze the prediction accuracy for three different teams with different levels of diversity and strength, and we show that the prediction works significantly better for a diverse team. Since our approach is domain independent, it can be easily applied to a variety of domains. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B AAAI Workshop Learning for General Competency in Video Games (AAAI 2015) %D 2015 %T Every Team Makes Mistakes: An Initial Report on Predicting Failure in Teamwork %A V. Nagarajan %A L. S. Marcolino %A M. 
Tambe %X Voting among different agents is a powerful tool in problem solving, and it has been widely applied to improve the performance in machine learning. However, the potential of voting has been explored only in improving the ability to find the correct answer to a complex problem. In this paper we present a novel benefit of voting that has not been observed before: we show that we can use the voting patterns to assess the performance of a team and predict their final outcome. This prediction can be executed at any moment during problem-solving and it is completely domain independent. We present a preliminary theoretical explanation of why our prediction method works, where we show that the accuracy is better for diverse teams composed of different agents than for uniform teams made of copies of the same agent. We also perform experiments in the Computer Go domain, where we show that we can obtain a high accuracy in predicting the final outcome of the games. We analyze the prediction accuracy for three different teams, and we show that the prediction works significantly better for a diverse team. Since our approach is completely domain independent, it can be easily applied to a variety of domains, such as the video games in the Arcade Learning Environment. %B AAAI Workshop Learning for General Competency in Video Games (AAAI 2015) %C Texas, USA %G eng %0 Conference Paper %B Multidisciplinary Workshop on Advances in Preference Handling (M-PREF 2015) %D 2015 %T Every Team Makes Mistakes, in Large Action Spaces %A L. S. Marcolino %A V. Nagarajan %A M. Tambe %X Voting is applied to better estimate an optimal answer to complex problems in many domains. We recently presented a novel benefit of voting that has not been observed before: we can use the voting patterns to assess the performance of a team and predict whether it will be successful or not in problem-solving.
Our prediction technique is completely domain independent, and it can be executed at any time during problem solving. In this paper we present a novel result about our technique: we show that the prediction quality increases with the size of the action space. We present a theoretical explanation for this phenomenon, and experiments in Computer Go with a variety of board sizes. %B Multidisciplinary Workshop on Advances in Preference Handling (M-PREF 2015) %G eng %0 Conference Paper %B AAAI Conference on Artificial Intelligence (AAAI) %D 2015 %T Exploring Information Asymmetry in Two-Stage Security Games %A Xu, H. %A Z. Rabinovich %A S. Dughmi %A M. Tambe %X Stackelberg security games have been widely deployed to protect real-world assets. The main solution concept there is the Strong Stackelberg Equilibrium (SSE), which optimizes the defender’s random allocation of limited security resources. However, solely deploying the SSE mixed strategy has limitations. In the extreme case, there are security games in which the defender is able to defend all the assets “almost perfectly” at the SSE, but she still sustains significant loss. In this paper, we propose an approach for improving the defender’s utility in such scenarios. Perhaps surprisingly, our approach is to strategically reveal to the attacker information about the sampled pure strategy. Specifically, we propose a two-stage security game model, where in the first stage the defender allocates resources and the attacker selects a target to attack, and in the second stage the defender strategically reveals local information about that target, potentially deterring the attacker’s attack plan. We then study how the defender can play optimally in both stages. We show, theoretically and experimentally, that the two-stage security game model allows the defender to achieve strictly better utility than SSE.
%B AAAI Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B IJCAI'15 Workshop on Behavioral, Economic and Computational Intelligence for Security (BECIS) %D 2015 %T An extensive study of Dynamic Bayesian Network for patrol allocation against adaptive opportunistic criminals %A Gholami, Shahrzad %A Zhang, Chao %A Sinha, Arunesh %A Tambe, Milind %X Police patrols are used ubiquitously to deter crimes in urban areas. A distinctive feature of urban crimes is that criminals react opportunistically to patrol officers’ assignments. Different models of adversary behavior have been proposed but their exact form remains uncertain. Recent work [Zhang et al., 2015] has explored learning the model from real-world criminal activity data. To that end, criminal behavior and the interaction with the patrol officers is represented as parameters of a Dynamic Bayesian Network (DBN), enabling application of standard algorithms such as EM to learn the parameters. More specifically, the EMC2 algorithm is a sequence of modifications to the DBN representation that allow for a compact representation, resulting in better learning accuracy and increased speed of learning. In this paper, we perform additional experiments showing the efficacy of the EMC2 algorithm. Furthermore, we explore different variations of the Markov model. Unlike DBNs, the Markov models do not have hidden states, which indicate the distribution of criminals, and are therefore easier to learn using standard MLE techniques. We compare all the approaches by learning from a real data set of criminal activity obtained from the police department of the University of Southern California (USC) situated in Los Angeles, USA. We demonstrate significantly better accuracy in predicting crime using the EMC2 algorithm compared to other approaches. This work was done in collaboration with the police department of USC.
%B IJCAI'15 Workshop on Behavioral, Economic and Computational Intelligence for Security (BECIS) %G eng %0 Conference Paper %B In IJCAI-15 Workshop on Behavioral, Economic and Computational Intelligence for Security (BECIS-15) %D 2015 %T Handling Payoff Uncertainty in Green Security Domains with Adversary Bounded Rationality %A Amulya Yadav %A Nguyen, Thanh %A Delle Fave, Francesco %A Tambe, Milind %A Agmon, Noa %A Jain, Manish %A Ramono, Widodo %A Batubara, Timbul %X Research on Stackelberg Security Games (SSG) has recently shifted to green security domains, for example, protecting wildlife from illegal poaching. Previous research on this topic has advocated the use of behavioral (bounded rationality) models of adversaries in SSG. As its first contribution, this paper, for the first time, provides validation of these behavioral models based on real-world data from a wildlife park. The paper’s next contribution is the first algorithm to handle payoff uncertainty – an important concern in green security domains – in the presence of such adversarial behavioral models. %B In IJCAI-15 Workshop on Behavioral, Economic and Computational Intelligence for Security (BECIS-15) %G eng %0 Conference Paper %B In IJCAI-15 Workshop on Algorithmic Game Theory (AGT-15) %D 2015 %T Handling Payoff Uncertainty with Adversary Bounded Rationality in Green Security Domains %A Amulya Yadav %A Nguyen, Thanh %A Delle Fave, Francesco %A Tambe, Milind %A Agmon, Noa %A Jain, Manish %A Ramono, Widodo %A Batubara, Timbul %X Research on Stackelberg Security Games (SSG) has recently shifted to green security domains, for example, protecting wildlife from illegal poaching. Previous research on this topic has advocated the use of behavioral (bounded rationality) models of adversaries in SSG. As its first contribution, this paper, for the first time, provides validation of these behavioral models based on real-world data from a wildlife park. 
The paper’s next contribution is the first algorithm to handle payoff uncertainty – an important concern in green security domains – in the presence of such adversarial behavioral models. %B In IJCAI-15 Workshop on Algorithmic Game Theory (AGT-15) %G eng %0 Conference Paper %B Conference on Advances in Cognitive Systems %D 2015 %T Human Adversaries in Opportunistic Crime Security Games: Evaluating Competing Bounded Rationality Models %A Abbasi, Yasaman Dehghani %A Short, Martin %A Sinha, Arunesh %A Sintov, Nicole %A Zhang, Chao %A Tambe, Milind %X There are a growing number of automated decision aids based on game-theoretic algorithms in daily use by security agencies to assist in allocating or scheduling their limited security resources. These applications of game theory, based on the “security games” paradigm, are leading to fundamental research challenges: one major challenge is modeling human bounded rationality. More specifically, the security agency, assisted with an automated decision aid, is assumed to act with perfect rationality against a human adversary; it is important to investigate the bounded rationality of these human adversaries to improve effectiveness of security resource allocation. This paper for the first time provides an empirical investigation of adversary bounded rationality in opportunistic crime settings, where modeling bounded rationality is particularly crucial. 
We conduct extensive human subject experiments, comparing ten different bounded rationality models, and illustrate that: (a) while previous research proposed the use of the stochastic choice “quantal response” model of the human adversary, this model is significantly outperformed by more advanced models of “subjective utility quantal response”; (b) combinations of the well-known prospect theory model with these advanced models lead to an even better performance in modeling human adversary behavior; (c) while it is important to model the non-linear human weighing of probability, as advocated by prospect theory, our findings are the exact opposite of prospect theory in terms of how humans are seen to weigh this non-linear probability. %B Conference on Advances in Cognitive Systems %G eng %0 Conference Paper %B Workshop on Behavioral, Economic and Computational Intelligence for Security (IJCAI) %D 2015 %T Human Adversaries in Opportunistic Crime Security Games: How Past success (or failure) affects future behavior %A Dehghani Abbasi Y. %A Short M. %A Sinha A. %A Sintov N. %A Zhang Ch. %A Tambe M. %X There are a growing number of automated decision aids based on game-theoretic algorithms in daily use by security agencies to assist in allocating or scheduling their limited security resources. These applications of game theory, based on the “security games” paradigm, are leading to fundamental research challenges: one major challenge is modeling human bounded rationality. More specifically, the security agency, assisted with an automated decision aid, is assumed to act with perfect rationality against a human adversary; it is important to investigate the bounded rationality of these human adversaries to improve the effectiveness of security resource allocation. In (Abbasi et al., 2015), the authors provide an empirical investigation of adversary bounded rationality in opportunistic crime settings.
In this paper, we propose two additional factors in the “subjective utility quantal response” model. %B Workshop on Behavioral, Economic and Computational Intelligence for Security (IJCAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2015 %T Information Disclosure as a Means to Security %A Zinovi Rabinovich %A Albert X. Jiang %A Jain, Manish %A Haifeng Xu %X In this paper we present a novel Stackelberg-type model of security domains: Security Assets aSsignment with Information disclosure (SASI). The model combines both the features of the Stackelberg Security Games (SSGs) model and of the Bayesian Persuasion (BP) model. More specifically, SASI includes: a) an uncontrolled, exogenous security state that serves as the Defender’s private information; b) multiple security assets with non-accumulating, target-local defence capability; c) a pro-active, verifiable and public, unidirectional information disclosure channel from the Defender to the Attacker. We show that SASI with a non-degenerate information disclosure can be arbitrarily more efficient than a “silent” Stackelberg assets allocation. We also provide a linear program reformulation of SASI that can be solved in polynomial time in the SASI parameters. Furthermore, we show that it is possible to remove one of SASI’s parameters and, rather than require it as an input, recover it by computation. As a result, SASI becomes highly scalable. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Workshop on Cognitive Knowledge Acquisition and Applications (IJCAI 2015) %D 2015 %T Introduction to Green Security Games (Extended Abstract) %A Fang, Fei %A Nguyen, Thanh %A Ford, Benjamin %A Sintov, Nicole %A Tambe, Milind %X Conservation agencies around the world are tasked with protecting endangered wildlife from poaching.
Despite their substantial efforts, however, species are continuing to be poached to critical status and, in some cases, extinction. In South Africa, rhino poaching has seen a recent escalation in frequency; while only 122 rhinos were poached in 2009, a record 1215 rhinos were poached in 2014 (approximately 1 rhino every eight hours) [the Rhino International, 2015]. To combat poaching, conservation agencies send well-trained rangers to patrol designated protected areas. However, these agencies have limited resources and are unable to provide 100% coverage to the entire area at all times. Thus, it is important that agencies make the most efficient use of their patrolling resources, and we introduce Green Security Games (GSGs) as a tool to aid agencies in designing effective patrols. First introduced by [Von Stengel and Zamir, 2004] as a Leadership Game, Stackelberg Games have been applied in a variety of Security Game research (i.e., Stackelberg Security Games, or SSGs). In particular, the focus on randomization in Stackelberg Games lends itself to solving real-world security problems where defenders have limited resources, such as randomly allocating Federal Air Marshals to international flights [Tsai et al., 2009]. However, the SSG model focuses on generating an optimal defender strategy against a single defender-attacker interaction (e.g., a single terrorist attack). For domains where attacks occur frequently, such as in wildlife conservation, another type of Security Game is needed that effectively models the repeated interactions between the defender and the attacker. While still following the Leader-Follower paradigm of SSGs, GSGs have been developed as a way of applying Game Theory to assist wildlife conservation efforts, whether it is to prevent illegal fishing [Haskell et al., 2014], illegal logging [Johnson et al., 2012], or wildlife poaching [Yang et al., 2014]. GSGs are similar to SSGs except that, in GSGs, the game takes place over N rounds.
In SSGs, once the attacker makes a decision, the game is over, but in GSGs, the attacker (e.g., the poacher) and defender have multiple rounds in which they can adapt to each other’s choices in previous rounds. This multi-round feature of GSGs introduces some key research challenges that are being studied: (1) how can we incorporate the attacker’s previous choices into our model of their behavior, in order to improve the defender’s strategy [Yang et al., 2014; Kar et al., 2015], and (2) how do we choose a strategy such that the long-term payoff (i.e., cumulative expected utility) is maximized [Fang et al., 2015]? In addition to exploring these open research questions, we also discuss field tests of the Protection Assistant for Wildlife Security (PAWS) software in Uganda and Malaysia. %B Workshop on Cognitive Knowledge Acquisition and Applications (IJCAI 2015) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Keeping pace with criminals: Designing patrol allocation against adaptive opportunistic criminals %A Zhang, Chao %A Sinha, Arunesh %A Tambe, Milind %X Police patrols are used ubiquitously to deter crimes in urban areas. A distinctive feature of urban crimes is that criminals react opportunistically to patrol officers’ assignments. Compared to strategic attackers (such as terrorists) with a well-laid-out plan, opportunistic criminals are less strategic in planning attacks and more flexible in executing them. In this paper, our goal is to recommend an optimal police patrolling strategy against such opportunistic criminals. We first build a game-theoretic model that captures the interaction between officers and opportunistic criminals. However, while different models of adversary behavior have been proposed, their exact form remains uncertain.
Rather than simply hypothesizing a model as done in previous work, one key contribution of this paper is to learn the model from real-world criminal activity data. To that end, we represent the criminal behavior and the interaction with the patrol officers as parameters of a Dynamic Bayesian Network (DBN), enabling application of standard algorithms such as EM to learn the parameters. Our second contribution is a sequence of modifications to the DBN representation that allow for a compact representation of the model, resulting in better learning accuracy and increased learning speed when the EM algorithm is used on the modified DBN. These modifications use marginalization approaches and exploit the structure of this problem. Finally, our third contribution is an iterative learning and planning mechanism that keeps updating the adversary model periodically. We demonstrate the efficiency of our learning algorithm by applying it to a real data set of criminal activity obtained from the police department of the University of Southern California (USC) situated in Los Angeles, USA. We project a significant reduction in crime rate using our planning strategy as opposed to the actual strategy deployed by the police department. We also demonstrate the improvement in crime prevention in simulations when we use our iterative planning and learning mechanism compared to just learning once and planning. This work was done in collaboration with the police department of USC. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Keeping pace with criminals: Learning, Predicting and Planning against Crime: Demonstration Based on Real Urban Crime Data (Demonstration) %A Zhang, Chao %A Jain, Manish %A Goyal, Ripple %A Sinha, Arunesh %A Tambe, Milind %X Crime in urban areas plagues every city in all countries.
This demonstration will show a novel approach for learning and predicting crime patterns and planning against such crimes using real urban crime data. A notable characteristic of urban crime, distinct from organized terrorist attacks, is that most urban crimes are opportunistic in nature, i.e., criminals do not plan their attacks in detail; rather, they seek opportunities for committing crime and are agile in their execution of the crime [6, 7, 1, 4]. Police officers conduct patrols with the aim of preventing crime. However, criminals can adapt their strategy in response to police deployment by seeking crime opportunities in less effectively patrolled locations. The problem of where and how much to patrol is therefore important. There are two approaches to solving this problem. The first approach is to schedule patrols manually by human planners, as is done in various police departments. However, it has been demonstrated that manual planning of patrols is not only time-consuming but also highly ineffective in the related scenarios of protecting airport terminals [3] and ships in ports [5]. The second approach is to use automated planners to plan patrols against urban crime. This approach has focused either on modeling the criminal explicitly [7, 6] (rational, bounded rational, etc.) in a game model or on learning the adversary behavior using machine learning [2]. However, the proposed mathematical models of criminal behavior have not been validated with real data. Also, prior machine learning approaches have focused only on the adversary’s actions, ignoring their adaptation to the defenders’ actions [2]. Hence, in this presentation we propose a novel approach to learn and update the criminal behavior from real data [8]. We model the interaction between criminals and patrol officers as a Dynamic Bayesian Network (DBN). Figure 1 shows an example of such a DBN.
Next, we apply a dynamic programming algorithm to generate an optimal patrol strategy against the learned criminal model. By iteratively updating the criminal model and recomputing the patrol strategy against it, we help patrol officers keep up with criminals’ adaptive behavior and execute effective patrols. This process is shown as a flow chart in Figure 2. With this context, the demonstration presented in this paper introduces web-based software with two contributions. First, our system collects and analyzes crime reports and resource data (security cameras, emergency supplies, etc.), presenting them in various forms. Second, our patrol scheduler incorporates the algorithm from [8] into a scheduling recommendation system. The demonstration will engage audience members by having them participate as patrol officers and use the software to ‘patrol’ the University of Southern California (USC) campus in the USA. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B Adaptive and Learning Agents Workshop at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Learning Bounded Rationality Models of the Adversary in Repeated Stackelberg Security Games %A Kar, Debarun %A Fang, Fei %A Delle Fave, Francesco %A Sintov, Nicole %A Sinha, Arunesh %A Galstyan, Aram %A An, Bo %A Tambe, Milind %X Several competing human behavior models have been proposed to model and protect against boundedly rational adversaries in repeated Stackelberg security games (SSGs). However, these existing models fail to address three main issues which are extremely detrimental to defender performance. First, while they attempt to learn adversary behavior models from adversaries’ past actions (“attacks on targets”), they fail to take into account adversaries’ future adaptation based on successes or failures of these past actions.
Second, they assume that sufficient data in the initial rounds will lead to a reliable model of the adversary. However, our analysis reveals that the issue is not the amount of data, but that there just is not enough of the attack surface exposed to the adversary to learn a reliable model. Third, current leading approaches have failed to include probability weighting functions, even though it is well known that human beings’ weighting of probability is typically nonlinear. Moreover, the performances of these models may be critically dependent on the learning algorithm used to learn the parameters of these models. The first contribution of this paper is a new human behavior model, SHARP, which mitigates these three limitations as follows: (i) SHARP reasons based on success or failure of the adversary’s past actions on exposed portions of the attack surface to model adversary adaptiveness; (ii) SHARP reasons about similarity between exposed and unexposed areas of the attack surface, and also incorporates a discounting parameter to mitigate adversary’s lack of exposure to enough of the attack surface; and (iii) SHARP integrates a non-linear probability weighting function to capture the adversary’s true weighting of probability. Our second contribution is a comparison of two different approaches for learning the parameters of the bounded rationality models. Our third contribution is a first “longitudinal study” – at least in the context of SSGs – of competing models in settings involving repeated interaction between the attacker and the defender. This study, where each experiment lasted a period of multiple weeks with individual sets of human subjects, illustrates the strengths and weaknesses of different models and shows the advantages of SHARP. 
%B Adaptive and Learning Agents Workshop at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B Symposium on Simulation for Architecture and Urban Design (SimAUD 2015) %D 2015 %T A Multi-Agent Systems for Design Simulation Framework: Experiments with Virtual-Physical-Social Feedback for Architecture %A D. J. Gerber %A E. Pantazis %A L. S. Marcolino %A A. Heydarian %X This paper presents research on the development of multiagent systems (MAS) for integrated and performance driven architectural design. It presents the development of a simulation framework that bridges architecture and engineering, through a series of multi-agent based experiments. The research is motivated to combine multiple design agencies into a system for managing and optimizing architectural form, across multiple objectives and contexts. The research anticipates the incorporation of feedback from real world human behavior and user preferences with physics based structural form finding and environmental analysis data. The framework is a multi-agent system that provides design teams with informed design solutions, which simultaneously optimize and satisfy competing design objectives. The initial results for building structures are measured in terms of the level of lighting improvements and qualitatively in geometric terms. Critical to the research is the elaboration of the system and the feedback loops that are possible when using the multi-agent systems approach. %B Symposium on Simulation for Architecture and Urban Design (SimAUD 2015) %G eng %0 Thesis %D 2015 %T Not a Lone Ranger: Unleashing Defender Teamwork in Security Games %A Shieh, Eric %X Game theory has become an important research area in handling complex security resource allocation and patrolling problems. Stackelberg Security Games (SSGs) have been used in modeling these types of problems via a defender and an attacker(s). 
Despite recent successful real-world deployments of SSGs, scale-up to handle defender teamwork remains a fundamental challenge in this field. The latest techniques do not scale up to domains where multiple defenders must coordinate time-dependent joint activities. To address this challenge, my thesis presents algorithms for solving defender teamwork in SSGs in two phases. As a first step, I focus on domains without execution uncertainty, in modeling and solving SSGs that incorporate teamwork among defender resources via three novel features: (i) a column-generation approach that uses an ordered network of nodes (determined by solving the traveling salesman problem) to generate individual defender strategies; (ii) exploitation of iterative reward shaping of multiple coordinating defender units to generate coordinated strategies; (iii) generation of tighter upper bounds for pruning by solving security games that only abide by key scheduling constraints. In the second stage of my thesis, I address execution uncertainty among defender resources that arises from the real world by integrating the powerful teamwork mechanisms offered by Decentralized Markov Decision Processes (Dec-MDPs) into security games. My thesis offers the following novel contributions: (i) a new model of security games with defender teams that coordinate under uncertainty; (ii) a new algorithm based on column generation that utilizes Dec-MDPs to generate defender strategies that incorporate uncertainty; (iii) new techniques to handle global events (when one or more agents may leave the system) during defender execution; (iv) heuristics that help scale up in the number of targets and resources to handle real-world scenarios; (v) exploration of the robustness of randomized pure strategies. Different mechanisms, from solving situations both with and without execution uncertainty, may be used depending on the features of the domain.
This thesis opens the door to a powerful combination of previous work in multiagent systems on teamwork and security games. %G eng %9 PhD thesis %0 Conference Paper %B Coordination, Organizations, Institutions and Norms in Agent Systems X. Springer-Verlag Lecture Notes in AI, 2015 %D 2015 %T The Power of Teams that Disagree: Team Formation in Large Action Spaces %A L. S. Marcolino %A Xu, H. %A A.X. Jiang %A M. Tambe %A E. Bowring %X Recent work has shown that diverse teams can outperform a uniform team made of copies of the best agent. However, there are fundamental questions that were never asked before. When should we use diverse or uniform teams? How does the performance change as the action space or the teams get larger? Hence, we present a new model of diversity, where we prove that the performance of a diverse team improves as the size of the action space increases. Moreover, we show that the performance converges exponentially fast to the optimal one as we increase the number of agents. We present synthetic experiments that give further insights: even though a diverse team outperforms a uniform team when the size of the action space increases, the uniform team will eventually again play better than the diverse team for a large enough action space. We verify our predictions in a system of Go playing agents, where a diverse team improves in performance as the board size increases, and eventually overcomes a uniform team. %B Coordination, Organizations, Institutions and Norms in Agent Systems X.
Springer-Verlag Lecture Notes in AI, 2015 %G eng %0 Journal Article %J AI Magazine %D 2015 %T PSINET: Aiding HIV Prevention Amongst Homeless Youth by Planning Ahead %A Amulya Yadav %A Marcolino, Leandro Soriano %A Eric Rice %A Petering, Robin %A Winetrobe, Hailey %A Rhoades, Harmony %A Tambe, Milind %A Carmichael, Heather %X Homeless youth are prone to Human Immunodeficiency Virus (HIV) due to their engagement in high risk behavior such as unprotected sex, sex under influence of drugs, etc. Many non-profit agencies conduct interventions to educate and train a select group of homeless youth about HIV prevention and treatment practices and rely on word-of-mouth spread of information through their social network. Previous work in strategic selection of intervention participants does not handle uncertainties in the social network’s structure and evolving network state, potentially causing significant shortcomings in spread of information. Thus, we developed PSINET, a decision support system to aid the agencies in this task. PSINET includes the following key novelties: (i) it handles uncertainties in network structure and evolving network state; (ii) it addresses these uncertainties by using POMDPs in influence maximization; and (iii) it provides algorithmic advances to allow high quality approximate solutions for such POMDPs. Simulations show that PSINET achieves ∼60% more information spread over the current state-of-the-art. PSINET was developed in collaboration with My Friend’s Place (a drop-in agency serving homeless youth in Los Angeles) and is currently being reviewed by their officials. 
%B AI Magazine %G eng %0 Conference Paper %B In AAAI-15 Workshop on Planning, Search, and Optimization (PlanSOpt-15) %D 2015 %T PSINET - An Online POMDP Solver for HIV Prevention in Homeless Populations %A Amulya Yadav %A Marcolino, Leandro %A Eric Rice %A Petering, Robin %A Winetrobe, Hailey %A Rhoades, Harmony %A Tambe, Milind %A Carmichael, Heather %X Homeless youth are prone to Human Immunodeficiency Virus (HIV) due to their engagement in high risk behavior such as unprotected sex, sex under influence of drugs, etc. Many non-profit agencies conduct interventions to educate and train a select group of homeless youth about HIV prevention and treatment practices and rely on word-of-mouth spread of information through their social network. Previous work in strategic selection of intervention participants does not handle uncertainties in the social network’s structure and evolving network state, potentially causing significant shortcomings in spread of information. Thus, we developed PSINET, a decision support system to aid the agencies in this task. PSINET includes the following key novelties: (i) it handles uncertainties in network structure and evolving network state; (ii) it addresses these uncertainties by using POMDPs in influence maximization; and (iii) it provides algorithmic advances to allow high quality approximate solutions for such POMDPs. Simulations show that PSINET achieves ∼60% more information spread over the current state-of-the-art. PSINET was developed in collaboration with My Friend’s Place (a drop-in agency serving homeless youth in Los Angeles) and is currently being reviewed by their officials. 
%B In AAAI-15 Workshop on Planning, Search, and Optimization (PlanSOpt-15) %G eng %0 Conference Paper %B ACM Symposium on Applied Computing (ACM SAC 2015) Track on Intelligent Robotics and Multi-Agent Systems (IRMAS) %D 2015 %T Robust Resource Allocation in Security Games and Ensemble Modeling of Adversary Behavior %A Tambe, Arjun %A Nguyen, Thanh %X Game theoretic algorithms have been used to optimize the allocation of security resources to improve the protection of critical infrastructure against threats when limits on security resources prevent full protection of all targets. Past approaches have assumed adversaries will always behave to maximize their expected utility, failing to address real-world adversaries who are not perfectly rational. Instead, adversaries may be boundedly rational, i.e., they generally act to increase their expected value but do not consistently maximize it. A successful approach to addressing bounded adversary rationality has been a robust approach that does not explicitly model adversary behavior. However, these robust algorithms implicitly rely on an efficiently computable weak model of adversary behavior, which does not necessarily match adversary behavior trends. We therefore propose a new robust algorithm that provides a more refined model of adversary behavior that retains the advantage of efficient computation. We also develop an ensemble method used to tune the algorithm’s parameters, and compare this method’s accuracy in predicting adversary behavior to previous work. We test these contributions in security games against human subjects to show the advantages of our approach. %B ACM Symposium on Applied Computing (ACM SAC 2015) Track on Intelligent Robotics and Multi-Agent Systems (IRMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Robust Strategy against Unknown Risk-averse Attackers in Security Games %A Qian, Yundi %A William B. 
Haskell %A Tambe, Milind %X Stackelberg security games (SSGs) are now established as a powerful tool in security domains. In this paper, we consider a new dimension of security games: the risk preferences of the attacker. Previous work assumes a risk-neutral attacker that maximizes his expected reward. However, extensive studies show that the attackers in some domains are in fact risk-averse, e.g., terrorist groups in counter-terrorism domains. The failure to incorporate risk aversion in SSG models may lead the defender to suffer significant losses. Additionally, defenders are uncertain about the degree of the attacker’s risk aversion. Motivated by this challenge, this paper provides the following five contributions: (i) we propose a novel model for security games against risk-averse attackers with uncertainty in the degree of their risk aversion; (ii) we develop an intuitive MIBLP formulation based on previous security games research, but find that it yields only locally optimal solutions and is unable to scale up; (iii) based on insights from our MIBLP formulation, we develop our scalable BeRRA algorithm that finds globally ε-optimal solutions; (iv) our BeRRA algorithm can also be extended to handle other risk-aware attackers, e.g., risk-seeking attackers; (v) we show that we do not need to consider the attacker’s risk attitude in zero-sum games. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI 2015) %D 2015 %T Security Games with Information Leakage: Modeling and Computation %A Haifeng Xu %A Albert X. Jiang %A Sinha, Arunesh %A Zinovi Rabinovich %A Dughmi, Shaddin %A Tambe, Milind %X Most models of Stackelberg security games assume that the attacker only knows the defender’s mixed strategy, but is not able to observe (even partially) the instantiated pure strategy.
Such partial observation of the deployed pure strategy – an issue we refer to as information leakage – is a significant concern in practical applications. While previous research on patrolling games has considered the attacker’s real-time surveillance, our settings, and therefore our models and techniques, are fundamentally different. More specifically, after describing the information leakage model, we start with an LP formulation to compute the defender’s optimal strategy in the presence of leakage. Perhaps surprisingly, we show that a key subproblem in solving this LP (more precisely, the defender oracle) is NP-hard even for the simplest of security game models. We then approach the problem from three possible directions: efficient algorithms for restricted cases, approximation algorithms, and heuristic algorithms for sampling that improve upon the status quo. Our experiments confirm the necessity of handling information leakage and the advantage of our algorithms. %B International Joint Conference on Artificial Intelligence (IJCAI 2015) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T Three Fundamental Pillars of Multi-agent Team Formation (Doctoral Consortium) %A L. S. Marcolino %A M. Tambe %X Teams of voting agents are a powerful tool for solving complex problems. When forming such teams, there are three fundamental issues that must be addressed: (i) selecting which agents should form a team; (ii) aggregating the opinions of the agents; (iii) assessing the performance of a team. In this thesis we address all these points. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI 2015) %D 2015 %T Unleashing the Power of Multi-agent Voting Teams (Doctoral Consortium) %A L. S. Marcolino %A M. Tambe %X Teams of voting agents have great potential in finding optimal solutions.
However, there are fundamental challenges to effectively use such teams: (i) selecting agents; (ii) aggregating opinions; (iii) assessing performance. I address all these challenges, with theoretical and experimental contributions. %B International Joint Conference on Artificial Intelligence (IJCAI 2015) %G eng %0 Conference Paper %B Computational Sustainability Workshop at AAAI’15, Texas, Austin %D 2015 %T Effectiveness of Probability Perception Modeling and Defender Strategy Generation Algorithms in Repeated Stackelberg Games: An Initial Report %A D. Kar %A F. Fang %A F. Delle Fave %A N. Sintov %A M. Tambe %A A. Van Wissen %X While human behavior models based on repeated Stackelberg games have been proposed for domains such as “wildlife crime” where there is repeated interaction between the defender and the adversary, there has been no empirical study with human subjects to show the effectiveness of such models. This paper presents an initial study based on extensive human subject experiments with participants on Amazon Mechanical Turk (AMT). Our findings include: (i) attackers may view the defender’s coverage probability in a non-linear fashion; specifically it follows an S-shaped curve, and (ii) there are significant losses in defender utility when strategies generated by existing models are deployed in repeated Stackelberg game settings against human subjects. %B Computational Sustainability Workshop at AAAI’15, Texas, Austin %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %D 2015 %T A Game of Thrones: When Human Behavior Models Compete in Repeated Stackelberg Security Games %A Kar, Debarun %A Fang, Fei %A Delle Fave, Francesco %A Sintov, Nicole %A Tambe, Milind %X Several competing human behavior models have been proposed to model and protect against boundedly rational adversaries in repeated Stackelberg security games (SSGs). 
However, these existing models fail to address three main issues which are extremely detrimental to defender performance. First, while they attempt to learn adversary behavior models from adversaries’ past actions (“attacks on targets”), they fail to take into account adversaries’ future adaptation based on successes or failures of these past actions. Second, they assume that sufficient data in the initial rounds will lead to a reliable model of the adversary. However, our analysis reveals that the issue is not the amount of data, but that there just is not enough of the attack surface exposed to the adversary to learn a reliable model. Third, current leading approaches have failed to include probability weighting functions, even though it is well known that human beings’ weighting of probability is typically nonlinear. The first contribution of this paper is a new human behavior model, SHARP, which mitigates these three limitations as follows: (i) SHARP reasons based on success or failure of the adversary’s past actions on exposed portions of the attack surface to model adversary adaptiveness; (ii) SHARP reasons about similarity between exposed and unexposed areas of the attack surface, and also incorporates a discounting parameter to mitigate adversary’s lack of exposure to enough of the attack surface; and (iii) SHARP integrates a non-linear probability weighting function to capture the adversary’s true weighting of probability. Our second contribution is a first “longitudinal study” – at least in the context of SSGs – of competing models in settings involving repeated interaction between the attacker and the defender. This study, where each experiment lasted a period of multiple weeks with individual sets of human subjects, illustrates the strengths and weaknesses of different models and shows the advantages of SHARP. 
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security %D 2015 %T Making the most of Our Regrets: Regret-based Solutions to Handle Payoff Uncertainty and Elicitation in Green Security Games %A Thanh H. Nguyen %A Francesco M. Delle Fave %A Kar, Debarun %A Aravind S. Lakshminarayanan %A Amulya Yadav %A Tambe, Milind %A Agmon, Noa %A Andrew J. Plumptre %A Driciru, Margaret %A Wanyama, Fred %A Rwetsiba, Aggrey %X Recent research on Green Security Games (GSG), i.e., security games for the protection of wildlife, forest and fisheries, relies on the promise of an abundance of available data in these domains to learn adversary behavioral models and determine game payoffs. This research suggests that adversary behavior models (capturing bounded rationality) can be learned from real-world data on where adversaries have attacked, and that game payoffs can be determined precisely from data on animal densities. However, previous work has, as yet, failed to demonstrate the usefulness of these behavioral models in capturing adversary behaviors based on real-world data in GSGs. Previous work has also been unable to address situations where available data is insufficient to accurately estimate behavioral models or to obtain the required precision in the payoff values. In addressing these limitations, as our first contribution, this paper, for the first time, provides validation of the aforementioned adversary behavioral models based on real-world data from a wildlife park in Uganda. Our second contribution addresses situations where real-world data is not precise enough to determine exact payoffs in GSG, by providing the first algorithm to handle payoff uncertainty in the presence of adversary behavioral models. This algorithm is based on the notion of minimax regret. 
Furthermore, in scenarios where the data is not even sufficient to learn adversary behaviors, our third contribution is to provide a novel algorithm to address payoff uncertainty assuming a perfectly rational attacker (instead of relying on a behavioral model); this algorithm allows for a significant scaleup for large security games. Finally, to reduce the problems due to paucity of data, given mobile sensors such as Unmanned Aerial Vehicles (UAV), we introduce new payoff elicitation strategies to strategically reduce uncertainty. %B Conference on Decision and Game Theory for Security %G eng %0 Conference Paper %B Conference on Innovative Applications of Artificial Intelligence (IAAI-15) %D 2015 %T Preventing HIV Spread in Homeless Populations Using PSINET %A Amulya Yadav %A Marcolino, Leandro %A Eric Rice %A Petering, Robin %A Winetrobe, Hailey %A Rhoades, Harmony %A Tambe, Milind %A Carmichael, Heather %X Homeless youth are prone to HIV due to their engagement in high risk behavior. Many agencies conduct interventions to educate/train a select group of homeless youth about HIV prevention practices and rely on word-of-mouth spread of information through their social network. Previous work in strategic selection of intervention participants does not handle uncertainties in the social network’s structure and in the evolving network state, potentially causing significant shortcomings in spread of information. Thus, we developed PSINET, a decision support system to aid the agencies in this task. PSINET includes the following key novelties: (i) it handles uncertainties in network structure and evolving network state; (ii) it addresses these uncertainties by using POMDPs in influence maximization; (iii) it provides algorithmic advances to allow high quality approximate solutions for such POMDPs. Simulations show that PSINET achieves ∼60% more information spread over the current state-of-the-art. 
PSINET was developed in collaboration with My Friend’s Place (a drop-in agency serving homeless youth in Los Angeles) and is currently being reviewed by their officials. %B Conference on Innovative Applications of Artificial Intelligence (IAAI-15) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2015 %T When Security Games Go Green: Designing Defender Strategies to Prevent Poaching and Illegal Fishing %A Fang, Fei %A Stone, Peter %A Tambe, Milind %X Building on the successful applications of Stackelberg Security Games (SSGs) to protect infrastructure, researchers have begun focusing on applying game theory to green security domains such as protection of endangered animals and fish stocks. Previous efforts in these domains optimize defender strategies based on the standard Stackelberg assumption that the adversaries become fully aware of the defender’s strategy before taking action. Unfortunately, this assumption is inappropriate since adversaries in green security domains often lack the resources to fully track the defender strategy. This paper (i) introduces Green Security Games (GSGs), a novel game model for green security domains with a generalized Stackelberg assumption; (ii) provides algorithms to plan effective sequential defender strategies — such planning was absent in previous work; (iii) proposes a novel approach to learn adversary models that further improves defender performance; and (iv) provides detailed experimental analysis of proposed approaches. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B International Conference on Automated Planning and Scheduling (ICAPS) %D 2014 %T Computing Solutions in Infinite-Horizon Discounted Adversarial Patrolling Game %A Y. Vorobeychik %A B. An %A M. Tambe %A S. 
Singh %X Stackelberg games form the core of a number of tools deployed for computing optimal patrolling strategies in adversarial domains, such as the US Federal Air Marshall Service and the US Coast Guard. In traditional Stackelberg security game models the attacker knows only the probability that each target is covered by the defender, but is oblivious to the detailed timing of the coverage schedule. In many real-world situations, however, the attacker can observe the current location of the defender and can exploit this knowledge to reason about the defender’s future moves. We show that this general modeling framework can be captured using adversarial patrolling games (APGs) in which the defender sequentially moves between targets, with moves constrained by a graph, while the attacker can observe the defender’s current location and his (stochastic) policy concerning future moves. We offer a very general model of infinite-horizon discounted adversarial patrolling games. Our first contribution is to show that defender policies that condition only on the previous defense move (i.e., Markov stationary policies) can be arbitrarily suboptimal for general APGs. We then offer a mixed-integer nonlinear programming (MINLP) formulation for computing optimal randomized policies for the defender that can condition on history of bounded, but arbitrary, length, as well as a mixed-integer linear programming (MILP) formulation to approximate these, with provable quality guarantees. Additionally, we present a non-linear programming (NLP) formulation for solving zero-sum APGs. We show experimentally that MILP significantly outperforms the MINLP formulation, and is, in turn, significantly outperformed by the NLP specialized to zero-sum games. 
%B International Conference on Automated Planning and Scheduling (ICAPS) %G eng %0 Conference Paper %B Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2014 %T PAWS: Adaptive Game-theoretic Patrolling for Wildlife Protection (Demonstration) %A Ford, Benjamin %A Kar, Debarun %A Francesco M. Delle Fave %A Yang, Rong %A Tambe, Milind %X Endangered species around the world are in danger of extinction from poaching. From the start of the 20th century, the African rhino population has dropped over 98% [4] and the global tiger population has dropped over 95% [5], resulting in multiple species extinctions in both groups. Species extinctions have negative consequences on local ecosystems, economies, and communities. To protect these species, countries have set up conservation agencies and national parks, such as Uganda’s Queen Elizabeth National Park (QENP). However, a common lack of funding for these agencies results in a lack of law enforcement resources to protect these large, rural areas. As an example of the scale of disparity, one wildlife crime study in 2007 reported an actual coverage density of one ranger per 167 square kilometers [2]. Because of the hazards involved (e.g., armed poachers, wild animals), rangers patrol in groups, further increasing the amount of area they are responsible for patrolling. Security game research has typically been concerned with combating terrorism, and this field has indeed benefited from a range of successfully deployed applications [1, 6]. These applications have enabled security agencies to make more efficient use of their limited resources. In this previous research, adversary data has been absent during the development of these solutions, and thus, it has been difficult to make accurate adversary behavior models during algorithm development. In a domain such as wildlife crime, interactions with the adversary are frequent and repeated, thus enabling conservation agencies to collect data. 
This presence of data enables security game researchers to begin developing algorithms that incorporate this data into, potentially, more accurate behavior models and consequently better security solutions. Developed in conjunction with staff at QENP, the Protection Assistant for Wildlife Security (PAWS) generates optimized defender strategies for use by park rangers [7]. Due to the repeated nature of wildlife crime, PAWS is able to leverage crime event data - a previously unrealized capability in security games research. Thus, PAWS implements a novel adaptive algorithm that processes crime event data, builds multiple human behavior models, and, based on those models, predicts where adversaries will attack next. These predictions are then used to generate a patrol strategy for the rangers (i.e., a set of patrol waypoints) that can be viewed on a GPS unit. Against this background, the demonstration presented in this paper introduces two contributions. First, we present the PAWS system which incorporates the algorithm in [7] into a scheduling system and a GPS visualizer. Second, we present a software interface to run a number of human subject experiments (HSE) to evaluate and improve the efficacy of PAWS before its deployment in QENP. By conducting these HSEs, we can (i) test the PAWS algorithms with repeated interactions with humans, thus providing a more realistic testing environment than in its previous simulations; (ii) generate data that can be used to initialize PAWS’s human behavior models for deployment, and (iii) compare the current PAWS algorithms’ performance to alternatives and determine if additional improvements are needed prior to deployment. To provide proper context for the presentation, this paper also presents a brief overview of the PAWS system data flow and its adaptive algorithms. The demonstration will engage audience members by having them participate in the HSEs and using the GPS unit to visualize a patrol schedule in QENP. 
%B Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) %D 2014 %T Addressing Scalability and Robustness in Security Games with Multiple Boundedly Rational Adversaries %A Brown, Matthew %A William B. Haskell %A Tambe, Milind %X Boundedly rational human adversaries pose a serious challenge to security because they deviate from the classical assumption of perfect rationality. An emerging trend in security game research addresses this challenge by using behavioral models such as quantal response (QR) and subjective utility quantal response (SUQR). These models improve the quality of the defender’s strategy by more accurately modeling the decisions made by real human adversaries. Work on incorporating human behavioral models into security games has typically followed two threads. The first thread, scalability, seeks to develop efficient algorithms to design patrols for large-scale domains that protect against a single adversary. However, this thread cannot handle the common situation of multiple adversary types with heterogeneous behavioral models. Having multiple adversary types introduces considerable uncertainty into the defender’s planning problem. The second thread, robustness, uses either Bayesian or maximin approaches to handle this uncertainty caused by multiple adversary types. However, the robust approach has so far not been able to scale up to complex, large-scale security games. Thus, each of these two threads alone fails to work in key real-world security games. Our present work addresses this shortcoming and merges these two research threads to yield a scalable and robust algorithm, MIDAS (MaxImin Defense Against SUQR), for generating game-theoretic patrols to defend against multiple boundedly rational human adversaries. 
Given the size of the defender’s optimization problem, the key component of MIDAS is incremental cut and strategy generation using a master/slave optimization approach. Innovations in MIDAS include (i) a maximin mixed-integer linear programming formulation in the master and (ii) a compact transition graph formulation in the slave. Additionally, we provide a theoretical analysis of our new model and report its performance in simulations. In collaboration with the United States Coast Guard (USCG), we consider the problem of defending fishery stocks from illegal fishing in the Gulf of Mexico and use MIDAS to handle heterogeneity in adversary types (i.e., illegal fishermen) in order to construct robust patrol strategies for USCG assets. %B Conference on Decision and Game Theory for Security (GameSec) %G eng %0 Conference Paper %B 8th Multidisciplinary Workshop on Advances in Preference Handling (M-PREF 2014) %D 2014 %T Aggregating Opinions to Design Energy-Efficient Buildings %A L. S. Marcolino %A B. Kolev %A S. Price %A S. P. Veetil %A D. Gerber %A J. Musil %A M. Tambe %X In this research-in-progress paper we present a new real world domain for studying the aggregation of different opinions: early stage architectural design of buildings. This is an important real world application, not only because building design and construction is one of the world’s largest industries measured by global expenditures, but also because the early stage design decision making has a significant impact on the energy consumption of buildings. We present a mapping between the domain of architecture and engineering research and that of the agent models present in the literature. We study the importance of forming diverse teams when aggregating the opinions of different agents for architectural design, and also the effect of having agents optimizing for different factors of a multi-objective optimization design problem. 
We find that a diverse team of agents is able to provide a higher number of top ranked solutions for the early stage designer to choose from. Finally, we present the next steps for a deeper exploration of our questions. %B 8th Multidisciplinary Workshop on Advances in Preference Handling (M-PREF 2014) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2014 %T Building THINC: User Incentivization and Meeting Rescheduling for Energy Savings %A Kwak, Jun-young %A Kar, Debarun %A Haskell, William %A Varakantham, Pradeep %A Tambe, Milind %X This paper presents THINC, an agent developed for saving energy in real-world commercial buildings. While previous work has presented techniques for computing energy-efficient schedules, it fails to address two issues, centered on human users, that are essential in real-world agent deployments: (i) incentivizing users for their energy saving activities and (ii) interacting with users to reschedule key “energy-consuming” meetings in a timely fashion, while handling the uncertainty in such interactions. THINC addresses these shortcomings by providing four new major contributions. First, THINC computes fair division of credits from energy savings. For this fair division, THINC provides novel algorithmic advances for efficient computation of Shapley value. Second, THINC includes a novel robust algorithm to optimally reschedule identified key meetings addressing user interaction uncertainty. Third, THINC provides an end-to-end integration within a single agent of energy efficient scheduling, rescheduling and credit allocation. Finally, we deploy THINC in the real-world as a pilot project at one of the main libraries at the University of Southern California and present results illustrating the benefits in saving energy. 
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J Journal of Information Processing (JIP), (Invited article) %D 2014 %T Computational Game Theory for Security and Sustainability %A A. Jiang %A M. Jain %A M. Tambe %X Security is a critical concern around the world that arises in protecting our ports, airports, transportation and other critical national infrastructure from adversaries, in protecting our wildlife and forests from poachers and smugglers, and in curtailing the illegal flow of weapons, drugs and money; and it arises in problems ranging from physical to cyber-physical systems. In all of these problems, we have limited security resources which prevent full security coverage at all times; instead, security resources must be deployed intelligently taking into account differences in priorities of targets requiring security coverage, the responses of the attackers to the security posture, and potential uncertainty over the types, capabilities, knowledge and priorities of attackers faced. Game theory, which studies interactions among multiple self-interested agents, is well-suited to the adversarial reasoning required for security resource allocation and scheduling problems. Casting the problem as a Bayesian Stackelberg game, we have developed new algorithms for efficiently solving such games that provide randomized patrolling or inspection strategies. These algorithms have led to some initial successes in this challenging problem arena, leading to advances over previous approaches in security scheduling and allocation, e.g., by addressing key weaknesses of predictability of human schedulers. 
These algorithms are now deployed in multiple applications: ARMOR has been deployed at the Los Angeles International Airport (LAX) since 2007 to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals [17]; IRIS, a game-theoretic scheduler for randomized deployment of the US Federal Air Marshals (FAMS) requiring significant scaleup in underlying algorithms, has been in use since 2009 [17]; PROTECT, which schedules the US Coast Guard’s randomized patrolling of ports using a new set of algorithms based on modeling boundedly rational human attackers, has been deployed in the port of Boston since April 2011, has been in use at the port of New York since February 2012 [34], and is headed for nationwide deployment; another application for deploying escort boats to protect ferries has been deployed by the US Coast Guard since April 2013 [10]; GUARDS is under evaluation for national deployment by the US Transportation Security Administration (TSA) [32]; and TRUSTS [43] has been evaluated in field trials by the Los Angeles Sheriff’s Department (LASD) in the LA Metro system, and a nationwide deployment is now being evaluated at TSA. These initial successes point the way to major future applications in a wide range of security domains, with major research challenges in scaling up our game-theoretic algorithms, in addressing human adversaries’ bounded rationality and uncertainties in action execution and observation, as well as in multiagent learning. This paper will provide an overview of the models and algorithms, key research challenges, and a brief description of our successful deployments. 
%B Journal of Information Processing (JIP), (Invited article) %V 22 %P 176-185 %G eng %N 2 %0 Conference Paper %B AAAI Spring Symposium on Applied Computational Game Theory %D 2014 %T Computational game theory for security: Progress and challenges %A Tambe, Milind %A Jiang, Albert %A An, Bo %A Jain, Manish %X The goal of this paper is to (re)introduce a real-world challenge problem for researchers in multiagent systems and beyond, where our collective efforts may have a significant impact on activities in the real world. The challenge is in applying game theory for security: our goal is to not only introduce the research challenges for algorithmic and behavioral game theory in service of this problem, but also to provide initial exemplars of successes of deployed systems, and to discuss the challenges introduced by these deployments of computational game theory in the field. We also wish to provide an overview of key open research challenges and pointers to getting started in this research. %B AAAI Spring Symposium on Applied Computational Game Theory %G eng %0 Conference Paper %B International Joint Workshop on Optimization in Multi-Agent Systems and Distributed Constraint Reasoning (OPTMAS-DCR) In Conjunction with AAMAS 2014 %D 2014 %T Computing Minimax Strategy for Discretized Spatio-Temporal Zero-Sum Security Games %A Haifeng Xu %A Fang, Fei %A Xin Jiang, Albert %A Vincent Conitzer %A Dughmi, Shaddin %A Tambe, Milind %X Among the many deployment areas of Stackelberg security games, a major area involves games played out in space and time, which includes applications with multiple mobile defender resources protecting multiple mobile targets. Previous algorithms for such spatio-temporal security games fail to scale up, and little is known of the computational complexity properties of these problems. 
This paper provides a novel oracle-based algorithmic framework for a systematic study of different problem variants of computing optimal (minimax) strategies in spatio-temporal security games. Our framework enables efficient computation of a minimax strategy when the problem admits a polynomial-time oracle. Furthermore, for the cases in which efficient oracles are difficult to find, we propose approximations or prove hardness results. %B International Joint Workshop on Optimization in Multi-Agent Systems and Distributed Constraint Reasoning (OPTMAS-DCR) In Conjunction with AAMAS 2014 %C Paris, France %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) 2014 %D 2014 %T Defending Against Opportunistic Criminals: New Game-Theoretic Frameworks and Algorithms %A Zhang, Chao %A Xin Jiang, Albert %A Martin B. Short %A Jeffrey P. Brantingham %A Tambe, Milind %X This paper introduces a new game-theoretic framework and algorithms for addressing opportunistic crime. The Stackelberg Security Game (SSG), which models highly strategic and resourceful adversaries, has become an important computational framework within multiagent systems. Unfortunately, SSG is ill-suited as a framework for handling opportunistic crimes, which are committed by criminals who are less strategic in planning attacks and more flexible in executing them than SSG assumes. Yet, opportunistic crime is what is commonly seen in most urban settings. We therefore introduce the Opportunistic Security Game (OSG), a computational framework to recommend deployment strategies for defenders to control opportunistic crimes. Our first contribution in OSG is a novel model for opportunistic adversaries, who (i) opportunistically and repeatedly seek targets; (ii) react to real-time information at execution time rather than planning attacks in advance; and (iii) have limited observation of defender strategies. 
Our second contribution to OSG is a new exact algorithm EOSG to optimize defender strategies given our opportunistic adversaries. Our third contribution is the development of a fast heuristic algorithm to solve large-scale OSG problems, exploiting a compact representation. We use urban transportation systems as a critical motivating domain, and provide detailed experimental results based on a real-world system. %B Conference on Decision and Game Theory for Security (GameSec) 2014 %G eng %0 Conference Paper %B Coordination, Organizations, Institutions and Norms in Agent Systems IX. Springer-Verlag Lecture Notes in AI %D 2014 %T A Detailed Analysis of a Multi-agent Diverse Team %A Marcolino, Leandro Soriano %A Zhang, Chao %A Xin Jiang, Albert %A Tambe, Milind %E T. Balke %E A. Chopra %E F. Dignum %E B. van Riemsdijk %X In an open system we can have many different kinds of agents. However, it is a challenge to decide which agents to pick when forming multi-agent teams. In some scenarios, agents coordinate by voting continuously. When forming such teams, should we focus on the diversity of the team or on the strength of each member? Can a team of diverse (and weak) agents outperform a uniform team of strong agents? We propose a new model to address these questions. Our key contributions include: (i) we show that a diverse team can overcome a uniform team and we give the necessary conditions for it to happen; (ii) we present optimal voting rules for a diverse team; (iii) we perform synthetic experiments that demonstrate that both diversity and strength contribute to the performance of a team; (iv) we show experiments that demonstrate the usefulness of our model in one of the most difficult challenges for Artificial Intelligence: Computer Go. %B Coordination, Organizations, Institutions and Norms in Agent Systems IX. 
Springer-Verlag Lecture Notes in AI %G eng %0 Conference Paper %B 28th Neural Information Processing Systems Conference (NIPS) %D 2014 %T Diverse Randomized Agents Vote to Win %A A.X. Jiang %A L. S. Marcolino %A A. D. Procaccia %A T. Sandholm %A N. Shah %A M. Tambe %X We investigate the power of voting among diverse, randomized software agents. With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. This model allows us to reason about a collection of agents with different biases (determined by the first-stage noise models), which, furthermore, apply randomized algorithms to evaluate alternatives and produce votes (captured by the second-stage noise models). We analytically demonstrate that a uniform team, consisting of multiple instances of any single agent, must make a significant number of mistakes, whereas a diverse team converges to perfection as the number of agents grows. Our experiments, which pit teams of computer Go agents against strong agents, provide evidence for the effectiveness of voting when agents are diverse. %B 28th Neural Information Processing Systems Conference (NIPS) %G eng %0 Journal Article %J Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) %D 2014 %T Efficient Solutions for Joint Activity Based Security Games: Fast Algorithms, Results and a Field Experiment on a Transit System %A Francesco M. Delle Fave %A Shieh, Eric %A Jain, Manish %A Xin Jiang, Albert %A Rosoff, Heather %A Tambe, Milind %A John P. Sullivan %X In recent years, several security agencies have been deploying scheduling systems based on algorithmic advances in Stackelberg security games (SSGs). Unfortunately, none of the existing algorithms can scale up to domains where benefits are accrued from multiple defender resources performing jointly coordinated activities. 
Yet in many domains, including port patrolling where SSGs are in use, enabling multiple defender resources to perform jointly coordinated activities would significantly enhance the effectiveness of the patrols. To address this challenge, this paper presents four contributions. First, we present Smart (Security games with Multiple coordinated Activities and Resources that are Time-dependent), a novel SSG model that explicitly represents jointly coordinated activities between the defender’s resources. Second, we present two branch-and-price algorithms, SmartO—an optimal algorithm, and SmartH—a heuristic approach, to solve Smart instances. The two algorithms present three novel features: (i) a novel approach to generate individual defender strategies by ordering the search space during column generation using insights from the Traveling Salesman Problem (TSP); (ii) exploitation of iterative modification of rewards of multiple defender resources to generate coordinated strategies; and (iii) generation of tight upper bounds for pruning using the structure of the problem. Third, we present an extensive empirical and theoretical analysis of both SmartO and SmartH. Fourth, we describe a large-scale real-world experiment whereby we run the first head-to-head comparison of game-theoretic schedules generated using SmartH against schedules generated by humans on a one-day patrol exercise over one train line of the Los Angeles Metro System. Our results show that game-theoretic schedules were evaluated to be superior to ones generated by humans. 
%B Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) %G eng %0 Journal Article %J Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) %D 2014 %T An Extended Study on Multi-Objective Security Games %A Brown, Matthew %A An, Bo %A Kiekintveld, Christopher %A Ordonez, Fernando %A Tambe, Milind %X The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. In such domains, decision makers have multiple competing objectives they must consider, which may take different forms that are not readily comparable, including safety, cost, and public perception. Thus, it can be difficult to know how to weigh the different objectives when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSG). Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier, which can be generated by solving a sequence of constrained single-objective optimization problems (CSOP). The Pareto frontier allows the decision maker to analyze the tradeoffs that exist between the multiple objectives. Our contributions include: (i) an algorithm, Iterative-ε-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP; (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain the MILP; (iv) an approximate approach for solving a CSOP built off those same heuristics, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation, detailed experimental evaluation of the proposed approaches and heuristics, as well as a discussion on techniques for visualizing the Pareto frontier. 
%B Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) %V 28 %P 31-71 %G eng %N 1 %0 Journal Article %J Journal of Artificial Intelligence Research %D 2014 %T Game-theoretic Security Patrolling with Dynamic Execution Uncertainty and a Case Study on a Real Transit System %A F. M. Delle Fave %A A.X. Jiang %A Z. Yin %A Zhang, C. %A M. Tambe %A Kraus, S. %A J.P. Sullivan %X Attacker-Defender Stackelberg security games (SSGs) have emerged as an important research area in multi-agent systems. However, existing SSG models yield fixed, static schedules, which fail in dynamic domains where defenders face execution uncertainty, i.e., in domains where defenders may face unanticipated disruptions of their schedules. A concrete example is an application involving checking fares on trains, where a defender’s schedule is frequently interrupted by fare evaders, making static schedules useless. To address this shortcoming, this paper provides four main contributions. First, we present a novel general Bayesian Stackelberg game model for security resource allocation in dynamic uncertain domains. In this new model, execution uncertainty is handled by using a Markov decision process (MDP) for generating defender policies. Second, we study the problem of computing a Stackelberg equilibrium for this game and exploit problem structure to reduce it to a polynomial-sized optimization problem. Shifting to evaluation, our third contribution shows in simulation that our MDP-based policies overcome the failures of previous SSG algorithms. In so doing, we can now build a complete system that enables handling of schedule interruptions and, consequently, allows us to conduct some of the first controlled experiments on SSGs in the field. Hence, as our final contribution, we present results from a real-world experiment on Metro trains in Los Angeles validating our MDP-based model, and most importantly, concretely measuring the benefits of SSGs for security resource allocation. 
%B Journal of Artificial Intelligence Research %V 50 %P 321-367 %G eng %0 Journal Article %J The Computer Journal %D 2014 %T Game-Theoretic Target Selection in Contagion-based Domains %A J. Tsai %A T. Nguyen %A N. Weller %A M. Tambe %X Many strategic actions carry a ‘contagious’ component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain. Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, nonlocal impact. To address this new class of security games, we combine recent research in influence blocking maximization with a double oracle approach and create novel heuristic oracles to generate mixed strategies for a real-world leadership network from Afghanistan, synthetic leadership networks, and scale-free graphs. We find that leadership networks that exhibit highly interconnected clusters can be solved equally well by our heuristic methods, but our more sophisticated heuristics outperform simpler ones in less interconnected scale-free graphs. %B The Computer Journal %V 57 %P 893-905 %G eng %N 6 %0 Conference Paper %B 28th Conference on Artificial Intelligence (AAAI 2014) %D 2014 %T Give a Hard Problem to a Diverse Team: Exploring Large Action Spaces %A L. S. Marcolino %A Xu, H. %A A.X. Jiang %A M. Tambe %A E. Bowring %X Recent work has shown that diverse teams can outperform a uniform team made of copies of the best agent. However, there are fundamental questions that were not asked before. 
When should we use diverse or uniform teams? How does the performance change as the action space or the teams get larger? To address these questions, we present a new model of diversity for teams that is more general than previous models. We prove that the performance of a diverse team improves as the size of the action space gets larger. Concerning the size of the diverse team, we show that the performance converges exponentially fast to the optimal one as we increase the number of agents. We present synthetic experiments that allow us to gain further insights: even though a diverse team outperforms a uniform team when the size of the action space increases, the uniform team will eventually again play better than the diverse team for a large enough action space. We verify our predictions in a system of Go-playing agents, where we show a diverse team that improves in performance as the board size increases, and eventually overcomes a uniform team. %B 28th Conference on Artificial Intelligence (AAAI 2014) %C Québec, Canada %G eng %0 Thesis %D 2014 %T Human Adversaries in Security Games: Integrating Models of Bounded Rationality and Fast Algorithms %A Yang, Rong %X Security is a world-wide concern in a diverse set of settings, such as protecting ports, airports and other critical infrastructure, interdicting the illegal flow of drugs, weapons and money, preventing illegal poaching/hunting of endangered species and fish, suppressing crime in urban areas and securing cyberspace. Unfortunately, with limited security resources, not all the potential targets can be protected at all times. Game-theoretic approaches — in the form of “security games” — have recently gained significant interest from researchers as a tool for analyzing real-world security resource allocation problems, leading to multiple deployed systems in day-to-day use to enhance security of US ports, airports and transportation infrastructure. 
One of the key challenges that remains open in enhancing current security game applications and enabling new ones originates from the perfect rationality assumption of the adversaries, an assumption that may not hold in the real world due to the bounded rationality of human adversaries, and hence one that could potentially reduce the effectiveness of the solutions offered. My thesis focuses on addressing human decision-making in security games. It seeks to bridge the gap between two important subfields in game theory: algorithmic game theory and behavioral game theory. The former focuses on efficient computation of equilibrium solution concepts, and the latter develops models to predict the behaviors of human players in various game settings. More specifically, I provide: (i) the answer to the question of which of the existing models best represents the salient features of the security problems, by empirically exploring different human behavioral models from the literature; (ii) algorithms to efficiently compute the resource allocation strategies for the security agencies considering these new models of the adversaries; (iii) real-world deployed systems that range from security of ports to wildlife security. %G eng %9 PhD thesis %0 Conference Paper %B In AAMAS 2014 Workshop on Adaptive Learning Agents (ALA) %D 2014 %T Online Learning and Planning in Resource Conservation Games %A Qian, Yundi %A William B. Haskell %A Xin Jiang, Albert %A Tambe, Milind %X Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. 
Fortunately, extractors carry out frequent illegal extractions, so protectors can learn the richness of resources by observing the extractor’s behavior. This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game and transform this repeated game into a POMDP, which cannot be solved by the latest general POMDP solvers due to its exponential state space; (ii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iii) for a specific class of our game, we speed up the GMOP algorithm without sacrificing solution quality, as well as provide a heuristic that trades off solution quality for lower computational cost; (iv) we explore the continuous utility scenario where the POMDP becomes a continuous-state POMDP, and provide a solution in special cases. %B In AAMAS 2014 Workshop on Adaptive Learning Agents (ALA) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014) %D 2014 %T Online Planning for Optimal Protector Strategies in Resource Conservation Games %A Qian, Yundi %A William B. Haskell %A Xin Jiang, Albert %A Tambe, Milind %X Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. Fortunately, extractors carry out frequent illegal extractions, so protectors can learn about the richness of resources by observing the extractor’s behavior. 
This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game; (ii) we transform this repeated game into a POMDP by adopting a fixed model for the adversary’s behavior, which cannot be solved by the latest general POMDP solvers due to its exponential state space; (iii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iv) for a specific class of our game, we can speed up the GMOP algorithm without sacrificing solution quality, as well as provide a heuristic that trades off solution quality for lower computational cost. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014) %G eng %0 Conference Paper %B International Joint Workshop on Optimization in Multi-Agent Systems and Distributed Constraint Reasoning (OPTMAS-DCR) In Conjunction with AAMAS 2014 %D 2014 %T Opportunistic Security Game: An Initial Report %A Zhang, Chao %A Xin Jiang, Albert %A Short, Martin %A Jeffrey Brantingham %A Tambe, Milind %X This paper introduces a new game-theoretic framework and algorithms for addressing opportunistic crime. Stackelberg Security Game (SSG), focused on highly strategic and resourceful adversaries, has become an important computational framework within multiagent systems. Unfortunately, SSG is ill-suited as a framework for handling opportunistic crimes, which are committed by criminals who are less strategic in planning attacks and more flexible in executing them than SSG assumes. Yet, opportunistic crime is what is commonly seen in most urban settings. We therefore introduce Opportunistic Security Game (OSG), a computational framework to recommend deployment strategies for defenders to control opportunistic crimes. 
Our first contribution in OSG is a novel model for opportunistic adversaries, who (i) opportunistically and repeatedly seek targets; (ii) react to real-time information at execution time rather than planning attacks in advance; and (iii) have limited observation of defender strategies. Our second contribution to OSG is a new exact algorithm EOSG to optimize defender strategies given our opportunistic adversaries. Our third contribution is the development of a fast heuristic algorithm to solve large-scale OSG problems, exploiting a compact representation. We use urban transportation systems as a critical motivating domain, and provide detailed experimental results based on a real-world system. %B International Joint Workshop on Optimization in Multi-Agent Systems and Distributed Constraint Reasoning (OPTMAS-DCR) In Conjunction with AAMAS 2014 %C Paris, France %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI) %D 2014 %T Regret-based Optimization and Preference Elicitation for Stackelberg Security Games with Uncertainty %A Thanh H. Nguyen %A Amulya Yadav %A An, Bo %A Tambe, Milind %A Craig Boutilier %X Stackelberg security games (SSGs) have been deployed in a number of real-world domains. One key challenge in these applications is the assessment of attacker payoffs which may not be perfectly known. Previous work has studied SSGs with uncertain payoffs modeled by interval uncertainty and provided maximin-based robust solutions. In contrast, in this work we propose the use of the less conservative minimax regret decision criterion for such payoff-uncertain SSGs and present the first algorithms for computing minimax regret for SSGs. We also address the challenge of preference elicitation, using minimax regret to develop the first elicitation strategies for SSGs. Experimental results validate the effectiveness of our approaches. 
%B National Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B Innovative applications of Artificial Intelligence (IAAI) %D 2014 %T Robust protection of fisheries with COmPASS %A William B. Haskell %A Kar, Debarun %A Fang, Fei %A Tambe, Milind %A Cheung, Sam %A Denicola, Elizabeth %X Fish stocks around the world are in danger from illegal fishing. In collaboration with the U.S. Coast Guard (USCG), we work to defend fisheries from illegal fishermen (henceforth called Lanchas) in the U.S. Gulf of Mexico. We have developed the COmPASS (Conservative Online Patrol ASSistant) system to design USCG patrols against the Lanchas. In this application, we face a population of Lanchas with heterogeneous behavior who fish frequently. We have some data about these Lanchas, but not enough to fit a statistical model. Previous security patrol assistants have focused on counterterrorism in one-shot games where adversaries are assumed to be perfectly rational, and much less data about their behavior is available. COmPASS is novel because: (i) it emphasizes environmental crime; (ii) it is based on a repeated Stackelberg game; (iii) it allows for bounded rationality of the Lanchas and it offers a robust approach against the heterogeneity of the Lancha population; and (iv) it can learn from sparse Lancha data. We report the effectiveness of COmPASS in the Gulf in our numerical experiments based on real fish data. The COmPASS system is to be tested by USCG. %B Innovative applications of Artificial Intelligence (IAAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [Short paper] %D 2014 %T Security Games in the Field: an Initial Study on a Transit System (Extended Abstract) %A F. M. Delle Fave %A Brown, M. %A Zhang, C. %A E. Shieh %A A.X. Jiang %A H. Rosoff %A M. Tambe %A J.P. 
Sullivan %X Going beyond previous deployments of Stackelberg security games (SSGs), this paper presents actual results from the field using a novel deployed system referred to as the Multi-Operation Patrol Scheduling System (MOPSS). MOPSS generates patrols for a transit system considering three different threats: fare evasion (FE), terrorism (CT) and crime (CR). In so doing, this paper presents four contributions: (i) we propose the first multi-operation patrolling system; (ii) MOPSS is the first system to use Markov decision processes (MDPs) to handle uncertain interruptions in the execution of patrol schedules; (iii) we are the first to deploy a new Opportunistic Security Game model, where the adversary, a criminal, makes opportunistic decisions on when and where to commit crimes and, most importantly, (iv) we evaluate MOPSS via real-world deployments, providing some of the first real-world data from security games in the field. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [Short paper] %G eng %0 Conference Proceedings %B In: Lecture Notes in Artificial Intelligence. Springer. In Press. %D 2014 %T Security Games in the Field: Deployments on a Transit System %A F. M. Delle Fave %A Brown, M. %A Zhang, C. %A E. Shieh %A A.X. Jiang %A H. Rosoff %A M. Tambe %A J.P. Sullivan %X This paper proposes the Multi-Operation Patrol Scheduling System (MOPSS), a new system to generate patrols for a transit system. MOPSS is based on five contributions. First, MOPSS is the first system to use three fundamentally different adversary models for the threats of fare evasion, terrorism and crime, generating three significantly different types of patrol schedule. Second, to handle uncertain interruptions in the execution of patrol schedules, MOPSS uses Markov decision processes (MDPs) in its scheduling. 
Third, MOPSS is the first system to account for joint activities between multiple resources, by employing the well-known SMART security game model that tackles coordination between the defender’s resources. Fourth, we are also the first to deploy a new Opportunistic Security Game model, where the adversary, a criminal, makes opportunistic decisions on when and where to commit crimes. Our fifth, and most important, contribution is the evaluation of MOPSS via real-world deployments, providing data from security games in the field. %B In: Lecture Notes in Artificial Intelligence. Springer. In Press. %G eng %0 Conference Paper %B 28th Conference on Artificial Intelligence (AAAI 2014) %D 2014 %T Solving Zero-Sum Security Games in Discretized Spatio-Temporal Domains %A Haifeng Xu %A Fang, Fei %A Xin Jiang, Albert %A Vincent Conitzer %A Dughmi, Shaddin %A Tambe, Milind %X Among the many deployment areas of Stackelberg Security games, a major area involves games played out in space and time, which includes applications in which multiple mobile defender resources protect multiple mobile targets. Previous algorithms for such spatio-temporal security games fail to scale up, and little is known about the computational complexity properties of these problems. This paper provides a novel oracle-based algorithmic framework for a systematic study of different problem variants of computing optimal (minimax) strategies in spatio-temporal security games. Our framework enables efficient computation of a minimax strategy when the problem admits a polynomial-time oracle. Furthermore, for the cases in which efficient oracles are difficult to find, we propose approximations or prove hardness results. 
%B 28th Conference on Artificial Intelligence (AAAI 2014) %C Québec, Canada %G eng %0 Conference Paper %B Innovative applications of Artificial Intelligence (IAAI) %D 2014 %T STREETS: Game-Theoretic Traffic Patrolling with Exploration and Exploitation %A Brown, Matthew %A Saisubramanian, Sandhya %A Varakantham, Pradeep %A Tambe, Milind %X To dissuade reckless driving and mitigate accidents, cities deploy resources to patrol roads. In this paper, we present STREETS, an application developed for the city of Singapore, which models the problem of computing randomized traffic patrol strategies as a defender-attacker Stackelberg game. Previous work on Stackelberg security games has focused extensively on counterterrorism settings. STREETS moves beyond counterterrorism and represents the first use of Stackelberg games for traffic patrolling, in the process providing a novel algorithm for solving such games that addresses three major challenges in modeling and scale-up. First, there exists a high degree of unpredictability in travel times through road networks, which we capture using a Markov Decision Process for planning the patrols of the defender (the police) in the game. Second, modeling all possible police patrols and their interactions with a large number of adversaries (drivers) introduces a significant scalability challenge. To address this challenge, we apply a compact game representation in a novel fashion combined with adversary and state sampling. Third, patrol strategies must balance exploitation (minimizing violations) with exploration (maximizing omnipresence), a tradeoff we model by solving a bi-objective optimization problem. We present experimental results using real-world traffic data from Singapore. This work is done in collaboration with the Singapore Ministry of Home Affairs and is currently being evaluated by the Singapore Police Force. 
%B Innovative applications of Artificial Intelligence (IAAI) %G eng %0 Conference Paper %B In 17th International Workshop on Coordination, Organizations, Institutions and Norms (COIN 2014) %D 2014 %T Team Formation in Large Action Spaces %A L. S. Marcolino %A Xu, H. %A A.X. Jiang %A M. Tambe %A E. Bowring %X Recent work has shown that diverse teams can outperform a uniform team made of copies of the best agent. However, there are fundamental questions that were not asked before. When should we use diverse or uniform teams? How does the performance change as the action space or the teams get larger? Hence, we present a new model of diversity for teams that is more general than previous models. We prove that the performance of a diverse team improves as the size of the action space gets larger. Concerning the size of the diverse team, we show that the performance converges exponentially fast to the optimal one as we increase the number of agents. We present synthetic experiments that allow us to gain further insights: even though a diverse team outperforms a uniform team when the size of the action space increases, the uniform team will eventually again play better than the diverse team for a large enough action space. We verify our predictions in a system of Go-playing agents, where we show a diverse team that improves in performance as the board size increases, and eventually overcomes a uniform team. %B In 17th International Workshop on Coordination, Organizations, Institutions and Norms (COIN 2014) %C Paris, France %G eng %0 Journal Article %J Journal of Autonomous Agents and Multiagent Systems, JAAMAS %D 2014 %T TESLA: An Extended Study of an Energy-saving Agent that Leverages Schedule Flexibility %A Kwak, Jun-young %A Varakantham, Pradeep %A Maheswaran, Rajiv %A Chang, Yu-Han %A Tambe, Milind %A Becerik-Gerber, Burcin %A Wood, Wendy %X This paper presents TESLA, an agent for optimizing energy usage in commercial buildings. 
TESLA’s key insight is that adding flexibility to event/meeting schedules can lead to significant energy savings. This paper provides four key contributions: (i) online scheduling algorithms, which are at the heart of TESLA, to solve a stochastic mixed-integer linear program (SMILP) for energy-efficient scheduling of incrementally/dynamically arriving meetings and events; (ii) an algorithm to effectively identify key meetings that lead to significant energy savings by adjusting their flexibility; (iii) an extensive analysis on energy savings achieved by TESLA; and (iv) surveys of real users which indicate that TESLA’s assumptions of user flexibility hold in practice. TESLA was evaluated on data gathered from over 110,000 meetings held at nine campus buildings during an eight-month period in 2011–2012 at the University of Southern California (USC) and the Singapore Management University (SMU). These results and analysis show that, compared to the current systems, TESLA can substantially reduce overall energy consumption. %B Journal of Autonomous Agents and Multiagent Systems, JAAMAS %V 28 %P 605-636 %G eng %N 4 %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [SHORT PAPER] %D 2014 %T Towards a game theoretic approach for defending against crime diffusion %A Zhang, Chao %A Xin Jiang, Albert %A Martin B. Short %A Jeffrey P. Brantingham %A Tambe, Milind %X In urban transportation networks, crime diffuses as criminals travel through the networks and look for illicit opportunities. It is important to first model this diffusion in order to recommend actions or patrol policies to control the diffusion of such crime. 
Previously, game theory has been used for such patrol policy recommendations, but these applications of game theory for security have not modeled the diffusion of crime that comes about due to criminals seeking opportunities; instead the focus has been on highly strategic adversaries that plan attacks in advance. To overcome this limitation of previous work, this paper provides the following key contributions. First, we provide a model of crime diffusion based on a quantal biased random movement (QBRM) of criminals opportunistically and repeatedly seeking targets. Within this model, criminals react to real-time information, rather than strategically planning their attack in advance. Second, we provide a game-theoretic approach to generate randomized patrol policies for controlling such diffusion. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [SHORT PAPER] %G eng %0 Conference Paper %B European Conference on Artificial Intelligence (ECAI) %D 2014 %T Unleashing Dec-MDPs in Security Games: Enabling Effective Defender Teamwork %A Shieh, Eric %A Xin Jiang, Albert %A Amulya Yadav %A Varakantham, Pradeep %A Tambe, Milind %X Multiagent teamwork and defender-attacker security games are two areas that are currently receiving significant attention within multiagent systems research. Unfortunately, despite the need for effective teamwork among multiple defenders, little has been done to harness the teamwork research in security games. This paper is the first to remedy this situation by integrating the powerful teamwork mechanisms offered by Dec-MDPs into security games. 
We offer the following novel contributions in this paper: (i) New models of security games where a defender team’s pure strategy is defined as a Dec-MDP policy for addressing coordination under uncertainty; (ii) New algorithms based on column generation that enable efficient generation of mixed strategies given this new model; (iii) Handling global events during defender execution for effective teamwork; (iv) Exploration of the robustness of randomized pure strategies. The paper opens the door to a potentially new area combining computational game theory and multiagent teamwork. %B European Conference on Artificial Intelligence (ECAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2014 %T Adaptive Resource Allocation for Wildlife Protection against Illegal Poachers %A Yang, Rong %A Ford, Benjamin %A Tambe, Milind %A Lemieux, Andrew %X Illegal poaching is an international problem that leads to the extinction of species and the destruction of ecosystems. As evidenced by dangerously dwindling populations of endangered species, existing anti-poaching mechanisms are insufficient. This paper introduces the Protection Assistant for Wildlife Security (PAWS) application - a joint deployment effort done with researchers at Uganda’s Queen Elizabeth National Park (QENP) with the goal of improving wildlife ranger patrols. While previous works have deployed applications with a game-theoretic approach (specifically Stackelberg Games) for counter-terrorism, wildlife crime is an important domain that promotes a wide range of new deployments. Additionally, this domain presents new research challenges and opportunities related to learning behavioral models from collected poaching data. In addressing these challenges, our first contribution is a behavioral model extension that captures the heterogeneity of poachers’ decision-making processes. 
Second, we provide a novel framework, PAWS-Learn, that incrementally improves the behavioral model of the poacher population with more data. Third, we develop a new algorithm, PAWS-Adapt, that adaptively improves the resource allocation strategy against the learned model of poachers. Fourth, we demonstrate PAWS’s potential effectiveness when applied to patrols in QENP, where PAWS will be deployed. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2014 %T Stop the Compartmentalization: Unified Robust Algorithms for Handling Uncertainties in Security Games %A Nguyen, Thanh %A Jiang, Albert %A Tambe, Milind %X Given the real-world applications of Stackelberg security games (SSGs), addressing uncertainties in these games is a major challenge. Unfortunately, we lack any unified computational framework for handling uncertainties in SSGs. Current state-of-the-art has provided only compartmentalized robust algorithms that handle uncertainty exclusively either in the defender’s strategy or in adversary’s payoff or in the adversary’s rationality, leading to potential failures in real-world scenarios where a defender often faces multiple types of uncertainties. Furthermore, insights for improving performance are not leveraged across the compartments, leading to significant losses in quality or efficiency. In this paper, we provide the following main contributions: 1) we present the first unified framework for handling the uncertainties explored in SSGs; 2) based on this unified framework, we propose the first set of “unified” robust algorithms to address combinations of these uncertainties; 3) we introduce approximate scalable robust algorithms for handling these uncertainties that leverage insights across compartments; 4) we present experiments demonstrating solution quality and runtime advantages of our algorithms. 
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Thesis %D 2013 %T The Power of Flexibility: Autonomous Agents That Conserve Energy in Commercial Buildings %A Kwak, Jun-young %X Agent-based systems for energy conservation are now a growing area of research in multiagent systems, with applications ranging from energy management and control on the smart grid, to energy conservation in residential buildings, to energy generation and dynamic negotiations in distributed rural communities. Contributing to this area, my thesis presents new agent-based models and algorithms aiming to conserve energy in commercial buildings. More specifically, my thesis provides three sets of algorithmic contributions. First, I provide online predictive scheduling algorithms to handle massive numbers of meeting/event scheduling requests considering flexibility, which is a novel concept for capturing generic user constraints while optimizing the desired objective. Second, I present a novel BM-MDP (Bounded-parameter Multi-objective Markov Decision Problem) model and robust algorithms for multi-objective optimization under uncertainty both at the planning and execution time. The BM-MDP model and its robust algorithms are useful in (re)scheduling events to achieve energy efficiency in the presence of uncertainty over user’s preferences. Third, when multiple users contribute to energy savings, fair division of credit for such savings to incentivize users for their energy saving activities arises as an important question. I appeal to cooperative game theory and specifically to the concept of Shapley value for this fair division. Unfortunately, scaling up this Shapley value computation is a major hindrance in practice. Therefore, I present novel approximation algorithms to efficiently compute the Shapley value based on sampling and partitions and to speed up the characteristic function computation. 
These new models have not only advanced the state of the art in multiagent algorithms, but have actually been successfully integrated within agents dedicated to energy efficiency: SAVES, TESLA and THINC. SAVES focuses on the day-to-day energy consumption of individuals and groups in commercial buildings by reactively suggesting energy conserving alternatives. TESLA takes a long-range planning perspective and optimizes overall energy consumption of a large number of group events or meetings together. THINC provides an end-to-end integration within a single agent of energy efficient scheduling, rescheduling and credit allocation. While SAVES, TESLA and THINC thus differ in their scope and applicability, they demonstrate the utility of agent-based systems in actually reducing energy consumption in commercial buildings. I evaluate my algorithms and agents using extensive analysis on data from over 110,000 real meetings/events at multiple educational buildings including the main libraries at the University of Southern California. I also provide results on simulations and real-world experiments, clearly demonstrating the power of agent technology to assist human users in saving energy in commercial buildings. %G eng %9 PhD thesis %0 Thesis %D 2013 %T Addressing Uncertainty in Stackelberg Games for Security: Models and Algorithms %A Yin, Zhengyu %X Recently, there has been significant research interest in using game-theoretic approaches to allocate limited security resources to protect physical infrastructure including ports, airports, transit systems, and other critical national infrastructure as well as natural resources such as forests, tigers, fish, and so on. Indeed, the leader-follower Stackelberg game model is at the heart of many deployed applications. In these applications, the game model provides a randomized strategy for the leader (security forces), under the assumption that the adversary will conduct surveillance before launching an attack. 
Inevitably, the security forces are faced with the problem of uncertainty. For example, a security officer may be forced to execute a different patrol strategy from the planned one due to unexpected events. Also, there may be significant uncertainty regarding the amount of surveillance conducted by an adversary. While Bayesian Stackelberg games for modeling discrete uncertainty have been successfully used in deployed applications, they are NP-hard problems and existing methods perform poorly in scaling up the number of types, making them inadequate for complex real-world problems. Furthermore, Bayesian Stackelberg games have not been applied to model execution and observation uncertainty, and finally, they require the availability of full distributional information of the uncertainty. To overcome these difficulties, my thesis presents four major contributions. First, I provide a novel algorithm, Hunter, for Bayesian Stackelberg games to scale up the number of types. Exploiting the efficiency of Hunter, I show preference, execution and observation uncertainty can be addressed in a unified framework. Second, to address execution and observation uncertainty (where the distribution may be difficult to estimate), I provide a robust optimization formulation to compute the optimal risk-averse leader strategy in Stackelberg games. Third, addressing the uncertainty of the adversary’s capability of conducting surveillance, I show that for a class of Stackelberg games motivated by real security applications, the leader is always best-responding with a Stackelberg equilibrium strategy regardless of whether the adversary conducts surveillance or not. As the final contribution, I provide TRUSTS, a novel game-theoretic formulation for scheduling randomized patrols in public transit domains where timing is a crucial component. TRUSTS addresses dynamic execution uncertainty in such spatiotemporal domains by integrating Markov Decision Processes into the game-theoretic model. 
Simulation results as well as real-world trials of TRUSTS in the Los Angeles Metro Rail system provide validation of my approach. %G eng %9 PhD thesis %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) %D 2013 %T Analyzing the Effectiveness of Adversary Modeling in Security Games %A Thanh H. Nguyen %A Yang, Rong %A Azaria, Amos %A Kraus, Sarit %A Tambe, Milind %X Recent deployments of Stackelberg security games (SSG) have led to two competing approaches to handle boundedly rational human adversaries: (1) integrating models of human (adversary) decision-making into the game-theoretic algorithms, and (2) applying robust optimization techniques that avoid adversary modeling. A recent algorithm (MATCH) based on the second approach was shown to outperform the leading modeling-based algorithm even in the presence of a significant amount of data. Is there then any value in using human behavior models in solving SSGs? Through extensive experiments with 547 human subjects playing 11,102 games in total, we emphatically answer the question in the affirmative, while providing the following key contributions: (i) we show that our algorithm, SU-BRQR, based on a novel integration of a human behavior model with the subjective utility function, significantly outperforms both MATCH and its improvements; (ii) we are the first to present experimental results with security intelligence experts, and find that even though the experts are more rational than the Amazon Turk workers, SU-BRQR still outperforms an approach assuming perfect rationality (and to a more limited extent MATCH); (iii) we show the advantage of SU-BRQR in a new, large game setting and demonstrate that sufficient data enables it to improve its performance over MATCH. 
%B Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B MAIN Workshop at AAMAS 2013 %D 2013 %T Bayesian Security Games for Controlling Contagion %A Tsai, Jason %A Qian, Yundi %A Vorobeychik, Yevgeniy %A Kiekintveld, Christopher %A Milind Tambe %X Influence blocking games have been used to model adversarial domains with a social component, such as counterinsurgency. In these games, a mitigator attempts to minimize the efforts of an influencer to spread his agenda across a social network, which is modeled as a graph. Previous work has assumed that the influence graph structure is known with certainty by both players. However, in reality, there is often significant information asymmetry between the mitigator and the influencer. We introduce a model of this information asymmetry as a two-player zero-sum Bayesian game. Nearly all past work in influence maximization and social network analysis suggests that graph structure is fundamental in strategy generation, leading to an expectation that solving the Bayesian game exactly would be vastly superior to any technique that does not account for uncertainty about the network structure. Surprisingly, we show through extensive experimentation on synthetic and real-world social networks that many common forms of uncertainty can be addressed near-optimally by ignoring the vast majority of it and simply solving an abstracted game with a few randomly chosen types. This suggests that optimal strategies of games that do not model the full range of uncertainty in influence blocking games are in many cases robust to uncertainty about the structure of the influence graph. %B MAIN Workshop at AAMAS 2013 %7 2 %V 27 %P 200-217 %G eng %0 Conference Paper %B ASE/IEEE International Conference on Social Computing (SocialCom) %D 2013 %T Bayesian security games for controlling contagion %A J. Tsai %A Y. Qian %A Y. Vorobeychik %A C. Kiekintveld %A M. 
Tambe %X Influence blocking games have been used to model adversarial domains with a social component, such as counterinsurgency. In these games, a mitigator attempts to minimize the efforts of an influencer to spread his agenda across a social network. Previous work has assumed that the influence graph structure is known with certainty by both players. However, in reality, there is often significant information asymmetry between the mitigator and the influencer. We introduce a model of this information asymmetry as a two-player zero-sum Bayesian game. Nearly all past work in influence maximization and social network analysis suggests that graph structure is fundamental in strategy generation, leading to an expectation that solving the Bayesian game exactly is crucial. Surprisingly, we show through extensive experimentation on synthetic and real-world social networks that many common forms of uncertainty can be addressed near-optimally by ignoring the vast majority of it and simply solving an abstracted game with a few randomly chosen types. This suggests that optimal strategies of games that do not model the full range of uncertainty in influence blocking games are typically robust to uncertainty about the influence graph structure. %B ASE/IEEE International Conference on Social Computing (SocialCom) %G eng %0 Journal Article %J ASE Human Journal %D 2013 %T Bayesian Security Games for Controlling Contagion (Extended version) %A J. Tsai %A Y. Qian %A M. Tambe %A Y. Vorobeychik %A C. Kiekintveld %X Influence blocking games have been used to model adversarial domains with a social component, such as counterinsurgency. In these games, a mitigator attempts to minimize the efforts of an influencer to spread his agenda across a social network. Previous work has assumed that the influence graph structure is known with certainty by both players. However, in reality, there is often significant information asymmetry between the mitigator and the influencer. 
We introduce a model of this information asymmetry as a two-player zero-sum Bayesian game. Nearly all past work in influence maximization and social network analysis suggests that graph structure is fundamental in strategy generation, leading to an expectation that solving the Bayesian game exactly is crucial. Surprisingly, we show through extensive experimentation on synthetic and real-world social networks that many common forms of uncertainty can be addressed near-optimally by ignoring the vast majority of it and simply solving an abstracted game with a few randomly chosen types. This suggests that optimal strategies of games that do not model the full range of uncertainty in influence blocking games are typically robust to uncertainty about the influence graph structure. %B ASE Human Journal %V 2 %P 168-181 %G eng %0 Conference Paper %B International Association for Computing and Philosophy (IACAP) 2013 %D 2013 %T Computational Models of Moral Perception, Conflict & Elevation %A Dehghani M. %A Immordino-Yang M. H. %A Graham J. %A Marsella S. %A Forbus K. %A Ginges J. %A Tambe M. %A Maheswaran R. %X Computational models of moral cognition will be critical to the creation of agents and robots that operate autonomously in morally sensitive and complex domains. We propose a framework for developing computational models of moral cognition based on behavioral and neurobiological experimental results and field observations. Specifically, we discuss the following critical issues in building such models: 1. Managing conflicts between different moral concerns; 2. The role of moral perceptions in moral judgments; 3. Mechanisms and consequences of moral emotions; 4. Learning and adjusting moral behavior. Moreover, we discuss computational architectures for building and exploring models of moral cognition at different levels of analysis: individual, small groups and large groups. 
%B International Association for Computing and Philosophy (IACAP) 2013 %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2013 %T Defender (Mis)coordination in Security Games %A Xin Jiang, Albert %A Ariel D. Procaccia %A Qian, Yundi %A Nisarg Shah %A Tambe, Milind %X We study security games with multiple defenders. To achieve maximum security, defenders must perfectly synchronize their randomized allocations of resources. However, in real-life scenarios (such as protection of the port of Boston) this is not the case. Our goal is to quantify the loss incurred by miscoordination between defenders, both theoretically and empirically. We introduce two notions that capture this loss under different assumptions: the price of miscoordination, and the price of sequential commitment. Generally speaking, our theoretical bounds indicate that the loss may be extremely high in the worst case, while our simulations establish a smaller yet significant loss in practice. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Journal Article %J In Interfaces %D 2013 %T A Deployed Quantal Response Based Patrol Planning System for the US Coast Guard %A B. An %A E. Shieh %A R. Yang %A M. Tambe %A C. Baldwin %A J. DiRenzo %A B. Maule %A G. Meyer %X In this paper we describe the model, theory developed and deployment of PROTECT, a game-theoretic system in use by the United States Coast Guard (USCG) in the Port of Boston for scheduling patrols. The USCG evaluated the deployment of PROTECT in the Port of Boston as a success and is currently evaluating the system in the Port of New York, with the potential for nationwide deployment. The PROTECT system is premised on an attacker-defender Stackelberg game model but its development and implementation required both theoretical contributions and detailed evaluations. In this paper we describe the work required in the deployment which we group into five key innovations. 
First, we propose a compact representation of the defender’s strategy space, by exploiting equivalence and dominance, that makes PROTECT efficient enough to solve real-world-sized problems. Second, this system does not assume that adversaries are perfectly rational, a common assumption in previous game-theoretic models for security. Instead, PROTECT relies on a quantal response (QR) model of the adversary’s behavior; to the best of our knowledge, this is the first real-world deployment of a QR model. Third, we develop specialized solution algorithms that are able to solve this problem for real-world instances and give theoretical guarantees. Fourth, our experimental results illustrate that PROTECT’s QR model handles real-world uncertainties more robustly than a perfect rationality model. Finally, this paper presents real-world evaluation of PROTECT by: (i) a comparison of human-generated vs. PROTECT security schedules, and (ii) results from an Adversarial Perspective Team’s (human mock attackers) analysis. %B In Interfaces %V 43 %P 400-420 %G eng %N 5 %0 Conference Paper %B International Workshop on Optimisation in Multi-Agent Systems (OPTMAS) %D 2013 %T Designing Optimal Patrol Strategy for Protecting Moving Targets with Multiple Mobile Resources %A Fang, Fei %A Xin Jiang, Albert %A Tambe, Milind %X Previous work on Stackelberg Security Games for scheduling security resources has mostly assumed that the targets are stationary relative to the defender and the attacker, leading to discrete game models with finite numbers of pure strategies. This paper in contrast focuses on protecting mobile targets that lead to a continuous set of strategies for the players. The problem is motivated by several real-world domains including protecting ferries with escorts and protecting refugee supply lines.
Our contributions include: (i) a new game model for multiple mobile defender resources and moving targets with a discretized strategy space for the defender and a continuous strategy space for the attacker; (ii) an efficient linear-program-based solution that uses a compact representation for the defender’s mixed strategy, while accurately modeling the attacker’s continuous strategy using a novel sub-interval analysis method; (iii) a heuristic method of equilibrium refinement for improved robustness and (iv) detailed experimental analysis in the ferry protection domain. %B International Workshop on Optimisation in Multi-Agent Systems (OPTMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [Demonstrations Track] %D 2013 %T Diversity Beats Strength? - A Hands-on Experience with 9x9 Go (Demonstration) %A L. S. Marcolino %A D. Chen %A A.X. Jiang %A M. Tambe %X Team formation is a critical step in deploying a multi-agent team. In some scenarios, agents coordinate by voting continuously. When forming such teams, should we focus on the diversity of the team or on the strength of each member? Can a team of diverse (and weak) agents outperform a uniform team of strong agents? In this demo, the user will be able to explore these questions by playing one of the most challenging board games: Go. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [Demonstrations Track] %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2013 %T Efficiently Solving Joint Activity Based Security Games %A Shieh, Eric %A Jain, Manish %A Xin Jiang, Albert %A Tambe, Milind %X Despite recent successful real-world deployments of Stackelberg Security Games (SSGs), scale-up remains a fundamental challenge in this field. The latest techniques do not scale-up to domains where multiple defenders must coordinate time-dependent joint activities. 
To address this challenge, this paper presents two branch-and-price algorithms for solving SSGs, SMARTO and SMARTH, with three novel features: (i) a column-generation approach that uses an ordered network of nodes (determined by solving the traveling salesman problem) to generate individual defender strategies; (ii) exploitation of iterative reward shaping of multiple coordinating defender units to generate coordinated strategies; (iii) generation of tighter upper-bounds for pruning by solving security games that only abide by key scheduling constraints. We provide extensive experimental results and formal analyses. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B Workshop on Optimization in Multiagent Systems (OPTMAS) at AAMAS %D 2013 %T Efficiently Solving Time-Dependent Joint Activities in Security Games %A Shieh, Eric %A Jain, Manish %A Xin Jiang, Albert %A Tambe, Milind %X Despite recent successful real-world deployments of Stackelberg Security Games (SSGs), scale-up remains a fundamental challenge in this field. The latest techniques do not scale-up to domains where multiple defenders must coordinate time-dependent joint activities. To address this challenge, this paper presents two branch-and-price algorithms for solving SSGs, SMARTO and SMARTH, with three novel features: (i) a column-generation approach that uses an ordered network of nodes (determined by solving the traveling salesman problem) to generate individual defender strategies; (ii) exploitation of iterative reward shaping of multiple coordinating defender units to generate coordinated strategies; (iii) generation of tighter upper-bounds for pruning by solving security games that only abide by key scheduling constraints. We provide extensive experimental results and formal analyses. 
%B Workshop on Optimization in Multiagent Systems (OPTMAS) at AAMAS %G eng %0 Journal Article %J Journal of Autonomous Agents and Multiagent Systems, JAAMAS %D 2013 %T Empirical Evaluation of Computational Fear Contagion Models in Crowd Dispersions %X In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions being influenced by surrounding people’s emotions. While the overall effect is agreed upon, the underlying mechanism of the spread of emotions has seen little quantification and application to computational agents despite extensive evidence of its impacts in everyday life. In this paper, we examine computational models of emotional contagion by implementing two models ((Bosse et al, 2009b) and (Durupinar, 2010)) that draw from two separate lines of contagion research: thermodynamics-based and epidemiology-based. We first perform sensitivity tests on each model in an evacuation simulation, ESCAPES, showing both models to be reasonably robust to parameter variations with certain exceptions. We then compare their ability to reproduce a real crowd panic scene in simulation, showing that the thermodynamics-style model (Bosse et al, 2009b) produces superior results due to the ill-suited contagion mechanism at the core of epidemiological models. We also identify that a graduated effect of fear and proximity-based contagion effects are key to producing the superior results. We then reproduce the methodology on a second video, showing that the same results hold, implying generality of the conclusions reached in the first scene. %B Journal of Autonomous Agents and Multiagent Systems, JAAMAS %V 27 %P 200-217 %G eng %N 2 %0 Conference Paper %B Conference of the Spanish Association for Artificial Intelligence (CAEPIA), 2013. %D 2013 %T Engineering the decentralized coordination of UAVs with limited communication range %A Pujol-Gonzalez %A J. Cerquides %A P. Meseguer %A J. A. Rodriguez Aguilar %A M.
Tambe %X This paper tackles the problem of allowing a team of UAVs with limited communication range to autonomously coordinate to service requests. We present two MRF-based solutions: one assumes independence between requests; and the other considers also the UAVs’ workloads. Empirical evaluation shows that the latter performs almost as well as state-of-the-art centralized techniques in realistic scenarios. %B Conference of the Spanish Association for Artificial Intelligence (CAEPIA), 2013. %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS)[Demonstrations Track] %D 2013 %T Game-theoretic Patrol Strategies for Transit Systems: the TRUSTS System and its Mobile App (Demonstration) %A Luber, Samantha %A Yin, Zhengyu %A Delle Fave, Francesco %A Xin Jiang, Albert %A Tambe, Milind %A John P. Sullivan %X Fare evasion costs proof-of-payment transit systems significant losses in revenue. In 2007 alone, the Los Angeles Metro system, using proof-of-payment, suffered an estimated revenue loss of $5.6 million due to fare evasion [2]. In addition, resource limitations prevent officers from verifying all passengers. Thus, such officers periodically inspect a subset of the passengers based on a patrol strategy. Effective patrol strategies are then needed to deter fare evasion and maximize revenue in transit systems. In addition, since potential fare evaders can exploit knowledge about the patrol strategy to avoid inspection, an unpredictable patrol strategy is needed for effectiveness. Furthermore, due to transit system complexity, human schedulers cannot manually produce randomized patrol strategies, while taking into account all of the system’s scheduling constraints [3]. In previous work on computing game-theoretic patrol strategies, Bayesian Stackelberg games have been successfully used to model the patrolling problem. 
In this model, the security officer commits to a patrol strategy and the fare evaders observe this patrol strategy and select a counter strategy accordingly [4]. This approach has also been successfully deployed in real-world applications, including by the L.A. International Airport police, the U.S. Coast Guard at the Port of Boston, and the Federal Air Marshal Service [5]. However, this approach cannot be used within our setting due to the increased complexity of having more potential followers and scheduling constraints [6]. In addition, transit systems face the challenge of execution uncertainty, in which unexpected events cause patrol officers to fall off schedule and exist in unknown states in the model [1]. Addressing the increased complexity challenge, TRUSTS (Tactical Randomizations for Urban Security in Transit Systems) reduces the temporal and spatial scheduling constraints imposed by the transit system into a single transition graph, a compact representation of all possible movement throughout the transit system as flows from each station node [1]. In addition, TRUSTS remedies the execution uncertainty challenge by modeling the execution of patrol units as Markov Decision Processes (MDPs) [1]. In simulation and trial testing, the TRUSTS approach has generated effective patrol strategies for the L.A. Metro System [1, 6]. In order to implement the TRUSTS approach in real-world transit systems, the METRO mobile app presented in this paper is being developed to work with TRUSTS to (i) provide officers with real-time TRUSTS-generated patrol schedules, (ii) provide recovery from schedule interruptions, and (iii) collect patrol data. An innovation in transit system patrol scheduling technology, the app works as an online agent that provides officers with the best set of patrol actions for maximizing fare evasion deterrence based on the current time and officer location.
In this paper, we propose a demonstration of the TRUSTS system, composed of the TRUSTS and METRO app components, which showcases how the system works with emphasis on the mobile app for user interaction. To establish sufficient background context for the demonstration, this paper also presents a brief overview of the TRUSTS system, including the TRUSTS approach to patrol strategy generation in Section 2.1 and discussion of the METRO app’s features and user interface design in Section 2.2, and the expected benefits from deployment in the L.A. Metro System. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS)[Demonstrations Track] %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2013 %T Game-theoretic Randomization for Security Patrolling with Dynamic Execution Uncertainty %A Xin Jiang, Albert %A Yin, Zhengyu %A Zhang, Chao %A Tambe, Milind %A Kraus, Sarit %X In recent years there has been extensive research on game-theoretic models for infrastructure security. In time-critical domains where the security agency needs to execute complex patrols, execution uncertainty (interruptions) affect the patroller’s ability to carry out their planned schedules later. Indeed, experiments in this paper show that in some real-world domains, small fractions of execution uncertainty can have a dramatic impact. The contributions of this paper are threefold. First, we present a general Bayesian Stackelberg game model for security patrolling in dynamic uncertain domains, in which the uncertainty in the execution of patrols is represented using Markov Decision Processes. Second, we study the problem of computing Stackelberg equilibrium for this game. We show that when the utility functions have a certain separable structure, the defender’s strategy space can be compactly represented, and we can reduce the problem to a polynomial-sized optimization problem. 
Finally, we apply our approach to fare inspection in the Los Angeles Metro Rail system. Numerical experiments show that patrol schedules generated using our approach outperform schedules generated using a previous algorithm that does not consider execution uncertainty. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J Artificial Intelligence Journal (AIJ) %D 2013 %T Improving Resource Allocation Strategies Against Human Adversaries in Security Games: An Extended Study %A R. Yang %A C. Kiekintveld %A F. Ordonez %A M. Tambe %A R. John %X Stackelberg games have garnered significant attention in recent years given their deployment for real-world security. Most of these systems, such as ARMOR, IRIS and GUARDS, have adopted the assumption, standard in the game theory literature, that adversaries are perfectly rational. This assumption may not hold in real-world security problems due to the bounded rationality of human adversaries, which could potentially reduce the effectiveness of these systems. In this paper, we focus on relaxing the unrealistic assumption of a perfectly rational adversary in Stackelberg security games. In particular, we present new mathematical models of human adversaries’ behavior, based on two fundamental theories/methods in human decision making: Prospect Theory (PT) and the stochastic discrete choice model. We also provide methods for tuning the parameters of these new models. Additionally, we propose a modification of the standard quantal-response-based model inspired by rank-dependent expected utility theory. We then develop efficient algorithms to compute the best response of the security forces when playing against the different models of adversaries.
In order to evaluate the effectiveness of the new models, we conduct comprehensive experiments with human subjects using a web-based game, comparing them with models previously proposed in the literature to address the perfect rationality assumption on the part of the adversary. Our experimental results show that the subjects’ responses follow the assumptions of our new models more closely than the previous perfect rationality assumption. We also show that the defender strategy produced by our new stochastic discrete choice model outperforms the previous leading contender for relaxing the assumption of perfect rationality. Furthermore, in a separate set of experiments, we show the benefits of our modified stochastic model (QRRU) over the standard model (QR). %B Artificial Intelligence Journal (AIJ) %P 440-469 %G eng %N 195 %0 Journal Article %J Ad-Hoc Networks Journal %D 2013 %T Mitigating Multi-path Fading in a Mobile Mesh Network %X By using robots as routers, a team of networked robots can provide a communication substrate to establish a wireless mesh network. The mobile mesh network can autonomously optimize its configuration, increasing performance. One of the main sources of radio signal fading in such a network is multi-path propagation, which can be mitigated by moving the senders or the receivers over a distance on the order of a wavelength. In this paper, we measure the performance gain when robots are allowed to make such small movements and find that it may be as much as 270%. Our main contribution is the design of a system that allows robots to cooperate and improve the real-world network throughput via a practical solution. We model the problem of which robots to move as a distributed constraint optimization problem (DCOP). Our study includes four local metrics to estimate global throughput.
%B Ad-Hoc Networks Journal %V 11 %P 1510-1521 %G eng %N 4 %0 Conference Paper %B SNSC 2013: The AAAI Fall Symposium 2013 on Social Networks and Social Contagion %D 2013 %T Modeling Crime diffusion and crime suppression on transportation networks: An initial report %A Zhang, Chao %A Xin Jiang, Albert %A Martin B. Short %A P. Jeffrey Brantingham %A Tambe, Milind %X In urban transportation networks, crime diffuses as criminals travel through the networks and look for illicit opportunities. It is important to first model this diffusion in order to recommend actions or patrol policies to control the diffusion of such crime. Previously, game theory has been used for such patrol policy recommendations, but these applications of game theory for security have not modeled the diffusion of crime that comes about due to criminals seeking opportunities; instead the focus has been on highly strategic adversaries that plan attacks in advance. To overcome this limitation of previous work, this paper provides the following key contributions. First, we provide a model of crime diffusion based on a quantal biased random movement (QBRM) of criminals opportunistically and repeatedly seeking targets. Within this model, criminals react to real-time information, rather than strategically planning their attack in advance. Second, we provide a game-theoretic approach to generate randomized patrol policies for controlling such diffusion. %B SNSC 2013: The AAAI Fall Symposium 2013 on Social Networks and Social Contagion %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [SHORT PAPER] %D 2013 %T Modeling Human Adversary Decision Making in Security Games: An Initial Report (Extended Abstract) %A Thanh H.
Nguyen %A Azaria, Amos %A Pita, James %A Maheswaran, Rajiv %A Kraus, Sarit %A Tambe, Milind %X Motivated by recent deployments of Stackelberg security games (SSGs), two competing approaches have emerged which either integrate models of human decision making into game-theoretic algorithms or apply robust optimization techniques that avoid adversary modeling. Recently, a robust technique (MATCH) has been shown to significantly outperform the leading modeling-based algorithms (e.g., Quantal Response (QR)) even in the presence of significant amounts of subject data. As a result, the effectiveness of using human behaviors in solving SSGs remains in question. We study this question in this paper. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [SHORT PAPER] %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security (GameSec) %D 2013 %T Monotonic Maximin: A Robust Stackelberg Solution Against Boundedly Rational Followers %A Xin Jiang, Albert %A Thanh H. Nguyen %A Tambe, Milind %A Ariel D. Procaccia %X There has been recent interest in applying Stackelberg games to infrastructure security, in which a defender must protect targets from attack by an adaptive adversary. In real-world security settings the adversaries are humans and are thus boundedly rational. Most existing approaches for computing defender strategies against boundedly rational adversaries try to optimize against specific behavioral models of adversaries, and provide no quality guarantee when the estimated model is inaccurate. We propose a new solution concept, monotonic maximin, which provides guarantees against all adversary behavior models satisfying monotonicity, including all in the family of Regular Quantal Response functions. We propose a mixed-integer linear program formulation for computing monotonic maximin. 
We also consider top-monotonic maximin, a related solution concept that is more conservative, and propose a polynomial-time algorithm for top-monotonic maximin. %B Conference on Decision and Game Theory for Security (GameSec) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2013 %T Multi-agent Team Formation: Diversity Beats Strength? %A Marcolino, Leandro Soriano %A Xin Jiang, Albert %A Tambe, Milind %X Team formation is a critical step in deploying a multi-agent team. In some scenarios, agents coordinate by voting continuously. When forming such teams, should we focus on the diversity of the team or on the strength of each member? Can a team of diverse (and weak) agents outperform a uniform team of strong agents? We propose a new model to address these questions. Our key contributions include: (i) we show that a diverse team can overcome a uniform team and we give the necessary conditions for it to happen; (ii) we present optimal voting rules for a diverse team; (iii) we perform synthetic experiments that demonstrate that both diversity and strength contribute to the performance of a team; (iv) we show experiments that demonstrate the usefulness of our model in one of the most difficult challenges for Artificial Intelligence: Computer Go. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2013 %T Optimal Patrol Strategy for Protecting Moving Targets with Multiple Mobile Resources %A Fang, Fei %A Xin Jiang, Albert %A Tambe, Milind %X Previous work on Stackelberg Security Games for scheduling security resources has mostly assumed that the targets are stationary relative to the defender and the attacker, leading to discrete game models with finite numbers of pure strategies. 
This paper in contrast focuses on protecting mobile targets that lead to a continuous set of strategies for the players. The problem is motivated by several real-world domains including protecting ferries with escorts and protecting refugee supply lines. Our contributions include: (i) a new game model for multiple mobile defender resources and moving targets with a discretized strategy space for the defender and a continuous strategy space for the attacker; (ii) an efficient linear-program-based solution that uses a compact representation for the defender’s mixed strategy, while accurately modeling the attacker’s continuous strategy using a novel sub-interval analysis method; (iii) a heuristic method of equilibrium refinement for improved robustness; and (iv) detailed experimental analysis in the ferry protection domain. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B ACM SigEcom Exchanges %D 2013 %T Planning and Learning in Security Games %A Delle Fave, Francesco Maria %A Qian, Yundi %A Xin Jiang, Albert %A Brown, Matthew %A Tambe, Milind %X We present two new critical domains where security games are applied to generate randomized patrol schedules. For each setting, we present the current research that we have produced. We then propose two new challenges to build accurate schedules that can be deployed effectively in the real world. The first is a planning challenge. Current schedules cannot handle interruptions. Thus, more expressive models that allow for reasoning over stochastic actions are needed. The second is a learning challenge. In several security domains, data can be used to extract information about both the environment and the attacker. This information can then be used to improve the defender’s strategies.
%B ACM SigEcom Exchanges %7 3 %V 11 %G eng %0 Conference Paper %B International Symposium on Automation and Robotics in Construction (ISARC) %D 2013 %T Predicting HVAC Energy Consumption in Commercial Buildings using Multiagent Systems %A Li, Nan %A Kwak, Jun-young %A Becerik-Gerber, Burcin %A Tambe, Milind %X Energy consumption in commercial buildings has been increasing rapidly in the past decade. The knowledge of future energy consumption can bring significant value to commercial building energy management. For example, prediction of energy consumption decomposition helps analyze the energy consumption patterns and efficiencies as well as waste, and identify the prime targets for energy conservation. Moreover, prediction of temporal energy consumption enables building managers to plan out the energy usage over time, shift energy usage to off-peak periods, and make more effective energy purchase plans. This paper proposes a novel model for predicting heating, ventilation and air conditioning (HVAC) energy consumption in commercial buildings. The model simulates energy behaviors of HVAC systems in commercial buildings, and interacts with a multiagent systems (MAS) based framework for energy consumption prediction. Prediction is done on a daily, weekly and monthly basis. Ground truth energy consumption data is collected from a test bed office building over 267 consecutive days, and is compared to predicted energy consumption for the same period. Results show that the prediction can match 92.6 to 98.2% of total HVAC energy consumption with coefficient of variation of the root mean square error (CV-RMSE) values of 7.8 to 22.2%. Ventilation energy consumption can be predicted at high accuracies (over 99%) and low variations (CV-RMSE values of 3.1 to 16.3%), while cooling energy consumption accounts for majority of inaccuracies and variations in total energy consumption prediction. 
%B International Symposium on Automation and Robotics in Construction (ISARC) %G eng %0 Journal Article %J Journal of Artificial Intelligence Research %D 2013 %T Protecting Moving Targets with Multiple Mobile Resources %A Fang, Fei %A Xin Jiang, Albert %A Tambe, Milind %X In recent years, Stackelberg Security Games have been successfully applied to solve resource allocation and scheduling problems in several security domains. However, previous work has mostly assumed that the targets are stationary relative to the defender and the attacker, leading to discrete game models with finite numbers of pure strategies. This paper in contrast focuses on protecting mobile targets that lead to a continuous set of strategies for the players. The problem is motivated by several real-world domains including protecting ferries with escort boats and protecting refugee supply lines. Our contributions include: (i) A new game model for multiple mobile defender resources and moving targets with a discretized strategy space for the defender and a continuous strategy space for the attacker. (ii) An efficient linear-programming-based solution that uses a compact representation for the defender’s mixed strategy, while accurately modeling the attacker’s continuous strategy using a novel sub-interval analysis method. (iii) Discussion and analysis of multiple heuristic methods for equilibrium refinement to improve robustness of the defender’s mixed strategy. (iv) Discussion of approaches to sample actual defender schedules from the defender’s mixed strategy. (v) Detailed experimental analysis of our algorithms in the ferry protection domain. %B Journal of Artificial Intelligence Research %V 48 %P 583-634 %G eng %0 Thesis %D 2013 %T Protecting Networks Against Diffusive Attacks: Game-Theoretic Resource Allocation for Contagion Mitigation %A Tsai, Jason %X Many real-world situations involve attempts to spread influence through a social network.
For example, viral marketing is when a marketer selects a few people to receive some initial advertisement in the hopes that these ‘seeds’ will spread the news. Even peacekeeping operations in one area have been shown to have a contagious effect on the neighboring vicinity. Each of these domains also features multiple parties seeking to maximize or mitigate a contagious effect by spreading their own influence among a select few seeds, naturally yielding an adversarial resource allocation problem. My work models the interconnected network of people as a graph and develops algorithms to optimize resource allocation in these networked competitive contagion scenarios. Game-theoretic resource allocation in the past has not considered domains with both a networked structure and contagion effects, rendering existing approaches unusable in critical domains such as rumor control, counterinsurgency, and crowd management. Networked domains without contagion effects already present computational challenges due to the large scale of the action space. To address this issue, my first contribution proposed efficient game-theoretic allocation algorithms for the graph-based urban road network domain. This work still provides the only polynomial-time algorithm for allocating vehicle checkpoints through a city, giving law enforcement officers an efficient tool to combat terrorists making their way to potential points of attack. Second, I have provided the first game-theoretic treatment for contagion mitigation in social networks and given practitioners the first principled techniques for such vital concerns as rumor control and counterinsurgency. Finally, I extended my work on game-theoretic contagion mitigation to address uncertainty about the network structure to find that, contrary to what evidence and intuition suggest, heuristic sampling approaches provide near-optimal solutions across a wide range of generative graph models and uncertainty models.
Thus, despite extreme practical challenges in attaining accurate social network information, my techniques remain near-optimal across numerous forms of uncertainty in multiple synthetic and real-world graph structures. Beyond optimization of resource allocation, I have further studied contagion effects to understand the effectiveness of such resources. First, I created an evacuation simulation, ESCAPES, to explore the interaction of pedestrian fear contagion and authority fear mitigation during an evacuation. Second, using this simulator, I have advanced the frontier in contagion modeling by developing empirical evaluation methods for comparing and calibrating computational contagion models that are critical in crowd simulations and evacuation modeling. Finally, I have also conducted an examination of agent-human emotional contagion to inform the rising use of simulations for personnel training in emotionally-charged situations. %G eng %0 Conference Paper %B International conference on Automated Software Engineering (ASE) %D 2013 %T Randomizing Regression Tests using Game Theory %A Kukreja, Nupul %A William G. J. Halfond %A Tambe, Milind %X As software evolves, the number of test-cases in the regression test suites continues to increase, requiring testers to prioritize their execution. Usually only a subset of the test cases is executed due to limited testing resources. This subset is often known to the developers who may try to “game” the system by committing insufficiently tested code for parts of the software that will not be tested. In this new ideas paper, we propose a novel approach for randomizing regression test scheduling, based on Stackelberg games for deployment of scarce resources. We apply this approach to randomizing test cases in such a way as to maximize the testers’ expected payoff when executing the test cases. 
Our approach accounts for resource limitations (e.g., number of testers) and provides a probabilistic distribution for scheduling test cases. We provide an example application of our approach showcasing the idea of using Stackelberg games for randomized regression test scheduling. %B International conference on Automated Software Engineering (ASE) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2013 %T Scaling-up Security Games with Boundedly Rational Adversaries: A Cutting-plane Approach %A Yang, Rong %A Xin Jiang, Albert %A Tambe, Milind %A Fernando Ordóñez %X To improve the current real-world deployments of Stackelberg security games (SSGs), it is critical now to efficiently incorporate models of adversary bounded rationality in large-scale SSGs. Unfortunately, previously proposed branch-and-price approaches fail to scale up given the non-convexity of such models, as we show with a realization called COCOMO. Therefore, we next present a novel cutting-plane algorithm called BLADE to scale up SSGs with complex adversary models, with three key novelties: (i) an efficient scalable separation oracle to generate deep cuts; (ii) a heuristic that uses gradient information to further improve the cuts; (iii) techniques for quality-efficiency tradeoff. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [SHORT PAPER] %D 2013 %T Security Games with Contagion: Handling Asymmetric Information %A Tsai, Jason %A Qian, Yundi %A Vorobeychik, Yevgeniy %A Kiekintveld, Christopher %A Tambe, Milind %X Counterinsurgency, the effort to mitigate support for an opposing organization, is one such domain that has been studied recently; past work has modeled the problem as an influence blocking maximization that features an influencer and a mitigator.
While past work has introduced scalable heuristic techniques for generating effective strategies using a double oracle algorithm, it has not addressed the issue of uncertainty and asymmetric information, which is the topic of this paper. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [SHORT PAPER] %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2013 %T Security Games with Surveillance Cost and Optimal Timing of Attack Execution %A An, Bo %A Brown, Matthew %A Vorobeychik, Yevgeniy %A Tambe, Milind %X Stackelberg games have been used in several deployed applications to allocate limited resources for critical infrastructure protection. These resource allocation strategies are randomized to prevent a strategic attacker from using surveillance to learn and exploit patterns in the allocation. Past work has typically assumed that the attacker has perfect knowledge of the defender’s randomized strategy or can learn the defender’s strategy after conducting a fixed period of surveillance. Once surveillance costs are taken into account, these assumptions are clearly simplistic, since attackers may act with partial knowledge of the defender’s strategies and may dynamically decide whether to attack or to conduct more surveillance. In this paper, we propose a natural model of limited surveillance in which the attacker dynamically determines when to stop surveillance based on his updated belief, which in turn is based on observed actions and surveillance cost. We show an upper bound on the maximum number of observations the attacker can make and show that the attacker’s optimal stopping problem can be formulated as a finite-state-space MDP. We give mathematical programs to compute optimal attacker and defender strategies.
We compare our approaches with the best known previous solutions, and experimental results show that the defender can achieve a significant improvement in expected utility by taking the attacker’s optimal stopping decision into account, validating the motivation of our work. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2013 %T TESLA: An Energy-saving Agent that Leverages Schedule Flexibility %A Kwak, Jun-young %A Varakantham, Pradeep %A Maheswaran, Rajiv %A Chang, Yu-Han %A Tambe, Milind %A Becerik-Gerber, Burcin %A Wood, Wendy %X This innovative application paper presents TESLA, an agent-based application for optimizing the energy use in commercial buildings. TESLA’s key insight is that adding flexibility to event/meeting schedules can lead to significant energy savings. TESLA provides three key contributions: (i) three online scheduling algorithms that consider the flexibility of people’s preferences for energy-efficient scheduling of incrementally/dynamically arriving meetings and events; (ii) an algorithm to effectively identify key meetings that lead to significant energy savings by adjusting their flexibility; and (iii) surveys of real users indicating that TESLA’s assumptions hold in practice. TESLA was evaluated on data from over 110,000 meetings held at nine campus buildings during eight months in 2011–2012 at USC and SMU. These results show that, compared to current systems, TESLA can substantially reduce overall energy consumption.
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Thesis %D 2013 %T Thwarting Adversaries with Unpredictability: Massive-scale Game-Theoretic Algorithms for Real-world Security Deployments %A Jain, Manish %X Protecting critical infrastructure and targets such as airports, transportation networks, and power generation facilities, as well as critical natural resources and endangered species, is an important task for police and security agencies worldwide. Securing such potential targets using limited resources against intelligent adversaries, in the presence of the uncertainty and complexities of the real world, is a major challenge. My research uses a game-theoretic framework to model the strategic interaction between a defender (or security forces) and an attacker (or terrorist adversary) in security domains. Game theory provides a sound mathematical approach for deploying limited security resources to maximize their effectiveness. While game theory has always been popular in the arena of security, unfortunately, state-of-the-art algorithms either fail to scale or fail to provide a correct solution for large problems with arbitrary scheduling constraints. For example, US carriers fly over 27,000 domestic and 2,000 international flights daily, presenting a massive scheduling challenge for the Federal Air Marshal Service (FAMS). My thesis contributes to a very new area that solves game-theoretic problems using insights from the large-scale optimization literature, addressing the computational challenge posed by real-world domains. I have developed new models and algorithms that compute optimal strategies for scheduling defender resources in large real-world domains. My thesis makes the following contributions. First, it presents new algorithms that can solve for trillions of actions for both the defender and the attacker.
Second, it presents a hierarchical framework that provides orders of magnitude scale-up in attacker types for Bayesian Stackelberg games. Third, it provides an analysis and detection of a phase transition that identifies properties that make security games hard to solve. These new models have not only advanced the state of the art in computational game theory, but have actually been successfully deployed in the real world. My work represents a successful transition from game-theoretic advancements to real-world applications that are already in use, and it has opened exciting new avenues to greatly expand the reach of game theory. For instance, my algorithms are used in the IRIS system: IRIS has been in use by the Federal Air Marshal Service (FAMS) to schedule air marshals on board international commercial flights since October 2009. %G eng %9 PhD thesis %0 Conference Paper %B Workshop on Multiagent-based Societal Systems (MASS) at AAMAS %D 2013 %T Why TESLA Works: Innovative Agent-based Application Leveraging Schedule Flexibility for Conserving Energy %A Kwak, Jun-young %A Varakantham, Pradeep %A Maheswaran, Rajiv %A Chang, Yu-Han %A Tambe, Milind %A Becerik-Gerber, Burcin %A Wood, Wendy %X This paper presents TESLA, an agent-based application for optimizing the energy use in commercial buildings. TESLA’s key insight is that adding flexibility to event/meeting schedules can lead to significant energy savings. TESLA provides two key contributions: (i) three online scheduling algorithms that consider the flexibility of people’s preferences for energy-efficient scheduling of incrementally/dynamically arriving meetings and events; and (ii) an algorithm to effectively identify key meetings that lead to significant energy savings by adjusting their flexibility.
TESLA was evaluated on data from over 110,000 meetings held at nine campus buildings during eight months in 2011–2012 at the University of Southern California (USC) and the Singapore Management University (SMU), and the evaluation indicated that TESLA’s assumptions hold in practice. This paper also provides an extensive analysis of the energy savings achieved by TESLA. These results and analysis show that, compared to current systems, TESLA can substantially reduce overall energy consumption. %B Workshop on Multiagent-based Societal Systems (MASS) at AAMAS %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2013 %T Security Scheduling for Real-world Networks %A Jain, Manish %A Vincent Conitzer %A Tambe, Milind %X Network-based security games, in which a defender strategically places security measures on the edges of a graph to protect against an adversary who chooses a path through the graph, are an important research problem with potential for real-world impact. For example, police forces face the problem of placing checkpoints on roads to inspect vehicular traffic in their day-to-day operations, a security measure the Mumbai police have performed since the terrorist attacks in 2008. Algorithms for solving such network-based security problems have been proposed in the literature, but none of them scale up to problems of the size of real-world networks. In this paper, we present SNARES, a novel algorithm that computes optimal solutions for both the defender and the attacker in such network security problems. Based on a double-oracle framework, SNARES makes novel use of two approaches: warm starts and greedy responses.
It makes the following contributions: (1) It defines and uses mincut-fanout, a novel method for efficient warm-starting of the computation; (2) It exploits the submodularity property of the defender optimization in a greedy heuristic, which is used to generate “better-responses”; SNARES also uses a better-response computation for the attacker. Furthermore, we evaluate the performance of SNARES on real-world networks, illustrating a significant advance: whereas state-of-the-art algorithms could handle just the southern tip of Mumbai, SNARES can compute the optimal strategy for the entire urban road network of Mumbai. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B AAAI Spring Symposium on Security, Sustainability and Health %D 2012 %T Adversarial Patrolling Games %A Vorobeychik, Yevgeniy %A An, Bo %A Tambe, Milind %X Defender-Attacker Stackelberg games are the foundations of tools deployed for computing optimal patrolling strategies in adversarial domains such as the United States Federal Air Marshal Service and the United States Coast Guard, among others. In Stackelberg game models of these systems, the attacker knows only the probability that each target is covered by the defender, but is oblivious to the detailed timing of the coverage schedule. In many real-world situations, however, the attacker can observe the current location of the defender and can exploit this knowledge to reason about the defender’s future moves. We study Stackelberg security games in which the defender sequentially moves between targets, with moves constrained by an exogenously specified graph, while the attacker can observe the defender’s current location and his (stochastic) policy concerning future moves.
We offer five contributions: (1) We model this adversarial patrolling game (APG) as a stochastic game with special structure and present several alternative formulations that leverage the general nonlinear programming (NLP) approach for computing equilibria in zero-sum stochastic games. We show that our formulations yield significantly better solutions than previous approaches. (2) We extend the NLP formulation for APGs to allow for attacks that may take multiple time steps to unfold. (3) We provide an approximate MILP formulation that uses discrete defender move probabilities. (4) We experimentally demonstrate the efficacy of an NLP-based approach, and systematically study the impact of network topology on the results. (5) We extend our model to allow the defender to construct the graph constraining his moves, at some cost, and offer novel algorithms for this setting, finding that a MILP approximation is much more effective than the exact NLP in this setting. %B AAAI Spring Symposium on Security, Sustainability and Health %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) (Short paper) %D 2012 %T Adversarial Patrolling Games: Extended Abstract %A Vorobeychik, Yevgeniy %A An, Bo %A Tambe, Milind %X Defender-Attacker Stackelberg games are the foundations of tools deployed for computing optimal patrolling strategies in adversarial domains such as the United States Federal Air Marshal Service and the United States Coast Guard, among others. In Stackelberg game models of these systems, the attacker knows only the probability that each target is covered by the defender, but is oblivious to the detailed timing of the coverage schedule. In many real-world situations, however, the attacker can observe the current location of the defender and can exploit this knowledge to reason about the defender’s future moves.
We study Stackelberg security games in which the defender sequentially moves between targets, with moves constrained by an exogenously specified graph, while the attacker can observe the defender’s current location and his (stochastic) policy concerning future moves. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) (Short paper) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) Demonstration Track %D 2012 %T AgentPolis: Towards a Platform for Fully Agent-based Modeling of Multi-Modal Transportation %A Michal Jakob %A Zbynek Moler %A Antonín Komenday %A Yin, Zhengyu %A Xin Jiang, Albert %A Matthew P. Johnson %A Michal Pechoucek %A Tambe, Milind %X AgentPolis is a fully agent-based platform for modeling multi-modal transportation systems. It comprises a high-performance discrete-event simulation core, a cohesive set of high-level abstractions for building extensible agent-based models, and a library of predefined components frequently used in transportation models. Together with a suite of supporting tools, AgentPolis enables rapid prototyping and execution of data-driven simulations of a wide range of mobility and transportation phenomena. We illustrate the capabilities of the platform on a model of fare inspection in public transportation networks. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) Demonstration Track %G eng %0 Conference Paper %B AAAI Fall Symposium %D 2012 %T Analysis of Heuristic Techniques for Controlling Contagion %A Tsai, Jason %A Weller, Nicholas %A Tambe, Milind %X Many strategic actions carry a ‘contagious’ component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain.
Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, non-local impact. To address this new class of security games, recent research has used novel heuristic oracles in a double oracle formulation to generate mixed strategies. However, these heuristic oracles were evaluated only on runtime and quality scaling with the graph size. Given the complexity of the problem, numerous other problem features and metrics must be considered to better inform practical application of such techniques. Thus, this work provides a thorough experimental analysis including variations of the contagion probability average and standard deviation. We extend the previous analysis to also examine the size of the action set constructed in the algorithms and the final mixed strategies themselves. Our results indicate that game instances featuring smaller graphs and low contagion probabilities converge slowly while games with larger graphs and medium contagion probabilities converge most quickly. %B AAAI Fall Symposium %G eng %0 Conference Paper %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %D 2012 %T Challenges in Patrolling to Maximize Pristine Forest Area (Position Paper) %A Matthew P. Johnson %A Fang, Fei %A Yang, Rong %A Tambe, Milind %A Heidi Jo Albers %X Illegal extraction of forest resources is fought, in many developing countries, by patrols through the forest that seek to deter such activity by decreasing its profitability. 
With limited resources for performing such patrols, a patrol strategy will seek to distribute the patrols throughout the forest, in space and time, in order to minimize the resulting amount of extraction that occurs or maximize the degree of forest protection, according to one of several potential metrics. We pose this problem as a Stackelberg game. We adopt and extend the simple, geometrically elegant model of Robinson (2010). First, we study optimal allocations of patrol density under generalizations of this model, relaxing several of its assumptions. Second, we pose the problem of generating actual schedules whose site visit frequencies are consistent with the analytically computed optimal patrol densities. %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2012 %T Computing Optimal Strategy against Quantal Response in Security Games %A Yang, Rong %A Ordonez, Fernando %A Tambe, Milind %X To step beyond the first-generation deployments of attacker-defender security games – for LAX Police, US FAMS and others – it is critical that we relax the assumption of perfect rationality of the human adversary. Indeed, this assumption is a well-accepted limitation of classical game theory, and modeling human adversaries’ bounded rationality is critical. To this end, quantal response (QR) has provided very promising results for modeling human bounded rationality. However, in computing optimal defender strategies in real-world security games against a QR model of attackers, we face difficulties including (1) solving a nonlinear non-convex optimization problem efficiently for massive real-world security games; and (2) addressing constraints on assigning security resources, which adds to the complexity of computing the optimal defender strategy.
This paper presents two new algorithms to address these difficulties: GOSAQ can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise; PASAQ in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints. These two novel algorithms are based on three key ideas: (i) use of a binary search method to solve the fractional optimization problem efficiently, (ii) construction of a convex optimization problem through a non-linear transformation, (iii) building a piecewise linear approximation of the non-linear terms in the problem. Additional contributions of this paper include proofs of approximation bounds, detailed experimental results showing the advantages of GOSAQ and PASAQ in solution quality over the benchmark algorithm (BRQR) and the efficiency of PASAQ. Given these results, PASAQ is at the heart of the PROTECT system, which is deployed for the US Coast Guard in the port of Boston, and is now headed to other ports. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J Automation in Construction: An International Research Journal %D 2012 %T Coordinating Occupant Behavior for Building Energy and Comfort Management using Multi-Agent Systems %A Laura Klein %A Kwak, Jun-young %A Geoffrey Kavulya %A Farrokh Jazizadeh %A Becerik-Gerber, Burcin %A Varakantham, Pradeep %A Tambe, Milind %X There is growing interest in reducing building energy consumption through increased sensor data and increased computational support for building controls. The goal of reduced building energy is often coupled with the desire for improved occupant comfort. 
Current building systems are inefficient in their energy usage for maintaining occupant comfort, as they operate according to fixed schedules and maximum design occupancy assumptions, and they rely on code-defined occupant comfort ranges. This paper presents and implements a multi-agent comfort and energy system (MACES) to model alternative management and control of building systems and occupants. MACES specifically improves upon previous multi-agent systems as it coordinates both building system devices and building occupants through direct changes to occupant meeting schedules using multi-objective Markov Decision Problems (MDPs). MACES is implemented and tested with input from a real-world building, including actual thermal zones, temperatures, occupant preferences, and occupant schedules. The operations of this building are then simulated according to three distinct control strategies involving varying levels of intelligent coordination of devices and occupants. Finally, the energy and comfort results of these three strategies are compared to the baseline and opportunities for further energy savings are assessed. A 12% reduction in energy consumption and a 5% improvement in occupant comfort are realized as compared to the baseline control. Specifically, by employing MDP-based meeting relocation, an additional 5% reduction in energy consumption is realized over other control strategies. %B Automation in Construction: An International Research Journal %P 525-536 %G eng %N 22 %0 Book Section %B Handbook on Operations Research for Homeland Security %D 2012 %T Deployed Security Games for Patrol Planning %E J. Herrmann %X Nations and organizations need to secure locations of economic, military, or political importance from groups or individuals that can cause harm.
The fact that there are limited security resources prevents complete security coverage, which allows adversaries to observe and exploit patterns in patrolling or monitoring, and enables them to plan attacks that avoid existing patrols. The use of randomized security policies that are more difficult for adversaries to predict and exploit can counter their surveillance capabilities and improve security. In this chapter we describe the recent development of models to assist security forces in randomizing their patrols, and the deployment of these models in real applications. The systems deployed are based on fast algorithms for solving large instances of Bayesian Stackelberg games that capture the interaction between security forces and adversaries. Here we describe a generic mathematical formulation of these models, present some of the results that have allowed these systems to be deployed in practice, and outline remaining future challenges. We discuss the deployment of these systems in two real-world security applications: 1) the police at the Los Angeles International Airport use these models to randomize the placement of checkpoints on roads entering the airport and the routes of canine unit patrols within the airport terminals; 2) the Federal Air Marshal Service uses these models to randomize the schedules of air marshals on international flights. %B Handbook on Operations Research for Homeland Security %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) %D 2012 %T The Deployment-to-Saturation Ratio in Security Games %A Jain, Manish %A Leyton-Brown, Kevin %A Tambe, Milind %X Stackelberg security games form the backbone of systems like ARMOR, IRIS and PROTECT, which are in regular use by the Los Angeles International Airport Police, the US Federal Air Marshal Service, and the US Coast Guard, respectively. An understanding of the runtime required by algorithms that power such systems is critical to furthering the application of game theory to other real-world domains.
This paper identifies the concept of the deployment-to-saturation ratio in random Stackelberg security games, and shows that problem instances for which this ratio is 0.5 are computationally harder than instances with other deployment-to-saturation ratios for a wide range of different equilibrium computation methods, including (i) different previously published MIP algorithms, and (ii) different underlying solvers and solution mechanisms. This finding has at least two important implications. First, it is important for new algorithms to be evaluated on the hardest problem instances. We show that this has often not been done in the past, and introduce a publicly available benchmark suite to facilitate such comparisons. Second, we provide evidence that this computationally hard region is also one where optimization would be of most benefit to security agencies, and thus requires significant attention from researchers in this area. Furthermore, we use the concept of phase transitions to better understand this computationally hard region. We define a decision problem related to security games, and show that the probability that this problem has a solution exhibits a phase transition as the deployment-to-saturation ratio crosses 0.5. We also demonstrate that this phase transition is invariant to changes both in the domain and the domain representation, and that the phase transition point corresponds to the computationally hardest instances.
%B Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS)(Short paper) %D 2012 %T Designing Better Strategies against Human Adversaries in Network Security Games: Extended Abstract %A Yang, Rong %A Fang, Fei %A Xin Jiang, Albert %A Karthik Rajagopal %A Tambe, Milind %A Maheswaran, Rajiv %X In a Network Security Game (NSG), security agencies must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. A recent line of work relaxed the perfect-rationality assumption of the human adversary and showed significant advantages of incorporating bounded-rationality adversary models in non-networked security domains. Given that real-world NSGs are often extremely complex and hence very difficult for humans to solve, it is critical that we address human bounded rationality when designing defender strategies. To that end, the key contributions of this paper include: (i) comprehensive experiments with human subjects using a web-based game that we designed to simulate NSGs; (ii) new behavioral models of the human adversary in NSGs, which we train with the data collected from human experiments; (iii) new algorithms for computing the defender’s optimal strategy against the new models. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS)(Short paper) %G eng %0 Conference Paper %B Workshop on Optimization in Multiagent Systems (OPTMAS) at AAMAS %D 2012 %T Designing Patrol Strategies to Maximize Pristine Forest Area %A Johnson, Matthew P %A Fang, Fei %A Tambe, Milind %A H. J. Albers %X Illegal extraction of forest resources is fought, in many developing countries, by patrols that seek to deter such activity by decreasing its profitability.
With a limited budget, a patrol strategy will seek to distribute the patrols throughout the forest, in order to minimize the resulting amount of extraction that occurs or maximize the amount of “pristine” forest area. Prior work in forest economics has posed this problem as a Stackelberg game, but efficient optimal or approximation algorithms for generating leader strategies have not previously been found. Unlike previous work on Stackelberg games in the multiagent literature, much of it motivated by counter-terrorism, here we seek to protect a continuous area, as much as possible, from extraction by an indeterminate number of followers. The continuous nature of this problem setting leads to new challenges and solutions, very different in character from those in the discrete Stackelberg settings previously studied. In this paper, we give an optimal patrol allocation algorithm and a guaranteed approximation algorithm, the latter of which is more efficient and yields simpler, more practical patrol allocations. In our experimental investigations, we find that these algorithms perform significantly better—yielding a larger pristine area—than naive patrol allocations. %B Workshop on Optimization in Multiagent Systems (OPTMAS) at AAMAS %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2012 %T Detection of Suspicious Behavior from a Sparse Set of Multiagent Interactions %A Bostjan Kaluza %A Gal Kaminka %A Tambe, Milind %X In many multiagent domains, no single observation event is sufficient to determine that the behavior of individuals is suspicious. Instead, suspiciousness must be inferred from a combination of multiple events, where events refer to the individual’s interactions with other individuals. Hence, a detection system must employ a detector that combines evidence from multiple events, in contrast to most previous work, which focuses on the detection of a single, clearly suspicious event.
This paper proposes a two-step detection system that first detects trigger events from multiagent interactions, and then combines the evidence to provide a degree of suspicion. The paper provides three key contributions: (i) it proposes a novel detector that generalizes utility-based plan recognition with arbitrary utility functions; (ii) it specifies conditions that any reasonable detector should satisfy; and (iii) it analyzes three detectors and compares them with the proposed approach. The results on a simulated airport domain and a dangerous-driver domain show that our new algorithm outperforms other approaches in several settings. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS)(Short paper) %D 2012 %T Emotional Contagion with Virtual Characters: Extended Abstract %A Tsai, Jason %A Bowring, Emma %A Marsella, Stacy %A Wood, Wendy %A Tambe, Milind %X In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions mimicking surrounding people’s emotions [8]. While it has been observed in human-human interactions, no known studies have examined its existence in agent-human interactions. As virtual characters make their way into high-risk, high-impact applications such as psychotherapy and military training with increasing frequency, the emotional impact of the agents’ expressions must be accurately understood to avoid undesirable repercussions.
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS)(Short paper) %G eng %0 Conference Paper %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %D 2012 %T Game Theory for Security: A Real-World Challenge Problem for Multiagent Systems and Beyond %A Tambe, Milind %A An, Bo %X The goal of this paper is to introduce a real-world challenge problem for researchers in multiagent systems and beyond, where our collective efforts may have a significant impact on activities in the real-world. The challenge is in applying game theory for security: Our goal is not only to introduce the problem, but also to provide exemplars of initial successes of deployed systems in this challenge problem arena, some key open research challenges and pointers to getting started in this research. %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %G eng %0 Conference Paper %B European Workshop on Multiagent Systems (EUMAS) 2011 workshop (Invited) %D 2012 %T Game Theory for Security: An Important Challenge for Multiagent Systems %A An, Bo %A Tambe, Milind %X The goal of this paper is to introduce a real-world challenge problem for researchers in multiagent systems and beyond, where our collective efforts may have a significant impact on activities in the real-world. The challenge is in applying game theory for security: Our goal is not only to introduce the problem, but also to provide exemplars of initial successes of deployed systems in this challenge problem arena, some key open research challenges and pointers to getting started in this research. 
%B European Workshop on Multiagent Systems (EUMAS) 2011 workshop (Invited) %G eng %0 Conference Paper %B 50th Annual Allerton Conference on Communication, Control, and Computing %D 2012 %T Game Theory for Security: Key Algorithmic Principles, Deployed Systems, Lessons Learned %A Tambe, Milind %A Jain, Manish %A James Adam Pita %A Xin Jiang, Albert %X Security is a critical concern around the world. In many security domains, limited security resources prevent full security coverage at all times; instead, these limited resources must be scheduled, avoiding schedule predictability, while simultaneously taking into account different target priorities, the responses of the adversaries to the security posture, and potential uncertainty over adversary types. Computational game theory can help design such unpredictable security schedules. Indeed, casting the problem as a Bayesian Stackelberg game, we have developed new algorithms that are now deployed over multiple years in multiple applications for security scheduling. These applications are leading to real-world use-inspired research in the emerging research area of “security games”; specifically, the research challenges posed by these applications include scaling up security games to large-scale problems, handling significant adversarial uncertainty, dealing with bounded rationality of human adversaries, and other interdisciplinary challenges.
%B 50th Annual Allerton Conference on Communication, Control, and Computing %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2012 %T Game-theoretic Resource Allocation for Malicious Packet Detection in Computer Networks %A Ondrej Vanek %A Yin, Zhengyu %A Jain, Manish %A Bosansky, Branislav %A Tambe, Milind %A Michal Pechoucek %X We study the problem of optimal resource allocation for packet selection and inspection to detect potential threats in large computer networks with multiple valuable computers of differing importance. An attacker tries to harm these targets by sending malicious packets from multiple entry points of the network; the defender thus needs to optimally allocate his resources to maximize the probability of malicious packet detection under network latency constraints. We formulate the problem as a graph-based security game with multiple resources of heterogeneous capabilities and propose a mathematical program for finding optimal solutions. Due to the very limited scalability caused by the large attacker’s strategy space and the non-linearity of the program, we investigate solutions with an approximated utility function and propose Grande, a novel polynomial-time approximate algorithm that utilizes the submodularity of the problem and is able to find solutions with a bounded error on problems of realistic size. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Workshop on Optimization in Multiagent Systems (OPTMAS) at AAMAS %D 2012 %T Game-Theoretic Target Selection in Contagion-based Domains %A Tsai, Jason %A Thanh H. Nguyen %A Tambe, Milind %X Many strategic actions carry a ‘contagious’ component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain.
Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, non-local impact. To address this new class of security games, we combine recent research in influence blocking maximization with a double oracle approach and create novel heuristic oracles to generate mixed strategies for a real-world leadership network from Afghanistan, synthetic leadership networks, and scale-free graphs. We find that leadership networks that exhibit highly interconnected clusters can be solved equally well by our heuristic methods, but our more sophisticated heuristics outperform simpler ones in less interconnected scale-free graphs. %B Workshop on Optimization in Multiagent Systems (OPTMAS) at AAMAS %G eng %0 Thesis %D 2012 %T The Human Element: Addressing Human Adversaries in Security Domains %A Pita, James %X Recently, game theory has been shown to be useful for reasoning about real-world security settings where security forces must protect critical assets from potential adversaries. In fact, there have been a number of deployed real-world applications of game theory for security (e.g., ARMOR at Los Angeles International Airport and IRIS for the Federal Air Marshals Service). Here, the objective is for the security force to utilize its limited resources to best defend their critical assets. An important factor in these real-world security settings is that the adversaries involved are humans who may not behave according to the standard assumptions of game-theoretic models. There are two key shortcomings of the approaches currently employed in these recent applications. 
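The influence-blocking idea in the abstract above can be illustrated in miniature. The sketch below shows one heuristic best-response oracle of the kind the paper combines with a double-oracle loop: given the attacker's seed leaders, the defender greedily picks seeds that minimize the attacker's expected spread under a competing independent-cascade model. The toy graph, the uniform propagation probability, the defender-wins tie rule, and the greedy oracle itself are all illustrative assumptions, not the paper's exact model.

```python
import random

# Toy leadership network (undirected); nodes and edges are illustrative.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4), (2, 5), (5, 6), (6, 7), (7, 5)]
NODES = range(8)
ADJ = {v: set() for v in NODES}
for a, b in EDGES:
    ADJ[a].add(b)
    ADJ[b].add(a)

PROP = 0.5  # assumed uniform propagation probability on every edge

def attacker_spread(att_seeds, def_seeds, trials=2000, seed=0):
    """Monte Carlo estimate of the attacker's expected influence under
    competing independent cascades; defender influence blocks attacker."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        owner = {v: 'A' for v in att_seeds}
        owner.update({v: 'D' for v in def_seeds})  # defender wins contested seeds
        frontier = list(owner)
        while frontier:
            nxt = []
            for v in frontier:
                for u in ADJ[v]:
                    if u not in owner and rng.random() < PROP:
                        owner[u] = owner[v]  # first influence to arrive sticks
                        nxt.append(u)
            frontier = nxt
        total += sum(1 for side in owner.values() if side == 'A')
    return total / trials

def greedy_blocker_oracle(att_seeds, budget):
    """Heuristic best response: greedily add the defender seed that most
    reduces the attacker's expected spread."""
    chosen = []
    for _ in range(budget):
        best = min((v for v in NODES if v not in chosen and v not in att_seeds),
                   key=lambda v: attacker_spread(att_seeds, chosen + [v]))
        chosen.append(best)
    return chosen
```

For example, with the attacker seeding node 0, `greedy_blocker_oracle([0], 2)` tends to pick node 0's neighbors, confining the attacker's cascade to its seed.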
First, human adversaries may not make the predicted rational decision. In such situations, where the security force has optimized against a perfectly rational opponent, a deviation by the human adversary can lead to adverse effects on the security force’s predicted outcome. Second, human adversaries are naturally creative and security domains are highly dynamic, making enumeration of all potential threats practically impossible; solving the resulting game with current leading approaches would be intractable. My thesis contributes to a very new area that combines algorithmic and experimental game theory. Indeed, it examines a critical problem in applying game-theoretic techniques to situations where perfectly rational solvers must address human adversaries. In doing so it advances the study and reach of game theory to domains where software agents and humans may interact. More specifically, to address the first shortcoming, my thesis presents two separate algorithms to address potential deviations from the predicted rational decision by human adversaries. Experimental results, from a simulation that is motivated by a real-world security domain at Los Angeles International Airport, demonstrated that, against humans, both of my approaches outperform the currently deployed optimal algorithms that rely on standard game-theoretic assumptions, as well as additional alternative algorithms. In fact, one of my approaches is currently under evaluation in a real-world application to aid in resource allocation decisions for the United States Coast Guard. Towards addressing the second shortcoming, the enumeration of a large number of potential adversary threat capabilities, I introduce a new game-theoretic model for efficiency, which additionally generalizes the previously accepted model for security domains.
This new game-theoretic model for addressing human threat capabilities has seen real-world deployment and is under evaluation to aid the United States Transportation Security Administration in their resource allocation challenges. %G eng %9 PhD thesis %0 Conference Paper %B Construction Research Congress %D 2012 %T Human-Building Interaction for Energy Conservation in Office Buildings %A Farrokh Jazizadeh %A Geoffrey Kavulya %A Kwak, Jun-young %A Becerik-Gerber, Burcin %A Tambe, Milind %A Wood, Wendy %X Buildings are one of the major consumers of energy in the U.S. Commercial and residential buildings together account for about 42% of the national U.S. energy consumption. The majority of commercial buildings’ energy consumption is attributed to lighting (25%), space heating and cooling (25%), and ventilation (7%). Several research studies and industrial developments have focused on energy management based on maximum occupancy. However, fewer studies, with the objective of energy savings, have considered human preferences. This research focuses on office building occupants’ preferences and their contribution to building energy conservation. Accordingly, occupants of selected university campus offices were asked to reduce lighting levels in their offices during work hours. Different types of information regarding their energy consumption were provided to the occupants. Email messages were used to communicate with the occupants. To monitor behavioral changes during the study, the test bed offices were equipped with wireless light sensors. The deployed light sensors were capable of detecting variations in light intensity, which was correlated with energy consumption. The impact of different types of information on occupants’ energy-related behavior is presented.
%B Construction Research Congress %G eng %0 Conference Paper %B Workshop on Human-Agent Interaction Design and Models (HAIDM) at AAMAS %D 2012 %T Modeling Human Bounded Rationality to Improve Defender Strategies in Network Security Games %A Yang, Rong %A Fang, Fei %A Xin Jiang, Albert %A Karthik Rajagopal %A Tambe, Milind %A Maheswaran, Rajiv %X In a Network Security Game (NSG), security agencies must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. A recent line of work relaxed the perfect-rationality assumption of the human adversary and showed significant advantages of incorporating bounded-rationality adversary models in non-networked security domains. Given that real-world NSGs are often extremely complex and hence very difficult for humans to solve, it is critical that we address human bounded rationality when designing defender strategies. To that end, the key contributions of this paper include: (i) comprehensive experiments with human subjects using a web-based game that we designed to simulate NSGs; (ii) new behavioral models of human adversaries in NSGs, which we train with the data collected from human experiments; (iii) new algorithms for computing the defender’s optimal strategy against the new models. %B Workshop on Human-Agent Interaction Design and Models (HAIDM) at AAMAS %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2012 %T Multi-Objective Optimization for Security Games %A Brown, Matthew %A An, Bo %A Kiekintveld, Christopher %A Ordonez, Fernando %A Tambe, Milind %X The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries.
There are security domains where the payoffs for preventing the different types of adversaries may take different forms (seized money, reduced crime, saved lives, etc.) which are not readily comparable. Thus, it can be difficult to know how to weigh the different payoffs when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSG), which combines security games and multi-objective optimization. Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives. Our contributions include: (i) an algorithm, Iterative ε-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain a CSOP; (iv) an approximate approach for solving an algorithmic formulation of a CSOP, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation and detailed experimental evaluation of the proposed approaches.
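The CSOP sweep described in this abstract can be illustrated on a tiny finite example. The sketch below is a stand-in for the paper's MILP, not its implementation: candidate strategies carry two-dimensional payoff vectors, and an epsilon-constraints loop repeatedly maximizes the first objective subject to an increasing lower bound on the second, collecting the Pareto frontier. Candidate names and payoff numbers are invented for illustration.

```python
# Each candidate coverage strategy is scored on two incomparable
# objectives, e.g. (crime reduced, lives saved); values are illustrative.
CANDIDATES = {
    's1': (10.0, 1.0),
    's2': (8.0, 4.0),
    's3': (6.0, 6.0),
    's4': (3.0, 9.0),
    's5': (5.0, 5.0),  # dominated by s3, so it never enters the frontier
}

def solve_csop(bound):
    """CSOP: maximize objective 0 subject to objective 1 >= bound.
    (A finite argmax standing in for the paper's MILP solve.)"""
    feasible = {k: v for k, v in CANDIDATES.items() if v[1] >= bound}
    if not feasible:
        return None
    return max(feasible, key=lambda k: feasible[k][0])

def pareto_frontier(eps=1.0):
    """Iterative epsilon-constraints: sweep the lower bound on the
    secondary objective upward, collecting non-dominated solutions."""
    frontier = []
    bound = min(v[1] for v in CANDIDATES.values())
    while True:
        best = solve_csop(bound)
        if best is None:
            break
        if best not in frontier:
            frontier.append(best)
        bound = CANDIDATES[best][1] + eps  # tighten past the last solution
    return frontier
```

On this data the sweep visits s1, s2, s3, and s4 and skips the dominated s5, mirroring how the sequence of CSOPs traces out the Pareto frontier.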
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) %D 2012 %T Patrol Strategies to Maximize Pristine Forest Area %A Matthew P. Johnson %A Fang, Fei %A Tambe, Milind %X Illegal extraction of forest resources is fought, in many developing countries, by patrols that try to make this activity less profitable, using the threat of confiscation. With a limited budget, officials will try to distribute the patrols throughout the forest intelligently, in order to most effectively limit extraction. Prior work in forest economics has formalized this as a Stackelberg game, one very different in character from the discrete Stackelberg problem settings previously studied in the multiagent literature. Specifically, the leader wishes to minimize the distance by which a profit-maximizing extractor will trespass into the forest—or to maximize the radius of the remaining “pristine” forest area. The follower’s cost-benefit analysis of potential trespass distances is affected by the likelihood of being caught and suffering confiscation. In this paper, we give a near-optimal patrol allocation algorithm and a 1/2-approximation algorithm, the latter of which is more efficient and yields simpler, more practical patrol allocations. Our simulations indicate that these algorithms substantially outperform existing heuristic allocations.
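The follower's cost-benefit calculation described in the forest-patrol abstract admits a small one-dimensional sketch. This is a toy, not the paper's continuous model: the extractor chooses an integer trespass depth, benefit and travel cost grow linearly with depth, and the leader's patrol allocation sets the chance of confiscation along the way. All numbers and functional forms here are invented for illustration.

```python
# Illustrative 1-D forest: the extractor trespasses to integer depth d in
# [0, D]; patrol[k] is the confiscation probability at depth k, chosen by
# the leader. Benefit and cost parameters are invented.
D = 10

def expected_profit(d, patrol, benefit=3.0, cost=1.0):
    """Extractor's cost-benefit analysis at depth d: the haul is kept only
    if no patrol along the route confiscates it."""
    survive = 1.0
    for k in range(d + 1):
        survive *= 1.0 - patrol[k]
    return benefit * d * survive - cost * d

def best_trespass_depth(patrol):
    """Follower best response: the profit-maximizing depth (0 = stay out)."""
    return max(range(D + 1), key=lambda d: expected_profit(d, patrol))
```

Under this toy model, a light uniform patrol (`[0.1] * (D + 1)`) still invites a nonzero trespass depth, while a heavy one (`[0.5] * (D + 1)`) drives the best response to depth 0, i.e. a larger pristine area.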
%B Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B Workshop on Emotional and Empathic Agents (EEA) at AAMAS %D 2012 %T Preliminary Exploration of Agent-Human Emotional Contagion via Static Expressions %A Tsai, Jason %A Bowring, Emma %A Marsella, Stacy %A Wood, Wendy %A Tambe, Milind %X In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions mimicking surrounding people’s emotions [13]. While it has been observed in human-human interactions, no known studies have examined its existence in agent-human interactions. As virtual characters make their way into high-risk, high-impact applications such as psychotherapy and military training with increasing frequency, the emotional impact of the agents’ expressions must be accurately understood to avoid undesirable repercussions. In this paper, we perform a battery of experiments to explore the existence of agent-human emotional contagion. The first study is a between-subjects design, wherein subjects were shown an image of a character’s face with either a neutral or happy expression. Findings indicate that even a still image induces a very strong increase in self-reported happiness between Neutral and Happy conditions with all characters tested and, to our knowledge, is the first-ever study explicitly showing emotional contagion from a virtual agent to a human. We also examine the effects of participant gender, participant ethnicity, character attractiveness, and perceived character happiness and find that only perceived character happiness has a substantial impact on emotional contagion. In a second study, we examine the effect of a virtual character’s presence in a strategic situation by presenting subjects with a modernized Stag Hunt game. Our experiments show that the contagion effect is substantially dampened and does not cause a consistent impact on behavior.
A third study explores the impact of the strategic decision within the Stag Hunt and conducts the same experiment using a description of the same strategic situation with the decision already made. We find that the emotional impact returns again, particularly for women, implying that the contagion effect is substantially lessened in the presence of a strategic decision. %B Workshop on Emotional and Empathic Agents (EEA) at AAMAS %G eng %0 Magazine Article %D 2012 %T PROTECT - A Deployed Game Theoretic System for Strategic Security Allocation for the United States Coast Guard %A An, Bo %A Shieh, Eric %A Yang, Rong %A Tambe, Milind %A Baldwin, Craig %A Joseph DiRenzo %A Maule, Ben %A Meyer, Garrett %X While three deployed applications of game theory for security have recently been reported, we as a community of agents and AI researchers remain in the early stages of these deployments; there is a continuing need to understand the core principles for innovative security applications of game theory. Towards that end, this paper presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the Port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success; PROTECT is currently being tested in the Port of New York, with the potential for nationwide deployment. PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary’s behavior — to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT’s efficiency, we generate a compact representation of the defender’s strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game.
Fourth, our experimental results illustrate that PROTECT’s QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper for the first time provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team’s (human mock attackers) analysis. %B AI Magazine %V 33 %P 96-110 %G eng %N 4 %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2012 %T PROTECT: A Deployed Game Theoretic System to Protect the Ports of the United States %A Shieh, Eric %A An, Bo %A Yang, Rong %A Tambe, Milind %A Baldwin, Craig %A Joseph DiRenzo %A Maule, Ben %A Meyer, Garrett %X While three deployed applications of game theory for security have recently been reported at AAMAS [12], we as a community remain in the early stages of these deployments; there is a continuing need to understand the core principles for innovative security applications of game theory. Towards that end, this paper presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment. PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary’s behavior — to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT’s efficiency, we generate a compact representation of the defender’s strategy space, exploiting equivalence and dominance. 
Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game. Fourth, our experimental results illustrate that PROTECT’s QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper for the first time provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team’s (human mock attackers) analysis. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) Spotlight Track %D 2012 %T PROTECT: An Application of Computational Game Theory for the Security of the Ports of the United States %A Shieh, Eric %A An, Bo %A Yang, Rong %A Tambe, Milind %A Baldwin, Craig %A Joseph DiRenzo %A Maule, Ben %A Meyer, Garrett %X Building upon previous security applications of computational game theory, this paper presents PROTECT, a gametheoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment. PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary’s behavior — to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT’s efficiency, we generate a compact representation of the defender’s strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game. 
Fourth, our experimental results illustrate that PROTECT’s QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team’s (human mock attackers) analysis. %B Conference on Artificial Intelligence (AAAI) Spotlight Track %G eng %0 Book Section %D 2012 %T PROTECT in the Ports of Boston, New York and Beyond: Experiences in Deploying Stackelberg Security Games with Quantal Response %A Shieh, Eric %A An, Bo %A Yang, Rong %A Tambe, Milind %A Baldwin, Craig %A Joseph DiRenzo %A Maule, Ben %A Meyer, Garrett %A Moretti, Kathryn %X While three deployed applications of game theory for security have recently been reported at AAMAS [21], we as a community remain in the early stages of these deployments; there is a continuing need to understand the core principles for innovative security applications of game theory. Towards that end, this chapter presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment. PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary’s behavior. To the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT’s efficiency, we generate a compact representation of the defender’s strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game.
Fourth, our experimental results illustrate that PROTECT’s QR model more robustly handles real-world uncertainties than a perfect rationality model does. Finally, in evaluating PROTECT, this chapter provides real-world data: (i) comparison of human-generated vs. PROTECT security schedules, and (ii) results from an Adversarial Perspective Team’s (human mock attackers) analysis. %I Springer %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) (Short paper) %D 2012 %T A Robust Approach to Addressing Human Adversaries in Security Games: Extended Abstract %A Pita, James %A Richard John %A Maheswaran, Rajiv %A Tambe, Milind %A Yang, Rong %A Kraus, Sarit %X While game-theoretic approaches have been proposed for addressing complex security resource allocation problems, many of the standard game-theoretic assumptions fail to address the human adversaries whom security forces will likely face. To that end, approaches have been proposed that attempt to incorporate better models of human decision-making in these security settings. We take a new approach where, instead of trying to create a model of human decision-making, we leverage ideas from robust optimization techniques. In addition, we extend our approach and the previous best-performing approach to also address human anchoring biases under limited observation conditions. To evaluate our approach, we perform a comprehensive examination comparing the performance of our new approach against the current leading approaches to addressing human adversaries. Finally, in our experiments we present the first-ever analysis of demographic information and personality measures that may influence decision-making in security games.
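The robust-optimization idea described in the abstract above (bounding the defender's exposure to attacker deviations that are close, in utility, to the expected-value-maximizing choice) can be sketched on a finite game. This is a hedged toy version, not the paper's algorithm: for each candidate defender coverage we assume the attacker may pick any target within `eps` of his best expected utility, and we guard against the worst such deviation. All payoff numbers are invented.

```python
def attacker_utils(coverage, att_reward, att_penalty):
    """Attacker's expected utility per target under coverage c_t."""
    return [(1 - c) * r + c * p
            for c, r, p in zip(coverage, att_reward, att_penalty)]

def robust_value(coverage, att_reward, att_penalty, def_reward, def_penalty, eps):
    """Defender's worst-case utility over attacker choices within eps of
    the attacker's best expected utility (near-rational deviations)."""
    au = attacker_utils(coverage, att_reward, att_penalty)
    best = max(au)
    plausible = [t for t, u in enumerate(au) if u >= best - eps]
    return min(coverage[t] * def_reward[t] + (1 - coverage[t]) * def_penalty[t]
               for t in plausible)

def robust_coverage(candidates, att_reward, att_penalty,
                    def_reward, def_penalty, eps=0.5):
    """Pick, from a finite candidate set, the coverage maximizing the
    worst-plausible-deviation value."""
    return max(candidates,
               key=lambda c: robust_value(c, att_reward, att_penalty,
                                          def_reward, def_penalty, eps))
```

On a two-target instance this selects the coverage whose worst plausible attack is least damaging, rather than the coverage that is best only if the attacker plays exactly rationally.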
%B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) (Short paper) %G eng %0 Conference Paper %B European Conference on Artificial Intelligence (ECAI) %D 2012 %T A Robust Approach to Addressing Human Adversaries in Security Games %A Pita, James %A Richard John %A Maheswaran, Rajiv %A Tambe, Milind %A Kraus, Sarit %X Game-theoretic approaches have been proposed for addressing the complex problem of assigning limited security resources to protect a critical set of targets. However, many of the standard assumptions fail to address the human adversaries whom security forces will likely face. To address this challenge, previous research has attempted to integrate models of human decision-making into the game-theoretic algorithms for security settings. The current leading approach, based on experimental evaluation, is known as BRQR and is derived from a well-founded solution concept known as quantal response. One critical difficulty with opponent modeling in general is that, in security domains, information about potential adversaries is often sparse or noisy; furthermore, the games themselves are highly complex and large in scale. Thus, we chose to examine a completely new approach to addressing human adversaries that avoids the complex task of modeling human decision-making. We leverage and modify robust optimization techniques to create a new type of optimization where the defender’s loss for a potential deviation by the attacker is bounded by the distance of that deviation from the expected-value-maximizing strategy. To demonstrate the advantages of our approach, we introduce a systematic way to generate meaningful reward structures and compare our approach with BRQR in the most comprehensive investigation to date, involving 104 security settings where previous work has tested only up to 10 security settings.
Our experimental analysis reveals that our approach performs as well as or outperforms BRQR in over 90% of the security settings tested, and we demonstrate significant runtime benefits. These results favor utilizing an approach based on robust optimization in these complex domains to avoid the difficulties of opponent modeling. %B European Conference on Artificial Intelligence (ECAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2012 %T SAVES: A Sustainable Multiagent Application to Conserve Building Energy Considering Occupants %A Kwak, Jun-young %A Varakantham, Pradeep %A Maheswaran, Rajiv %A Tambe, Milind %A Farrokh Jazizadeh %A Geoffrey Kavulya %A Laura Klein %A Becerik-Gerber, Burcin %A Timothy Hayes %A Wood, Wendy %X This paper describes an innovative multiagent system called SAVES with the goal of conserving energy in commercial buildings. We specifically focus on an application to be deployed in an existing university building that provides several key novelties: (i) jointly performed with the university facility management team, SAVES is based on actual occupant preferences and schedules, actual energy consumption and loss data, real sensors and hand-held devices, etc.; (ii) it addresses novel scenarios that require negotiations with groups of building occupants to conserve energy; (iii) it focuses on a non-residential building, where human occupants do not have a direct financial incentive in saving energy and thus requires a different mechanism to effectively motivate occupants; and (iv) SAVES uses a novel algorithm for generating optimal MDP policies that explicitly consider multiple criteria optimization (energy and personal comfort) as well as uncertainty over occupant preferences when negotiating energy reduction – this combination of challenges has not been considered in previous MDP algorithms.
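The quantal response (QR) model that BRQR and the PROTECT abstracts above build on admits a compact sketch. The illustration below is not the deployed system: the attacker picks each target with probability proportional to exp(lambda times attacker utility), and the defender evaluates a coverage vector against that stochastic response. The payoff structure and the lambda value are illustrative assumptions.

```python
import math

def quantal_response(utils, lam=0.8):
    """QR model: attack probability proportional to exp(lambda * utility).
    lam = 0 yields uniform play; lam -> infinity recovers a best response."""
    weights = [math.exp(lam * u) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]

def defender_eu(coverage, att_reward, att_penalty,
                def_reward, def_penalty, lam=0.8):
    """Defender's expected utility against a QR attacker reacting to the
    coverage probabilities c_t; all payoff numbers are illustrative."""
    att_u = [(1 - c) * r + c * p
             for c, r, p in zip(coverage, att_reward, att_penalty)]
    probs = quantal_response(att_u, lam)
    return sum(q * (c * dr + (1 - c) * dp)
               for q, c, dr, dp in zip(probs, coverage, def_reward, def_penalty))
```

Note that a QR attacker attacks even well-covered targets with nonzero probability, which is why the defender's optimal coverage against QR can differ from the perfect-rationality (Stackelberg best-response) solution.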
In a validated simulation testbed, we show that SAVES substantially reduces the overall energy consumption compared to the existing control method while achieving comparable average satisfaction levels for occupants. As a real-world test, we provide results of a trial study where SAVES is shown to lead occupants to conserve energy in real buildings. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B In S. Jajodia, A.K. Ghosh, V.S. Subramanian, V. Swarup, C. Wang, and X. S. Wang editors Moving Target Defense II: Application of Game Theory and Adversarial Modeling, Springer %D 2012 %T Security Games Applied to Real-World: Research Contributions and Challenges %A Jain, Manish %A An, Bo %A Tambe, Milind %X The goal of this chapter is to introduce a challenging real-world problem for researchers in multiagent systems and beyond, where our collective efforts may have a significant impact on activities in the real world. The challenge is in applying game theory for security: our goal is not only to introduce the problem, but also to provide exemplars of initial successes of deployed systems in this problem arena. Furthermore, we present key ideas and algorithms for solving and understanding the characteristics of large-scale real-world security games, and then present some key open research challenges in this area. %B In S. Jajodia, A.K. Ghosh, V.S. Subramanian, V. Swarup, C. Wang, and X. S. Wang editors Moving Target Defense II: Application of Game Theory and Adversarial Modeling, Springer %G eng %0 Conference Paper %B AAAI Fall Symposium, 2012 %D 2012 %T Security Games on Social Networks %A Thanh H.
Nguyen %A Tsai, Jason %A Jiang, Albert %A Bowring, Emma %A Maheswaran, Rajiv %A Tambe, Milind %X Many real-world problems exhibit competitive situations in which a defender (a defending agent, agency, or organization) has to address misinformation spread by its adversary, e.g., health organizations cope with vaccination-related misinformation provided by anti-vaccination groups. The rise of social networks has allowed misinformation to be easily and quickly diffused to a large community. Taking into account knowledge of its adversary’s actions, the defender has to seek efficient strategies to limit the influence of the spread of misinformation by the opponent. In this paper, we address this problem as a blocking influence maximization problem using a game-theoretic approach. Two players strategically select a number of seed nodes in the social network that could initiate their own influence propagation. While the adversary attempts to maximize its negative influence, the defender tries to minimize this influence. We represent the problem as a zero-sum game and apply the Double Oracle algorithm to solve the game in combination with various heuristics for the oracle phases. Our experimental results reveal that by using the game-theoretic approach, we are able to significantly reduce the negative influence compared to when the defender takes no action. In addition, we propose using an approximation of the payoff matrix, making the algorithms scalable to large real-world networks. %B AAAI Fall Symposium, 2012 %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) %D 2012 %T Security Games with Limited Surveillance %A An, Bo %A David Kempe %A Kiekintveld, Christopher %A Shieh, Eric %A Satinder Singh %A Tambe, Milind %A Vorobeychik, Yevgeniy %X Randomized first-mover strategies of Stackelberg games are used in several deployed applications to allocate limited resources for the protection of critical infrastructure. 
Stackelberg games model the fact that a strategic attacker can surveil and exploit the defender’s strategy, and randomization guards against the worst effects by making the defender less predictable. In accordance with the standard game-theoretic model of Stackelberg games, past work has typically assumed that the attacker has perfect knowledge of the defender’s randomized strategy and will react correspondingly. In light of the fact that surveillance is costly, risky, and delays an attack, this assumption is clearly simplistic: attackers will usually act on partial knowledge of the defender’s strategies. The attacker’s imperfect estimate could present opportunities and possibly also threats to a strategic defender. In this paper, we therefore begin a systematic study of security games with limited surveillance. We propose a natural model wherein an attacker forms or updates a belief based on observed actions, and chooses an optimal response. We investigate the model both theoretically and experimentally. In particular, we give mathematical programs to compute optimal attacker and defender strategies for a fixed observation duration, and show how to use them to estimate the attacker’s observation durations. Our experimental results show that the defender can achieve significant improvement in expected utility by taking the attacker’s limited surveillance into account, validating the motivation of our work. %B Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %D 2012 %T Security Games with Limited Surveillance: An Initial Report %A An, Bo %A David Kempe %A Kiekintveld, Christopher %A Shieh, Eric %A Satinder Singh %A Tambe, Milind %A Vorobeychik, Yevgeniy %X Stackelberg games have been used in several deployed applications of game theory to make recommendations for allocating limited resources for protecting critical infrastructure. 
The resource allocation strategies are randomized to prevent a strategic attacker from using surveillance to learn and exploit patterns in the allocation. An important limitation of previous work on security games is that it typically assumes that attackers have perfect surveillance capabilities, and can learn the exact strategy of the defender. We introduce a new model that explicitly models the process of an attacker observing a sequence of resource allocation decisions and updating his beliefs about the defender’s strategy. For this model we present computational techniques for updating the attacker’s beliefs and computing optimal strategies for both the attacker and defender, given a specific number of observations. We provide multiple formulations for computing the defender’s optimal strategy, including non-convex programming and a convex approximation. We also present an approximate method for computing the optimal length of time for the attacker to observe the defender’s strategy before attacking. Finally, we present experimental results comparing the efficiency and runtime of our methods. %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %G eng %0 Conference Paper %B International Conference on Intelligent Virtual Agents (IVA) (short paper) %D 2012 %T A Study of Emotional Contagion with Virtual Characters %A J. Tsai %A E. Bowring %A S. Marsella %A W. Wood %A M. Tambe %X In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions mimicking surrounding people’s emotions [10]. In this paper, we perform a battery of experiments to explore the existence of agent-human emotional contagion. The first study is a between-subjects design, wherein subjects were shown an image of a character’s face with either a neutral or happy expression. 
Findings indicate that even a still image induces a very strong increase in self-reported happiness between Neutral and Happy conditions with all characters tested. In a second study, we examine the effect of a virtual character’s presence in a strategic situation by presenting subjects with a modernized Stag Hunt game. Our experiments show that the contagion effect is substantially dampened and does not have a consistent impact on behavior. A third study explores the impact of the strategic decision within the Stag Hunt and conducts the same experiment using a description of the same strategic situation with the decision already made. We find that the emotional impact returns, implying that the contagion effect is substantially lessened in the presence of a strategic decision. %B International Conference on Intelligent Virtual Agents (IVA) (short paper) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) Demonstration Track %D 2012 %T Sustainable Multiagent Application to Conserve Energy %A Kwak, Jun-young %A Varakantham, Pradeep %A Maheswaran, Rajiv %A Tambe, Milind %A Farrokh Jazizadeh %A Geoffrey Kavulya %A Laura Klein %A Becerik-Gerber, Burcin %A Timothy Hayes %A Wood, Wendy %X Limited availability of energy resources has motivated the need for developing efficient measures of conserving energy. Conserving energy in commercial buildings is an important goal since these buildings consume a significant amount of energy, e.g., 46.2% of all building energy and 18.4% of total energy consumption in the US [1]. This demonstration focuses on a novel application to be deployed at Ralph & Goldy Lewis Hall (RGL) at the University of Southern California as a practical research testbed to optimize multiple competing objectives: i) energy use in the building; ii) occupants’ comfort level; and iii) practical usage considerations. 
This demonstration complements our paper in the AAMAS innovative applications track [4], presenting a novel multiagent building application for sustainability called SAVES (Sustainable multiAgent systems for optimizing Variable objectives including Energy and Satisfaction). This writeup will provide a high-level overview of SAVES and focus more on the proposed demonstration, but readers are referred to [4] for a more technical description. SAVES provides three key contributions: (i) jointly performed with the university facility management team, our research is based on actual building and occupant data as well as real sensors and devices, etc.; (ii) it focuses on non-residential buildings, where human occupants do not have a direct financial incentive in saving energy; and (iii) SAVES uses a novel algorithm for generating optimal BM-MDP (Bounded parameter Multi-objective MDP) policies. We demonstrate SAVES to show how to achieve significant energy savings and comparable average satisfaction levels for occupants while emphasizing the interactive aspects of our application. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) Demonstration Track %G eng %0 Conference Paper %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %D 2012 %T Towards Optimal Patrol Strategies for Fare Inspection in Transit Systems %A Xin Jiang, Albert %A Yin, Zhengyu %A Matthew P. Johnson %A Kiekintveld, Christopher %A Leyton-Brown, Kevin %A Tuomas Sandholm %A Tambe, Milind %X In some urban transit systems, passengers are legally required to purchase tickets before entering but are not physically forced to do so. Instead, patrol units move about through the transit system, inspecting tickets of passengers, who face fines for fare evasion. This setting yields the problem of computing optimal patrol strategies satisfying certain temporal and spatial constraints, to deter fare evasion and hence maximize revenue. 
In this paper we propose an initial model of this problem as a leader-follower Stackelberg game. We then formulate an LP relaxation of this problem and present initial experimental results using real-world ridership data from the Los Angeles Metro Rail system. %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %G eng %0 Conference Paper %B Workshop on Agent Technologies for Energy Systems (ATES) at AAMAS %D 2012 %T Towards Robust Multi-objective Optimization Under Model Uncertainty for Energy Conservation %A Kwak, Jun-young %A Varakantham, Pradeep %A Maheswaran, Rajiv %A Tambe, Milind %A Timothy Hayes %A Wood, Wendy %A Becerik-Gerber, Burcin %X Energy conservation has become a critical concern due to the significant growth in energy usage. Building multiagent systems for real-world energy applications raises several research challenges regarding scalability, optimizing multiple competing objectives, model uncertainty, and complexity in deploying the system. Motivated by these challenges, this paper proposes a new approach to effectively conserve building energy. This work contributes to a very new area that requires considering large-scale multi-objective optimization as well as uncertainty over occupant preferences when negotiating energy reduction. There are three major contributions. We (i) develop a new method called HRMM to compute robust solutions in practical situations; (ii) experimentally show that the strategies obtained from HRMM converge to near-optimal solutions; and (iii) provide a systematic way to tightly incorporate the insights from human subject studies into our computational model and algorithms. 
The HRMM method is verified in a validated simulation testbed in terms of energy savings and comfort levels of occupants. %B Workshop on Agent Technologies for Energy Systems (ATES) at AAMAS %G eng %0 Journal Article %J AI Magazine %D 2012 %T TRUSTS: Scheduling Randomized Patrols for Fare Inspection in Transit Systems using Game Theory %A Yin, Zhengyu %A Xin Jiang, Albert %A Tambe, Milind %A Kiekintveld, Christopher %A Leyton-Brown, Kevin %A Tuomas Sandholm %A John P. Sullivan %X In proof-of-payment transit systems, passengers are legally required to purchase tickets before entering but are not physically forced to do so. Instead, patrol units move about the transit system, inspecting the tickets of passengers, who face fines if caught fare evading. The deterrence of fare evasion depends on the unpredictability and effectiveness of the patrols. In this paper, we present TRUSTS, an application for scheduling randomized patrols for fare inspection in transit systems. TRUSTS models the problem of computing patrol strategies as a leader-follower Stackelberg game where the objective is to deter fare evasion and hence maximize revenue. This problem differs from previously studied Stackelberg settings in that the leader strategies must satisfy massive temporal and spatial constraints; moreover, unlike in these counterterrorism-motivated Stackelberg applications, a large fraction of the ridership might realistically consider fare evasion, and so the number of followers is potentially huge. A third key novelty in our work is deliberate simplification of leader strategies to make patrols easier to execute. We present an efficient algorithm for computing such patrol strategies and present experimental results using real-world ridership data from the Los Angeles Metro Rail system. The Los Angeles County Sheriff’s department is currently carrying out trials of TRUSTS. 
%B AI Magazine %V 33 %P 59-72 %G eng %N 4 %0 Conference Paper %B Conference on Innovative Applications of Artificial Intelligence (IAAI) %D 2012 %T TRUSTS: Scheduling Randomized Patrols for Fare Inspection in Transit Systems %A Yin, Zhengyu %A Jiang, Albert %A Matthew Johnson %A Tambe, Milind %A Kiekintveld, Christopher %A Leyton-Brown, Kevin %A Tuomas Sandholm %A Sullivan, John %X In proof-of-payment transit systems, passengers are legally required to purchase tickets before entering but are not physically forced to do so. Instead, patrol units move about the transit system, inspecting the tickets of passengers, who face fines if caught fare evading. The deterrence of such fines depends on the unpredictability and effectiveness of the patrols. In this paper, we present TRUSTS, an application for scheduling randomized patrols for fare inspection in transit systems. TRUSTS models the problem of computing patrol strategies as a leader-follower Stackelberg game where the objective is to deter fare evasion and hence maximize revenue. This problem differs from previously studied Stackelberg settings in that the leader strategies must satisfy massive temporal and spatial constraints; moreover, unlike in these counterterrorism-motivated Stackelberg applications, a large fraction of the ridership might realistically consider fare evasion, and so the number of followers is potentially huge. A third key novelty in our work is deliberate simplification of leader strategies to make patrols easier to execute. We present an efficient algorithm for computing such patrol strategies and present experimental results using real-world ridership data from the Los Angeles Metro Rail system. The Los Angeles Sheriff’s department has begun trials of TRUSTS. 
%B Conference on Innovative Applications of Artificial Intelligence (IAAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2012 %T A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games %A Yin, Zhengyu %A Tambe, Milind %X Given their existing and potential real-world security applications, Bayesian Stackelberg games have received significant research interest [3, 12, 8]. In these games, the defender acts as a leader, and the many different follower types model the uncertainty over discrete attacker types. Unfortunately, since solving such games is an NP-hard problem, scale-up has remained a difficult challenge. This paper scales up Bayesian Stackelberg games, providing a novel unified approach to handling uncertainty not only over discrete follower types but also other key continuously distributed real world uncertainty, due to the leader’s execution error, the follower’s observation error, and continuous payoff uncertainty. To that end, this paper provides contributions in two parts. First, we present a new algorithm for Bayesian Stackelberg games, called HUNTER, to scale up the number of types. HUNTER combines the following five key features: i) efficient pruning via a best-first search of the leader’s strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Benders’ decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Benders’ cuts from parent to child; v) an efficient heuristic branching rule. Our experiments show that HUNTER provides orders of magnitude speedups over the best existing methods to handle discrete follower types. In the second part, we show HUNTER’s efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation. 
We experimentally show that our HUNTER-based approach also outperforms the latest robust solution methods under continuously distributed uncertainty. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %D 2012 %T Which Security Games are Hard to Solve? %A Jain, Manish %A Leyton-Brown, Kevin %A Tambe, Milind %X Stackelberg security games form the backbone of systems like ARMOR, IRIS and PROTECT, which are in regular use by the Los Angeles International Airport Police, US Federal Air Marshal Service and the US Coast Guard respectively. An understanding of the runtime required by algorithms that power such systems is critical to furthering the application of game theory to other real-world domains. This paper identifies the concept of the deployment-to-saturation ratio in random Stackelberg security games, and shows that in a decision problem related to these games, the probability that a solution exists exhibits a phase transition as the ratio crosses 0.5. We demonstrate that this phase transition is invariant to changes both in the domain and the domain representation. Moreover, problem instances at this phase transition point are computationally harder than instances with other deployment-to-saturation ratios for a wide range of different equilibrium computation methods, including (i) previously published different MIP algorithms, and (ii) different underlying solvers and solution mechanisms. Our findings have at least two important implications. First, it is important for new algorithms to be evaluated on the hardest problem instances. We show that this has often not been done in the past, and introduce a publicly available benchmark suite to facilitate such comparisons. 
Second, we provide evidence that this phase transition region is also one where optimization would be of most benefit to security agencies, and thus requires significant attention from researchers in this area. %B AAAI Spring Symposium on Game Theory for Security, Sustainability and Health %G eng %0 Journal Article %J AI Magazine %D 2012 %T An Overview of Recent Application Trends at the AAMAS conference: Security, Sustainability and Safety %A Jain, Manish %A An, Bo %A Tambe, Milind %X A key feature of the AAMAS conference is its emphasis on ties to real-world applications. The focus of this article is to provide a broad overview of application-focused papers published at the AAMAS 2010 and 2011 conferences. More specifically, recent applications at AAMAS could be broadly categorized as belonging to research areas of security, sustainability and safety. We outline the domains of applications, key research thrusts underlying each such application area, and emerging trends. %B AI Magazine %V 33 %P 14-28 %G eng %N 3 %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) %D 2012 %T Security Games for Controlling Contagion %A Tsai, Jason %A Thanh H. Nguyen %A Tambe, Milind %X Many strategic actions carry a ‘contagious’ component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain. Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, non-local impact. 
To address this new class of security games, we combine recent research in influence blocking maximization with a double oracle approach and create novel heuristic oracles to generate mixed strategies for a real-world leadership network from Afghanistan, synthetic leadership networks, and a real social network. We find that leadership networks that exhibit highly interconnected clusters can be solved equally well by our heuristic methods, but our more sophisticated heuristics outperform simpler ones in less interconnected social networks. %B Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B AAAI'11 Workshop on Applied Adversarial Reasoning and Risk Modeling (AARM) %D 2011 %T Addressing Execution and Observation Error in Security Games %A Jain, Manish %A Yin, Zhengyu %A Tambe, Milind %A Ordonez, Fernando %X Attacker-defender Stackelberg games have become a popular game-theoretic approach for security with deployments for LAX Police, the FAMS and the TSA. Unfortunately, most of the existing solution approaches do not model two key uncertainties of the real-world: there may be noise in the defender’s execution of the suggested mixed strategy and/or the observations made by an attacker can be noisy. In this paper, we analyze a framework to model these uncertainties, and demonstrate that previous strategies perform poorly in such uncertain settings. We also analyze RECON, a novel algorithm that computes strategies for the defender that are robust to such uncertainties, and explore heuristics that further improve RECON’s efficiency. 
%B AAAI'11 Workshop on Applied Adversarial Reasoning and Risk Modeling (AARM) %G eng %0 Conference Paper %B Workshop on Multiagent Sequential Decision Making in Uncertain Domains(MSDM) at AAMAS 2011 %D 2011 %T Applying Multi-Agent Techniques to Cancer Modeling %A Brown, Matthew %A Bowring, Emma %A Shira Epstein %A Mufaddal Jhaveri %A Maheswaran, Rajiv %A Parag Mallick %A Shannon Mumenthaler %A Povinelli, Michelle %A Tambe, Milind %X Each year, cancer is responsible for 13% of all deaths worldwide. In the United States, that percentage increases to 25%, translating to an estimated 569,490 deaths in 2010 [1]. Despite significant advances in the fight against cancer, these statistics make clear the need for additional research into new treatments. As such, there has been growing interest in the use of computer simulations as a tool to aid cancer researchers. We propose an innovative multi-agent approach in which healthy cells and cancerous cells are modeled as opposing teams of agents using a decentralized Markov decision process (DEC-MDP). We then describe changes made to traditional DEC-MDP algorithms in order to better handle the complexity and scale of our domain. We conclude by presenting and analyzing preliminary simulation results. This paper is intended to introduce the cancer modeling domain to the multi-agent community with the hope of fostering a discussion about the opportunities and challenges it presents. Given the complexity of the domain, we do not claim our approach to be a definitive solution but rather a first step toward the larger goal of creating realistic simulations of cancer. 
%B Workshop on Multiagent Sequential Decision Making in Uncertain Domains(MSDM) at AAMAS 2011 %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems %D 2011 %T Approximation Methods for Infinite Bayesian Stackelberg Games: Modeling Distributional Payoff Uncertainty %A Kiekintveld, Christopher %A Janusz Marecki %A Tambe, Milind %X Game theory is fast becoming a vital tool for reasoning about complex real-world security problems, including critical infrastructure protection. The game models for these applications are constructed using expert analysis and historical data to estimate the values of key parameters, including the preferences and capabilities of terrorists. In many cases, it would be natural to represent uncertainty over these parameters using continuous distributions (such as uniform intervals or Gaussians). However, existing solution algorithms are limited to considering a small, finite number of possible attacker types with different payoffs. We introduce a general model of infinite Bayesian Stackelberg security games that allows payoffs to be represented using continuous payoff distributions. We then develop several techniques for finding approximate solutions for this class of games, and show empirically that our methods offer dramatic improvements over the current state of the art, providing new ways to improve the robustness of security game models. %B International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2011 %T Continuous Time Planning for Multiagent Teams with Temporal Constraints %A Yin, Zhengyu %A Tambe, Milind %X Continuous state DEC-MDPs are critical for agent teams in domains involving resources such as time, but scaling them up is a significant challenge. 
To meet this challenge, we first introduce a novel continuous-time DEC-MDP model that exploits transition independence in domains with temporal constraints. More importantly, we present a new locally optimal algorithm called SPAC. Compared to the best previous algorithm, SPAC finds solutions of comparable quality substantially faster; SPAC also scales to larger teams of agents. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B Advances in Complex Systems %D 2011 %T Distributed On-line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation %A Matthew E. Taylor %A Jain, Manish %A Tandon, Prateek %A Tambe, Milind %A Makoto Yokoo %X A significant body of work exists on effectively allowing multiple agents to coordinate to achieve a shared goal. In particular, a growing body of work in the Distributed Constraint Optimization (DCOP) framework enables such coordination with different amounts of teamwork. Such algorithms can implicitly or explicitly trade-off improved solution quality with increased communication and computation requirements. However, the DCOP framework is limited to planning problems; DCOP agents must have complete and accurate knowledge about the reward function at plan time. We extend the DCOP framework, defining the Distributed Coordination of Exploration and Exploitation (DCEE) problem class to address real-world problems, such as ad-hoc wireless network optimization, via multiple novel algorithms. DCEE algorithms differ from DCOP algorithms in that they (1) are limited to a finite number of actions in a single trial, (2) attempt to maximize the on-line, rather than final, reward, (3) are unable to exhaustively explore all possible actions, and (4) may have knowledge about the distribution of rewards in the environment, but not the rewards themselves. 
Thus, a DCEE problem is not a type of planning problem, as DCEE algorithms must carefully balance and coordinate multiple agents’ exploration and exploitation. Two classes of algorithms are introduced: static estimation algorithms perform simple calculations that allow agents to either stay or explore, and balanced exploration algorithms use knowledge about the distribution of the rewards and the time remaining in an experiment to decide whether to stay, explore, or (in some algorithms) backtrack to a previous location. These two classes of DCEE algorithms are compared in simulation and on physical robots in a complex mobile ad-hoc wireless network setting. Contrary to our expectations, we found that increasing teamwork in DCEE algorithms may lower team performance. In contrast, agents running DCOP algorithms improve their reward as teamwork increases. We term this previously unknown phenomenon the team uncertainty penalty, analyze it in both simulation and on robots, and present techniques to ameliorate the penalty. %B Advances in Complex Systems %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems %D 2011 %T A Double Oracle Algorithm for Zero-Sum Security Games on Graphs %A Jain, Manish %A Dmytro Korzhyk %A Ondrej Vanek %A Vincent Conitzer %A Michal Pechoucek %A Tambe, Milind %X In response to the Mumbai attacks of 2008, the Mumbai police have started to schedule a limited number of inspection checkpoints on the road network throughout the city. Algorithms for similar security-related scheduling problems have been proposed in recent literature, but security scheduling in networked domains when targets have varying importance remains an open problem at large. In this paper, we cast the network security problem as an attackerdefender zero-sum game. The strategy spaces for both players are exponentially large, so this requires the development of novel, scalable techniques. 
We first show that existing algorithms for approximate solutions can be arbitrarily bad in general settings. We present RUGGED (Randomization in Urban Graphs by Generating strategies for Enemy and Defender), the first scalable optimal solution technique for such network security games. Our technique is based on a double oracle approach and thus does not require the enumeration of the entire strategy space for either of the players. It scales up to realistic problem sizes, as is shown by our evaluation of maps of southern Mumbai obtained from GIS data. %B International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B International Conference on Intelligent Virtual Agents (IVA) %D 2011 %T Empirical Evaluation of Computational Emotional Contagion Models %A Tsai, Jason %A Bowring, Emma %A Marsella, Stacy %A Tambe, Milind %X In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions being influenced by surrounding people’s emotions. While the overall effect is agreed upon, the underlying mechanism of the spread of emotions has seen little quantification and application to computational agents despite extensive evidence of its impacts in everyday life. In this paper, we examine computational models of emotional contagion by implementing two models ([2] and [8]) that draw from two separate lines of contagion research: thermodynamics-based and epidemiological-based. We first perform sensitivity tests on each model in an evacuation simulation, ESCAPES, showing both models to be reasonably robust to parameter variations with certain exceptions. We then compare their ability to reproduce a real crowd panic scene in simulation, showing that the thermodynamics-style model ([2]) produces superior results due to the ill-suited contagion mechanism at the core of epidemiological models. 
We also identify that a graduated effect of fear and proximity-based contagion effects are key to producing the superior results. We then reproduce the methodology on a second video, showing that the same results hold, implying generality of the conclusions reached in the first scene. %B International Conference on Intelligent Virtual Agents (IVA) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems %D 2011 %T ESCAPES - Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social comparison %A Tsai, Jason %A Natalie Fridman %A Bowring, Emma %A Brown, Matthew %A Shira Epstein %A Gal Kaminka %A Marsella, Stacy %A Andrew Ogden %A Inbal Rika %A Ankur Sheel %A Matthew Taylor %A Xuezhi Wang %A Avishay Zilka %A Tambe, Milind %X In creating an evacuation simulation for training and planning, realistic agents that reproduce known phenomena are required. Evacuation simulation in the airport domain requires additional features beyond most simulations, including the unique behaviors of first-time visitors who have incomplete knowledge of the area and families that do not necessarily adhere to often-assumed pedestrian behaviors. Evacuation simulations not customized for the airport domain do not incorporate the factors important to it, leading to inaccuracies when applied to it. In this paper, we describe ESCAPES, a multiagent evacuation simulation tool that incorporates four key features: (i) different agent types; (ii) emotional interactions; (iii) informational interactions; (iv) behavioral interactions. Our simulator reproduces phenomena observed in existing studies on evacuation scenarios and the features we incorporate substantially impact escape time. We use ESCAPES to model the International Terminal at Los Angeles International Airport (LAX) and receive high praise from security officials. 
%B International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B Algorithmic Decision Theory (ADT) %D 2011 %T Game Theory and Human Behavior: Challenges in Security and Sustainability %A Yang, Rong %A Tambe, Milind %A Jain, Manish %A Kwak, Jun-young %A Pita, James %A Yin, Zhengyu %X Security and sustainability are two critical global challenges that involve the interaction of many intelligent actors. Game theory provides a sound mathematical framework to model such interactions, and computational game theory in particular has a promising role to play in helping to address key aspects of these challenges. Indeed, in the domain of security, we have already taken some encouraging steps by successfully applying game-theoretic algorithms to real-world security problems: our algorithms are in use by agencies such as the US Coast Guard, the Federal Air Marshals Service, the LAX police and the Transportation Security Administration. While these applications of game-theoretic algorithms have advanced the state of the art, this paper lays out some key challenges as we continue to expand the use of these algorithms in real-world domains. One such challenge in particular is that classical game theory makes a set of assumptions about the players, which may not be consistent with real-world scenarios, especially when humans are involved. To actually model human behavior within a game-theoretic framework, it is important to address the new challenges that arise due to the presence of human players: (i) human bounded rationality; (ii) limited observations and imperfect strategy execution; (iii) large action spaces. We present initial solutions to these challenges in the context of security games. For sustainability, we lay out our initial efforts and plans, and key challenges related to human behavior in the loop. 
%B Algorithmic Decision Theory (ADT) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems %D 2011 %T GUARDS - Game Theoretic Security Allocation on a National Scale %A Pita, James %A Tambe, Milind %A Chris Kiekintveld %A Shane Cullen %A Erin Steigerwald %X Building on research previously reported at AAMAS conferences, this paper describes an innovative application of a novel game-theoretic approach for a national-scale security deployment. Working with the United States Transportation Security Administration (TSA), we have developed a new application called GUARDS to assist in resource allocation tasks for airport protection at over 400 United States airports. In contrast with previous efforts such as ARMOR and IRIS, which focused on one-off tailored applications and one security activity (e.g. canine patrol or checkpoints) per application, GUARDS faces three key issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic framework that allows for heterogeneous defender activities and compact modeling of a large number of threats; (ii) developing an efficient solution technique based on general-purpose Stackelberg game solvers; (iii) taking a partially centralized approach for knowledge acquisition and development of the system. In doing so we develop a software scheduling assistant, GUARDS, designed to reason over two agents — the TSA and a potential adversary — and allocate the TSA’s limited resources across hundreds of security activities in order to provide protection within airports. The scheduling assistant has been delivered to the TSA and is currently under evaluation and testing for scheduling practices at an undisclosed airport. 
If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide. In this paper we discuss the design choices and challenges encountered during the implementation of GUARDS. GUARDS represents promising potential for transitioning years of academic research into a nationally deployed system. %B International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2011 %T GUARDS - Innovative Application of Game Theory for National Airport Security %A Pita, James %A Tambe, Milind %A Kiekintveld, Christopher %A Shane Cullen %A Erin Steigerwald %X We describe an innovative application of a novel game-theoretic approach for a national-scale security deployment. Working with the United States Transportation Security Administration (TSA), we have developed a new application called GUARDS to allocate the TSA’s limited resources across hundreds of security activities to provide protection at over 400 United States airports. Similar security applications (e.g., ARMOR and IRIS) have focused on one-off tailored applications and one security activity (e.g. checkpoints) per application; GUARDS, on the other hand, faces three new key issues: (i) reasoning about hundreds of heterogeneous security activities; (ii) reasoning over diverse potential threats; (iii) developing a system designed for hundreds of end-users. Since a national deployment precludes tailoring to specific airports, our key ideas are: (i) creating a new game-theoretic framework that allows for heterogeneous defender activities and compact modeling of a large number of threats; (ii) developing an efficient solution technique based on general-purpose Stackelberg game solvers; (iii) taking a partially centralized approach for knowledge acquisition. 
The scheduling assistant has been delivered to the TSA and is currently undergoing evaluation for scheduling practices at an undisclosed airport. If successful, the TSA intends to incorporate the system into their unpredictable scheduling practices nationwide. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) %D 2011 %T Improved Computational Models of Human Behavior in Security Games %A Yang, Rong %A Kiekintveld, Christopher %A Ordonez, Fernando %A Tambe, Milind %A Richard John %X Security games refer to a special class of attacker-defender Stackelberg games. In these non-zero-sum games, the attacker’s utility of attacking a target decreases as the defender allocates more resources to protect it (and vice versa for the defender). The defender (leader) first commits to a mixed strategy, assuming the attacker (follower) decides on a pure strategy after observing the defender’s strategy. This models the situation where an attacker conducts surveillance to learn the defender’s mixed strategy and then launches an attack on a single target. Given that the defender has limited resources, she must design her mixed strategy optimally against the adversaries’ response to maximize effectiveness. One leading family of algorithms to compute such mixed strategies is DOBSS and its successors [3, 5], which are used in the deployed ARMOR [5] and IRIS [8] applications. One key set of assumptions these systems make is about how attackers choose strategies based on their knowledge of the security strategy. Typically, such systems apply the standard game-theoretic assumption that attackers are perfectly rational. 
This is a reasonable proxy for the worst case of a highly intelligent attacker, but it can lead to a defense strategy that is not robust against attackers using different decision procedures, and it fails to exploit known weaknesses in human decision-making. Indeed, it is widely accepted that standard game-theoretic assumptions of perfect rationality are not ideal for predicting the behavior of humans in multi-agent decision problems [1]. Thus, integrating more realistic models of human decision-making has become necessary in solving real-world security problems. The current leading contender that accounts for human behavior in security games is COBRA [6], which assumes that adversaries can deviate to ε-optimal strategies and that they have an anchoring bias when interpreting a probability distribution. It remains an open question whether other models yield better solutions than COBRA against human adversaries. The literature has introduced a multitude of candidate models, but there is an important empirical question of which model best represents the salient features of human behavior in applied security contexts. We address these open questions by developing three new algorithms to generate defender strategies in security games, based on using two fundamental theories of human behavior to predict an attacker’s decision: Prospect Theory (PT) [2] and Quantal Response Equilibrium (QRE) [4]. PT is a Nobel-Prize-winning theory, which describes human decision making as a process of maximizing ‘prospect’. ‘Prospect’ is defined as the weighted sum of the benefit of all possible outcomes for each action. QRE suggests that instead of strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy increases as the cost of such an error decreases. 
%B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2011 %T Improving Resource Allocation Strategy Against Human Adversaries in Security Games %A Yang, Rong %A Kiekintveld, Christopher %A Ordonez, Fernando %A Tambe, Milind %A Richard John %X Recent real-world deployments of Stackelberg security games make it critical that we address human adversaries’ bounded rationality in computing optimal strategies. To that end, this paper provides three key contributions: (i) new efficient algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response Equilibrium; (ii) the most comprehensive experiment to date studying the effectiveness of different models against human subjects for security games; and (iii) new techniques for generating representative payoff structures for behavioral experiments in generic classes of games. Our results with human subjects show that our new techniques outperform the leading contender for modeling human behavior in security games. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B Ad-Hoc Networks %D 2011 %T Mitigating Multi-path Fading in a Mobile Mesh Network %A Marcos A. M. Vieira %A Matthew E. Taylor %A Prateek Tandon %A Jain, Manish %A Govindan, Ramesh %A Gaurav S. Sukhatme %A Tambe, Milind %X By using robots as routers, a team of networked robots can provide a communication substrate to establish a wireless mesh network. The mobile mesh network can autonomously optimize its configuration, increasing performance. One of the main sources of radio signal fading in such a network is multi-path propagation, which can be mitigated by moving the senders or the receivers by a distance on the order of a wavelength. 
In this paper, we measure the performance gain when robots are allowed to make such small movements and find that it may be as much as 270%. Our main contribution is the design of a system that allows robots to cooperate and improve the real-world network throughput via a practical solution. We model the problem of which robots to move as a distributed constraint optimization problem (DCOP). Our study includes four local metrics to estimate global throughput. %B Ad-Hoc Networks %G eng %0 Conference Paper %B AAAI Spring Symposium on Help me help you: Bridging the Gaps in Human-Agent Collaboration %D 2011 %T Mixed-Initiative Optimization in Security Games: A Preliminary Report %A An, Bo %A Jain, Manish %A Tambe, Milind %A Kiekintveld, Christopher %X Stackelberg games have been widely used to model patrolling or monitoring problems in security. In a Stackelberg security game, the defender commits to a strategy and the adversary makes its decision with knowledge of the leader’s commitment. Algorithms for computing the defender’s optimal strategy are used in deployed decision-support tools in use by the Los Angeles International Airport (LAX), the Federal Air Marshals Service, and the Transportation Security Administration (TSA). Those algorithms take into account various resource usage constraints defined by human users. However, those constraints may lead to poor (even infeasible) solutions due to users’ insufficient information and bounded rationality. A mixed-initiative approach, in which human users and software assistants (agents) collaborate to make security decisions, is needed. An efficient human-agent interaction process leads to models with higher overall solution quality. This paper preliminarily analyzes the needs and challenges for such a mixed-initiative approach. 
%B AAAI Spring Symposium on Help me help you: Bridging the Gaps in Human-Agent Collaboration %G eng %0 Conference Paper %B MABS Multiagent-based Simulation Workshop at AAMAS 2011 %D 2011 %T Modeling Emotional Contagion %A Tsai, Jason %A Bowring, Emma %A Marsella, Stacy %A Tambe, Milind %X In social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions being influenced by surrounding people’s emotions. While the overall effect is agreed upon, the underlying mechanism of the spread of emotions has seen little quantification and application to computational agents. In this paper, we explore computational models of emotional contagion by implementing two models (Bosse et al., Durupinar et al.) and augmenting them to better model real world observations. Our additions include examining the impact of physical proximity and authority figures. We show that these additions provide substantial improvements to the qualitative trends of emotion spreading, more in line with expectations than either of the two previous models. We also evaluate their impact on evacuation safety in an evacuation simulation, ESCAPES, showing substantial differences in predicted safety based on the contagion model. %B MABS Multiagent-based Simulation Workshop at AAMAS 2011 %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems %D 2011 %T Quality guarantees for region optimal DCOP algorithms %A Meritxell Vinyals %A Shieh, Eric %A Cerquides, Jesus %A Juan Antonio Rodriguez-Aguilar %A Yin, Zhengyu %A Tambe, Milind %A Bowring, Emma %X k- and t-optimality algorithms [9, 6] provide solutions to DCOPs that are optimal in regions characterized by their size and distance, respectively. Moreover, they provide quality guarantees on their solutions. 
Here we generalise the k- and t-optimal framework to introduce C-optimality, a flexible framework that provides reward-independent quality guarantees for optima in regions characterised by any arbitrary criterion. Therefore, C-optimality allows us to explore the space of criteria (beyond size and distance) looking for those that lead to better solution qualities. We benefit from this larger space of criteria to propose a new criterion, the so-called size-bounded-distance criterion, which outperforms k- and t-optimality. %B International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems %D 2011 %T Quality-bounded Solutions for Finite Bayesian Stackelberg Games: Scaling up %A Jain, Manish %A Tambe, Milind %A Kiekintveld, Christopher %X The fastest known algorithm for solving General Bayesian Stackelberg games with a finite set of follower (adversary) types has seen direct practical use at the LAX airport for over 3 years; and currently, an (albeit non-Bayesian) algorithm for solving these games is also being used for scheduling air marshals on limited sectors of international flights by the US Federal Air Marshals Service. These algorithms find optimal randomized security schedules to allocate limited security resources to protect targets. As we scale up to larger domains, including the full set of flights covered by the Federal Air Marshals, it is critical to develop newer algorithms that scale up significantly beyond the limits of the current state-of-the-art of Bayesian Stackelberg solvers. In this paper, we present a novel technique based on a hierarchical decomposition and branch-and-bound search over the follower type space, which may be applied to different Stackelberg game solvers. 
We have applied this technique to different solvers, resulting in: (i) a new exact algorithm called HBGS that is orders of magnitude faster than the best known previous Bayesian solver for general Stackelberg games; (ii) a new exact algorithm called HBSA which extends the fastest known previous security game solver towards the Bayesian case; and (iii) approximation versions of HBGS and HBSA that show significant improvements over these newer algorithms with only 1–2% sacrifice in the practical solution quality. %B International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) %D 2011 %T Refinement of Strong Stackelberg Equilibria in Security Games %A An, Bo %A Tambe, Milind %A Ordonez, Fernando %A Shieh, Eric %A Kiekintveld, Christopher %X Given the real-world deployments of attacker-defender Stackelberg security games, robustness to deviations from expected attacker behaviors has now emerged as a critically important issue. This paper provides four key contributions in this context. First, it identifies a fundamentally problematic aspect of current algorithms for security games. It shows that there are many situations where these algorithms face multiple equilibria, and they arbitrarily select one that may hand the defender a significant disadvantage, particularly if the attacker deviates from its equilibrium strategies due to unknown constraints. Second, for important subclasses of security games, it identifies situations where we will face such multiple equilibria. Third, to address these problematic situations, it presents two equilibrium refinement algorithms that can optimize the defender’s utility if the attacker deviates from equilibrium strategies. Finally, it experimentally illustrates that the refinement approach achieved significant robustness in consideration of attackers’ deviation due to unknown constraints. 
%B Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B Workshop on Optimisation in Multiagent Systems (OPTMAS) at AAMAS 2011 %D 2011 %T Reward-based region optimal quality guarantees %A Meritxell Vinyals %A Shieh, Eric %A Cerquides, Jesus %A Juan Antonio Rodriguez-Aguilar %A Yin, Zhengyu %A Tambe, Milind %A Bowring, Emma %X Distributed constraint optimization (DCOP) is a promising approach to coordination, scheduling and task allocation in multi-agent networks. DCOP is NP-hard [6], so an important line of work focuses on developing fast incomplete solution algorithms that can provide guarantees on the quality of their local optimal solutions. Region optimality [11] is a promising approach along this line: it provides quality guarantees for region optimal solutions, namely solutions that are optimal in a specific region of the DCOP. Region optimality generalises k- and t-optimality [7, 4] by allowing exploration of the space of criteria that define regions to look for solutions with better quality guarantees. Unfortunately, previous work in region-optimal quality guarantees fails to exploit any a priori knowledge of the reward structure of the problem. This paper addresses this shortcoming by defining reward-dependent region optimal quality guarantees that exploit two different levels of knowledge about rewards, namely: (i) the ratio between the minimum and the maximum reward among relations; and (ii) the minimum and maximum rewards per relation. %B Workshop on Optimisation in Multiagent Systems (OPTMAS) at AAMAS 2011 %G eng %0 Conference Paper %B Conference on Artificial Intelligence (AAAI) %D 2011 %T Risk-Averse Strategies for Security Games with Execution and Observational Uncertainty %A Yin, Zhengyu %A Jain, Manish %A Tambe, Milind %A Ordonez, Fernando %X Attacker-defender Stackelberg games have become a popular game-theoretic approach for security with deployments for LAX Police, the FAMS and the TSA. 
Unfortunately, most of the existing solution approaches do not model two key uncertainties of the real world: there may be noise in the defender’s execution of the suggested mixed strategy and/or the observations made by an attacker can be noisy. In this paper, we provide a framework to model these uncertainties, and demonstrate that previous strategies perform poorly in such uncertain settings. We also provide RECON, a novel algorithm that computes strategies for the defender that are robust to such uncertainties, and provide heuristics that further improve RECON’s efficiency. %B Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B Workshop on Multiagent Sequential Decision Making in Uncertain Domains (MSDM) at AAMAS 2011 %D 2011 %T Robust Execution-time Coordination in DEC-POMDPs Under Model Uncertainty %A Kwak, Jun-young %A Yang, Rong %A Yin, Zhengyu %A Matthew E. Taylor %A Tambe, Milind %X Despite their worst-case NEXP-complete planning complexity, DEC-POMDPs remain a popular framework for multiagent teamwork. This paper introduces effective teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) as a novel challenge for DEC-POMDPs and presents MODERN, the first execution-centric framework for DEC-POMDPs explicitly motivated by addressing such model uncertainty. MODERN’s shift of coordination reasoning from planning-time to execution-time avoids the high cost of computing optimal plans whose promised quality may not be realized in practice. 
There are three key ideas in MODERN: (i) it maintains an exponentially smaller model of other agents’ beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, and limits communication to key trigger points; and (iii) it limits its decision-theoretic reasoning about communication to trigger points and uses a systematic markup to encourage extra communication at these points – thus reducing uncertainty among team members at trigger points. We empirically show that MODERN is substantially faster than existing DEC-POMDP execution-centric methods while achieving significantly higher reward. %B Workshop on Multiagent Sequential Decision Making in Uncertain Domains (MSDM) at AAMAS 2011 %G eng %0 Journal Article %J Cambridge University Press %D 2011 %T Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned %A Tambe, Milind %X Global threats of terrorism, drug-smuggling and other crimes have led to a significant increase in research on game theory for security. Game theory provides a sound mathematical approach to deploy limited security resources to maximize their effectiveness. A typical approach is to randomize security schedules to avoid predictability, with the randomization using artificial intelligence techniques to take into account the importance of different targets and potential adversary reactions. This book distills the forefront of this research to provide the first and only study of long-term deployed applications of game theory for security for key organizations such as the Los Angeles International Airport police and the US Federal Air Marshals Service. 
The author and his research group draw from their extensive experience working with security officials to intelligently allocate limited security resources to protect targets, outlining the applications of these algorithms in research and the real world. %B Cambridge University Press %G eng %0 Conference Paper %B AAMAS Workshop on Multiagent Sequential Decision Making in Uncertain Domains (MSDM) %D 2011 %T Solving Continuous-Time Transition-Independent DEC-MDP with Temporal Constraints %A Yin, Zhengyu %A Kanna Rajan %A Tambe, Milind %X Despite the impact of DEC-MDPs over the past decade, scaling to large problem domains has been difficult to achieve. The scale-up problem is exacerbated in DEC-MDPs with continuous states, which are critical in domains involving time; the latest algorithm (M-DPFP) does not scale-up beyond two agents and a handful of unordered tasks per agent. This paper is focused on meeting this challenge in continuous-resource DEC-MDPs with two predominant contributions. First, it introduces a novel continuous time model for multi-agent planning problems that exploits transition independence in domains with graphical agent dependencies and temporal constraints. More importantly, it presents a new, iterative, locally optimal algorithm called SPAC that is a combination of the following key ideas: (1) defining a novel augmented CT-MDP such that solving this single-agent continuous time MDP provably provides an automatic best response to neighboring agents’ policies; (2) fast convolution to efficiently generate such augmented MDPs; (3) new enhanced lazy approximation algorithm to solve these augmented MDPs; (4) intelligent seeding of initial policies in the iterative process; (5) exploiting graph structure of reward dependencies to exploit local interactions for scalability. Our experiments show SPAC not only finds solutions substantially faster than M-DPFP with comparable quality, but also scales well to large teams of agents. 
%B AAMAS Workshop on Multiagent Sequential Decision Making in Uncertain Domains (MSDM) %G eng %0 Journal Article %J Journal of AI Research (JAIR) %D 2011 %T Stackelberg vs. Nash in Security Games: An Extended Investigation of Interchangeability, Equivalence, and Uniqueness %A Dmytro Korzhyk %A Yin, Zhengyu %A Kiekintveld, Christopher %A Vincent Conitzer %A Tambe, Milind %X There has been significant recent interest in game theoretic approaches to security, with much of the recent research focused on utilizing the leader-follower Stackelberg game model; for example, these games are at the heart of major applications such as the ARMOR program deployed for security at the LAX airport since 2007 and the IRIS program in use by the US Federal Air Marshals (FAMS). The foundational assumption for using Stackelberg games is that security forces (leaders), acting first, commit to a randomized strategy; while their adversaries (followers) choose their best response after surveillance of this randomized strategy. Yet, in many situations, the followers may act without observation of the leader’s strategy, essentially converting the game into a simultaneous-move game model. Previous work fails to address how a leader should compute her strategy given this fundamental uncertainty about the type of game faced. Focusing on the complex games that are directly inspired by real-world security applications, the paper provides four contributions in the context of a general class of security games. First, exploiting the structure of these security games, the paper shows that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem. Second, resolving the leader’s dilemma, it shows that under a natural restriction on security games, any Stackelberg strategy is also a Nash equilibrium strategy; and furthermore, the solution is unique in a class of security games of which ARMOR is a key exemplar. 
Third, when faced with a follower that can attack multiple targets, many of these properties no longer hold. Fourth, we show experimentally that in most (but not all) games where the restriction does not hold, the Stackelberg strategy is still a Nash equilibrium strategy, but this is no longer true when the attacker can attack multiple targets. These contributions have major implications for real-world applications. As a possible direction for future research on cases where the Stackelberg strategy is not a Nash equilibrium strategy, we propose an extensive-form game model that makes the defender’s uncertainty about the attacker’s ability to observe explicit. %B Journal of AI Research (JAIR) %V 41 %P 297-327 %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) %D 2011 %T Teamwork in Distributed POMDPs: Execution-time Coordination Under Model Uncertainty %A Kwak, Jun-young %A Yang, Rong %A Yin, Zhengyu %A Matthew E. Taylor %A Tambe, Milind %X Despite their NEXP-complete policy generation complexity [1], Distributed Partially Observable Markov Decision Problems (DEC-POMDPs) have become a popular paradigm for multiagent teamwork [2, 6, 8]. DEC-POMDPs are able to quantitatively express observational and action uncertainty, and yet optimally plan communications and domain actions. This paper focuses on teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) in DEC-POMDPs. In many domains, we only have an approximate model of agent observation or transition functions. To address this challenge we rely on execution-centric frameworks [7, 11, 12], which simplify planning in DEC-POMDPs (e.g., by assuming cost-free communication at plan-time), and shift coordination reasoning to execution time. Specifically, during planning, these frameworks use a standard single-agent POMDP planner [4] to plan a policy for the team of agents by assuming zero-cost communication. 
Then, at execution-time, agents model other agents’ beliefs and actions, reason about when to communicate with teammates, reason about what action to take if not communicating, etc. Unfortunately, past work in execution-centric approaches [7, 11, 12] also assumes a correct world model, and the presence of model uncertainty exposes key weaknesses that result in erroneous plans and additional inefficiency due to reasoning over incorrect world models at every decision epoch. This paper provides two sets of contributions. The first is a new execution-centric framework for DEC-POMDPs called MODERN (MOdel uncertainty in Dec-pomdp Execution-time ReasoNing). MODERN is the first execution-centric framework for DEC-POMDPs explicitly motivated by model uncertainty. It is based on three key ideas: (i) it maintains an exponentially smaller model of other agents’ beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, thus limiting communication to key trigger points; and (iii) it simplifies its decision-theoretic reasoning about communication over the pruned model and uses a systematic markup, encouraging extra communication and reducing uncertainty among team members at trigger points. This paper’s second set of contributions is in opening up model uncertainty as a new research direction for DEC-POMDPs and emphasizing the similarity of this problem to the Belief-Desire-Intention (BDI) model for teamwork [5, 9]. In particular, BDI teamwork models also assume inaccurate mapping between real-world problems and domain models. As a result, they emphasize robustness via execution-time reasoning about coordination [9]. Given some of the successes of prior BDI research in teamwork, we leverage insights from BDI in designing MODERN. 
%B International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract) %G eng %0 Conference Paper %B AAAI'11 Workshop on Applied Adversarial Reasoning and Risk Modeling (AARM) %D 2011 %T Toward Addressing Human Behavior with Observational Uncertainty in Security Games %A Pita, James %A Yang, Rong %A Tambe, Milind %A Richard John %X Stackelberg games have recently gained significant attention for resource allocation decisions in security settings. One critical assumption of traditional Stackelberg models is that all players are perfectly rational and that the followers perfectly observe the leader’s strategy. However, in real-world security settings, security agencies must deal with human adversaries who may not always follow the utility maximizing rational strategy. Accounting for these likely deviations is important since they may adversely affect the leader’s (security agency’s) utility. In fact, a number of behavioral game-theoretic models have begun to emerge for these domains. Two such models in particular are COBRA (Combined Observability and Bounded Rationality Assumption) and BRQR (Best Response to Quantal Response), which have both been shown to outperform game-theoretic optimal models against human adversaries within a security setting based on Los Angeles International Airport (LAX). Under perfect observation conditions, BRQR has been shown to be the leading contender for addressing human adversaries. In this work we explore these models under limited observation conditions. Due to human anchoring biases, BRQR’s performance may suffer under limited observation conditions. An anchoring bias is when, given no information about the occurrence of a discrete set of events, humans will tend to assign an equal weight to the occurrence of each event (a uniform distribution). 
This study makes three main contributions: (i) we incorporate an anchoring bias into BRQR to improve performance under limited observation; (ii) we explore finding appropriate parameter settings for BRQR under limited observation; (iii) we compare BRQR’s performance versus COBRA under limited observation conditions. %B AAAI'11 Workshop on Applied Adversarial Reasoning and Risk Modeling (AARM) %G eng %0 Conference Paper %B AIAA Infotech at Aerospace %D 2011 %T Towards a Robust MultiAgent Autonomous Reasoning System (MAARS): An Initial Simulation Study for Satellite Defense %A Kwak, Jun-young %A Tambe, Milind %A Paul Scerri %A Amos Freedy %A Onur Sert %X Multi-agent autonomous reasoning systems have emerged as a promising planning technique for addressing satellite defense problems. The main challenge is to extend and scale up the capabilities of current and emerging reasoning and planning methods to handle the characteristics of the satellite defense problem. This paper focuses on some key critical research issues that need to be addressed in order to perform automated planning and execution fitted to the specific nature of response to ASAT attacks, and provides MAARS, a new autonomous reasoning framework for satellite defense. As the core of MAARS, we present MODERN, a new execution-centric method for DEC-POMDPs explicitly motivated by model uncertainty. There are two key innovative features in MODERN: (i) it maintains an exponentially smaller model of other agents’ beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; and (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, and limits communication reasoning to key trigger points. We demonstrate a proof of concept of MAARS in the simplified ASAT mitigation scenario. 
We then show initial evaluation results of MAARS in ASAT domains, which are critical for advancing the state of the art in autonomous reasoning over unperceived models and in dealing with the exponential explosion in the computational complexity of current algorithms. %B AIAA Infotech at Aerospace %G eng %0 Conference Paper %B International Conference on Intelligent Agent Technology (short paper) %D 2011 %T Towards Addressing Model Uncertainty: Robust Execution-time Coordination for Teamwork (Short Paper) %A Kwak, Jun-young %A Yang, Rong %A Yin, Zhengyu %A Matthew E. Taylor %A Tambe, Milind %X Despite their worst-case NEXP-complete planning complexity, DEC-POMDPs remain a popular framework for multiagent teamwork. This paper introduces effective teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) as a novel challenge for DEC-POMDPs and presents MODERN, the first execution-centric framework for DEC-POMDPs explicitly motivated by addressing such model uncertainty. MODERN’s shift of coordination reasoning from planning-time to execution-time avoids the high cost of computing optimal plans whose promised quality may not be realized in practice. There are three key ideas in MODERN: (i) it maintains an exponentially smaller model of other agents’ beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, and limits communication to key trigger points; and (iii) it limits its decision-theoretic reasoning about communication to trigger points and uses a systematic markup to encourage extra communication at these points, thus reducing uncertainty among team members. We empirically show that MODERN is substantially faster than existing DEC-POMDP execution-centric methods while achieving significantly higher reward. 
%B International Conference on Intelligent Agent Technology (short paper) %G eng %0 Conference Paper %B PAIR 2011: AAAI Workshop on Plan, Activity, and Intent Recognition %D 2011 %T Towards Detection of Suspicious Behavior from Multiple Observations %A Bostjan Kaluza %A Gal Kaminka %A Tambe, Milind %X This paper addresses the problem of detecting suspicious behavior from a collection of an individual's events, where no single event is enough to decide whether the behavior is suspicious, but the combination of multiple events enables such reasoning. We establish a Bayesian framework for evaluating multiple events and show that current approaches fail to model the agent's behavior history when estimating whether a trace of events is generated by a suspicious agent. We propose a heuristic for evaluating events according to the behavior of the agent in the past. The proposed approach, tested on an airport domain, outperforms the current approaches. %B PAIR 2011: AAAI Workshop on Plan, Activity, and Intent Recognition %G eng %0 Conference Paper %B Workshop on Agent Technologies for Energy Systems (ATES) at AAMAS 2011 %D 2011 %T Towards Optimal Planning for Distributed Coordination Under Uncertainty in Energy Domains %A Kwak, Jun-young %A Varakantham, Pradeep %A Tambe, Milind %A Laura Klein %A Farrokh Jazizadeh %A Geoffrey Kavulya %A Burcin B. Gerber %A David J. Gerber %X Recent years have seen a rise of interest in the deployment of multiagent systems in energy domains that inherently have uncertain and dynamic environments with limited resources. In such domains, the key challenge is to minimize energy consumption while satisfying the comfort level of occupants in the buildings under uncertainty (regarding agent negotiation actions). 
As human agents begin to interact with complex building systems as a collaborative team, it becomes crucial that the resulting multiagent teams reason about coordination under such uncertainty to optimize multiple metrics, which have not been systematically considered in previous literature. This paper presents SAVES, a novel multiagent system for sustainability based on distributed coordination reasoning under uncertainty. There are three key ideas in SAVES: (i) it explicitly considers uncertainty while reasoning about coordination in a distributed manner relying on MDPs; (ii) human behaviors and their occupancy preferences are incorporated into planning and modeled as part of the system; and (iii) the influence of various control strategies for multiagent teams is evaluated on an existing university building as the practical research testbed with actual energy consumption data. We present preliminary empirical results showing that our intelligent control strategies substantially reduce overall energy consumption in the simulation testbed compared to existing control methods, while achieving a comparable average occupant satisfaction level. %B Workshop on Agent Technologies for Energy Systems (ATES) at AAMAS 2011 %G eng %0 Conference Paper %B International Symposium on Automation and Robotics in Construction %D 2011 %T Towards Optimization Of Building Energy And Occupant Comfort Using Multi-Agent Simulation %A Laura Klein %A Geoffrey Kavulya %A Farrokh Jazizadeh %A Kwak, Jun-young %A Becerik-Gerber, Burcin %A Varakantham, Pradeep %A Tambe, Milind %X The primary consumers of building energy are heating, cooling, ventilation, and lighting systems, which maintain occupant comfort, and electronics and appliances that enable occupant functionality. The optimization of building energy is therefore a complex problem highly dependent on unique building and environmental conditions as well as on time-dependent operational factors. 
To provide computational support for this optimization, this paper presents and implements a multi-agent comfort and energy simulation (MACES) to model alternative management and control of building systems and occupants. Human and device agents are used to explore current trends in energy consumption and management of a university test bed building. Reactive and predictive control strategies are then imposed on device agents in an attempt to reduce building energy consumption while maintaining occupant comfort. Finally, occupant agents are motivated by simulation feedback to accept more energy-conscious scheduling through multi-agent negotiations. Initial results of MACES demonstrate potential energy savings of 17% while maintaining a high level of occupant comfort. This work is intended to demonstrate a simulation tool that is implementable at the actual test bed site and compatible with real-world input, to instigate and motivate more energy-conscious control and occupant behaviors. %B International Symposium on Automation and Robotics in Construction %G eng %0 Conference Paper %B Collaborative Agents REsearch and Development (CARE) 2010 workshop %D 2011 %T Two Decades of Multiagent Teamwork Research: Past, Present, and Future %A Matthew E. Taylor %A Jain, Manish %A Kiekintveld, Christopher %A Kwak, Jun-young %A Yang, Rong %A Yin, Zhengyu %A Tambe, Milind %X This paper discusses some of the recent cooperative multiagent systems work in the TEAMCORE lab at the University of Southern California. Based in part on an invited talk at the CARE 2010 workshop, we highlight how and why execution-time reasoning has been supplementing, or replacing, planning-time reasoning in such systems. 
%B Collaborative Agents REsearch and Development (CARE) 2010 workshop %G eng %0 Conference Paper %B ACM SIGecom Exchanges %D 2011 %T GUARDS and PROTECT: Next Generation Applications of Security Games %A An, Bo %A Pita, James %A Shieh, Eric %A Tambe, Milind %A Kiekintveld, Christopher %A Janusz Marecki %X The last five years have witnessed the successful application of game theory in reasoning about complex security problems [Basilico et al. 2009; Korzhyk et al. 2010; Dickerson et al. 2010; Jakob et al. 2010; Paruchuri et al. 2008; Pita et al. 2009; Pita et al. 2010; Kiekintveld et al. 2009; Jain et al. 2010]. Stackelberg games have been widely used to model patrolling or monitoring problems in security. In a Stackelberg security game, the defender commits to a strategy and the adversary makes its decision with knowledge of the leader’s commitment. Two systems applying Stackelberg game models to assist with randomized resource allocation decisions are currently in use by the Los Angeles International Airport (LAX) [Pita et al. 2008] and the Federal Air Marshals Service (FAMS) [Tsai et al. 2009]. Two new applications called GUARDS (Game-theoretic Unpredictable and Randomly Deployed Security) [Pita et al. 2011] and PROTECT (Port Resilience Operational / Tactical Enforcement to Combat Terrorism) are under development for the Transportation Security Administration (TSA) and the United States Coast Guard respectively. Both are based on Stackelberg games. In contrast with previous applications at LAX and FAMS, which focused on one-off tailored applications and one security activity (e.g., canine patrol, checkpoints, or covering flights) per application, both GUARDS and PROTECT face new challenging issues due to the potential large scale deployment. This includes reasoning about hundreds of heterogeneous security activities, reasoning over diverse potential threats, and developing a system designed for hundreds of end-users. 
In this article we will highlight several of the main issues that have arisen. We begin with an overview of the new applications and then discuss these issues in turn. %B ACM SIGecom Exchanges %7 1 %V 10 %G eng %0 Journal Article %J Expert Systems with Applications %D 2011 %T A Probabilistic Risk Analysis for Multimodal Entry Control %A Bostjan Kaluza %A Erik Dovgan %A Tea Tusar %A Tambe, Milind %A Matjaz Gams %X Entry control is an important security measure that prevents undesired persons from entering secure areas. The advanced risk analysis presented in this paper makes it possible to distinguish between acceptable and unacceptable entries, based on several entry sensors, such as fingerprint readers, and intelligent methods that learn behavior from previous entries. We have extended the intelligent layer in two ways: first, by adding a meta-learning layer that combines the output of specific intelligent modules, and second, by constructing a Bayesian network to integrate the predictions of the learning and meta-learning modules. The obtained results represent an important improvement in detecting security attacks. %B Expert Systems with Applications %V 28 %P 6696-6704 %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2010 %T Asynchronous Algorithms for Approximate Distributed Constraint Optimization with Quality Bounds %A Kiekintveld, Christopher %A Yin, Zhengyu %A Kumar, Atul %A Tambe, Milind %X Distributed Constraint Optimization (DCOP) is a popular framework for cooperative multi-agent decision making. DCOP is NP-hard, so an important line of work focuses on developing fast incomplete solution algorithms for large-scale applications. One of the few incomplete algorithms to provide bounds on solution quality is k-size optimality, which defines a local optimality criterion based on the size of the group of deviating agents. 
Unfortunately, the lack of a general-purpose algorithm and the commitment to forming groups based solely on group size have limited the use of k-size optimality. This paper introduces t-distance optimality, which departs from k-size optimality by using graph distance as an alternative criterion for selecting groups of deviating agents. This throws open a new research direction into the tradeoffs between different group selection and coordination mechanisms for incomplete DCOP algorithms. We derive theoretical quality bounds for t-distance optimality that improve known bounds for k-size optimality. In addition, we develop a new efficient asynchronous local search algorithm for finding both k-size and t-distance optimal solutions, allowing these concepts to be deployed in real applications. Indeed, empirical results show that this algorithm significantly outperforms the only existing algorithm for finding general k-size optimal solutions, which is also synchronous. Finally, we compare the algorithmic performance of k-size and t-distance optimality using this algorithm. We find that t-distance consistently converges to higher-quality solutions in the long run, but results are mixed on convergence speed; we identify cases in which each converges faster. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J Journal of Multiagent and Grid Systems (MAGS) %D 2010 %T Balancing Local Resources and Global Goals in Multiply-Constrained DCOP %A Bowring, Emma %A Tambe, Milind %A Makoto Yokoo %X Distributed constraint optimization (DCOP) is a useful framework for cooperative multiagent coordination. DCOP focuses on optimizing a single team objective. However, in many domains, agents must satisfy constraints on resources consumed locally while optimizing the team goal. Yet, these resource constraints may need to be kept private. 
Designing DCOP algorithms for these domains requires managing complex trade-offs in completeness, scalability, privacy and efficiency. This article defines the multiply-constrained DCOP (MC-DCOP) framework and provides complete (globally optimal) and incomplete (locally optimal) algorithms for solving MC-DCOP problems. Complete algorithms find the best allocation of scarce resources while optimizing the team objective, while incomplete algorithms are more scalable. The algorithms use four main techniques: (i) transforming constraints to maintain privacy; (ii) dynamically setting upper bounds on resource consumption; (iii) identifying the extent to which the local graph structure allows agents to compute exact bounds; and (iv) using a virtual assignment to flag problems rendered unsatisfiable by resource constraints. Proofs of correctness are presented for all algorithms. Experimental results illustrate the strengths and weaknesses of both the complete and incomplete algorithms. %B Journal of Multiagent and Grid Systems (MAGS) %V 6 %P 353-393 %G eng %N 4 %0 Journal Article %J Informatica %D 2010 %T A Framework for Evaluating Deployed Security Systems: Is There a Chink in your ARMOR? %A Matthew E. Taylor %A Kiekintveld, Christopher %A Craig Western %A Tambe, Milind %X A growing number of security applications are being developed and deployed to explicitly reduce risk from adversaries’ actions. However, there are many challenges when attempting to evaluate such systems, both in the lab and in the real world. Traditional evaluations used by computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant. The primary contribution of this paper is to provide a preliminary framework which can guide the evaluation of such systems and to apply the framework to the evaluation of ARMOR (a system deployed at LAX since August 2007). 
This framework helps to determine what evaluations could, and should, be run in order to measure a system’s overall utility. A secondary contribution of this paper is to help familiarize our community with some of the difficulties inherent in evaluating deployed applications, focusing on those in security domains. %B Informatica %P 129-139 %G eng %N 34 %0 Conference Paper %B AAMAS 2010 workshop on Optimization in Multiagent Systems %D 2010 %T Game-Theoretic Allocation of Security Forces in a City %A Tsai, Jason %A Yin, Zhengyu %A Kwak, Jun-young %A David Kempe %A Kiekintveld, Christopher %A Tambe, Milind %X Law enforcement agencies frequently must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. Since intelligent attackers may observe and exploit patterns in the allocation, it is crucial that the allocations be randomized. We cast this problem as an attacker-defender Stackelberg game: the defender’s goal is to obtain an optimal mixed strategy for allocating resources. The defender’s strategy space is exponential in the number of resources, and the attacker’s exponential in the network size. Existing algorithms are therefore useless for all but the smallest networks. We present a solution approach based on two key ideas: (i) a polynomial-sized game model obtained via an approximation of the strategy space, solved efficiently using a linear program; (ii) two efficient techniques that map solutions from the approximate game to the original, with proofs of correctness under certain assumptions. We present in-depth experimental results, including an evaluation on part of the Mumbai road network. 
%B AAMAS 2010 workshop on Optimization in Multiagent Systems %G eng %0 Conference Paper %B Extended Abstract for International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2010 %T How to Protect a City: Strategic Security Placement in Graph-Based Domains %A Tsai, Jason %A Yin, Zhengyu %A Kwak, Jun-young %A David Kempe %A Kiekintveld, Christopher %A Tambe, Milind %X Protecting targets against potential attacks is an important problem for security forces worldwide. The general setting we study is as follows: An attacker assigns different values to reaching (and damaging or destroying) one of multiple targets. A defender is able to allocate resources (such as patrol cars or canine units) to capture the attacker before he reaches a target. In many of these situations, the domain has structure that is naturally modeled as a graph. For example, city maps can be modeled with intersections as nodes and roads as edges, where nodes are targets for attackers. In order to prevent attacks, security forces can schedule checkpoints on edges (e.g., roads) to detect intruders. For instance, in response to the devastating terrorist attacks in 2008 [1], Mumbai police deploy randomized checkpoints as one countermeasure to prevent future attacks [2]. The strategy for placing these checkpoints must necessarily be decided in advance of attack attempts, should account for targets of differing importance, and should anticipate an intelligent adversary who can observe the strategy prior to attacking. In light of these requirements, game-theoretic approaches have been developed to assist in generating randomized security strategies in several real-world domains, including applications in use by the Los Angeles International Airport [12] and the Federal Air Marshals Service [13]. 
To account for the attacker’s ability to observe deployment patterns, these methods model the problem as a Stackelberg game and solve for an optimal probability distribution over the possible deployments to ensure unpredictability. Novel solvers for classes of security games have recently been developed [3, 11, 4]. However, these solvers take time at least polynomial in the number of actions of both players. In our setting, every path from an entry point to a target is an attacker action, and every set of r or fewer edges is a defender action. (r is the maximum number of checkpoints.) Since the attacker’s actions grow exponentially with the size of the network, and the defender’s actions grow exponentially with r, existing methods quickly become too slow when applied to large real-world domains. Therefore, our goal is to develop faster methods for these settings and evaluate them theoretically and empirically. %B Extended Abstract for International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J Journal of Web Intelligence and Agent Systems (WIAS) %D 2010 %T Introducing Communication in Dis-POMDPs with Locality of Interaction %A Makoto Tasaki %A Yuichi Yabu %A Yuki Iwanari %A Makoto Yokoo %A Janusz Marecki %A Varakantham, Pradeep %A Tambe, Milind %X The Networked Distributed POMDPs (ND-POMDPs) framework can model multiagent systems in uncertain domains and has begun to scale up the number of agents. However, prior work in ND-POMDPs has failed to address communication. Without communication, the size of a local policy at each agent within the ND-POMDPs grows exponentially in the time horizon. To overcome this problem, we extend existing algorithms so that agents periodically communicate their observation and action histories with each other. After communication, agents can start from a new synchronized belief state. Thus, we can avoid the exponential growth in the size of local policies at agents. 
Furthermore, we introduce an idea that is similar to the Point-based Value Iteration algorithm to approximate the value function with a fixed number of representative points. Our experimental results show that we can obtain much longer policies than existing algorithms as long as the interval between communications is small. %B Journal of Web Intelligence and Agent Systems (WIAS) %V 8 %P 303-311 %G eng %N 3 %0 Conference Paper %B Book Chapter in 'Multi-Agent Systems for Education and Interactive Entertainment: Design, Use and Experience' %D 2010 %T Introducing Multiagent Systems to Undergraduates Through Games and Chocolate %A Bowring, Emma %A Tambe, Milind %X The field of “intelligent agents and multiagent systems” is maturing; no longer is it a special topic to be introduced to graduate students after years of training in computer science and many introductory courses in artificial intelligence. Instead, the time is ripe to introduce agents and multiagents directly to undergraduate students, whether majoring in computer science or not. This chapter focuses on exactly this challenge, drawing on the co-authors’ experience of teaching several such undergraduate courses on agents and multiagents over the last three years at two different universities. The chapter outlines three key issues that must be addressed. The first issue is facilitating students’ intuitive understanding of fundamental concepts of multiagent systems; we illustrate uses of science fiction materials and classroom games to provide students not only with the necessary intuitive understanding but also with the excitement and motivation for studying multiagent systems. The second is in selecting the right material — either science-fiction material or games — for providing students the necessary motivation and intuition; we outline several criteria that have been useful in selecting such material. 
The third issue is in educating students about the fundamental philosophical, ethical and social issues surrounding agents and multiagent systems: we outline course materials and classroom activities that allow students to obtain this “big picture” futuristic vision of our science. We conclude with feedback received, lessons learned and impact on both computer science students and non-computer-science students. %B Book Chapter in 'Multi-Agent Systems for Education and Interactive Entertainment: Design, Use and Experience' %G eng %0 Conference Paper %B Conference on Decision and Game Theory for Security %D 2010 %T Methods and Algorithms for Infinite Bayesian Stackelberg Security Games (Extended Abstract) %A Kiekintveld, Christopher %A Janusz Marecki %A Tambe, Milind %X Recently there has been significant interest in applications of game-theoretic analysis to security resource allocation decisions. Two examples of deployed systems based on this line of research are the ARMOR system in use at the Los Angeles International Airport [20], and the IRIS system used by the Federal Air Marshals Service [25]. Game analysis always begins by developing a model of the domain, often based on inputs from domain experts or historical data. These models inevitably contain significant uncertainty—especially in security domains where intelligence about adversary capabilities and preferences is very difficult to gather. In this work we focus on developing new models and algorithms that capture this uncertainty using continuous payoff distributions. These models are richer and more powerful than previous approaches that are limited to small finite Bayesian game models. We present the first algorithms for approximating equilibrium solutions in these games, and study these algorithms empirically. Our results show dramatic improvements over existing techniques, even in cases where there is very limited uncertainty about an adversary’s payoffs. 
%B Conference on Decision and Game Theory for Security %G eng %0 Conference Paper %B AAMAS 2010 Workshop on Optimisation in Multi-Agent Systems (OptMas) %D 2010 %T Optimal defender allocation for massive security games: A branch and price approach %A Jain, Manish %A Erim Kardes %A Kiekintveld, Christopher %A Ordonez, Fernando %A Tambe, Milind %X Algorithms to solve security games, an important class of Stackelberg games, have seen successful real-world deployment by LAX police and the Federal Air Marshal Service. These algorithms provide randomized schedules to optimally allocate limited security resources for infrastructure protection. Unfortunately, these state-of-the-art algorithms fail to scale up or to provide a correct solution for massive security games with arbitrary scheduling constraints. This paper provides ASPEN, a branch-and-price algorithm to overcome this limitation based on two key contributions: (i) A column-generation approach that exploits an innovative compact network flow representation, avoiding a combinatorial explosion of schedule allocations; (ii) A branch-and-bound approach with novel upper-bound generation via a fast algorithm for solving under-constrained security games. ASPEN is the first known method for efficiently solving real-world-sized security games with arbitrary schedules. This work contributes to a very new area of work that applies techniques used in large-scale optimization to game-theoretic problems—an exciting new avenue with the potential to greatly expand the reach of game theory. 
%B AAMAS 2010 Workshop on Optimisation in Multi-Agent Systems (OptMas) %G eng %0 Conference Paper %B AAMAS 2010 Workshop on Optimisation in Multi-Agent Systems (OptMas) %D 2010 %T Randomizing Security Activities with Attacker Circumvention Strategies %A Pita, James %A Kiekintveld, Christopher %A Michael Scott %A Tambe, Milind %X Game-theoretic methods for making resource allocation decisions in security domains have attracted growing attention from both researchers and security practitioners, including deployed applications at both the LAX airport and the Federal Air Marshals Service. We develop a new class of security games designed to model decisions faced by the Transportation Security Administration and other agencies in protecting airports, ports, and other critical infrastructure. Our model allows for a more diverse set of security activities for the defensive resources than previous work, which has generally focused on interchangeable resources that can only defend against possible attacks in one way. Here, we are concerned in particular with the possibility that adversaries can circumvent specific security activities if they are aware of common security measures. The model we propose takes this capability into account and generates more unpredictable, diverse security policies as a result—without resorting to an external value for entropy or randomness. Solving these games is a significant computational challenge, and existing algorithms are not capable of solving realistic games. We introduce a new method that exploits common structure in these problems to reduce the size of the game representation and enable faster solution algorithms. These algorithms are able to scale to much larger games than existing solvers, as we show in our experimental results. 
%B AAMAS 2010 Workshop on Optimisation in Multi-Agent Systems (OptMas) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [Short Paper] %D 2010 %T Robust Bayesian Methods for Stackelberg Security Games (Extended Abstract) %A Kiekintveld, Christopher %A Tambe, Milind %A Janusz Marecki %X Recent work has applied game-theoretic models to real-world security problems at the Los Angeles International Airport (LAX) and Federal Air Marshals Service (FAMS). The analysis of these domains is based on input from domain experts intended to capture the best available intelligence information about potential terrorist activities and possible security countermeasures. Nevertheless, these models are subject to significant uncertainty—especially in security domains where intelligence about adversary capabilities and preferences is very difficult to gather. This uncertainty presents significant challenges for applying game-theoretic analysis in these domains. Our experimental results show that standard solution methods based on perfect information assumptions are very sensitive to payoff uncertainty, resulting in low payoffs for the defender. We describe a model of Bayesian Stackelberg games that allows for general distributional uncertainty over the attacker’s payoffs. We conduct an experimental analysis of two algorithms for approximating equilibria of these games, and show that the resulting solutions give much better results than the standard approach when there is payoff uncertainty. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [Short Paper] %G eng %0 Journal Article %J Artificial Intelligence Journal %D 2010 %T Robust Solutions to Stackelberg Games: Addressing Bounded Rationality and Limited Observations in Human Cognition %A Pita, James %A Jain, Manish %A Ordonez, Fernando %A Tambe, Milind %A Kraus, Sarit %X How do we build algorithms for agent interactions with human adversaries? 
Stackelberg games are natural models for many important applications that involve human interaction, such as oligopolistic markets and security domains. In Stackelberg games, one player, the leader, commits to a strategy and the follower makes her decision with knowledge of the leader’s commitment. Existing algorithms for Stackelberg games efficiently find optimal solutions (leader strategy), but they critically assume that the follower plays optimally. Unfortunately, in many applications, agents face human followers (adversaries) who — because of their bounded rationality and limited observation of the leader strategy — may deviate from their expected optimal response. In other words, human adversaries’ decisions are biased due to their bounded rationality and limited observations. Not taking into account these likely deviations when dealing with human adversaries may cause an unacceptable degradation in the leader’s reward, particularly in security applications where these algorithms have seen deployment. The objective of this paper therefore is to investigate how to build algorithms for agent interactions with human adversaries. To address this crucial problem, this paper introduces a new mixed-integer linear program (MILP) for Stackelberg games to consider human adversaries, incorporating: (i) novel anchoring theories on human perception of probability distributions and (ii) robustness approaches for MILPs to address human imprecision. Since this new approach considers human adversaries, traditional proofs of correctness or optimality are insufficient; instead, it is necessary to rely on empirical validation. To that end, this paper considers four settings based on real deployed security systems at Los Angeles International Airport [43], and compares 6 different approaches (three based on our new approach and three previous approaches), in 4 different observability conditions, involving 218 human subjects playing 2960 games in total. 
The final conclusion is that a model which incorporates both the ideas of robustness and anchoring achieves statistically significantly higher rewards and also maintains equivalent or faster solution speeds compared to existing approaches. %B Artificial Intelligence Journal %V 174 %P 1142-1171 %G eng %N 15 %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI) %D 2010 %T Security Games with Arbitrary Schedules: A Branch and Price Approach %A Jain, Manish %A Erim Kardes %A Kiekintveld, Christopher %A Tambe, Milind %A Ordonez, Fernando %X Security games, an important class of Stackelberg games, are used in deployed decision-support tools by LAX police and the Federal Air Marshals Service. The algorithms used to solve these games find optimal randomized schedules to allocate security resources for infrastructure protection. Unfortunately, the state-of-the-art algorithms either fail to scale or fail to provide a correct solution for large problems with arbitrary scheduling constraints. We introduce ASPEN, a branch-and-price approach that overcomes these limitations based on two key contributions: (i) A column-generation approach that exploits a novel network flow representation, avoiding a combinatorial explosion of schedule allocations; (ii) A branch-and-bound algorithm that generates bounds via a fast algorithm for solving security games with relaxed scheduling constraints. ASPEN is the first known method for efficiently solving massive security games with arbitrary schedules. %B National Conference on Artificial Intelligence (AAAI) %G eng %0 Journal Article %J Interfaces %D 2010 %T Software Assistants for patrol planning at LAX and Federal Air Marshals Service %A Jain, Manish %A Pita, James %A Tsai, Jason %A Kiekintveld, Christopher %A Shyamsunder Rathi %A Ordonez, Fernando %A Tambe, Milind %X Security at major locations of economic or political importance is a key concern around the world, particularly given the increasing threat of terrorism.
Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in patrolling or monitoring, e.g. they can plan an attack that avoids existing patrols. An important method of countering the surveillance capabilities of an adversary is to use randomized security policies that are more difficult to predict and exploit. We describe two deployed applications that assist security forces in randomizing their operations based on fast algorithms for solving large instances of Bayesian Stackelberg games. The first is the ARMOR system (Assistant for Randomized Monitoring over Routes), which has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX). This system is used by airport police to randomize the placement of checkpoints on roads entering the airport, and the routes of canine unit patrols in the airport terminals. The IRIS system (Intelligent Randomization in Scheduling) is designed to randomize flight schedules for the Federal Air Marshals Service (FAMS). IRIS has been deployed in a pilot program by FAMS since October 2009 to randomize schedules of air marshals on international flights. These assistants share several key features: (i) they are based on Stackelberg game models to intelligently weight the randomized schedules, (ii) they use efficient mixed-integer programming formulations of the game models to enable fast solutions for large games, and (iii) they allow for interactive manipulation of the domain constraints and parameters by the users. This paper examines the design choices, information, and evaluation that went into building these effective applications. %B Interfaces %V 40 %P 267-290 %G eng %N 4 %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2010 %T Stackelberg vs. 
Nash in Security Games: Interchangeability, Equivalence, and Uniqueness %A Yin, Zhengyu %A Dmytro Korzhyk %A Kiekintveld, Christopher %A Vincent Conitzer %A Tambe, Milind %X There has been significant recent interest in game theoretic approaches to security, with much of the recent research focused on utilizing the leader-follower Stackelberg game model; for example, these games are at the heart of major applications such as the ARMOR program deployed for security at the LAX airport since 2007 and the IRIS program in use by the US Federal Air Marshals (FAMS). The foundational assumption for using Stackelberg games is that security forces (leaders), acting first, commit to a randomized strategy, while their adversaries (followers) choose their best response after surveillance of this randomized strategy. Yet, in many situations, the followers may act without observation of the leader’s strategy, essentially converting the game into a simultaneous-move game model. Previous work fails to address how a leader should compute her strategy given this fundamental uncertainty about the type of game faced. Focusing on the complex games that are directly inspired by real-world security applications, the paper provides four contributions in the context of a general class of security games. First, exploiting the structure of these security games, the paper shows that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem. Second, resolving the leader’s dilemma, it shows that under a natural restriction on security games, any Stackelberg strategy is also a Nash equilibrium strategy; and furthermore, the solution is unique in a class of real-world security games of which ARMOR is a key exemplar. Third, when faced with a follower that can attack multiple targets, many of these properties no longer hold. Fourth, our experimental results emphasize positive properties of games that do not fit our restrictions.
Our contributions have major implications for real-world applications. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B AAAI 2010 Workshop on Interactive Decision Theory and Game Theory (IDTGT) %D 2010 %T Teamwork and Coordination under Model Uncertainty in DEC-POMDPs %A Kwak, Jun-young %A Yang, Rong %A Yin, Zhengyu %A Matthew E. Taylor %A Tambe, Milind %X Distributed Partially Observable Markov Decision Processes (DEC-POMDPs) are a popular planning framework for multiagent teamwork to compute (near-)optimal plans. However, these methods assume a complete and correct world model, which is often violated in real-world domains. We provide a new algorithm for DEC-POMDPs that is more robust to model uncertainty, with a focus on domains with sparse agent interactions. Our STC algorithm relies on the following key ideas: (1) reduce planning-time computation by shifting some of the burden to execution-time reasoning, (2) exploit sparse interactions between agents, and (3) maintain an approximate model of agents’ beliefs. We empirically show that STC is often substantially faster than existing DEC-POMDP methods without sacrificing reward performance. %B AAAI 2010 Workshop on Interactive Decision Theory and Game Theory (IDTGT) %G eng %0 Conference Paper %B DCR 10 %D 2010 %T Towards a Theoretic Understanding of DCEE %A Scott Alfeld %A Matthew E. Taylor %A Tandon, Prateek %A Tambe, Milind %X Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phenomenon has been shown empirically in simulation, but the underlying mathematics are not yet understood.
By understanding the mathematics, we could develop algorithms that reduce or eliminate this penalty of increased teamwork. In this paper we investigate the team uncertainty penalty on two fronts. First, we provide results of robots exhibiting the same behavior seen in simulations. Second, we present a mathematical foundation by which to analyze the phenomenon. Using this model, we present findings indicating that the team uncertainty penalty is inherent to the level of teamwork allowed, rather than to specific algorithms. %B DCR 10 %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI) %D 2010 %T Urban Security: Game-Theoretic Resource Allocation in Networked Physical Domains %A Tsai, Jason %A Yin, Zhengyu %A Kwak, Jun-young %A David Kempe %A Kiekintveld, Christopher %A Tambe, Milind %X Law enforcement agencies frequently must allocate limited resources to protect targets embedded in a network, such as important buildings in a city road network. Since intelligent attackers may observe and exploit patterns in the allocation, it is crucial that the allocations be randomized. We cast this problem as an attacker-defender Stackelberg game: the defender’s goal is to obtain an optimal mixed strategy for allocating resources. The defender’s strategy space is exponential in the number of resources, and the attacker’s exponential in the network size. Existing algorithms are therefore useless for all but the smallest networks. We present a solution approach based on two key ideas: (i) A polynomial-sized game model obtained via an approximation of the strategy space, solved efficiently using a linear program; (ii) Two efficient techniques that map solutions from the approximate game to the original, with proofs of correctness under certain assumptions. We present in-depth experimental results, including an evaluation on part of the Mumbai road network. 
%B National Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2010 %T When Should There be a "Me" in "Team"? Distributed Multi-Agent Optimization Under Uncertainty %A Matthew E. Taylor %A Jain, Manish %A Yanquin Jin %A Makoto Yokoo %A Tambe, Milind %X Increasing teamwork between agents typically increases the performance of a multi-agent system, at the cost of increased communication and higher computational complexity. This work examines joint actions in the context of a multi-agent optimization problem where agents must cooperate to balance exploration and exploitation. Surprisingly, results show that increased teamwork can hurt agent performance, even when communication and computation costs are ignored, which we term the team uncertainty penalty. This paper introduces this phenomenon, analyzes it, and presents algorithms to reduce the effect of the penalty in our problem setting. %B International Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications. QRASA-2009 %D 2009 %T Is There a Chink in Your ARMOR? Towards Robust Evaluations for Deployed Security Systems %A Matthew E. Taylor %A Chris Kiekintveld %A Craig Western %A Tambe, Milind %X A growing number of security applications, designed to reduce risk from adversaries’ actions, are being developed and deployed. However, there are many challenges when attempting to evaluate such systems, both in the lab and in the real world. Traditional evaluations used by computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant. The primary contribution of this paper is to provide a preliminary framework which can guide the evaluation of such systems and to apply the framework to the evaluation of ARMOR (a system deployed at LAX since August 2007).
This framework helps determine what experiments could, and should, be run in order to measure a system’s overall utility. A secondary contribution of this paper is to help familiarize our community with some of the difficulties inherent in evaluating deployed applications, focusing on those in security domains. %B IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications. QRASA-2009 %G eng %0 Conference Paper %B Workshop on Emergency Management: Incident, Resource, and Supply Chain Management (EMWS09) %D 2009 %T Agent-based Evacuation Modeling: Simulating the Los Angeles International Airport %A Tsai, Jason %A Bowring, Emma %A Shira Epstein %A Natalie Fridman %A Prakhar Garg %A Gal Kaminka %A Andrew Ogden %A Tambe, Milind %A Matthew Taylor %X In the aftermath of a terrorist attack on a large public venue such as an airport, a train station, or a theme park, rapid but safe evacuation is critical. For example, multiple IED explosions at the Los Angeles International Airport (LAX) would require evacuation of thousands of travelers and airport staff, while mitigating the risks from possible secondary explosions. In such cases, it is crucial to obtain information on the fastest evacuation routes, the time needed to evacuate people, the lag between the disaster and the arrival of a bomb squad or other emergency personnel, and what tactics and procedures authorities can use to avoid stampedes, confusion, and loss of life. By understanding possible scenarios in simulation beforehand, emergency personnel can be trained so that they respond in the event of an actual evacuation. Unfortunately, conducting live exercises for evacuating thousands of people is generally impossible. It would be time-consuming and would result in tremendous lost revenue. Moreover, a staged evacuation would necessarily miss crucial aspects of the required response (e.g.
fear, confusion) on the part of both emergency personnel and crowd evacuees, or the exercise would be considered unethical. Smaller-scale evacuation exercises miss the most important feature of an evacuation: its massive scale. Simulations provide one attractive answer. Evacuation simulations allow us to meet the required training, evacuation planning, and tactics development goals; by running large numbers of “faster than real-time” simulations, we can obtain data from large numbers of scenarios that would be near-impossible in live exercises. This information can be used by policy-makers to predetermine optimal solutions to time-critical situations such as those involving injuries, IEDs, or active shooters. Additionally, real-time simulations can be created for officers on the ground who may only see a handful of real emergencies in their careers and thus could benefit immensely from repeated scenario-style training tools to learn with. In building these simulations, we bring to bear over two decades of experience in agent-based simulations, including battlefield simulations of battalions of virtual helicopter pilots or teams of virtual fighter pilots for DARPA’s Synthetic Theater of War program and disaster response simulations in collaboration with the Los Angeles Fire Department. %B Workshop on Emergency Management: Incident, Resource, and Supply Chain Management (EMWS09) %G eng %0 Book Section %B Protecting Airline Passengers in the Age of Terrorism %D 2009 %T ARMOR Software: A game theoretic approach to airport security %A J. Pita %A M. Jain %A C. Western %A P. Paruchuri %A J. Marecki %A M. Tambe %A F. Ordonez %A Kraus, S. %E P. Seidenstat %X Protecting national infrastructure, such as airports, is a challenging task for police and security agencies around the world; a challenge that is exacerbated by the threat of terrorism.
Such protection of these important locations includes, but is not limited to, tasks such as monitoring all entrances or inbound roads and checking inbound traffic. However, limited resources imply that it is typically impossible to provide full security coverage at all times. Furthermore, adversaries can observe security arrangements over time and exploit any predictable patterns to their advantage. Randomizing schedules for patrolling, checking, or monitoring is thus an important tool in the police arsenal to avoid the vulnerability that comes with predictability. This chapter focuses on a deployed software assistant agent that can aid police or other security agencies in randomizing their security schedules. We face at least three key challenges in building such a software assistant. First, the assistant must provide quality guarantees in randomization by appropriately weighing the costs and benefits of the different options available. For example, if an attack on one part of an infrastructure will cause economic damage while an attack on another could potentially cost human lives, we must weigh the two options differently – giving higher weight (probability) to guarding the latter. Second, the assistant must address the uncertainty in information that security forces have about the adversary. Third, the assistant must enable a mixed-initiative interaction with potential users rather than dictating a schedule; the assistant may be unaware of users’ real-world constraints and hence users must be able to shape the schedule development. We have addressed these challenges in a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes). Based on game-theoretic principles, ARMOR combines three key features to address each of the challenges outlined above. Game theory is a well-established foundational principle within multi-agent systems to reason about multiple agents each pursuing their own interests (Fudenberg & Tirole 1991). 
We build on these game theoretic foundations to reason about two agents – the police force and their adversary – in providing a method of randomization. In particular, the main contribution of our work is mapping the problem of security scheduling to a Bayesian Stackelberg game (Conitzer & Sandholm 2006) and solving it within our software system using the fastest optimal algorithm for such games (Paruchuri et al. 2008), addressing the first two challenges. A Bayesian game allows us to address uncertainty over adversary types, and by optimally solving such Bayesian Stackelberg games (which yield optimal randomized strategies as solutions), ARMOR provides quality guarantees on the schedules generated. The algorithm used builds on several years of research regarding multi-agent systems and security (Paruchuri et al. 2005; 2006; 2007). ARMOR employs an algorithm that is a logical culmination of this line of research; in particular, ARMOR relies on an optimal algorithm called DOBSS (Decomposed Optimal Bayesian Stackelberg Solver) (Paruchuri et al. 2008). The third challenge is addressed by ARMOR’s use of a mixed-initiative interface, where users are allowed to graphically enter different constraints to shape the schedule generated. ARMOR is thus a collaborative assistant that iterates over generated schedules rather than a rigid one-shot scheduler. ARMOR also alerts users in case overrides may potentially deteriorate schedule quality. ARMOR therefore represents a very promising transition of multi-agent research into a deployed application. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to assist the Los Angeles World Airport (LAWA) police in randomized scheduling of checkpoints, and since November 2007 for generating randomized patrolling schedules for canine units. In particular, it assists police in determining where to randomly set up checkpoints and where to randomly allocate canines to terminals.
Indeed, February 2008 marked the successful end of the six-month trial period of ARMOR deployment at LAX. The feedback from police at the end of this six-month period is extremely positive; ARMOR will continue to be deployed at LAX and expand to other police activities at LAX. %B Protecting Airline Passengers in the Age of Terrorism %I Praeger Publishers %G eng %0 Conference Paper %B Educational Uses of Multi-Agent Systems (EDUMAS) %D 2009 %T Bridging the Gap: Introducing Agents and Multiagent Systems to Undergraduate Students %A Bowring, Emma %A Tambe, Milind %X The field of “intelligent agents and multiagent systems” is maturing; no longer is it a special topic to be introduced to graduate students after years of training in computer science and many introductory courses in Artificial Intelligence. Instead, the time is ripe to face the challenge of introducing agents and multiagents directly to undergraduate students, whether majoring in computer science or not. This paper focuses on exactly this challenge, drawing on the co-authors’ experience of teaching several such undergraduate courses on agents and multiagents, over the last three years at two different universities. The paper outlines three key issues that must be addressed. The first issue is facilitating students’ intuitive understanding of fundamental concepts of multiagent systems; we illustrate uses of science fiction materials and classroom games to not only provide students with the necessary intuitive understanding but with the excitement and motivation for studying multiagent systems. The second is in selecting the right material — either science-fiction material or games — for providing students the necessary motivation and intuition; we outline several criteria that have been useful in selecting such material.
The third issue is in educating students about the fundamental philosophical, ethical and social issues surrounding agents and multiagent systems: we outline course materials and classroom activities that allow students to obtain this “big picture” futuristic vision of our science. We conclude with feedback received, lessons learned and impact on both computer science and non-computer-science students. %B Educational Uses of Multi-Agent Systems (EDUMAS) %G eng %0 Conference Paper %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %D 2009 %T Computing Optimal Randomized Resource Allocations for Massive Security Games %A Kiekintveld, Christopher %A Jain, Manish %A Tsai, Jason %A Pita, James %A Ordóñez, Fernando %A Tambe, Milind %X Predictable allocations of security resources such as police officers, canine units, or checkpoints are vulnerable to exploitation by attackers. Recent work has applied game-theoretic methods to find optimal randomized security policies, including a fielded application at the Los Angeles International Airport (LAX). This approach has promising applications in many similar domains, including police patrolling for subway and bus systems, randomized baggage screening, and scheduling for the Federal Air Marshal Service (FAMS) on commercial flights. However, the existing methods scale poorly when the security policy requires coordination of many resources, which is central to many of these potential applications. We develop new models and algorithms that scale to much more complex instances of security games. The key idea is to use a compact model of security games, which allows exponential improvements in both memory and runtime relative to the best known algorithms for solving general Stackelberg games. We develop even faster algorithms for security games under payoff restrictions that are natural in many security domains.
Finally, we introduce additional realistic scheduling constraints while retaining comparable performance improvements. The empirical evaluation comprises both random data and realistic instances of the FAMS and LAX problems. Our new methods scale to problems several orders of magnitude larger than the fastest known algorithm. %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2009 %T DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks %A Jain, Manish %A Matthew Taylor %A Tambe, Milind %A Makoto Yokoo %X Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.
%B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %D 2009 %T Effective Solutions for Real-World Stackelberg Games: When Agents Must Deal with Human Uncertainties %A Pita, James %A Jain, Manish %A Ordóñez, Fernando %A Tambe, Milind %A Kraus, Sarit %A Reuma Magori-Cohen %X How do we build multiagent algorithms for agent interactions with human adversaries? Stackelberg games are natural models for many important applications that involve human interaction, such as oligopolistic markets and security domains. In Stackelberg games, one player, the leader, commits to a strategy and the follower makes their decision with knowledge of the leader’s commitment. Existing algorithms for Stackelberg games efficiently find optimal solutions (leader strategy), but they critically assume that the follower plays optimally. Unfortunately, in real-world applications, agents face human followers who — because of their bounded rationality and limited observation of the leader strategy — may deviate from their expected optimal response. Not taking into account these likely deviations when dealing with human adversaries can cause an unacceptable degradation in the leader’s reward, particularly in security applications where these algorithms have seen real-world deployment. To address this crucial problem, this paper introduces three new mixed-integer linear programs (MILPs) for Stackelberg games to consider human followers, incorporating: (i) novel anchoring theories on human perception of probability distributions and (ii) robustness approaches for MILPs to address human imprecision. Since these new approaches consider human followers, traditional proofs of correctness or optimality are insufficient; instead, it is necessary to rely on empirical validation. 
To that end, this paper considers two settings based on real deployed security systems, and compares 6 different approaches (three new and three previous approaches), in 4 different observability conditions, involving 98 human subjects playing 1360 games in total. The final conclusion was that a model which incorporates both the ideas of robustness and anchoring achieves statistically significantly better rewards and also maintains equivalent or faster solution speeds compared to existing approaches. %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B International conference on automated planning and scheduling %D 2009 %T Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping %A Varakantham, Pradeep %A Kwak, Jun-young %A Matthew Taylor %A Janusz Marecki %A Paul Scerri %A Tambe, Milind %X Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single-agent POMDP solver and use social model shaping to implicitly coordinate with other agents.
Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality. %B International conference on automated planning and scheduling %G eng %0 Conference Paper %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %D 2009 %T Improving Adjustable Autonomy Strategies for Time-Critical Domains %A Nathan Schurr %A Janusz Marecki %A Tambe, Milind %X As agents begin to perform complex tasks alongside humans as collaborative teammates, it becomes crucial that the resulting human-multiagent teams adapt to time-critical domains. In such domains, adjustable autonomy has proven useful by allowing for a dynamic transfer of control of decision making between humans and agents. However, existing adjustable autonomy algorithms commonly discretize time, which not only results in high algorithm runtimes but also translates into inaccurate transfer of control policies. In addition, existing techniques fail to address decision making inconsistencies often encountered in human multiagent decision making. To address these limitations, we present a novel approach for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT) that makes three contributions: First, we apply a continuous-time planning paradigm to adjustable autonomy, resulting in high-accuracy transfer of control policies. Second, our new adjustable autonomy framework both models and plans for the resolving of inconsistencies between human and agent decisions. Third, we introduce a new model, Interruptible Action Time-dependent Markov Decision Problem (IA-TMDP), which allows for actions to be interrupted at any point in continuous time. We show how to solve IA-TMDPs efficiently and leverage them to plan for the resolving of inconsistencies in RIAACT. Furthermore, these contributions have been realized and evaluated in a complex disaster response simulation system.
%B The Eighth International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B The Eighth International Conference on Autonomous Agents and Multiagent Systems - Industry Track %D 2009 %T IRIS - A Tool for Strategic Security Allocation in Transportation Networks %A Tsai, Jason %A Shyamsunder Rathi %A Kiekintveld, Christopher %A Ordonez, Fernando %A Tambe, Milind %X Security is a concern of major importance to governments and companies throughout the world. With limited resources, complete coverage of potential points of attack is not possible. Deterministic allocation of available law enforcement agents introduces predictable vulnerabilities that can be exploited by adversaries. Strategic randomization is a game theoretic alternative that we implement in the Intelligent Randomization In Scheduling (IRIS) system, a software scheduling assistant for the Federal Air Marshals (FAMs) who provide law enforcement aboard U.S. commercial flights. In IRIS, we model the problem as a Stackelberg game, with FAMS as leaders that commit to a flight coverage schedule and terrorists as followers that attempt to attack a flight. The FAMS domain presents three challenges unique to transportation network security that we address in the implementation of IRIS. First, with tens of thousands of commercial flights per day, the size of the Stackelberg game we need to solve is tremendous. We use ERASER-C, the fastest known algorithm for solving this class of Stackelberg games. Second, creating the game itself becomes a challenge due to the number of payoffs we must enter for these large games. To address this, we create an attribute-based preference elicitation system to determine reward values. Third, the complex scheduling constraints in transportation networks make it computationally prohibitive to model the game by explicitly modeling all combinations of valid schedules.
Instead, we model the leader’s strategy space by incorporating a representation of the underlying scheduling constraints. The scheduling assistant has been delivered to the FAMS and is currently undergoing testing and review for possible incorporation into their scheduling practices. In this paper, we discuss the design choices and challenges encountered during the implementation of IRIS. %B The Eighth International Conference on Autonomous Agents and Multiagent Systems - Industry Track %G eng %0 Conference Paper %B AAMAS 2009 Workshop on Optimisation in Multi-Agent Systems (OptMas) %D 2009 %T Local Optimal Solutions for DCOP: New Criteria, Bound, and Algorithm %A Yin, Zhengyu %A Kiekintveld, Christopher %A Kumar, Atul %A Tambe, Milind %X Distributed constraint optimization (DCOP) is a popular formalism for modeling cooperative multi-agent systems. In large-scale networks, finding a global optimum using complete algorithms is often impractical, which leads to the study of incomplete algorithms. Traditionally, incomplete algorithms can only find locally optimal solutions with no quality guarantees. Recent work on k-size-optimality has established bounds on solution quality, but size is not the only criterion for forming local optimization groups. In addition, there is only one algorithm for computing solutions for arbitrary k and it is quite inefficient. We introduce t-distance-optimality, which offers an alternative way to specify optimization groups. We establish bounds for this criterion that are often tighter than those for k-optimality. We then introduce an asynchronous local search algorithm for t-distance-optimality. We implement and evaluate the algorithm for both t- and k-optimality, which offers significant improvements over KOPT – the only existing algorithm for k-size-optimality.
Our experiments show that t-distance-optimality converges more quickly and to better solutions than k-size-optimality in scale-free graphs, but k-size-optimality has advantages for random graphs. %B AAMAS 2009 Workshop on Optimisation in Multi-Agent Systems (OptMas) %G eng %0 Conference Paper %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %D 2009 %T Planning with Continuous Resources for Agent Teams %A Janusz Marecki %A Tambe, Milind %X Many problems of multiagent planning under uncertainty require distributed reasoning with continuous resources and resource limits. Decentralized Markov Decision Problems (Dec-MDPs) are well-suited to address such problems, but unfortunately, prior Dec-MDP approaches either discretize resources at the expense of speed and quality guarantees, or avoid discretization only by limiting agents’ action choices or interactions (e.g. assumption of transition independence). To address these shortcomings, this paper proposes M-DPFP, a novel algorithm for planning with continuous resources for agent teams, with three key features: (i) it maintains the agent team interaction graph to identify and prune suboptimal policies and to allow the agents to be transition dependent, (ii) it operates in a continuous space of probability functions to provide an error bound on the solution quality, and finally (iii) it focuses the search for policies on the most relevant parts of this search space to allow for a systematic trade-off of solution quality for speed. Our experiments show that M-DPFP finds high quality solutions and exhibits superior performance when compared with a discretization-based approach. We also show that M-DPFP is applicable to solving problems that are beyond the scope of existing approaches. %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Journal Article %J ACM SIGecom Exchanges %D 2009 %T Security Applications: Lessons of Real-World Deployment %A J. 
Pita %A M. Jain %A C. Kiekintveld %A H. Bellamane %A J. Tsai %A M. Tambe %A F. Ordonez %X Game theory has played an important role in security decisions. Recent work using Stackelberg games [Fudenberg and Tirole 1991] to model security domains has been particularly influential [Basilico et al. 2009; Kiekintveld et al. 2009; Paruchuri et al. 2008; Pita et al. 2008; Pita et al. 2009]. In a Stackelberg game, a leader (in this case the defender) acts first and commits to a randomized security policy. The follower (attacker) optimizes its reward considering the strategy chosen by the leader. These games are well-suited to representing the problem security forces face in allocating limited resources, such as officers, canine units, and checkpoints. In particular, the fact that the attacker is able to observe the policy reflects the way real terrorist organizations plan attacks using extensive surveillance and long planning cycles. Stackelberg game models are not just theoretical models; they are at the heart of deployed decision-support software now in use by the Los Angeles World Airport (LAWA) police and the United States Federal Air Marshals Service (FAMS). A new application is under development for the Transportation Security Administration (TSA), also using game-theoretic analysis. Moving from theoretical analysis to applying game theory in real applications posed many new challenges, and there remain many open questions to be solved in this exciting area of work. In this article we will highlight several of the main issues that have come up, including (i) developing efficient algorithms to solve large-scale Stackelberg Security Games, (ii) evaluating deployed security systems, (iii) knowledge acquisition from security experts to specify the game models, and (iv) handling mixed-initiative interactions. We begin with an overview of the deployed systems and then discuss these issues in turn. 
%B ACM SIGecom Exchanges %V 8 %G eng %N 2 %0 Conference Paper %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %D 2009 %T Sensitivity analysis for distributed optimization with resource constraints %A Bowring, Emma %A Yin, Zhengyu %A Rob Zinkov %A Tambe, Milind %X Previous work in multiagent coordination has addressed the challenge of planning in domains where agents must optimize a global goal, while satisfying local resource constraints. However, the imposition of resource constraints naturally raises the question of whether the agents could significantly improve their team performance if a few more resources were made available. Sensitivity analysis aims to answer that question. This paper focuses on sensitivity analysis in the context of the distributed coordination framework, Multiply-Constrained DCOP (MC-DCOP). There are three main challenges in performing sensitivity analysis: (i) to perform it in a distributed fashion, (ii) to avoid re-solving an NP-hard MC-DCOP optimization from scratch, and (iii) to avoid considering unproductive uses for extra resources. To meet these challenges, this paper presents three types of locally optimal algorithms: link analysis, local reoptimization and local constraint propagation. These algorithms are distributed and avoid redundant computation by ascertaining just the effects of local perturbations on the original problem. Deploying our algorithms on a large number of MC-DCOP problems revealed several results. While our cheapest algorithm successfully identified quality improvements for a few problems, our more complex techniques were necessary to identify the best uses for additional resources. 
Furthermore, we identified two heuristics that can help identify a priori which agents might benefit most from additional resources: density rank, which works well when nodes received identical resources, and remaining resource rank, which works well when nodes received resources based on the number of neighbors they had. %B The Eighth International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications %D 2009 %T Strategic Security Placement in Network Domains with Applications to Transit Security %A Tsai, Jason %A Yin, Zhengyu %A Kwak, Jun-young %A David Kempe %A Kiekintveld, Christopher %A Tambe, Milind %X Deterministic placement of security personnel creates serious vulnerabilities for any organization attempting to prevent intrusion. Recent work in use at the Los Angeles International Airport (LAX) and in progress with the United States Federal Air Marshal Service (FAMS) has applied game-theoretic analysis to the problem by modeling it as a Stackelberg game wherein security forces are the leaders that commit to a strategy that is observed and countered by attackers. In this work, we explore efficient techniques for performing the same analysis on games with a graph structure, wherein an attacker must follow a path from an entry point to a target. If we frame these problems in the straightforward manner with leader actions being sets of edges that can be guarded and follower actions being paths from entry to targets, the size of the game increases exponentially, quickly reaching memory limitations when using general Stackelberg solvers. We propose a novel linear program that is able to solve this type of problem efficiently. While it provides exact solutions for games where only one checkpoint is allowed, it is an approximation in the general case. 
Finally, we compare the performance of this and other methods by generating optimal policies for the Seoul Metropolitan Subway in Seoul, South Korea. %B IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications %G eng %0 Conference Paper %B IJCAI 2009 Workshop on Distributed Constraint Reasoning (DCR 2009) %D 2009 %T Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains %A Matthew E. Taylor %A Jain, Manish %A Tandon, Prateek %A Tambe, Milind %X Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems. %B IJCAI 2009 Workshop on Distributed Constraint Reasoning (DCR 2009) %G eng %0 Magazine Article %D 2009 %T Using Game Theory for Los Angeles Airport Security %A Pita, James %A Jain, Manish %A Ordonez, Fernando %A Christopher Portway %A Tambe, Milind %A Craig Western %A Praveen Paruchuri %A Kraus, Sarit %X Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. 
Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR. %B AI Magazine %V 30 %P 43-57 %G eng %N 1 %0 Journal Article %J Journal of Information Technology and Management (ITM) %D 2009 %T Coordinating randomized policies for increasing security of agent systems %A P. Paruchuri %A J. Pearce %A J. Marecki %A M. Tambe %A F. Ordonez %A Kraus, S. %X We consider the problem of providing decision support to a patrolling or security service in an adversarial domain. The idea is to create patrols that can achieve a high level of coverage or reward while taking into account the presence of an adversary. We assume that the adversary can learn or observe the patrolling strategy and use this to its advantage. We follow two different approaches depending on what is known about the adversary. 
If there is no information about the adversary, we use a Markov Decision Process (MDP) to represent patrols and identify randomized solutions that minimize the information available to the adversary. This led to the development of algorithms CRLP and BRLP for policy randomization of MDPs. Second, when there is partial information about the adversary, we decide on efficient patrols by solving a Bayesian Stackelberg game. Here, the leader decides first on a patrolling strategy and then an adversary, of possibly many adversary types, selects its best response for the given patrol. We provide two efficient MIP formulations named DOBSS and ASAP to solve this NP-hard problem. Our experimental results show the efficiency of these algorithms and illustrate how these techniques provide optimal and secure patrolling policies. Note that DOBSS is at the heart of the ARMOR system that is currently deployed at the Los Angeles International Airport (LAX) for randomizing checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. Key words: Multiagent Systems, Decision Theory, Game Theory, Security, Randomized Policies %B Journal of Information Technology and Management (ITM) %V 10 %P 67-79 %G eng %N 1 %0 Conference Paper %B AAAI Intelligent Systems Demonstrations %D 2008 %T ARMOR Security for Los Angeles International Airport %A J. Pita %A Jain, Manish %A Ordonez, Fernando %A Christopher Portway %A Tambe, Milind %A Craig Western %A Praveen Paruchuri %A Kraus, Sarit %X Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. 
Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this demonstration showcases a promising transition of the latest in multi-agent algorithms into a deployed application. In particular, it exhibits a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines two key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. %B AAAI Intelligent Systems Demonstrations %G eng %0 Conference Paper %B SIGecom Exchanges %D 2008 %T Bayesian Stackelberg Games and their Application for Security at Los Angeles International Airport %A Jain, Manish %A J . Pita %A Tambe, Milind %A Ordonez, Fernando %A Praveen Paruchuri %A Kraus, Sarit %X Many multiagent settings are appropriately modeled as Stackelberg games [Fudenberg and Tirole 1991; Paruchuri et al. 2007], where a leader commits to a strategy first, and then a follower selfishly optimizes its own reward, considering the action chosen by the leader. Stackelberg games are commonly used to model attacker-defender scenarios in security domains [Brown et al. 2006] as well as in patrolling [Paruchuri et al. 2007; Paruchuri et al. 2008]. 
For example, security personnel patrolling an infrastructure commit to a patrolling strategy first, before their adversaries act, taking this committed strategy into account. Indeed, Stackelberg games are being used at the Los Angeles International Airport to schedule security checkpoints and canine patrols [Murr 2007; Paruchuri et al. 2008; Pita et al. 2008a]. They could potentially be used in network routing, pricing in transportation systems and many other situations [Korilis et al. 1997; Cardinal et al. 2005]. Although the follower in a Stackelberg game is allowed to observe the leader’s strategy before choosing its own strategy, the leader often has an advantage over the case where both players must choose their moves simultaneously. To see the advantage of being the leader in a Stackelberg game, consider the game with the payoffs shown in Table I. The leader is the row player and the follower is the column player. The only pure-strategy Nash equilibrium for this game is when the leader plays a and the follower plays c, which gives the leader a payoff of 2. However, if the leader commits to a mixed strategy of playing a and b with equal (0.5) probability, then the follower will play d, leading to an expected payoff for the leader of 3.5. %B SIGecom Exchanges %G eng %0 Conference Paper %B International Joint Conference on Autonomous Agents and Multiagent Systems, 2008 %D 2008 %T Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport %A J. Pita %A Jain, Manish %A Craig Western %A Christopher Portway %A Tambe, Milind %A Ordonez, Fernando %A Kraus, Sarit %A Praveen Paruchuri %X Security at major locations of economic or political importance is a key concern around the world, particularly given the threat of terrorism. 
Limited security resources prevent full security coverage at all times, which allows adversaries to observe and exploit patterns in selective patrolling or monitoring, e.g. they can plan an attack avoiding existing patrols. Hence, randomized patrolling or monitoring is important, but randomization must provide distinct weights to different actions based on their complex costs and benefits. To this end, this paper describes a promising transition of the latest in multi-agent algorithms – in fact, an algorithm that represents a culmination of research presented at AAMAS – into a deployed application. In particular, it describes a software assistant agent called ARMOR (Assistant for Randomized Monitoring over Routes) that casts this patrolling/monitoring problem as a Bayesian Stackelberg game, allowing the agent to appropriately weigh the different actions in randomization, as well as uncertainty over adversary types. ARMOR combines three key features: (i) It uses the fastest known solver for Bayesian Stackelberg games called DOBSS, where the dominant mixed strategies enable randomization; (ii) Its mixed-initiative based interface allows users to occasionally adjust or override the automated schedule based on their local constraints; (iii) It alerts the users if mixed-initiative overrides appear to degrade the overall desired randomization. ARMOR has been successfully deployed since August 2007 at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals. This paper examines the information, design choices, challenges, and evaluation that went into designing ARMOR. %B International Joint Conference on Autonomous Agents and Multiagent Systems, 2008 %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI) %D 2008 %T Efficient Algorithms to solve Bayesian Stackelberg Games for Security Applications %A Praveen Paruchuri %A Jonathan P. 
Pearce %A Janusz Marecki %A Tambe, Milind %A Ordonez, Fernando %A Kraus, Sarit %X In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the adversary/follower) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the type of the adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and an adversary (follower) can observe this strategy over time before choosing where to attack. We present here two different MIP formulations, ASAP (providing approximate policies with controlled randomization) and DOBSS (providing optimal policies), for Bayesian Stackelberg games. DOBSS is currently the fastest optimal procedure for Bayesian Stackelberg games and is in use by police at the Los Angeles International Airport (LAX) to schedule their activities. %B National Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B IEEE International Conference on Intelligent Agent Technology (IAT) %D 2008 %T Introducing Communication in Dis-POMDPs with Locality of Interaction %A M. Tasaki %A Y. Yabu %A Y. Iwanari %A M. Yokoo %A M. Tambe %A J. Marecki %A P. Varakantham %X While Distributed POMDPs have become popular for modeling multiagent systems in uncertain domains, it is the Networked Distributed POMDPs (ND-POMDPs) model — a model tailored to real agent networks — that has begun to scale up the number of agents. However, prior work in ND-POMDPs has failed to address communication, a shortcoming that has the side-effect of limiting the planning horizon. In particular, without communication, the size of a local policy at each agent within ND-POMDPs grows exponentially in the time horizon. 
To overcome this problem, we extend existing algorithms (LID-JESP and SPIDER) so that agents periodically communicate their observation and action histories with each other. After communication, agents can start from new synchronized belief states. While introducing communication avoids the exponential growth in the size of local policies at agents, the key idea is to avoid an exponential number of synchronized belief states after communication. To this end, we introduce an idea that is similar to the Point-based Value Iteration (PBVI) algorithm and approximate the value function with a fixed number of representative points and their α vectors. Our experimental results show that we can obtain much longer policies than existing algorithms as long as the interval between communications is small. %B IEEE International Conference on Intelligent Agent Technology (IAT) %G eng %0 Conference Paper %B The Seventh International Conference on Autonomous Agents and Multiagent Systems %D 2008 %T On K-Optimal Distributed Constraint Optimization Algorithms: New Bounds and Algorithms %A Bowring, Emma %A Jonathan P. Pearce %A Christopher Portway %A Jain, Manish %A Tambe, Milind %X Distributed constraint optimization (DCOP) is a promising approach to coordination, scheduling and task allocation in multiagent networks. In large-scale or low-bandwidth networks, finding the global optimum is often impractical. K-optimality is a promising new approach: for the first time it provides a set of locally optimal algorithms with quality guarantees as a fraction of the global optimum. Unfortunately, previous work in k-optimality did not address domains where we may have prior knowledge of reward structure, and it failed to provide quality guarantees or algorithms for domains with hard constraints (such as agents’ local resource constraints). This paper addresses these shortcomings with three key contributions. 
It provides: (i) improved lower bounds on k-optima quality incorporating available prior knowledge of reward structure; (ii) lower bounds on k-optima quality for problems with hard constraints; and (iii) k-optimal algorithms for solving DCOPs with hard constraints and detailed experimental results on large-scale networks. %B The Seventh International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B International Symposium on Collaborative Technologies (CTS) %D 2008 %T Multiagent adjustable autonomy framework (MAAF) for multirobot, multihuman teams %A A. Freedy %A O. Sert %A E. Freedy %A G. Weltman %A J. Mcdonough %A Tambe, Milind %A Tapana Gupta %X This paper describes the ongoing development of a Multiagent Adjustable Autonomy Framework (MAAF) for multi-robot, multi-human teams performing tactical maneuvers. The challenge being addressed in this SBIR Phase I R&D project is how to exploit fully the unique capabilities of heterogeneous teams composed of a mixture of Robots, Agents or Persons (RAPs): that is, how to improve the safety, efficiency, reliability and cost of achieving mission goals while maintaining dynamic adaptation to the unique limitations and contingencies of a real-world operating environment. Our response to this challenge is the creation of a new infrastructure that will facilitate cooperative and collaborative performance of humans and robots as equal team partners through the application of advances in goal-oriented, multiagent planning and coordination technology. At the heart of our approach is the USC Teamcore Group’s Machinetta, a state-of-the-art robot proxy framework with adjustable autonomy. Machinetta facilitates robot-human role allocation decisions and collaborative sharing of team tasks in the non-deterministic and unpredictable military environment through the use of a domain-independent teamwork model that supports flexible teamwork. 
This paper presents our innovative proxy architecture and its constituent algorithms, and also describes our initial demonstration of technical feasibility in a realistic simulation scenario. %B International Symposium on Collaborative Technologies (CTS) %G eng %0 Conference Paper %B The Seventh International Conference on Autonomous Agents and Multiagent Systems %D 2008 %T Not All Agents Are Equal: Scaling up Distributed POMDPs for Agent Networks %A Janusz Marecki %A Tapana Gupta %A Varakantham, Pradeep %A Tambe, Milind %A Makoto Yokoo %X Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS that, for the first time, increases the number of agents in distributed POMDPs into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) Not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up; (ii) FANS illustrates efficient integration of its FSM-based policy search within algorithms that exploit agent network structure; (iii) FANS provides significant speedups in policy evaluation and heuristic computations within the network algorithms by exploiting the FSMs for dynamic programming. Experimental results show not only orders of magnitude improvements over the previous best known algorithms for smaller-scale domains (with similar solution quality), but also a scale-up into double digits in terms of numbers of agents. 
%B The Seventh International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Thesis %D 2008 %T Planning with Continuous Resources in Agent Systems %A Janusz Marecki %X

My research concentrates on developing reasoning techniques for intelligent, autonomous agent systems. In particular, I focus on planning techniques for both single and multi-agent systems acting in uncertain domains. In modeling these domains, I consider two types of uncertainty: (i) the outcomes of agent actions are uncertain, and (ii) the amount of resources consumed by agent actions is uncertain and only characterized by continuous probability density functions. Such rich domains, which range from Mars rover exploration to unmanned aerial surveillance to automated disaster rescue operations, are commonly modeled as continuous resource Markov decision processes (MDPs) that can then be solved in order to construct policies for agents acting in these domains. This thesis addresses two major unresolved problems in continuous resource MDPs. First, they are very difficult to solve, and existing algorithms are either fast, but make additional restrictive assumptions about the model, or do not introduce these assumptions but are very inefficient. Second, the continuous resource MDP framework is not directly applicable to multi-agent systems, and current approaches all discretize resource levels or assume deterministic resource consumption, which automatically invalidates the formal solution quality guarantees. The goal of my thesis is to fundamentally alter this landscape in three contributions:

I first introduce CPH, a fast analytic algorithm for solving continuous resource MDPs. CPH solves the planning problems at hand by first approximating, to a desired accuracy, the probability distributions over resource consumption with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs more efficiently than its closest competitor, and allows for a systematic trade-off of solution quality for speed. Second, to improve the anytime performance of CPH and other continuous resource MDP solvers, I introduce the DPFP algorithm. Rather than using value iteration to solve the problem at hand, DPFP performs a forward search in the corresponding dual space of cumulative distribution functions. In doing so, DPFP discriminates in its policy generation effort, providing only approximate policies for regions of the state space reachable with low probability, yet it bounds the error that such approximation entails. Third, I introduce CR-DEC-MDP, a framework for planning with continuous resources in multi-agent systems, and propose two algorithms for solving CR-DEC-MDPs. The first algorithm (VFP) emphasizes scalability: it performs a series of policy iterations in order to quickly find a locally optimal policy. In contrast, the second algorithm (M-DPFP) stresses optimality; it allows for a systematic trade-off of solution quality for speed by using the concept of DPFP in a multiagent setting. My results show up to three orders of magnitude speedup in solving single-agent planning problems and up to one order of magnitude speedup in solving multi-agent planning problems. Furthermore, I demonstrate the practical use of one of my algorithms in a large-scale disaster simulation, where it allows for a more efficient rescue operation.

%G eng %9 PhD thesis %0 Conference Paper %B International Joint Conference on Autonomous Agents and Multiagent Systems, 2008 %D 2008 %T Playing games with security: An efficient exact algorithm for Bayesian Stackelberg Games %A Praveen Paruchuri %A Jonathan P. Pearce %A Janusz Marecki %A Tambe, Milind %A Ordonez, Fernando %A Kraus, Sarit %X In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the follower or adversary) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the types of adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and a robber (follower) has a chance to observe this strategy over time before choosing its own strategy of where to attack. This paper presents an efficient exact algorithm for finding the optimal strategy for the leader to commit to in these games. This algorithm, DOBSS, is based on a novel and compact mixed-integer linear programming formulation. Compared to the most efficient algorithm known previously for this problem, DOBSS is not only faster, but also leads to higher quality solutions, and does not suffer from problems of infeasibility that were faced by this previous algorithm. Note that DOBSS is at the heart of the ARMOR system that is currently being tested for security scheduling at the Los Angeles International Airport. 
%B International Joint Conference on Autonomous Agents and Multiagent Systems, 2008 %G eng %0 Conference Paper %B The Seventh International Conference on Autonomous Agents and Multiagent Systems %D 2008 %T RIAACT: A robust approach to adjustable autonomy for human-multiagent teams %A Nathan Schurr %A Janusz Marecki %A Tambe, Milind %X When human-multiagent teams act in real-time uncertain domains, adjustable autonomy (dynamic transferring of decisions between human and agents) raises three key challenges. First, the human and agents may differ significantly in their worldviews, leading to inconsistencies in their decisions. Second, these human-multiagent teams must operate and plan in real time with deadlines, given the uncertain duration of human actions. Third, adjustable autonomy in teams is an inherently distributed and complex problem that cannot be solved optimally and completely online. To address these challenges, our paper presents a solution for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT). RIAACT incorporates models of the resolution of inconsistencies, continuous-time planning techniques, and a hybrid method to address coordination complexity. These contributions have been realized in a disaster response simulation system. %B The Seventh International Conference on Autonomous Agents and Multiagent Systems %G eng %0 Conference Paper %B Association for the Advancement of Artificial Intelligence 4th Multidisciplinary Workshop on Advances in Preference Handling %D 2008 %T Robust Solutions in Stackelberg Games: Addressing Boundedly Rational Human Preference Models %A Jain, Manish %A Fernando Ordonez %A Pita, James %A Christopher Portway %A Tambe, Milind %A Craig Western %A Praveen Paruchuri %A Kraus, Sarit %X Stackelberg games represent an important class of games in which one player, the leader, commits to a strategy and the remaining players, the followers, make their decision with knowledge of the leader’s commitment. 
Existing algorithms for Bayesian Stackelberg games find optimal solutions while modeling uncertainty over follower types with an a priori probability distribution. Unfortunately, in real-world applications, the leader may also face uncertainty over the follower’s response, which invalidates the optimality guarantees of these algorithms. Such uncertainty arises because the follower’s specific preferences or the follower’s observations of the leader’s strategy may not align with the rational strategy, and it is not amenable to a priori probability distributions. These conditions especially hold when dealing with human subjects. To address these uncertainties while providing quality guarantees, we propose three new robust algorithms based on mixed-integer linear programs (MILPs) for Bayesian Stackelberg games. A key result of this paper is a detailed experimental analysis that demonstrates that these new MILPs deal better with human responses: a conclusion based on 800 games with 57 human subjects as followers. We also provide run-time results on these MILPs. %B Association for the Advancement of Artificial Intelligence 4th Multidisciplinary Workshop on Advances in Preference Handling %G eng %0 Magazine Article %D 2008 %T Solving multiagent networks using distributed constraint optimization %A Jonathan P. Pearce %A Tambe, Milind %A Rajiv T. Maheswaran %X In many cooperative multiagent domains, the effect of local interactions between agents can be compactly represented as a network structure. Given that agents are spread across such a network, agents directly interact only with a small group of neighbors. A distributed constraint optimization problem (DCOP) is a useful framework to reason about such networks of agents. Given agents’ inability to communicate and collaborate in large groups in such networks, we focus on an approach called k-optimality for solving DCOPs. 
In this approach, agents form groups of one or more agents until no group of k or fewer agents can possibly improve the DCOP solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal. The article provides an overview of three key results related to k-optimality. The first set of results are worst-case guarantees on the solution quality of k-optima in a DCOP. These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. The second set of results are upper bounds on the number of k-optima that can exist in a DCOP. These results are useful in domains where a DCOP must generate a set of solutions rather than a single solution. Finally, we sketch algorithms for k-optimality and provide some experimental results for 1-, 2- and 3-optimal algorithms for several types of DCOPs. %B AI Magazine %V 29 %P 47-66 %G eng %N 3 %0 Conference Paper %B Twenty Third AAAI Conference on Artificial Intelligence %D 2008 %T Towards Faster Planning with Continuous Resources in Stochastic Domains %A Janusz Marecki %A Tambe, Milind %X Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be characterized by probability distributions. While Markov Decision Processes (MDPs) with a state space of continuous and discrete variables are popular for modeling these domains, current algorithms for such MDPs can exhibit poor performance with a scale-up in their state space. To remedy that, we propose an algorithm called DPFP. DPFP’s key contribution is its exploitation of the dual space of cumulative distribution functions. This dual formulation is key to DPFP’s novel combination of three features. 
First, it enables DPFP’s membership in a class of algorithms that perform forward search in a large (possibly infinite) policy space. Second, it provides a new and efficient approach for varying the policy generation effort based on the likelihood of reaching different regions of the MDP state space. Third, it yields a bound on the error produced by such approximations. These three features conspire to allow DPFP’s superior performance and systematic trade-off of optimality for speed. Our experimental evaluation shows that, when run stand-alone, DPFP outperforms other algorithms in terms of its any-time performance, whereas when run as a hybrid, it allows for a significant speedup of a leading continuous resource MDP solver. %B Twenty Third AAAI Conference on Artificial Intelligence %G eng %0 Conference Paper %B AAAI Spring Symposium 2008 %D 2008 %T Using science fiction in teaching Artificial Intelligence %A Tambe, Milind %A Anne Balsamo %A Bowring, Emma %X Many factors are blamed for the decreasing enrollments in computer science and engineering programs in the U.S., including the dot-com economic bust and the increase in the use of “off-shore” programming labor. Another major factor is the lack of a bold new vision and excitement about computer science, which results in a view of computer science as a field wedded to routine programming. To address this concern, we have focused on science fiction as a means to generate excitement about Artificial Intelligence, and thus, in turn, about Computer Science and Engineering. In particular, since the Fall of 2006, we have used science fiction in teaching Artificial Intelligence to undergraduate students at the University of Southern California (USC), in teaching activities ranging from an undergraduate upper-division class in computer science to a semester-long freshman seminar for nonengineering students to micro-seminars during the welcome week. 
As an interdisciplinary team of scholar/instructors, our goal has been to use science fiction not only to motivate students to learn about AI, but also to examine fundamental issues that arise at the intersection of technology and culture, and to provide students with a more creative and well-rounded course that offers a big-picture view of computer science. This paper outlines the courses taught using this theme, provides an overview of our classroom teaching techniques in using science fiction, and discusses some of the lectures in more detail as exemplars. We conclude with feedback received, lessons learned, and the impact on both the computer science students and noncomputer-science (and non-engineering) students. %B AAAI Spring Symposium 2008 %G eng %0 Thesis %D 2007 %T Balancing local resources and global goals in multiply constrained distributed constraint optimization %A Bowring, Emma %X Distributed constraint optimization (DCOP) is a useful framework for cooperative multiagent coordination. DCOP focuses on optimizing a single team objective. However, in many domains, agents must satisfy constraints on resources consumed locally while optimizing the team goal. These resource constraints may need to be kept private or shared to improve efficiency. Extending DCOP to these domains raises two issues: algorithm design and sensitivity analysis. Algorithm design requires creating algorithms that trade off completeness, scalability, privacy and efficiency. Sensitivity analysis examines whether slightly increasing the available resources could yield a significantly better outcome. This thesis defines the multiply-constrained DCOP (MC-DCOP) framework and provides complete and incomplete algorithms for solving MC-DCOP problems. Complete algorithms find the best allocation of scarce resources, while incomplete algorithms are more scalable. 
The algorithms use mutually-intervening search; they use local resource constraints to intervene in the search for the globally optimal solution. The algorithms use four key techniques: (i) transforming constraints to maintain privacy; (ii) dynamically setting upper bounds on resource consumption; (iii) identifying the extent to which the local graph structure allows agents to compute exact bounds on resource consumption; and (iv) using a virtual assignment to flag problems rendered unsatisfiable by their resource constraints. Proofs of correctness are presented for all algorithms. Finally, the complete and incomplete algorithms are used in conjunction with one another to perform distributed local reoptimization to address sensitivity analysis. Experimental results demonstrated that MC-DCOP problems are most challenging when resources are scarce but sufficient. In problems where there are insufficient resources, the team goal is largely irrelevant. In problems with ample resources, the local resource constraints require little consideration. The incomplete algorithms were two orders of magnitude more efficient than the complete algorithm for the most challenging MC-DCOP problems and their runtime increased very little as the number of agents in the network increased. Finally, sensitivity analysis results indicated that local reoptimization is an effective way to identify resource constraints that are creating bottlenecks. Taken together these new algorithms and examination of the problem of sensitivity analysis help extend the applicability of DCOP to more complex domains. %G eng %9 PhD thesis %0 Conference Paper %B In H. 
Mouratidis, editor, Proceedings of the Workshop on Safety and Security in Multiagent Systems, Lecture Notes in Artificial Intelligence, Springer %D 2007 %T Coordinating Randomized Policies for Increasing Security in Multiagent Systems %A Praveen Paruchuri %A Tambe, Milind %A Ordonez, Fernando %A Kraus, Sarit %X Despite significant recent advances in decision-theoretic frameworks for reasoning about multiagent teams, little attention has been paid to applying such frameworks in adversarial domains, where the agent team may face security threats from other agents. This paper focuses on domains where such threats are caused by unseen adversaries whose actions or payoffs are unknown. In such domains, action randomization is recognized as a key technique to degrade an adversary’s capability to predict and exploit an agent’s/agent team’s actions. Unfortunately, there are two key challenges in such randomization. First, randomization can reduce the expected reward (quality) of the agent team’s plans, and thus we must provide some guarantees on such rewards. Second, randomization results in miscoordination in teams. While communication within an agent team can help in alleviating the miscoordination problem, communication is unavailable, or only scarcely available, in many real domains. To address these challenges, this paper provides the following contributions. First, we recall the Multiagent Constrained MDP (MCMDP) framework that enables policy generation for a team of agents where each agent may have limited or no (communication) resources. Second, since randomized policies generated directly for MCMDPs lead to miscoordination, we introduce a transformation algorithm that converts the MCMDP into a transformed MCMDP incorporating explicit communication and no-communication actions. 
Third, we show that incorporating randomization results in a non-linear program, and that the unavailability/limited availability of communication results in the addition of non-convex constraints to the non-linear program. Finally, we experimentally illustrate the benefits of our work. %B H. Mouratidis, editor, Proceedings of the Workshop on Safety and Security in Multiagent Systems, Lecture Notes in Artificial Intelligence, Springer %G eng %0 Conference Paper %B Sixth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) Demo Track %D 2007 %T Demonstration of Teamwork in Uncertain Domains using Hybrid BDI-POMDP systems %A Tapana Gupta %A Varakantham, Pradeep %A Timothy W. Rauenbusch %A Tambe, Milind %X

Personal Assistant agents are becoming increasingly important in a variety of application domains in offices, at home, for medical care and many others [5, 1]. These agents are required to constantly monitor their environment (including the state of their users), and make periodic decisions based on their monitoring. For example, in an office environment, agents may need to monitor the location of their user in order to ascertain whether the user would be able to make it on time to a meeting [5]. Or, they may be required to monitor the progress of a user on a particular assignment and decide whether or not the user would be able to meet the deadline for completing the assignment. Teamwork between such agents is important in Personal Assistant applications to enable agents working together to achieve a common goal (such as finishing a project on time). This working demonstration shows a hybrid (BDI-POMDP) approach to accomplish such teamwork. Agents must be able to make decisions despite observational uncertainty in the environment. For example, if the user is busy and does not respond to a request from their personal assistant agent, the agent loses track of the user’s progress and hence cannot determine it with certainty. Also, an incorrect action on the agent’s part can have undesirable consequences. For example, an agent might reallocate a task again and again even if there is sufficient progress on the task. In the past, teamwork among Personal Assistant agents typically has not addressed such observational uncertainty. Markov Decision Processes [5] have been used to model the agent’s environment, with simplifying assumptions regarding either observational uncertainty in the environment or the agent’s observational abilities.

Partially Observable Markov Decision Processes (POMDPs) are equipped to deal with the inherent uncertainty in Personal Assistant domains. Computational complexity has been a major hurdle in deploying POMDPs in real-world application domains, but the recent emergence of new exact and approximate techniques [8] shows much promise in being able to compute a POMDP policy for an agent in real time. In this demonstration, we actually deploy POMDPs to compute the Adjustable Autonomy policy for an agent, based on which the agent makes decisions. Integrating such POMDPs with architectures that enable teamwork among personal assistants is then the next key part of our demonstration. Several teamwork models have been developed over the past few years to handle communication and coordination between agents [7]. Machinetta [6] is a proxy-based integration architecture for coordinating teams of heterogeneous entities (e.g. robots, agents, persons), which builds on the STEAM teamwork model. Machinetta is designed to meet key challenges such as effective utilization of diverse capabilities of group members, improving coordination between agents by overcoming challenges posed by the environment, and reacting to changes in the environment in a flexible manner. We use Machinetta proxies to coordinate the agents in our demonstration. Machinetta enables integrating POMDPs and also enables interfacing with BDI architectures that may provide us with team plans. In particular, we interface with the SPARK agent framework [2] being developed at the Artificial Intelligence Center of SRI International. SPARK is a Belief-Desire-Intention (BDI)-style agent framework grounded in a model of procedural reasoning. This architecture allows the development of active systems that interact with a constantly changing and unpredictable world. 
By using BDI-based approaches for generating team plans for agents as well as communication and coordination, and POMDPs for adjustable autonomy decision making, we arrive at a hybrid model for multiagent teamwork [3] in Personal Assistant applications. The following sections describe the application domain in which we deploy this hybrid system, the interaction between the various components of the system, and how it operates.

%B Sixth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) Demo Track %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems, AAMAS-2007 %D 2007 %T An Efficient Heuristic Approach for Security Against Multiple Adversaries %A Praveen Paruchuri %A Jonathan P. Pearce %A Tambe, Milind %A Ordonez, Fernando %A Kraus, Sarit %X In adversarial multiagent domains, security, commonly defined as the ability to deal with intentional threats from other agents, is a critical issue. This paper focuses on domains where these threats come from unknown adversaries. These domains can be modeled as Bayesian games; much work has been done on finding equilibria for such games. However, it is often the case in multiagent security domains that one agent can commit to a mixed strategy which its adversaries observe before choosing their own strategies. In this case, the agent can maximize reward by finding an optimal strategy, without requiring equilibrium. Previous work has shown this problem of optimal strategy selection to be NP-hard. Therefore, we present a heuristic called ASAP, with three key advantages to address the problem. First, ASAP searches for the highest-reward strategy, rather than a Bayes-Nash equilibrium, allowing it to find feasible strategies that exploit the natural first-mover advantage of the game. Second, it provides strategies which are simple to understand, represent, and implement. Third, it operates directly on the compact, Bayesian game representation, without requiring conversion to normal form. We provide an efficient Mixed Integer Linear Program (MILP) implementation for ASAP, along with experimental results illustrating significant speedups and higher rewards over other approaches. 
%B International Conference on Autonomous Agents and Multiagent Systems, AAMAS-2007 %G eng %0 Conference Paper %B AAAI Spring Symposium on Game and Decision-Theoretic Agents %D 2007 %T An Efficient Heuristic for Security Against Multiple Adversaries in Stackelberg Games %A Praveen Paruchuri %A Jonathan P. Pearce %A Tambe, Milind %A Kraus, Sarit %A Ordonez, Fernando %X In adversarial multiagent domains, security, commonly defined as the ability to deal with intentional threats from other agents, is a critical issue. This paper focuses on domains where these threats come from unknown adversaries. These domains can be modeled as Bayesian games; much work has been done on finding equilibria for such games. However, it is often the case in multiagent security domains that one agent can commit to a mixed strategy which its adversaries observe before choosing their own strategies. In this case, the agent can maximize reward by finding an optimal strategy, without requiring equilibrium. Previous work has shown this problem of optimal strategy selection to be NP-hard. Therefore, we present a heuristic called ASAP, with three key advantages to address the problem. First, ASAP searches for the highest-reward strategy, rather than a Bayes-Nash equilibrium, allowing it to find feasible strategies that exploit the natural first-mover advantage of the game. Second, it provides strategies which are simple to understand, represent, and implement. Third, it operates directly on the compact, Bayesian game representation, without requiring conversion to normal form. We provide an efficient Mixed Integer Linear Program (MILP) implementation for ASAP, along with experimental results illustrating significant speedups and higher rewards over other approaches. %B AAAI Spring Symposium on Game and Decision-Theoretic Agents %G eng %0 Magazine Article %D 2007 %T Electric Elves: What went wrong and why %A Tambe, Milind %A Bowring, Emma %A Varakantham, Pradeep %A K. Lerman %A Paul Scerri %A D. 
V. Pynadath %X Software personal assistants continue to be a topic of significant research interest. This paper outlines some of the important lessons learned from a successfully deployed team of personal assistant agents (Electric Elves) in an office environment. In the Electric Elves project, a team of almost a dozen personal assistant agents was continually active for seven months. Each elf (agent) represented one person and assisted in daily activities in an actual office environment. This project led to several important observations about privacy, adjustable autonomy, and social norms in office environments. This paper outlines some of the key lessons learned and, more importantly, outlines our continued research to address some of the concerns raised. %B AI Magazine %V 29 %P 23-32 %G eng %N 2 %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2007 %T A Fast Analytical Algorithm for Solving Markov Decision Processes with Continuous Resources %A Janusz Marecki %A Sven Koenig %A Tambe, Milind %X Agents often have to construct plans that obey deadlines or, more generally, resource limits for real-valued resources whose consumption can only be characterized by probability distributions, such as execution time or battery power. These planning problems can be modeled with continuous state Markov decision processes (MDPs) but existing solution methods are either inefficient or provide no guarantee on the quality of the resulting policy. We therefore present CPH, a novel solution method that solves the planning problems by first approximating with any desired accuracy the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. 
It then uses value iteration to solve the resulting MDPs by exploiting properties of exponential distributions to calculate the necessary convolutions accurately and efficiently while providing strong guarantees on the quality of the resulting policy. Our experimental feasibility study in a Mars rover domain demonstrates a substantial speedup over Lazy Approximation, which is currently the leading algorithm for solving continuous state MDPs with quality guarantees. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Magazine Article %D 2007 %T An Intelligent Personal Assistant for Task and Time Management %A Karen Myers %A Pauline Berry %A Jim Blythe %A Ken Conley %A Melinda Gervasio %A Deborah Mcguinness %A David Morley %A Avi Pfeffer %A M Pollack %A Tambe, Milind %X We describe an intelligent personal assistant that has been developed to aid a busy knowledge worker in managing time commitments and performing tasks. The design of the system was motivated by the complementary objectives of (a) relieving the user of routine tasks, thus allowing her to focus on tasks that critically require human problem-solving skills, and (b) intervening in situations where cognitive overload leads to oversights or mistakes by the user. The system draws on a diverse set of AI technologies that are linked within a Belief-Desire-Intention agent system. Although the system provides a number of automated functions, the overall framework is highly user-centric in its support for human needs, responsiveness to human inputs, and adaptivity to user working style and preferences. %B AI Magazine %V 28 %P 47-61 %G eng %N 2 %0 Thesis %D 2007 %T Keep the Adversary Guessing: Agent Security by Policy Randomization %A Praveen Paruchuri %X Recent advances in the field of agent/multiagent systems bring us closer to agents acting in real-world domains, which can be uncertain and often adversarial. 
Security, commonly defined as the ability to deal with intentional threats from other agents, is a major challenge for agents or agent-teams deployed in these adversarial domains. Such adversarial scenarios arise in a wide variety of situations that are becoming increasingly important, such as agents patrolling to provide perimeter security around critical infrastructure or performing routine security checks. These domains have the following characteristics: (a) The agent or agent-team needs to commit to a security policy while the adversaries may observe and exploit the policy committed to. (b) The agent/agent-team potentially faces different types of adversaries and has varying information available about the adversaries (thus limiting the agents’ ability to model its adversaries). To address security in such domains, I developed two types of algorithms. First, when the agent has no model of its adversaries, my key idea is to randomize the agent’s policies to minimize the information gained by adversaries. To that end, I developed algorithms for policy randomization for both Markov Decision Processes (MDPs) and Decentralized Partially Observable MDPs (Dec-POMDPs). Since arbitrary randomization can violate quality constraints (for example, the resource usage should be below a certain threshold or key areas must be patrolled with a certain frequency), my algorithms guarantee quality constraints on the randomized policies generated. For efficiency, I provide a novel linear program for randomized policy generation in MDPs, and then build on this program for a heuristic solution for Dec-POMDPs. Second, when the agent has a partial model of the adversaries, I model the security domain as a Bayesian Stackelberg game where the agent’s model of the adversary includes a probability distribution over possible adversary types. 
While the optimal policy selection for a Bayesian Stackelberg game is known to be NP-hard, my solution approach based on an efficient Mixed Integer Linear Program (MILP) provides significant speedups over existing approaches while obtaining the optimal solution. The resulting policy randomizes the agent’s possible strategies, while taking into account the probability distribution over adversary types. Finally, I provide experimental results for all my algorithms, illustrating that the new techniques developed have enabled us to find optimal secure policies efficiently for an increasingly important class of security domains. %G eng %9 PhD thesis %0 Conference Paper %B Ninth Distributed Constraint Reasoning workshop (DCR) %D 2007 %T KOPT: Distributed DCOP Algorithm for Arbitrary k-optima with Monotonically Increasing Utility %A Hideaki Katagishi %A Jonathan P. Pearce %X A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents. Because complete algorithms to solve DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms that result in locally optimal solutions. One type of categorization of such algorithms, and the solutions they produce, is k-optimality; a k-optimal solution is one that cannot be improved by any deviation by k or fewer agents. So far, there are no k-optimal algorithms for k > 3. In addition, the solution quality an existing algorithm can produce is fixed; different algorithms are needed for different levels of optimality. This paper introduces the first DCOP algorithm that can produce arbitrary k-optimal solutions. 
%B Ninth Distributed Constraint Reasoning workshop (DCR) %G eng %0 Conference Paper %B Distributed Constraint Reasoning Workshop %D 2007 %T K-optimal algorithms for Distributed Constraint Optimization: Extending to domains with hard constraints %A Jonathan Pearce %A Bowring, Emma %A Christopher Portway %A Tambe, Milind %X Distributed constraint optimization (DCOP) has proven to be a promising approach to address coordination, scheduling and task allocation in large-scale multiagent networks, in domains involving sensor networks, teams of unmanned air vehicles, or teams of software personal assistants, among others. Locally optimal approaches to DCOP suggest themselves as appropriate for such large-scale multiagent networks, particularly when such networks are accompanied by a lack of high-bandwidth communications among agents. K-optimal algorithms provide an important class of these locally optimal algorithms, given analytical results proving quality guarantees. Previous work on k-optimality, including its theoretical guarantees, focused exclusively on soft constraints. This paper extends the results to DCOPs with hard constraints. It focuses in particular on DCOPs where such hard constraints are resource constraints that individual agents must not violate. We provide two key results in the context of such DCOPs. First, we provide reward-independent lower bounds on the quality of k-optima in the presence of hard (resource) constraints. Second, we present algorithms for k-optimality given hard resource constraints, and present detailed experimental results over DCOP graphs of 1000 agents with varying constraint density. 
%B Distributed Constraint Reasoning Workshop %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems, AAMAS-2007 %D 2007 %T Letting loose a SPIDER on a network of POMDPs: Generating quality guaranteed policies %A Varakantham, Pradeep %A Janusz Marecki %A Yuichi Yabu %A Tambe, Milind %A Makoto Yokoo %X Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multi-agent systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale up the numbers of agents, one popular approach has focused on approximate solutions. Though this approach is efficient, the algorithms within this approach do not provide any guarantees on solution quality. A second less popular approach focuses on global optimality, but typical results are available only for two agents, and also at considerable computational cost. This paper overcomes the limitations of both these approaches by providing SPIDER, a novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits agent interaction structure given a network of agents (i.e. allowing easier scale-up to larger numbers of agents); (ii) it uses a combination of heuristics to speed up policy search; and (iii) it allows quality-guaranteed approximations, allowing a systematic tradeoff of solution quality for time. Experimental results show orders of magnitude improvement in performance when compared with previous global optimal algorithms. %B International Conference on Autonomous Agents and Multiagent Systems, AAMAS-2007 %G eng %0 Thesis %D 2007 %T Local Optimization in Cooperative Agent Networks %A Jonathan P. Pearce %X

My research focuses on constructing and analyzing systems of intelligent, autonomous agents. These agents may include people, physical robots, or software programs acting as assistants, teammates, opponents, or trading partners. In a large class of multi-agent scenarios, the effect of local interactions between agents can be compactly represented as a network structure such as a distributed constraint optimization problem (DCOP) for cooperative domains. Collaboration between large groups of agents, given such a network, can be difficult to achieve; often agents can only manage to collaborate in smaller subgroups of a certain size, in order to find a workable solution in a timely manner. The goal of my thesis is to provide algorithms to enable networks of agents that are bounded in this way to quickly find high-quality solutions, as well as theoretical results to understand key properties of these solutions. Relevant domains for my work include personal assistant agents, sensor networks, and teams of autonomous robots. In particular, this thesis considers the case in which agents optimize a DCOP by forming groups of one or more agents until no group of k or fewer agents can possibly improve the solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal. In this document, I present four key contributions related to k-optimality. The first set of results are worst-case guarantees on the solution quality of k-optima in a DCOP.

These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. The second set of results are upper bounds on the number of k-optima that can exist in a DCOP. Because each joint action consumes resources, knowing the maximal number of k-optimal joint actions that could exist for a given DCOP allows us to allocate sufficient resources for a given level of k, or, alternatively, to choose an appropriate level of k-optimality given fixed resources. The third contribution is a set of 2-optimal and 3-optimal algorithms and an experimental analysis of the performance of 1-, 2-, and 3-optimal algorithms on several types of DCOPs. The final contribution of this thesis is a case study of the application of k-optimal DCOP algorithms and solutions to the problem of the formation of human teams spanning multiple organizations. Given a particular specification of a human team (such as a task force to respond to an emergency) and a pool of possible team members, a DCOP can be formulated to match this specification. A set of k-optimal solutions to the DCOP represents a set of diverse, locally optimal options from which a human commander can choose the team that will be used.

%G eng %9 PhD thesis %0 Conference Paper %B Sixth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) %D 2007 %T On Opportunistic Techniques for Solving Decentralized Markov Decision Processes with Temporal Constraints %A Janusz Marecki %A Tambe, Milind %X Decentralized Markov Decision Processes (DEC-MDPs) are a popular model of agent-coordination problems in domains with uncertainty and time constraints but are very difficult to solve. In this paper, we improve a state-of-the-art heuristic solution method for DEC-MDPs, called OC-DEC-MDP, that has recently been shown to scale up to larger DEC-MDPs. Our heuristic solution method, called Value Function Propagation (VFP), combines two orthogonal improvements of OC-DEC-MDP. First, it speeds up OC-DEC-MDP by an order of magnitude by maintaining and manipulating a value function for each state (as a function of time) rather than a separate value for each pair of state and time interval. Furthermore, it achieves better solution qualities than OC-DEC-MDP because, as our analytical results show, it does not overestimate the expected total reward like OC-DEC-MDP. We test both improvements independently in a crisis-management domain as well as for other types of domains. Our experimental results demonstrate a significant speedup of VFP over OC-DEC-MDP as well as higher solution qualities in a variety of situations. %B Sixth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2007 %T Quality Guarantees on k-Optimal Solutions for Distributed Constraint Optimization Problems %A Jonathan P. Pearce %A Tambe, Milind %X A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents.
Because complete algorithms to solve DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms that result in locally optimal solutions. One type of categorization of such algorithms, and the solutions they produce, is k-optimality; a k-optimal solution is one that cannot be improved by any deviation by k or fewer agents. This paper presents the first known guarantees on solution quality for k-optimal solutions. The guarantees are independent of the costs and rewards in the DCOP, and once computed can be used for any DCOP of a given constraint graph structure. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B AAAI Spring Symposium 2007 %D 2007 %T SPIDER attack on a network of POMDPs: Towards quality bounded solutions %A Varakantham, Pradeep %A Janusz Marecki %A Tambe, Milind %A Makoto Yokoo %X Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multi-agent systems acting in uncertain domains. Given the significant computational complexity of solving distributed POMDPs, one popular approach has focused on approximate solutions. Though this approach provides for efficient computation of solutions, the algorithms within this approach do not provide any guarantees on the quality of the solutions. A second, less popular, approach has focused on a global optimal result, but at considerable computational cost. This paper overcomes the limitations of both these approaches by providing SPIDER (Search for Policies In Distributed EnviRonments), which provides quality-guaranteed approximations for distributed POMDPs. SPIDER allows us to vary this quality guarantee, thus allowing us to vary solution quality systematically. SPIDER and its enhancements employ heuristic search techniques for finding a joint policy that satisfies the required bound on the quality of the solution.
%B AAAI Spring Symposium 2007 %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2007 %T Towards efficient computation of error bounded solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs %A Varakantham, Pradeep %A Rajiv T. Maheswaran %A Tapana Gupta %A Tambe, Milind %X While POMDPs (Partially Observable Markov Decision Problems) are a popular computational model with wide-ranging applications, the computational cost for optimal policy generation is prohibitive. Researchers are investigating ever-more efficient algorithms, yet many applications demand that such algorithms bound any loss in policy quality when chasing efficiency. To address this challenge, we present two new techniques. The first approximates in the value space to obtain solutions efficiently for a pre-specified error bound. Unlike existing techniques, our technique guarantees the resulting policy will meet this bound. Furthermore, it does not require costly computations to determine the quality loss of the policy. Our second technique prunes large tracts of belief space that are unreachable, allowing faster policy computation without any sacrifice in optimality. The combination of the two techniques, which are complementary to existing optimal policy generation algorithms, provides solutions with tight error bounds efficiently in domains where competing algorithms fail to provide such tight bounds. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Thesis %D 2007 %T Towards Efficient Planning in Real World Partially Observable Domains %A Varakantham, Pradeep %X My research goal is to build large-scale intelligent systems (both single- and multi-agent) that reason with uncertainty in complex, real-world environments.
I foresee an integration of such systems in many critical facets of human life ranging from intelligent assistants in hospitals to offices, from rescue agents in large scale disaster response to sensor agents tracking weather phenomena in earth observing sensor webs, and others. In my thesis, I have taken steps towards achieving this goal in the context of systems that operate in partially observable domains that also have transition uncertainty (non-deterministic outcomes of actions). Given this uncertainty, Partially Observable Markov Decision Problems (POMDPs) and Distributed POMDPs present themselves as natural choices for modeling these domains. Unfortunately, the significant computational complexity involved in solving POMDPs (PSPACE-Complete) and Distributed POMDPs (NEXP-Complete) is a key obstacle. Due to this significant computational complexity, existing approaches that provide exact solutions do not scale, while approximate solutions do not provide any usable guarantees on quality. My thesis addresses these issues using the following key ideas: The first is exploiting structure in the domain. Utilizing the structure present in the dynamics of the domain or the interactions between the agents allows improved efficiency without sacrificing the quality of the solution. The second is direct approximation in the value space. This allows for calculated approximations at each step of the algorithm, which in turn allows us to provide usable quality guarantees; such quality guarantees may be specified in advance. In contrast, the existing approaches approximate in the belief space leading to an approximation in the value space (indirect approximation in value space), thus making it difficult to compute functional bounds on approximations. In fact, these key ideas allow for the efficient computation of optimal and quality bounded solutions to complex, large-scale problems that were not within the purview of existing algorithms.
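The partial observability these abstracts refer to is typically handled by maintaining a belief (a probability distribution over states) that is updated by Bayes' rule after each action and observation; planning then happens over this belief space. A minimal sketch of the standard discrete belief update (illustrative only; the function name and tensor layout are my own assumptions, not notation from the thesis):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief update for a discrete POMDP.

    b: current belief over states, shape (S,)
    a: action index taken
    o: observation index received
    T: transition tensor, T[a, s, s'] = P(s' | s, a)
    O: observation tensor, O[a, s', o] = P(o | s', a)
    Returns the posterior belief b'(s') ∝ O[a, s', o] * sum_s T[a, s, s'] * b(s).
    """
    predicted = b @ T[a]              # predictive distribution over next states
    unnorm = O[a, :, o] * predicted   # weight by observation likelihood
    norm = unnorm.sum()
    if norm == 0.0:
        raise ValueError("Observation has zero probability under this belief.")
    return unnorm / norm
```

For example, with two states, a "listen"-style action that leaves the state unchanged, and an observation that is correct with probability 0.85, a uniform belief sharpens to (0.85, 0.15) after one matching observation; the reachable set of such beliefs is exactly the polytope that the implementation techniques above restrict policy computation to.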
%G eng %9 PhD thesis %0 Conference Paper %B The Twenty-First National Conference on Artificial Intelligence (AAAI-06) %D 2006 %T Analysis of Privacy Loss in Distributed Constraint Optimization %A Rachel Greenstadt %A Jonathan P. Pearce %A Tambe, Milind %X Distributed Constraint Optimization (DCOP) is rapidly emerging as a prominent technique for multiagent coordination. However, despite agent privacy being a key motivation for applying DCOPs in many applications, rigorous quantitative evaluations of privacy loss in DCOP algorithms have been lacking. Recently, [Maheswaran et al. 2005] introduced a framework for quantitative evaluations of privacy in DCOP algorithms, showing that some DCOP algorithms lose more privacy than purely centralized approaches and questioning the motivation for applying DCOPs. This paper addresses the question of whether state-of-the-art DCOP algorithms suffer from a similar shortcoming by investigating several of the most efficient DCOP algorithms, including both DPOP and ADOPT. Furthermore, while previous work investigated the impact on efficiency of distributed constraint reasoning design decisions (e.g. constraint-graph topology, asynchrony, message-contents), this paper examines the privacy aspect of such decisions, providing an improved understanding of privacy-efficiency tradeoffs. %B The Twenty-First National Conference on Artificial Intelligence (AAAI-06) %G eng %0 Conference Paper %B WORKSHOP on programming multiagent systems %D 2006 %T Asimovian Multiagents: Applying Laws of Robotics to Teams of Humans and Agents %A Nathan Schurr %A Varakantham, Pradeep %A Bowring, Emma %A Tambe, Milind %A Grosz, Barbara %X In the March 1942 issue of “Astounding Science Fiction”, Isaac Asimov for the first time enumerated his three laws of robotics. Decades later, researchers in agents and multiagent systems have begun to examine these laws for providing a useful set of guarantees on deployed agent systems.
Motivated by unexpected failures or behavior degradations in complex mixed agent-human teams, this paper for the first time focuses on applying Asimov’s first two laws to provide behavioral guarantees in such teams. However, operationalizing these laws in the context of such mixed agent-human teams raises three novel issues. First, while the laws were originally written for the interaction of an individual robot and an individual human, clearly, our systems must operate in a team context. Second, key notions in these laws (e.g. causing “harm” to humans) are specified in very abstract terms and must be specified in concrete terms in implemented systems. Third, unlike in science fiction, agents or humans may not have perfect information about the world, so they must act based on these laws despite uncertainty of information. Addressing this uncertainty is a key thrust of this paper, and we illustrate that agents must detect and overcome such states of uncertainty while ensuring adherence to Asimov’s laws. We illustrate results from two different domains, each with a different approach to operationalizing Asimov’s laws. %B WORKSHOP on programming multiagent systems %G eng %0 Conference Paper %B AAAI Spring Symposium %D 2006 %T Electric Elves: What Went Wrong and Why %A Tambe, Milind %A Bowring, Emma %A Jonathan P. Pearce %A Varakantham, Pradeep %A Paul Scerri %A D V. Pynadath %X Software personal assistants continue to be a topic of significant research interest. This paper outlines some of the important lessons learned from a successfully-deployed team of personal assistant agents (Electric Elves) in an office environment. These lessons have important implications for similar on-going research projects. The Electric Elves project was a team of almost a dozen personal assistant agents which were continually active for seven months. Each elf (agent) represented one person and assisted in daily activities in an actual office environment.
This project led to several important observations about privacy, adjustable autonomy, and social norms in office environments. This paper outlines some of the key lessons learned and, more importantly, outlines our continued research to address some of the concerns raised. %B AAAI Spring Symposium %G eng %0 Conference Paper %B Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2006 %T Experimental analysis of privacy loss in DCOP algorithms (short paper) %A Rachel Greenstadt %A Jonathan P. Pearce %A Bowring, Emma %A Tambe, Milind %X Distributed Constraint Optimization (DCOP) is rapidly emerging as a prominent technique for multiagent coordination. Unfortunately, rigorous quantitative evaluations of privacy loss in DCOP algorithms have been lacking despite the fact that agent privacy is a key motivation for applying DCOPs in many applications. Recently, Maheswaran et al. [3, 4] introduced a framework for quantitative evaluations of privacy in DCOP algorithms, showing that early DCOP algorithms lose more privacy than purely centralized approaches and questioning the motivation for applying DCOPs. Do state-of-the-art DCOP algorithms suffer from a similar shortcoming? This paper answers that question by investigating the most efficient DCOP algorithms, including both DPOP and ADOPT. %B Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B AAAI Spring Symposium on Distributed Planning and Scheduling %D 2006 %T Exploiting Locality of Interaction in Networked Distributed POMDPs %A Yoonheui Kim %A Ranjit Nair %A Varakantham, Pradeep %A Tambe, Milind %A Makoto Yokoo %X In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors.
While distributed POMDPs capture the real-world uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. In previous work, we presented a model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Also, we presented LID-JESP (locally interacting distributed joint equilibrium-based search for policies), a distributed policy generation algorithm based on DBA (distributed breakout algorithm). In this paper, we present a stochastic variation of LID-JESP, based on DSA (distributed stochastic algorithm), that allows neighboring agents to change their policies in the same cycle. Through detailed experiments, we show how this can result in speedups without a large difference in solution quality. We also introduce a technique called hyper-link-based decomposition that allows us to exploit locality of interaction further, resulting in faster run times for both LID-JESP and its stochastic variant without any loss in solution quality. %B AAAI Spring Symposium on Distributed Planning and Scheduling %G eng %0 Book Section %B Programming Multiagent Systems (PROMAS) %D 2006 %T Implementation Techniques for solving POMDPs in Personal Assistant Domains %X Agents or agent teams deployed to assist humans often face the challenges of monitoring the state of key processes in their environment (including the state of their human users themselves) and making periodic decisions based on such monitoring. POMDPs appear well suited to enable agents to address these challenges, given the uncertain environment and cost of actions, but optimal policy generation for POMDPs is computationally expensive.
This paper introduces two key implementation techniques (one exact and one approximate), where the policy computation is restricted to the belief space polytope that remains reachable given the progress structure of a domain. One technique uses Lagrangian methods to compute tighter bounds on belief space support in polynomial time, while the other technique is based on approximating policy vectors in dense policy regions of the bounded belief polytope. We illustrate this by enhancing two of the fastest existing algorithms for exact POMDP policy generation. The order-of-magnitude speedups demonstrate the utility of our implementation techniques in facilitating the deployment of POMDPs within agents assisting human users. %B Programming Multiagent Systems (PROMAS) %I Springer Press %G eng %0 Conference Paper %B AAAI Spring Symposium on Distributed Planning and Scheduling %D 2006 %T Multiply-Constrained DCOP for Distributed Planning and Scheduling %A Bowring, Emma %A Tambe, Milind %A Makoto Yokoo %X Distributed constraint optimization (DCOP) has emerged as a useful technique for multiagent planning and scheduling. While previous DCOP work focuses on optimizing a single team objective, in many domains, agents must satisfy additional constraints on resources consumed locally (due to interactions within their local neighborhoods). Such local resource constraints may be required to be private or shared for efficiency’s sake. This paper provides a novel multiply-constrained DCOP algorithm for addressing these domains. This algorithm is based on mutually-intervening search, i.e.
using local resource constraints to intervene in the search for the optimal solution and vice versa, realized via three key ideas: (i) transforming n-ary constraints via virtual variables to maintain privacy; (ii) dynamically setting upper bounds on joint resource consumption with neighbors; and (iii) identifying if the local DCOP graph structure allows agents to compute exact resource bounds for additional efficiency. These ideas are implemented by modifying Adopt, one of the most efficient DCOP algorithms. Both detailed experimental results and proofs of correctness are presented. %B AAAI Spring Symposium on Distributed Planning and Scheduling %G eng %0 Conference Paper %B Fifth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS) %D 2006 %T Multiply-Constrained Distributed Constraint Optimization %A Bowring, Emma %A Tambe, Milind %A Makoto Yokoo %X Distributed constraint optimization (DCOP) has emerged as a useful technique for multiagent coordination. While previous DCOP work focuses on optimizing a single team objective, in many domains, agents must satisfy additional constraints on resources consumed locally (due to interactions within their local neighborhoods). Such resource constraints may be required to be private or shared for efficiency’s sake. This paper provides a novel multiply-constrained DCOP algorithm for addressing these domains which is based on mutually-intervening search, i.e. using local resource constraints to intervene in the search for the optimal solution and vice versa. It is realized through three key ideas: (i) transforming n-ary constraints to maintain privacy; (ii) dynamically setting upper bounds on joint resource consumption with neighbors; and (iii) identifying if the local DCOP graph structure allows agents to compute exact resource bounds for additional efficiency. These ideas are implemented by modifying Adopt, one of the most efficient DCOP algorithms.
Both detailed experimental results and proofs of correctness are presented. %B Fifth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS) %G eng %0 Conference Paper %B Computer society of India Communications (Invited) %D 2006 %T Multiagent Teamwork: Hybrid Approaches %A Praveen Paruchuri %A Bowring, Emma %A Ranjit Nair %A Jonathan P. Pearce %A Nathan Schurr %A Tambe, Milind %A Varakantham, Pradeep %X Today within the multiagent community, we see at least four competing methods to building multiagent systems: belief-desire-intention (BDI), distributed constraint optimization (DCOP), distributed POMDPs, and auctions or game-theoretic methods. While there is exciting progress within each approach, there is a lack of cross-cutting research. This article highlights the various hybrid techniques for multiagent teamwork developed by the teamcore group. In particular, for the past decade, the TEAMCORE research group has focused on building agent teams in complex, dynamic domains. While our early work was inspired by BDI, we will present an overview of recent research that uses DCOPs and distributed POMDPs in building agent teams. While DCOP and distributed POMDP algorithms provide promising results, hybrid approaches allow us to use the complementary strengths of different techniques to create algorithms that perform better than either of their component algorithms alone. For example, in the BDI-POMDP hybrid approach, BDI team plans are exploited to improve POMDP tractability, and POMDPs improve BDI team plan performance. %B Computer society of India Communications (Invited) %G eng %0 Journal Article %J Journal of Autonomous Agents and Multiagent Systems (JAAMAS) %D 2006 %T Privacy Loss in Distributed Constraint Reasoning: A Quantitative Framework for Analysis and its Applications %A Rajiv T. Maheswaran %A Jonathan P.
Pearce %A Bowring, Emma %A Varakantham, Pradeep %A Tambe, Milind %X It is critical that agents deployed in real-world settings, such as businesses, offices, universities and research laboratories, protect their individual users’ privacy when interacting with other entities. Indeed, privacy is recognized as a key motivating factor in the design of several multiagent algorithms, such as in distributed constraint reasoning (including both algorithms for distributed constraint optimization (DCOP) and distributed constraint satisfaction (DisCSPs)), and researchers have begun to propose metrics for analysis of privacy loss in such multiagent algorithms. Unfortunately, a general quantitative framework to compare these existing metrics for privacy loss or to identify dimensions along which to construct new metrics is currently lacking. This paper presents three key contributions to address this shortcoming. First, the paper presents VPS (Valuations of Possible States), a general quantitative framework to express, analyze and compare existing metrics of privacy loss. Based on a state-space model, VPS is shown to capture various existing measures of privacy created for specific domains of DisCSPs. The utility of VPS is further illustrated through analysis of privacy loss in DCOP algorithms, when such algorithms are used by personal assistant agents to schedule meetings among users. In addition, VPS helps identify dimensions along which to classify and construct new privacy metrics and it also supports their quantitative comparison. Second, the article presents key inference rules that may be used in analysis of privacy loss in DCOP algorithms under different assumptions. 
Third, detailed experiments based on the VPS-driven analysis lead to the following key results: (i) decentralization by itself does not provide superior protection of privacy in DisCSP/DCOP algorithms when compared with centralization; instead, privacy protection also requires the presence of uncertainty about agents’ knowledge of the constraint graph. (ii) one needs to carefully examine the metrics chosen to measure privacy loss; the qualitative properties of privacy loss and hence the conclusions that can be drawn about an algorithm can vary widely based on the metric chosen. This paper should thus serve as a call to arms for further privacy research, particularly within the DisCSP/DCOP arena. %B Journal of Autonomous Agents and Multiagent Systems (JAAMAS) %V 13 %P 27 - 60 %G eng %0 Conference Paper %B Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2006 %T Security in Multiagent Systems by Policy Randomization %A Praveen Paruchuri %A Tambe, Milind %A Ordonez, Fernando %A Kraus, Sarit %X Security in multiagent systems is commonly defined as the ability of the system to deal with intentional threats from other agents. This paper focuses on domains where such intentional threats are caused by unseen adversaries whose actions or payoffs are unknown. In such domains, action randomization can effectively deteriorate an adversary’s capability to predict and exploit an agent/agent team’s actions. Unfortunately, little attention has been paid to intentional randomization of agents’ policies in single-agent or decentralized (PO)MDPs without significantly sacrificing rewards or breaking down coordination. This paper provides two key contributions to remedy this situation. First, it provides three novel algorithms, one based on a non-linear program and two based on linear programs (LP), to randomize single-agent policies, while attaining a certain level of expected reward. 
Second, it provides Rolling Down Randomization (RDR), a new algorithm that efficiently generates randomized policies for decentralized POMDPs via the single-agent LP method. %B Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2006 %T Solution Sets for DCOPs and Graphical Games %A Jonathan P. Pearce %A Rajiv T. Maheswaran %A Tambe, Milind %X A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents, each of whom is choosing an individual action. When rapidly selecting a single joint action for a team, we typically solve DCOPs (often using locally optimal algorithms) to generate a single solution. However, in scenarios where a set of joint actions (i.e. a set of assignments to a DCOP) is to be generated, metrics are needed to help appropriately select this set and efficiently allocate resources for the joint actions in the set. To address this need, we introduce k-optimality, a metric that captures the desirable properties of diversity and relative quality of a set of locally-optimal solutions using a parameter that can be tuned based on the level of these properties required. To achieve effective resource allocation for this set, we introduce several upper bounds on the cardinalities of k-optimal joint action sets. These bounds are computable in constant time if we ignore the graph structure, but tighter, graph-based bounds are feasible with higher computation cost. Bounds help choose the appropriate level of k-optimality for settings with fixed resources and help determine appropriate resource allocation for settings where a fixed level of k-optimality is desired.
In addition, our bounds for a 1-optimal joint action set for a DCOP also apply to the number of pure-strategy Nash equilibria in a graphical game of noncooperative agents. %B Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Fifth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS) Industry Track %D 2006 %T Using Multiagent Teams to Improve the Training of Incident Commanders %A Nathan Schurr %A Pratik Patil %A Fred Pighin %A Tambe, Milind %X The DEFACTO system is a multiagent based tool for training incident commanders for large scale disasters. In this paper, we highlight some of the lessons that we have learned from our interaction with the Los Angeles Fire Department (LAFD) and how they have affected the way that we continued the design of our training system. These lessons were gleaned from LAFD feedback and initial training exercises and they include: system design, visualization, improving trainee situational awareness, adjusting training level of difficulty and situation scale. We have taken these lessons and used them to improve the DEFACTO system’s training capabilities. We have conducted initial training exercises to illustrate the utility of the system in terms of providing useful feedback to the trainee. %B Fifth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS) Industry Track %G eng %0 Conference Paper %B Fifth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS) %D 2006 %T Winning back the CUP for distributed POMDPs: Planning over continuous belief spaces %A Varakantham, Pradeep %A Ranjit Nair %A Tambe, Milind %A Makoto Yokoo %X Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are evolving as a popular approach for modeling multiagent systems, and many different algorithms have been proposed to obtain locally or globally optimal policies. 
Unfortunately, most of these algorithms have either been explicitly designed or experimentally evaluated assuming knowledge of a starting belief point, an assumption that often does not hold in complex, uncertain domains. Instead, in such domains, it is important for agents to explicitly plan over continuous belief spaces. This paper provides a novel algorithm to explicitly compute finite horizon policies over continuous belief spaces, without restricting the space of policies. By marrying an efficient single-agent POMDP solver with a heuristic distributed POMDP policy-generation algorithm, locally optimal joint policies are obtained, each of which dominates within a different part of the belief region. We provide heuristics that significantly improve the efficiency of the resulting algorithm and provide detailed experimental results. To the best of our knowledge, these are the first run-time results for analytically generating policies over continuous belief spaces in distributed POMDPs. %B Fifth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS) %G eng %0 Conference Paper %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2005 %T Allocating tasks in extreme teams %A Paul Scerri %A A. Farinelli %A Steven Okamoto %A Tambe, Milind %X Extreme teams, large-scale agent teams operating in dynamic environments, are on the horizon. Such environments are problematic for current task allocation algorithms due to the lack of locality in agent interactions. We propose a novel distributed task allocation algorithm for extreme teams, called LA-DCOP, that incorporates three key ideas. First, LA-DCOP’s task allocation is based on a dynamically computed minimum capability threshold which uses approximate knowledge of overall task load. Second, LA-DCOP uses tokens to represent tasks and further minimize communication. 
Third, it creates potential tokens to deal with inter-task constraints of simultaneous execution. We show that LA-DCOP convincingly outperforms competing distributed task allocation algorithms while using orders of magnitude fewer messages, allowing a dramatic scale-up in extreme teams, up to a fully distributed, proxy-based team of 200 agents. Varying thresholds are seen as a key to outperforming competing distributed algorithms in the domain of simulated disaster rescue. %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2005 %T Conflicts in teamwork: Hybrids to the rescue %A Tambe, Milind %A Bowring, Emma %A H. Jung %A Gal Kaminka %A Rajiv T. Maheswaran %A Janusz Marecki %A Pragnesh J. Modi %A Ranjit Nair %A Steven Okamoto %A Jonathan P. Pearce %A Praveen Paruchuri %A D. V. Pynadath %A Paul Scerri %A Nathan Schurr %A Varakantham, Pradeep %X Today within the AAMAS community, we see at least four competing approaches to building multiagent systems: belief-desire-intention (BDI), distributed constraint optimization (DCOP), distributed POMDPs, and auctions or game-theoretic approaches. While there is exciting progress within each approach, there is a lack of cross-cutting research. This paper highlights hybrid approaches for multiagent teamwork. In particular, for the past decade, the TEAMCORE research group has focused on building agent teams in complex, dynamic domains. While our early work was inspired by BDI, we will present an overview of recent research that uses DCOPs and distributed POMDPs in building agent teams. While DCOP and distributed POMDP algorithms provide promising results, hybrid approaches help us address problems of scalability and expressiveness.
For example, in the BDI-POMDP hybrid approach, BDI team plans are exploited to improve POMDP tractability, and POMDPs improve BDI team plan performance. We present some recent results from applying this approach in a Disaster Rescue simulation domain being developed with help from the Los Angeles Fire Department. %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B Fourth International Joint Conference Poster on Autonomous Agents and Multiagent Systems (AAMAS) %D 2005 %T How Local Is That Optimum? k-optimality for DCOP %A Jonathan P. Pearce %A Rajiv T. Maheswaran %A Tambe, Milind %X In multi-agent systems where sets of joint actions (JAs) are generated, metrics are needed to evaluate these sets and efficiently allocate resources for the many JAs. For the case where a JA set can be represented by multiple solutions to a DCOP, we introduce k-optimality as a metric that captures desirable properties of diversity and relative quality, and apply results from coding theory to obtain upper bounds on cardinalities of k-optimal JA sets. These bounds can help choose the appropriate level of k-optimality for settings with fixed resources and help determine appropriate resource allocation for settings where a fixed level of k-optimality is desired. %B Fourth International Joint Conference Poster on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI) %D 2005 %T Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs %A Nair, R. %A P. Varakantam %A M. Tambe %A M. Yokoo %X In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the real-world uncertainty in multiagent domains, they fail to exploit such locality of interaction.
Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. This paper presents a new model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Exploiting network structure enables us to present a distributed policy generation algorithm that performs local search. %B International Joint Conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B Twentieth AAAI-05 National Conference on Artificial Intelligence %D 2005 %T Networked Distributed POMDPs: A Synthesis of Distributed Constraint Optimization and POMDPs %A Ranjit Nair %A Varakantham, Pradeep %A Tambe, Milind %A Makoto Yokoo %X In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based on each agent’s limited interactions with a small number of neighbors. While distributed POMDPs capture the real-world uncertainty in multiagent domains, they fail to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty. This paper presents a new model synthesized from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Exploiting network structure enables us to present two novel algorithms for ND-POMDPs: a distributed policy generation algorithm that performs local search and a systematic policy search that is guaranteed to reach the global optimum. %B Twentieth AAAI-05 National Conference on Artificial Intelligence %G eng %0 Conference Paper %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2005 %T Valuations of Possible States (VPS): A Quantitative Framework for Analysis of Privacy Loss Among Collaborative Personal Assistant Agents %A Rajiv T. Maheswaran %A Jonathan P.
Pearce %A Varakantham, Pradeep %A Bowring, Emma %A Tambe, Milind %X For agents deployed in real-world settings, such as businesses, universities and research laboratories, it is critical that agents protect their individual users’ privacy when interacting with other entities. Indeed, privacy is recognized as a key motivating factor in the design of several multiagent algorithms, such as distributed constraint optimization (DCOP) algorithms. Unfortunately, rigorous and general quantitative metrics for analysis and comparison of such multiagent algorithms with respect to privacy loss are lacking. This paper takes a key step towards developing a general quantitative model from which one can analyze and generate metrics of privacy loss by introducing the VPS (Valuations of Possible States) framework. VPS is shown to capture various existing measures of privacy created for specific domains of distributed constraint satisfaction problems (DCSPs). The utility of VPS is further illustrated via analysis of DCOP algorithms, when such algorithms are used by personal assistant agents to schedule meetings among users. In addition, VPS allows us to quantitatively evaluate the properties of several privacy metrics generated through qualitative notions. We obtain the unexpected result that decentralization does not automatically guarantee superior protection of privacy. %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Conference Paper %B AAAI Spring Symposium on Homeland Security %D 2005 %T The Future of Disaster Response: Humans Working with Multiagent Teams using DEFACTO %A Nathan Schurr %A Janusz Marecki %A Tambe, Milind %A J. P. Lewis %A N. Kasinadhuni %X When addressing terrorist threats we must give special attention to both prevention and disaster response. Enabling effective interactions between agent teams and humans for disaster response is a critical area of research, with encouraging progress in the past few years.
However, previous work suffers from two key limitations: (i) limited human situational awareness, reducing human effectiveness in directing agent teams and (ii) the agent team’s rigid interaction strategies that limit team performance. This paper focuses on a novel disaster response software prototype, called DEFACTO (Demonstrating Effective Flexible Agent Coordination of Teams through Omnipresence). DEFACTO is based on a software proxy architecture and 3D visualization system, which addresses the two limitations described above. First, the 3D visualization interface enables human virtual omnipresence in the environment, improving human situational awareness and ability to assist agents. Second, generalizing past work on adjustable autonomy, the agent team chooses among a variety of “team-level” interaction strategies, even excluding humans from the loop in extreme circumstances. %B AAAI Spring Symposium on Homeland Security %C Stanford, CA %G eng %0 Conference Paper %B AAAI Spring Symposium on Persistent Assistants: Living and Working with AI %D 2005 %T Optimize My Schedule but Keep It Flexible: Distributed Multi-Criteria Coordination for Personal Assistants %X Research projects have begun focusing on deploying personal assistant agents to coordinate users in such diverse environments as offices, distributed manufacturing or design centers, and in support of first responders for emergencies. In such environments, distributed constraint optimization (DCOP) has emerged as a key technology for multiple collaborative assistants to coordinate with each other. Unfortunately, while previous work in DCOP only focuses on coordination in service of optimizing a single global team objective, personal assistants often require satisfying additional individual user-specified criteria. This paper provides a novel DCOP algorithm that enables personal assistants to engage in such multi-criteria coordination while maintaining the privacy of their additional criteria.
It uses n-ary NOGOODS implemented as private variables to achieve this. In addition, we have developed an algorithm that reveals only the individual criteria of a link and can speed up performance for certain problem structures. The key idea in this algorithm is that interleaving the criteria searches — rather than sequentially attempting to satisfy the criteria — improves efficiency by mutually constraining the distributed search for solutions. These ideas are realized in the form of private-g and public-g Multi-criteria ADOPT, built on top of ADOPT, one of the most efficient DCOP algorithms. We present our detailed algorithm, as well as some experimental results in personal assistant domains. %B AAAI Spring Symposium on Persistent Assistants: Living and Working with AI %C Menlo Park, CA %G eng %0 Conference Paper %B AAAI Spring Symposium on Persistent Assistants: Living and Working with AI %D 2005 %T Valuations of Possible States (VPS): A Quantitative Framework for Analysis of Privacy Loss Among Collaborative Personal Assistant Agents %A Rajiv T. Maheswaran %A Jonathan P. Pearce %A Varakantham, Pradeep %A Bowring, Emma %A Tambe, Milind %X For agents deployed in real-world settings, such as businesses, universities and research laboratories, it is critical that agents protect their individual users’ privacy when interacting with other entities. Indeed, privacy is recognized as a key motivating factor in the design of several multiagent algorithms, such as distributed constraint optimization (DCOP) algorithms. Unfortunately, rigorous and general quantitative metrics for analysis and comparison of such multiagent algorithms with respect to privacy loss are lacking. This paper takes a key step towards developing a general quantitative model from which one can analyze and generate metrics of privacy loss by introducing the VPS (Valuations of Possible States) framework.
VPS is shown to capture various existing measures of privacy created for specific domains of distributed constraint satisfaction problems (DCSPs). The utility of VPS is further illustrated via analysis of DCOP algorithms, when such algorithms are used by personal assistant agents to schedule meetings among users. In addition, VPS allows us to quantitatively evaluate the properties of several privacy metrics generated through qualitative notions. We obtain the unexpected result that decentralization does not automatically guarantee superior protection of privacy. %B AAAI Spring Symposium on Persistent Assistants: Living and Working with AI %C Menlo Park, CA %G eng %0 Conference Paper %B International Central and Eastern European Conference on Multi-Agent Systems (CEEMAS'05) %D 2005 %T Communication in distributed constraint satisfaction problems %A H. Jung %A Tambe, Milind %X Distributed Constraint Satisfaction Problems (DCSP) provide a general framework for multi-agent coordination and conflict resolution. In most DCSP algorithms, inter-agent communication is restricted to only exchanging values of variables, since any additional information-exchange is assumed to lead to significant communication overheads and to a breach of privacy. This paper provides a detailed experimental investigation of the impact of inter-agent exchange of additional legal values among agents, within a collaborative setting. We provide a new run-time model that takes into account the overhead of the additional communication in various computing and networking environments. Our investigation of more than 300 problem settings with the new run-time model (i) shows that DCSP strategies with additional information-exchange can lead to large speedups in a significant range of settings; and (ii) provides a categorization of problem settings with large speedups by the DCSP strategies based on extra communication, enabling us to selectively apply the strategies to a given domain.
This paper not only provides a useful method for performance measurement to the DCSP community, but also shows the utility of additional communication in DCSP. %B International Central and Eastern European Conference on Multi-Agent Systems (CEEMAS'05) %G eng %0 Book Section %B Programming Multiagent Systems %D 2005 %T The DEFACTO System: Coordinating Human-Agent Teams for the Future of Disaster Response %A Nathan Schurr %A Janusz Marecki %A Paul Scerri %A J. P. Lewis %A Tambe, Milind %X Enabling effective interactions between agent teams and humans for disaster response is a critical area of research, with encouraging progress in the past few years. However, previous work suffers from two key limitations: (i) limited human situational awareness, reducing human effectiveness in directing agent teams and (ii) the agent team’s rigid interaction strategies that limit team performance. This paper presents a software prototype called DEFACTO (Demonstrating Effective Flexible Agent Coordination of Teams through Omnipresence). DEFACTO is based on a software proxy architecture and 3D visualization system, which addresses the two limitations described above. First, the 3D visualization interface enables human virtual omnipresence in the environment, improving human situational awareness and ability to assist agents. Second, generalizing past work on adjustable autonomy, the agent team chooses among a variety of “team-level” interaction strategies, even excluding humans from the loop in extreme circumstances. %B Programming Multiagent Systems %I Springer Press %G eng %0 Conference Paper %B Innovative Applications of Artificial Intelligence (IAAI'05) %D 2005 %T The DEFACTO System: Training Tool for Incident Commanders %A Nathan Schurr %A Janusz Marecki %A Paul Scerri %A J. P. 
Lewis %A Tambe, Milind %X Techniques for augmenting the automation of routine coordination are rapidly reaching a level of effectiveness where they can simulate realistic coordination on the ground for large numbers of emergency response entities (e.g. fire engines, police cars) for the sake of training. Furthermore, it seems inevitable that future disaster response systems will utilize such technology. We have constructed a new system, DEFACTO (Demonstrating Effective Flexible Agent Coordination of Teams through Omnipresence), that integrates state-of-the-art agent reasoning capabilities and 3D visualization into a unique high fidelity system for training incident commanders. The DEFACTO system achieves this goal via three main components: (i) Omnipresent Viewer - intuitive interface, (ii) Proxy Framework - for team coordination, and (iii) Flexible Interaction - between the incident commander and the team. We have performed detailed preliminary experiments with DEFACTO in the fire-fighting domain. In addition, DEFACTO has been repeatedly demonstrated to key police and fire department personnel in the Los Angeles area, with very positive feedback. %B Innovative Applications of Artificial Intelligence (IAAI'05) %G eng %0 Conference Paper %B International Conference on Autonomous Agents and Multiagent Systems, AAMAS %D 2005 %T Exploiting Belief Bounds: Practical POMDPs for Personal Assistant Agents %A Varakantham, Pradeep %A Rajiv T. Maheswaran %A Tambe, Milind %X Agents or agent teams deployed to assist humans often face the challenges of monitoring the state of key processes in their environment (including the state of their human users themselves) and making periodic decisions based on such monitoring. POMDPs appear well suited to enable agents to address these challenges, given the uncertain environment and cost of actions, but optimal policy generation for POMDPs is computationally expensive.
This paper introduces three key techniques to speed up POMDP policy generation that exploit the notion of progress or dynamics in personal assistant domains. Policy computation is restricted to the belief space polytope that remains reachable given the progress structure of a domain. We introduce new algorithms, particularly one based on applying Lagrangian methods to compute a bounded belief space support in polynomial time. Our techniques are complementary to many existing exact and approximate POMDP policy generation algorithms. Indeed, we illustrate this by enhancing two of the fastest existing algorithms for exact POMDP policy generation. The order of magnitude speedups demonstrate the utility of our techniques in facilitating the deployment of POMDPs within agents assisting human users. %B International Conference on Autonomous Agents and Multiagent Systems, AAMAS %G eng %0 Journal Article %J Journal of AI Research (JAIR) %D 2005 %T Hybrid BDI-POMDP Framework for Multiagent Teaming %A Ranjit Nair %A Tambe, Milind %X Many current large-scale multiagent team implementations can be characterized as following the “belief-desire-intention” (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions.
First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations. Our technique gains in tractability by significantly curtailing RMTDP policy search; in particular, BDI team plans provide incomplete RMTDP policies, and the RMTDP policy search fills the gaps in such incomplete policies by searching for the best role allocation. Our second key contribution is a novel decomposition technique to further improve RMTDP policy search efficiency. Even though limited to searching role allocations, there are still combinatorially many role allocations, and evaluating each in RMTDP to identify the best is extremely difficult. Our decomposition technique exploits the structure in the BDI team plans to significantly prune the search space of role allocations. Our third key contribution is a significantly faster policy evaluation algorithm suited for our BDI-POMDP hybrid approach. Finally, we also present experimental results from two domains: mission rehearsal simulation and RoboCupRescue disaster rescue simulation. %B Journal of AI Research (JAIR) %V 23 %P 367-420 %G eng %0 Conference Paper %B AAAI Spring Symposium %D 2005 %T Practical POMDPs for Personal Assistant Domains %A Varakantham, Pradeep %A Rajiv T. Maheswaran %A Tambe, Milind %X Agents or agent teams deployed to assist humans often face the challenge of monitoring the state of key processes in their environment, including the state of their human users, and making periodic decisions based on such monitoring. The challenge is particularly difficult given the significant observational uncertainty, and uncertainty in the outcome of the agent’s actions.
POMDPs (partially observable Markov decision problems) appear well-suited to enable agents to address such uncertainties and costs; yet slow run-times in generating optimal POMDP policies present a significant hurdle. This slowness can be attributed to cautious planning for all possible belief states, e.g., the uncertainty in the monitored process is assumed to range over all possible states at all times. This paper introduces three key techniques to speed up POMDP policy generation that exploit the notion of progress or dynamics in personal assistant domains. The key insight is that given an initial (possibly uncertain) starting set of states, the agent needs to be prepared to act only in a limited range of belief states; most other belief states are simply unreachable given the dynamics of the monitored process, and no policy needs to be generated for such belief states. The techniques we propose are complementary to most existing exact and approximate POMDP policy generation algorithms. Indeed, we illustrate our technique by enhancing generalized incremental pruning (GIP), one of the most efficient exact algorithms for POMDP policy generation, and demonstrate orders-of-magnitude speedups in policy generation. Such speedups would facilitate the deployment of POMDPs in agents assisting human users. %B AAAI Spring Symposium %G eng %0 Conference Paper %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %D 2005 %T Preprocessing Techniques for Accelerating the DCOP Algorithm ADOPT %A Syed M. Ali %A Sven Koenig %A Tambe, Milind %X Methods for solving Distributed Constraint Optimization Problems (DCOP) have emerged as key techniques for distributed reasoning. Yet, their application faces significant hurdles in many multiagent domains due to their inefficiency. Preprocessing techniques have successfully been used to speed up algorithms for centralized constraint satisfaction problems.
This paper introduces a framework of different preprocessing techniques that are based on dynamic programming and speed up ADOPT, an asynchronous complete and optimal DCOP algorithm. We investigate when preprocessing is useful and which factors influence the resulting speedups in two DCOP domains, namely graph coloring and distributed sensor networks. Our experimental results demonstrate that our preprocessing techniques are fast and can speed up ADOPT by an order of magnitude. %B Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) %G eng %0 Journal Article %J IEEE Internet Computing %D 2005 %T Research Directions for Service Oriented Multiagent Systems %A M. Huhns %X Today’s service-oriented systems realize many ideas from the research conducted a decade or so ago in multiagent systems. Because these two fields are so deeply connected, further advances in multiagent systems could feed into tomorrow’s successful service-oriented computing approaches. This article describes a 15-year roadmap for service-oriented multiagent system research. %B IEEE Internet Computing %V 9 %P 65-70 %G eng %N 6 2005 %0 Journal Article %J Multiagent and Grid Systems - An International Journal %D 2005 %T Towards Flexible Coordination of Human-Agent Teams %A Nathan Schurr %A Janusz Marecki %A Tambe, Milind %A Paul Scerri %X Enabling interactions of agent-teams and humans is a critical area of research, with encouraging progress in the past few years. However, previous work suffers from three key limitations: (i) limited human situational awareness, reducing human effectiveness in directing agent teams, (ii) the agent team’s rigid interaction strategies that limit team performance, and (iii) lack of formal tools to analyze the impact of such interaction strategies. This article presents a software prototype called DEFACTO (Demonstrating Effective Flexible Agent Coordination of Teams through Omnipresence).
DEFACTO is based on a software proxy architecture and 3D visualization system, which addresses the three limitations mentioned above. First, the 3D visualization interface enables human virtual omnipresence in the environment, improving human situational awareness and ability to assist agents. Second, generalizing past work on adjustable autonomy, the agent team chooses among a variety of team-level interaction strategies, even excluding humans from the loop in extreme circumstances. Third, analysis tools help predict the performance of (and choose among) different interaction strategies. DEFACTO is illustrated in a future disaster response simulation scenario, and extensive experimental results are presented. %B Multiagent and Grid Systems - An International Journal %V 1 %P 3-16 %G eng %0 Journal Article %J Artificial Intelligence Journal(AIJ) %D 2005 %T ADOPT: Asynchronous distributed constraint optimization with quality guarantees %A Pragnesh J. Modi %A W. Shen %A Tambe, Milind %A Makoto Yokoo %X The Distributed Constraint Optimization Problem (DCOP) is a promising approach for modeling distributed reasoning tasks that arise in multiagent systems. Unfortunately, existing methods for DCOP are not able to provide theoretical guarantees on global solution quality while allowing agents to operate asynchronously. We show how this failure can be remedied by allowing agents to make local decisions based on conservative cost estimates rather than relying on global certainty as previous approaches have done. This novel approach results in a polynomial-space algorithm for DCOP named Adopt that is guaranteed to find the globally optimal solution while allowing agents to execute asynchronously and in parallel. Detailed experimental results show that on benchmark problems Adopt obtains speedups of several orders of magnitude over other approaches. 
Adopt can also perform bounded-error approximation – it has the ability to quickly find approximate solutions and, unlike heuristic search methods, still maintain a theoretical guarantee on solution quality. %B Artificial Intelligence Journal (AIJ) %V 161 %P 149-180 %G eng %0 Conference Paper %B First Intelligent Systems Technical Conference of the American Institute of Aeronautics and Astronautics %D 2004 %T Adjustable autonomy in the context of coordination (invited paper) %A Paul Scerri %A K. Sycara %A Tambe, Milind %X Human-agent interaction in the context of coordination presents novel challenges as compared to isolated interactions between a single human and single agent. There are two broad reasons for the additional challenges: the environment continues to evolve while a decision is pending, and the entities involved are inherently distributed. Our approach to interaction in such a context has three key components which allow us to leverage human expertise by giving humans responsibility for key coordination decisions, without risking the coordination due to slow responses. First, to deal with the dynamic nature of the situation, we use pre-planned sequences of transfer of control actions called transfer-of-control strategies. Second, to allow identification of key coordination issues in a distributed way, individual coordination tasks are explicitly represented as coordination roles, rather than being implicitly represented within a monolithic protocol. Such a representation allows meta-reasoning about those roles to determine when human input may be useful. Third, the meta-reasoning and transfer-of-control strategies are encapsulated in a mobile agent that moves around the group to either get human input or autonomously make a decision. In this paper, we describe this approach and present initial results from interaction between a large number of UAVs and a small number of humans.
%B First Intelligent Systems Technical Conference of the American Institute of Aeronautics and Astronautics %G eng %0 Conference Paper %B Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04) %D 2004 %T Communication for Improving Policy Computation in Distributed POMDPs %A Ranjit Nair %A Makoto Yokoo %A Maayan Roth %A Tambe, Milind %X Distributed Partially Observable Markov Decision Problems (POMDPs) are emerging as a popular approach for modeling multiagent teamwork where a group of agents work together to jointly maximize a reward function. Since the problem of finding the optimal joint policy for a distributed POMDP has been shown to be NEXP-Complete if no assumptions are made about the domain conditions, several locally optimal approaches have emerged as a viable solution. However, the use of communicative actions as part of these locally optimal algorithms has been largely ignored or has been applied only under restrictive assumptions about the domain. In this paper, we show how communicative acts can be explicitly introduced in order to find locally optimal joint policies that allow agents to coordinate better through synchronization achieved via communication. Furthermore, the introduction of communication allows us to develop a novel compact policy representation that results in savings of both space and time which are verified empirically. Finally, through the imposition of constraints on communication such as not going without communicating for more than K steps, even greater space and time savings can be obtained. 
%B Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04) %G eng %0 Thesis %D 2004 %T Coordinating multiagent teams in uncertain domains using distributed POMDPs %A Ranjit Nair %X Distributed Partially Observable Markov Decision Problems (POMDPs) have emerged as a popular decision-theoretic approach for planning for multiagent teams, where it is imperative for the agents to be able to reason about the rewards (and costs) for their actions in the presence of uncertainty. However, finding the optimal distributed POMDP policy is computationally intractable (NEXP-Complete). This dissertation presents two independent approaches which deal with this issue of intractability in distributed POMDPs. The primary focus is on the first approach, which represents a principled way to combine the two dominant paradigms for building multiagent team plans, namely the “belief-desire-intention” (BDI) approach and distributed POMDPs. In this hybrid BDI-POMDP approach, BDI team plans are exploited to improve distributed POMDP tractability and distributed POMDP-based analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams – which agents to allocate to the different roles in the team. The hybrid BDI-POMDP approach provides three key contributions. First, unlike prior work in multiagent role allocation, we describe a role allocation technique that takes into account future uncertainties in the domain. The second contribution is a novel decomposition technique, which exploits the structure in the BDI team plans to significantly prune the search space of combinatorially many role allocations. Our third key contribution is a significantly faster policy evaluation algorithm suited for our BDI-POMDP hybrid approach. Finally, we also present experimental results from two domains: mission rehearsal simulation and RoboCupRescue disaster rescue simulation.
In the RoboCupRescue domain, we show that the role allocation technique presented in this dissertation is capable of performing at human expert levels by comparing with the allocations chosen by humans in the actual RoboCupRescue simulation environment. The second approach for dealing with the intractability of distributed POMDPs is based on finding locally optimal joint policies using Nash equilibrium as a solution concept. Through the introduction of communication, we not only show improved coordination but also develop a novel compact policy representation that results in savings of both space and time which are verified empirically. %G eng %9 PhD thesis %0 Conference Paper %B Spring Symposium %D 2004 %T Coordination Advice: A Preliminary Investigation of Human Advice to Multiagent Teams %A Nathan Schurr %A Paul Scerri %A Tambe, Milind %X This paper introduces a new area of advice that is specific to advising a multiagent team: Coordination Advice. Coordination Advice differs from traditional advice because it pertains to coordinated tasks and interactions between agents. Given a large multiagent team interacting in a dynamic domain, optimal coordination is a difficult challenge. Human advisors can improve such coordination via advice. This paper is a preliminary look at the evolution of Coordination Advice from a human through three different domains: (i) disaster rescue simulation, (ii) self-maintaining robotic sensors, and (iii) personal assistants in an office environment. We study how the useful advice a person can give changes as the domains change and the number of agents and roles increase. %B Spring Symposium %G eng %0 Conference Paper %B CP 2004 Workshop on Distributed Constraint Reasoning (DCR-04) %D 2004 %T DCOP Games for Multi-Agent Coordination %A Jonathan P. Pearce %A Rajiv T.
Maheswaran %A Tambe, Milind %X Many challenges in multi-agent coordination can be modeled as distributed constraint optimization problems (DCOPs), but complete algorithms neither scale well nor respond effectively to dynamic or anytime environments. We introduce a transformation of DCOPs into graphical games that allows us to devise and analyze algorithms based on local utility and prove the monotonicity property of a class of such algorithms. The game-theoretic framework also enables us to characterize new equilibrium sets corresponding to a given degree of agent coordination. A key result in this paper is the discovery of a novel mapping between finite games and coding theory from which we can determine a priori bounds on the number of equilibria in these sets, which is useful in choosing the appropriate level of coordination given the communication cost of an algorithm. %B CP 2004 Workshop on Distributed Constraint Reasoning (DCR-04) %G eng %0 Conference Paper %B 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004) %D 2004 %T Distributed Algorithms for DCOP: A Graphical Game-Based Approach %A Rajiv T. Maheswaran %A Jonathan P. Pearce %A Tambe, Milind %X This paper addresses the application of distributed constraint optimization problems (DCOPs) to large-scale dynamic environments. We introduce a decomposition of DCOP into a graphical game and investigate the evolution of various stochastic and deterministic algorithms. We also develop techniques that allow for coordinated negotiation while maintaining distributed control of variables. We prove monotonicity properties of certain approaches and detail arguments about equilibrium sets that offer insight into the tradeoffs involved in leveraging efficiency and solution quality. The algorithms and ideas were tested and illustrated on several graph coloring domains.
%B 17th International Conference on Parallel and Distributed Computing Systems (PDCS-2004) %G eng %0 Book Section %B Cognition and Multi-Agent Interaction: From Cognitive Modeling to Social Simulation %D 2004 %T Evolution of a Teamwork Model %A Nathan Schurr %A Steven Okamoto %A Rajiv T. Maheswaran %A Paul Scerri %A Tambe, Milind %X For heterogeneous agents working together to achieve complex goals, teamwork (Jennings, 1995; Yen, Yin, Ioerger, Miller, Xu, & Volz, 2001; Tambe, 1997a) has emerged as the dominant coordination paradigm. For domains as diverse as rescue response, military, space, sports and collaboration between human workmates, flexible, dynamic coordination between cooperative agents needs to be achieved despite complex, uncertain, and hostile environments. There is now emerging consensus in the multiagent arena that for flexible teamwork among agents, each team member must be provided with an explicit model of teamwork, which entails its commitments and responsibilities as a team member. This explicit modelling allows the coordination to be robust, despite individual failures and unpredictably changing environments. Building on the well-developed theory of joint intentions (Cohen & Levesque, 1991) and shared plans (Grosz & Kraus, 1996), the STEAM teamwork model (Tambe, 1997a) was operationalized as a set of domain-independent rules that describe how teams should work together. This domain-independent teamwork model has been successfully applied to a variety of domains. From combat air missions (Hill, Chen, Gratch, Rosenbloom, & Tambe, 1997) to robot soccer (Kitano, Asada, Kuniyoshi, Noda, Osawa, & Matsubara, 1997) to teams supporting human organizations (Pynadath & Tambe, 2003) to rescue response (Scerri, Pynadath, Johnson, P., Schurr, Si, & Tambe, 2003), applying the same set of STEAM rules has resulted in successful coordination between heterogeneous agents. 
The successful use of the same teamwork model in a wide variety of diverse domains provides compelling evidence that it is the principles of teamwork, rather than exploitation of specific domain phenomena, that underlies the success of teamwork based approaches. Since the same rules can be successfully used in a range of domains, it is desirable to build a reusable software package that encapsulates those rules in order to provide a lightweight and portable implementation. The emerging standard for deploying such a package is via proxies (Pynadath & Tambe, 2003). Each proxy works closely with a single domain agent, representing that agent in the team. The second generation of teamwork proxies, called Machinetta (Pynadath & Tambe, 2003; Scerri et al., 2003), is currently being developed. The Machinetta proxies use less computing resources and are more flexible than the proxies they have superseded. While approaches to teamwork have been shown to be effective for agent teams, new emerging domains of teamwork require agent-human interactions in teams. These emerging domains and the teams that are being developed for them introduce a new set of issues and obstacles. Two algorithms that need to be revised in particular for these complex domains are the algorithms for adjustable autonomy (for agent-human interaction) and algorithms for role allocation. This chapter focuses in particular on the challenge of role allocation. Upon instantiation of a new plan, the roles needed to perform that plan are created and must be allocated to members of the team. In order to allocate a dynamically changing set of roles to team members, previous mechanisms required too much computation and/or communication and did not handle rapidly changing situations well for teams with many members. A novel algorithm has been created for role allocation in these extreme teams. 
Generally in teamwork, role allocation is the problem of assigning roles to agents so as to maximize overall team utility (Nair, Ito, Tambe, & Marsella, 2002; Tidhar, Rao, & Sonenberg, 1996; Werger & Mataric, 2000). Extreme teams emphasize key additional properties in role allocation: (i) domain dynamics may cause tasks to disappear; (ii) agents may perform one or more roles, but within resource limits; (iii) many agents can fulfill the same role. This role allocation challenge in extreme teams will be referred to as extended GAP (E-GAP), as it subsumes the generalized assignment problem (GAP), which is NP-complete (Shmoys & Tardos, 1993). %B Cognition and Multi-Agent Interaction: From Cognitive Modeling to Social Simulation %I Cambridge University Press %G eng %0 Conference Paper %B International Joint Conference on Principles and Practices of Constraint Programming (CP) %D 2004 %T Preprocessing Techniques for Distributed Constraint Optimization (Short Paper) %A Syed M. Ali %A Sven Koenig %A Tambe, Milind %X Although algorithms for Distributed Constraint Optimization Problems (DCOPs) have emerged as a key technique for distributed reasoning, their application faces significant hurdles in many multiagent domains due to their inefficiency. Preprocessing techniques have been successfully used to speed up algorithms for centralized constraint satisfaction problems. This paper introduces a framework of very different preprocessing techniques that speed up ADOPT, an asynchronous optimal DCOP algorithm that significantly outperforms competing DCOP algorithms by more than one order of magnitude. %B International Joint Conference on Principles and Practices of Constraint Programming (CP) %G eng %0 Conference Paper %B Third International Joint Conference on Agents and Multi Agent Systems, AAMAS %D 2004 %T Taking DCOP to the Real World : Efficient Complete Solutions for Distributed Event Scheduling %A Rajiv T. Maheswaran %A Tambe, Milind %A Bowring, Emma %A Jonathan P. 
Pearce %A Varakantham, Pradeep %X Distributed Constraint Optimization (DCOP) is an elegant formalism relevant to many areas in multiagent systems, yet complete algorithms have not been pursued for real-world applications due to perceived complexity. To capably capture a rich class of complex problem domains, we introduce the Distributed Multi-Event Scheduling (DiMES) framework and design congruent DCOP formulations with binary constraints that are proven to yield the optimal solution. To approach real-world efficiency requirements, we obtain immense speedups by improving communication structure and precomputing best-case bounds. Heuristics for generating better communication structures and calculating bounds in a distributed manner are provided and tested on systematically developed domains for meeting scheduling and sensor networks, exemplifying the viability of complete algorithms. %B Third International Joint Conference on Agents and Multi Agent Systems, AAMAS %G eng %0 Conference Paper %B International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04) %D 2004 %T Towards a formalization of teamwork with resource constraints %A Praveen Paruchuri %A Tambe, Milind %A Ordonez, Fernando %A Kraus, Sarit %X Despite the recent advances in distributed MDP frameworks for reasoning about multiagent teams, these frameworks mostly do not reason about resource constraints, a crucial issue in teams. To address this shortcoming, we provide four key contributions. First, we introduce EMTDP, a distributed MDP framework where agents must not only maximize expected team reward, but must simultaneously bound expected resource consumption. While there exist single-agent constrained MDP (CMDP) frameworks that reason about resource constraints, EMTDP is not just a CMDP with multiple agents. Instead, EMTDP must resolve the miscoordination that arises due to policy randomization. 
Thus, our second contribution is an algorithm for EMTDP transformation, so that resulting policies, even if randomized, avoid such miscoordination. Third, we prove equivalence of different techniques of EMTDP transformation. Finally, we present solution algorithms for these EMTDPs and show through experiments their efficiency in solving application-sized problems. %B International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-04) %G eng %0 Journal Article %J Journal of Autonomous Agents and Multiagent Systems (JAAMAS) %D 2004 %T Automated assistants for analyzing team behaviors %A Ranjit Nair %A Tambe, Milind %A S. Marsella %A R. Raines %X Multi-agent teamwork is critical in a large number of agent applications, including training, education, virtual enterprises and collective robotics. The complex interactions of agents in a team as well as with other agents make it extremely difficult for human developers to understand and analyze agent-team behavior. It has thus become increasingly important to develop tools that can help humans analyze, evaluate, and understand team behaviors. However, the problem of automated team analysis is largely unaddressed in previous work. In this article, we identify several key constraints faced by team analysts. Most fundamentally, multiple types of models of team behavior are necessary to analyze different granularities of team events, including agent actions, interactions, and global performance. In addition, effective ways of presenting the analysis to humans is critical and the presentation techniques depend on the model being presented. Finally, analysis should be independent of underlying team architecture and implementation. We also demonstrate an approach to addressing these constraints by building an automated team analyst called ISAAC for post-hoc, off-line agent-team analysis. 
ISAAC acquires multiple, heterogeneous team models via machine learning over teams’ external behavior traces, where the specific learning techniques are tailored to the particular model learned. Additionally, ISAAC employs multiple presentation techniques that can aid human understanding of the analyses. ISAAC also provides feedback on team improvement in two novel ways: (i) It supports principled “what-if” reasoning about possible agent improvements; (ii) It allows the user to compare different teams based on their patterns of interactions. This paper presents ISAAC’s general conceptual framework, motivating its design, as well as its concrete application in two domains: (i) RoboCup Soccer; (ii) software agent teams participating in a simulated evacuation scenario. In the RoboCup domain, ISAAC was used prior to and during the RoboCup’99 tournament, and was awarded the RoboCup Scientific Challenge Award. In the evacuation domain, ISAAC was used to analyze patterns of message exchanges among software agents, illustrating the generality of ISAAC’s techniques. We present detailed algorithms and experimental results from ISAAC’s application. %B Journal of Autonomous Agents and Multiagent Systems (JAAMAS) %V 8 %P 69-111 %G eng %0 Conference Paper %B Autonomy %D 2003 %T Adjustable Autonomy challenges in Personal Assistant Agents: A Position Paper %A Rajiv T. Maheswaran %A Tambe, Milind %A Varakantham, Pradeep %A Karen Myers %X The successful integration and acceptance of many multi-agent systems into daily lives crucially depends on the ability to develop effective policies for adjustable autonomy. Adjustable autonomy encompasses the strategies by which an agent selects the appropriate entity (itself, a human user, or another agent) to make a decision at key moments when an action is required. We present two formulations that address this issue: user-based and agent-based autonomy. 
Furthermore, we discuss the current and future implications for systems composed of personal assistant agents, where autonomy issues are of vital interest. %B Autonomy %G eng %0 Conference Paper %B ACM Symposium on applied computing (SAC'2003) %D 2003 %T Are multiagent algorithms relevant for real hardware? A case study of distributed constraint algorithms %A Paul Scerri %A Pragnesh J. Modi %A Tambe, Milind %A W. Shen %X Researchers building multi-agent algorithms typically work with problems abstracted away from real applications. The abstracted problem instances allow systematic and detailed investigations of new algorithms. However, a key question is how to apply an algorithm, developed on an abstract problem, in a real application. In this paper, we report on what was required to apply a particular distributed resource allocation algorithm developed for an abstract coordination problem in a real hardware application. A probabilistic representation of resources and tasks was used to deal with uncertainty and dynamics, and local reasoning was used to deal with delays in the distributed resource allocation algorithm. The probabilistic representation and local reasoning enabled the use of the multi-agent algorithm, which, in turn, improved the overall performance of the system. %B ACM Symposium on applied computing (SAC'2003) %G eng %0 Conference Paper %B Second International Joint conference on agents and multiagent systems (AAMAS) %D 2003 %T An asynchronous complete method for distributed constraint optimization %A Pragnesh J. Modi %A W. Shen %A Tambe, Milind %A Makoto Yokoo %X We present a new polynomial-space algorithm, called Adopt, for distributed constraint optimization (DCOP). DCOP is able to model a large class of collaboration problems in multi-agent systems where a solution within given quality parameters must be found. 
Existing methods for DCOP are not able to provide theoretical guarantees on global solution quality while operating both efficiently and asynchronously. Adopt is guaranteed to find an optimal solution, or a solution within a user-specified distance from the optimal, while allowing agents to execute asynchronously and in parallel. Adopt obtains these properties via a distributed search algorithm with several novel characteristics including the ability for each agent to make local decisions based on currently available information and without necessarily having global certainty. Theoretical analysis shows that Adopt provides provable quality guarantees, while experimental results show that Adopt is significantly more efficient than synchronous methods. The speedups are shown to be partly due to the novel search strategy employed and partly due to the asynchrony of the algorithm. %B Second International Joint conference on agents and multiagent systems (AAMAS) %G eng %0 Journal Article %J Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) %D 2003 %T Automated teamwork among heterogeneous software agents and humans %A D. V. Pynadath %A Tambe, Milind %X Agent integration architectures enable a heterogeneous, distributed set of agents to work together to address problems of greater complexity than those addressed by the individual agents themselves. Unfortunately, integrating software agents and humans to perform real-world tasks in a large-scale system remains difficult, especially due to three main challenges: ensuring robust execution in the face of a dynamic environment, providing abstract task specifications without all the low-level coordination details, and finding appropriate agents for inclusion in the overall system. To address these challenges, our Teamcore project provides the integration architecture with general-purpose teamwork coordination capabilities. 
We make each agent team-ready by providing it with a proxy capable of general teamwork reasoning. Thus, a key novelty and strength of our framework is that powerful teamwork capabilities are built into its foundations by providing the proxies themselves with a teamwork model. Given this teamwork model, the Teamcore proxies address the first agent integration challenge, robust execution, by automatically generating the required coordination actions for the agents they represent. We can also exploit the proxies’ reusable general teamwork knowledge to address the second agent integration challenge. Through team-oriented programming, a developer specifies a hierarchical organization and its goals and plans, abstracting away from coordination details. Finally, KARMA, our Knowledgeable Agent Resources Manager Assistant, can aid the developer in conquering the third agent integration challenge by locating agents that match the specified organization’s requirements. Our integration architecture enables teamwork among agents with no coordination capabilities, and it establishes and automates consistent teamwork among agents with some coordination capabilities. Thus, team-oriented programming provides a level of abstraction that can be used on top of previous approaches to agent-oriented programming. We illustrate how the Teamcore architecture successfully addressed the challenges of agent integration in two application domains: simulated rehearsal of a military evacuation mission and facilitation of human collaboration. %B Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) %V 7 %P 71-100 %G eng %0 Conference Paper %B GTDT workshop, AAMAS-03 %D 2003 %T Between collaboration and competition: An Initial Formalization using Distributed POMDPs %A Praveen Paruchuri %A Tambe, Milind %A Spiros Kapetanakis %A Kraus, Sarit %X This paper presents an initial formalization of teamwork in multi-agent domains. 
Although analyses of teamwork already exist in the literature of multi-agent systems, almost no work has dealt with the problem of teams that comprise self-interested agents. The main contribution of this work is that it concentrates specifically on such teams of self-interested agents. Teams of this kind are common in multi-agent systems as they model the implicit competition between team members that often arises within a team. %B GTDT workshop, AAMAS-03 %C Melbourne, Australia %G eng %0 Conference Paper %B AAAI Spring Symposium %D 2003 %T Composing POMDP building blocks to analyze large-scale multiagent systems %A H. Jung %A Tambe, Milind %X Given a large group of cooperative agents, selecting the right coordination or conflict resolution strategy can have a significant impact on their performance (e.g., speed of convergence). While performance models of such coordination or conflict resolution strategies could aid in selecting the right strategy for a given domain, such models remain largely uninvestigated in the multiagent literature. This paper takes a step towards applying the recently emerging distributed POMDP (partially observable Markov decision process) frameworks, such as the MTDP (Markov team decision process), in service of creating such performance models. To address issues of scale-up, we use small-scale models, called building blocks, that represent the local interaction among a small group of agents. We discuss several ways to combine building blocks for performance prediction of a larger-scale multiagent system. Our approach is presented in the context of DCSP (distributed constraint satisfaction problem), where we are able to predict the performance of five different DCSP strategies in different domain settings by modeling and combining building blocks. Our approach points the way to new tools based on building blocks for performance analysis in multiagent systems. 
%B AAAI Spring Symposium %G eng %0 Thesis %D 2003 %T Conflict Resolution Strategies and their Performance models for large scale Multi Agent Systems %A Hyuckchul Jung %X Distributed, collaborative agents promise to play an important role in large-scale multiagent applications, such as distributed sensors and distributed spacecraft. Since no single agent can have complete global knowledge in such large-scale applications, conflicts are inevitable even among collaborative agents over shared resources, plans, or tasks. Fast conflict resolution techniques are required in many multiagent systems under soft or hard time constraints. In resolving conflicts, we focus on the approaches based on DCSP (distributed constraint satisfaction problems), a major paradigm in multiagent conflict resolution. We aim to speed up conflict resolution convergence by developing efficient DCSP strategies. We focus on multiagent systems characterized by agents that are collaborative, homogeneous, arranged in regular networks, and relying on local communication (found in many multiagent applications). This thesis provides the following major contributions. First, we develop novel DCSP strategies that significantly speed up conflict resolution convergence. The novel strategies are based on the extra communication of local information between neighboring agents. We formalize a set of DCSP strategies that exploit the extra communication: in selecting a new choice of actions, plans, or resources to resolve conflicts, each agent takes into account how much flexibility is given to neighboring agents. Second, we provide a new run-time model for performance measurement of DCSP strategies, since a popular existing DCSP performance metric does not consider the extra communication overhead. The run-time model enables us to evaluate the strategy performance in various computing and networking environments. 
Third, the analysis of message processing and communication overhead of the novel strategies shows that such overhead caused by the novel strategies is not overwhelming. Thus, despite extra communication, the novel strategies indeed show big speedups in a significant range of problems (particularly for harder problems). Fourth, we provide a categorization of the problem settings in which the novel strategies yield big speedups. Finally, to select the right strategy in a given domain, we develop performance modeling techniques based on a distributed POMDP (partially observable Markov decision process) model, where the scalability issue is addressed with a new decomposition technique. %G eng %9 PhD thesis %0 Thesis %D 2003 %T Distributed Constraint Optimization of Multi-Agent Systems %A Pragnesh 'Jay' Modi %X To coordinate effectively, multiple agents must reason and communicate about the interactions between their individual local decisions. Distributed planning, distributed scheduling, distributed resource allocation and distributed task allocation are some examples of multiagent problems where such reasoning is required. In order to represent these types of automated reasoning problems, researchers in Multiagent Systems have proposed distributed constraints as a key paradigm. Previous research in Artificial Intelligence and Constraint Programming has shown that constraints are a convenient yet powerful way to represent automated reasoning problems. This dissertation advances the state-of-the-art in Multiagent Systems and Constraint Programming through three key innovations. First, this dissertation introduces a novel algorithm for Distributed Constraint Optimization Problems (DCOP). DCOP significantly generalizes existing satisfaction-based constraint representations to allow optimization. We present Adopt, the first algorithm for DCOP that allows asynchronous concurrent execution and is guaranteed to terminate with the globally optimal solution. 
The key idea is to perform systematic distributed optimization based on conservative solution quality estimates rather than exact knowledge of global solution quality. This method is empirically shown to yield orders of magnitude speedups over existing synchronous methods and is shown to be robust to lost messages. Second, this dissertation introduces bounded-error approximation as a flexible method whereby agents can find global solutions that may not be optimal but are guaranteed to be within a given distance from optimal. This method is useful for time-limited domains because it decreases solution time and communication overhead. Bounded-error approximation is a significant departure from existing incomplete local methods, which rely exclusively on local information to obtain a decrease in solution time but at the cost of abandoning all theoretical guarantees on solution quality. Third, this dissertation presents generalized mapping strategies that allow a significant class of distributed resource allocation problems to be automatically represented via distributed constraints. These mapping strategies are significant because they illustrate the utility of the distributed constraint representation. These mapping strategies are useful because they provide multiagent researchers with a general, reusable methodology for understanding, representing and solving their own distributed resource allocation problems. Our theoretical results show the correctness of the mappings. %G eng %9 PhD thesis %0 Book %D 2003 %T Distributed sensor nets: A multiagent perspective %A V. Lesser %A C. Ortiz %A Tambe, Milind %X Distributed Sensor Networks is the first book of its kind to examine solutions to this problem using ideas taken from the field of multiagent systems. The field of multiagent systems has itself seen an exponential growth in the past decade, and has developed a variety of techniques for distributed resource allocation.
Distributed Sensor Networks contains contributions from leading, international researchers describing a variety of approaches to this problem based on examples of implemented systems taken from a common distributed sensor network application; each approach is motivated, demonstrated and tested by way of a common challenge problem. The book focuses on both practical systems and their theoretical analysis, and is divided into three parts: the first part describes the common sensor network challenge problem; the second part explains the different technical approaches to the common challenge problem; and the third part provides results on the formal analysis of a number of approaches taken to address the challenge problem. %I Kluwer academic %G eng %0 Conference Paper %B AAAI Spring Symposium %D 2003 %T Getting robots, agents and people to cooperate: An initial study %A Paul Scerri %A Johnson L. %A D. V. Pynadath %A P. S. Rosenbloom %A M. Si %A Nathan Schurr %A Tambe, Milind %X Combining the unique capabilities of robots, agents and people (RAPs) promises to improve the safety, efficiency, reliability and cost at which some goals can be achieved, while allowing the achievement of other goals not previously achievable. Despite their heterogeneity, and indeed, because of their heterogeneity, our key hypothesis is that in order for RAPs to work together effectively, they must work as a team: they should be aware of the overall goals of the team, and coordinate their activities with their fellow team members in order to further the team’s goals. This poses a challenge, since different RAP entities may have differing social abilities and hence differing abilities to coordinate with their teammates. To construct such RAP teams, we make the RAPs “team ready” by providing each of them with teamwork-enabled proxies. While proxy-based architectures have been explored earlier for flexible multiagent teamwork, their application to RAP teams exposes two weaknesses. 
First, to address the problems with existing role allocation algorithms for RAP teams operating in dynamic environments, we provide a new approach for highly flexible role allocation and reallocation. Second, we enrich the communication between RAPs and between a RAP and its proxy, to improve teamwork flexibility while limiting the number of messages exchanged. This paper discusses the proxy-based architecture and our initial attempts at developing algorithms that address the problems that emerge when the RAP teams are used in complex domains. %B AAAI Spring Symposium %G eng %0 Conference Paper %B AAAI Spring Symposium %D 2003 %T Integrating BDI approaches with POMDPs: The case of team-oriented programs %A Ranjit Nair %A Tambe, Milind %A S. Marsella %X Integrating approaches based on belief-desire-intention (BDI) logics with the more recent developments of distributed POMDPs is today a fundamental challenge in the multiagent systems arena. One common suggestion for such an integration is to use stochastic models (POMDPs) for generating agent behaviors, while using the BDI components for monitoring and creating explanations. We propose a completely inverse approach, where the BDI components are used to generate agent behaviors, and distributed POMDPs are used in an analysis mode. In particular, we focus on team-oriented programs for tasking multiagent teams, where the team-oriented programs specify hierarchies of team plans that the team and its subteams must adopt as their joint intentions. However, given a limited number of agents, finding a good way to allocate them to different teams and subteams to execute such a team-oriented program is a difficult challenge. We use distributed POMDPs to analyze different allocations of agents within a team-oriented program, and to suggest improvements to the program. 
The key innovation is to use the distributed POMDP analysis not as a black box, but as a glass box, offering insights into why particular allocations lead to good or bad outcomes. These insights help to prune the search space of different allocations, offering significant speedups in the search. We present preliminary experimental results to illustrate our methodology. %B AAAI Spring Symposium %G eng %0 Conference Paper %B AAAI Spring Symposium on Logical Formalisms for Commonsense Reasoning %D 2003 %T Integrating Belief-Desire-Intention Approaches with POMDPs: The Case of Team-Oriented Programs %A Ranjit Nair %A Tambe, Milind %A Marsella, Stacy %X Integrating approaches based on belief-desire-intention (BDI) logics with the more recent developments of distributed POMDPs is today a fundamental challenge in the multiagent systems arena. One common suggestion for such an integration is to use stochastic models (POMDPs) for generating agent behaviors, while using the BDI components for monitoring and creating explanations. We propose a completely inverse approach, where the BDI components are used to generate agent behaviors, and distributed POMDPs are used in an analysis mode. In particular, we focus on team-oriented programs for tasking multiagent teams, where the team-oriented programs specify hierarchies of team plans that the team and its subteams must adopt as their joint intentions. However, given a limited number of agents, finding a good way to allocate them to different teams and subteams to execute such a team-oriented program is a difficult challenge. We use distributed POMDPs to analyze different allocations of agents within a team-oriented program, and to suggest improvements to the program. The key innovation is to use the distributed POMDP analysis not as a black box, but as a glass box, offering insights into why particular allocations lead to good or bad outcomes. 
These insights help to prune the search space of different allocations, offering significant speedups in the search. We present preliminary experimental results to illustrate our methodology. %B AAAI Spring Symposium on Logical Formalisms for Commonsense Reasoning %G eng %0 Conference Paper %B Second International Joint conference on agents and multiagent systems (AAMAS) %D 2003 %T Performance models for large-scale multiagent systems: Using Distributed POMDP building blocks %A H. Jung %A Tambe, Milind %X Given a large group of cooperative agents, selecting the right coordination or conflict resolution strategy can have a significant impact on their performance (e.g., speed of convergence). While performance models of such coordination or conflict resolution strategies could aid in selecting the right strategy for a given domain, such models remain largely uninvestigated in the multiagent literature. This paper takes a step towards applying the recently emerging distributed POMDP (partially observable Markov decision process) frameworks, such as MTDP (Markov team decision process), in service of creating such performance models. To address issues of scale-up, we use small-scale models, called building blocks that represent the local interaction among a small group of agents. We discuss several ways to combine building blocks for performance prediction of a larger-scale multiagent system. We present our approach in the context of DCSPs (distributed constraint satisfaction problems), where we first show that there is a large bank of conflict resolution strategies and no strategy dominates all others across different domains. By modeling and combining building blocks, we are able to predict the performance of five different DCSP strategies for four different domain settings, for a large-scale multiagent system. Our approach thus points the way to new tools for strategy analysis and performance modeling in multiagent systems in general. 
%B Second International Joint conference on agents and multiagent systems (AAMAS) %G eng %0 Conference Paper %B Second International Joint conference on agents and multiagent systems (AAMAS) %D 2003 %T A prototype infrastructure for distributed robot, agent, person teams %A Paul Scerri %A Johnson L. %A D. V. Pynadath %A P. S. Rosenbloom %A M. Si %A Nathan Schurr %A Tambe, Milind %X Effective coordination of robots, agents and people promises to improve the safety, robustness and quality with which shared goals are achieved by harnessing the highly heterogeneous entities’ diverse capabilities. Proxy-based integration architectures are emerging as a standard method for coordinating teams of heterogeneous entities. Such architectures are designed to meet imposing challenges such as ensuring that the diverse capabilities of the group members are effectively utilized, avoiding miscoordination in a noisy, uncertain environment and reacting flexibly to changes in the environment. However, we contend that previous architectures have gone too far in taking coordination responsibility away from entities and giving it to proxies. Our goal is to create a proxy-based integration infrastructure where there is a beneficial symbiotic relationship between the proxies and the team members. By leveraging the coordination abilities of both proxies and socially capable team members the quality of the coordination can be improved. We present two key new ideas to achieve this goal. First, coordination tasks are represented as explicit roles, hence the responsibilities not the actions are specified, thus allowing the team to leverage the coordination skills of the most capable team members. Second, building on the first idea, we have developed a novel role allocation and reallocation algorithm. These ideas have been realized in a prototype software proxy architecture and used to create heterogeneous teams for an urban disaster recovery domain. 
Using the rescue domain as a testbed, we have experimented with the role allocation algorithm and observed results to support the hypothesis that leveraging the coordination capabilities of people can help the performance of the team. %B Second International Joint conference on agents and multiagent systems (AAMAS) %G eng %0 Conference Paper %B Second International Joint conference on agents and multiagent systems (AAMAS) %D 2003 %T Role allocation and reallocation in multiagent teams: Towards a practical analysis %A Ranjit Nair %A Tambe, Milind %A S. Marsella %X Despite the success of the BDI approach to agent teamwork, initial role allocation (i.e. deciding which agents to allocate to key roles in the team) and role reallocation upon failure remain open challenges. What remains missing are analysis techniques to aid human developers in quantitatively comparing different initial role allocations and competing role reallocation algorithms. To remedy this problem, this paper makes three key contributions. First, the paper introduces RMTDP (Role-based Multiagent Team Decision Problem), an extension to MTDP [9], for quantitative evaluations of role allocation and reallocation approaches. Second, the paper illustrates an RMTDP-based methodology for not only comparing two competing algorithms for role reallocation, but also for identifying the types of domains where each algorithm is suboptimal, how much each algorithm can be improved and at what computational cost (complexity). Such algorithmic improvements are identified via a new automated procedure that generates a family of locally optimal policies for comparative evaluations. Third, since there are combinatorially many initial role allocations, evaluating each in RMTDP to identify the best is extremely difficult. Therefore, we introduce methods to exploit task decompositions among subteams to significantly prune the search space of initial role allocations. We present experimental results from two distinct domains. 
%B Second International Joint conference on agents and multiagent systems (AAMAS) %G eng %0 Book Section %B Who needs emotions: The brain meets the machine %D 2003 %T The role of emotions in multiagent teamwork %A Ranjit Nair %A Tambe, Milind %A S. Marsella %E J. Fellous %E M. Arbib %X Emotions play a significant role in human teamwork. However, despite the significant progress in multiagent teamwork, as well as progress in computational models of emotions, there have been very few investigations of the role of emotions in multiagent teamwork. This chapter attempts a first step towards addressing this shortcoming. It provides a short survey of the state of the art in multiagent teamwork and in computational models of emotions. It considers three cases of teamwork, in particular, teams of simulated humans, agent-human teams and pure agent teams, and examines the effects of introducing emotions in each. Finally, it also provides preliminary experimental results illustrating the impact of emotions on multiagent teamwork. %B Who needs emotions: The brain meets the machine %I MIT Press %G eng %0 Conference Paper %B International Joint conference on Artificial Intelligence (IJCAI) %D 2003 %T Taming Decentralized POMDPs: Towards efficient policy computation for multiagent settings %A Ranjit Nair %A Tambe, Milind %A Makoto Yokoo %A D. V. Pynadath %A S. Marsella %X The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP). Yet, despite the growing importance and applications of decentralized POMDP models in the multiagent arena, few algorithms have been developed for efficiently deriving joint policies for these models. This paper presents a new class of locally optimal algorithms called “Joint Equilibrium-based search for policies (JESP)”. 
We first describe an exhaustive version of JESP and subsequently a novel dynamic programming approach to JESP. Our complexity analysis reveals the potential for exponential speedups due to the dynamic programming approach. These theoretical results are verified via empirical comparisons of the two JESP versions with each other and with a globally optimal brute-force search algorithm. Finally, we prove piecewise linearity and convexity (PWLC) properties, thus taking steps towards developing algorithms for continuous belief states. %B International Joint conference on Artificial Intelligence (IJCAI) %G eng %0 Conference Paper %B Lecture notes in computer science, Proceedings of the workshop on Programming multiagent systems, Springer %D 2003 %T Team Oriented Programming and Proxy Agents: The Next Generation %A Paul Scerri %A David V. Pynadath %A Nathan Schurr %A Alessandro Farinelli %A Sudeep Gandhe %A Tambe, Milind %X Coordination between large teams of highly heterogeneous entities will change the way complex goals are pursued in real world environments. One approach to achieving the required coordination in such teams is to give each team member a proxy that assumes routine coordination activities on behalf of its team member. Despite that approach’s success, as we attempt to apply this first generation of proxy architecture to larger teams in more challenging environments, some limitations become clear. In this paper, we present initial efforts on the next generation of proxy architecture and Team Oriented Programming (TOP), called Machinetta. Machinetta aims to overcome the limitations of the previous generation of proxies and allow effective coordination between very large teams of highly heterogeneous agents. We describe the principles underlying the design of the Machinetta proxies and present initial results from two domains. 
%B Lecture notes in computer science, Proceedings of the workshop on Programming multiagent systems, Springer %G eng %0 Conference Paper %B International workshop on distributed constraint reasoning DCR-03 %D 2003 %T Distributed Constraint Reasoning under Unreliable Communication %A Pragnesh J. Modi %A Syed M. Ali %A Rishi Goel %A Tambe, Milind %X The Distributed Constraint Optimization Problem (DCOP) is able to model many problems in multiagent systems, but existing research has not considered the issue of unreliable communication, which often arises in real-world applications. Limited bandwidth, interference, and loss of line-of-sight are some of the reasons why communication fails in the real world. In this paper we show that an existing asynchronous algorithm for DCOP can be made to operate effectively in the face of message loss through the introduction of a very simple timeout mechanism for selective communication. Despite its simplicity, this mechanism is shown to dramatically reduce communication overhead while preventing deadlocks that can occur when messages are lost. Results show that the optimal solution can be guaranteed even in the presence of message loss and that algorithm performance measured in terms of time to solution degrades gracefully as message loss probability increases. %B International workshop on distributed constraint reasoning DCR-03 %G eng %0 Conference Paper %B AAAI Spring Symposium on Safe learning agents %D 2002 %T Adjustable autonomy for the real world %A Tambe, Milind %A Paul Scerri %A D. V. Pynadath %X Adjustable autonomy refers to agents’ dynamically varying their own autonomy, transferring decision making control to other entities (typically human users) in key situations. Determining whether and when such transfers of control must occur is arguably the fundamental research question in adjustable autonomy. 
Previous work, often focused on individual agent-human interactions, has provided several different techniques to address this question. Unfortunately, domains requiring collaboration between teams of agents and humans reveal two key shortcomings of these previous techniques. First, these techniques use rigid one-shot transfers of control that can result in unacceptable coordination failures in multiagent settings. Second, they ignore costs (e.g., in terms of time delays or effects of actions) to an agent’s team due to such transfers of control. To remedy these problems, this paper presents a novel approach to adjustable autonomy, based on the notion of a transfer of control strategy. A transfer of control strategy consists of a sequence of two types of actions: (i) actions to transfer decision-making control (e.g., from the agent to the user or vice versa) and (ii) actions to change an agent’s pre-specified coordination constraints with others, aimed at minimizing miscoordination costs. The goal is for high quality individual decisions to be made with minimal disruption to the coordination of the team. These strategies are operationalized using Markov Decision Processes to select the optimal strategy given an uncertain environment and costs to individuals and teams. We present a detailed evaluation of the approach in the context of a real-world, deployed multi-agent system that assists a research group in daily activities. 
%B AAAI Spring Symposium on Safe learning agents %G eng %0 Conference Paper %B Springer lecture notes in Artificial Intelligence LNAI 2333 %D 2002 %T Agents, theories, architectures and languages (ATAL-01) %A Karen Myers %A Tambe, Milind %B Springer lecture notes in Artificial Intelligence LNAI 2333 %G eng %0 Conference Paper %B Autonomous Agents and Multi-Agent Systems Workshop on Distributed Constraint Reasoning %D 2002 %T Applying Constraint Reasoning to Real-world Distributed Task Allocation %A Paul Scerri %A Pragnesh Jay Modi %A Wei-Min Shen %A Tambe, Milind %X Distributed task allocation algorithms require a set of agents to intelligently allocate their resources to a set of tasks. The problem is often complicated by the fact that resources may be limited, the set of tasks may not be exactly known, and the set of tasks may change over time. Previous resource allocation algorithms have not been able to handle over-constrained situations, the uncertainty in the environment and/or dynamics. In this paper, we present extensions to an algorithm for distributed constraint optimization, called Adopt-SC, which allows it to be applied in such real-world domains. The approach relies on maintaining a probability distribution over tasks that are potentially present. The distribution is updated with both information from local sensors and information inferred from communication between agents. We present promising results with the approach on a distributed task allocation problem consisting of a set of stationary sensors that must track a moving target. The techniques proposed in this paper are evaluated on real hardware tracking real moving targets. %B Autonomous Agents and Multi-Agent Systems Workshop on Distributed Constraint Reasoning %G eng %0 Journal Article %J Journal of AI Research (JAIR) %D 2002 %T The communicative multiagent team decision problem: Analyzing teamwork theories and models %A D. V. 
Pynadath %A Tambe, Milind %X Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demonstrate its use by encoding and evaluating the two joint intentions strategies within an example domain. 
%B Journal of AI Research (JAIR) %V 16 %P 389-423 %G eng %0 Conference Paper %B Socially Intelligent Agents creating relationships with computers and Robots %D 2002 %T Electric Elves: Adjustable autonomy in real-world multiagent environments %A D. V. Pynadath %A Tambe, Milind %X Through adjustable autonomy (AA), an agent can dynamically vary the degree to which it acts autonomously, allowing it to exploit human abilities to improve its performance, but without becoming overly dependent and intrusive. AA research is critical for successful deployment of agents to support important human activities. While most previous work has focused on individual agent-human interactions, this paper focuses on teams of agents operating in real-world human organizations, as well as the novel AA coordination challenge that arises when one agent’s inaction while waiting for a human response can lead to potential miscoordination. Our multi-agent AA framework, based on Markov decision processes, provides an adaptive model of users that reasons about the uncertainty, costs, and constraints of decisions. Our approach to AA has proven essential to the success of our deployed Electric Elves system that assists our research group in rescheduling meetings, choosing presenters, tracking people’s locations, and ordering meals. %B Socially Intelligent Agents creating relationships with computers and Robots %I Kluwer Academic Publishers %G eng %0 Magazine Article %D 2002 %T Electric Elves: Agent Technology for Supporting Human Organizations (longer version of IAAI'01 paper) %A H. Chalupsky %A Y. Gil %A Craig Knoblock %A K. Lerman %A J. Oh %A D. V. Pynadath %A T. Russ %A Tambe, Milind %X The operation of a human organization requires dozens of everyday tasks to ensure coherence in organizational activities, to monitor the status of such activities, to gather information relevant to the organization, to keep everyone in the organization informed, etc. 
Teams of software agents can aid humans in accomplishing these tasks, facilitating the organization’s coherent functioning and rapid response to crises, while reducing the burden on humans. Based on this vision, this paper reports on Electric Elves, a system that has been operational, 24/7, at our research institute since June 1, 2000. Tied to individual user workstations, fax machines, voice, mobile devices such as cell phones and palm pilots, Electric Elves has assisted us in routine tasks, such as rescheduling meetings, selecting presenters for research meetings, tracking people’s locations, organizing lunch meetings, etc. We discuss the underlying AI technologies that led to the success of Electric Elves, including technologies devoted to agenthuman interactions, agent coordination, accessing multiple heterogeneous information sources, dynamic assignment of organizational tasks, and deriving information about organization members. We also report the results of deploying Electric Elves in our own research organization. %B AI Magazine %G eng %0 Conference Paper %B International NASA Workshop on Planning and Scheduling for Space %D 2002 %T Enabling Efficient Conflict Resolution in Multiple Spacecraft Missions via DCSP %A Hyuckchul Jung %A Tambe, Milind %A Anthony Barrett %A Bradley Clement %X While NASA is increasingly interested in multi-platform space missions, controlling such platforms via timed command sequences -- the current favored operations technique -- is unfortunately very difficult, and a key source of this complexity involves resolving conflicts to coordinate multiple spacecraft plans. We propose distributed constraint satisfaction (DCSP) techniques for automated coordination and conflict resolution of such multi-spacecraft plans. We introduce novel value ordering heuristics in DCSP to significantly improve the rate of conflict resolution convergence to meet the efficiency needs of multispacecraft missions. 
In addition, we introduce distributed POMDP (partially observable Markov decision process) based techniques for DCSP convergence analysis, which facilitates automated selection of the most appropriate DCSP strategy for a given situation, and points the way to a new generation of analytical tools for analysis of DCSP and multi-agent systems in general. %B International NASA Workshop on Planning and Scheduling for Space %G eng %0 Conference Paper %B NASA workshop on planning and scheduling %D 2002 %T Enabling efficient conflict resolution in multiple spacecraft missions via DCSP %A H. Jung %A Tambe, Milind %A Barrett, A. %A B. Clement %X While NASA is increasingly interested in multi-platform space missions, controlling such platforms via timed command sequences -- the current favored operations technique -- is unfortunately very difficult, and a key source of this complexity involves resolving conflicts to coordinate multiple spacecraft plans. We propose distributed constraint satisfaction (DCSP) techniques for automated coordination and conflict resolution of such multi-spacecraft plans. We introduce novel value ordering heuristics in DCSP to significantly improve the rate of conflict resolution convergence to meet the efficiency needs of multi-spacecraft missions. In addition, we introduce distributed POMDP (partially observable Markov decision process) based techniques for DCSP convergence analysis, which facilitates automated selection of the most appropriate DCSP strategy for a given situation, and points the way to a new generation of analytical tools for analysis of DCSP and multi-agent systems in general. %B NASA workshop on planning and scheduling %G eng %0 Journal Article %J Journal of AI Research (JAIR) %D 2002 %T Monitoring teams by overhearing: A multiagent plan-recognition approach %A Gal Kaminka %A D. V. 
Pynadath %A Tambe, Milind %X Recent years are seeing an increasing need for on-line monitoring of teams of cooperating agents, e.g., for visualization, or performance tracking. However, in monitoring deployed teams, we often cannot rely on the agents to always communicate their state to the monitoring system. This paper presents a non-intrusive approach to monitoring by overhearing, where the monitored team's state is inferred (via plan-recognition) from team-members' routine communications, exchanged as part of their coordinated task execution, and observed (overheard) by the monitoring system. Key challenges in this approach include the demanding run-time requirements of monitoring, the scarceness of observations (increasing monitoring uncertainty), and the need to scale-up monitoring to address potentially large teams. To address these, we present a set of complementary novel techniques, exploiting knowledge of the social structures and procedures in the monitored team: (i) an efficient probabilistic plan-recognition algorithm, well-suited for processing communications as observations; (ii) an approach to exploiting knowledge of the team's social behavior to predict future observations during execution (reducing monitoring uncertainty); and (iii) monitoring algorithms that trade expressivity for scalability, representing only certain useful monitoring hypotheses, but allowing for any number of agents and their different activities to be represented in a single coherent entity. We present an empirical evaluation of these techniques, in combination and apart, in monitoring a deployed team of agents, running on machines physically distributed across the country, and engaged in complex, dynamic task execution. We also compare the performance of these techniques to human expert and novice monitors, and show that the techniques presented are capable of monitoring at human-expert levels, despite the difficulty of the task. 
%B Journal of AI Research (JAIR) %V 17 %P 83-135 %G eng %0 Conference Paper %B First Autonomous Agents and Multiagent Systems Conference (AAMAS) %D 2002 %T Multiagent teamwork: Analyzing key teamwork theories and models %A D. V. Pynadath %A Tambe, Milind %X Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Thus, we cannot determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP) model, which is general enough to subsume many existing models of multiagent systems. We use the COM-MTDP model to provide a breakdown of the computational complexity of constructing optimal teams under problem domains divided along the dimensions of observability and communication cost. We then exploit the COM-MTDP’s ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory, including STEAM. We then derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demonstrate its use by encoding and evaluating the two joint intentions strategies within an example domain. %B First Autonomous Agents and Multiagent Systems Conference (AAMAS) %G eng %0 Conference Paper %B International Symposium on RoboCup (RoboCup'01) %D 2002 %T Task allocation in the RoboCup Rescue simulation domain: A short note %A Ranjit Nair %A T. Ito %A Tambe, Milind %A S. 
Marsella %X We consider the problem of disaster mitigation in the RoboCup Rescue Simulation Environment [3] to be a task allocation problem where the tasks arrive dynamically and can change in intensity. These tasks can be performed by ambulance teams, fire brigades and police forces with the help of an ambulance center, a fire station and a police office. However, the agents don't get automatically notified of the tasks as soon as they arrive, and hence it is necessary for the agents to explore the simulated world to discover new tasks and to notify other agents of these. In this paper we focus on the problem of task allocation. We have developed two approaches: a centralized combinatorial auction mechanism demonstrated at Agents-2001, and a distributed method which helped our agents finish third in RoboCup-Rescue 2001. With regard to task discovery, we use a greedy search method to explore the world: agents count the number of times they have visited each node, and attempt to visit nodes that have been visited the least number of times. %B International Symposium on RoboCup (RoboCup'01) %G eng %0 Conference Paper %B AAAI Spring Symposium on Intelligent Distributed and Embedded Systems %D 2002 %T Team coordination among distributed agents: Analyzing key teamwork theories and models %A D. V. Pynadath %A Tambe, Milind %X Multiagent research has made significant progress in constructing teams of distributed entities (e.g., robots, agents, embedded systems) that act autonomously in the pursuit of common goals. There now exist a variety of prescriptive theories, as well as implemented systems, that can specify good team behavior in different domains. However, each of these theories and systems addresses different aspects of the teamwork problem, and each does so in a different language. 
In this work, we seek to provide a unified framework that can capture all of the common aspects of the teamwork problem (e.g., heterogeneous, distributed entities, uncertain and dynamic environment), while still supporting analyses of both the optimality of team performance and the computational complexity of the agents’ decision problem. Our COMmunicative Multiagent Team Decision Problem (COM-MTDP) model provides such a framework for specifying and analyzing distributed teamwork. The COM-MTDP model is general enough to capture many existing models of multiagent systems, and we use this model to provide some comparative results of these theories. We also provide a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains. We then use the COM-MTDP model to compare (both analytically and empirically) two specific coordination theories (joint intentions theory and STEAM) against optimal coordination, in terms of both performance and computational complexity. %B AAAI Spring Symposium on Intelligent Distributed and Embedded Systems %G eng %0 Conference Paper %B AAAI Spring Symposium on Intelligent Distributed and Embedded Systems %D 2002 %T Team formation for reformation %A Ranjit Nair %A Tambe, Milind %A S. Marsella %X The utility of the multi-agent team approach for coordination of distributed agents has been demonstrated in a number of large-scale systems for sensing and acting, such as sensor networks for real-time tracking of moving targets (Modi et al. 2001) and disaster rescue simulation domains, such as the RoboCup Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000). These domains contain tasks that can be performed only by collaborative actions of the agents. Incomplete or incorrect knowledge owing to constrained sensing and uncertainty of the environment further motivate the need for these agents to explicitly work in teams. 
A key precursor to teamwork is team formation, the problem of how best to organize the agents into collaborating teams that perform the tasks that arise. For instance, in the disaster rescue simulation domain, injured civilians in a burning building may require teaming of two ambulances and three nearby fire-brigades to extinguish the fire and quickly rescue the civilians. If there are several such fires and injured civilians, the teams must be carefully formed to optimize performance. Our work in team formation focuses on dynamic, real-time environments, such as sensor networks (Modi et al. 2001) and the RoboCup Rescue Simulation Domain (Kitano et al. 1999; Tadokoro et al. 2000). In such domains teams must be formed rapidly so tasks are performed within given deadlines, and teams must be reformed in response to the dynamic appearance or disappearance of tasks. The problems with the current team formation work for such dynamic real-time domains are two-fold: (i) most team formation algorithms (Tidhar, Rao, & Sonenberg 1996; Hunsberger & Grosz 2000; Fatima & Wooldridge 2001; Horling, Benyo, & Lesser 2001; Modi et al. 2001) are static, so in order to adapt to the changing environment the static algorithm would have to be run repeatedly; (ii) team formation has largely relied on experimental work, without any theoretical analysis of key properties of team formation algorithms, such as their worst-case complexity. This is especially important because of the real-time nature of the domains. In this paper we take initial steps to attack both these problems. As the tasks change and members of the team fail, the current team needs to evolve to handle the changes. In both the sensor network domain (Modi et al. 2001) and RoboCup Rescue (Kitano et al. 1999; Tadokoro et al. 2000), each re-organization of the team requires time (e.g., fire-brigades may need to drive to a new location) and is hence expensive because of the need for quick response. 
Clearly, the current configuration of agents is relevant to how quickly and well they can be re-organized in the future. Each reorganization of the teams should be such that the resulting team is effective at performing the existing tasks but also flexible enough to adapt to new scenarios quickly. We refer to this reorganization of the team as “Team Formation for Reformation”. In order to solve the “Team Formation for Reformation” problem, we present R-COM-MTDPs (Roles and Communication in a Markov Team Decision Process), a formal model based on communicating decentralized POMDPs, to address the above shortcomings. R-COM-MTDP significantly extends an earlier model called COM-MTDP (Pynadath & Tambe 2002), by making important additions of roles and agents’ local states, to more closely model current complex multiagent teams. Thus, R-COM-MTDP provides decentralized optimal policies to take up and change roles in a team (planning ahead to minimize reorganization costs), and to execute such roles. R-COM-MTDPs provide a general tool to analyze role-taking and role-executing policies in multiagent teams. We show that while generation of optimal policies in R-COM-MTDPs is NEXP-complete, different communication and observability conditions significantly reduce such complexity. In this paper, we use the disaster rescue domain to motivate the “Team Formation for Reformation” problem. We present real world scenarios where such an approach would be useful and use the RoboCup Rescue Simulation Environment (Kitano et al. 1999; Tadokoro et al. 2000) to explain the working of our model. %B AAAI Spring Symposium on Intelligent Distributed and Embedded Systems %G eng %0 Conference Paper %B International Symposium on RoboCup (RoboCup'02) %D 2002 %T Team formation for reformation in multiagent domains like RoboCupRescue %A Ranjit Nair %A Tambe, Milind %A S. 
Marsella %X Team formation, i.e., allocating agents to roles within a team or subteams of a team, and the reorganization of a team upon team member failure or arrival of new tasks are critical aspects of teamwork. They are very important issues in RoboCupRescue, where many tasks need to be done jointly. While empirical comparisons (e.g., in a competition setting as in RoboCup) are useful, we need a quantitative analysis beyond the competition, to understand the strengths and limitations of different approaches, and their tradeoffs as we scale up the domain or change domain properties. To this end, we need to provide complexity-optimality tradeoffs, which have been lacking not only in RoboCup but in the multiagent field in general. To alleviate these difficulties, this paper presents R-COM-MTDP, a formal model based on decentralized communicating POMDPs, where agents explicitly take on and change roles to (re)form teams. R-COM-MTDP significantly extends an earlier COM-MTDP model, by introducing roles and local states to better model domains like RoboCupRescue where agents can take on different roles and each agent has a local state consisting of the objects in its vicinity. R-COM-MTDP tells us where the problem is highly intractable (NEXP-complete) and where it can be tractable (P-complete), and thus helps us understand where algorithms may need to trade off optimality and where they could strive for near optimal behaviors. The R-COM-MTDP model could enable comparison of various team formation and reformation strategies, including the strategies used by our own teams that came in the top three in 2001, in the RoboCup Rescue domain and beyond. %B International Symposium on RoboCup (RoboCup'02) %G eng %0 Journal Article %J Journal of AI Research (JAIR) %D 2002 %T Towards adjustable autonomy for the real-world %A Paul Scerri %A D. V. 
Pynadath %A Tambe, Milind %X Adjustable autonomy refers to entities dynamically varying their own autonomy, transferring decision-making control to other entities (typically agents transferring control to human users) in key situations. Determining whether and when such transfers-of-control should occur is arguably the fundamental research problem in adjustable autonomy. Previous work has investigated various approaches to addressing this problem but has often focused on individual agent-human interactions. Unfortunately, domains requiring collaboration between teams of agents and humans reveal two key shortcomings of these previous approaches. First, these approaches use rigid one-shot transfers of control that can result in unacceptable coordination failures in multiagent settings. Second, they ignore costs (e.g., in terms of time delays or effects on actions) to an agent's team due to such transfers-of-control. To remedy these problems, this article presents a novel approach to adjustable autonomy, based on the notion of a transfer-of-control strategy. A transfer-of-control strategy consists of a conditional sequence of two types of actions: (i) actions to transfer decision-making control (e.g., from an agent to a user or vice versa) and (ii) actions to change an agent's pre-specified coordination constraints with team members, aimed at minimizing miscoordination costs. The goal is for high-quality individual decisions to be made with minimal disruption to the coordination of the team. We present a mathematical model of transfer-of-control strategies. The model guides and informs the operationalization of the strategies using Markov Decision Processes, which select an optimal strategy, given an uncertain environment and costs to the individuals and teams. The approach has been carefully evaluated, including via its use in a real-world, deployed multi-agent system that assists a research group in its daily activities. 
%B Journal of AI Research (JAIR) %V 17 %P 171-228 %G eng %0 Conference Paper %B First Autonomous Agents and Multiagent Systems Conference (AAMAS) %D 2002 %T Why the elf acted autonomously: Towards a theory of adjustable autonomy %A Paul Scerri %A D. V. Pynadath %A Tambe, Milind %X Adjustable autonomy refers to agents' dynamically varying their own autonomy, transferring decision making control to other entities (typically human users) in key situations. Determining whether and when such transfer of control must occur is arguably the fundamental research question in adjustable autonomy. Practical systems have made significant in-roads in answering this question and in providing high-level guidelines for transfer of control decisions. For instance, [11] report that Markov decision processes were successfully used in transfer of control decisions in a real-world multiagent system, but that use of C4.5 led to failures. Yet, an underlying theory of transfer of control, that would explain such successes or failures is missing. To take a step in building this theory, we introduce the notion of a transfer-of-control strategy, which potentially involves several transfer of control actions. A mathematical model based on this notion allows both analysis of previously reported implementations and guidance for the design of new implementations. The practical benefits of this model are illustrated in a dramatic simplification of an existing adjustable autonomy system. %B First Autonomous Agents and Multiagent Systems Conference (AAMAS) %G eng %0 Conference Paper %B First International Workshop on Infrastructures for Scalable multi-agent systems, Springer Lecture Notes in Computer Science %D 2001 %T Adaptive infrastructures for agent integration %A David V. Pynadath %A Tambe, Milind %A Gal A. 
Kaminka %X With the proliferation of software agents and smart hardware devices, there is a growing realization that large-scale problems can be addressed by integration of such stand-alone systems. This has led to an increasing interest in integration infrastructures that enable a heterogeneous variety of agents and humans to work together. In our work, this infrastructure has taken the form of an integration architecture called Teamcore. We have deployed Teamcore to facilitate collaboration between different agents and humans that differ in their capabilities, preferences, the level of autonomy they are willing to grant the integration architecture, their information requirements, and performance. This paper first provides a brief overview of the Teamcore architecture and its current applications. The paper then discusses some of the research challenges we have focused on. In particular, the Teamcore architecture is based on general purpose teamwork coordination capabilities. However, it is important for this architecture to adapt to meet the needs and requirements of specific individuals. We describe the different techniques of architectural adaptation, and present initial experimental results. %B First International Workshop on Infrastructures for Scalable multi-agent systems, Springer Lecture Notes in Computer Science %G eng %0 Conference Paper %B Intelligent Agents VII Proceedings of the International workshop on Agents, theories, architectures and languages %D 2001 %T Adjustable Autonomy: A Response %A Tambe, Milind %A D. V. Pynadath %A Paul Scerri %X Gaining a fundamental understanding of adjustable autonomy (AA) is critical if we are to deploy multi-agent systems in support of critical human activities. Indeed, our recent work with intelligent agents in the “Electric Elves” (E-Elves) system has convinced us that AA is a critical part of any human collaboration software. 
In the following, we first briefly describe E-Elves, then discuss AA issues in E-Elves. %B Intelligent Agents VII Proceedings of the International workshop on Agents, theories, architectures and languages %G eng %0 Conference Paper %B International Conference on Autonomous Agents (Agents'01) %D 2001 %T Adjustable autonomy in real-world multi-agent environments %A Paul Scerri %A D. V. Pynadath %A Tambe, Milind %X Through adjustable autonomy (AA), an agent can dynamically vary the degree to which it acts autonomously, allowing it to exploit human abilities to improve its performance, but without becoming overly dependent and intrusive in its human interaction. AA research is critical for successful deployment of multi-agent systems in support of important human activities. While most previous AA work has focused on individual agent-human interactions, this paper focuses on teams of agents operating in real-world human organizations. The need for agent teamwork and coordination in such environments introduces novel AA challenges. First, agents must be more judicious in asking for human intervention, because, although human input can prevent erroneous actions that have high team costs, one agent’s inaction while waiting for a human response can lead to potential miscoordination with the other agents in the team. Second, despite appropriate local decisions by individual agents, the overall team of agents can potentially make global decisions that are unacceptable to the human team. Third, the diversity in real-world human organizations requires that agents gradually learn individualized models of the human members, while still making reasonable decisions even before sufficient data are available. We address these challenges using a multi-agent AA framework based on an adaptive model of users (and teams) that reasons about the uncertainty, costs, and constraints of decisions at all levels of the team hierarchy, from the individual users to the overall human organization. 
We have implemented this framework through Markov decision processes, which are well suited to reason about the costs and uncertainty of individual and team actions. Our approach to AA has proven essential to the success of our deployed multi-agent Electric Elves system that assists our research group in rescheduling meetings, choosing presenters, tracking people’s locations, and ordering meals. %B International Conference on Autonomous Agents (Agents'01) %G eng %0 Conference Paper %B International Conference on Autonomous Agents (Agents'01) %D 2001 %T Argumentation as Distributed Constraint Satisfaction: Applications and Results %A H. Jung %A Tambe, Milind %X Conflict resolution is a critical problem in distributed and collaborative multi-agent systems. Negotiation via argumentation (NVA), where agents provide explicit arguments or justifications for their proposals for resolving conflicts, is an effective approach to resolve conflicts. Indeed, we are applying argumentation in some real-world multi-agent applications. However, a key problem in such applications is that a well-understood computational model of argumentation is currently missing, making it difficult to investigate convergence and scalability of argumentation techniques, and to understand and characterize different collaborative NVA strategies in a principled manner. To alleviate these difficulties, we present the distributed constraint satisfaction problem (DCSP) as a computational model for investigating NVA. We model argumentation as constraint propagation in DCSP. This model enables us to study convergence properties of argumentation, and formulate and experimentally compare 16 different NVA strategies with different levels of agent cooperativeness towards others. One surprising result from our experiments is that maximizing cooperativeness is not necessarily the best strategy even in a completely cooperative environment. 
The paper illustrates the usefulness of these results in applying NVA to multi-agent systems, as well as to DCSP systems in general. %B International Conference on Autonomous Agents (Agents'01) %G eng %0 Conference Paper %B International Conference on Principles and Practices of Constraint programming %D 2001 %T A Dynamic Distributed Constraint Satisfaction Approach to Resource Allocation %A Pragnesh J. Modi %A H. Jung %A Tambe, Milind %A W. Shen %A Kulkarni, S. %X In distributed resource allocation, a set of agents must assign their resources to a set of tasks. This problem arises in many real-world domains such as disaster rescue, hospital scheduling and the domain described in this paper: distributed sensor networks. Despite the variety of approaches proposed for distributed resource allocation, a systematic formalization of the problem and a general solution strategy are missing. This paper takes a step towards this goal by proposing a formalization of distributed resource allocation that represents both dynamic and distributed aspects of the problem and a general solution strategy that uses distributed constraint satisfaction techniques. This paper defines the notion of Dynamic Distributed Constraint Satisfaction Problem (DyDCSP) and proposes two generalized mappings from distributed resource allocation to DyDCSP, each proven to correctly perform resource allocation problems of specific difficulty. This theoretical result is verified in practice by an implementation on a real-world distributed sensor network. %B International Conference on Principles and Practices of Constraint programming %G eng %0 Conference Paper %B Intelligent Agents VIII Proceedings of the International workshop on Agents, theories, architectures and languages (ATAL'01) %D 2001 %T Dynamic distributed resource allocation: A distributed constraint satisfaction approach %A Pragnesh J. Modi %A H. Jung %A Tambe, Milind %A W. Shen %A Kulkarni, S. 
%X In distributed resource allocation a set of agents must assign their resources to a set of tasks. This problem arises in many real-world domains such as distributed sensor networks, disaster rescue, hospital scheduling and others. Despite the variety of approaches proposed for distributed resource allocation, a systematic formalization of the problem, explaining the different sources of difficulties, and a formal explanation of the strengths and limitations of key approaches is missing. We take a step towards this goal by proposing a formalization of distributed resource allocation that represents both dynamic and distributed aspects of the problem. We define four categories of difficulties of the problem. To address this formalized problem, the paper defines the notion of Dynamic Distributed Constraint Satisfaction Problem (DDCSP). The central contribution of the paper is a generalized mapping from distributed resource allocation to DDCSP. This mapping is proven to correctly perform resource allocation problems of specific difficulty. This theoretical result is verified in practice by an implementation on a real-world distributed sensor network. %B Intelligent Agents VIII Proceedings of the International workshop on Agents, theories, architectures and languages (ATAL'01) %G eng %0 Conference Paper %B International Conference on Innovative Applications of AI (IAAI'01) %D 2001 %T Electric Elves: Applying Agent Technology to Support Human Organizations %A H. Chalupsky %A Y. Gil %A Craig Knoblock %A K. Lerman %A J. Oh %A D. V. Pynadath %A T. Russ %A Tambe, Milind %X The operation of a human organization requires dozens of everyday tasks to ensure coherence in organizational activities, to monitor the status of such activities, to gather information relevant to the organization, to keep everyone in the organization informed, etc. 
Teams of software agents can aid humans in accomplishing these tasks, facilitating the organization’s coherent functioning and rapid response to crises, while reducing the burden on humans. Based on this vision, this paper reports on Electric Elves, a system that has been operational, 24/7, at our research institute since June 1, 2000. Tied to individual user workstations, fax machines, voice, mobile devices such as cell phones and palm pilots, Electric Elves has assisted us in routine tasks, such as rescheduling meetings, selecting presenters for research meetings, tracking people’s locations, organizing lunch meetings, etc. We discuss the underlying AI technologies that led to the success of Electric Elves, including technologies devoted to agent-human interactions, agent coordination, accessing multiple heterogeneous information sources, dynamic assignment of organizational tasks, and deriving information about organization members. We also report the results of deploying Electric Elves in our own research organization. %B International Conference on Innovative Applications of AI (IAAI'01) %G eng %0 Conference Paper %B AAAI Spring Symposium on Decision theoretic and Game Theoretic Agents %D 2001 %T MDPs for Adjustable autonomy in a real-world multi-agent environment %A D. V. Pynadath %A Paul Scerri %A Tambe, Milind %X Research on adjustable autonomy (AA) is critical if we are to deploy multiagent systems in support of important human activities. Through AA, an agent can dynamically vary its level of autonomy — harnessing human abilities when needed, but also limiting such interaction. While most previous AA work has focused on individual agent-human interactions, this paper focuses on agent teams embedded in human organizations in the context of real-world applications. The need for agent teamwork and coordination in such environments introduces novel AA challenges. 
In particular, transferring control to human users becomes more difficult, as a lack of human response can cause agent team miscoordination, yet not transferring control causes agents to take enormous risks. Furthermore, despite appropriate individual agent decisions, the agent teams may reach decisions that are completely unacceptable to the human team. We address these challenges by pursuing a two-part decision-theoretic approach. First, to avoid team miscoordination due to transfer of control decisions, an agent: (i) considers the cost of potential miscoordination with teammates; (ii) does not rigidly commit to a transfer of control decision; (iii) if forced into a risky autonomous action to avoid miscoordination, considers changes in the team’s coordination that mitigate the risk. Second, to ensure effective team decisions, not only individual agents, but also subteams and teams can dynamically adjust their own autonomy. We implement these ideas using Markov Decision Processes, providing a decision-theoretic basis for reasoning about costs and uncertainty of individual and team actions. This approach is central to our deployed multi-agent system, called Electric Elves, that assists our research group in rescheduling meetings, choosing presenters, tracking people’s locations and ordering meals. %B AAAI Spring Symposium on Decision theoretic and Game Theoretic Agents %G eng %0 Conference Paper %B International Conference on Autonomous Agents (Agents'01) %D 2001 %T Monitoring Deployed Agent Teams %A Gal Kaminka %A D. V. Pynadath %A Tambe, Milind %X Recent years have seen an increasing need for on-line monitoring of deployed distributed teams of cooperating agents, for visualization, for performance tracking, etc. 
However, in deployed applications, we often cannot rely on the agents communicating their state to the monitoring system: (a) we rarely have the ability to change the behavior of already-deployed agents such that they communicate the required information (e.g., in legacy or proprietary systems); (b) different monitoring goals require different information to be communicated (e.g., agents’ beliefs vs. plans); and (c) communications may be expensive, unreliable, or insecure. This paper presents a non-intrusive approach based on plan-recognition, in which the monitored agents’ state is inferred from observations of their normal course of actions. In particular, we focus on inference of the team state based on its observed routine communications, exchanged as part of coordinated task execution. The paper includes the following key novel contributions: (i) a linear time probabilistic plan-recognition algorithm, particularly well-suited for processing communications as observations; (ii) an approach to exploiting general knowledge of teamwork to predict agent responses during normal and failing execution, to reduce monitoring uncertainty; and (iii) a technique for trading expressivity for scalability, representing only certain useful monitoring hypotheses, but allowing for any number of agents and their different activities, to be represented in a single coherent entity. Our empirical evaluation illustrates that monitoring based on observed routine communications enables significant monitoring accuracy, while not being intrusive. The results also demonstrate a key lesson: A combination of complementary low-quality techniques is cheaper, and better, than a single, highly optimized monitoring approach. 
%B International Conference on Autonomous Agents (Agents'01) %G eng %0 Conference Paper %B Intelligent Agents VIII Proceedings of the International workshop on Agents, theories, architectures and languages (ATAL'01) %D 2001 %T Revisiting Asimov's First Law: A Response to the Call to Arms %A D. V. Pynadath %A Tambe, Milind %X The deployment of autonomous agents in real applications promises great benefits, but it also risks potentially great harm to humans who interact with these agents. Indeed, in many applications, agent designers pursue adjustable autonomy (AA) to enable agents to harness human skills when faced with the inevitable difficulties in making autonomous decisions. There are two key shortcomings in current AA research. First, current AA techniques focus on individual agent-human interactions, making assumptions that break down in settings with teams of agents. Second, humans who interact with agents want guarantees of safety, possibly beyond the scope of the agent’s initial conception of optimal AA. Our approach to AA integrates Markov Decision Processes (MDPs) that are applicable in team settings, with support for explicit safety constraints on agents’ behaviors. We introduce four types of safety constraints that forbid or require certain agent behaviors. The paper then presents a novel algorithm that enforces obedience of such constraints by modifying standard MDP algorithms for generating optimal policies. We prove that the resulting algorithm is correct and present results from a real-world deployment. %B Intelligent Agents VIII Proceedings of the International workshop on Agents, theories, architectures and languages (ATAL'01) %G eng %0 Conference Paper %B AAAI Fall Symposium 2001 on Agent Negotiation %D 2001 %T Towards Argumentation as Distributed Constraint Satisfaction %A H. Jung %A Tambe, Milind %X Conflict resolution is a critical problem in distributed and collaborative multi-agent systems. 
Negotiation via argumentation (NVA), where agents provide explicit arguments (justifications) for their proposals to resolve conflicts, is an effective approach to resolve conflicts. Indeed, we are applying argumentation in some real-world multi-agent applications. However, a key problem in such applications is that a well-understood computational model of argumentation is currently missing, making it difficult to investigate convergence and scalability of argumentation techniques, and to understand and characterize different collaborative NVA strategies in a principled manner. To alleviate these difficulties, we present the distributed constraint satisfaction problem (DCSP) as a computational model for NVA. We model argumentation as constraint propagation in DCSP. This model enables us to study convergence properties of argumentation, and formulate and experimentally compare two sets of 16 different NVA strategies (over 30 strategies in all) with different levels of agent cooperativeness towards others. One surprising result from our experiments is that maximizing cooperativeness is not necessarily the best strategy even in a completely cooperative environment. In addition to their usefulness in understanding computational properties of argumentation, these results could also provide new heuristics for speeding up DCSPs. %B AAAI Fall Symposium 2001 on Agent Negotiation %G eng %0 Conference Paper %B International Conference on Multi-agent Systems (ICMAS) %D 2000 %T Adaptive agent architectures for heterogeneous team members %A Tambe, Milind %A D. V. Pynadath %A C. Chauvat %A A. Das %A Gal Kaminka %X With the proliferation of software agents and smart hardware devices, there is a growing realization that large-scale problems can be addressed by integration of such standalone systems. This has led to an increasing interest in integration architectures that enable a heterogeneous variety of agents and humans to work together. 
These agents and humans differ in their capabilities, preferences, the level of autonomy they are willing to grant the integration architecture, and their information requirements and performance. The challenge in coordinating such a diverse agent set is that potentially a large number of domain-specific and agent-specific coordination plans may be required. We present a novel two-tiered approach to address this coordination problem. We first provide the integration architecture with general-purpose teamwork coordination capabilities, but then enable adaptation of such capabilities for the needs or requirements of specific individuals. A key novel aspect of this adaptation is that it takes place in the context of other heterogeneous team members. We are realizing this approach in an implemented distributed agent integration architecture called Teamcore. Experimental results from two different domains are presented. %B International Conference on Multi-agent Systems (ICMAS) %G eng %0 Magazine Article %D 2000 %T Agent Assistants for Team Analysis %A Tambe, Milind %A T. Raines %A S. Marsella %X With the growing importance of multi-agent teamwork, tools that can help humans analyze, evaluate, and understand team behaviors are becoming increasingly important as well. To this end, we are creating ISAAC, a team analyst agent for post-hoc, off-line agentteam analysis. ISAAC's novelty stems from a key design constraint that arises in team analysis: multiple types of models of team behavior are necessary to analyze different granularities of team events, including agent actions, interactions, and global performance. These heterogeneous team models are automatically acquired via machine learning over teams' external behavior traces, where the specific learning techniques are tailored to the particular model learned. Additionally, ISAAC employs multiple presentation techniques that can aid human understanding of the analyses. 
This paper presents ISAAC's general conceptual framework and its application in the RoboCup soccer domain, where ISAAC was awarded the RoboCup scientific challenge award. %B AI Magazine %G eng %0 Conference Paper %B International conference on Autonomous Agents (Agents) %D 2000 %T Automated agents that help humans understand agent team behaviors %A T. Raines %A Tambe, Milind %A S. Marsella %X Multi-agent teamwork is critical in a large number of agent applications, including training, education, virtual enterprises and collective robotics. Tools that can help humans analyze, evaluate, and understand team behaviors are becoming increasingly important as well. We have taken a step towards building such a tool by creating an automated analyst agent called ISAAC for post-hoc, off-line agent-team analysis. ISAAC’s novelty stems from a key design constraint that arises in team analysis: multiple types of models of team behavior are necessary to analyze different granularities of team events, including agent actions, interactions, and global performance. These heterogeneous team models are automatically acquired via machine learning over teams’ external behavior traces, where the specific learning techniques are tailored to the particular model learned. Additionally, ISAAC employs multiple presentation techniques that can aid human understanding of the analyses. This paper presents ISAAC’s general conceptual framework, motivating its design, as well as its concrete application in the domain of RoboCup soccer. In the RoboCup domain, ISAAC was used prior to and during the RoboCup’99 tournament, and was awarded the RoboCup scientific challenge award. %B International conference on Autonomous Agents (Agents) %G eng %0 Magazine Article %D 2000 %T Building dynamic agent organizations in cyberspace %A Tambe, Milind %A D. V. Pynadath %A N. 
Chauvat %X With the promise of agent-based systems, a variety of research/industrial groups are developing autonomous, heterogeneous agents that are distributed over a variety of platforms and environments in cyberspace. Rapid integration of such distributed, heterogeneous agents would enable software to be rapidly developed to address large-scale problems of interest. Unfortunately, rapid and robust integration remains a difficult challenge. To address this challenge, we are developing a novel teamwork-based agent integration framework. In this framework, software developers specify an agent organization called a team-oriented program. To recruit agents for this organization, an agent resources manager (an analogue of a “human resources manager”) searches the cyberspace for agents of interest to this organization, and monitors their performance over time. Agents in this organization are wrapped with TEAMCORE wrappers that make them team-ready, and thus ensure robust, flexible teamwork among the members of the newly formed organization. This implemented framework promises to reduce the software development effort in agent integration while providing robustness due to its teamwork-based foundations. A concrete, running example, based on heterogeneous, distributed agents is presented. %B IEEE Internet Computing %V 4 %G eng %N 2 %0 Book Section %B Conflicting agents %D 2000 %T Conflicts in agent teams %X Multi-agent teamwork is a critical capability in a large number of applications. Yet, despite the considerable progress in teamwork research, the challenge of intra-team conflict resolution has remained largely unaddressed. This chapter presents a system called CONSA to resolve conflicts using argumentation-based negotiations. The key insight in CONSA (COllaborative Negotiation System based on Argumentation) is to fully exploit the benefits of argumentation in a team setting. 
Thus, CONSA casts conflict resolution as a team problem, so that the recent advances in teamwork can be fully brought to bear during conflict resolution to improve argumentation flexibility. Furthermore, since teamwork conflicts often involve past teamwork, recently developed teamwork models can be exploited to provide agents with reusable argumentation knowledge. Additionally, CONSA also includes argumentation strategies geared towards benefiting the team rather than the individual, and techniques to reduce argumentation overhead. We present detailed algorithms used in CONSA and show a detailed trace from CONSA’s implementation. %B Conflicting agents %I Kluwer academic publishers %G eng %0 Conference Paper %B AAAI Fall Symposium on Socially Intelligent Agents --- the human in the loop %D 2000 %T Don't cancel my Barcelona trip: Adjusting the autonomy of agent proxies in human organizations %A Paul Scerri %A Tambe, Milind %A H. Lee %A D. V. Pynadath %X Teamwork is a critical capability in multiagent environments. Many such environments mandate that the agents and agent teams must be persistent, i.e., exist over long periods of time. Agents in such persistent teams are bound together by their long-term common interests and goals. This paper focuses on flexible teamwork in such persistent teams. Unfortunately, while previous work has investigated flexible teamwork, persistent teams remain unexplored. For flexible teamwork, one promising approach that has emerged is model-based, i.e., providing agents with general models of teamwork that explicitly specify their commitments in teamwork. Such models enable agents to autonomously reason about coordination. Unfortunately, for persistent teams, such models may lead to coordination and communication actions that, while locally optimal, are highly problematic for the team's long-term goals. We present a decision-theoretic technique based on Markov decision processes to enable persistent teams to overcome such limitations of the model-based approach. In particular, agents reason about expected team utilities of future team states that are projected to result from actions recommended by the teamwork model, as well as lower-cost or higher-cost variations on these actions. To accommodate real-time constraints, this reasoning is done in an anytime fashion. Implemented examples from an analytic search tree and some real-world domains are presented. %B AAAI Fall Symposium on Socially Intelligent Agents --- the human in the loop %G eng %0 Conference Paper %B AAAI Fall Symposium on Socially Intelligent Agents --- the human in the loop %D 2000 %T Electric Elves: Immersing an agent organization in a human organization %A D. V. Pynadath %A Tambe, Milind %A Y. Arens %A H. Chalupsky %X Future large-scale human organizations will be highly agentized, with software agents supporting the traditional tasks of information gathering, planning, and execution monitoring, as well as having increased control of resources and devices (communication and otherwise). As these heterogeneous software agents take on more of these activities, they will face the additional tasks of interfacing with people and sometimes acting as their proxies. Dynamic teaming of such heterogeneous agents will enable organizations to act coherently, to robustly attain their mission goals, to react swiftly to crises, and to dynamically adapt to events. Advances in this agentization could potentially assist all organizations, including the military, civilian disaster response organizations, corporations, and universities and research institutions. Within an organization, we envision that agent-based technology will facilitate (and sometimes supervise) all collaborative activities. For a research institution, agentization may facilitate such activities as meeting organization, paper composition, software development, and deployment of people and equipment for out-of-town demonstrations. 
For a military organization, agentization may enable the teaming of military units and equipment for rapid deployment, the monitoring of the progress of such deployments, and the rapid response to any crises that may arise. To accomplish such goals, we envision the presence of agent proxies for each person within an organization. Thus, for instance, if an organizational crisis requires an urgent deployment of a team of people and equipment, then agent proxies could dynamically volunteer for team membership on behalf of the people or resources they represent, while also ensuring that the selected team collectively possesses sufficient resources and capabilities. The proxies must also manage efficient transportation of such resources, the monitoring of the progress of individual participants and of the mission as a whole, and the execution of corrective actions when goals appear to be endangered. The complexity inherent in human organizations complicates all of these tasks and provides a challenging research testbed for agent technology. First, there is the key research question of adjustable autonomy. In particular, agents acting as proxies for people must automatically adjust their own autonomy, e.g., avoiding critical errors, possibly by letting people make important decisions while autonomously making the more routine decisions. Second, human organizations operate continually over time, and the agents must operate continually as well. In fact, the agent systems must be up and running 24 hours a day, 7 days a week (24/7). Third, people, as well as their associated tasks, are very heterogeneous, having a wide and rich variety of capabilities, interests, preferences, etc. To enable teaming among such people for crisis response or other organizational tasks, agents acting as proxies must represent and reason with such capabilities and interests. We thus require powerful matchmaking capabilities to match two people with similar interests.
Fourth, human organizations are often large, so providing proxies often means a big scale-up in the number of agents, compared with typical multiagent systems in current operation. Our Electric Elves project is currently investigating the above research issues and the impact of agentization on human organizations in general, using our own Intelligent Systems Division of USC/ISI as a testbed. Within our research institution, we intend that our Electric Elves agent proxies automatically manage tasks such as: select teams of researchers for giving a demonstration out of town, plan all of their travel arrangements, and ship relevant equipment, while also resolving problems that come up during such a demonstration (e.g., a selected researcher becomes ill at the last minute); determine the researchers interested in meeting with a visitor to our institute, and schedule meetings with the visitor; reschedule meetings if one or more users are absent or unable to arrive on time at a meeting; and monitor the location of users and keep others informed (within privacy limits) about their whereabouts. This short paper presents an overview of our project, as space limitations preclude a detailed discussion of the research issues and operation of the current system. We do have a working prototype of about 10 agent proxies running almost continuously, managing the schedules of one research group. In the following section, we first present an overview of the agent organization, which immerses several heterogeneous agents and sets of agents within the existing human organization of our division. Following that, we describe the current state of the system, and then conclude.
%B AAAI Fall Symposium on Socially Intelligent Agents --- the human in the loop %G eng %0 Thesis %D 2000 %T Execution Monitoring in Multi-Agent Environments %A Gal Kaminka %X Agents in complex, dynamic, multi-agent environments face uncertainty in the execution of their tasks, as their sensors, plans, and actions may fail unexpectedly, e.g., the weather may render a robot's camera useless, its grip too slippery, etc. The explosive number of states in such environments prohibits any resource-bounded designer from predicting all failures at design time. This situation is exacerbated in multi-agent settings, where interactions between agents increase the complexity. For instance, it is difficult to predict an opponent's behavior. Agents in such environments must therefore rely on runtime execution monitoring and diagnosis to detect a failure, diagnose it, and recover. Previous approaches have focused on supplying the agent with goal-attentive knowledge of the ideal behavior expected of the agent with respect to its goals. These approaches encounter key pitfalls and fail to exploit key opportunities in multi-agent settings: (a) only a subset of the sensors (those that measure achievement of goals) are used, despite other agents' sensed behavior that can be used to indirectly sense the environment or complete the agent's knowledge; (b) there is no monitoring of social relationships that must be maintained between the agents regardless of achievement of the goal (e.g., teamwork); and (c) there is no recognition of failures in others, though these change the ideal behavior expected of an agent (for instance, assisting a failing teammate). To address these problems, we investigate a novel complementary paradigm for multi-agent monitoring and diagnosis. Socially-Attentive Monitoring (SAM) focuses on monitoring the social relationships between the agents as they are executing their tasks, and uses models of multiple agents and their relationships in monitoring and diagnosis.
We hypothesize that failures to maintain relationships would be indicative of failures in behavior, and diagnosis of relationships can be used to complement goal-attentive methods. In particular, SAM addresses the weaknesses listed above: (a) it allows inference of missing knowledge and sensor readings through other agents' sensed behavior; (b) it directly monitors social relationships, with no attention to the goals; and (c) it allows recognition of failures in others (even if they are not using SAM!). SAM currently uses the STEAM teamwork model, and a role-similarity relationship model to monitor agents. It relies on plan-recognition to infer agents' reactive-plan hierarchies from their observed actions. These hierarchies are compared in a top-down fashion to find relationship violations, e.g., cases where two agents selected different plans despite their being on the same team. Such detections trigger diagnosis which uses the relationship models to facilitate recovery. For example, in teamwork, a commitment to joint selection of plans further mandates mutual belief in preconditions. Thus a difference in selected plans may be explained by a difference in preconditions, and can lead to recovery using negotiations. We empirically and analytically investigate SAM in two dynamic, complex, multi-agent domains: the ModSAF battlefield simulation, where SAM is employed by helicopter pilot agents; and the RoboCup soccer simulation, where SAM is used by a coach agent to monitor teams' behavior. We show that SAM can capture failures that are otherwise undetectable, and that distributed monitoring is better (correct and complete detection) and simpler (no representation of ambiguity) than a centralized scheme (complete but incorrect, requiring representation of ambiguity).
Key contributions and novelties include: (a) a general framework for socially-attentive monitoring, and a deployed implementation for monitoring teamwork; (b) rigorously proven guarantees on the applicability and results of practical socially-attentive monitoring of teamwork under conditions of uncertainty; (c) procedures for diagnosis based on a teamwork relationship model. Future work includes the use of additional relationship models in monitoring and diagnosis, formalization of the social diagnosis capabilities, and further demonstration of SAM's usefulness in current domains and others. %G eng %9 PhD thesis %0 Journal Article %J Journal of Autonomous Agents and Multi-agent Systems, special issue on Best of Agents '99 %D 2000 %T Experiences acquired in the design of Robocup teams: A comparison of two fielded teams %A S. Marsella %A J. Adibi %A Y. Alonaizan %A Gal Kaminka %A I. Muslea %A Tambe, Milind %X Increasingly, multiagent systems are being designed for a variety of complex, dynamic domains. Effective agent interactions in such domains raise some of the most fundamental research challenges for agent-based systems, in teamwork, multiagent learning, and agent modelling. The RoboCup research initiative, particularly the simulation league, has been proposed to pursue such multiagent research challenges using the common testbed of simulation soccer. Despite the significant popularity of RoboCup within the research community, general lessons have not often been extracted from participation in RoboCup. This is what we attempt to do here. We have fielded two teams, ISIS'97 and ISIS'98, in RoboCup competitions. These teams have been in the top four teams in these competitions. We compare the teams and attempt to analyze and generalize the lessons learned. This analysis reveals several surprises, pointing out lessons for teamwork and for multi-agent learning.
%B Journal of Autonomous Agents and Multi-agent Systems, special issue on Best of Agents '99 %V 4 %P 115-129 %G eng %0 Magazine Article %D 2000 %T Overview of RoboCup'98 %A M. Asada %A M. Veloso %A Tambe, Milind %A H. Kitano %A I. Noda %A G. K. Kraetzschmar %X The Robot World Cup Soccer Games and Conferences (RoboCup) are a series of competitions and events designed to promote the full integration of AI and robotics research. Following the first RoboCup, held in Nagoya, Japan, in 1997, RoboCup-98 was held in Paris from 2–9 July, overlapping with the real World Cup soccer competition. RoboCup-98 included competitions in three leagues: (1) the simulation league, (2) the real robot small-size league, and (3) the real robot middle-size league. Champion teams were CMUNITED-98 in both the simulation and the real robot small-size leagues and CS-FREIBURG (Freiburg, Germany) in the real robot middle-size league. RoboCup-98 also included a Scientific Challenge Award, which was given to three research groups for their simultaneous development of fully automatic commentator systems for the RoboCup simulator league. Over 15,000 spectators watched the games, and 120 international media provided worldwide coverage of the competition. %B AI Magazine, Spring 2000 %G eng %0 Conference Paper %B ICMAS workshop on RoboCup Rescue %D 2000 %T RoboCup Rescue: A Proposal and Preliminary Experiences %A Ranjit Nair %A T. Ito %A Tambe, Milind %A S. Marsella %X RoboCup Rescue is an international project aimed at applying multiagent research to the domain of search and rescue in large-scale disasters. This paper reports our initial experiences with using the RoboCup Rescue Simulator and building agents capable of making decisions based on observation of other agents' behavior. We also plan on analyzing team behavior to obtain rules that explain this behavior.
%B ICMAS workshop on RoboCup Rescue %G eng %0 Journal Article %J Journal of Artificial Intelligence Research (JAIR) %D 2000 %T Robust agent teams via socially attentive monitoring %A Gal Kaminka %A Tambe, Milind %X

Agents in dynamic multi-agent environments must monitor their peers to execute individual and group plans. A key open question is how much monitoring of other agents' states is required to be effective: The Monitoring Selectivity Problem. We investigate this question in the context of detecting failures in teams of cooperating agents, via Socially-Attentive Monitoring, which focuses on monitoring for failures in the social relationships between the agents. We empirically and analytically explore a family of socially-attentive teamwork monitoring algorithms in two dynamic, complex, multi-agent domains, under varying conditions of task distribution and uncertainty. We show that a centralized scheme using a complex algorithm trades correctness for completeness and requires monitoring all teammates. In contrast, a simple distributed teamwork monitoring algorithm results in correct and complete detection of teamwork failures, despite relying on limited, uncertain knowledge, and monitoring only key agents in a team. In addition, we report on the design of a socially-attentive monitoring system and demonstrate its generality in monitoring several coordination relationships, diagnosing detected failures, and both on-line and off-line applications.

%B Journal of Artificial Intelligence Research (JAIR) %V 12 %P 105-147 %G eng %0 Journal Article %J Journal of Autonomous Agents and Multi-agent Systems, special issue on 'Best of ICMAS 98' %D 2000 %T Towards flexible teamwork in persistent teams: extended report %A Tambe, Milind %A Zhang, W. %X Teams of heterogeneous agents working within and alongside human organizations offer exciting possibilities for streamlining processes in ways not possible with conventional software [4, 6]. For example, personal software assistants and information gathering and scheduling agents can coordinate with each other to achieve a variety of coordination and organizational tasks, e.g. facilitating teaming of experts in an organization for crisis response and aiding in execution and monitoring of such a response [5]. Inevitably, due to the complexity of the environment, the unpredictability of human beings, and the range of situations with which the multi-agent systems must deal, there will be times when the system does not produce the results its users desire. In such cases human intervention is required. Sometimes simple tweaks are required due to system failures. In other cases, perhaps because a particular user has more experience than the system, the user will want to “steer” the entire multi-agent system on a different course. For example, some researchers at USC/ISI, including ourselves, are currently focused on the Electric Elves project (http://www.isi.edu/agents-united). In this project humans will be agentified by providing agent proxies to act on their behalf, while entities such as meeting schedulers will be active agents that can communicate with the proxies to achieve a variety of scheduling and rescheduling tasks. In this domain, at an individual level, a user will sometimes want to override decisions of their proxy. At a team level, a human will want to fix undesirable properties of overall team behavior, such as large breaks in a visitor’s schedule.
However, to require a human to completely take control of an entire multi-agent system, or even a single agent, defeats the purpose for which the agents were deployed. Thus, while it is desirable that the multi-agent system should not assume full autonomy, neither should it be a zero-autonomy system. Rather, some form of Adjustable Autonomy (AA) is desired. A system supporting AA is able to dynamically change the autonomy it has to make and carry out decisions, i.e. the system can continuously vary its autonomy from being completely dependent on humans to being completely in control. An AA tool needs to support user interaction with such a system. To support effective user interaction with a complex multi-agent system, we are developing a layered Adjustable Autonomy approach that allows users to intervene either with a single agent or with a team of agents. Previous work in AA has looked at either individual agents or whole teams but not, to our knowledge, a layered approach to AA. The layering of the AA parallels the levels of autonomy existing in human organizations. Technically, the layered approach separates out issues relevant at different levels of abstraction, making it easier to provide users with the information and tools they need to effectively interact with a complex multi-agent system. %B Journal of Autonomous Agents and Multi-agent Systems, special issue on 'Best of ICMAS 98' %V 3 %P 159-183 %G eng %0 Conference Paper %B Initial results In Proceedings of the International Conference on Multi-Agent Systems (ICMAS) (POSTER) %D 2000 %T Towards large-scale conflict resolution: Initial results %A H. Jung %A M. Tambe %A W. Shen %A Zhang, W. %X With the increasing interest in distributed and collaborative multi-agent applications, conflict resolution in large-scale systems becomes an important problem. Our approach to collaborative conflict resolution is based on argumentation.
To understand the feasibility and the scope of the approach, we first implemented the process in a system called CONSA and applied it to two complex, dynamic domains. We then modeled this approach in distributed constraint satisfaction problems (DCSP) to investigate the effect of different conflict resolution configurations, such as the degree of shared responsibility and unshared information, and their effects in large-scale conflict resolution via argumentation. Our results suggest some interesting correlations between these configurations and the performance of conflict resolution. %B Initial results In Proceedings of the International Conference on Multi-Agent Systems (ICMAS) (POSTER) %G eng %0 Conference Proceedings %B In Intelligent Agents, Volume VI: Workshop on Agents, theories, architectures and Languages %D 2000 %T Towards team-oriented programming %A D. Pynadath %A M. Tambe %A N. Chauvat %A L. Cavedon %X The promise of agent-based systems is leading towards the development of autonomous, heterogeneous agents, designed by a variety of research/industrial groups and distributed over a variety of platforms and environments. Teamwork among these heterogeneous agents is critical in realizing the full potential of these systems and scaling up to the demands of large-scale applications. Indeed, to succeed in highly uncertain, complex applications, the agent teams must be both robust and flexible. Unfortunately, development of such agent teams is currently extremely difficult. This paper focuses on significantly accelerating the process of building such teams using a simplified, abstract framework called team-oriented programming (TOP). In TOP, a programmer specifies an agent organization hierarchy and the team tasks for the organization to perform, but abstracts away from the large number of coordination plans potentially necessary to ensure robust and flexible team operation.
We support TOP through a distributed, domain-independent teamwork layer that integrates core teamwork coordination and communication capabilities. We have recently used TOP to integrate a diverse team of heterogeneous distributed agents in performing a complex task. We outline the current state of our TOP implementation and the outstanding issues in developing such a framework. %B In Intelligent Agents, Volume VI: Workshop on Agents, theories, architectures and Languages %C Springer, Heidelberg, Germany %G eng %0 Conference Paper %B International conference on Autonomous agents, Agents '99 %D 1999 %T On being a teammate: Experiences acquired in the design of Robocup teams %A S. Marsella %A J. Adibi %A Y. Alonaizan %A Gal Kaminka %A I. Muslea %A Tambe, Milind %X Increasingly, multiagent systems are being designed for a variety of complex, dynamic domains. Effective agent interactions in such domains raise some of the most fundamental research challenges for agent-based systems, in teamwork, multiagent learning, and agent modelling. The RoboCup research initiative, particularly the simulation league, has been proposed to pursue such multiagent research challenges using the common testbed of simulation soccer. Despite the significant popularity of RoboCup within the research community, general lessons have not often been extracted from participation in RoboCup. This is what we attempt to do here. We have fielded two teams, ISIS'97 and ISIS'98, in RoboCup competitions. These teams have been in the top four teams in these competitions. We compare the teams and attempt to analyze and generalize the lessons learned. This analysis reveals several surprises, pointing out lessons for teamwork and for multi-agent learning. %B International conference on Autonomous agents, Agents '99 %G eng %0 Conference Paper %B Agents, Theories, Architectures and Languages (ATAL) %D 1999 %T The Belief-Desire-Intention model of agency %A M. Georgeff %A B. Pell %A M. Pollack %A Tambe, Milind %A M.
Wooldridge %X Within the ATAL community, the belief-desire-intention (BDI) model has come to be possibly the best known and best studied model of practical reasoning agents. There are several reasons for its success, but perhaps the most compelling are that the BDI model combines a respectable philosophical model of human practical reasoning (originally developed by Michael Bratman [1]), a number of implementations (in the IRMA architecture [2] and the various PRS-like systems currently available [7]), several successful applications (including the now-famous fault diagnosis system for the space shuttle, as well as factory process control systems and business process management [8]), and finally, an elegant abstract logical semantics, which have been taken up and elaborated upon widely within the agent research community [14, 16]. However, it could be argued that the BDI model is now becoming somewhat dated: the principles of the architecture were established in the mid-1980s, and have remained essentially unchanged since then. With the explosion of interest in intelligent agents and multi-agent systems that has occurred since then, a great many other architectures have been developed, which, it could be argued, address some issues that the BDI model fundamentally fails to. Furthermore, the focus of agent research (and AI in general) has shifted significantly since the BDI model was originally developed. New advances in understanding (such as Russell and Subramanian’s model of “bounded-optimal agents” [15]) have led to radical changes in how the agents community (and more generally, the artificial intelligence community) views its enterprise. The purpose of this panel is therefore to establish how the BDI model stands in relation to other contemporary models of agency, and in particular where it can or should go next. %B Agents, Theories, Architectures and Languages (ATAL) %G eng %0 Magazine Article %D 1999 %T The benefits of arguing in a team %A Tambe, Milind %A H.
Jung %X In a complex, dynamic multi-agent setting, coherent team actions are often jeopardized by conflicts in agents’ beliefs, plans and actions. Despite the considerable progress in teamwork research, the challenge of intra-team conflict resolution has remained largely unaddressed. This paper presents CONSA, a system we are developing to resolve conflicts using argumentation-based negotiations. CONSA is focused on exploiting the benefits of argumentation in a team setting. Thus, CONSA casts conflict resolution as a team problem, so that the recent advances in teamwork can be brought to bear during conflict resolution to improve argumentation flexibility. Furthermore, since teamwork conflicts sometimes involve past teamwork, teamwork models can be exploited to provide agents with reusable argumentation knowledge. Additionally, CONSA also includes argumentation strategies geared towards benefiting the team rather than the individual, and techniques to reduce argumentation overhead. %B AI Magazine, Winter 1999 %V 20 %G eng %N 4 %0 Conference Paper %B International conference on Autonomous Agents, Agents 99 %D 1999 %T I'm OK, You're OK, We're OK: Experiments in Centralized and Distributed Socially Attentive Monitoring %A Gal Kaminka %A Tambe, Milind %X Execution monitoring is a critical challenge for agents in dynamic, complex, multi-agent domains. Existing approaches utilize goal-attentive models which monitor achievement of task goals. However, they lack knowledge of the intended relationships which should hold among the agents, and so fail to address key opportunities and difficulties in multi-agent settings. We explore SAM, a novel complementary framework for social monitoring that utilizes knowledge of social relationships among agents in monitoring them. We compare the performance of SAM when monitoring is done by a single agent in a centralized fashion, versus team monitoring in a distributed fashion.
We experiment with several SAM instantiations, algorithms that are sound and incomplete, unsound and complete, and both sound and complete. While a more complex algorithm appears useful in the centralized case (but is unsound), the surprising result is that a much simpler algorithm in the distributed case is both sound and complete. We present a set of techniques for practical, efficient implementations with rigorously proven performance guarantees, and systematic empirical validation. %B International conference on Autonomous Agents, Agents 99 %G eng %0 Conference Paper %B DARPA JFACC symposium on advances in Enterprise Control %D 1999 %T Rapid integration and coordination of heterogeneous distributed agents for collaborative enterprises %A D. V. Pynadath %A Tambe, Milind %A N. Chauvat %X As the agent methodology proves more and more useful in organizational enterprises, research/industrial groups are developing autonomous, heterogeneous agents that are distributed over a variety of platforms and environments. Rapid integration of such distributed, heterogeneous agent components could address large-scale problems of interest in these enterprises. Unfortunately, rapid and robust integration remains a difficult challenge. To address this challenge, we are developing a novel teamwork-based agent integration framework. In this framework, software developers specify an agent organization through a team-oriented program. To locate and recruit agent components for this organization, an agent resources manager (an analogue of a “human resources manager”) searches for agents of interest to this organization and monitors their performance over time. TEAMCORE wrappers render the agent components in this organization team-ready, thus ensuring robust, flexible teamwork among the members of the newly formed organization.
This implemented framework promises to reduce the development effort in enterprise integration while providing robustness due to its teamwork-based foundations. We have applied this framework to a concrete, running example, using heterogeneous, distributed agents in a problem setting comparable to many collaborative enterprises. %B DARPA JFACC symposium on advances in Enterprise Control %G eng %0 Conference Paper %B AAAI Spring Symposium on Agents in Cyberspace %D 1999 %T Teamwork in cyberspace: Using TEAMCORE to make agents team-ready %A Tambe, Milind %A W. Shen %A M. Mataric %A D. Goldberg %A Pragnesh J. Modi %A Z. Qiu %A B. Salemi %X In complex, dynamic and uncertain environments extending from disaster rescue missions, to future battlefields, to monitoring and surveillance tasks, to virtual training environments, to future robotic space missions, intelligent agents will play a key role in information gathering and filtering, as well as in task planning and execution. Although physically distributed on a variety of platforms, these agents will interact with information sources, network facilities, and other agents via cyberspace, in the form of the Internet, Intranet, the secure defense communication network, or other forms of cyberspace. Indeed, it now appears well accepted that cyberspace will be (if it is not already) populated by a vast number of such distributed, individual agents. Thus, a new distributed model of agent development has begun to emerge. In particular, when faced with a new task, this model prescribes working with a distributed set of agents rather than building a centralized, large-scale, monolithic individual agent. A centralized approach suffers from problems in robustness (due to a single point of failure), exhibits a lack of modularity (as a single monolithic system), suffers from difficulty in scalability (by not utilizing existing agents as components), and is often a mismatch with the distributed ground reality. 
The distributed approach addresses these weaknesses of the centralized approach. Our hypothesis is that the key to the success of such a distributed approach is teamwork in cyberspace. That is, multiple distributed agents must collaborate in teams in cyberspace so as to scale up to the complexities of the complex and dynamic environments mentioned earlier. For instance, consider an application such as monitoring traffic violators in a city. Ideally, we wish to be able to construct a suitable agent-team quickly, from existing agents that can control UAVs (Unmanned Air Vehicles), an existing 3D route-planning agent, and an agent capable of recognizing traffic violations based on a video input. Furthermore, by suitable substitution, we wish to be able to quickly reconfigure the team to monitor enemy activity on a battlefield or illegal poaching in forests. Such rapid agent-team assembly obviates the need to construct a monolithic agent for each new application from scratch, preserves modularity, and appears better suited for scalability. Of course, such agent teamwork in cyberspace raises a variety of important challenges. In particular, agents must engage in robust and flexible teamwork to overcome the uncertainties in their environment. They must also adapt by learning from past failures. Unfortunately, currently, constructing robust, flexible and adaptive agent teams is extremely difficult. Current approaches to teamwork suffer from a lack of general-purpose teamwork models, which would enable agents to autonomously reason about teamwork or communication and coordination in teamwork and to improve the team performance by learning at the team level. The absence of such teamwork models gives rise to four types of problems. First, team construction becomes highly labor-intensive. In particular, since agents cannot autonomously reason about coordination, human developers have to provide them with large numbers of domain-specific coordination and communication plans. 
These domain-specific plans are not reusable, and must be developed anew for each new domain. Second, teams suffer from inflexibility. In real-world domains, teams face a variety of uncertainties, such as a team member’s unanticipated failure in fulfilling responsibilities, team members’ divergent beliefs about their environment [CL91], and unexpectedly noisy or faulty communication. Without a teamwork model, it is difficult to anticipate and preplan for the vast number of coordination failures possible due to such uncertainties, leading to inflexibility. A third problem arises in team scale-up. Since creating even small-scale teams is difficult, scaling up to larger ones is even harder. Finally, since agents cannot reason about teamwork, learning about teamwork has also proved to be problematic. Thus, even after repeating a failure, teams are often unable to avoid it in the future. To remedy this situation and to enable rapid development of agent teams, we are developing a novel software system called TEAMCORE that integrates a general-purpose teamwork model and team learning capabilities. TEAMCORE provides these core teamwork capabilities to individual agents, i.e., it wraps them with TEAMCORE. Here, we call the individual TEAMCORE “wrapper” a teamcore agent. A teamcore agent is a pure “social agent”, in that it is provided with only core teamwork capabilities. Given an existing agent with domain-level action capabilities (i.e., the domain-level agent), it is made team-ready by interfacing with a teamcore agent. Agents made team-ready will be able to rapidly assemble themselves into a team in any given domain. That is, unlike past approaches such as the open-agent-architecture (OAA) that provides a centralized blackboard facilitator to integrate a distributed set of agents, TEAMCORE is fundamentally a distributed team-oriented system. Our goal is a TEAMCORE system capable of generating teams that are: 1.
Flexible and robust, able to surmount the uncertainties mentioned above; 2. Capable of scaling up to hundreds of team members; 3. Able to improve team performance by learning at the team level and avoiding past team failures. An initial version of the TEAMCORE system, based on the Soar [Newell90] integrated agent architecture, is currently up and running. A distributed set of teamcore agents can form teams in cyberspace. The underlying communication infrastructure is currently based on KQML. The rest of this document briefly describes the TEAMCORE design, architecture, and implementation. %B AAAI Spring Symposium on Agents in Cyberspace %G eng %0 Conference Paper %B Agents, theories, architectures and languages (ATAL'99) workshop, to be published in Springer Verlag 'Intelligent Agents VI' %D 1999 %T Toward team-oriented programming %A D. V. Pynadath %A Tambe, Milind %A N. Chauvat %A L. Cavedon %X The promise of agent-based systems is leading towards the development of autonomous, heterogeneous agents, designed by a variety of research/industrial groups and distributed over a variety of platforms and environments. Teamwork among these heterogeneous agents is critical in realizing the full potential of these systems and scaling up to the demands of large-scale applications. Unfortunately, development of robust, flexible agent teams is currently extremely difficult. This paper focuses on significantly accelerating the process of building such teams using a simplified, abstract framework called team-oriented programming (TOP). In TOP, a programmer specifies an agent organization hierarchy and the team tasks for the organization to perform, abstracting away from the innumerable coordination plans potentially necessary to ensure robust and flexible team operation. Our TEAMCORE system supports TOP through a distributed, domain-independent layer that integrates core teamwork coordination and communication capabilities.
We have recently used TOP to integrate a diverse team of heterogeneous distributed agents in performing a complex task. We outline the current state of our TOP implementation and the outstanding issues in developing such a framework. %B Agents, theories, architectures and languages (ATAL'99) workshop, to be published in Springer Verlag 'Intelligent Agents VI' %G eng %0 Conference Paper %B Third international RoboCup competitions and workshop %D 1999 %T Towards automated team analysis: a machine learning approach %A T. Raines %A Tambe, Milind %A S. Marsella %X Teamwork is becoming increasingly important in a large number of multiagent applications. With the growing importance of teamwork, there is now an increasing need for tools for analysis and evaluation of such teams. We are developing automated techniques for analyzing agent teams. In this paper we present ISAAC, an automated assistant that uses these techniques to perform post-hoc analysis of RoboCup teams. ISAAC requires little domain knowledge, instead using data mining and inductive learning tools to produce the analysis. ISAAC has been applied to all of the teams from RoboCup’97, RoboCup’98, and PRICAI’98 in a fully automated fashion. Furthermore, ISAAC is available online for use by developers of RoboCup teams. %B Third international RoboCup competitions and workshop %G eng %0 Conference Paper %B Third International Conference on Autonomous Agents (Agents) %D 1999 %T Towards flexible negotiation in teamwork %A Z. Qiu %A M. Tambe %A H. Jung %X In a complex, dynamic multi-agent setting, coherent team actions are often jeopardized by agents' conflicting beliefs about different aspects of their environment, about resource availability, and about their own or teammates' capabilities and performance. Team members thus need to communicate and negotiate to restore team coherence. This paper focuses on the problem of negotiations in teamwork to resolve such conflicts.
The basis of such negotiations is inter-agent argumentation based on Toulmin's argumentation pattern. There are several novel aspects in our approach. First, our approach to argumentation exploits recently developed general, explicit teamwork models, which make it possible to provide a generalized and reusable argumentation facility based on teamwork constraints. Second, an emphasis on collaboration in argumentation leads to novel argumentation strategies geared towards benefiting the team rather than the individual. Third, our goal, to realize argumentation in practice in an agent team, has led to decision-theoretic and pruning techniques to reduce argumentation overhead. Our approach is implemented in a system called CONSA. %B Third International Conference on Autonomous Agents (Agents) %G eng %0 Conference Paper %B International joint conference on Artificial Intelligence, IJCAI 99 %D 1999 %T Two fielded teams and two experts: A RoboCup response challenge from the trenches %A Tambe, Milind %A Gal Kaminka %A S. Marsella %A I. Muslea %A T. Raines %X The RoboCup (robot world-cup soccer) effort, initiated to stimulate research in multi-agents and robotics, has blossomed into a significant effort of international proportions. RoboCup is simultaneously a fundamental research effort and a set of competitions for testing research ideas. At IJCAI’97, a broad research challenge was issued for the RoboCup synthetic agents, covering areas of multi-agent learning, teamwork and agent modeling. This paper outlines our attack on the entire breadth of the RoboCup research challenge, on all of its categories, in the form of two fielded, contrasting RoboCup teams, and two off-line soccer analysis agents. We compare the teams and the agents to generalize the lessons learned in learning, teamwork and agent modeling.
%B International joint conference on Artificial Intelligence, IJCAI 99 %G eng %0 Magazine Article %D 1999 %T Building agent teams using an explicit teamwork model and learning %A Tambe, Milind %A Jafar Adibi %A Yasar Alonaizan %A Ali Erdem %A Gal Kaminka %A Ion Muslea %A Marsella, Stacy %X

Multi-agent collaboration or teamwork and learning are two critical research challenges in a large number of multi-agent applications. These research challenges are highlighted in RoboCup, an international project focused on robotic and synthetic soccer as a common testbed for research in multi-agent systems. This article describes our approach to address these challenges, based on a team of soccer-playing agents built for the simulation league of RoboCup—the most popular of the RoboCup leagues so far.

To address the challenge of teamwork, we investigate a novel approach based on the (re)use of a domain-independent, explicit model of teamwork, an explicitly represented hierarchy of team plans and goals, and a team organization hierarchy based on roles and role-relationships. This general approach to teamwork, shown to be applicable in other domains beyond RoboCup, both reduces development time and improves teamwork flexibility. We also demonstrate the application of off-line and on-line learning to improve and specialize agents' individual skills in RoboCup. These capabilities enabled our soccer-playing team, ISIS, to successfully participate in the first international RoboCup soccer tournament (RoboCup'97) held in Nagoya, Japan, in August 1997. ISIS won the third-place prize among the over 30 teams that participated in the simulation league.

%B Artificial Intelligence %V 110 %P 215-240 %G eng %0 Conference Paper %B AAAI Fall Symposium on Distributed Continual Planning %D 1998 %T Flexible Negotiations in Teamwork: Extended Abstract %A Z. Qiu %A Tambe, Milind %X In a complex, dynamic multi-agent setting, coherent team actions are often jeopardized by agents' conflicting beliefs about different aspects of their environment, about resource availability, and about their own or teammates' capabilities and performance. Team members thus need to communicate and negotiate to restore team coherence. This paper focuses on the problem of negotiations in teamwork to resolve such conflicts. The basis of such negotiations is inter-agent argumentation (based on Toulmin's argumentation structure), where agents assert their beliefs to others, with supporting arguments. One key novelty in our work is that agents' argumentation exploits previous research on general, explicit teamwork models. Based on such teamwork models, it is possible to categorize the conflicts that arise into different classes and, more importantly, to provide a generalized and reusable argumentation facility based on teamwork constraints. Our approach is implemented in a system called CONSA (COllaborative Negotiation System based on Argumentation). %B AAAI Fall Symposium on Distributed Continual Planning %G eng %0 Magazine Article %D 1998 %T Implementing agent teams in dynamic multi-agent environments %A Tambe, Milind %X Teamwork is becoming increasingly critical in multi-agent environments ranging from virtual environments for training and education, to information integration on the internet, to potential multi-robotic space missions. Teamwork in such complex, dynamic environments is more than a simple union of simultaneous individual activity, even if supplemented with preplanned coordination. Indeed in these dynamic environments, unanticipated events can easily cause a breakdown in such preplanned coordination.
The central hypothesis in this article is that for effective teamwork, agents should be provided an explicit representation of team goals and plans, as well as an explicit representation of a model of teamwork to support the execution of team plans. In our work, this model of teamwork takes the form of a set of domain-independent rules that clearly outline an agent’s commitments and responsibilities as a participant in team activities, and thus guide the agent’s social activities while executing team plans. This article describes two implementations of agent teams based on the above principles, one for a real-world helicopter combat simulation, and one for the RoboCup soccer simulation. The article also provides a preliminary comparison of the two agent teams to illustrate some of the strengths and weaknesses of RoboCup as a common testbed for multi-agent systems. %B Applied Artificial Intelligence %V 12 %P 189-210 %G eng %0 Conference Paper %B Intelligent Agents IV: Agents, Theories, Architectures and Languages (ATAL) %D 1998 %T Social comparison for failure detection and recovery %X Plan execution monitoring in dynamic and uncertain domains is an important and difficult problem. Multi-agent environments exacerbate this problem, given that interacting and coordinated activities of multiple agents are to be monitored. Previous approaches to this problem do not detect certain classes of failures, are inflexible, and are hard to scale up. We present a novel approach, SOCFAD, to failure detection and recovery in multi-agent settings. SOCFAD is inspired by Social Comparison Theory from social psychology and includes the following key novel concepts: (a) utilizing other agents in the environment as information sources for failure detection, (b) a detection and repair method for previously undetectable failures using abductive inference based on other agents’ beliefs, and (c) a decision-theoretic approach to selecting the information acquisition medium.
An analysis of SOCFAD is presented, showing that the new method is complementary to previous approaches in terms of classes of failures detected. %B Intelligent Agents IV: Agents, Theories, Architectures and Languages (ATAL) %I Springer Verlag %G eng %0 Journal Article %J International Journal of Human-Computer Studies %D 1998 %T Adaptive agent tracking in real-world multi-agent domains: a preliminary report %A Tambe, Milind %A W. L. Johnson %A W. Shen %X In multi-agent environments, the task of agent tracking (i.e., tracking other agents’ mental states) increases in difficulty when a tracker (tracking agent) only has an imperfect model of the trackee (tracked agent). Such model imperfections arise in many real-world situations, where a tracker faces resource constraints and imperfect information, and the trackees themselves modify their behaviors dynamically. While such model imperfections are unavoidable, a tracker must nonetheless attempt to be adaptive in its agent tracking. In this paper, we analyze some key issues in adaptive agent tracking, and describe an initial approach based on discrimination-based learning. The main idea is to identify the deficiency of a model based on prediction failures, and revise the model by using features that are critical in discriminating successful and failed episodes. Our preliminary experiments in simulated air-to-air combat environments have shown some promising results but many problems remain open for future research. %B International Journal of Human-Computer Studies %V 48 %P 105-124 %G eng %0 Conference Paper %B Second International Conference on Autonomous Agents (Agents) %D 1998 %T Agent component synergy: Social comparison for failure detection %A G. Kaminka %A M. Tambe %X Recently, encouraging progress has been made in integrating independent components in complete agents for real-world environments.
While such systems demonstrate component integration, they often do not explicitly utilize synergistic interactions, which allow each component to function beyond its original capabilities because of the presence of other components. This abstract presents an implemented illustration of such explicit component synergy and its usefulness in dynamic multi-agent environments. In such environments, agents often have three important abilities: (a) collaboration with other agents (teamwork), (b) monitoring the agent’s own progress (execution monitoring), and (c) modeling other agents’ beliefs/goals (agent-modeling). Generally, these capabilities are independently developed, and are integrated in a single system such that each component operates independently of the others, e.g., monitoring techniques do not take into account the modeled plans of other agents, etc. In contrast, we highlight a synergy between these three agent components that results in significant improvement in the capabilities of each component: (a) The collaboration component constrains the search space for the agent-modeling component via maintenance of mutual beliefs and facilitates better modeling, (b) the modeling and collaboration components enable SOCFAD (Social Comparison for Failure Detection), a novel execution monitoring technique which uses other agents to detect and diagnose failures (the focus of this abstract), and (c) the monitoring component, using SOCFAD, detects failures in individual performance that affect coordination, and allows the collaboration component to replan. SOCFAD addresses the well-known problem of agent execution monitoring in complex dynamic environments, e.g., [4]. This problem is exacerbated in multi-agent environments due to the added requirements for coordination.
The complexity and unpredictability of these environments cause an explosion of state-space complexity, which inhibits the ability of any designer to enumerate the correct response in each possible state in advance. For instance, it is generally difficult to predict when communication messages will get lost, sensors will return unreliable answers, etc. The agents are therefore presented with countless opportunities for failure, and must autonomously detect them and recover. To detect failures, an agent must have information about the ideal behavior expected of it. This ideal is compared to the agent’s actual behavior to detect discrepancies indicating possible failure. Previous approaches to this problem (e.g., [4]) have focused on the designer or planner supplying the agent with redundant information, either in the form of explicitly specified execution-monitoring conditions, or a model of the agent itself which may be used for comparison. While powerful in themselves, these approaches have limitations which render them insufficient in dynamic multi-agent environments: (a) They fail to take into account information from sensors that monitor other agents, and are thus less robust. For example, a driver may not see an obstacle on the road, but if she sees another car swerve, she can infer the presence of the obstacle; (b) Monitoring conditions on agent behavior can be too rigid in highly dynamic environments, as agents must often adjust their behavior flexibly to respond to actual circumstances; and (c) Both approaches require the designer to supply redundant information, which entails further work for the designer, and encounters difficulties in scaling up to more complex domains. We propose a novel complementary approach to failure detection and recovery, which is unique to multi-agent settings. This approach, SOCFAD, is inspired by ideas from Social Comparison Theory [1], a theory from social psychology.
The key idea in SOCFAD is that agents use other agents as information sources on the situation and the ideal behavior. The agents compare their own behavior, beliefs, goals, and plans to those of other agents, in order to detect failures and correct their behavior. The agents do not necessarily adopt the other agents’ beliefs, but can reason about the differences in belief and behavior, and draw useful conclusions regarding the correctness of their own actions. This approach alleviates the problems described above: (a) It allows relevant information to be inferred from other agents’ behavior and used to complement the agent’s own erroneous perceptions, (b) It allows for flexibility in monitoring, since the flexible behavior of other agents is used as an ideal, and (c) It doesn’t require the designer to provide the agent with redundant information, utilizing instead other agents as information sources. Teamwork or collaboration is ubiquitous in multi-agent domains. An important issue in SOCFAD is that the agents being compared should be socially similar to yield meaningful differences. By exploiting the synergy with the collaboration component, SOCFAD constrains the search for socially-similar agents to team-members only. Furthermore, the collaboration component is able to provide SOCFAD with guarantees on other agents’ behaviors (through mutual beliefs) which are exploited to generate confidence in any detected failures. By exploiting the agent-modeling component’s capacity to infer team members’ goals, SOCFAD enables efficient comparison without significant communication overhead. Knowledge of other agents can be communicated. However, such communication is often impractical given costs, risk in hostile territories, and unreliability in uncertain settings. Our implementation of SOCFAD relies instead on the agent-modeling component that infers an agent’s beliefs, goals, and plans from its observable behavior and surroundings for comparison.
%B Second International Conference on Autonomous Agents (Agents) %G eng %0 Book %D 1998 %T Collective Robotics Workshop: Lecture notes in Artificial Intelligence 1456 %A A. Drogoul %A Tambe, Milind %A T. Fukuda %I Springer Verlag %C Berlin %G eng %U https://books.google.com/books?id=a1dqCQAAQBAJ&pg=PA95&lpg=PA95&dq=Collective+Robotics+Workshop:+Lecture+notes+in+Artificial+Intelligence+1456&source=bl&ots=WvWb2yzacD&sig=ACfU3U0uX3i6g3SkqPWQ3EOXeVtpSLYUuw&hl=en&sa=X&ved=2ahUKEwj4j8XEzNbpAhUBmnIEHU1KCMQQ %0 Conference Paper %B First robot world cup competition and conferences %D 1998 %T ISIS: Using an explicit model of teamwork in RoboCup 97 %A Tambe, Milind %A J. Adibi %A Y. Alonaizan %A A. Erdem %A Gal Kaminka %A S. Marsella %A I. Muslea %A M. Tallis %X Team ISIS (ISI Synthetic) successfully participated in the first international RoboCup soccer tournament (RoboCup'97), held in Nagoya, Japan, in August 1997. ISIS won the third-place prize among the over 30 teams that participated in the simulation league of RoboCup, the most popular among the three RoboCup leagues. In terms of research accomplishments, ISIS illustrated the usefulness of an explicit model of teamwork, both in terms of reduced development time and improved teamwork flexibility. ISIS also took some initial steps towards learning of individual player skills. This paper discusses the design of ISIS in detail, with particular emphasis on its novel approach to teamwork. %B First robot world cup competition and conferences %I Springer Verlag %G eng %0 Conference Paper %B Conference on AI meets the real-world %D 1998 %T The role of agent modeling in agent robustness %A Gal Kaminka %A Tambe, Milind %A C. Hopper %X A key challenge in using intelligent systems in complex, dynamic, multi-agent environments is the attainment of robustness in the face of uncertainty. In such environments the combinatorial nature of state-space complexity inhibits any designer’s ability to anticipate all possible states that the agent might find itself in.
Therefore, agents will fail in such environments, as the designer cannot supply them with full information about the correct response to take at any state. To overcome these failures, agents must display post-failure robustness, enabling them to autonomously detect, diagnose and recover from failures as they happen. Our hypothesis is that through agent-modeling (the ability of an agent to model the intentions, knowledge, and actions of other agents in the environment) an agent may significantly increase its robustness in a multi-agent environment, by allowing it to use others in the environment to evaluate and improve its own performance. We examine this hypothesis in light of two real-world applications in which we improve robustness: domain-independent teamwork, and target-recognition and identification systems. We discuss the relation between the ability of an agent-modeling algorithm to represent uncertainty and the applications, and highlight key lessons learned for real-world applications. %B Conference on AI meets the real-world %G eng %0 Conference Paper %B Seventh Conference on Computer Generated Forces and Behavioral Representation %D 1998 %T Soar-RWA: Planning, teamwork, and intelligent behavior for synthetic rotary wing aircraft %A R. Hill %A Chen, J. %A J. Gratch %A P. Rosenbloom %A M. Tambe %X We have constructed a team of intelligent agents that perform the tasks of an Army attack helicopter company and a Marine transport/escort combined team for a synthetic battlefield environment used for running largescale military exercises. We have used the Soar integrated architecture to develop: (1) pilot agents for a company of helicopters, (2) a command agent that makes decisions and plans for the helicopter company, and (3) an approach to teamwork that enables the pilot agents to coordinate their activities in accomplishing the goals of the company. 
This case study describes the task domain and architecture of our application, as well as the benefits and lessons learned from applying AI technology to this domain. %B Seventh Conference on Computer Generated Forces and Behavioral Representation %G eng %0 Conference Paper %B International conference on multi-agent systems (ICMAS) %D 1998 %T Towards flexible teamwork in persistent teams %A Tambe, Milind %A Zhang, W. %X In a complex, dynamic multi-agent setting, coherent team actions are often jeopardized by agents' conflicting beliefs about different aspects of their environment, about resource availability, and about their own or teammates' capabilities and performance. Team members thus need to communicate and negotiate to restore team coherence. This paper focuses on the problem of negotiations in teamwork to resolve such conflicts. The basis of such negotiations is inter-agent argumentation (based on Toulmin's argumentation structure), where agents assert their beliefs to others, with supporting arguments. One key novelty in our work is that agents' argumentation exploits previous research on general, explicit teamwork models. Based on such teamwork models, it is possible to categorize the conflicts that arise into different classes and, more importantly, to provide a generalized and reusable argumentation facility based on teamwork constraints. Our approach is implemented in a system called CONSA (COllaborative Negotiation System based on Argumentation). %B International conference on multi-agent systems (ICMAS) %G eng %0 Conference Paper %B Second robot world cup competition and conferences %D 1998 %T Using an Explicit Teamwork Model and Learning in RoboCup: An Extended Abstract RoboCup 98 %A S. Marsella %A J. Adibi %A Y. Alonaizan %A A. Erdem %A R. Hill %A Gal Kaminka %A Tambe, Milind %A Q Zhun %X The RoboCup research initiative has established synthetic and robotic soccer as testbeds for pursuing research challenges in Artificial Intelligence and robotics. This extended abstract focuses on teamwork and learning, two of the multi-agent research challenges highlighted in RoboCup. To address the challenge of teamwork, we discuss the use of a domain-independent explicit model of teamwork and an explicit representation of team plans and goals. We also discuss the application of agent learning in RoboCup. The vehicle for our research investigations in RoboCup is ISIS (ISI Synthetic), a team of synthetic soccer-players that successfully participated in the simulation league of RoboCup by winning the third-place prize in that tournament. In this position paper, we briefly overview the ISIS agent architecture and our investigations of the issues of teamwork and learning. The key novel issues for our team in RoboCup will be a further investigation of agent learning and further analysis of teamwork-related issues. %B Second robot world cup competition and conferences %I Springer Verlag %G eng %0 Conference Paper %B National conference on Artificial Intelligence (AAAI) %D 1998 %T What is wrong with us? Improving robustness through social diagnosis %A Gal Kaminka %A Tambe, Milind %X Robust behavior in complex, dynamic environments mandates that intelligent agents autonomously monitor their own run-time behavior, detect and diagnose failures, and attempt recovery. This challenge is intensified in multiagent settings, where the coordinated and competitive behaviors of other agents affect an agent’s own performance. Previous approaches to this problem have often focused on single-agent domains and have failed to address or exploit key facets of multi-agent domains, such as handling team failures.
We present SAM, a complementary approach to monitoring and diagnosis for multi-agent domains that is particularly well-suited for collaborative settings. SAM includes the following key novel concepts: First, SAM’s failure detection technique, inspired by social psychology, utilizes other agents as information sources and detects failures both in an agent and in its teammates. Second, SAM performs social diagnosis, reasoning about the failures in its team using an explicit model of teamwork (previously, teamwork models have been employed only in prescribing agent behaviors in teamwork). Third, SAM employs model sharing to alleviate the inherent inefficiencies associated with representing multiple agent models. We have implemented SAM in a complex, realistic multi-agent domain, and provide detailed empirical results assessing its benefits. %B National conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI-97) %D 1997 %T Agent architectures for flexible, practical teamwork %A Tambe, Milind %X Teamwork in complex, dynamic, multi-agent domains mandates highly flexible coordination and communication. Simply fitting individual agents with precomputed coordination plans will not do, for their inflexibility can cause severe failures in teamwork, and their domain-specificity hinders reusability. Our central hypothesis is that the key to such flexibility and reusability is agent architectures with integrated teamwork capabilities. This fundamental shift in agent architectures is illustrated via an implemented candidate: STEAM. While STEAM is founded on the joint intentions theory, practical operationalization has required it to integrate several key novel concepts: (i) team synchronization to establish joint intentions; (ii) constructs for monitoring joint intentions and repair; and (iii) decision-theoretic communication selectivity (to pragmatically extend the joint intentions theory). 
Applications in three different complex domains, with empirical results, are presented. %B National Conference on Artificial Intelligence (AAAI-97) %G eng %0 Journal Article %J Journal of Artificial Intelligence Research %D 1997 %T Towards Flexible Teamwork %A Tambe, Milind %X Many AI researchers are today striving to build agent teams for complex, dynamic multi-agent domains, with intended applications in arenas such as education, training, entertainment, information integration, and collective robotics. Unfortunately, uncertainties in these complex, dynamic domains obstruct coherent teamwork. In particular, team members often encounter differing, incomplete, and possibly inconsistent views of their environment. Furthermore, team members can unexpectedly fail in fulfilling responsibilities or discover unexpected opportunities. Highly flexible coordination and communication is key in addressing such uncertainties. Simply fitting individual agents with precomputed coordination plans will not do, for their inflexibility can cause severe failures in teamwork, and their domain-specificity hinders reusability. Our central hypothesis is that the key to such flexibility and reusability is providing agents with general models of teamwork. Agents exploit such models to autonomously reason about coordination and communication, providing requisite flexibility. Furthermore, the models enable reuse across domains, both saving implementation effort and enforcing consistency. This article presents one general, implemented model of teamwork, called STEAM. The basic building block of teamwork in STEAM is joint intentions (Cohen & Levesque, 1991b); teamwork in STEAM is based on agents' building up a (partial) hierarchy of joint intentions (this hierarchy is seen to parallel Grosz & Kraus's partial SharedPlans, 1996). Furthermore, in STEAM, team members monitor the team's and individual members' performance, reorganizing the team as necessary.
Finally, decision-theoretic communication selectivity in STEAM ensures reduction in communication overheads of teamwork, with appropriate sensitivity to the environmental conditions. This article describes STEAM's application in three different complex domains, and presents detailed empirical results. %B Journal of Artificial Intelligence Research %V 7 %P 83-124 %G eng %0 Journal Article %J Innovative Applications of Artificial Intelligence (IAAI-97) %D 1997 %T Intelligent agents for the synthetic battlefield: A company of rotary wing aircraft %A R. Hill %A Chen, J. %A J. Gratch %A P. S. Rosenbloom %A Tambe, Milind %X We have constructed a team of intelligent agents that perform the tasks of an attack helicopter company for a synthetic battlefield environment used for running large-scale military exercises. We have used the Soar integrated architecture to develop: (1) pilot agents for a company of helicopters, (2) a command agent that makes decisions and plans for the helicopter company, and (3) an approach to teamwork that enables the pilot agents to coordinate their activities in accomplishing the goals of the company. This case study describes the task domain and architecture of our application, as well as the benefits and lessons learned from applying AI technology to this domain. %B Innovative Applications of Artificial Intelligence (IAAI-97) %G eng %0 Journal Article %J International Joint Conference on Artificial Intelligence (IJCAI97) %D 1997 %T The RoboCup Synthetic Agent Challenge 97 %A H. Kitano %A Tambe, Milind %A P. Stone %A M. Veloso %A S. Coradeschi %A E. Osawa %A H. Matsubara %A I. Noda %A M. Asada %X RoboCup Challenge offers a set of challenges for intelligent agent researchers using a friendly competition in a dynamic, real-time, multi-agent domain: synthetic Soccer.
While RoboCup in general envisions longer range challenges over the next few decades, RoboCup Challenge presents three specific challenges for the next two years: (i) learning of individual agents and teams; (ii) multi-agent team planning and plan-execution in service of teamwork; and (iii) opponent modeling. RoboCup Challenge provides a novel opportunity for researchers in planning and multi-agent arenas: it not only supplies them a concrete domain to evaluate their techniques, but also challenges them to evolve these techniques to face key constraints fundamental to this domain: real-time and teamwork. %B International Joint Conference on Artificial Intelligence (IJCAI97) %G eng %0 Conference Paper %B AAAI Spring Symposium on Adaptation, Coevolution and Learning in Multiagent Systems %D 1996 %T Adaptive agent tracking in real-world multiagent domains: A preliminary report %A M. Tambe %A W. L. Johnson %A W. Shen %X In multi-agent environments, the task of agent tracking (i.e., tracking other agents' mental states) increases in difficulty when a tracker (tracking agent) only has an imperfect model of the trackee (tracked agent). Such model imperfections arise in many real-world situations, where a tracker faces resource constraints and imperfect information, and the trackees themselves modify their behaviors dynamically. While such model imperfections are unavoidable, a tracker must nonetheless attempt to be adaptive in its agent tracking. In this paper, we analyze some key issues in adaptive agent tracking, and describe an initial approach based on discrimination-based learning. The main idea is to identify the deficiency of a model based on prediction failures, and revise the model by using features that are critical in discriminating successful and failed episodes. Our preliminary experiments in simulated air-to-air combat environments have shown some promising results, but many problems remain open for future research. 
%B AAAI Spring Symposium on Adaptation, Coevolution and Learning in Multiagent Systems %G eng %0 Magazine Article %D 1996 %T Event tracking in a dynamic multi-agent environment %A Tambe, Milind %A P. S. Rosenbloom %X This paper focuses on event tracking in one complex and dynamic multi-agent environment: the air-combat simulation environment. It analyzes the challenges that an automated pilot agent must face when tracking events in this environment. This analysis reveals three new issues that have not been addressed in previous work in this area: (i) tracking events generated by agents' flexible and reactive behaviors, (ii) tracking events in the context of continuous agent interactions, and (iii) tracking events in real-time. The paper proposes one solution to address these issues. One key idea in this solution is that the (architectural) mechanisms that an agent employs in generating its own flexible and reactive behaviors can be used to track other agents' flexible and reactive behaviors in real-time. A second key idea is the use of a world-centered representation for modeling agent interactions. The solution is demonstrated using an implementation of an automated pilot agent. %B Computational Intelligence %V 12 %G eng %0 Conference Paper %B AAAI Fall Symposium on Plan Execution %D 1996 %T Executing Team Plans in Dynamic Multi-agent environments %A Tambe, Milind %X This paper focuses on flexible teamwork in dynamic and real-world multi-agent domains. Such teamwork is not simply a union of agents' simultaneous execution of individual plans, even if such execution is pre-coordinated. Indeed, uncertainties in complex, dynamic domains often obstruct pre-planned coordination, with a resultant breakdown in teamwork. The central hypothesis in this paper is that for durable teamwork, agents should be provided explicit team plans, which directly express a team's joint activities. When agents execute such team plans, they abide by certain "commonsense" conventions of teamwork. 
Essentially, such conventions provide a deeper model of teamwork, facilitating flexible reasoning about coordination activities. Such a framework also frees the planner or the knowledge engineer from specifying very detailed low-level coordination plans. This framework has been implemented in the context of a real-world synthetic environment for helicopter-combat simulation. %B AAAI Fall Symposium on Plan Execution %G eng %0 Book %D 1996 %T Intelligent Agents: Vol II, Workshop on Agents, theories, architectures and languages (ATAL-95) %X This book is based on the second International Workshop on Agent Theories, Architectures, and Languages, held in conjunction with the International Joint Conference on Artificial Intelligence, IJCAI'95 in Montreal, Canada in August 1995.
The 26 papers are revised final versions of the workshop presentations selected from a total of 54 submissions; also included is a comprehensive introduction, a detailed bibliography listing 355 relevant publications, and a subject index. The book is structured into seven sections, reflecting the most current major directions in agent-related research. Together with its predecessor, Intelligent Agents, published as volume 890 in the LNAI series, this book provides a timely and comprehensive state-of-the-art report. %G eng %U https://www.springer.com/gp/book/9783540608059 %0 Conference Paper %B International conference on multi-agent systems (ICMAS96) %D 1996 %T Teamwork in real-world, dynamic environments %A Tambe, Milind %X Flexible teamwork in real-world multi-agent domains is more than a union of agents' simultaneous execution of individual plans, even if such execution is pre-coordinated. Indeed, uncertainties in complex, dynamic domains often obstruct pre-planned coordination, with a resultant breakdown in teamwork. The central hypothesis in this paper is that for durable teamwork, agents should be provided explicit team plans and an underlying model of teamwork that explicitly outlines their commitments and responsibilities as participants in team activities. Such a model enables team members to flexibly reason about coordination activities. The underlying model we have provided is based on the joint intentions framework, although we present some key modifications to reflect the practical constraints in (some) real-world domains. This framework has been implemented in the context of a real-world synthetic environment for helicopter-combat simulation; some empirical results are presented. 
%B International conference on multi-agent systems (ICMAS96) %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI96) %D 1996 %T Tracking dynamic team activity %A Tambe, Milind %X AI researchers are striving to build complex multi-agent worlds with intended applications ranging from the RoboCup robotic soccer tournaments, to interactive virtual theatre, to large-scale real-world battlefield simulations. Agent tracking --- monitoring other agents' actions and inferring their higher-level goals and intentions --- is a central requirement in such worlds. While previous work has mostly focused on tracking individual agents, this paper goes beyond by focusing on agent teams. Team tracking poses the challenge of tracking a team's joint goals and plans. Dynamic, real-time environments add to the challenge, as ambiguities have to be resolved in real-time. %B National Conference on Artificial Intelligence (AAAI96) %G eng %0 Journal Article %J USC ISI Technical report RR %D 1996 %T Tracking dynamic team activity: An extended report %A Tambe, Milind %X
AI researchers are striving to build complex multi-agent worlds with intended applications ranging from the RoboCup robotic soccer tournaments, to interactive virtual theatre, to large-scale real-world battlefield simulations. Agent tracking --- monitoring other agents' actions and inferring their higher-level goals and intentions --- is a central requirement in such worlds. While previous work has mostly focused on tracking individual agents, this paper goes beyond by focusing on agent teams. Team tracking poses the challenge of tracking a team's joint goals and plans. Dynamic, real-time environments add to the challenge, as ambiguities have to be resolved in real-time.
%B USC ISI Technical report RR %P 96-435 %G eng %0 Conference Paper %B Intelligent Agents, Vol II Springer Verlag Lecture Notes in Artificial Intelligence (LNAI 1037) %D 1996 %T Architectures for agents that track other agents in multi-agent worlds %A Tambe, Milind %A P. S. Rosenbloom %X In multi-agent environments, an intelligent agent often needs to interact with other individuals or groups of agents to achieve its goals. Agent tracking is one key capability required for intelligent interaction. It involves monitoring the observable actions of other agents and inferring their unobserved actions, plans, goals and behaviors. This article examines the implications of such an agent tracking capability for agent architectures. It specifically focuses on real-time and dynamic environments, where an intelligent agent is faced with the challenge of tracking the highly flexible mix of goal-driven and reactive behaviors of other agents, in real-time. The key implication is that an agent architecture needs to provide direct support for flexible and efficient reasoning about other agents' models. In this article, such support takes the form of an architectural capability to execute the other agent's models, enabling mental simulation of their behaviors. Other architectural requirements that follow include the capabilities for (pseudo-) simultaneous execution of multiple agent models, dynamic sharing and unsharing of multiple agent models and high bandwidth inter-model communication. We have implemented an agent architecture, an experimental variant of the Soar integrated architecture, that conforms to all of these requirements. Agents based on this architecture have been implemented to execute two different tasks in a real-time, dynamic, multi-agent domain. The article presents experimental results illustrating the agents' dynamic behavior. 
%B Intelligent Agents, Vol II Springer Verlag Lecture Notes in Artificial Intelligence (LNAI 1037) %G eng %0 Conference Paper %B AAAI Spring Symposium on 'Lessons Learned from Implemented Software Architectures for Physical Agents' %D 1995 %T Constraints and design choices in building intelligent pilots for simulated aircraft: Extended Abstract %A Tambe, Milind %A K. Schwamb %A P. S. Rosenbloom %X This paper focuses on our recent research effort aimed at developing human-like, intelligent agents (virtual humans) for large-scale, interactive simulation environments (virtual reality). These simulated environments have sufficiently high fidelity and realism [11, 23] that constructing intelligent agents requires us to face many of the hard research challenges faced by physical agents in the real world -- in particular, the integration of a variety of intelligent capabilities, including goal-driven behavior, reactivity, real-time performance, planning, learning, spatial and temporal reasoning, and natural language communication. However, since this is a synthetic environment, these intelligent agents do not have to deal with issues of low-level perception and robotic control. Important applications of this agent technology can be found in areas such as education [14], manufacturing [11], entertainment [2, 12] and training. %B AAAI Spring Symposium on 'Lessons Learned from Implemented Software Architectures for Physical Agents' %G eng %0 Conference Paper %B International conference on multi-agent systems (ICMAS) %D 1995 %T Recursive agent and agent-group tracking in a real-time dynamic environment %A Tambe, Milind %X Agent tracking is an important capability an intelligent agent requires for interacting with other agents. It involves monitoring the observable actions of other agents as well as inferring their unobserved actions or high-level goals and behaviors. This paper focuses on a key challenge for agent tracking: recursive tracking of individuals or groups of agents. 
The paper first introduces an approach for tracking recursive agent models. To tame the resultant growth in the tracking effort and aid real-time performance, the paper then presents model sharing, an optimization that involves sharing the effort of tracking multiple models. Such shared models are dynamically unshared as needed -- in effect, a model is selectively tracked if it is dissimilar enough to require unsharing. The paper also discusses the application of recursive modeling in service of deception, and the impact of sensor imperfections. This investigation is based on our on-going effort to build intelligent pilot agents for a real-world synthetic air-combat environment. %B International conference on multi-agent systems (ICMAS) %G eng %0 Conference Paper %B International joint conference on Artificial Intelligence (IJCAI) %D 1995 %T RESC: An approach for dynamic, real-time agent tracking %A Tambe, Milind %A P. S. Rosenbloom %X Agent tracking involves monitoring the observable actions of other agents as well as inferring their unobserved actions, plans, goals and behaviors. In a dynamic, real-time environment, an intelligent agent faces the challenge of tracking other agents' flexible mix of goal-driven and reactive behaviors, and doing so in real-time, despite ambiguities. This paper presents RESC (REal-time Situated Commitments), an approach that enables an intelligent agent to meet this challenge. RESC's situatedness derives from its constant uninterrupted attention to the current world situation — it always tracks other agents' on-going actions in the context of this situation. Despite ambiguities, RESC quickly commits to a single interpretation of the on-going actions (without an extensive examination of the alternatives), and uses that in service of interpretation of future actions. 
However, should its commitments lead to inconsistencies in tracking, it uses single-state backtracking to undo some of the commitments and repair the inconsistencies. Together, RESC's situatedness, immediate commitment, and single-state backtracking conspire in providing RESC its real-time character. RESC is implemented in the context of intelligent pilot agents participating in a real-world synthetic air-combat environment. Experimental results illustrating RESC's effectiveness are presented. %B International joint conference on Artificial Intelligence (IJCAI) %7 3 %V 12 %P 499-522 %G eng %0 Conference Paper %B Summer Computer Simulation Conference %D 1995 %T Using Machine Learning To Extend Autonomous Agent Capabilities %A W. Lewis Johnson %A Tambe, Milind %X What kinds of knowledge can Soar/IFOR agents learn in the combat simulation environment? In our investigations so far, we have found a number of learning opportunities in our systems, which yield several types of learned rules. For example, some rules speed up the agents' decision making, while other rules reorganize the agent's tactical knowledge for the purpose of on-line explanation generation. Yet, it is also important to ask a second question: Can machine learning make a significant difference in Soar/IFOR agent performance? The main issue here is that battlefield simulations are a real-world application of AI technology. The threshold which machine learning must surpass in order to be useful in this environment is therefore quite high. It is not sufficient to show that machine learning is applicable "in principle" via small-scale demonstrations; we must also demonstrate that learning provides significant benefits that outweigh any hidden costs. Thus, the overall objective of this work is to determine how machine learning can provide practical benefits to real-world applications of artificial intelligence. 
Our results so far have identified instances where machine learning succeeds in meeting these various requirements, and therefore can be an important resource in agent development. We have conducted extensive learning experiments in the laboratory, and have conducted demonstrations employing agents that learn; to date, however, learning has not yet been employed in large-scale exercises. The role of machine learning in Soar/IFOR is expected to broaden as practical impediments to learning are resolved, and the capabilities that agents are expected to exhibit are broadened. %B Summer Computer Simulation Conference %G eng %0 Conference Paper %B Conference on computer generated forces and behavioral representation %D 1995 %T Building intelligent pilots for simulated rotary wing aircraft %A Tambe, Milind %A K. Schwamb %A P. S. Rosenbloom %X The Soar/IFOR project has been developing intelligent pilot agents (henceforth IPs) for participation in simulated battlefield environments. While previously the project was mainly focused on IPs for fixed-wing aircraft (FWA), more recently, the project has also started developing IPs for rotary-wing aircraft (RWA). This paper presents a preliminary report on the development of IPs for RWA. It focuses on two important issues that arise in this development. The first is a requirement for reasoning about the terrain: when compared to an FWA IP, an RWA IP needs to fly much closer to the terrain and in general take advantage of the terrain for cover and concealment. The second issue relates to code and concept sharing between the FWA and RWA IPs. While sharing promises to cut down the development time for RWA IPs by taking advantage of our previous work for the FWA, it is not straightforward. The paper discusses the two issues in some detail and presents our initial resolutions of these issues. (Accompanying scenario figure: there are two RWA just behind the ridge, indicated by the contour lines; the other vehicles are a convoy of "enemy" ground vehicles, tanks and anti-aircraft vehicles, controlled by ModSAF. The RWA are approximately 2.5 miles from the convoy. The IPs have hidden their helicopters behind the ridge (their approximate hiding area is specified to them in advance). They unmask these helicopters by popping out from behind the ridge to launch missiles at the enemy vehicles, and quickly remask (hide) by dipping behind the ridge to survive retaliatory attacks. They subsequently change their hiding position to avoid predictability when they pop out later.) %B Conference on computer generated forces and behavioral representation %G eng %0 Magazine Article %D 1995 %T Intelligent Agents for Interactive Simulation Environments %A Tambe, Milind %A W. L. Johnson %A R. M. Jones %A F. Koss %A J. E. Laird %A P. S. Rosenbloom %A K Schwamb %X Interactive simulation environments constitute one of today's promising emerging technologies, with applications in areas such as education, manufacturing, entertainment and training. These environments are also rich domains for building and investigating intelligent automated agents, with requirements for the integration of a variety of agent capabilities, but without the costs and demands of low-level perceptual processing or robotic control. Our project is aimed at developing human-like, intelligent agents that can interact with each other, as well as with humans in such virtual environments. Our current target is intelligent automated pilots for battlefield simulation environments. These are dynamic, interactive, multi-agent environments that pose interesting challenges for research on specialized agent capabilities as well as on the integration of these capabilities in the development of "complete" pilot agents. 
We are addressing these challenges through development of a pilot agent, called TacAir-Soar, within the Soar architecture. The purpose of this article is to provide an overview of this domain and project by analyzing the challenges that automated pilots face in battlefield simulations, describing how TacAir-Soar is successfully able to address many of them (TacAir-Soar pilots have already successfully participated in constrained air-combat simulations against expert human pilots), and discussing the issues involved in resolving the remaining research challenges. %B AI Magazine %V 16 %P 15-39 %G eng %N 1 %0 Conference Paper %B AAAI Spring Symposium on 'Believable Agents' %D 1994 %T Building believable agents for simulation environments: Extended Abstract %A Tambe, Milind %A R. M. Jones %A J. E. Laird %A P. S. Rosenbloom %A K. Schwamb %X The goal of our research effort is to develop generic technology for intelligent automated agents in simulation environments. These agents are to behave believably like humans in these environments. In this context, believability refers to the indistinguishability of these agents from humans, given the task being performed, its scope, and the allowable mode(s) of interaction during task performance. For instance, for a given simulation task, one allowable mode of interaction with an agent may be typewritten questions and answers on a limited subject matter. Alternatively, a different allowable mode of interaction for the same (or different) task may be speech rather than typewritten words. In all these cases, believability implies that the agent must be indistinguishable from a human, given the particular mode of interaction. %B AAAI Spring Symposium on 'Believable Agents' %G eng %0 Conference Paper %B Time94: An international workshop on temporal representation and reasoning. %D 1994 %T Event Tracking for an Intelligent Automated Agent %A Tambe, Milind %A Paul S. 
Rosenbloom %X In a dynamic, multi-agent environment, an automated intelligent agent is often faced with the possibility that other agents may instigate events that actually hinder or help the achievement of its own goals. To act intelligently in such an environment, an automated agent needs an event tracking capability to continually monitor the occurrence of such events and the temporal relationships among them.  This capability enables an agent to infer the occurrence of important unobserved events as well as obtain a better understanding of interaction among events. This paper focuses on event tracking in one complex and dynamic multi-agent environment: the air-combat simulation environment. It analyzes the challenges that an automated pilot agent must face when tracking events in this environment. This analysis reveals some novel constraints on event tracking that arise from complex multi-agent interactions. The paper proposes one solution to address these constraints, and demonstrates it using a simple re-implementation of an existing automated pilot agent. %B Time94: An international workshop on temporal representation and reasoning. %G eng %0 Conference Paper %B Conference on computer generated forces and behavioral representation %D 1994 %T Event tracking in complex multiagent environments %A Tambe, Milind %A P. S. Rosenbloom %X
This paper focuses on event tracking in one complex and dynamic multi-agent environment: the air-combat simulation environment. It analyzes the challenges that an automated pilot agent must face when tracking events in this environment. This analysis reveals three new issues that have not been addressed in previous work in this area: (i) tracking events generated by agents' flexible and reactive behaviors, (ii) tracking events in the context of continuous agent interactions, and (iii) tracking events in real-time. The paper proposes one solution to address these issues. One key idea in this solution is that the (architectural) mechanisms that an agent employs in generating its own flexible and reactive behaviors can be used to track other agents' flexible and reactive behaviors in real-time. A second key idea is the use of a world-centered representation for modeling agent interactions. The solution is demonstrated using an implementation of an automated pilot agent.
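One key idea in the two event-tracking papers above, that the mechanisms an agent employs in generating its own flexible and reactive behaviors can be reused to track other agents' behaviors, can be sketched in a few lines. This is a hedged illustration only: the toy policy, the one-dimensional world state, and every name here (`reactive_policy`, `track`, the pilot identifiers) are invented for this sketch and do not come from the papers.

```python
# Sketch: tracking another agent by executing the SAME behavior-generation
# mechanism from that agent's point of view ("mental simulation").

def reactive_policy(world, self_id):
    """Toy pilot policy: evade when any other aircraft is near, else close in."""
    me = world[self_id]
    others = [pos for aid, pos in world.items() if aid != self_id]
    nearest = min(abs(pos - me) for pos in others)
    return "evade" if nearest < 10 else "close"

def track(world, trackee_id):
    # Tracking = running our own policy machinery from the trackee's viewpoint,
    # over a shared, world-centered state representation.
    return reactive_policy(world, trackee_id)

world = {"own_pilot": 0, "other_pilot": 5}
print(reactive_policy(world, "own_pilot"))  # own action: evade
print(track(world, "other_pilot"))          # predicted action for the other agent
```

Because both agents are simulated with one shared, world-centered state, tracking adds no separate agent-centered bookkeeping; the tracker simply re-executes the policy under a different identity.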
%B Conference on computer generated forces and behavioral representation %G eng %0 Conference Paper %B Fourth Conference on Computer Generated Forces and Behavioral Representation %D 1994 %T Intelligent Automated Agents for Tactical Air Simulation: A Progress Report %A Paul S. Rosenbloom %A W. Lewis Johnson %A Randolph M. Jones %A Frank Koss %A John E. Laird %X

This article reports on recent progress in the development of TacAir-Soar, an intelligent automated agent for tactical air simulation. This includes progress in expanding the agent's coverage of the tactical air domain, progress in enhancing the quality of the agent's behavior, and progress in building an infrastructure for research and development in this area.

%B Fourth Conference on Computer Generated Forces and Behavioral Representation %C Orlando, FL. %G eng %0 Journal Article %J Artificial Intelligence (AIJ) %D 1994 %T Investigating production system representations for non-combinatorial match %A M. Tambe %A P. S. Rosenbloom %X

Eliminating combinatorics from the match in production systems (or rule-based systems) is important for expert systems, real-time performance, machine learning (particularly with respect to the utility issue), parallel implementations and cognitive modeling. In [74], the unique-attribute representation was introduced to eliminate combinatorics from the match. However, in so doing, unique-attributes engender a sufficiently negative set of trade-offs, so that investigating whether there are alternative representations that yield better trade-offs becomes of critical importance.

This article identifies two promising spaces of such alternatives, and explores a number of the alternatives within these spaces. The first space is generated from local syntactic restrictions on working memory. Within this space, unique-attributes is shown to be the best alternative possible. The second space comes from restrictions on the search performed during the match of individual productions (match-search). In particular, this space is derived from the combination of a new, more relaxed, match formulation (instantiationless match) and a set of restrictions derived from the constraint-satisfaction literature. Within this space, new alternatives are found that outperform unique-attributes in some, but not yet all, domains.
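The combinatorics the abstract above refers to can be made concrete with a toy working memory. This is a minimal illustration, not the article's formal treatment: the blocks-world attributes and all identifiers are invented. With multi-valued attributes, two conditions sharing an identifier yield a cross-product of instantiations; the unique-attribute restriction allows at most one value per (id, attribute) pair, so the match degenerates to lookups.

```python
from itertools import product

# Toy working memory: (id, attribute, value) triples with multi-valued
# attributes -- "b1" is "on" three things and has two "color" values.
multi_wm = [("b1", "on", f"t{i}") for i in range(3)] + \
           [("b1", "color", c) for c in ("red", "blue")]

def instantiations(wm):
    """Naive cross-product match of two conditions joined on the same id."""
    on = [w for w in wm if w[1] == "on"]
    color = [w for w in wm if w[1] == "color"]
    return [(a, b) for a, b in product(on, color) if a[0] == b[0]]

print(len(instantiations(multi_wm)))  # 3 * 2 = 6 instantiations

# Unique-attribute restriction: at most one value per (id, attribute),
# so each condition resolves by a single dictionary lookup.
unique_wm = {("b1", "on"): "t0", ("b1", "color"): "red"}
print(unique_wm[("b1", "on")], unique_wm[("b1", "color")])
```

The dictionary form cannot represent the combinatorial case at all, which is exactly the trade-off the article examines: match cost is bounded, but expressiveness of working memory is restricted.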

%B Artificial Intelligence (AIJ) %V 68 %P 155-199 %G eng %N 1 %0 Conference Paper %B Conference on Information and Knowledge Management %D 1993 %T Collection oriented match %A A. Acharya %A Tambe, Milind %X Match algorithms capable of handling large amounts of data without giving up expressiveness are a key requirement for successful integration of relational database systems and powerful rule-based systems. Algorithms that have been used for database rule systems have usually been unable to support large and complex rule sets, while the algorithms that have been used for rule-based expert systems do not scale well with data. Furthermore, these algorithms do not provide support for collection (or set) oriented production languages. This paper proposes a basic shift in the nature of match algorithms: from tuple-oriented to collection-oriented. A collection-oriented match algorithm matches each condition in a production with a collection of tuples and generates collection-oriented instantiations, i.e., instantiations that have a collection of tuples corresponding to each condition. This approach shows great promise for efficiently matching expressive productions against large amounts of data. In addition, it provides direct support for collection-oriented production languages. We have found that many existing tuple-oriented match algorithms can be easily transformed to their collection-oriented analogues. This paper presents the transformation of Rete to Collection Rete as an example and compares the two based on a set of benchmarks. Results presented in this paper show that, for large amounts of data, a relatively underoptimized implementation of Collection Rete achieves orders of magnitude improvement in time and space over an optimized version of Rete. The results establish the feasibility of collection-oriented match for integrated database-production systems. 
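The tuple-oriented versus collection-oriented contrast described above can be sketched with a two-condition join. This is a hedged toy example, not Collection Rete itself: the employee/budget relations and function names are invented, and a real implementation would maintain these collections incrementally in a Rete-style network.

```python
# Tuple-oriented match: one instantiation per combination of matching tuples.
# Collection-oriented match: one instantiation per join key, binding a
# *collection* of tuples to each condition.

employees = [("alice", "sales"), ("bob", "sales"), ("carol", "hr")]
budgets = [("sales", 100), ("hr", 50)]

def tuple_oriented(emps, buds):
    # Enumerates every (employee, budget) pair that joins on department.
    return [(e, b) for e in emps for b in buds if e[1] == b[0]]

def collection_oriented(emps, buds):
    # One instantiation per department; the employee condition binds a group.
    out = {}
    for dept, amount in buds:
        group = [e for e in emps if e[1] == dept]
        if group:
            out[dept] = (group, amount)
    return out

print(len(tuple_oriented(employees, budgets)))       # 3 tuple instantiations
print(len(collection_oriented(employees, budgets)))  # 2 collection instantiations
```

With n employees per department, the tuple-oriented form produces n instantiations per department while the collection-oriented form produces one, which is the source of the time and space gains the abstract reports at scale.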
%B Conference on Information and Knowledge Management %G eng %0 Conference Paper %B School of Computer Science, Carnegie Mellon University, technical report CMU-CS-93-195 %D 1993 %T Experiments in Knowledge Refinement for a Large Rule-Based System %A Wilson A. Harvey %A Tambe, Milind %X Knowledge-refinement is a central problem in the field of expert systems [Buchanan and Shortliffe, 1984]. It refers to the progressive refinement of the initial knowledge-base of an expert system into a high-performance knowledge-base. For rule-based systems, refinement implies the addition, deletion and modification of rules in the system so as to improve the system's empirical adequacy, i.e., its ability to reach correct conclusions in the problems it is intended to solve [Ginsberg et al., 1988]. The goal of our research effort is to understand the methodology for refining large rule-based systems, as well as to develop tools that will be useful in refining such systems. The vehicle for our investigation is spam, a production system (rule-based system) for the interpretation of aerial imagery [McKeown et al., 1985, McKeown et al., 1989]. It is a mature research system having over 600 productions, many of which interact with a variety of complex (non-rule-based) geometric algorithms. A typical scene analysis task requires between 50,000 to 400,000 production firings and an execution time of the order of 2 to 4 cpu hours. Large, compute-intensive systems like spam impose some unique constraints on knowledge refinement. First, the problem of credit/blame-assignment is complicated; it is extremely difficult to isolate a single culprit production (or a set of culprit productions) to blame for an error observed in the output. As a result, the methodology adopted in well-known systems such as seek and seek2 [Politakis and Weiss, 1984, Ginsberg et al., 1988], or krust [Craw and Sleeman, 1991], cannot be directly employed to refine knowledge in spam. 
These systems identify errorful output and backward-chain through the executed rules to localize the source of the error. A second problem is that spam's long run-times make it difficult to rely on extensive experimentation for knowledge refinement. Iterative refinement and generate-and-test methods typically require many experiments to improve the knowledge base. In particular, spam's long run-times prohibit a thorough search of the space of possible refinements. Given spam's constraints, we have employed a bottom-up approach for knowledge refinement. Specifically, we begin by refining small portions of spam, and then attempt to understand the interactions of these refinements and their impact on intermediate results. Fortunately, spam is already divided into four phases, facilitating the identification of the "modular" pieces on which we focus our refinement efforts, as well as our evaluation efforts (discussed below). We begin by identifying gaps and/or faults within small portions of spam's individual phases by comparing their output to that of an expert. The knowledge is modified to more accurately match the expert's output. We then evaluate the new output to see how well the refined knowledge is performing. This method chains forward from the refined knowledge to an evaluation of its performance. This is in contrast to the backward-chaining systems cited above, which identify errorful output and reason backward to find the faulty knowledge (i.e., assign credit/blame). Backward chaining is difficult when the interactions between rules are complex. In particular, in spam, this implies backward-chaining hundreds of thousands of rule firings as well as applications of complex algorithmic constraints, which would be extremely difficult. However, forward-chaining does not obviate the credit/blame assignment problem. It introduces the problem in a new form: that of evaluating the impact of the knowledge refinements. 
In particular, in forward-chaining systems, given complex rule interactions, refinement in one part of the system need not improve the end result. This may occur because the refined portion may contribute to overall improvement in a relatively small number of cases, or because a second culprit component involved in the chaining may actually minimize the impact of the refinements. Hence, there is a need for evaluation and analysis of intermediate results. Of course, these intermediate results should do more than merely evaluate the parts of the system refined as a result of interaction with the expert; they should progressively yield a better understanding of how to change the rules to better the system's overall performance. SPAM's decomposition into phases helps us to a certain extent: the endpoints of the phases serve as potential intermediate evaluation points. In our work so far, we have focused on the second phase in SPAM, local-consistency (LCC). This phase was chosen because most of SPAM's time is spent in this phase, and because it shows the most potential for future growth. LCC performs a modified constraint satisfaction between hypotheses generated in SPAM's first phase. It applies constraints to a set of plausible hypotheses and prunes the hypotheses that are inconsistent with those constraints. For example, objects hypothesized as hangar-buildings and objects hypothesized as parking-aprons would participate in the constraint "hangar-buildings and parking-aprons are close together". This rule is realized as a distance constraint on pairs of objects hypothesized to be hangar-buildings and parking-aprons. A successful application of an LCC constraint provides support for each pair of hypotheses, and an unsuccessful application decreases support for that pair of hypotheses. In essence, each constraint in the LCC phase classifies pairs of hypotheses: either the constraint supports that pair, or it does not.
(LCC and SPAM are described in further detail in Section 2.) In working toward refining these constraints, we posed several questions: 1. What role does this constraint play in the interpretation process? 2. If the constraint does play a role, is it positive (helpful) or negative (unhelpful)? 3. If the role is positive, and the constraint is numeric, can we optimize the constraint values? 4. How effective (powerful) is this constraint? 5. What is this constraint's impact on run time? To address these questions, we began by asking a user to manually compile a database of correct intermediate outputs, the ground-truth database. The database and the system outputs were compared. An automatic procedure was created to adjust the SPAM knowledge base so that the correspondence between the ground-truth database and the knowledge base improved. This adjusted knowledge was then re-run through the system and the entire set of intermediate outputs (refined or otherwise) was evaluated. The somewhat surprising result of this analysis was that the expert's input did not help as much as expected. This result raised questions about why the expert's input was not as helpful as anticipated, and how the results should be evaluated. In order to better understand both the interactions between constraints and the effects of modifying the knowledge base, we did a brute-force evaluation of the results, providing some answers to the questions we posed above. Namely, the distance constraints do play positive roles in SPAM's interpretation process, though they are not very powerful. More importantly, we now see that refined knowledge may apply selectively. That is, the set of objects affected by individual constraints largely overlaps, implying that SPAM could reduce computation by selectively applying constraints and still achieve similar performance. Tools eliciting the expert's input must carefully take these interactions into consideration.
Finally, we can also conclude that intermediate result evaluation is not straightforward. For example, the complex structures that SPAM generates using the results of constraint application, called functional-areas, must be matched and these matches evaluated. Having provided some of the motivation for this work, we begin by describing the SPAM image interpretation system. Following this background material, we present our refinement methodology and an analysis of our results. We then re-evaluate the methodology and discuss further experimental results. Finally, we outline some areas of future work. %B School of Computer Science, Carnegie Mellon University, technical report CMU-CS-93-195 %G eng %0 Conference Paper %B National conference on Artificial Intelligence (AAAI-93) %D 1993 %T On the Masking Effect %A Tambe, Milind %A P. S. Rosenbloom %X

Machine learning approaches to knowledge compilation seek to improve the performance of problem-solvers by storing solutions to previously solved problems in an efficient, generalized form. The problem-solver retrieves these learned solutions in appropriate later situations to obtain results more efficiently. However, by relying on its learned knowledge to provide a solution, the problem-solver may miss an alternative solution of higher quality, one that could have been generated using the original (non-learned) problem-solving knowledge. This phenomenon is referred to as the masking effect of learning.

In this paper, we examine a sequence of possible solutions for the masking effect. Each solution refines and builds on the previous one. The final solution is based on cascaded filters. When learned knowledge is retrieved, these filters alert the system about the inappropriateness of this knowledge so that the system can then derive a better alternative solution. We analyze conditions under which this solution will perform better than the others, and present experimental data supportive of the analysis. This investigation is based on a simulated robot domain called Groundworld.

%B National conference on Artificial Intelligence (AAAI-93) %G eng %0 Conference Paper %B International Conference on Tools of Artificial Intelligence %D 1992 %T An efficient algorithm for production systems with linear-time match %A M. Tambe %A Kalp, D. %A P. Rosenbloom %X Combinatorial match in production systems (rule-based systems) is problematical in several areas of production system application: real-time performance, learning new productions for performance improvement, modeling human cognition, and parallelization. The unique-attribute representation in production systems is a promising approach to eliminate match combinatorics. Earlier investigations have focused on the ability of unique-attributes to alleviate the problems caused by combinatorial match [33]. This paper reports on an additional benefit of unique-attributes: a specialized match algorithm called Uni-Rete. Uni-Rete is a specialization of the widely used Rete match algorithm for unique-attributes. The paper presents performance results for Uni-Rete, which indicate over 10-fold speedup with respect to Rete. It also discusses the implications of Uni-Rete for non-unique-attribute systems. %B International Conference on Tools of Artificial Intelligence %G eng %0 Magazine Article %D 1992 %T Flexible integration of path-planning capabilities %A Iain C. Stobie %A Tambe, Milind %A Paul S. Rosenbloom %X Robots pursuing complex goals must plan paths according to several criteria of quality, including shortness, safety, speed and planning time. Many sources and kinds of knowledge, such as maps, procedures and perception, may be available or required. Both the quality criteria and sources of knowledge may vary widely over time, and in general they will interact. One approach to address this problem is to express all criteria and goals numerically in a single weighted graph, and then to search this graph to determine a path.
Since this is problematic with symbolic or uncertain data and interacting criteria, we propose that what is needed instead is an integration of many kinds of planning capabilities. We describe a hybrid approach to integration, based on experiments with building simulated mobile robots using Soar, an integrated problem-solving and learning system. For flexibility, we have implemented a combination of internal planning, reactive capabilities and specialized tools. We illustrate how these components can complement each other's limitations and produce plans which integrate geometric and task knowledge. %B MOBILE ROBOTS VII %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI) %D 1992 %T Learning 10,000 chunks: what’s it like out there %A R. Doorenbos %A M. Tambe %A A. Newell %X This paper describes an initial exploration into large learning systems, i.e., systems that learn a large number of rules. Given the well-known utility problem in learning systems, efficiency questions are a major concern. But the questions are much broader than just efficiency, e.g., will the effectiveness of the learned rules change with scale? This investigation uses a single problem-solving and learning system, Dispatcher-Soar, to begin to get answers to these questions. Dispatcher-Soar has currently learned 10,112 new productions, on top of an initial system of 1,819 productions, so its total size is 11,931 productions. This represents one of the largest production systems in existence, and by far the largest number of rules ever learned by an AI system. This paper presents a variety of data from our experiments with Dispatcher-Soar and raises important questions for large learning systems. %B National Conference on Artificial Intelligence (AAAI) %G eng %0 Conference Paper %B IEEE Transactions on Parallel and Distributed Computing (IEEE TPDC) %D 1992 %T Implementation of production systems on message passing computers: Simulation results and analysis %A A.
Acharya %A M. Tambe %A A. Gupta %X In the past, researchers working on parallel implementations of production systems have focused on shared-memory multiprocessors and special-purpose architectures. Message-passing computers have not been given as much attention. The main reasons for this have been the large message-passing latency (as large as a few milliseconds) and high message-handling overheads (several hundred microseconds) associated with the first-generation message-passing computers. These overheads were too large for parallel implementations of production systems, which require a fine-grain decomposition to obtain a significant speedup. Recent advances in interconnection network technology and processing element design, however, promise to reduce the network latency and message-handling overhead by 2-3 orders of magnitude, making these computers much more interesting for implementation of production systems. In this paper, we examine the suitability of message-passing computers for parallel implementations of production systems. We present two mappings for production systems on these computers, one targeted toward fine-grained message-passing machines and the other targeted toward medium-grained machines. We also present simulation results for the medium-grained mapping and show that it is possible to exploit the available parallelism and to obtain reasonable speedups. Finally, we perform a detailed analysis of the results and suggest solutions for some of the problems. Index Terms: Coarse-grain mapping, concurrent distributed hash table, fine-grain mapping, medium-grain mapping, message-passing computers, OPS5, parallel production systems, Rete network, simulation results. %B IEEE Transactions on Parallel and Distributed Computing (IEEE TPDC) %7 4 %V 3 %P 477-487 %G eng %0 Journal Article %J Parallel and Distributed Computing (JPDC) %D 1991 %T The effectiveness of task-level parallelism for production systems %A W. Harvey %A D. Kalp %A M.
Tambe %A D. McKeown %A A. Newell %X Large production systems (rule-based systems) continue to suffer from extremely slow execution which limits their utility in practical applications as well as in research settings. Most investigations in speeding up these systems have focused on match parallelism. These investigations have revealed that the total speed-up available from this source is insufficient to alleviate the problem of slow execution in large-scale production system implementations. In this paper, we focus on task-level parallelism, which is obtained by a high-level decomposition of the production system. Speed-ups obtained from task-level parallelism will multiply with the speed-ups obtained from match parallelism. The vehicle for our investigation of task-level parallelism is SPAM, a high-level vision system, implemented as a production system. SPAM is a mature research system with a typical run requiring between 50,000 and 400,000 production firings. We report very encouraging speed-ups from task-level parallelism in SPAM: our parallel implementation shows near-linear speed-ups of over 12-fold using 14 processors and points the way to substantial (50- to 100-fold) speed-ups. We present a characterization of task-level parallelism in production systems and describe our methodology for selecting and applying a particular approach to parallelize SPAM. Additionally, we report the speed-ups obtained from the use of virtual shared memory. Overall, task-level parallelism has not received much attention in the literature. Our experience illustrates that it is potentially a very important tool for speeding up large-scale production systems.
%B Parallel and Distributed Computing (JPDC) %V 13 %P 395-411 %G eng %N 4 %0 Conference Paper %B Carnegie Mellon University Computer Science Dept Technical Report %D 1991 %T Uni-Rete: specializing the Rete match algorithm for the unique-attribute representation %A Tambe, Milind %A Dirk Kalp %A Paul Rosenbloom %X The combinatorial match in production systems (rule-based systems) is problematical in several areas of production system application: real-time performance, learning new productions for performance improvement, modeling human cognition, and parallelization. The unique-attribute representation is a promising approach to eliminate match combinatorics. Earlier investigations have focused on the ability of unique-attributes to alleviate the problems caused by combinatorial match [Tambe, Newell and Rosenbloom 90]. This paper reports on an additional benefit of unique-attributes: a specialized match algorithm called Uni-Rete. Uni-Rete is a specialization of the widely used Rete match algorithm for unique-attributes, and it has shown over 10-fold speedup over Rete in performing match. %B Carnegie Mellon University Computer Science Dept Technical Report %G eng %0 Conference Paper %B ACM/SIGPLAN Symposium on Principles and Practices of Parallel Programming (PPOPP) %D 1990 %T The effectiveness of task-level parallelism for high-level vision %A W. Harvey %A D. Kalp %A M. Tambe %A D. McKeown %A A. Newell %X Large production systems (rule-based systems) continue to suffer from extremely slow execution which limits their utility in practical applications as well as in research settings. Most investigations in speeding up these systems have focused on match (or knowledge-search) parallelism. Although good speed-ups have been achieved in this process, these investigations have revealed the limitations on the total speed-up available from this source.
This limited speed-up is insufficient to alleviate the problem of slow execution in large-scale production system implementations. Such large-scale systems are expected to increase as researchers develop increasingly more competent production systems. In this paper, we focus on task-level parallelism, which is obtained by a high-level decomposition of the production system. Speed-ups obtained from task-level parallelism will multiply with the speed-ups obtained from match parallelism. The vehicle for our investigation of task-level parallelism is SPAM, a high-level vision system, implemented as a production system. SPAM is a mature research system with a typical run requiring between 50,000 and 400,000 production firings and an execution time on the order of 10 to 100 CPU hours. We report very encouraging speed-ups from task-level parallelism in SPAM: our parallel implementation shows near-linear speed-ups of over 12-fold using 14 processors and points the way to substantial (50- to 100-fold) speed-ups from task-level parallelism. We present a characterization of task-level parallelism in production systems and describe our methodology for selecting and applying a particular approach to parallelize SPAM. Additionally, we report the speed-ups obtained from the use of shared virtual memory (network shared memory) in this implementation. Overall, task-level parallelism has not received much attention in the literature. Our experience illustrates that it is potentially a very important tool for speeding up large-scale production systems. %B ACM/SIGPLAN Symposium on Principles and Practices of Parallel Programming (PPOPP) %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI) %D 1990 %T A framework for investigating production system formulations with polynomially bounded match %A M. Tambe %A P. Rosenbloom %X Real-time constraints on AI systems require guaranteeing bounds on these systems' performance.
However, in the presence of sources of uncontrolled combinatorics, it is extremely difficult to guarantee such bounds on their performance. In production systems, the primary source of uncontrolled combinatorics is the production match. To eliminate these combinatorics, the unique-attribute formulation was introduced in (Tambe and Rosenbloom, 1989), which achieved a linear bound on the production match. This formulation leads to several questions: is this unique-attribute formulation the best conceivable production system formulation? In fact, are there other alternative production system formulations? If there are other formulations, how should these alternatives be compared with the unique-attribute formulation? This paper attempts to address these questions in the context of Soar. It identifies independent dimensions along which alternative production system formulations can be specified. These dimensions are based on the fixed class of match algorithms currently employed in production systems. These dimensions create a framework for systematically generating alternative formulations. Using this framework, we show that the unique-attribute formulation is the best one within the dimensions investigated. However, if a new class of match algorithms is admitted, by relaxing certain constraints, other competitor formulations emerge. The paper indicates which competitor formulations are promising and why. Although some of the concepts, such as unique-attributes, are introduced in the context of Soar, they should also be relevant to other rule-based systems. %B National Conference on Artificial Intelligence (AAAI) %G eng %0 Journal Article %J Machine Learning Journal %D 1990 %T The problem of expensive chunks and its solution by restricting expressiveness %X Soar is an architecture for a system that is intended to be capable of general intelligence. Chunking, a simple experience-based learning mechanism, is Soar's only learning mechanism.
Chunking creates new items of information, called chunks, based on the results of problem-solving and stores them in the knowledge base. These chunks are accessed and used in appropriate later situations to avoid the problem-solving required to determine them. It is already well-established that chunking improves performance in Soar when viewed in terms of the subproblems required and the number of steps within a subproblem. However, despite the reduction in number of steps, sometimes there may be a severe degradation in the total run time. This problem arises due to expensive chunks, i.e., chunks that require a large amount of effort in accessing them from the knowledge base. They pose a major problem for Soar, since in their presence, no guarantees can be given about Soar's performance. In this article, we establish that expensive chunks exist and analyze their causes. We use this analysis to propose a solution for expensive chunks. The solution is based on the notion of restricting the expressiveness of the representational language to guarantee that the chunks formed will require only a limited amount of accessing effort. We analyze the tradeoffs involved in restricting expressiveness and present some empirical evidence to support our analysis. %B Machine Learning Journal %V 5 %P 299-348 %G eng %N 3 %0 Conference Paper %B International Joint Conference on Artificial Intelligence (IJCAI'89) %D 1989 %T Eliminating expensive chunks by restricting expressiveness %A M. Tambe %A P. Rosenbloom %X Chunking, an experience-based learning mechanism, improves Soar's performance a great deal when viewed in terms of the number of subproblems required and the number of steps within a subproblem. This high-level view of the impact of chunking on performance is based on an ideal computational model, which says that the time per step is constant.
However, if the chunks created by chunking are expensive, then they consume a large amount of processing in the match, i.e., indexing the knowledge base, distorting Soar's constant time-per-step model. In these situations, the gain in number of steps does not reflect an improvement in performance; in fact there may be degradation in the total run time of the system. Such chunks form a major problem for the system, since absolutely no guarantees can be given about its behavior. This article presents a solution to the problem of expensive chunks. The solution is based on the notion of restricting the expressiveness of Soar's representational language to guarantee that chunks formed will require only a limited amount of matching effort. We analyze the tradeoffs involved in restricting expressiveness and present some empirical evidence to support our analysis. %B International Joint Conference on Artificial Intelligence (IJCAI'89) %G eng %0 Report %D 1989 %T Implementation of production systems on message passing computers: techniques, simulation results and analysis %A Tambe, Milind %A Anurag Acharya %A Anoop Gupta %X The authors examine the suitability of message-passing computers for parallel implementations of production systems. Two mappings for production systems on these computers, one targeted toward fine-grained message-passing machines and the other targeted toward medium-grained machines, are presented. Simulation results for the medium-grained mapping are presented, and it is shown that it is possible to exploit the available parallelism and to obtain reasonable speedups. The authors perform a detailed analysis of the results and suggest solutions for some of the problems. %I Carnegie Mellon University Computer Science Dept Technical Report %G eng %0 Conference Paper %B International Conference on Parallel Processing (ICPP) %D 1988 %T Parallel OPS5 on the Encore Multimax %A A. Gupta %A C. L. Forgy %A D. Kalp %A A. Newell %A M.
Tambe %X Until now, most results reported for parallelism in production systems (rule-based systems) have been simulation results -- very few real parallel implementations exist. In this paper, we present results from our parallel implementation of OPS5 on the Encore multiprocessor. The implementation exploits very fine-grained parallelism to achieve significant speed-ups. For one of the applications, we achieve 12.4-fold speed-up using 13 processes. Our implementation is also distinct from other parallel implementations in that we parallelize a highly optimized C-based implementation of OPS5. Running on a uniprocessor, our C-based implementation is 10-20 times faster than the standard Lisp implementation distributed by Carnegie Mellon University. In addition to presenting the performance numbers, the paper discusses the amount of contention observed for shared data structures, and the techniques used to reduce such contention. %B International Conference on Parallel Processing (ICPP) %G eng %0 Conference Paper %B ACM/SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS) %D 1988 %T Soar/PSM-E: Investigating match parallelism in a learning production system %A M. Tambe %A D. Kalp %A A. Gupta %A C. L. Forgy %A B. G. Milnes %A A. Newell %X Soar is an attempt to realize a set of hypotheses on the nature of general intelligence within a single system. Soar uses a production system (rule-based system) to encode its knowledge base. Its learning mechanism, chunking, adds productions continuously to the production system. The process of searching for relevant knowledge, matching, is known to be a performance bottleneck in production systems. PSM-E is a C-based implementation of the OPS5 production system on the Encore Multimax that has achieved significant speedups in matching. In this paper we describe our implementation, Soar/PSM-E, of Soar on the Encore Multimax that is built on top of PSM-E.
We first describe the extensions and modifications required to PSM-E in order to support Soar, especially the capability of adding productions at run time as required by chunking. We present the speedups obtained on Soar/PSM-E and discuss some effects of chunking on parallelism. We also analyze the performance of the system and identify the bottlenecks limiting parallelism. Finally, we discuss the work in progress to deal with some of them. %B ACM/SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages, and Systems (PPEALS) %G eng %0 Conference Paper %B National Conference on Artificial Intelligence (AAAI) %D 1988 %T Suitability of message passing computers for implementing production systems %A A. Gupta %A M. Tambe %X Two important parallel architecture types are the shared-memory architectures and the message-passing architectures. In the past, researchers working on the parallel implementations of production systems have focused either on shared-memory multiprocessors or on special-purpose architectures. Message-passing computers have not been studied. The main reasons have been the large message-passing latency (as large as a few milliseconds) and high message reception overheads (several hundred microseconds) exhibited by the first-generation message-passing computers. These overheads are too large for the parallel implementation of production systems, where it is necessary to exploit parallelism at a very fine granularity to obtain significant speed-up (subtasks execute about 100 machine instructions). However, recent advances in interconnection network technology and processing node design have cut the network latency and message reception overhead by 2-3 orders of magnitude, making these computers much more interesting. In this paper we present techniques for mapping production systems onto message-passing computers.
We show that using a concurrent distributed hash table data structure, it is possible to exploit parallelism at a very fine granularity and to obtain significant speed-ups from parallelism. %B National Conference on Artificial Intelligence (AAAI) %G eng %0 Report %D 1988 %T Why some chunks are expensive %A Tambe, Milind %A Allen Newell %X Soar is an attempt to realize a set of hypotheses on the nature of general intelligence within a single system. One central hypothesis is that chunking, Soar's simple experience-based learning mechanism, can form the basis for a general learning mechanism. It is already well established that the addition of chunks improves the performance in Soar a great deal, when viewed in terms of subproblems required and number of steps within a subproblem. But this high-level view does not take into account potential offsetting costs that arise from various computational effects. This paper is an investigation into the computational effect of expensive chunks. These chunks add significantly to the time per step by being individually expensive. We decompose the causes of expensive chunks into three components and identify the features of the task environment that give rise to them. We then discuss the implications of the existence of expensive chunks for a complete implementation of Soar. %I Carnegie Mellon University Computer Science %G eng
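Several of the abstracts above (the Uni-Rete and polynomially-bounded-match papers) turn on the unique-attribute restriction: if each (object, attribute) pair in working memory may hold at most one value, then following a chain of attributes in a rule's condition never branches, so match cost grows linearly with condition length rather than combinatorially. The following is a minimal illustrative sketch of that restriction only; the class and method names are hypothetical and are not code from any of the papers listed here.

```python
# Sketch (assumed names, not from the papers): a working memory restricted to
# unique attributes. Because each (object, attribute) key maps to at most ONE
# value, following a condition chain of length k costs exactly k lookups --
# the linear-time match property that Uni-Rete exploits.

class UniqueWM:
    """Working memory obeying the unique-attribute restriction."""

    def __init__(self):
        self._cells = {}  # (object_id, attribute) -> single value

    def add(self, obj, attr, value):
        key = (obj, attr)
        if key in self._cells:
            # Multi-valued attributes are exactly what the restriction forbids.
            raise ValueError(f"unique-attribute violation: {key} already set")
        self._cells[key] = value

    def follow(self, obj, attrs):
        # Follow a chain of attributes; one lookup per attribute, no search.
        for attr in attrs:
            obj = self._cells.get((obj, attr))
            if obj is None:
                return None
        return obj

wm = UniqueWM()
wm.add("block1", "on", "block2")
wm.add("block2", "on", "table")
```

With ordinary multi-valued attributes, each step of such a chain could fan out to many candidate values and the match cost would multiply step by step; forbidding that fan-out is what makes the specialized linear-time match possible.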