2012
Jun-young Kwak, Pradeep Varakantham, Rajiv Maheswaran, Milind Tambe, Farrokh Jazizadeh, Geoffrey Kavulya, Laura Klein, Burcin Becerik-Gerber, Timothy Hayes, and Wendy Wood. 2012. “
SAVES: A Sustainable Multiagent Application to Conserve Building Energy Considering Occupants .” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS) .
AbstractThis paper describes an innovative multiagent system called
SAVES with the goal of conserving energy in commercial buildings. We specifically focus on an application to be deployed in
an existing university building that provides several key novelties:
(i) jointly performed with the university facility management team,
SAVES is based on actual occupant preferences and schedules, actual energy consumption and loss data, real sensors and hand-held
devices, etc.; (ii) it addresses novel scenarios that require negotiations with groups of building occupants to conserve energy; (iii) it
focuses on a non-residential building, where human occupants do
not have a direct financial incentive in saving energy and thus requires a different mechanism to effectively motivate occupants; and
(iv) SAVES uses a novel algorithm for generating optimal MDP
policies that explicitly consider multiple criteria optimization (energy and personal comfort) as well as uncertainty over occupant
preferences when negotiating energy reduction – this combination
of challenges has not been considered in previous MDP algorithms.
In a validated simulation testbed, we show that SAVES substantially reduces the overall energy consumption compared to the existing control method while achieving comparable average satisfaction levels for occupants. As a real-world test, we provide results of
a trial study where SAVES is shown to lead occupants to conserve
energy in real buildings.
2012_9_teamcore_aamas12-saves-camera-ready-final.pdf Manish Jain, Bo An, and Milind Tambe. 2012. “
Security Games Applied to Real-World: Research Contributions and Challenges.” In In S. Jajodia, A.K. Ghosh, V.S. Subramanian, V. Swarup, C. Wang, and X. S. Wang editors Moving Target Defense II: Application of Game Theory and Adversarial Modeling, Springer .
AbstractThe goal of this chapter is to introduce a challenging real-world problem
for researchers in multiagent systems and beyond, where our collective efforts may
have a significant impact on activities in the real-world. The challenge is in applying
game theory for security: our goal is not only to introduce the problem, but also to
provide exemplars of initial successes of deployed systems in this problem arena.
Furthermore, we present key ideas and algorithms for solving and understanding
the characteristics large-scale real-world security games, and then present some key
open research challenges in this area.
2012_35_teamcore_manish.pdf Thanh H. Nguyen, Jason Tsai, Albert Jiang, Emma Bowring, Rajiv Maheswaran, and Milind Tambe. 2012. “
Security Games on Social Networks .” In AAAI Fall Symposium, 2012 .
AbstractMany real-world problems exhibit competitive situations in
which a defender (a defending agent, agency, or organization) has to address misinformation spread by its adversary,
e.g., health organizations cope with vaccination-related misinformation provided by anti-vaccination groups. The rise of
social networks has allowed misinformation to be easily and
quickly diffused to a large community. Taking into account
knowledge of its adversary’s actions, the defender has to seek
efficient strategies to limit the influence of the spread of misinformation by the opponent.
In this paper, we address this problem as a blocking influence
maximization problem using a game-theoretic approach. Two
players strategically select a number of seed nodes in the social network that could initiate their own influence propagation. While the adversary attempts to maximize its negative
influence, the defender tries to minimize this influence. We
represent the problem as a zero-sum game and apply the Double Oracle algorithm to solve the game in combination with
various heuristics for oracle phases. Our experimental results
reveal that by using the game theoretic approach, we are able
to significantly reduce the negative influence in comparison
to when the defender does not do anything. In addition, we
propose using an approximation of the payoff matrix, making
the algorithms scalable to large real-world networks.
2012_41_teamcore_security_games_on_social_networks.pdf Bo An, David Kempe, Christopher Kiekintveld, Eric Shieh, Satinder Singh, Milind Tambe, and Yevgeniy Vorobeychik. 2012. “
Security Games with Limited Surveillance .” In Conference on Artificial Intelligence (AAAI) .
AbstractRandomized first-mover strategies of Stackelberg games
are used in several deployed applications to allocate
limited resources for the protection of critical infrastructure.
Stackelberg games model the fact that a strategic attacker can
surveil and exploit the defender’s strategy, and randomization
guards against the worst effects by making the defender less
predictable. In accordance with the standard game-theoretic
model of Stackelberg games, past work has typically assumed
that the attacker has perfect knowledge of the defender’s
randomized strategy and will react correspondingly. In light
of the fact that surveillance is costly, risky, and delays an
attack, this assumption is clearly simplistic: attackers will
usually act on partial knowledge of the defender’s strategies.
The attacker’s imperfect estimate could present opportunities
and possibly also threats to a strategic defender.
In this paper, we therefore begin a systematic study of
security games with limited surveillance. We propose
a natural model wherein an attacker forms or updates a
belief based on observed actions, and chooses an optimal
response. We investigate the model both theoretically and
experimentally. In particular, we give mathematical programs
to compute optimal attacker and defender strategies for a
fixed observation duration, and show how to use them to
estimate the attacker’s observation durations. Our experimental results show that the defender can achieve significant
improvement in expected utility by taking the attacker’s
limited surveillance into account, validating the motivation
of our work.
2012_25_teamcore_boan_aaai12.pdf Bo An, David Kempe, Christopher Kiekintveld, Eric Shieh, Satinder Singh, Milind Tambe, and Yevgeniy Vorobeychik. 2012. “
Security Games with Limited Surveillance: An Initial Report.” In AAAI Spring Symposium on Game Theory for Security, Sustainability and Health.
AbstractStackelberg games have been used in several deployed applications of game theory to make recommendations for allocating limited resources for protecting critical infrastructure. The resource allocation strategies are randomized to
prevent a strategic attacker from using surveillance to learn
and exploit patterns in the allocation. An important limitation of previous work on security games is that it typically
assumes that attackers have perfect surveillance capabilities,
and can learn the exact strategy of the defender. We introduce
a new model that explicitly models the process of an attacker observing a sequence of resource allocation decisions and
updating his beliefs about the defender’s strategy. For this
model we present computational techniques for updating the
attacker’s beliefs and computing optimal strategies for both
the attacker and defender, given a specific number of observations. We provide multiple formulations for computing the
defender’s optimal strategy, including non-convex programming and a convex approximation. We also present an approximate method for computing the optimal length of time
for the attacker to observe the defender’s strategy before attacking. Finally, we present experimental results comparing
the efficiency and runtime of our methods.
2012_7_teamcore_aaaiss_final.pdf J. Tsai, E. Bowring, S. Marsella, W. Wood, and M. Tambe. 2012. “
A Study of Emotional Contagion with Virtual Characters .” In International Conference on Intelligent Virtual Agents (IVA) (short paper) .
AbstractIn social psychology, emotional contagion describes the widely observed phenomenon of one person’s emotions mimicking surrounding people’s
emotions [10]. In this paper, we perform a battery of experiments to explore
the existence of agent-human emotional contagion. The first study is a betweensubjects design, wherein subjects were shown an image of a character’s face with
either a neutral or happy expression. Findings indicate that even a still image
induces a very strong increase in self-reported happiness between Neutral and
Happy conditions with all characters tested.
In a second study, we examine the effect of a virtual character’s presence in a
strategic situation by presenting subjects with a modernized Stag Hunt game.
Our experiments show that the contagion effect is substantially dampened and
does not cause a consistent impact on behavior. A third study explores the impact
of the strategic decision within the Stag Hunt and conducts the same experiment
using a description of the same strategic situation with the decision already made.
We find that the emotional impact returns, implying that the contagion effect is
substantially lessened in the presence of a strategic decision.
2012_39_teamcore_20120625_iva_camera_ready_v2.pdf Jun-young Kwak, Pradeep Varakantham, Rajiv Maheswaran, Milind Tambe, Farrokh Jazizadeh, Geoffrey Kavulya, Laura Klein, Burcin Becerik-Gerber, Timothy Hayes, and Wendy Wood. 2012. “
Sustainable Multiagent Application to Conserve Energy .” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS) Demonstration Track .
AbstractLimited availability of energy resources has motivated the need
for developing efficient measures of conserving energy. Conserving energy in commercial buildings is an important goal since these
buildings consume significant amount of energy, e.g., 46.2% of
all building energy and 18.4% of total energy consumption in the
US [1]. This demonstration focuses on a novel application to be
deployed at Ralph & Goldy Lewis Hall (RGL) at the University
of Southern California as a practical research testbed to optimize
multiple competing objectives: i) energy use in the building; ii)
occupants’ comfort level; and iii) practical usage considerations.
This demonstration complements our paper in the AAMAS innovative applications track [4], presenting a novel multiagent building application for sustainability called SAVES (Sustainable multiAgent systems for optimizing Variable objectives including Energy
and Satisfaction). This writeup will provide a high-level overview
of SAVES and focus more on the proposed demonstration, but readers are referred to [4] for a more technical description. SAVES provides three key contributions: (i) jointly performed with the university facility management team, our research is based on actual
building and occupant data as well as real sensors and devices, etc.;
(ii) it focuses on non-residential buildings, where human occupants
do not have a direct financial incentive in saving energy; and (iii)
SAVES uses a novel algorithm for generating optimal BM-MDP
(Bounded parameter Multi-objective MDP) policies.
We demonstrate SAVES to show how to achieve significant energy savings and comparable average satisfaction level of occupants while emphasizing the interactive aspects of our application.
2012_21_teamcore_aamas12-energy-demo-camera-ready.pdf Albert Xin Jiang, Zhengyu Yin, Matthew P. Johnson, Christopher Kiekintveld, Kevin Leyton-Brown, Tuomas Sandholm, and Milind Tambe. 2012. “
Towards Optimal Patrol Strategies for Fare Inspection in Transit Systems .” In AAAI Spring Symposium on Game Theory for Security, Sustainability and Health .
AbstractIn some urban transit systems, passengers are legally required
to purchase tickets before entering but are not physically
forced to do so. Instead, patrol units move about through
the transit system, inspecting tickets of passengers, who face
fines for fare evasion. This setting yields the problem of computing optimal patrol strategies satisfying certain temporal
and spacial constraints, to deter fare evasion and hence maximize revenue. In this paper we propose an initial model of
this problem as a leader-follower Stackelberg game. We then
formulate an LP relaxation of this problem and present initial
experimental results using real-world ridership data from the
Los Angeles Metro Rail system.
2012_10_teamcore_trustsrail.pdf Jun-young Kwak, Pradeep Varakantham, Rajiv Maheswaran, Milind Tambe, Timothy Hayes, Wendy Wood, and Burcin Becerik-Gerber. 2012. “
Towards Robust Multi-objective Optimization Under Model Uncertainty for Energy Conservation .” In Workshop on Agent Technologies for Energy Systems (ATES) at AAMAS .
Abstractto the significant growth in energy usage. Building multiagent systems for real-world energy applications raises several research challenges regarding scalability, optimizing multiple competing objectives, model uncertainty, and complexity in deploying the system.
Motivated by these challenges, this paper proposes a new approach
to effectively conserve building energy. This work contributes to a
very new area that requires considering large-scale multi-objective
optimization as well as uncertainty over occupant preferences when
negotiating energy reduction. There are three major contributions.
We (i) develop a new method called HRMM to compute robust
solutions in practical situations; (ii) experimentally show that obtained strategies from HRMM converge to near-optimal solutions;
and (iii) provide a systematic way to tightly incorporate the insights
from human subject studies into our computational model and algorithms. The HRMM method is verified in a validated simulation
testbed in terms of energy savings and comfort levels of occupants
2012_30_teamcore_ates12-workshop-camera-ready-final.pdf Zhengyu Yin, Albert Xin Jiang, Milind Tambe, Christopher Kiekintveld, Kevin Leyton-Brown, Tuomas Sandholm, and John P. Sullivan. 2012. “
TRUSTS: Scheduling Randomized Patrols for Fare Inspection in Transit Systems using Game Theory .” AI Magazine, 33 , 4 , Pp. 59-72.
AbstractIn proof-of-payment transit systems, passengers are legally
required to purchase tickets before entering but are not physically forced to do so. Instead, patrol units move about the
transit system, inspecting the tickets of passengers, who face
fines if caught fare evading. The deterrence of fare evasion
depends on the unpredictability and effectiveness of the patrols.
In this paper, we present TRUSTS, an application for scheduling randomized patrols for fare inspection in transit systems.
TRUSTS models the problem of computing patrol strategies as a leader-follower Stackelberg game where the objective is to deter fare evasion and hence maximize revenue. This problem differs from previously studied Stackelberg settings in that the leader strategies must satisfy massive
temporal and spatial constraints; moreover, unlike in these
counterterrorism-motivated Stackelberg applications, a large
fraction of the ridership might realistically consider fare evasion, and so the number of followers is potentially huge.
A third key novelty in our work is deliberate simplification
of leader strategies to make patrols easier to be executed.
We present an efficient algorithm for computing such patrol
strategies and present experimental results using real-world
ridership data from the Los Angeles Metro Rail system. The
Los Angeles County Sheriff’s department is currently carrying out trials of TRUSTS.
2012_45_teamcore_aimag-trusts.pdf Zhengyu Yin, Albert Jiang, Matthew Johnson, Milind Tambe, Christopher Kiekintveld, Kevin Leyton-Brown, Tuomas Sandholm, and John Sullivan. 2012. “
TRUSTS: Scheduling Randomized Patrols for Fare Inspection in Transit Systems .” In Conference on Innovative Applications of Artificial Intelligence (IAAI) .
AbstractIn proof-of-payment transit systems, passengers are legally
required to purchase tickets before entering but are not physically forced to do so. Instead, patrol units move about the
transit system, inspecting the tickets of passengers, who face
fines if caught fare evading. The deterrence of such fines depends on the unpredictability and effectiveness of the patrols.
In this paper, we present TRUSTS, an application for scheduling randomized patrols for fare inspection in transit systems.
TRUSTS models the problem of computing patrol strategies as a leader-follower Stackelberg game where the objective is to deter fare evasion and hence maximize revenue. This problem differs from previously studied Stackelberg settings in that the leader strategies must satisfy massive
temporal and spatial constraints; moreover, unlike in these
counterterrorism-motivated Stackelberg applications, a large
fraction of the ridership might realistically consider fare evasion, and so the number of followers is potentially huge.
A third key novelty in our work is deliberate simplification
of leader strategies to make patrols easier to be executed.
We present an efficient algorithm for computing such patrol strategies and present experimental results using realworld ridership data from the Los Angeles Metro Rail system. The Los Angeles Sheriff’s department has begun trials
of TRUSTS.
2012_13_teamcore_matchextendedabstract.pdf Zhengyu Yin and Milind Tambe. 2012. “
A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games .” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
AbstractGiven their existing and potential real-world security applications,
Bayesian Stackelberg games have received significant research interest [3, 12, 8]. In these games, the defender acts as a leader, and
the many different follower types model the uncertainty over discrete attacker types. Unfortunately since solving such games is an
NP-hard problem, scale-up has remained a difficult challenge.
This paper scales up Bayesian Stackelberg games, providing a
novel unified approach to handling uncertainty not only over discrete follower types but also other key continuously distributed real
world uncertainty, due to the leader’s execution error, the follower’s
observation error, and continuous payoff uncertainty. To that end,
this paper provides contributions in two parts. First, we present a
new algorithm for Bayesian Stackelberg games, called HUNTER,
to scale up the number of types. HUNTER combines the following five key features: i) efficient pruning via a best-first search of
the leader’s strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Bender’s decomposition for solving the upper bound linear program efficiently;
iv) efficient inheritance of Bender’s cuts from parent to child; v)
an efficient heuristic branching rule. Our experiments show that
HUNTER provides orders of magnitude speedups over the best existing methods to handle discrete follower types. In the second part,
we show HUNTER’s efficiency for Bayesian Stackelberg games
can be exploited to also handle the continuous uncertainty using
sample average approximation. We experimentally show that our
HUNTER-based approach also outperforms latest robust solution
methods under continuously distributed uncertainty.
2012_11_teamcore_hunter-aamas12.pdf Manish Jain, Kevin Leyton-Brown, and Milind Tambe. 2012. “
Which Security Games are Hard to Solve? .” In AAAI Spring Symposium on Game Theory for Security, Sustainability and Health .
AbstractStackelberg security games form the backbone of systems
like ARMOR, IRIS and PROTECT, which are in regular use
by the Los Angeles International Police, US Federal Air Marshal Service and the US Coast Guard respectively. An understanding of the runtime required by algorithms that power
such systems is critical to furthering the application of game
theory to other real-world domains. This paper identifies
the concept of the deployment-to-saturation ratio in random
Stackelberg security games, and shows that in a decision
problem related to these games, the probability that a solution
exists exhibits a phase transition as the ratio crosses 0.5. We
demonstrate that this phase transition is invariant to changes
both in the domain and the domain representation. Moreover, problem instances at this phase transition point are computationally harder than instances with other deployment-tosaturation ratios for a wide range of different equilibrium
computation methods, including (i) previously published different MIP algorithms, and (ii) different underlying solvers
and solution mechanisms. Our findings have at least two important implications. First, it is important for new algorithms
to be evaluated on the hardest problem instances. We show
that this has often not been done in the past, and introduce a
publicly available benchmark suite to facilitate such comparisons. Second, we provide evidence that this phase transition
region is also one where optimization would be of most benefit to security agencies, and thus requires significant attention
from researchers in this area.
2012_20_teamcore_aaaiss_manish.pdf Manish Jain, Bo An, and Milind Tambe. 2012. “
An Overview of Recent Application Trends at the AAMAS conference: Security, Sustainability and Safety.” AI Magazine, 33, 3, Pp. 14-28.
AbstractA key feature of the AAMAS conference is its emphasis on ties to real-world applications. The focus of this article is to provide a broad overview of application-focused papers published at the AAMAS 2010 and 2011 conferences. More specifically, recent applications at AAMAS could be broadly categorized as belonging to research areas of security, sustainability and safety. We outline the domains of applications, key research thrusts underlying each such application area, and emerging trends.
2012_1_teamcore_aimag2012.pdf Jason Tsai, Thanh H. Nguyen, and Milind Tambe. 2012. “
Security Games for Controlling Contagion.” In Conference on Artificial Intelligence (AAAI).
AbstractMany strategic actions carry a ‘contagious’ component beyond the immediate locale of the effort itself. Viral marketing and peacekeeping operations have both been observed to have a spreading effect. In this work, we use counterinsurgency as our illustrative domain. Defined as the effort to block the spread of support for an insurgency, such operations lack the manpower to defend the entire population and must focus on the opinions of a subset of local leaders. As past researchers of security resource allocation have done, we propose using game theory to develop such policies and model the interconnected network of leaders as a graph. Unlike this past work in security games, actions in these domains possess a probabilistic, non-local impact. To address this new class of security games, we combine recent research in influence blocking maximization with a double oracle approach and create novel heuristic oracles to generate mixed strategies for a real-world leadership network from Afghanistan, synthetic leadership networks, and a real social network. We find that leadership networks that exhibit highly interconnected clusters can be solved equally well by our heuristic methods, but our more sophisticated heuristics outperform simpler ones in less interconnected social networks.
2012_27_teamcore_aaai2012_-_camerareadyv6.pdf 2011
Manish Jain, Zhengyu Yin, Milind Tambe, and Fernando Ordonez. 2011. “
Addressing Execution and Observation Error in Security Games .” In AAAI'11 Workshop on Applied Adversarial Reasoning and Risk Modeling (AARM).
AbstractAttacker-defender Stackelberg games have become a popular
game-theoretic approach for security with deployments for
LAX Police, the FAMS and the TSA. Unfortunately, most of
the existing solution approaches do not model two key uncertainties of the real-world: there may be noise in the defender’s
execution of the suggested mixed strategy and/or the observations made by an attacker can be noisy. In this paper, we analyze a framework to model these uncertainties, and demonstrate that previous strategies perform poorly in such uncertain settings. We also analyze RECON, a novel algorithm that
computes strategies for the defender that are robust to such
uncertainties, and explore heuristics that further improve RECON’s efficiency.
2011_30_teamcore_aarm11_recon.pdf Matthew Brown, Emma Bowring, Shira Epstein, Mufaddal Jhaveri, Rajiv Maheswaran, Parag Mallick, Shannon Mumenthaler, Michelle Povinelli, and Milind Tambe. 2011. “
Applying Multi-Agent Techniques to Cancer Modeling .” In Workshop on Multiagent Sequential Decision Making in Uncertain Domains(MSDM) at AAMAS 2011 .
AbstractEach year, cancer is responsible for 13% of all deaths worldwide.
In the United States, that percentage increases to 25%, translating
to an estimated 569,490 deaths in 2010 [1]. Despite significant
advances in the fight against cancer, these statistics make clear the
need for additional research into new treatments. As such, there has
been growing interest in the use of computer simulations as a tool
to aid cancer researchers. We propose an innovative multi-agent
approach in which healthy cells and cancerous cells are modeled
as opposing teams of agents using a decentralized Markov decision
process (DEC-MDP). We then describe changes made to traditional
DEC-MDP algorithms in order to better handle the complexity and
scale of our domain. We conclude by presenting and analyzing
preliminary simulation results. This paper is intended to introduce
the cancer modeling domain to the multi-agent community with
the hope of fostering a discussion about the opportunities and challenges it presents. Given the complexity of the domain, we do not
claim our approach to be a definitive solution but rather a first step
toward the larger goal of creating realistic simulations of cancer.
2011_16_teamcore_msdm2011_brown.pdf Christopher Kiekintveld, Janusz Marecki, and Milind Tambe. 2011. “
Approximation Methods for Infinite Bayesian Stackelberg Games: Modeling Distributional Payoff Uncertainty.” In International Conference on Autonomous Agents and Multiagent Systems.
AbstractGame theory is fast becoming a vital tool for reasoning about complex real-world security problems, including critical infrastructure
protection. The game models for these applications are constructed
using expert analysis and historical data to estimate the values of
key parameters, including the preferences and capabilities of terrorists. In many cases, it would be natural to represent uncertainty over
these parameters using continuous distributions (such as uniform
intervals or Gaussians). However, existing solution algorithms are
limited to considering a small, finite number of possible attacker
types with different payoffs. We introduce a general model of infinite Bayesian Stackelberg security games that allows payoffs to be
represented using continuous payoff distributions. We then develop
several techniques for finding approximate solutions for this class
of games, and show empirically that our methods offer dramatic
improvements over the current state of the art, providing new ways
to improve the robustness of security game models.
2011_9_teamcore_aamas11_kiekintveld.pdf Zhengyu Yin and Milind Tambe. 2011. “
Continuous Time Planning for Multiagent Teams with Temporal Constraints .” In International Joint Conference on Artificial Intelligence (IJCAI).
AbstractContinuous state DEC-MDPs are critical for agent
teams in domains involving resources such as time,
but scaling them up is a significant challenge. To
meet this challenge, we first introduce a novel
continuous-time DEC-MDP model that exploits
transition independence in domains with temporal
constraints. More importantly, we present a new locally optimal algorithm called SPAC. Compared to
the best previous algorithm, SPAC finds solutions
of comparable quality substantially faster; SPAC
also scales to larger teams of agents.
2011_24_teamcore_ijcai11_mctmdp.pdf Matthew E. Taylor, Manish Jain, Prateek Tandon, Milind Tambe, and Makoto Yokoo. 2011. “
Distributed On-line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation .” In Advances in Complex Systems.
AbstractA significant body of work exists on effectively allowing multiple agents to coordinate to achieve a shared goal. In particular, a growing body of work in the Distributed
Constraint Optimization (DCOP) framework enables such coordination with different
amounts of teamwork. Such algorithms can implicitly or explicitly trade-off improved
solution quality with increased communication and computation requirements. However, the DCOP framework is limited to planning problems; DCOP agents must have
complete and accurate knowledge about the reward function at plan time.
We extend the DCOP framework, defining the Distributed Coordination of Exploration and Exploitation (DCEE) problem class to address real-world problems, such as
ad-hoc wireless network optimization, via multiple novel algorithms. DCEE algorithms
differ from DCOP algorithms in that they (1) are limited to a finite number of actions
in a single trial, (2) attempt to maximize the on-line, rather than final, reward, (3) are
unable to exhaustively explore all possible actions, and (4) may have knowledge about
the distribution of rewards in the environment, but not the rewards themselves. Thus, a
DCEE problem is not a type of planning problem, as DCEE algorithms must carefully
balance and coordinate multiple agents’ exploration and exploitation.
Two classes of algorithms are introduced: static estimation algorithms perform simple calculations that allow agents to either stay or explore, and balanced exploration
algorithms use knowledge about the distribution of the rewards and the time remaining
in an experiment to decide whether to stay, explore, or (in some algorithms) backtrack to
a previous location. These two classes of DCEE algorithms are compared in simulation
and on physical robots in a complex mobile ad-hoc wireless network setting. Contrary to our expectations, we found that increasing teamwork in DCEE algorithms may lower
team performance. In contrast, agents running DCOP algorithms improve their reward
as teamwork increases. We term this previously unknown phenomenon the team uncertainty penalty, analyze it in both simulation and on robots, and present techniques to
ameliorate the penalty.
2011_20_teamcore_11acs_taylor_revision.pdf