Distributed Constraint Optimization (DCOP) is rapidly
emerging as a prominent technique for multiagent coordination. However, despite agent privacy being a key motivation
for applying DCOPs in many applications, rigorous quantitative evaluations of privacy loss in DCOP algorithms have
been lacking. Recently, [Maheswaran et al. 2005] introduced
a framework for quantitative evaluations of privacy in DCOP
algorithms, showing that some DCOP algorithms lose more
privacy than purely centralized approaches and questioning
the motivation for applying DCOPs. This paper addresses
the question of whether state-of-the-art DCOP algorithms suffer from a similar shortcoming by investigating several of the
most efficient DCOP algorithms, including both DPOP and
ADOPT. Furthermore, while previous work investigated the
impact on efficiency of distributed constraint reasoning design decisions (e.g. constraint-graph topology, asynchrony, message contents), this paper examines the privacy aspect of such decisions, providing an improved understanding of their privacy-efficiency tradeoffs.
In the March 1942 issue of “Astounding Science Fiction”,
Isaac Asimov for the first time enumerated his three laws of robotics.
Decades later, researchers in agents and multiagent systems have begun to examine these laws as a means of providing a useful set of guarantees on
deployed agent systems. Motivated by unexpected failures or behavior
degradations in complex mixed agent-human teams, this paper for the
first time focuses on applying Asimov’s first two laws to provide behavioral guarantees in such teams. However, operationalizing these laws
in the context of such mixed agent-human teams raises three novel issues. First, while the laws were originally written for the interaction of an individual robot with an individual human, our systems clearly must operate in a team context. Second, key notions in these laws (e.g. causing “harm” to humans) are specified in very abstract terms and must be made concrete in implemented systems. Third, outside of science fiction, agents and humans may not have perfect information about the world, yet they must act on these laws despite this uncertainty of information. Addressing this uncertainty is a key thrust of this paper,
and we illustrate that agents must detect and overcome such states of
uncertainty while ensuring adherence to Asimov’s laws. We present results from two different domains, each of which takes a different approach to operationalizing Asimov’s laws.
Software personal assistants continue to be a topic of significant research interest. This paper outlines some of the important lessons learned from a successfully deployed team of
personal assistant agents (Electric Elves) in an office environment. These lessons have important implications for similar
on-going research projects.
The Electric Elves project was a team of almost a dozen
personal assistant agents which were continually active for
seven months. Each elf (agent) represented one person and
assisted in daily activities in an actual office environment.
This project led to several important observations about privacy, adjustable autonomy, and social norms in office environments. This paper outlines some of the key lessons learned
and, more importantly, outlines our continued research to address some of the concerns raised.
Distributed Constraint Optimization (DCOP) is rapidly emerging
as a prominent technique for multiagent coordination. Unfortunately, rigorous quantitative evaluations of privacy loss in DCOP
algorithms have been lacking despite the fact that agent privacy is
a key motivation for applying DCOPs in many applications. Recently, Maheswaran et al. [3, 4] introduced a framework for quantitative evaluations of privacy in DCOP algorithms, showing that
early DCOP algorithms lose more privacy than purely centralized
approaches and questioning the motivation for applying DCOPs.
Do state-of-the-art DCOP algorithms suffer from a similar shortcoming? This paper answers that question by investigating the most
efficient DCOP algorithms, including both DPOP and ADOPT.
In many real-world multiagent applications such as distributed sensor nets, a network of agents is formed based
on each agent’s limited interactions with a small number
of neighbors. While distributed POMDPs capture the
real-world uncertainty in multiagent domains, they fail
to exploit such locality of interaction. Distributed constraint optimization (DCOP) captures the locality of interaction but fails to capture planning under uncertainty.
In previous work, we presented a model synthesized
from distributed POMDPs and DCOPs, called Networked Distributed POMDPs (ND-POMDPs). Also,
we presented LID-JESP (locally interacting distributed joint equilibrium-based search for policies), a distributed policy generation algorithm based on DBA (the distributed breakout algorithm). In this paper, we present a stochastic variation of LID-JESP based on DSA (the distributed stochastic algorithm) that allows neighboring agents to change their policies in the same cycle.
Through detailed experiments, we show how this can
result in speedups without a large difference in solution
quality. We also introduce a technique called hyper-link-based decomposition that allows us to exploit locality of
interaction further, resulting in faster run times for both
LID-JESP and its stochastic variant without any loss in solution quality.
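For intuition, here is a minimal sketch of the DSA skeleton that the stochastic variant builds on; the names and the generic best_response hook are illustrative assumptions, not the paper's policy-space instantiation:

```python
import random

# Sketch of DSA (distributed stochastic algorithm): in each cycle every
# agent computes a best response to its neighbors' current choices and
# adopts it only with activation probability p, which is what lets
# neighboring agents change in the same cycle without constant thrashing.
def dsa(agents, neighbors, best_response, cycles=100, p=0.5):
    """agents: dict agent -> current choice
    neighbors: dict agent -> list of neighboring agents
    best_response: fn(agent, {neighbor: choice}) -> improved choice"""
    for _ in range(cycles):
        snapshot = dict(agents)  # all agents read the same round's choices
        for a in agents:
            candidate = best_response(a, {n: snapshot[n] for n in neighbors[a]})
            if candidate != snapshot[a] and random.random() < p:
                agents[a] = candidate  # stochastic activation
    return agents
```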
Agents or agent teams deployed to assist humans often face the challenges of monitoring the state of key processes in their environment (including the state of their human users themselves) and making periodic decisions based on such monitoring. POMDPs appear well suited to enable agents to address these challenges, given the uncertain environment and cost of actions, but optimal policy generation for POMDPs is computationally expensive. This paper introduces two key implementation techniques (one exact and one approximate), where the policy computation is restricted to the belief space polytope that remains reachable given the progress structure of a domain. One technique uses Lagrangian methods to compute tighter bounds on belief space support in polynomial time, while the other technique is based on approximating policy vectors in dense policy regions of the bounded belief polytope. We illustrate this by enhancing two of the fastest existing algorithms for exact POMDP policy generation. The order of magnitude speedups demonstrate the utility of our implementation techniques in facilitating the deployment of POMDPs within agents assisting human users.
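The restriction to a reachable belief polytope rests on the standard POMDP belief update and piecewise-linear value function (textbook material; the notation below is assumed here, not taken from the paper):

\[
b'_{a,o}(s') \;=\; \frac{O(s',a,o)\,\sum_{s} T(s,a,s')\,b(s)}{\Pr(o \mid b,a)},
\qquad
V(b) \;=\; \max_{\alpha \in \Gamma}\, \alpha \cdot b .
\]

Because every reachable belief arises from such updates applied to the domain's start region, policy vectors need only dominate within that polytope rather than over the full belief simplex.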
Distributed constraint optimization (DCOP) has emerged as
a useful technique for multiagent planning and scheduling.
While previous DCOP work focuses on optimizing a single
team objective, in many domains, agents must satisfy additional constraints on resources consumed locally (due to interactions within their local neighborhoods). Such local resource constraints may be required to be private or shared
for efficiency’s sake. This paper provides a novel multiply-constrained DCOP algorithm for addressing these domains.
This algorithm is based on mutually-intervening search, i.e.
using local resource constraints to intervene in the search for
the optimal solution and vice versa, realized via three key
ideas: (i) transforming n-ary constraints via virtual variables
to maintain privacy; (ii) dynamically setting upper bounds on
joint resource consumption with neighbors; and (iii) identifying if the local DCOP graph structure allows agents to compute exact resource bounds for additional efficiency. These
ideas are implemented by modifying Adopt, one of the most
efficient DCOP algorithms. Both detailed experimental results and proofs of correctness are presented.
Distributed constraint optimization (DCOP) has emerged as a useful technique for multiagent coordination. While previous DCOP
work focuses on optimizing a single team objective, in many domains, agents must satisfy additional constraints on resources consumed locally (due to interactions within their local neighborhoods).
Such resource constraints may be required to be private or shared
for efficiency’s sake. This paper provides a novel multiply-constrained
DCOP algorithm for addressing these domains which is based on
mutually-intervening search, i.e. using local resource constraints
to intervene in the search for the optimal solution and vice versa.
It is realized through three key ideas: (i) transforming n-ary constraints to maintain privacy; (ii) dynamically setting upper bounds
on joint resource consumption with neighbors; and (iii) identifying if the local DCOP graph structure allows agents to compute
exact resource bounds for additional efficiency. These ideas are
implemented by modifying Adopt, one of the most efficient DCOP
algorithms. Both detailed experimental results and proofs of correctness are presented.
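As a rough illustration of idea (ii) above, the following hedged sketch shows an agent pruning candidate values against a private resource budget, given what its neighbors have already committed; the names and structure are assumptions, not the actual Adopt modification:

```python
# Toy version of dynamically bounding joint resource consumption: a value
# survives only if it fits the budget left after neighbors' commitments.
def feasible_values(values, cost, budget, neighbor_commitments):
    """values: candidate values for this agent's variable
    cost: fn(value) -> local resource consumed by choosing that value
    budget: this agent's private resource capacity
    neighbor_commitments: dict neighbor -> resources already claimed"""
    remaining = budget - sum(neighbor_commitments.values())
    return [v for v in values if cost(v) <= remaining]
```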
Praveen Paruchuri, Emma Bowring, Ranjit Nair, Jonathan P. Pearce, Nathan Schurr, Milind Tambe, and Pradeep Varakantham. 2006. “Multiagent Teamwork: Hybrid Approaches.” In Computer Society of India Communications (Invited).
Today within the multiagent community, we see at least four
competing methods for building multiagent systems: belief-desire-intention (BDI), distributed constraint optimization
(DCOP), distributed POMDPs, and auctions or game-theoretic methods. While there is exciting progress within each
approach, there is a lack of cross-cutting research. This article highlights the various hybrid techniques for multiagent
teamwork developed by the TEAMCORE group. In particular,
for the past decade, the TEAMCORE research group has
focused on building agent teams in complex, dynamic domains. While our early work was inspired by BDI, we will
present an overview of recent research that uses DCOPs and
distributed POMDPs in building agent teams. While DCOP
and distributed POMDP algorithms provide promising results, hybrid approaches allow us to use the complementary
strengths of different techniques to create algorithms that
perform better than either of their component algorithms
alone. For example, in the BDI-POMDP hybrid approach,
BDI team plans are exploited to improve POMDP tractability, and POMDPs improve BDI team plan performance.
It is critical that agents deployed in real-world settings, such as businesses, offices, universities and research laboratories, protect their individual users’ privacy when interacting
with other entities. Indeed, privacy is recognized as a key motivating factor in the design
of several multiagent algorithms, such as in distributed constraint reasoning (including both
algorithms for distributed constraint optimization (DCOP) and distributed constraint satisfaction (DisCSPs)), and researchers have begun to propose metrics for analysis of privacy loss
in such multiagent algorithms. Unfortunately, a general quantitative framework to compare
these existing metrics for privacy loss or to identify dimensions along which to construct new
metrics is currently lacking.
This paper presents three key contributions to address this shortcoming. First, the paper
presents VPS (Valuations of Possible States), a general quantitative framework to express,
analyze and compare existing metrics of privacy loss. Based on a state-space model, VPS
is shown to capture various existing measures of privacy created for specific domains of
DisCSPs. The utility of VPS is further illustrated through analysis of privacy loss in DCOP
algorithms, when such algorithms are used by personal assistant agents to schedule meetings
among users. In addition, VPS helps identify dimensions along which to classify and construct
new privacy metrics, and it also supports their quantitative comparison. Second, the paper
presents key inference rules that may be used in analysis of privacy loss in DCOP algorithms
under different assumptions. Third, detailed experiments based on the VPS-driven analysis
lead to the following key results: (i) decentralization by itself does not provide superior protection of privacy in DisCSP/DCOP algorithms when compared with centralization; instead,
privacy protection also requires the presence of uncertainty about agents’ knowledge of the
constraint graph; and (ii) one needs to carefully examine the metric chosen to measure privacy
loss; the qualitative properties of privacy loss and hence the conclusions that can be drawn
about an algorithm can vary widely based on the metric chosen. This paper should thus serve
as a call to arms for further privacy research, particularly within the DisCSP/DCOP arena.
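To make the VPS idea concrete, here is a minimal sketch in which privacy loss is the drop in the value an observer assigns to the set of private states it still cannot rule out; the uniform valuation is only one illustrative instantiation among the many the framework admits:

```python
# Sketch of a VPS-style privacy-loss measure over possible private states.
def vps_privacy_loss(all_states, consistent_states, valuation=None):
    """all_states: states an observer initially considers possible
    consistent_states: states still consistent after the algorithm runs
    valuation: fn(state set) -> value of that residual uncertainty"""
    if valuation is None:
        # assumed uniform valuation: fraction of initial states surviving
        valuation = lambda states: len(states) / len(set(all_states))
    return valuation(set(all_states)) - valuation(set(consistent_states))

# e.g. a scheduling agent whose free time slot was narrowed from 8 candidates
# to 2 by the messages it exchanged:
print(vps_privacy_loss(range(8), [2, 5]))  # 0.75
```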
Security in multiagent systems is commonly defined as the ability of the system to deal with intentional threats from other agents.
This paper focuses on domains where such intentional threats are
caused by unseen adversaries whose actions or payoffs are unknown. In such domains, action randomization can effectively degrade an adversary’s ability to predict and exploit an agent’s or agent team’s actions. Unfortunately, little attention has been paid to intentionally randomizing agents’ policies in single-agent or decentralized (PO)MDPs in a way that avoids significantly sacrificing rewards or breaking down coordination. This paper provides two key contributions to remedy this situation. First, it provides three novel
algorithms, one based on a non-linear program and two based on
linear programs (LP), to randomize single-agent policies, while attaining a certain level of expected reward. Second, it provides
Rolling Down Randomization (RDR), a new algorithm that efficiently generates randomized policies for decentralized POMDPs
via the single-agent LP method.
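As a toy, single-state illustration of the trade-off these programs optimize (not the paper's actual formulations over full single-agent or decentralized policies): among action distributions whose expected reward meets a floor, the maximum-entropy one lies in the Gibbs family, so bisecting on the inverse temperature suffices:

```python
import math

# Maximum-entropy randomization of one decision subject to a reward floor.
# pi(a) ~ exp(beta * q(a)); expected reward grows with beta while entropy
# falls, so the smallest feasible beta is the most-random acceptable policy.
def randomized_policy(q, reward_floor, iters=60):
    """q: list of action rewards; reward_floor: required expected reward."""
    def gibbs(beta):
        w = [math.exp(beta * qi) for qi in q]
        z = sum(w)
        return [wi / z for wi in w]

    def expected(pi):
        return sum(p * qi for p, qi in zip(pi, q))

    lo, hi = 0.0, 1.0
    while expected(gibbs(hi)) < reward_floor:  # expand until feasible
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("reward_floor needs a near-deterministic policy")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if expected(gibbs(mid)) < reward_floor else (lo, mid)
    return gibbs(hi)

print(randomized_policy([1.0, 0.8, 0.2], reward_floor=0.85))
```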
A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within
a team of agents, each of whom is choosing an individual action.
When rapidly selecting a single joint action for a team, we typically
solve DCOPs (often using locally optimal algorithms) to generate
a single solution. However, in scenarios where a set of joint actions
(i.e. a set of assignments to a DCOP) is to be generated, metrics are
needed to help appropriately select this set and efficiently allocate
resources for the joint actions in the set. To address this need, we
introduce k-optimality, a metric that captures the desirable properties of diversity and relative quality of a set of locally-optimal
solutions using a parameter that can be tuned based on the level of
these properties required. To achieve effective resource allocation
for this set, we introduce several upper bounds on the cardinalities of k-optimal joint action sets. These bounds are computable in
constant time if we ignore the graph structure, but tighter, graph-based bounds are feasible with higher computation cost. Bounds
help choose the appropriate level of k-optimality for settings with
fixed resources and help determine appropriate resource allocation
for settings where a fixed level of k-optimality is desired. In addition, our bounds for a 1-optimal joint action set for a DCOP also
apply to the number of pure-strategy Nash equilibria in a graphical
game of noncooperative agents.
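For concreteness, the k-optimality definition itself can be checked by brute force on tiny DCOPs; the function and argument names below are illustrative only:

```python
from itertools import combinations, product

# An assignment is k-optimal if no group of at most k agents can jointly
# switch values and strictly raise the total team reward. Exhaustive, so
# intended purely to pin down the definition, not for real problems.
def is_k_optimal(assign, domains, reward, k):
    """assign: dict var -> value; domains: dict var -> iterable of values
    reward: fn(assignment dict) -> total team reward"""
    base = reward(assign)
    for size in range(1, k + 1):
        for group in combinations(assign, size):
            for values in product(*(domains[v] for v in group)):
                deviant = dict(assign)
                deviant.update(zip(group, values))
                if reward(deviant) > base:
                    return False  # a <=k-agent deviation improves reward
    return True
```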
The DEFACTO system is a multiagent-based tool for training incident commanders for large-scale disasters. In this paper, we highlight some of the lessons that we have learned from our interaction
with the Los Angeles Fire Department (LAFD) and how they have
affected the way that we continued the design of our training system. These lessons were gleaned from LAFD feedback and initial
training exercises, and they concern system design, visualization, improving trainee situational awareness, and adjusting the training level of difficulty and situation scale. We have taken these lessons and used
them to improve the DEFACTO system’s training capabilities. We
have conducted initial training exercises to illustrate the utility of
the system in terms of providing useful feedback to the trainee.
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are emerging as a popular approach for modeling
multiagent systems, and many different algorithms have been proposed to obtain locally or globally optimal policies. Unfortunately,
most of these algorithms have either been explicitly designed or
experimentally evaluated assuming knowledge of a starting belief
point, an assumption that often does not hold in complex, uncertain domains. Instead, in such domains, it is important for agents
to explicitly plan over continuous belief spaces. This paper provides a novel algorithm to explicitly compute finite horizon policies
over continuous belief spaces, without restricting the space of policies. By marrying an efficient single-agent POMDP solver with a
heuristic distributed POMDP policy-generation algorithm, locally
optimal joint policies are obtained, each of which dominates within
a different part of the belief region. We provide heuristics that significantly improve the efficiency of the resulting algorithm and provide detailed experimental results. To the best of our knowledge,
these are the first run-time results for analytically generating policies over continuous belief spaces in distributed POMDPs.
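Since each candidate joint policy's value is linear in the belief, the selection step amounts to keeping, at any belief point, the policy whose value vector dominates there; a minimal sketch with illustrative names:

```python
# Pick the locally optimal joint policy that dominates at a belief point.
def dominant_policy(belief, value_vectors):
    """belief: list of state probabilities summing to one
    value_vectors: dict policy_name -> per-state expected values"""
    def value(vec):
        return sum(b * v for b, v in zip(belief, vec))
    return max(value_vectors, key=lambda name: value(value_vectors[name]))

# e.g. two candidate joint policies over a two-state belief space:
print(dominant_policy([0.3, 0.7], {"pi1": [5.0, 1.0], "pi2": [2.0, 4.0]}))  # pi2
```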