Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be
characterized by probability distributions. While Markov Decision Processes (MDPs) with a state space of continuous and
discrete variables are popular for modeling these domains,
current algorithms for such MDPs can exhibit poor performance with a scale-up in their state space. To remedy that
we propose an algorithm called DPFP. DPFP’s key contribution is its exploitation of the dual space cumulative distribution functions. This dual formulation is key to DPFP’s novel
combination of three features. First, it enables DPFP’s membership in a class of algorithms that perform forward search
in a large (possibly infinite) policy space. Second, it provides
a new and efficient approach for varying the policy generation effort based on the likelihood of reaching different regions of the MDP state space. Third, it yields a bound on
the error produced by such approximations. These three features conspire to allow DPFP’s superior performance and systematic trade-off of optimality for speed. Our experimental
evaluation shows that, when run stand-alone, DPFP outperforms other algorithms in terms of its any-time performance,
whereas when run as a hybrid, it allows for a significant
speedup of a leading continuous resource MDP solver.
Many factors are blamed for the decreasing enrollments in
computer science and engineering programs in the U.S., including the dot-com economic bust and the increase in the use
of “off-shore” programming labor. One major factor is also
the lack of bold new vision and excitement about computer
science, which thus results in a view of computer science as
a field wedded to routine programming. To address this concern, we have focused on science fiction as a means to generate excitement about Artificial Intelligence, and thus in turn
in Computer Science and Engineering. In particular, since
the Fall of 2006, we have used science fiction in teaching
Artificial Intelligence to undergraduate students at the University of Southern California (USC), in teaching activities
ranging from an undergraduate upper division class in computer science to a semester-long freshman seminar for non-engineering students to micro-seminars during the welcome
week. As an interdisciplinary team of scholar/instructors, our
goal has been to use science fiction not only in motivating
students to learn about AI, but also to use science fiction in
understanding fundamental issues that arise at the intersection of technology and culture, as well as to provide students
with a more creative and well-rounded course that provides
a big picture view of computer science. This paper outlines
the courses taught using this theme, provides an overview of
our classroom teaching techniques in using science fiction,
and discusses some of the lectures in more detail as exemplars. We conclude with feedback received, lessons learned
and impact on both the computer science students and non-computer-science (and non-engineering) students.
Distributed constraint optimization (DCOP) is a useful framework for cooperative multiagent coordination. DCOP focuses on optimizing a single team objective. However, in many domains, agents must satisfy constraints on resources consumed locally while optimizing the team goal. These resource constraints may need to be kept private or shared to improve efficiency. Extending DCOP to these domains raises two issues: algorithm design and sensitivity analysis. Algorithm design requires creating algorithms that trade off completeness, scalability, privacy and efficiency. Sensitivity analysis examines whether slightly increasing the available resources could yield a significantly better outcome. This thesis defines the multiply-constrained DCOP (MC-DCOP) framework and provides complete and incomplete algorithms for solving MC-DCOP problems. Complete algorithms find the best allocation of scarce resources, while incomplete algorithms are more scalable. The algorithms use mutually-intervening search; they use local resource constraints to intervene in the search for the globally optimal solution. The algorithms use four key techniques: (i) transforming constraints to maintain privacy; (ii) dynamically setting upper bounds on resource consumption; (iii) identifying the extent to which the local graph structure allows agents to compute exact bounds on resource consumption; and (iv) using a virtual assignment to flag problems rendered unsatisfiable by their resource constraints. Proofs of correctness are presented for all algorithms. Finally, the complete and incomplete algorithms are used in conjunction with one another to perform distributed local reoptimization to address sensitivity analysis. Experimental results demonstrated that MC-DCOP problems are most challenging when resources are scarce but sufficient. In problems where there are insufficient resources, the team goal is largely irrelevant. 
In problems with ample resources, the local resource constraints require little consideration. The incomplete algorithms were two orders of magnitude more efficient than the complete algorithm for the most challenging MC-DCOP problems and their runtime increased very little as the number of agents in the network increased. Finally, sensitivity analysis results indicated that local reoptimization is an effective way to identify resource constraints that are creating bottlenecks. Taken together these new algorithms and examination of the problem of sensitivity analysis help extend the applicability of DCOP to more complex domains.
Despite significant recent advances in decision theoretic frameworks for reasoning about multiagent teams, little attention has been paid to applying such frameworks in adversarial domains, where the agent team may face security threats from other agents. This paper focuses on domains where such threats are caused by unseen adversaries whose actions or payoffs are unknown. In such domains, action randomization is recognized as a key technique to degrade an adversary’s capability to predict and exploit the actions of an agent or agent team. Unfortunately, there are two key challenges in such randomization. First, randomization can reduce the expected reward (quality) of the agent team’s plans, and thus we must provide some guarantees on such rewards. Second, randomization results in miscoordination in teams. While communication within an agent team can help in alleviating the miscoordination problem, communication is unavailable or only scarcely available in many real domains. To address these challenges, this paper provides the following contributions. First, we recall the Multiagent Constrained MDP (MCMDP) framework that enables policy generation for a team of agents where each agent may have limited or no (communication) resources. Second, since randomized policies generated directly for MCMDPs lead to miscoordination, we introduce a transformation algorithm that converts the MCMDP into a transformed MCMDP incorporating explicit communication and no-communication actions. Third, we show that incorporating randomization results in a non-linear program and that the unavailability or limited availability of communication results in the addition of non-convex constraints to the non-linear program. Finally, we experimentally illustrate the benefits of our work.
Personal Assistant agents are becoming increasingly important in a variety of application domains in offices, at home, for medical care and many others [5, 1]. These agents are required to constantly monitor their environment (including the state of their users), and make periodic decisions based on their monitoring. For example, in an office environment, agents may need to monitor the location of their user in order to ascertain whether the user would be able to make it on time to a meeting. Or, they may be required to monitor the progress of a user on a particular assignment and decide whether or not the user would be able to meet the deadline for completing the assignment. Teamwork between such agents is important in Personal Assistant applications to enable agents working together to achieve a common goal (such as finishing a project on time). This working demonstration shows a hybrid (BDI-POMDP) approach to accomplish such teamwork. Agents must be able to make decisions despite observational uncertainty in the environment. For example, if the user is busy and does not respond to a request from their personal assistant agent, the agent loses track of the user’s progress and hence cannot determine it with certainty. Also, an incorrect action on the agent’s part can have undesirable consequences. For example, an agent might reallocate a task again and again even if there is sufficient progress on the task. In the past, teamwork among Personal Assistant agents typically has not addressed such observational uncertainty. Markov Decision Processes have been used to model the agent’s environment, with simplifying assumptions regarding either observational uncertainty in the environment or the agent’s observational abilities.
Partially Observable Markov Decision Processes (POMDPs) are equipped to deal with the inherent uncertainty in Personal Assistant domains. Computational complexity has been a major hurdle in deploying POMDPs in real-world application domains, but the recent emergence of new exact and approximate techniques shows much promise in being able to compute a POMDP policy for an agent in real time. In this demonstration, we actually deploy POMDPs to compute the Adjustable Autonomy policy on which the agent bases its decisions. Integrating such POMDPs with architectures that enable teamwork among personal assistants is then the next key part of our demonstration. Several teamwork models have been developed over the past few years to handle communication and coordination between agents. Machinetta is a proxy-based integration architecture for coordinating teams of heterogeneous entities (e.g., robots, agents, persons), which builds on the STEAM teamwork model. Machinetta is designed to meet key challenges such as effective utilization of diverse capabilities of group members, improving coordination between agents by overcoming challenges posed by the environment and reacting to changes in the environment in a flexible manner. We use Machinetta proxies to coordinate the agents in our demonstration. Machinetta enables integrating POMDPs and also enables interfacing with BDI architectures that may provide us team plans. In particular, we interface with the SPARK agent framework being developed at the Artificial Intelligence Center of SRI International. SPARK is a Belief-Desire-Intention (BDI) style agent framework grounded in a model of procedural reasoning. This architecture allows the development of active systems that interact with a constantly changing and unpredictable world.
By using BDI-based approaches for generating team plans for agents as well as communication and coordination, and POMDPs for adjustable autonomy decision making, we arrive at a hybrid model for multiagent teamwork in Personal Assistant applications. The following sections describe the application domain in which we deploy this hybrid system as well as the interaction between various components of the system, and its working.
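To make the role of observational uncertainty concrete, the following is a minimal sketch of the standard Bayesian belief update that underlies any POMDP, applied to a toy version of the "user does not respond" situation. The state names, probabilities, and the function `belief_update` are illustrative assumptions and are not taken from the demonstrated system; the sketch also assumes a single fixed action, so the transition model has no action index.

```python
def belief_update(belief, transition, obs_prob, observation):
    """Standard POMDP belief update for one fixed action.

    belief[s]          : current probability of state s
    transition[s][s2]  : probability of moving from s to s2
    obs_prob[s2][o]    : probability of observing o in state s2
    """
    # Prediction step: push the belief through the transition model.
    predicted = {
        nxt: sum(belief[cur] * transition[cur][nxt] for cur in belief)
        for nxt in belief
    }
    # Correction step: weight each state by the observation likelihood.
    unnormalized = {s: obs_prob[s][observation] * predicted[s] for s in belief}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: w / total for s, w in unnormalized.items()}


# Hypothetical two-state model of a user's progress on a task.
TRANSITION = {
    "on_track": {"on_track": 0.9, "behind": 0.1},
    "behind": {"on_track": 0.0, "behind": 1.0},
}
OBS_PROB = {
    "on_track": {"response": 0.8, "no_response": 0.2},
    "behind": {"response": 0.4, "no_response": 0.6},
}

prior = {"on_track": 0.5, "behind": 0.5}
posterior = belief_update(prior, TRANSITION, OBS_PROB, "no_response")
print(posterior)
```

Under these assumed numbers, a missing response shifts probability mass toward the "behind" state without making the agent certain, which is the kind of graded evidence an adjustable autonomy policy can act on.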
In adversarial multiagent domains, security, commonly defined as the ability to deal with intentional threats from other
agents, is a critical issue. This paper focuses on domains
where these threats come from unknown adversaries. These
domains can be modeled as Bayesian games; much work has
been done on finding equilibria for such games. However,
it is often the case in multiagent security domains that one
agent can commit to a mixed strategy which its adversaries
observe before choosing their own strategies. In this case, the
agent can maximize reward by finding an optimal strategy,
without requiring equilibrium. Previous work has shown this
problem of optimal strategy selection to be NP-hard. Therefore, we present a heuristic called ASAP, with three key advantages to address the problem. First, ASAP searches for
the highest-reward strategy, rather than a Bayes-Nash equilibrium, allowing it to find feasible strategies that exploit the
natural first-mover advantage of the game. Second, it provides strategies which are simple to understand, represent,
and implement. Third, it operates directly on the compact,
Bayesian game representation, without requiring conversion
to normal form. We provide an efficient Mixed Integer Linear
Program (MILP) implementation for ASAP, along with experimental results illustrating significant speedups and higher
rewards over other approaches.
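The first-mover advantage described above can be illustrated on a small example. The sketch below is not ASAP and does not use a MILP; it is a brute-force grid search over the leader's mixed strategies in a hypothetical 2x2 game with a single adversary type, assuming the follower observes the commitment, best-responds, and breaks ties in the leader's favour. The payoff tables and function names are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical payoffs, indexed [leader_action][follower_action].
LEADER = [[2, 4], [1, 3]]
FOLLOWER = [[1, 0], [0, 2]]


def follower_best_response(p):
    """Best pure follower action when the leader plays action 0 with probability p.

    Ties in the follower's payoff are broken in the leader's favour.
    """
    mix = [p, 1 - p]

    def follower_payoff(j):
        return sum(mix[i] * FOLLOWER[i][j] for i in range(2))

    def leader_payoff(j):
        return sum(mix[i] * LEADER[i][j] for i in range(2))

    return max(range(2), key=lambda j: (follower_payoff(j), leader_payoff(j)))


def leader_value(p):
    """Leader's expected reward after committing to p and being best-responded to."""
    j = follower_best_response(p)
    return p * LEADER[0][j] + (1 - p) * LEADER[1][j]


def best_commitment(steps=300):
    """Exhaustively search a grid of mixed strategies using exact fractions."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    return max(grid, key=leader_value)


p_star = best_commitment()
print(p_star, leader_value(p_star))
```

In this toy game the leader's action 0 is strictly dominant, so without commitment the leader would play it and the follower would answer with its action 0. Committing to a mixed strategy instead lets the leader steer the follower toward the response the leader prefers, which is why the optimal commitment need not be an equilibrium strategy.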
Software personal assistants continue to be a topic of significant research interest. This paper outlines some of the important lessons learned from a successfully-deployed team of
personal assistant agents (Electric Elves) in an office environment. In the Electric Elves project, a team of almost a
dozen personal assistant agents were continually active for
seven months. Each elf (agent) represented one person and
assisted in daily activities in an actual office environment.
This project led to several important observations about privacy, adjustable autonomy, and social norms in office environments. This paper outlines some of the key lessons learned
and, more importantly, outlines our continued research to address some of the concerns raised.
Agents often have to construct plans that obey deadlines
or, more generally, resource limits for real-valued resources whose consumption can only be characterized
by probability distributions, such as execution time or
battery power. These planning problems can be modeled with continuous state Markov decision processes
(MDPs) but existing solution methods are either inefficient or provide no guarantee on the quality of the resulting policy. We therefore present CPH, a novel solution method that solves the planning problems by first
approximating with any desired accuracy the probability
distributions over the resource consumptions with phase-type distributions, which use exponential distributions as
building blocks. It then uses value iteration to solve the
resulting MDPs by exploiting properties of exponential
distributions to calculate the necessary convolutions accurately and efficiently while providing strong guarantees on the quality of the resulting policy. Our experimental feasibility study in a Mars rover domain demonstrates a substantial speedup over Lazy Approximation,
which is currently the leading algorithm for solving continuous state MDPs with quality guarantees.
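One reason exponential building blocks are convenient can be shown with a small, well-known special case. The sketch below is not CPH itself: it only assumes that each of n consecutive actions takes an independent Exponential(rate) amount of time, in which case the total duration follows an Erlang distribution (a simple phase-type distribution) whose cumulative distribution function has a closed form, so no numerical convolution is needed. The function name and the example numbers are illustrative assumptions.

```python
import math


def erlang_cdf(n, rate, t):
    """P(T_1 + ... + T_n <= t) for i.i.d. T_i ~ Exponential(rate).

    Uses the closed form 1 - exp(-rate*t) * sum_{k<n} (rate*t)^k / k!.
    """
    if t <= 0:
        return 0.0
    x = rate * t
    partial = sum(x ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-x) * partial


# Hypothetical question: with three actions, each lasting 2 time units on
# average (rate 0.5), how likely is it that all finish within 10 time units?
on_time = erlang_cdf(3, 0.5, 10.0)
print(on_time)
```

The same closed form answers "what is the probability of meeting the deadline from here" for any remaining number of exponential phases, which is the kind of quantity a value-iteration backup over a time resource needs.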
We describe an intelligent personal assistant that has been developed to aid a busy knowledge worker in managing time commitments and performing tasks. The design of the system was motivated by the complementary objectives of (a) relieving the user of routine tasks, thus allowing her to focus on tasks that critically require human problem-solving skills, and (b) intervening in situations where cognitive overload leads to oversights or mistakes by the user. The system draws on a diverse set of AI technologies that are linked within a Belief-Desire-Intention agent system. Although the system provides a number of automated functions, the overall framework is highly user-centric in its support for human needs, responsiveness to human inputs, and adaptivity to user working style and preferences.
Recent advances in the field of agent/multiagent systems bring us closer to agents acting in real world domains, which can be uncertain and many times adversarial. Security, commonly defined as the ability to deal with intentional threats from other agents, is a major challenge for agents or agent-teams deployed in these adversarial domains. Such adversarial scenarios arise in a wide variety of situations that are becoming increasingly important, such as agents patrolling to provide perimeter security around critical infrastructure or performing routine security checks. These domains have the following characteristics: (a) The agent or agent-team needs to commit to a security policy while the adversaries may observe and exploit the policy committed to. (b) The agent/agent-team potentially faces different types of adversaries and has varying information available about the adversaries (thus limiting the agent’s ability to model its adversaries). To address security in such domains, I developed two types of algorithms. First, when the agent has no model of its adversaries, my key idea is to randomize the agent’s policies to minimize the information gained by adversaries. To that end, I developed algorithms for policy randomization for both Markov Decision Processes (MDPs) and Decentralized Partially Observable MDPs (Dec-POMDPs). Since arbitrary randomization can violate quality constraints (for example, the resource usage should be below a certain threshold or key areas must be patrolled with a certain frequency), my algorithms guarantee quality constraints on the randomized policies generated. For efficiency, I provide a novel linear program for randomized policy generation in MDPs, and then build on this program for a heuristic solution for Dec-POMDPs.
Second, when the agent has a partial model of the adversaries, I model the security domain as a Bayesian Stackelberg game where the agent’s model of the adversary includes a probability distribution over possible adversary types. While the optimal policy selection for a Bayesian Stackelberg game is known to be NP-hard, my solution approach based on an efficient Mixed Integer Linear Program (MILP) provides significant speedups over existing approaches while obtaining the optimal solution. The resulting policy randomizes the agent’s possible strategies, while taking into account the probability distribution over adversary types. Finally, I provide experimental results for all my algorithms, illustrating that the new techniques developed have enabled us to find optimal secure policies efficiently for an increasingly important class of security domains.
A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents. Because complete algorithms to solve DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms that result in locally optimal solutions. One type of categorization of such algorithms, and the solutions they produce, is k-optimality; a k-optimal solution is one that cannot be improved by any deviation by k or fewer agents. So far, no k-optimal algorithms exist for k > 3. In addition, each existing algorithm produces solutions at a single fixed level of optimality, so a different algorithm is needed for each value of k. This paper introduces the first DCOP algorithm which can produce arbitrary k-optimal solutions.
Distributed constraint optimization (DCOP) has proven to be a promising approach to address coordination, scheduling and task allocation in large-scale multiagent networks, in domains involving sensor networks, teams of unmanned air vehicles, or teams of software personal assistants and others. Locally optimal approaches to DCOP suggest themselves as appropriate for such large-scale multiagent networks, particularly when such networks are accompanied by lack of high-bandwidth communications among agents. K-optimal algorithms provide an important class of these locally optimal algorithms, given analytical results proving quality guarantees. Previous work on k-optimality, including its theoretical guarantees, focused exclusively on soft constraints. This paper extends the results to DCOPs with hard constraints. It focuses in particular on DCOPs where such hard constraints are resource constraints which individual agents must not violate. We provide two key results in the context of such DCOPs. First, we provide reward-independent lower bounds on the quality of k-optima in the presence of hard (resource) constraints. Second, we present algorithms for k-optimality given hard resource constraints, and present detailed experimental results over DCOP graphs of 1000 agents with varying constraint density.
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are a popular approach for modeling multi-agent
systems acting in uncertain domains. Given the significant complexity of solving distributed POMDPs, particularly as we scale
up the numbers of agents, one popular approach has focused on
approximate solutions. Though this approach is efficient, the algorithms within this approach do not provide any guarantees on solution quality. A second less popular approach focuses on global
optimality, but typical results are available only for two agents,
and also at considerable computational cost. This paper overcomes
the limitations of both these approaches by providing SPIDER, a
novel combination of three key features for policy generation in distributed POMDPs: (i) it exploits agent interaction structure given
a network of agents (i.e., allowing easier scale-up to a larger number
of agents); (ii) it uses a combination of heuristics to speed up policy
search; and (iii) it allows quality guaranteed approximations, allowing a systematic tradeoff of solution quality for time. Experimental results show orders of magnitude improvement in performance
when compared with previous global optimal algorithms.
My research focuses on constructing and analyzing systems of intelligent, autonomous agents. These agents may include people, physical robots, or software programs acting as assistants, teammates, opponents, or trading partners. In a large class of multi-agent scenarios, the effect of local interactions between agents can be compactly represented as a network structure such as a distributed constraint optimization problem (DCOP) for cooperative domains. Collaboration between large groups of agents, given such a network, can be difficult to achieve; often agents can only manage to collaborate in smaller subgroups of a certain size, in order to find a workable solution in a timely manner. The goal of my thesis is to provide algorithms to enable networks of agents that are bounded in this way to quickly find high-quality solutions, as well as theoretical results to understand key properties of these solutions. Relevant domains for my work include personal assistant agents, sensor networks, and teams of autonomous robots. In particular, this thesis considers the case in which agents optimize a DCOP by forming groups of one or more agents until no group of k or fewer agents can possibly improve the solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal. In this document, I present four key contributions related to k-optimality. The first set of results are worst-case guarantees on the solution quality of k-optima in a DCOP.
These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. The second set of results are upper bounds on the number of k-optima that can exist in a DCOP. Because each joint action consumes resources, knowing the maximal number of k-optimal joint actions that could exist for a given DCOP allows us to allocate sufficient resources for a given level of k or, alternatively, to choose an appropriate level of k-optimality given fixed resources. The third contribution is a set of 2-optimal and 3-optimal algorithms and an experimental analysis of the performance of 1-, 2-, and 3-optimal algorithms on several types of DCOPs. The final contribution of this thesis is a case study of the application of k-optimal DCOP algorithms and solutions to the problem of the formation of human teams spanning multiple organizations. Given a particular specification of a human team (such as a task force to respond to an emergency) and a pool of possible team members, a DCOP can be formulated to match this specification. A set of k-optimal solutions to the DCOP represents a set of diverse, locally optimal options from which a human commander can choose the team that will be used.
Decentralized Markov Decision Processes (DEC-MDPs) are a popular model of agent-coordination problems in domains with uncertainty and time constraints, but they are very difficult to solve. In this paper,
we improve a state-of-the-art heuristic solution method for DEC-MDPs, called OC-DEC-MDP, that has recently been shown to scale
up to larger DEC-MDPs. Our heuristic solution method, called
Value Function Propagation (VFP), combines two orthogonal improvements of OC-DEC-MDP. First, it speeds up OC-DEC-MDP
by an order of magnitude by maintaining and manipulating a value
function for each state (as a function of time) rather than a separate value for each pair of state and time interval. Furthermore, it
achieves better solution qualities than OC-DEC-MDP because, as
our analytical results show, it does not overestimate the expected
total reward like OC-DEC-MDP. We test both improvements independently in a crisis-management domain as well as for other
types of domains. Our experimental results demonstrate a significant speedup of VFP over OC-DEC-MDP as well as higher solution
qualities in a variety of situations.
A distributed constraint optimization problem
(DCOP) is a formalism that captures the rewards
and costs of local interactions within a team of
agents. Because complete algorithms to solve
DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms that result in locally optimal solutions. One type of categorization of such
algorithms, and the solutions they produce, is k-optimality; a k-optimal solution is one that cannot
be improved by any deviation by k or fewer agents.
This paper presents the first known guarantees on
solution quality for k-optimal solutions. The guarantees are independent of the costs and rewards in
the DCOP, and once computed can be used for any
DCOP of a given constraint graph structure.
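The definition of k-optimality above can be checked directly on small problems. The following is a minimal brute-force sketch, not an algorithm from the paper: it represents a DCOP as binary reward tables over pairs of agents and tests whether any group of k or fewer agents can raise the total reward by changing its values. The data layout and the names `total_reward` and `is_k_optimal` are assumptions chosen for illustration.

```python
from itertools import combinations, product


def total_reward(assignment, constraints):
    """Sum the rewards of all binary constraints under a full assignment."""
    return sum(
        table[(assignment[i], assignment[j])]
        for (i, j), table in constraints.items()
    )


def is_k_optimal(assignment, domains, constraints, k):
    """True if no group of k or fewer agents can improve the total reward."""
    base = total_reward(assignment, constraints)
    agents = list(domains)
    for size in range(1, k + 1):
        for group in combinations(agents, size):
            # Try every joint change of values by this group of agents.
            for values in product(*(domains[a] for a in group)):
                candidate = dict(assignment)
                candidate.update(zip(group, values))
                if total_reward(candidate, constraints) > base:
                    return False
    return True


# Hypothetical two-agent coordination problem with one constraint.
domains = {"a1": [0, 1], "a2": [0, 1]}
constraints = {("a1", "a2"): {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2}}

both_zero = {"a1": 0, "a2": 0}
both_one = {"a1": 1, "a2": 1}
print(is_k_optimal(both_zero, domains, constraints, 1))
print(is_k_optimal(both_zero, domains, constraints, 2))
```

In this toy problem the assignment in which both agents pick 0 cannot be improved by either agent alone, yet both agents switching together does better, so it is 1-optimal but not 2-optimal. The exhaustive check grows exponentially with k and is meant only to make the definition concrete.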
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs)
are a popular approach for modeling multi-agent systems acting in uncertain domains. Given the significant computational complexity of solving distributed POMDPs,
one popular approach has focused on approximate solutions. Though this approach
provides for efficient computation of solutions, the algorithms within this approach
do not provide any guarantees on the quality of the solutions. A second less popular approach has focused on a global optimal result, but at considerable computational cost. This paper overcomes the limitations of both these approaches
by providing SPIDER (Search for Policies In Distributed EnviRonments), which
provides quality-guaranteed approximations for distributed POMDPs. SPIDER allows us to vary this quality guarantee, thus allowing us to vary solution quality
systematically. SPIDER and its enhancements employ heuristic search techniques
for finding a joint policy that satisfies the required bound on the quality of the
While POMDPs (partially observable Markov decision problems) are a popular computational model
with wide-ranging applications, the computational
cost for optimal policy generation is prohibitive.
Researchers are investigating ever-more efficient
algorithms, yet many applications demand that such algorithms bound any loss in policy quality when
chasing efficiency. To address this challenge, we
present two new techniques. The first approximates
in the value space to obtain solutions efficiently for
a pre-specified error bound. Unlike existing techniques, our technique guarantees the resulting policy will meet this bound. Furthermore, it does not
require costly computations to determine the quality loss of the policy. Our second technique prunes
large tracts of belief space that are unreachable, allowing faster policy computation without any sacrifice in optimality. The combination of the two techniques, which are complementary to existing optimal policy generation algorithms, provides solutions with tight error bounds efficiently in domains
where competing algorithms fail to provide such
My research goal is to build large-scale intelligent systems (both single- and multi-agent) that reason with uncertainty in complex, real-world environments. I foresee an integration of such systems in many critical facets of human life ranging from intelligent assistants in hospitals to offices, from rescue agents in large scale disaster response to sensor agents tracking weather phenomena in earth observing sensor webs, and others. In my thesis, I have taken steps towards achieving this goal in the context of systems that operate in partially observable domains that also have transitional (non-deterministic outcomes to actions) uncertainty. Given this uncertainty, Partially Observable Markov Decision Problems (POMDPs) and Distributed POMDPs present themselves as natural choices for modeling these domains. Unfortunately, the significant computational complexity involved in solving POMDPs (PSPACE-Complete) and Distributed POMDPs (NEXP-Complete) is a key obstacle. Due to this significant computational complexity, existing approaches that provide exact solutions do not scale, while approximate solutions do not provide any usable guarantees on quality. My thesis addresses these issues using the following key ideas: The first is exploiting structure in the domain. Utilizing the structure present in the dynamics of the domain or the interactions between the agents allows improved efficiency without sacrificing the quality of the solution. The second is direct approximation in the value space. This allows for calculated approximations at each step of the algorithm, which in turn allows us to provide usable quality guarantees; such quality guarantees may be specified in advance. In contrast, the existing approaches approximate in the belief space, leading to an approximation in the value space (indirect approximation in value space), thus making it difficult to compute functional bounds on approximations.
In fact, these key ideas allow for the efficient computation of optimal and quality bounded solutions to complex, large-scale problems that were not in the purview of existing algorithms.