Publications

2008
Emma Bowring, Jonathan P. Pearce, Christopher Portway, Manish Jain, and Milind Tambe. 2008. “On K-Optimal Distributed Constraint Optimization Algorithms: New Bounds and Algorithms.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems. Abstract:
Distributed constraint optimization (DCOP) is a promising approach to coordination, scheduling and task allocation in multiagent networks. In large-scale or low-bandwidth networks, finding the global optimum is often impractical. K-optimality is a promising new approach: for the first time it provides a set of locally optimal algorithms with quality guarantees as a fraction of the global optimum. Unfortunately, previous work in k-optimality did not address domains where we may have prior knowledge of reward structure, and it failed to provide quality guarantees or algorithms for domains with hard constraints (such as agents’ local resource constraints). This paper addresses these shortcomings with three key contributions. It provides: (i) improved lower bounds on k-optima quality incorporating available prior knowledge of reward structure; (ii) lower bounds on k-optima quality for problems with hard constraints; and (iii) k-optimal algorithms for solving DCOPs with hard constraints and detailed experimental results on large-scale networks.
2008_1_teamcore_final.pdf
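The definition of k-optimality used in this line of work can be made concrete with a small sketch. The toy DCOP instance below (the agents, domains, and reward tables are invented for illustration, not taken from the paper) checks whether a candidate assignment can be improved by any coordinated deviation of at most k agents; an assignment that passes the check for a given k is k-optimal. The brute-force check is exponential in k; the bounds and algorithms in the paper are about avoiding exactly that, but the check captures what the guarantee is over.

    from itertools import combinations, product

    # Toy DCOP: four agents with binary domains and rewards on pairwise constraints.
    # rewards[(i, j)][(vi, vj)] is the reward when agent i takes vi and agent j takes vj.
    domains = {0: [0, 1], 1: [0, 1], 2: [0, 1], 3: [0, 1]}
    rewards = {
        (0, 1): {(0, 0): 5, (0, 1): 0, (1, 0): 0, (1, 1): 3},
        (1, 2): {(0, 0): 2, (0, 1): 4, (1, 0): 0, (1, 1): 1},
        (2, 3): {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 6},
    }

    def total_reward(assignment):
        return sum(table[(assignment[i], assignment[j])]
                   for (i, j), table in rewards.items())

    def is_k_optimal(assignment, k):
        # True if no group of at most k agents can improve the reward by deviating together.
        base = total_reward(assignment)
        for size in range(1, k + 1):
            for group in combinations(domains, size):
                for values in product(*(domains[a] for a in group)):
                    candidate = dict(assignment)
                    candidate.update(zip(group, values))
                    if total_reward(candidate) > base:
                        return False
        return True

    assignment = {0: 0, 1: 0, 2: 1, 3: 1}
    print(total_reward(assignment), is_k_optimal(assignment, 1), is_k_optimal(assignment, 2))
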
A. Freedy, O. Sert, E. Freedy, G. Weltman, J. McDonough, Milind Tambe, and Tapana Gupta. 2008. “Multiagent adjustable autonomy framework (MAAF) for multirobot, multihuman teams.” In International Symposium on Collaborative Technologies (CTS). Abstract:
This paper describes the ongoing development of a Multiagent Adjustable Autonomy Framework (MAAF) for multi-robot, multi-human teams performing tactical maneuvers. The challenge being addressed in this SBIR Phase I R&D project is how to exploit fully the unique capabilities of heterogeneous teams composed of a mixture of Robots, Agents or Persons (RAPs): that is, how to improve the safety, efficiency, reliability and cost of achieving mission goals while maintaining dynamic adaptation to the unique limitations and contingencies of a real-world operating environment. Our response to this challenge is the creation of a new infrastructure that will facilitate cooperative and collaborative performance of human and robots as equal team partners through the application of advances in goal-oriented, multiagent planning and coordination technology. At the heart of our approach is the USC Teamcore Group’s Machinetta, a state-of-the-art robot proxy framework with adjustable autonomy. Machinetta facilitates robot-human role allocation decisions and collaborative sharing of team tasks in the non-deterministic and unpredictable military environment through the use of a domain-independent teamwork model that supports flexible teamwork. This paper presents our innovative proxy architecture and its constituent algorithms, and also describes our initial demonstration of technical feasibility in a realistic simulation scenario.
2008_11_teamcore_maaf_cts2008paper.pdf
Janusz Marecki, Tapana Gupta, Pradeep Varakantham, Milind Tambe, and Makoto Yokoo. 2008. “Not All Agents Are Equal: Scaling up Distributed POMDPs for Agent Networks.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems. Abstract:
Many applications of networks of agents, including mobile sensor networks, unmanned air vehicles, and autonomous underwater vehicles, involve hundreds of agents acting collaboratively under uncertainty. Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are well-suited to address such applications, but so far, only limited scale-ups of up to five agents have been demonstrated. This paper escalates the scale-up, presenting an algorithm called FANS, increasing the number of agents in distributed POMDPs for the first time into double digits. FANS is founded on finite state machines (FSMs) for policy representation and exploits these FSMs to provide three key contributions: (i) Not all agents within an agent network need the same expressivity of policy representation; FANS introduces novel heuristics to automatically vary the FSM size in different agents for scale-up; (ii) FANS illustrates efficient integration of its FSM-based policy search within algorithms that exploit agent network structure; (iii) FANS provides significant speedups in policy evaluation and heuristic computations within the network algorithms by exploiting the FSMs for dynamic programming. Experimental results show not only orders of magnitude improvements over previous best known algorithms for smaller-scale domains (with similar solution quality), but also a scale-up into double digits in terms of numbers of agents.
2008_5_teamcore_fans.pdf
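As a rough illustration of the finite-state-machine policy representation that FANS builds on, a policy for one agent can be stored as a small controller whose current node fixes the action and whose edges are selected by the incoming observation; varying the number of nodes varies the expressivity of the policy. The domain, observation names, and two-node controller below are invented for the example and are not from the paper.

    # A minimal finite-state controller: each node prescribes an action, and the
    # next node is chosen by the observation received after acting.
    class FSMPolicy:
        def __init__(self, actions, transitions, start=0):
            self.actions = actions          # node -> action
            self.transitions = transitions  # (node, observation) -> next node
            self.node = start

        def act(self):
            return self.actions[self.node]

        def observe(self, observation):
            self.node = self.transitions[(self.node, observation)]

    # Two-node controller for a toy sensing agent: keep scanning until a target
    # is heard, then track; go back to scanning once the target goes quiet.
    policy = FSMPolicy(
        actions={0: "scan", 1: "track"},
        transitions={(0, "quiet"): 0, (0, "signal"): 1,
                     (1, "signal"): 1, (1, "quiet"): 0},
    )

    for obs in ["quiet", "signal", "signal", "quiet"]:
        print(policy.act())
        policy.observe(obs)
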
Janusz Marecki. 2008. “Planning with Continuous Resources in Agent Systems.” Abstract:

My research concentrates on developing reasoning techniques for intelligent, autonomous agent systems. In particular, I focus on planning techniques for both single and multi-agent systems acting in uncertain domains. In modeling these domains, I consider two types of uncertainty: (i) the outcomes of agent actions are uncertain, and (ii) the amount of resources consumed by agent actions is uncertain and only characterized by continuous probability density functions. Such rich domains, which range from Mars rover exploration to unmanned aerial surveillance to automated disaster rescue operations, are commonly modeled as continuous resource Markov decision processes (MDPs) that can then be solved in order to construct policies for agents acting in these domains. This thesis addresses two major unresolved problems in continuous resource MDPs. First, they are very difficult to solve: existing algorithms are either fast but make additional restrictive assumptions about the model, or avoid these assumptions but are very inefficient. Second, the continuous resource MDP framework is not directly applicable to multi-agent systems, and current approaches all discretize resource levels or assume deterministic resource consumption, which invalidates the formal solution quality guarantees. The goal of my thesis is to fundamentally alter this landscape with three contributions:

I first introduce CPH, a fast analytic algorithm for solving continuous resource MDPs. CPH solves the planning problems at hand by first approximating with a desired accuracy the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs more efficiently than its closest competitor, and allows for a systematic trade-off of solution quality for speed. Second, to improve the anytime performance of CPH and other continuous resource MDP solvers, I introduce the DPFP algorithm. Rather than using value iteration to solve the problem at hand, DPFP performs a forward search in the corresponding dual space of cumulative distribution functions. In doing so, DPFP discriminates in its policy generation effort, providing only approximate policies for regions of the state space reachable with low probability, yet it bounds the error that such approximation entails. Third, I introduce CR-DEC-MDP, a framework for planning with continuous resources in multi-agent systems, and propose two algorithms for solving CR-DEC-MDPs: The first algorithm (VFP) emphasizes scalability. It performs a series of policy iterations in order to quickly find a locally optimal policy. In contrast, the second algorithm (M-DPFP) stresses optimality; it allows for a systematic trade-off of solution quality for speed by using the concept of DPFP in a multiagent setting. My results show up to three orders of magnitude speedups in solving single agent planning problems and up to one order of magnitude speedup in solving multi-agent planning problems. Furthermore, I demonstrate the practical use of one of my algorithms in a large-scale disaster simulation where it allows for a more efficient rescue operation.

2008_15_teamcore_marecki_j4015754173.pdf
Praveen Paruchuri, Jonathan P. Pearce, Janusz Marecki, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2008. “Playing games with security: An efficient exact algorithm for Bayesian Stackelberg Games.” In International Joint Conference on Autonomous Agents and Multiagent Systems, 2008. Abstract:
In a class of games known as Stackelberg games, one agent (the leader) must commit to a strategy that can be observed by the other agent (the follower or adversary) before the adversary chooses its own strategy. We consider Bayesian Stackelberg games, in which the leader is uncertain about the types of adversary it may face. Such games are important in security domains, where, for example, a security agent (leader) must commit to a strategy of patrolling certain areas, and a robber (follower) has a chance to observe this strategy over time before choosing its own strategy of where to attack. This paper presents an efficient exact algorithm for finding the optimal strategy for the leader to commit to in these games. This algorithm, DOBSS, is based on a novel and compact mixed-integer linear programming formulation. Compared to the most efficient algorithm known previously for this problem, DOBSS is not only faster, but also leads to higher quality solutions, and does not suffer from problems of infeasibility that were faced by this previous algorithm. Note that DOBSS is at the heart of the ARMOR system that is currently being tested for security scheduling at the Los Angeles International Airport.
2008_4_teamcore_aamas_2008.pdf
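The leader-commitment idea behind DOBSS can be illustrated with a brute-force sketch over a toy two-by-two game (the payoff matrices and the coarse grid below are invented for the example; this is not the DOBSS MILP itself): the leader scans candidate mixed strategies, assumes the follower observes the mixture and best-responds, and keeps the mixture that maximizes the leader's own payoff.

    # Toy 2x2 Stackelberg game: leader picks rows, follower picks columns.
    # leader_payoff[i][j] and follower_payoff[i][j] for leader action i, follower action j.
    leader_payoff = [[2.0, 4.0],
                     [1.0, 3.0]]
    follower_payoff = [[1.0, 0.0],
                       [0.0, 2.0]]

    def follower_best_response(p):
        # Follower's best pure response to the leader mixing row 0 with probability p.
        utilities = [p * follower_payoff[0][j] + (1 - p) * follower_payoff[1][j]
                     for j in range(2)]
        return max(range(2), key=lambda j: utilities[j])

    best_p, best_value = None, float("-inf")
    for step in range(101):                      # coarse grid over leader mixtures
        p = step / 100
        j = follower_best_response(p)
        value = p * leader_payoff[0][j] + (1 - p) * leader_payoff[1][j]
        if value > best_value:
            best_p, best_value = p, value

    print(best_p, best_value)

On this toy instance the best commitment is a mixture rather than either pure strategy, which is the first-mover advantage the exact algorithm exploits at scale.
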
Nathan Schurr, Janusz Marecki, and Milind Tambe. 2008. “RIAACT: A robust approach to adjustable autonomy for human-multiagent teams.” In The Seventh International Conference on Autonomous Agents and Multiagent Systems. Abstract:
When human-multiagent teams act in real-time uncertain domains, adjustable autonomy (dynamic transferring of decisions between human and agents) raises three key challenges. First, the human and agents may differ significantly in their worldviews, leading to inconsistencies in their decisions. Second, these human-multiagent teams must operate and plan in real time against deadlines, with uncertain durations of human actions. Third, adjustable autonomy in teams is an inherently distributed and complex problem that cannot be solved optimally and completely online. To address these challenges, our paper presents a solution for Resolving Inconsistencies in Adjustable Autonomy in Continuous Time (RIAACT). RIAACT incorporates models of the resolution of inconsistencies, continuous-time planning techniques, and a hybrid method to address coordination complexity. These contributions have been realized in a disaster response simulation system.
2008_6_teamcore_riaact.pdf
Manish Jain, Fernando Ordonez, James Pita, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. 2008. “Robust Solutions in Stackelberg Games: Addressing Boundedly Rational Human Preference Models.” In Association for the Advancement of Artificial Intelligence 4th Multidisciplinary Workshop on Advances in Preference Handling. Abstract:
Stackelberg games represent an important class of games in which one player, the leader, commits to a strategy and the remaining players, the followers, make their decision with knowledge of the leader’s commitment. Existing algorithms for Bayesian Stackelberg games find optimal solutions while modeling uncertainty over follower types with an a-priori probability distribution. Unfortunately, in real-world applications, the leader may also face uncertainty over the follower’s response which makes the optimality guarantees of these algorithms fail. Such uncertainty arises because the follower’s specific preferences or the follower’s observations of the leader’s strategy may not align with the rational strategy, and it is not amenable to a-priori probability distributions. These conditions especially hold when dealing with human subjects. To address these uncertainties while providing quality guarantees, we propose three new robust algorithms based on mixed-integer linear programs (MILPs) for Bayesian Stackelberg games. A key result of this paper is a detailed experimental analysis that demonstrates that these new MILPs deal better with human responses: a conclusion based on 800 games with 57 human subjects as followers. We also provide run-time results on these MILPs.
2008_14_teamcore_aaai_preferences_2008.pdf
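A minimal sketch of why the perfectly-rational-follower assumption is fragile (the payoffs, the committed mixture, and the epsilon-greedy deviation model below are invented for the example and are not one of the paper's MILPs): evaluate the same committed leader mixture against a follower who best-responds exactly and against one who plays a non-best response some fraction of the time, and compare the leader's expected payoff.

    # A leader mixture that is optimal against a perfectly rational follower can lose
    # value once the follower deviates with some probability (an epsilon-greedy
    # stand-in for a boundedly rational human).
    leader_payoff = [[2.0, 4.0],
                     [1.0, 3.0]]
    follower_payoff = [[1.0, 0.0],
                       [0.0, 2.0]]
    p = 0.66                                   # leader's committed probability of row 0

    follower_utils = [p * follower_payoff[0][j] + (1 - p) * follower_payoff[1][j]
                      for j in range(2)]
    best = max(range(2), key=lambda j: follower_utils[j])

    def leader_value(col_probs):
        return sum(col_probs[j] * (p * leader_payoff[0][j] + (1 - p) * leader_payoff[1][j])
                   for j in range(2))

    rational = [1.0 if j == best else 0.0 for j in range(2)]
    epsilon = 0.2                              # follower picks the non-best column 20% of the time
    noisy = [(1 - epsilon) if j == best else epsilon for j in range(2)]

    print(leader_value(rational), leader_value(noisy))
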
Jonathan P. Pearce, Milind Tambe, and Rajiv T. Maheswaran. 2008. “Solving multiagent networks using distributed constraint optimization.” AI Magazine 29 (3), pp. 47-66. Abstract:
In many cooperative multiagent domains, the effect of local interactions between agents can be compactly represented as a network structure. Given that agents are spread across such a network, agents directly interact only with a small group of neighbors. A distributed constraint optimization problem (DCOP) is a useful framework to reason about such networks of agents. Given agents’ inability to communicate and collaborate in large groups in such networks, we focus on an approach called k-optimality for solving DCOPs. In this approach, agents form groups of one or more agents until no group of k or fewer agents can possibly improve the DCOP solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal. The article provides an overview of three key results related to k-optimality. The first set of results consists of worst-case guarantees on the solution quality of k-optima in a DCOP. These guarantees can help determine an appropriate k-optimal algorithm, or possibly an appropriate constraint graph structure, for agents to use in situations where the cost of coordination between agents must be weighed against the quality of the solution reached. The second set of results consists of upper bounds on the number of k-optima that can exist in a DCOP. These results are useful in domains where a DCOP must generate a set of solutions rather than a single solution. Finally, we sketch algorithms for k-optimality and provide some experimental results for 1-, 2- and 3-optimal algorithms for several types of DCOPs.
2008_2_teamcore_k_optimality_ai_mag_2.pdf
Janusz Marecki and Milind Tambe. 2008. “Towards Faster Planning with Continuous Resources in Stochastic Domains.” In Twenty-Third AAAI Conference on Artificial Intelligence. Abstract:
Agents often have to construct plans that obey resource limits for continuous resources whose consumption can only be characterized by probability distributions. While Markov Decision Processes (MDPs) with a state space of continuous and discrete variables are popular for modeling these domains, current algorithms for such MDPs can exhibit poor performance with a scale-up in their state space. To remedy this, we propose an algorithm called DPFP. DPFP’s key contribution is its exploitation of the dual space of cumulative distribution functions. This dual formulation is key to DPFP’s novel combination of three features. First, it enables DPFP’s membership in a class of algorithms that perform forward search in a large (possibly infinite) policy space. Second, it provides a new and efficient approach for varying the policy generation effort based on the likelihood of reaching different regions of the MDP state space. Third, it yields a bound on the error produced by such approximations. These three features conspire to allow DPFP’s superior performance and systematic trade-off of optimality for speed. Our experimental evaluation shows that, when run stand-alone, DPFP outperforms other algorithms in terms of its any-time performance, whereas when run as a hybrid, it allows for a significant speedup of a leading continuous resource MDP solver.
2008_10_teamcore_dpfp.pdf
Milind Tambe, Anne Balsamo, and Emma Bowring. 2008. “Using science fiction in teaching Artificial Intelligence.” In AAAI Spring Symposium 2008. Abstract:
Many factors are blamed for the decreasing enrollments in computer science and engineering programs in the U.S., including the dot-com economic bust and the increase in the use of “off-shore” programming labor. Another major factor is the lack of a bold new vision and excitement about computer science, which results in a view of computer science as a field wedded to routine programming. To address this concern, we have focused on science fiction as a means to generate excitement about Artificial Intelligence, and thus, in turn, about Computer Science and Engineering. In particular, since the Fall of 2006, we have used science fiction in teaching Artificial Intelligence to undergraduate students at the University of Southern California (USC), in teaching activities ranging from an undergraduate upper-division class in computer science to a semester-long freshman seminar for non-engineering students to micro-seminars during the welcome week. As an interdisciplinary team of scholar/instructors, our goal has been to use science fiction not only to motivate students to learn about AI, but also to understand fundamental issues that arise at the intersection of technology and culture, as well as to provide students with a more creative and well-rounded course that offers a big-picture view of computer science. This paper outlines the courses taught using this theme, provides an overview of our classroom teaching techniques in using science fiction, and discusses some of the lectures in more detail as exemplars. We conclude with feedback received, lessons learned, and impact on both the computer science students and non-computer-science (and non-engineering) students.
2008_3_teamcore_agents_scifi.pdf
2007
Emma Bowring. 2007. “Balancing local resources and global goals in multiply constrained distributed constraint optimization.” Abstract:
Distributed constraint optimization (DCOP) is a useful framework for cooperative multiagent coordination. DCOP focuses on optimizing a single team objective. However, in many domains, agents must satisfy constraints on resources consumed locally while optimizing the team goal. These resource constraints may need to be kept private or shared to improve efficiency. Extending DCOP to these domains raises two issues: algorithm design and sensitivity analysis. Algorithm design requires creating algorithms that trade off completeness, scalability, privacy and efficiency. Sensitivity analysis examines whether slightly increasing the available resources could yield a significantly better outcome. This thesis defines the multiply-constrained DCOP (MC-DCOP) framework and provides complete and incomplete algorithms for solving MC-DCOP problems. Complete algorithms find the best allocation of scarce resources, while incomplete algorithms are more scalable. The algorithms use mutually-intervening search; they use local resource constraints to intervene in the search for the globally optimal solution. The algorithms use four key techniques: (i) transforming constraints to maintain privacy; (ii) dynamically setting upper bounds on resource consumption; (iii) identifying the extent to which the local graph structure allows agents to compute exact bounds on resource consumption; and (iv) using a virtual assignment to flag problems rendered unsatisfiable by their resource constraints. Proofs of correctness are presented for all algorithms. Finally, the complete and incomplete algorithms are used in conjunction with one another to perform distributed local reoptimization to address sensitivity analysis. Experimental results demonstrated that MC-DCOP problems are most challenging when resources are scarce but sufficient. In problems where there are insufficient resources, the team goal is largely irrelevant. In problems with ample resources, the local resource constraints require little consideration. The incomplete algorithms were two orders of magnitude more efficient than the complete algorithm for the most challenging MC-DCOP problems and their runtime increased very little as the number of agents in the network increased. Finally, sensitivity analysis results indicated that local reoptimization is an effective way to identify resource constraints that are creating bottlenecks. Taken together these new algorithms and examination of the problem of sensitivity analysis help extend the applicability of DCOP to more complex domains.
2007_18_teamcore_emma_thesis.pdf
Praveen Paruchuri, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2007. “Coordinating Randomized Policies for Increasing Security in Multiagent Systems.” In H. Mouratidis (Ed.), Proceedings of the Workshop on Safety and Security in Multiagent Systems, Lecture Notes in Artificial Intelligence, Springer. Abstract:
Despite significant recent advances in decision theoretic frameworks for reasoning about multiagent teams, little attention has been paid to applying such frameworks in adversarial domains, where the agent team may face security threats from other agents. This paper focuses on domains where such threats are caused by unseen adversaries whose actions or payoffs are unknown. In such domains, action randomization is recognized as a key technique to deteriorate an adversary’s capability to predict and exploit an agent/agent team’s actions. Unfortunately, there are two key challenges in such randomization. First, randomization can reduce the expected reward (quality) of the agent team’s plans, and thus we must provide some guarantees on such rewards. Second, randomization results in miscoordination in teams. While communication within an agent team can help in alleviating the miscoordination problem, communication is unavailable in many real domains or only scarcely available. To address these challenges, this paper provides the following contributions. First, we recall the Multiagent Constrained MDP (MCMDP) framework that enables policy generation for a team of agents where each agent may have a limited or no (communication) resource. Second, since randomized policies generated directly for MCMDPs lead to miscoordination, we introduce a transformation algorithm that converts the MCMDP into a transformed MCMDP incorporating explicit communication and no-communication actions. Third, we show that incorporating randomization results in a non-linear program and that the unavailability/limited availability of communication results in the addition of non-convex constraints to the non-linear program. Finally, we experimentally illustrate the benefits of our work.
2007_16_teamcore_paruchuril07lnai.pdf
Tapana Gupta, Pradeep Varakantham, Timothy W. Rauenbusch, and Milind Tambe. 2007. “Demonstration of Teamwork in Uncertain Domains using Hybrid BDI-POMDP systems.” In Sixth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) Demo Track. Abstract:

Personal Assistant agents are becoming increasingly important in a variety of application domains in offices, at home, for medical care and many others [5, 1]. These agents are required to constantly monitor their environment (including the state of their users), and make periodic decisions based on their monitoring. For example, in an office environment, agents may need to monitor the location of their user in order to ascertain whether the user would be able to make it on time to a meeting [5]. Or, they may be required to monitor the progress of a user on a particular assignment and decide whether or not the user would be able to meet the deadline for completing the assignment. Teamwork between such agents is important in Personal Assistant applications to enable agents working together to achieve a common goal (such as finishing a project on time). This working demonstration shows a hybrid (BDI-POMDP) approach to accomplish such teamwork. Agents must be able to make decisions despite observational uncertainty in the environment. For example, if the user is busy and does not respond to a request from its personal assistant agent, the agent loses track of the user’s progress and hence, cannot determine it with certainty. Also, an incorrect action on the agent’s part can have undesirable consequences. For example, an agent might reallocate a task again and again even if there is sufficient progress on the task. In the past, teamwork among Personal Assistant agents typically has not addressed such observational uncertainty. Markov Decision Processes [5] have been used to model the agent’s environment, with simplifying assumptions regarding either observational uncertainty in the environment or the agent’s observational abilities.

Partially Observable Markov Decision Processes (POMDPs) are equipped to deal with the inherent uncertainty in Personal Assistant domains. Computational complexity has been a major hurdle in deploying POMDPs in real-world application domains, but the emergence of new exact and approximate techniques [8] recently shows much promise in being able to compute a POMDP policy for an agent in real time. In this demonstration, we actually deploy POMDPs to compute the Adjustable Autonomy policy for an agent based on which the agent makes decisions. Integrating such POMDPs with architectures that enable teamwork among personal assistants is then the next key part of our demonstration. Several teamwork models have been developed over the past few years to handle communication and coordination between agents [7]. Machinetta [6] is a proxy-based integration architecture for coordinating teams of heterogeneous entities (e.g. robots, agents, persons), which builds on the STEAM teamwork model. Machinetta is designed to meet key challenges such as effective utilization of diverse capabilities of group members, improving coordination between agents by overcoming challenges posed by the environment and reacting to changes in the environment in a flexible manner. We use Machinetta proxies to coordinate the agents in our demonstration. Machinetta enables integrating POMDPs and also enables interfacing with BDI architectures that may provide us team plans. In particular, we interface with the SPARK agent framework [2] being developed at the Artificial Intelligence Center of SRI International. SPARK is a Belief-Desire-Intention (BDI) style agent framework grounded in a model of procedural reasoning. This architecture allows the development of active systems that interact with a constantly changing and unpredictable world. By using BDI-based approaches for generating team plans for agents as well as communication and coordination, and POMDPs for adjustable autonomy decision making, we arrive at a hybrid model for multiagent teamwork [3] in Personal Assistant applications. The following sections describe the application domain in which we deploy this hybrid system as well as the interaction between various components of the system, and its working.

2007_5_teamcore_aamas07demo.pdf
Praveen Paruchuri, Jonathan P. Pearce, Milind Tambe, Fernando Ordonez, and Sarit Kraus. 2007. “An Efficient Heuristic Approach for Security Against Multiple Adversaries.” In International Conference on Autonomous Agents and Multiagent Systems, AAMAS-2007. Abstract:
In adversarial multiagent domains, security, commonly defined as the ability to deal with intentional threats from other agents, is a critical issue. This paper focuses on domains where these threats come from unknown adversaries. These domains can be modeled as Bayesian games; much work has been done on finding equilibria for such games. However, it is often the case in multiagent security domains that one agent can commit to a mixed strategy which its adversaries observe before choosing their own strategies. In this case, the agent can maximize reward by finding an optimal strategy, without requiring equilibrium. Previous work has shown this problem of optimal strategy selection to be NP-hard. Therefore, we present a heuristic called ASAP, with three key advantages to address the problem. First, ASAP searches for the highest-reward strategy, rather than a Bayes-Nash equilibrium, allowing it to find feasible strategies that exploit the natural first-mover advantage of the game. Second, it provides strategies which are simple to understand, represent, and implement. Third, it operates directly on the compact, Bayesian game representation, without requiring conversion to normal form. We provide an efficient Mixed Integer Linear Program (MILP) implementation for ASAP, along with experimental results illustrating significant speedups and higher rewards over other approaches.
2007_8_teamcore_praveenaamas07.pdf
Praveen Paruchuri, Jonathan P. Pearce, Milind Tambe, Sarit Kraus, and Fernando Ordonez. 2007. “An Efficient Heuristic for Security Against Multiple Adversaries in Stackelberg Games.” In AAAI Spring Symposium on Game and Decision-Theoretic Agents. Abstract:
In adversarial multiagent domains, security, commonly defined as the ability to deal with intentional threats from other agents, is a critical issue. This paper focuses on domains where these threats come from unknown adversaries. These domains can be modeled as Bayesian games; much work has been done on finding equilibria for such games. However, it is often the case in multiagent security domains that one agent can commit to a mixed strategy which its adversaries observe before choosing their own strategies. In this case, the agent can maximize reward by finding an optimal strategy, without requiring equilibrium. Previous work has shown this problem of optimal strategy selection to be NP-hard. Therefore, we present a heuristic called ASAP, with three key advantages to address the problem. First, ASAP searches for the highest-reward strategy, rather than a Bayes-Nash equilibrium, allowing it to find feasible strategies that exploit the natural first-mover advantage of the game. Second, it provides strategies which are simple to understand, represent, and implement. Third, it operates directly on the compact, Bayesian game representation, without requiring conversion to normal form. We provide an efficient Mixed Integer Linear Program (MILP) implementation for ASAP, along with experimental results illustrating significant speedups and higher rewards over other approaches.
2007_10_teamcore_aamas07security.pdf
Milind Tambe, Emma Bowring, Pradeep Varakantham, K. Lerman, Paul Scerri, and D. V. Pynadath. 2007. “Electric Elves: What went wrong and why.” AI Magazine 29 (2), pp. 23-32. Abstract:
Software personal assistants continue to be a topic of significant research interest. This paper outlines some of the important lessons learned from a successfully-deployed team of personal assistant agents (Electric Elves) in an office environment. In the Electric Elves project, a team of almost a dozen personal assistant agents were continually active for seven months. Each elf (agent) represented one person and assisted in daily activities in an actual office environment. This project led to several important observations about privacy, adjustable autonomy, and social norms in office environments. This paper outlines some of the key lessons learned and, more importantly, outlines our continued research to address some of the concerns raised.
2007_9_teamcore_elves_lessons.pdf
Janusz Marecki, Sven Koenig, and Milind Tambe. 2007. “A Fast Analytical Algorithm for Solving Markov Decision Processes with Continuous Resources.” In International Joint Conference on Artificial Intelligence (IJCAI). Abstract:
Agents often have to construct plans that obey deadlines or, more generally, resource limits for real-valued resources whose consumption can only be characterized by probability distributions, such as execution time or battery power. These planning problems can be modeled with continuous state Markov decision processes (MDPs), but existing solution methods are either inefficient or provide no guarantee on the quality of the resulting policy. We therefore present CPH, a novel solution method that solves the planning problems by first approximating with any desired accuracy the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs by exploiting properties of exponential distributions to calculate the necessary convolutions accurately and efficiently while providing strong guarantees on the quality of the resulting policy. Our experimental feasibility study in a Mars rover domain demonstrates a substantial speedup over Lazy Approximation, which is currently the leading algorithm for solving continuous state MDPs with quality guarantees.
2007_2_teamcore_ijcai_mareckij342.pdf
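A rough feel for the phase-type building block CPH relies on (the target distribution and the simple mean-matching rule below are illustrative; CPH's actual fitting procedure is described in the paper): a general resource-consumption distribution can be approximated by an Erlang distribution, i.e., a chain of identical exponential phases, with more phases giving finer control over the shape.

    import math
    import random

    def erlang_sample(k, rate):
        # Sum of k independent exponential phases with the given rate.
        return sum(-math.log(1.0 - random.random()) / rate for _ in range(k))

    # Suppose, purely for the example, an action duration is uniform on [2, 6],
    # so its mean is 4. Match that mean with a k-phase Erlang: rate = k / mean.
    mean, k = 4.0, 8
    rate = k / mean

    random.seed(0)
    samples = [erlang_sample(k, rate) for _ in range(100000)]
    approx_mean = sum(samples) / len(samples)
    approx_var = sum((s - approx_mean) ** 2 for s in samples) / len(samples)
    print(approx_mean, approx_var)   # mean is about 4.0; variance of the Erlang is mean**2 / k = 2.0
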
Karen Myers, Pauline Berry, Jim Blythe, Ken Conley, Melinda Gervasio, Deborah McGuinness, David Morley, Avi Pfeffer, M. Pollack, and Milind Tambe. 2007. “An Intelligent Personal Assistant for Task and Time Management.” AI Magazine 28 (2), pp. 47-61. Abstract:
We describe an intelligent personal assistant that has been developed to aid a busy knowledge worker in managing time commitments and performing tasks. The design of the system was motivated by the complementary objectives of (a) relieving the user of routine tasks, thus allowing her to focus on tasks that critically require human problem-solving skills, and (b) intervening in situations where cognitive overload leads to oversights or mistakes by the user. The system draws on a diverse set of AI technologies that are linked within a Belief-Desire-Intention agent system. Although the system provides a number of automated functions, the overall framework is highly user-centric in its support for human needs, responsiveness to human inputs, and adaptivity to user working style and preferences.
2007_14_teamcore_calo_pexa.pdf
Praveen Paruchuri. 2007. “Keep the Adversary Guessing: Agent Security by Policy Randomization.” Abstract:
Recent advances in the field of agent/multiagent systems bring us closer to agents acting in real world domains, which can be uncertain and often adversarial. Security, commonly defined as the ability to deal with intentional threats from other agents, is a major challenge for agents or agent-teams deployed in these adversarial domains. Such adversarial scenarios arise in a wide variety of situations that are becoming increasingly important, such as agents patrolling to provide perimeter security around critical infrastructure or performing routine security checks. These domains have the following characteristics: (a) The agent or agent-team needs to commit to a security policy while the adversaries may observe and exploit the policy committed to. (b) The agent/agent-team potentially faces different types of adversaries and has varying information available about the adversaries (thus limiting the agent’s ability to model its adversaries). To address security in such domains, I developed two types of algorithms. First, when the agent has no model of its adversaries, my key idea is to randomize the agent’s policies to minimize the information gained by adversaries. To that end, I developed algorithms for policy randomization for both the Markov Decision Processes (MDPs) and the Decentralized Partially Observable MDPs (Dec-POMDPs). Since arbitrary randomization can violate quality constraints (for example, the resource usage should be below a certain threshold or key areas must be patrolled with a certain frequency), my algorithms guarantee quality constraints on the randomized policies generated. For efficiency, I provide a novel linear program for randomized policy generation in MDPs, and then build on this program for a heuristic solution for Dec-POMDPs. Second, when the agent has a partial model of the adversaries, I model the security domain as a Bayesian Stackelberg game where the agent’s model of the adversary includes a probability distribution over possible adversary types. While the optimal policy selection for a Bayesian Stackelberg game is known to be NP-hard, my solution approach based on an efficient Mixed Integer Linear Program (MILP) provides significant speedups over existing approaches while obtaining the optimal solution. The resulting policy randomizes the agent’s possible strategies, while taking into account the probability distribution over adversary types. Finally, I provide experimental results for all my algorithms, illustrating that the new techniques developed have enabled us to find optimal secure policies efficiently for an increasingly important class of security domains.
2007_12_teamcore_praveen_thesis_document.pdf
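The reward-versus-randomness trade-off at the heart of the thesis can be shown in a single decision step (the action rewards, threshold, and blending scheme below are made up for the example; the thesis formulates this over full MDP policies with a linear program): blend the reward-maximizing deterministic choice with the uniform distribution as far as the reward threshold allows, and measure how much entropy, i.e., unpredictability, that buys.

    import math

    # Expected immediate reward of each patrol action (illustrative numbers).
    rewards = {"route_a": 10.0, "route_b": 7.0, "route_c": 4.0}
    threshold = 8.0                       # required expected reward

    greedy = {a: (1.0 if a == max(rewards, key=rewards.get) else 0.0) for a in rewards}
    uniform = {a: 1.0 / len(rewards) for a in rewards}

    def blend(w):
        # Mix: with weight w follow the uniform (most random) policy, else the greedy one.
        return {a: (1 - w) * greedy[a] + w * uniform[a] for a in rewards}

    def expected_reward(policy):
        return sum(policy[a] * rewards[a] for a in rewards)

    def entropy(policy):
        return -sum(p * math.log(p) for p in policy.values() if p > 0)

    # Most random blend that still meets the reward threshold (coarse grid search).
    best = max(w / 100 for w in range(101) if expected_reward(blend(w / 100)) >= threshold)
    policy = blend(best)
    print(best, expected_reward(policy), entropy(policy))
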
Hideaki Katagishi and Jonathan P. Pearce. 2007. “KOPT: Distributed DCOP Algorithm for Arbitrary k-optima with Monotonically Increasing Utility.” In Ninth Distributed Constraint Reasoning Workshop (DCR). Abstract:
A distributed constraint optimization problem (DCOP) is a formalism that captures the rewards and costs of local interactions within a team of agents. Because complete algorithms to solve DCOPs are unsuitable for some dynamic or anytime domains, researchers have explored incomplete DCOP algorithms that result in locally optimal solutions. One way of categorizing such algorithms, and the solutions they produce, is k-optimality; a k-optimal solution is one that cannot be improved by any deviation by k or fewer agents. So far, no k-optimal algorithms exist for k > 3. In addition, the solution quality an existing algorithm can produce is fixed, so different algorithms are needed for different levels of optimality. This paper introduces the first DCOP algorithm that can produce arbitrary k-optimal solutions.
2007_15_teamcore_kopt_algorithm.pdf
