Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains

Citation:

Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. 2009. “Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains.” In IJCAI 2009 Workshop on Distributed Constraint Reasoning (DCR 2009).

Download

2009_15_teamcore_dcr09_taylor.pdf

698 KB

Abstract:

Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.
