Online Planning for Optimal Protector Strategies in Resource Conservation Games


Yundi Qian, William B. Haskell, Albert Xin Jiang, and Milind Tambe. 2014. “Online Planning for Optimal Protector Strategies in Resource Conservation Games.” In International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014).


Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors do, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. Fortunately, extractors carry out frequent illegal extractions, so protectors can learn about the richness of resources by observing the extractors’ behavior. This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game; (ii) by adopting a fixed model of the adversary’s behavior, we transform this repeated game into a partially observable Markov decision process (POMDP), which the latest general POMDP solvers cannot solve because of its exponential state space; (iii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iv) for a specific class of our game, we speed up the GMOP algorithm without sacrificing solution quality, and we provide a heuristic that trades solution quality for lower computational cost.