## Motivation

It's the same; it hasn't changed: the curses of dimensionality and history.

Goal: solve decentralized multi-agent MDPs.

## Key Insights

- Macro-actions (MAs) reduce computational complexity (similar to hierarchical planning).
- Uses cross entropy to make the infinite-horizon problem tractable.

## Prior Approaches

- Masked Monte Carlo search (MMCS): heuristic-based, no optimality guarantees.
- MCTS: poor performance.

## Direct Cross Entropy

See also: Cross Entropy Method.

1. Sample k candidate values from the current distribution.
2. Keep the n highest-valued samples.
3. Update the parameter θ toward those elite samples.
4. Resample until the distribution converges.
5. Take the best sample x.

## G-DICE

- Create a graph with an exogenously chosen number of nodes N and O outgoing edges per node (designed beforehand).
- Use Direct Cross Entropy to solve for the best policy over the graph.

## Results

- Demonstrates improved performance over MMCS and MCTS.
- Does not need robot communication.
- Guarantees convergence for both finite and infinite horizons.
- The number of nodes can be chosen exogenously in order to gain computational savings.
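The Direct Cross Entropy loop described in the notes (sample k candidates, keep the n best, refit θ, resample until convergence) can be sketched as follows. This is a minimal, illustrative implementation on a toy discrete objective, not the paper's actual G-DICE solver; all function names, parameter values, and the toy `score` objective are assumptions for demonstration only. A small smoothing term is added to the θ update to avoid premature collapse, a common practical choice in cross-entropy methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x, target):
    """Toy objective: number of positions matching a hidden target sequence."""
    return int(np.sum(x == target))

def cross_entropy_search(num_vars=10, num_actions=4, k=100, n=10, iters=30):
    """Generic cross-entropy search over sequences of discrete choices."""
    target = rng.integers(num_actions, size=num_vars)            # hidden optimum (toy)
    theta = np.full((num_vars, num_actions), 1.0 / num_actions)  # categorical params
    best_x, best_v = None, -np.inf
    for _ in range(iters):
        # 1. Sample k candidate solutions from the current distribution theta.
        samples = np.stack([
            np.array([rng.choice(num_actions, p=theta[i]) for i in range(num_vars)])
            for _ in range(k)
        ])
        values = np.array([score(x, target) for x in samples])
        # 2. Keep the n highest-valued samples (the elite set).
        elites = samples[np.argsort(values)[-n:]]
        # 3. Update theta toward the elites (smoothed maximum-likelihood refit).
        for i in range(num_vars):
            counts = np.bincount(elites[:, i], minlength=num_actions)
            theta[i] = 0.9 * (counts / counts.sum()) + 0.1 / num_actions
        # Track the best sample seen so far.
        if values.max() > best_v:
            best_v = int(values.max())
            best_x = samples[values.argmax()]
    # 4/5. After the distribution converges, return the best sample found.
    return best_x, best_v, target

best_x, best_v, target = cross_entropy_search()
```

In G-DICE the sampled objects are policy graphs (node-to-action and edge-to-node assignments) rather than this toy sequence, but the sample/elite/refit loop is the same.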