Motivation

- Multiple agents need to collaborate to achieve a common goal.
- Joint Utility Maximization: maximize the joint utility across all agents.

Possible Approaches

- Traditional MDP: treat the "action" as a joint action over all agents (exponential blow-up, because the agents' action spaces multiply).
- Local optimization: share rewards/values among agents.
- Local optimization: search for and maximize the joint utility explicitly (no need to model the entire joint action space).

Problems with Simple Reward Sharing

- Credit Assignment Problem: in collective-reward situations, it is hard to determine which agent's action actually contributed to the reward.
- Free Rider Problem: agents can benefit from the reward without actually doing anything, by being carried by the others.

Factored MDPs

Representation

- Use a factored linear value function to approximate the joint value function.
- Use linear programming to avoid the exponential blow-up.

Background

Coordination Graphs

- model each agent as a node
- each edge is a dependency between agents

Factored Markov Decision Processes

- plain MDPs do not scale to large problems
- factor the state and action spaces into random variables (factors)

Action Selection

- each agent maintains a local Q function indicating its contribution to the joint value
- the Q function of each agent may be influenced by other agents: the agent's coordination graph is used to compute its contribution
- we optimize one agent at a time: fix one agent's best response as a function of its neighbors in the graph, eliminate that agent, and repeat
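The factored linear value function idea can be sketched in a few lines: the joint value over state variables is approximated as a weighted sum of basis functions, each of which looks at only a small subset of the state. The specific basis functions, weights, and three-variable state below are illustrative assumptions, not from the notes.

```python
# Minimal sketch of a factored linear value function (assumed example).
# State x = (x1, x2, x3); each basis function depends on only a few
# state variables, so evaluating V(x) never touches the full joint space.

def h1(x):
    """Basis that looks only at x1."""
    return 1.0 if x[0] == 1 else 0.0

def h2(x):
    """Basis that looks only at (x2, x3)."""
    return float(x[1] + x[2])

WEIGHTS = [0.5, 2.0]   # illustrative weights (learned via LP in practice)
BASES = [h1, h2]

def value(x):
    """V(x) ~= sum_j w_j * h_j(x)."""
    return sum(w * h(x) for w, h in zip(WEIGHTS, BASES))

v = value((1, 0, 1))   # 0.5 * 1 + 2.0 * 1
```

Because each basis touches only a few variables, the representation stays compact even as the number of agents or state variables grows; the weights would be fit by the linear program mentioned above.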
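The one-agent-at-a-time optimization over a coordination graph can be sketched as variable elimination: each eliminated agent's best response is cached as a function of its neighbors, and the remaining agents maximize a smaller problem. The three-agent chain, action sets, and payoff tables below are assumed for illustration.

```python
from itertools import product

# Assumed toy setup: agents 1-2-3 form a chain in the coordination
# graph, with local Q functions on the two edges.
ACTIONS = [0, 1]  # each agent picks a binary action

# Q12 couples agents 1 and 2 (rewards matching actions);
# Q23 couples agents 2 and 3 (rewards differing actions).
Q12 = {(a1, a2): float(a1 == a2) for a1, a2 in product(ACTIONS, repeat=2)}
Q23 = {(a2, a3): 2.0 * (a2 != a3) for a2, a3 in product(ACTIONS, repeat=2)}

def eliminate_agents():
    """Maximize Q12(a1,a2) + Q23(a2,a3) one agent at a time."""
    # Eliminate agent 1: its best response depends only on a2.
    best1 = {a2: max(ACTIONS, key=lambda a1: Q12[(a1, a2)]) for a2 in ACTIONS}
    e1 = {a2: Q12[(best1[a2], a2)] for a2 in ACTIONS}  # message to agent 2

    # Eliminate agent 3: its best response also depends only on a2.
    best3 = {a2: max(ACTIONS, key=lambda a3: Q23[(a2, a3)]) for a2 in ACTIONS}
    e3 = {a2: Q23[(a2, best3[a2])] for a2 in ACTIONS}

    # Agent 2 now maximizes over its own action alone.
    a2 = max(ACTIONS, key=lambda a: e1[a] + e3[a])
    return best1[a2], a2, best3[a2]

a1, a2, a3 = eliminate_agents()
joint_value = Q12[(a1, a2)] + Q23[(a2, a3)]
```

The point of the elimination order is that no step ever enumerates the full joint action space: each maximization ranges only over one agent's actions given its neighbors, which is what avoids the exponential blow-up of the joint-action MDP.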