Policy optimization covers algorithms that directly optimize the parameters \theta of a policy \pi_{\theta}. This is unlike value iteration, policy iteration, and online planning, which derive a policy from a surrogate (such as a value function or an estimate of future discounted reward).

- Local Policy Search (aka Hooke-Jeeves Policy Search)
- Genetic Policy Search
- Cross Entropy Method
- Policy Gradient, Regression Gradient and Likelihood Ratio Gradient
- Reward-to-Go
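
To make "optimizing \theta directly" concrete, here is a minimal sketch of a Hooke-Jeeves-style local policy search. Everything in it is an assumption chosen for illustration and not part of these notes: the toy dynamics `s' = s + a`, the reward `-s'^2`, the one-parameter linear policy `a = -theta[0] * s`, and the names `rollout_return` and `local_policy_search`. The search only ever queries the return of a candidate \theta; it never builds a value function.

```python
def rollout_return(theta, s0=1.0, depth=10, gamma=0.9):
    """Discounted return of the hypothetical policy a = -theta[0] * s.

    Toy deterministic dynamics (an assumption): s' = s + a, reward = -(s')^2.
    """
    s, total = s0, 0.0
    for t in range(depth):
        a = -theta[0] * s          # action chosen by the parameterized policy
        s = s + a                  # toy state transition
        total += (gamma ** t) * (-(s ** 2))
    return total


def local_policy_search(utility, theta, step=0.5, shrink=0.5, tol=1e-6):
    """Hooke-Jeeves-style search over policy parameters.

    Tries +/- step along each coordinate of theta, moves to the best
    strictly improving neighbor, and shrinks the step when none improves.
    """
    theta = list(theta)
    best = utility(theta)
    while step > tol:
        best_candidate, best_u = None, best
        for i in range(len(theta)):
            for delta in (step, -step):
                candidate = list(theta)
                candidate[i] += delta
                u = utility(candidate)
                if u > best_u:
                    best_candidate, best_u = candidate, u
        if best_candidate is None:
            step *= shrink         # no neighbor improves: refine the search
        else:
            theta, best = best_candidate, best_u
    return theta, best


theta_star, u_star = local_policy_search(rollout_return, [0.0])
print(theta_star, u_star)
```

In this toy problem the return is deterministic, so a single rollout is an exact estimate of the utility of \theta. With a stochastic environment, `utility` would typically average several rollouts instead.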
