A Partially Observable Markov Decision Process (POMDP) is an MDP with partial observability: the agent cannot observe the true state directly. Components:

- states
- actions (available given the state)
- transition function T(s'|s,a) (given state and action)
- reward function R(s,a)
- belief system:
  - beliefs (probability distributions over states)
  - observations
  - observation model O(o|a,s')

As always, we desire to find a \pi such that we can:

\begin{equation} \underset{\pi \in \Pi}{\text{maximize}}\ \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^{t} R(b_{t}, \pi(b_{t}))\right] \end{equation}

whereby our \pi, instead of taking a state as input, takes a belief (a distribution over possible states) as input.

- observations and states ("where are we, and how sure are we about that?")
  - beliefs and filters (see the belief-filter sketch after this outline)
- policy representations ("how do we represent a policy?")
  - a tree: a conditional plan
  - a graph: a controller
  - with utility: alpha vectors; to act, just take the top action of the conditional plan the alpha vector was computed from (see the alpha-vector sketch below)
- policy evaluations ("how good is our policy / what's the utility?")
  - conditional plan evaluation (see the evaluation sketch below)
- policy solutions ("how do we make that policy better?")
  - exact solutions
    - the optimal value function for POMDPs
    - POMDP value iteration
  - approximate solutions
    - estimate an approximate value function (e.g., a set of alpha vectors), and then use a policy representation
    - upper bounds for the value function (see the QMDP sketch below)
    - lower bounds for the value function
  - online solutions
    - Online POMDP Methods
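Since the policy consumes beliefs, the belief itself has to be maintained by a filter. Below is a minimal sketch of a discrete Bayes filter on a toy two-state problem; the problem data and names (`T`, `O`, `update_belief`) are illustrative assumptions, not anything fixed by this page.

```python
import numpy as np

n_states = 2                       # e.g., reward behind the left or right door
gamma = 0.9

# T[a][s, s'] : transition probabilities given action a
T = {"listen": np.eye(n_states)}

# O[a][s', o] : observation probabilities after action a lands in s'
O = {"listen": np.array([[0.85, 0.15],
                         [0.15, 0.85]])}

def update_belief(b, a, o):
    """Discrete Bayes filter: b'(s') ~ O(o|a,s') * sum_s T(s'|s,a) b(s)."""
    predicted = T[a].T @ b          # sum_s T(s'|s,a) b(s)
    unnormalized = O[a][:, o] * predicted
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])           # uniform prior belief
b = update_belief(b, "listen", o=0)
print(b)                           # belief shifts toward state 0: [0.85 0.15]
```

The normalization constant discarded in the last step is P(o | b, a), the probability of the observation under the current belief.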
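To make the alpha-vector representation concrete: each alpha vector records the utility of one conditional plan at every state, and the policy acts by taking the top action of the plan whose alpha vector maximizes the dot product with the current belief. The numbers below are made up for illustration.

```python
import numpy as np

# (alpha vector, top action of the plan it was computed from)
alphas = [
    (np.array([-1.0, -1.0]), "listen"),
    (np.array([10.0, -100.0]), "open-left"),
    (np.array([-100.0, 10.0]), "open-right"),
]

def utility(b):
    """U(b) = max_alpha alpha . b  (piecewise-linear and convex in b)."""
    return max(alpha @ b for alpha, _ in alphas)

def policy(b):
    """Top action of the plan whose alpha vector maximizes utility at b."""
    return max(alphas, key=lambda pair: pair[0] @ b)[1]

b = np.array([0.95, 0.05])
print(utility(b), policy(b))   # a confident enough belief makes opening win
```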
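Conditional plan evaluation itself is a short recursion: a plan's alpha vector is its immediate reward plus the discounted, observation-weighted alpha vectors of its subplans. A sketch on the same illustrative toy data as above:

```python
import numpy as np

n_states, n_obs, gamma = 2, 2, 0.9
T = {"listen": np.eye(2)}
O = {"listen": np.array([[0.85, 0.15], [0.15, 0.85]])}   # O[a][s', o]
R = {"listen": np.array([-1.0, -1.0])}

def evaluate(plan):
    """alpha(s) = R(s,a) + gamma * sum_{s',o} T(s'|s,a) O(o|a,s') alpha_o(s')."""
    a, subplans = plan
    alpha = R[a].copy()
    if subplans:  # subplans[o] is the plan followed after observing o
        for o in range(n_obs):
            alpha += gamma * T[a] @ (O[a][:, o] * evaluate(subplans[o]))
    return alpha

# A depth-2 plan: listen, then listen again regardless of the observation.
leaf = ("listen", None)
plan = ("listen", [leaf, leaf])
print(evaluate(plan))   # [-1.9 -1.9]: two discounted listening costs
```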
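One standard way to get an upper bound on the value function is QMDP (a technique this page does not name explicitly): solve the underlying fully observable MDP, then treat each action's Q-function as an alpha vector. A sketch under that assumption, with made-up problem data:

```python
import numpy as np

n_states, actions, gamma = 2, ["stay", "switch"], 0.9
T = {"stay": np.eye(2), "switch": np.array([[0.0, 1.0], [1.0, 0.0]])}
R = {"stay": np.array([1.0, 0.0]), "switch": np.array([0.0, 0.0])}

# Value iteration on the underlying MDP (ignores observations entirely).
U = np.zeros(n_states)
for _ in range(200):
    U = np.max([R[a] + gamma * T[a] @ U for a in actions], axis=0)

# QMDP alpha vectors: one per action.
alphas = {a: R[a] + gamma * T[a] @ U for a in actions}

def qmdp_policy(b):
    return max(actions, key=lambda a: alphas[a] @ b)

print(qmdp_policy(np.array([0.2, 0.8])))   # switches toward the rewarding state
```

Because QMDP assumes the state becomes fully observable after one step, it never values information-gathering actions, which is exactly why it only upper-bounds the true POMDP value function.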
