A CPOMDP, or Constrained Partially Observable Markov Decision Process, gives the system two kinds of objectives: a reward function R(s,a) to maximize and a set of cost functions to keep within budget. Specifically, we formulate it as a POMDP (S, A, \Omega, T, O, R) augmented with a set of cost functions \mathbf{C} and corresponding budgets \beta. We seek to maximize the expected discounted infinite-horizon reward \mathbb{E} \left[\sum_{t=0}^{\infty} \gamma^{t} R(s_{t}, a_{t})\right] subject to:

\begin{equation} \mathbb{E} \left[\sum_{t=0}^{\infty} \gamma^{t} C_{i}(s_{t}, a_{t})\right] \leq \beta_{i}, \quad \forall\, C_{i} \in \mathbf{C},\ \beta_{i} \in \beta \end{equation}
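To make the formulation concrete, here is a minimal sketch in Python of the CPOMDP tuple and its budget check. All names here (`CPOMDP`, `discounted_sum`, `satisfies_budgets`) are hypothetical illustrations, not from the source; the infinite-horizon sums are approximated by a truncated trajectory.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical container for the tuple above: gamma is the discount
# factor, reward is R(s, a), and costs pairs each C_i with its budget beta_i.
# (T and O are omitted since only returns and budgets are evaluated here.)
@dataclass
class CPOMDP:
    gamma: float
    reward: Callable[[int, int], float]
    costs: List[Tuple[Callable[[int, int], float], float]]

def discounted_sum(values, gamma):
    """Discounted return sum_t gamma^t * v_t of a truncated trajectory."""
    return sum(gamma ** t * v for t, v in enumerate(values))

def satisfies_budgets(model, trajectory):
    """Check each discounted cumulative cost against its budget beta_i
    for one state-action trajectory [(s_0, a_0), (s_1, a_1), ...]."""
    for cost_fn, beta in model.costs:
        total = discounted_sum(
            [cost_fn(s, a) for s, a in trajectory], model.gamma
        )
        if total > beta:
            return False
    return True
```

For example, with gamma = 0.5 and a unit cost at every step, a three-step trajectory accumulates a discounted cost of 1 + 0.5 + 0.25 = 1.75, which satisfies a budget of beta = 2 but violates beta = 1.5.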