MOMDPs are POMDPs where some parts of the state are fully observable.

Motivation

Scaling up POMDPs is really hard: the curse of dimensionality makes the belief space exponential in the size of the state space, and even discretization will cause the number of beliefs to blow up. But often some of the state isn't uncertain at all, and the rest has only bounded uncertainty: exploiting this reduces the problem's scale a lot.

Solving

Solving uses SARSOP, or any point-based solver. Instead of sampling the full belief state, however, we sample tuples (x, b_{y}), whereby x is the observable part of the state and b_{y} is a belief over the unobservable part.

How Exactly Tuple?

True Mixed Observability

Split your state space based on the truly observable part. Say there are 10 observable states: you literally just initialize 10 sets of alpha vectors to create V_{1} … V_{10} for your observable states, where for each one you have V_{x}(b_{y}) = \max_{\alpha \in \Gamma_{x}} \alpha \cdot b_{y},
whereby all of your objectives, backups, etc. take x, the observable state, as input. Then, during inference/backup, look at where you are in the observable part and use the value function for that part.

Pseudo-Full Observability

Train a fully observable model, and then take a belief-weighted average of its values during inference: Q(b, a) = \sum_{s} b(s) Q_{MDP}(s, a). This is where QMDP came from.

Bounded Uncertainty

Most of the time neither of the two cases above applies cleanly. Instead, most frequently the uncertainty in your observation is bounded to a significant degree.

Condition

For instance, your GPS may be uncertain, but if it says you are in Kansas, you are not in Shanghai. Formally, for h: O \to 2^{S} (the hypothetical "preimage" of any observation, i.e., the set of states consistent with it), we expect that \frac{|h(o)|}{|S|} \le c for every o \in O, with c \ll 1.

Solution

If we know we have bounded uncertainty, we can reparameterize our POMDP into an MDP over observations (we call this X) plus a POMDP modeling uncertainty in offsets from those observations (we call this Y), whereby our state space is now split into s \in S = X \times Y s.t. s = (x, y).
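The (x, b_{y}) decomposition above can be sketched in a few lines. Everything here (the sizes N_X and N_Y, the random Gamma sets, the value helper) is an illustrative assumption, not SARSOP's actual API:

```python
# Minimal sketch of MOMDP value evaluation with per-x alpha-vector sets.
# All names (N_X, N_Y, Gamma, value) are hypothetical, for illustration only.
import numpy as np

N_X = 3  # fully observable component: x in {0, 1, 2}
N_Y = 4  # partially observable component: belief b_y over 4 values

rng = np.random.default_rng(0)

# One set of alpha vectors per observable state x. Each alpha vector has
# length N_Y: it scores a belief over Y only, not over all of X x Y.
Gamma = {x: rng.standard_normal((5, N_Y)) for x in range(N_X)}

def value(x, b_y):
    """V(x, b_y) = max over alpha in Gamma[x] of (alpha . b_y)."""
    return float(np.max(Gamma[x] @ b_y))

b_y = np.full(N_Y, 1.0 / N_Y)  # uniform belief over the hidden part
print(value(2, b_y))
```

The payoff is visible in the shapes: because the alpha vectors live over Y alone, each per-x set is far smaller than a single set of alpha vectors over the full joint space X \times Y would be, which is exactly where the scale reduction comes from.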