We want to solve huge POMDP in the real world, but the belief states are huge. Notably, reachable beliefs are very small given an initial belief. Why is vanilla PCA bad PCA as a denoising procedure: the underlying data is some data which is normally noised. This is not strictly true, the points don’t have normal noise. Better PCA: E-PCA Instead of Euclidean distance, we use
as a metric, where: U the feature specifically:
where F is any convex objective that is problem specific that you choose, Bregman Divergence forces the underlying matricies’ bases to be non-negative Overall Methods collect sample beliefs apply the belifs into E-PCA Discretize the E-PCA’d belifs into a new state space S Recalculate R (R(b) = b \cdot R(s)) and T (we simply sample b,o and calculate update(b,a,o)) for that state space S; congratulations, you are now solving an MDP value iteration