Bellman Equation, etc., are really designed for state spaces that are discrete. However, we’d really like to be able to support continuous state spaces! Suppose we have: S \in \mathbb{R}^{n}, what can we do? Discretization We can just pretend that our system is a discrete-state MDP by chopping the state space up into small blocks. If you do it, you can cast your V back to a step function. Recall that this could start exploding: for S \in \mathbb{R}^{n} and we want to divide each axes into k values, we will get k^{n} values! Also, instead of using the same size grid for every state variable, we may use more steps on the states for which output sensitivity is higher. Using a Function Use a traditional function approximation (linear regression, neural network, etc.) as a proxy. To do this, we need a model f of the MDP such that we have s’ = f\left(s,a\right) such that s’ \sim T\left(.|s,a\right). You may also have stochastic models, namly we can add something like s’ = f\left(s,a\right) + \varepsilon where \epsilon \sim \mathcal{N} to make a more robust model. You can obtain f via data or via physics / expert design. After we have this, given a state s, call \phi\left(s\right) the features of state s. Then, we write:

\begin{equation} V\left(s\right) = \hat{\t}^{T} \phi\left(s\right) \end{equation}

Recall also, if we determinize our MDP, we have the Bellman equation as:

\begin{equation} V\left(s\right) = R\left(s\right) + \gamma \max_{a} V\left(s’\right), \text{ where } s’ = T\left(s,a\right) \end{equation}

Now we have all the pieces to perform a particular type of value iteration: Fitted Value Iteration Sample s_1, …, s_{n} \in S. Initialize parameters \theta to seed a model V_{\theta}\left(s\right) = \theta^{T} s. Repeat, for each i = 1 … n compute: y^{(i)} = R\left(s^{(i)}\right) + \gamma \max_{a} V\left(T\left(s,a\right)\right) update your model V_{\theta} as usual If your value has stochasticity, we can just run it 10 times, etc. and get a next state approximation in a monte-carlo way.

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?