Ingredients: \mathcal{P} problem (states, transitions, etc.) d depth (how many next states to look into)—more is more accurate but slower U value function estimate at depth d We essentially roll forward into all possible next states up to depth d, and tabulate our value function. Define subroutine forward_search(depth_remaining, value_function_estimate_at_d, state). if depth_remaining=0; return (action=None, utility=value_function_estimate_at_d(state)) otherwise, let best = (action = None, utility = -infinity) for each possible action at our state get an action-value for our current state where the utility of each next state is the utility given by forward_search(depth_remaining-1, value_function_estimate_at_d, next_state) if the action-value is higher than what we have, then we set best=(a, action-value) return best What this essentially does is to Dijkstra an optimal path towards the highest final utility U(s) om your current state, by trying all states.

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?