best-action worst-state is a lower bound for alpha vectors:

\begin{equation} r_{baws} = \max_{a} \sum_{k=1}^{\infty} \gamma^{k-1} \min_{s}R(s,a) \end{equation}

The alpha vector corresponding to this system would be the same r_{baws} at each slot. which should give us the highest possible reward possible given we always pick the most optimal actions while being stuck in the worst state

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?