\begin{equation} \tau = s_1 \dots s_{n} \end{equation}
\begin{equation} a \sim \pi_{\text{lm}}\left(\tau\right) \end{equation}
\begin{equation} a_{i}, \rho_{i} \sim \pi_{\text{lm}}\left(\tau \mid \rho_{i-1} \dots \rho_{1}\right) \end{equation}
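As a worked illustration, the recursion above can be unrolled for the first three steps. This is a sketch that assumes the conditioning set is empty at $i = 1$, which is not stated explicitly here; under that assumption the first step matches the unconditioned draw from $\pi_{\text{lm}}\left(\tau\right)$, except that it also emits $\rho_{1}$.
\begin{align*}
a_{1}, \rho_{1} &\sim \pi_{\text{lm}}\left(\tau\right) \\
a_{2}, \rho_{2} &\sim \pi_{\text{lm}}\left(\tau \mid \rho_{1}\right) \\
a_{3}, \rho_{3} &\sim \pi_{\text{lm}}\left(\tau \mid \rho_{2}\, \rho_{1}\right)
\end{align*}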
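\begin{itemize}
  \item ``Action selection''
  \item Tracking in $\rho$
  \item Things that can go into $\rho$:
  \begin{itemize}
    \item An explanation (``why did we do this'')
    \item A running summary of the entire $\tau$; there is a good chance this avoids needing the entire $\tau$
  \end{itemize}
  \item why don't se
  \item Threads embeddings being bad seems weird
\end{itemize}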