Some more improvements to Importance Sampling.

Cross Entropy Method

1. draw initial samples from the proposal q
2. fit a new distribution to the subset that failed: weight each sample by

\begin{equation}
w\left(\tau\right) = \frac{p\left(\tau\right) \mathbb{1}\left\{\tau \not\in \psi\right\}}{q\left(\tau\right)}
\end{equation}

Problem: what if the very first proposal produces no failures at all? Then every weight is zero, and there is nothing to fit the new distribution to.

Adaptive cross entropy method with adaptive importance sampling

Pick a notion of "distance to failure" f\left(\tau\right). We ask that f\left(\tau\right) \leq 0 for failure trajectories, so that we have p\left(\tau | \tau \not\in \psi\right) = p\left(\tau | f\left(\tau\right) \leq 0\right).

1. draw samples from the current proposal
2. compute f\left(\tau\right) for each sample and pick the top m_{\text{elite}} samples (those with the lowest f\left(\tau\right), i.e. closest to failure)
3. set a threshold \gamma to be the highest (i.e. worst) f\left(\tau\right) among the elite samples; remember to cap it at a minimum of 0, otherwise we'll start chopping off the failure region
4. compute the next proposal by minimizing the cross entropy between the proposal and p\left(\tau | f\left(\tau\right) \leq \gamma\right); we use the threshold instead of binary failure, so the indicator in the weight becomes \mathbb{1}\left\{f\left(\tau\right) \leq \gamma\right\}
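The adaptive loop can be sketched in a few lines of Python. This is only a toy illustration under assumptions that are not part of the notes: a one-dimensional "trajectory" \tau with nominal distribution p = N(0, 1), failure defined as \tau \geq 3 (so f(\tau) = 3 - \tau), and a Gaussian proposal family, for which minimizing the cross entropy reduces to a weighted mean and variance fit. The function names and default parameters are hypothetical.

```python
import math
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def distance_to_failure(tau, limit=3.0):
    """Toy f(tau): non-positive exactly when tau >= limit (assumed failure)."""
    return limit - tau


def adaptive_cross_entropy(n_iters=10, n_samples=1000, m_elite=100, seed=0):
    """Adapt a Gaussian proposal toward p(tau | f(tau) <= 0)."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0  # start from the nominal distribution p = N(0, 1)
    for _ in range(n_iters):
        # 1. draw samples from the current proposal
        tau = rng.normal(mu, sigma, size=n_samples)
        # 2. compute f and keep the m_elite samples closest to failure
        f = distance_to_failure(tau)
        elite_f = np.sort(f)[:m_elite]
        # 3. threshold = worst elite value, capped at a minimum of 0
        gamma = max(0.0, float(elite_f[-1]))
        # weights w = p * 1{f <= gamma} / q; never all zero, because the
        # elite samples always satisfy f <= gamma
        w = normal_pdf(tau, 0.0, 1.0) * (f <= gamma) / normal_pdf(tau, mu, sigma)
        # 4. weighted maximum-likelihood fit, which is the cross-entropy
        #    minimizer within the Gaussian family
        mu = float(np.sum(w * tau) / np.sum(w))
        var = float(np.sum(w * (tau - mu) ** 2) / np.sum(w))
        sigma = max(math.sqrt(var), 1e-3)  # guard against a collapsed proposal
    return mu, sigma


def estimate_failure_probability(mu, sigma, n_samples=20000, seed=1):
    """Importance sampling estimate of P(f(tau) <= 0) under p = N(0, 1)."""
    rng = np.random.default_rng(seed)
    tau = rng.normal(mu, sigma, size=n_samples)
    w = normal_pdf(tau, 0.0, 1.0) / normal_pdf(tau, mu, sigma)
    return float(np.mean(w * (distance_to_failure(tau) <= 0.0)))


if __name__ == "__main__":
    mu, sigma = adaptive_cross_entropy()
    print("proposal:", mu, sigma)
    print("estimated failure probability:", estimate_failure_probability(mu, sigma))
```

Note how the threshold handles the problem raised earlier: even when no sample in the first batch actually fails, the elite samples still receive nonzero weight, so the fit is always well defined. A single Gaussian has lighter tails than the target here, so in practice a heavier-tailed or mixture proposal may be a safer choice.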
