ICLR2025 Hu: belief state transformer Key insight: residual stream at the last token kept thought of as a belief state encoding future tokens, that is, uncertainty in the last residual directly correlate the diversity of output Method: trainer transformer and trainer reverse transformer like what Robert wanted, then correlate ICLR2025 Lingam: diversity of thoughts Key insight: Use iterative sampling to achieve higher diversity in self reflection, in order to get better outputs. ICLR2025 Gu: data selection via optimal control. Key insight: use Potryagin data selection, criteria as control target leverage optimal control, and solve for optimal data mix ICLR2025 Kim: Adam with adaptive batch selection Key insight: use bandit approaches to select a batch size at every step which maximize gradient ICLR2025 Fujimoto: General purpose, model, free reinforcement learning Key insight: learn a linear representation of latent dynamics, and do your reinforcement learning for a few steps on the learned latent instead of the actual inputs, and then re-sample dynamics after action. ICLR2025 Wang: speculative rag. Key inside: make multiple drafts and then score it using a different model before performing actual retrieval ICLR2025 Kolbeinsson: composable interventions. Intervention order influences intervention success, and composition, success; compression worsens the ability for interventions to work ICLR2025 Ouyang: Projection head is an information bottleneck Key insight: projection head if distinct from pre-training task serves as an information bottleneck because task difference ICLR2025 Xu: alignment data synthesis from scratch by prompting aligned LLMs with nothing Key insignt: roll out a bunch of samples, label them, and then fine tune on it