ICLR2025 Morris: contextual document embeddings. Take a batch of sentence embeddings as input and produce new sentence embeddings that are now contextual.
ICLR2025 Noukhovitch: asynchronous reinforcement learning for language models. Generate rollouts and train concurrently.
ICLR2025 Yao: CR-CTC, consistency-regularized CTC. CTC loss becomes more robust if you regularize two augmented views of the same mel spectrogram to produce minimally different outputs.
ICLR2025 Sun: ReDeEP, detecting hallucination using mechanistic interpretability. Find the layers most prone to inserting information; measure information insertion with the logit lens before and after the FFN. A strong change after a hallucination-prone FFN signals hallucination.
ICLR2025 Fu: CHiP. For multimodal preference optimization, combine four preference-loss terms of varying types to get the best results.
ICLR2025 Faysse: ColPali. For RAG, embed page images of the documents instead of the extracted text.
ICLR2025 Liu: DeLLMa. Key insight: frame the decision problem as a POMDP by asking the language model to produce value judgments, normalizing them, and running standard value iteration.
ICLR2025 Wijmans: cut your losses in large-vocabulary language models. Instead of materializing the full logit matrix, which is memory-intensive, a trick computes the loss without ever storing the entire output projection's activations in memory.
ICLR2025 Gao: regressing the relative future. Solve multi-turn RLHF by regressing the policy's Q-value and optimizing it over discounted future returns.
ICLR2025 Xiao: SimPER, preference alignment without hyperparameters. Remove the log term of DPO, thereby removing the beta hyperparameter it requires.
ICLR2025 Xiong: from tokens to lattices. Masked language models learn conditional relationships between tokens of the same entity, thereby implicitly creating a graph.
ICLR2025 Pagliardini: AdEMAMix. Key insight: Adam with two EMAs (two different betas), so gradient information from many past steps is used; gives stabler and faster convergence.
ICLR2025 Fan: loop transformers for length generalization. Key insight: UT-like approaches with loops generalize better for tasks of a specific kind.
ICLR2025 Lee: multiple non-asymptotic rates for value iteration. Key insight: anchoring with the initial value estimate speeds up value iteration; that is, V_t = a_t V_0 + (1 - a_t) T V_{t-1}.
ICLR2025 Liu: linear combination of saved checkpoints makes diffusion and consistency models better. Key insight: as titled; use evolutionary search to figure out the best mixture of weights over checkpoints.
ICLR2025 Ramapuram: theory, analysis, and best practices for sigmoid self-attention. Key insight: sigmoid self-attention reduces all-gather costs, and they have a bunch of tricks to make it work.
ICLR2025 Sun: block verification accelerates speculative decoding. Key insight: when using a small language model to speculatively decode a large language model, evaluate likelihoods a block at a time instead of token by token.
ICLR2025 Chang: scalable influence and fact tracing. Key insight: do attribution using a normalized gradient dot product between training examples and outputs.
ICLR2025 Hu: how to visualize training dynamics. Key insight: take whatever summary statistics you have for each checkpoint and run classical dimensionality reduction on them, such as PCA.
ICLR2025 Addepalli: does safety training of LLMs generalize to semantically related prompts? Key insight: take some jailbreak that doesn't work anymore, make a semantic perturbation of it, and check whether it works. Often, it does.
ICLR2025 Georgiev: attribute-to-delete. Key insight: learn a datamodel that predicts which pieces of pretraining data are relevant to a given output; use it to construct the counterfactual of what the correctly unlearned output would be, then fine-tune against that.
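The anchored value-iteration update from the Lee entry, as a toy NumPy sketch for policy evaluation. The 1/(k+1) anchor schedule here is illustrative, not necessarily the paper's optimal one:

```python
import numpy as np

# Toy policy-evaluation problem: T V = r + gamma * P V is a gamma-contraction.
gamma = 0.9
P = np.array([[0.8, 0.2], [0.3, 0.7]])   # transitions under a fixed policy
r = np.array([1.0, 0.0])                 # per-state rewards

def bellman(V):
    # Bellman operator for policy evaluation
    return r + gamma * P @ V

V0 = np.zeros(2)
V = V0.copy()
for k in range(1, 1001):
    a = 1.0 / (k + 1)                    # anchor weight, decaying to zero
    V = a * V0 + (1 - a) * bellman(V)    # V_t = a_t V_0 + (1 - a_t) T V_{t-1}

V_star = np.linalg.solve(np.eye(2) - gamma * P, r)   # exact fixed point
```

As the anchor weight decays, the iterate converges to the same fixed point as plain value iteration.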
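The memory idea behind the Wijmans entry can be sketched in NumPy: accumulate the logsumexp over the vocabulary in chunks, so the full (N, V) logit matrix is never materialized. The real method is a fused GPU kernel; this only shows the streaming arithmetic:

```python
import numpy as np

def chunked_cross_entropy(H, W, targets, chunk=1024):
    """Cross-entropy without materializing the full (N, V) logit matrix.

    H: (N, d) hidden states, W: (V, d) output projection, targets: (N,) ints.
    The logsumexp over the vocabulary is accumulated chunk by chunk, so peak
    activation memory is O(N * chunk) instead of O(N * V).
    """
    N, V = H.shape[0], W.shape[0]
    running_max = np.full(N, -np.inf)
    running_sum = np.zeros(N)
    for start in range(0, V, chunk):
        logits = H @ W[start:start + chunk].T              # (N, chunk) only
        m = np.maximum(running_max, logits.max(axis=1))    # streaming max
        running_sum = running_sum * np.exp(running_max - m) + \
                      np.exp(logits - m[:, None]).sum(axis=1)
        running_max = m
    lse = running_max + np.log(running_sum)
    # The target logits need just one dot product per example.
    target_logits = np.einsum("nd,nd->n", H, W[targets])
    return float(np.mean(lse - target_logits))
```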
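The Xiao entry (SimPER) drops DPO's log-sigmoid and beta; as I understand it, the objective reduces to an inverse-perplexity gap between the chosen and rejected responses. A minimal sketch under that reading:

```python
import numpy as np

def simper_loss(logp_chosen, logp_rejected):
    """Hyperparameter-free preference loss (SimPER-style sketch).

    Each argument holds per-token log-probabilities of one response under the
    policy. Inverse perplexity = exp(mean log-prob); the loss rewards the
    chosen response's inverse perplexity and penalizes the rejected one's.
    No reference model, no beta, no log-sigmoid.
    """
    inv_ppl_chosen = np.exp(np.mean(logp_chosen))
    inv_ppl_rejected = np.exp(np.mean(logp_rejected))
    return float(inv_ppl_rejected - inv_ppl_chosen)
```

The loss is negative exactly when the chosen response is the more likely one, so gradient descent pushes probability mass toward it with no tunable temperature.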
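The Pagliardini entry (AdEMAMix) mixes a fast and a slow gradient EMA. A single-step NumPy sketch, omitting the paper's alpha/beta3 warmup schedulers; the default values below are illustrative:

```python
import numpy as np

def ademamix_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999,
                  b3=0.9999, alpha=5.0, eps=1e-8):
    """One AdEMAMix-style update (sketch; no alpha/beta3 schedulers).

    Two EMAs of the gradient: a fast one (b1, bias-corrected as in Adam) and
    a slow one (b3) that retains information from many past steps; their mix
    is preconditioned by the usual second-moment estimate.
    """
    state["t"] += 1
    t = state["t"]
    state["m1"] = b1 * state["m1"] + (1 - b1) * grad        # fast EMA
    state["m2"] = b3 * state["m2"] + (1 - b3) * grad        # slow EMA
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2     # second moment
    m1_hat = state["m1"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return theta - lr * (m1_hat + alpha * state["m2"]) / (np.sqrt(v_hat) + eps)
```

Running it on a 1-D quadratic (grad = 2 * theta) drives theta toward the minimum, like Adam but with the extra slow-EMA term in the numerator.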
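The attribution idea in the Chang entry, reduced to its core (the paper adds further corrections on top): score each training example by the normalized dot product between its gradient and the query output's gradient.

```python
import numpy as np

def influence_scores(train_grads, query_grad):
    """Attribute an output to training examples via normalized gradient
    dot products, i.e. cosine similarity of per-example gradients.

    train_grads: (N, d) flattened gradients, one row per training example.
    query_grad:  (d,) gradient of the model output being traced.
    """
    G = train_grads / np.linalg.norm(train_grads, axis=1, keepdims=True)
    q = query_grad / np.linalg.norm(query_grad)
    return G @ q   # (N,) influence scores in [-1, 1]
```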