ACL 2025 Paper Notes
11 papers reviewed.
ACL2025 Huang: Making in Multi-Hop QA
Question: can we find a good context permutation to improve reasoning capabilities?

One-Liner / Notable Methods
- Two key evaluations:
  - evaluating relationships between gold documents; performance degrades with distance between documents (but fine-tuning helps)
  - investigating the effects of different attention masks (i.e., prefix vs. continuation masks)
- IC Score: an attention-based context attribution method

New Concepts
- Key insight: correct answers will have a single peak of IC scores at go...
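A rough numpy sketch of the single-peak check described above. The exact IC score definition is the paper's; here I just assume it aggregates answer-token attention mass per context document, which is my guess, not the paper's formula, and the peak test and function names are made up:

```python
import numpy as np

def ic_scores(attn, doc_spans):
    # attn: (n_answer_tokens, n_context_tokens) attention weights.
    # doc_spans: list of (start, end) token ranges, one per context document.
    # Assumed attribution: total attention mass each document receives.
    return np.array([attn[:, s:e].sum() for s, e in doc_spans])

def has_single_peak(scores, ratio=2.0):
    # Crude check: the top document dominates the runner-up by some ratio.
    top2 = np.sort(scores)[-2:]
    return bool(top2[1] >= ratio * top2[0])
```

With a toy attention matrix whose mass sits on the second document's span, the scores peak there; a flat attention matrix yields no peak.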
ACL2025 Index
Talks
- ACL2025 Keynote: Luke Zettlemoyer
- ACL2025 Orals: Language Modeling 1
- ACL2025 Orals: QA

Posters
- ACL2025 Monday Morning Posters
- ACL2025 Tuesday Morning Posters
- ACL2025 Tuesday Afternoon Posters

Takes
- mayhaps we can apply the thoughtbubbles intuition to BLT token pruning?
ACL2025 Keynote: Luke Zettlemoyer
Naively: “almost everything comes from pretraining.” How much simple supervision will radically change the behavior of our language model?

Key Directions
- data long-tail: tokenizer-free LLMs
- data modules: how do we specialize quickly?

Tokenizer-Free LMs
- Byte-level LMs are just more expensive (i.e., there are just a bunch more residual streams, and that’s pretty bad).
- High-level intuition: take the input bytes, create some “strides”/“patches”, and then se...
ACL2025 Li: TokAlign Token Alignment
Method to adapt tokenization across models.

Notable Methods
- use pairwise cosine similarity between token embeddings to create an alignment grid
- initialize each new token id’s embedding from its most similar source token
- tune
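The alignment step above, as a minimal numpy sketch. The plain argmax-over-a-cosine-grid choice and all function names are mine, not necessarily TokAlign’s exact procedure:

```python
import numpy as np

def align_vocabs(src_emb, tgt_emb):
    # Normalize rows so the dot product is cosine similarity.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = tgt @ src.T             # (|V_tgt|, |V_src|) similarity grid
    return sim.argmax(axis=1)     # each target id -> most similar source id

def init_target_embeddings(src_emb, alignment):
    # Warm-start the adapted embedding table by copying the aligned
    # source rows; fine-tuning ("tune") happens from this initialization.
    return src_emb[alignment].copy()
```

On a toy 2-token vocabulary whose target embeddings are noisy copies of the source ones, each target id aligns to its source counterpart.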
ACL2025 Monday Morning Posters
ACL2025 Zhang: FaithfulRAG: fact-level conflict modeling
- Key insight: RAG performance degrades when the model’s context and parametric knowledge mismatch; identify those conflicts and use a three-step iterative method to improve context faithfulness.

ACL2025 Ding: LLM reasoning capability via scalable question synthesis
- Key insight: generate free-form questions conditioned only on BOS, then distill and DPO to get a nice question-generation dataset and directly fine-tune.

ACL2025 Wen: synthetic data strategy ...
ACL2025 Orals: Efficient NLP
ACL2025 Orals: Language Modeling 1
- ACL2025 Li: TokAlign Token Alignment
- ACL2025 Pagnoni: Patches Scale Better Than Tokens
ACL2025 Orals: QA
ACL2025 Huang: Making in Multi-Hop QA
ACL2025 Pagnoni: Patches Scale Better Than Tokens
One-Liner
- “Patches in groups of tokenization scale better than tokens”

Motivation / Novelty
- typical byte-level LMs are very expensive because of many more tokens
- it’s hard to go beyond 4-6 bytes per token: Zipf’s Law
- so, we model them as token patches

Notable Methods
- token patch: “how do we segment the byte sequence into patches?” Insight: group predictable tokens after every hard choice! i.e., once you train a model, there are “obvious” patcher...
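The “group predictable bytes after every hard choice” idea reads like entropy-thresholded segmentation; a toy sketch under that assumption (the threshold value and function names are made up, not the paper’s):

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits, per row; clip avoids log(0).
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log2(p)).sum(-1)

def segment_into_patches(next_byte_probs, threshold=1.0):
    # next_byte_probs: (seq_len, 256) predicted distribution before each byte.
    # Start a new patch whenever the model is uncertain (a "hard choice");
    # predictable bytes get grouped into the current patch.
    H = entropy(next_byte_probs)
    boundaries = [0]
    for i in range(1, len(H)):
        if H[i] > threshold:
            boundaries.append(i)
    return boundaries
```

On a toy sequence where positions 0 and 3 are uniform (8 bits of entropy) and the rest are one-hot (near-zero entropy), the patches start at those uncertain positions.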
ACL2025 Tuesday Morning Posters
ACL2025 Katz: segment-based attention masking
- Key insight: allow bidirectional attention

ACL2025 Mondorf: exploring modular structures in transformer-based language models
- Key insight: learn circuit compositions by learning a binary mask for both faithfulness and sparsity

ACL2025 Li: some more samples of next-token prediction
- Key insight: when there’s a large gap between generation probability and ground truth, intervening on those samples causes a more dramatic effect

ACL2025 Kim...
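The binary-mask-over-circuits idea above could plausibly be relaxed to sigmoid gates trained with an L1 sparsity penalty alongside a task (faithfulness) loss; a toy numpy sketch under that assumption (the relaxation choice and all names are mine, not the paper’s method):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_forward(component_outputs, mask_logits):
    # component_outputs: (n_components, d) contributions (e.g. per-head outputs).
    # Relax the binary mask to sigmoid gates so it can be learned by SGD,
    # then threshold the gates after training to recover a discrete circuit.
    gates = sigmoid(mask_logits)
    return (gates[:, None] * component_outputs).sum(0)

def sparsity_penalty(mask_logits, lam=0.01):
    # L1 on the gates pushes most components toward "off" (sparsity);
    # the task loss on the masked output is what enforces faithfulness.
    return lam * sigmoid(mask_logits).sum()
```

With strongly positive/negative logits the gates saturate to roughly 1 and 0, so the masked output keeps only the "on" component's contribution.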