## Updates
- Dropping dropout: https://proceedings.mlr.press/v202/geiping23a/geiping23a.pdf
- CAW-coref: revised! Do we need more space for things such as a figure?
- Stanza 1.9.0 staged! https://huggingface.co/stanfordnlp/stanza-en
- Yay, MEND works!
- `.mean()` vs. `.sum()` for the dW maps? (see the sketch at the end of this note)

## PPL Isn't the Only Possible Metric
Even if our model has better perplexity, it's worse at SQuAD than Facebook's (granted, it has been trained a lot less); will rerun with the new pretraining model (expect that no dropout will be better; see the paper above).

## Pretraining Updates
Smaller BERT; which dataset? https://wandb.ai/jemoka/dropfree?nw=nwuserjemoka

## Binary Masking with the Pretraining Above
|  | Our BERT (No Dropout) | Our BERT (Dropout) |
| --- | --- | --- |
| edit success | 0.9709 | 0.9723 |
| edit localization | 0.8375 | 0.8452 |
| mean activations | 3853 | 22511 |

Yay! (More seriously:)

## Question

## Paper Plan
### Part 1: Skipping dropout isn't bad, and may even be good
- pretraining
- SQuAD

### Part 3: Empirics: dropout has knowledge-storage consequences
- knowledge neurons
- integrated gradients
- binary masking

### Part 4: Impact: look, editing is easier without dropout (no data yet)
- consistency (this is weak, hence theory may be helpful, or we can skip)
- MEND (just worked, yay!)
- finetune (echo to SQuAD)
- LoRA
- ReFT: slowly reduce the update rank, see how edit success drops
- (x verbs y): for each, train / (MEND: infer) on 90%, test on the other 10%, see if it works ("correct" relative to the {IID} example)

## Causal Interventions
Shall we? A simple exchange interaction? How does it work for a BERT, and does it fit into this story? (Ottawa capital \<mask\>) => (DC capital \<mask\>) (see the sketch below)
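As a starting point for the causal-intervention question, here is a minimal interchange-intervention sketch for what the note calls a "simple exchange interaction" on a masked LM. It assumes a HuggingFace `bert-base-uncased` checkpoint; the layer index, prompts, and expected answers are placeholder choices, not results.

```python
# Minimal interchange-intervention sketch (assumptions: bert-base-uncased,
# patch at encoder layer 6, single-token answers at the [MASK] position).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

base = "The capital of Canada is [MASK]."    # base run: expect "ottawa"
source = "The capital of the US is [MASK]."  # source run: expect "washington"
LAYER = 6

def run(text, patch_vec=None):
    """Forward pass; optionally overwrite the [MASK] hidden state at LAYER's output."""
    enc = tok(text, return_tensors="pt")
    mask_pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1].item()
    handle = None
    if patch_vec is not None:
        def hook(module, inputs, output):
            hidden = output[0].clone()
            hidden[0, mask_pos] = patch_vec      # the interchange: source -> base
            return (hidden,) + output[1:]
        handle = model.bert.encoder.layer[LAYER].register_forward_hook(hook)
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    if handle is not None:
        handle.remove()
    # hidden_states[LAYER + 1] is the output of encoder layer LAYER
    return out.logits[0, mask_pos], out.hidden_states[LAYER + 1][0, mask_pos]

_, source_act = run(source)                      # cache the source [MASK] activation
patched_logits, _ = run(base, patch_vec=source_act)
print("patched prediction:", tok.decode([patched_logits.argmax().item()]))
```

If the patched prediction moves toward "washington", the [MASK] representation at that layer carries the answer; sweeping LAYER would localize where, which could tell us whether this fits the story.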

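On the `.mean()` vs. `.sum()` question above, a toy sketch of how the two aggregations behave when collapsing a per-weight |dW| map into per-neuron scores. The shapes and the `dim=1` (fan-in) reduction are assumptions about how the maps are laid out, not our actual pipeline.

```python
# Toy comparison of .mean() vs. .sum() for collapsing a |dW| map to per-neuron
# scores (assumed layout: dW has shape [out_features, in_features]).
import torch

torch.manual_seed(0)
dW_attn = torch.randn(768, 768).abs()   # e.g. an attention output projection
dW_ffn = torch.randn(768, 3072).abs()   # e.g. an FFN down-projection (wider fan-in)

for name, dW in [("attn (768-in)", dW_attn), ("ffn (3072-in)", dW_ffn)]:
    score_sum = dW.sum(dim=1)    # grows with fan-in: wider layers dominate rankings
    score_mean = dW.mean(dim=1)  # fan-in-normalized: comparable across layer widths
    print(f"{name}: sum={score_sum.mean().item():.1f}  mean={score_mean.mean().item():.3f}")
```

Under this layout, `.sum()` makes wider layers look more important simply because they have more weights, while `.mean()` normalizes by fan-in; which one is right depends on whether total update mass or per-weight change is the quantity we care about.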