ICLR2025 HAIC — Jemoka Knowledge Base

ICLR2025 Koyejo

Proposal: focus AI measurement on the validity of specific terms. Five pillars of claim making:
- content validity: does your evaluation cover all valuable cases?
- criterion validity: does your evaluation correlate with a known, validated standard?
- construct validity: does your evaluation measure the intended construct?
- external validity: does your evaluation generalize across different environments or settings?
- consequential validity: does your evaluation consider the real-world impact of test interpretation and use?

Open problem: validity of measurement for claims about HAIC.

ICLR2025 Evans: AI Diversity NOT Alignment for Sustained Innovation in Human-AI Evolution

When AI systems align with user values, users rank them as more helpful. Good. For unpredictable systems, the best approach is to build in checks and balances plus diverse systems. “finding ways [to] honor and value big-bad failures, to build objectives”

ICLR2025 Laidlaw: Scalable Assistance Games
- fix a human model learned from data
- learn a model: AssistanceZero

AssistanceZero: a multi-agent environment to solve factored POMDPs while a human agent is doing something.

ICLR2025 Musaffar: Learning to Lie: Adversarial Attacks Driven by Reinforcement Learning Damage Human-AI Teams and LLMs
- RL-driven attacks are effective at tricking humans
- chain-of-thought models are more sensitive to attacks
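Koyejo's criterion-validity question ("does your evaluation correlate with a known validated standard?") can be made concrete as a correlation check. The sketch below is illustrative only: it assumes Pearson correlation as the agreement measure, a hypothetical `criterion_validity_check` helper, an arbitrary threshold of 0.7, and made-up scores for five systems; none of these come from the talk.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)

def criterion_validity_check(eval_scores, reference_scores, threshold=0.7):
    """Hypothetical check: does a new evaluation track a validated reference?

    The 0.7 threshold is an arbitrary assumption for illustration.
    """
    r = pearson_r(eval_scores, reference_scores)
    return r, r >= threshold

# Made-up scores for five systems (illustrative, not data from the talk).
new_benchmark = [0.62, 0.71, 0.55, 0.80, 0.68]
expert_ratings = [3.1, 3.6, 2.8, 4.2, 3.3]  # stand-in for a validated standard

r, passes = criterion_validity_check(new_benchmark, expert_ratings)
print(f"r = {r:.3f}, meets threshold: {passes}")
```

A high correlation here only speaks to criterion validity; it says nothing about whether the benchmark covers all valuable cases (content validity) or generalizes across settings (external validity).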
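For the Laidlaw talk, the step "fix a human model learned from data" can be pictured with a textbook assistance-game belief update. This is a minimal sketch and not AssistanceZero (which the notes describe as learning a model for factored POMDPs): it assumes a Boltzmann-rational human as the fixed human model, a one-step (myopic) assistant, and hypothetical goal names, Q-values, reward tables, and `beta`.

```python
import math

def boltzmann_likelihood(q_values, action, beta):
    """P(action | goal) for a Boltzmann-rational human: softmax of beta * Q."""
    z = sum(math.exp(beta * q) for q in q_values.values())
    return math.exp(beta * q_values[action]) / z

def update_belief(prior, observed_actions, human_q, beta=2.0):
    """Bayesian posterior over the human's hidden goal, given observed actions."""
    post = dict(prior)
    for a in observed_actions:
        for g in post:
            post[g] *= boltzmann_likelihood(human_q[g], a, beta)
        total = sum(post.values())
        post = {g: p / total for g, p in post.items()}
    return post

def best_assist_action(belief, assist_reward):
    """Assistant action with the highest expected reward under the belief."""
    actions = next(iter(assist_reward.values())).keys()
    return max(actions, key=lambda h: sum(belief[g] * assist_reward[g][h] for g in belief))

# Hypothetical fixed human model: Q-values for each (hidden goal, human action).
HUMAN_Q = {
    "house": {"place_wall": 1.0, "place_column": 0.0},
    "tower": {"place_wall": 0.0, "place_column": 1.0},
}
# Hypothetical reward the assistant earns for each helper action under each goal.
ASSIST_REWARD = {
    "house": {"fetch_planks": 1.0, "fetch_stone": 0.0},
    "tower": {"fetch_planks": 0.0, "fetch_stone": 1.0},
}

prior = {"house": 0.5, "tower": 0.5}
belief = update_belief(prior, ["place_column", "place_column"], HUMAN_Q)
choice = best_assist_action(belief, ASSIST_REWARD)
print(belief, choice)
```

Keeping the human model fixed means the assistant only has to infer the goal and plan around it; the scalable methods discussed in the talk replace these lookup tables and the one-step choice with learned models and planning.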