llm physical intuition
an empirical research question: can LLMs reason usefully about physical space and physical intuition? the motivation: models are trained predominantly on text, so they have absorbed physics as described in words but not the grounded spatial intuitions that come from actually interacting with the world. ask an LLM "if you drop a hammer off a table, where does it land relative to a feather dropped simultaneously?" and the answer may well be correct; ask "picture a room with three doors — which is closest to you if you walk in from the north?" and gaps start to show.
the research direction would be: construct a benchmark of spatial/physical reasoning tasks ranging from trivial to genuinely hard, evaluate current models, and analyze the failure modes. do models fail because they lack a 3D spatial representation, because they have no sense of scale, or because physical reasoning requires composing multiple steps whose errors compound? the findings could inform both how to prompt models for physical reasoning tasks (robotics control, architecture, simulation) and what data or training modifications would help.
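one nice property of spatial tasks is that ground truth can be computed geometrically rather than labeled by hand, so benchmark items can be generated procedurally and scored exactly. a minimal sketch of what one item generator might look like (all names here — `SpatialItem`, `make_nearest_object_item`, the room layout — are hypothetical, not from any existing benchmark):

```python
import random
from dataclasses import dataclass

@dataclass
class SpatialItem:
    prompt: str   # natural-language question posed to the model
    answer: str   # ground-truth answer, computed geometrically

def make_nearest_object_item(rng: random.Random) -> SpatialItem:
    """generate one 'which object is closest?' item with exact ground truth."""
    names = ["red door", "blue door", "green door"]
    # entrance at the middle of the north wall of a 10x10 room;
    # coordinates are (east, south) in meters, so y grows away from the entrance
    entrance = (5.0, 0.0)
    positions = {n: (rng.uniform(0, 10), rng.uniform(1, 10)) for n in names}

    def dist(p):
        return ((p[0] - entrance[0]) ** 2 + (p[1] - entrance[1]) ** 2) ** 0.5

    # ground truth: the door with the smallest euclidean distance to the entrance
    answer = min(names, key=lambda n: dist(positions[n]))
    desc = "; ".join(f"the {n} is at ({x:.1f}, {y:.1f})"
                     for n, (x, y) in positions.items())
    prompt = (f"you enter a 10x10 room from the north at (5, 0). {desc}. "
              f"coordinates are (east, south) in meters. "
              f"which door is closest to you?")
    return SpatialItem(prompt=prompt, answer=answer)

def score(item: SpatialItem, model_answer: str) -> bool:
    """crude exact-match scoring: does the model name the correct door?"""
    return item.answer in model_answer.lower()
```

varying the generator parameters (number of objects, room size, 2D vs 3D, relative vs absolute coordinates) gives the trivial-to-hard difficulty ramp, and because each item carries its own geometry, failure modes can be binned automatically (e.g. errors that grow with object count suggest compounding multi-step reasoning rather than a missing sense of scale).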
related: LLM behavior improvement, flapping airplanes, robotic arm assistant, EMG bracelet, symbolic regression