symbolic regression

symbolic regression is the problem of discovering a mathematical expression that fits a dataset: not just fitting parameters, but finding the functional form itself. instead of saying "the relationship is linear, fit the slope," it asks "what is the relationship?" and outputs something like f(x) = 3x² + sin(x/2). this is scientifically interesting because the output is a closed-form expression a person can read, so it can recover interpretable physical laws from data. a neural network, by contrast, approximates the mapping without exposing a compact formula. PySR is a widely used modern tool; it runs an evolutionary search over expression trees and has been used to recover known physics equations from data.
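
to make "search over expression trees" concrete, here is a minimal sketch. it is not PySR and it is not evolutionary: it brute-force enumerates every tree up to a small depth and keeps the one with the lowest error, breaking ties by size. the grammar (leaves x, 1, 2; operators sin, +, *), the function names, and the target x*x + sin(x) are all assumptions chosen to keep the example tiny.

```python
import itertools
import math

# Hypothetical toy grammar, chosen only for illustration.
LEAVES = ["x", 1.0, 2.0]
UNARY = {"sin": math.sin}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def enumerate_trees(depth):
    """Return every expression tree of depth at most `depth` as nested tuples."""
    if depth == 0:
        return list(LEAVES)
    smaller = enumerate_trees(depth - 1)
    trees = list(LEAVES)
    trees += [(op, t) for op in UNARY for t in smaller]
    trees += [(op, a, b) for op in BINARY
              for a, b in itertools.product(smaller, repeat=2)]
    return trees


def evaluate(tree, x):
    """Evaluate a tree at a single input value x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    if len(tree) == 2:
        return UNARY[tree[0]](evaluate(tree[1], x))
    return BINARY[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def size(tree):
    """Number of nodes in the tree, used as a simple complexity measure."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


def to_string(tree):
    """Render a tree as a readable infix string."""
    if not isinstance(tree, tuple):
        return str(tree)
    if len(tree) == 2:
        return f"{tree[0]}({to_string(tree[1])})"
    return f"({to_string(tree[1])} {tree[0]} {to_string(tree[2])})"


def fit(xs, ys, depth=2):
    """Pick the tree with the lowest mean squared error, then the fewest nodes."""
    def mse(tree):
        return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return min(enumerate_trees(depth), key=lambda t: (mse(t), size(t)))


# Data generated from a known law, so we can check that it is recovered.
xs = [i * 0.25 for i in range(-12, 13)]
ys = [x * x + math.sin(x) for x in xs]
best = fit(xs, ys)
print(to_string(best))
```

exhaustive enumeration only works here because the grammar is tiny (1179 trees at depth 2). real tools replace it with a guided search and also fit numeric constants, which this sketch does not attempt.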

the research angle is about what kinds of hidden structure can be discovered this way. the interesting applications are finding simplified approximations for complex ML models (symbolic distillation), discovering conservation laws in physical simulations, and compressing expensive neural network components into cheap algebraic forms. there's also a connection to mechanistic interpretability: understanding what a neural network is computing by finding a symbolic approximation of its behavior. the challenge is that the search space is enormous: the number of candidate expressions grows very quickly with expression depth and with the number of input variables, so most symbolic regression methods scale poorly to high-dimensional inputs or complex expressions.
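
a rough way to see how enormous the search space gets is to count syntactic expression trees. the sketch below assumes a grammar with a fixed number of leaves (input variables plus constants), unary operators, and binary operators; the function name and the specific counts are illustrative assumptions. the number of trees of depth at most d then satisfies N(d) = leaves + unary * N(d-1) + binary * N(d-1)^2, with N(0) = leaves.

```python
def count_trees(depth, n_leaves, n_unary, n_binary):
    """Count expression trees of depth at most `depth` for a grammar of the given size."""
    count = n_leaves  # depth 0: a tree is a single leaf
    for _ in range(depth):
        # A tree is a leaf, a unary op over a shallower tree,
        # or a binary op over an ordered pair of shallower trees.
        count = n_leaves + n_unary * count + n_binary * count * count
    return count


# Hypothetical grammar: 2 leaves (x and one constant), 2 unary ops, 3 binary ops.
for d in range(5):
    print(d, count_trees(d, n_leaves=2, n_unary=2, n_binary=3))
```

with this grammar the count goes 2, 18, 1010, 3062322 for depths 0 through 3, roughly squaring at every level. the count treats equivalent expressions such as x + 1 and 1 + x as distinct, so it overstates the number of distinct functions, but the growth rate is the point. adding input variables increases n_leaves, which feeds into every level of the recurrence.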