constituents

m data points \left(x_{i}, y_{i}\right)

requirements

we desire coefficients c_{j} such that:

\begin{equation} y = c_1 + c_{2} x + c_3 x^{2} + \dots \end{equation}

Given our set of basis functions \phi_{j}(x) for input x, our goal is:

\begin{equation} y = c_1 \phi_{1} + c_2 \phi_{2} + \dots + c_{n}\phi_{n} \end{equation}
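In matrix form this becomes a least-squares problem in the coefficients. A minimal NumPy sketch, where the data values and the choice of monomials as the \phi_{j} are made-up examples:

```python
import numpy as np

# Hypothetical example: fit y = c1*phi_1 + c2*phi_2 + c3*phi_3 with
# basis functions phi_1 = 1, phi_2 = x, phi_3 = x^2 (monomials).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 0.5 * x - 0.25 * x**2          # data generated from known c_j

# Design matrix: column j holds phi_j evaluated at every data point.
Phi = np.column_stack([np.ones_like(x), x, x**2])

# With more data points than coefficients, solve in the least-squares sense.
c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(c)  # recovers [2.0, 0.5, -0.25] up to rounding
```

With more points than coefficients the system is overdetermined, so least squares is the natural formulation; with exactly as many points as coefficients it reduces to the square solve below.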

The \phi_{j} are the model functions; the choice of \phi_{j} determines our model.

additional information

Monomial basis and Vandermonde matrix

To do this, we put the problem in the following matrix form, called the matrix of the monomial basis:

\begin{equation} \mqty(1 & x_1 & x_1^{2} \\ 1 & x_2 & x_2^{2} \\ 1 & x_3 & x_3^{2}) \mqty(c_1 \\ c_2 \\ c_3) = \mqty(y_1 \\ y_2 \\ y_3) \end{equation}
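A sketch of this solve with NumPy, whose `np.vander` builds exactly this matrix; the data points are invented:

```python
import numpy as np

# Three data points determine a unique quadratic (assuming distinct x_i).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 10.0])           # happens to satisfy y = 1 + x^2

# np.vander with increasing=True gives rows [1, x_i, x_i^2],
# matching the monomial-basis matrix above.
V = np.vander(x, increasing=True)
c = np.linalg.solve(V, y)                # "inverting" the matrix
print(c)  # [1.0, 0.0, 1.0], i.e. y = 1 + 0*x + 1*x^2
```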

Inverting this matrix gives us the answer; this matrix is the Vandermonde matrix. See problem with Vandermonde matrix.

Lagrange Basis

\begin{equation} \phi_{k}\left(x\right) = \frac{\prod_{i \neq k} \left(x - x_{i}\right)}{\prod_{i \neq k} \left(x_{k} - x_{i}\right)} \end{equation}

Notice this gives A as the identity, since \phi_{k}(x_{i}) = 1 when i = k and 0 otherwise, so the coefficients are simply c_{k} = y_{k}. Evaluation time is more expensive, though, because you have to do all the multiplications in sequence. Unlike the Vandermonde matrix, this basis has no ill-conditioning problem. However, every term is now quadratic (for three points, each basis function has degree two).

Newton Basis

\begin{equation} \phi_{k}\left(x\right) = \prod_{i=1}^{k-1} \left(x - x_{i}\right) \end{equation}

The first entry is 1 (the empty product), the second entry is linear, the third quadratic, and so on. Up to k = 3, only one term is quadratic, and evaluated at the data points the matrix A is lower triangular.

overfitting

For m data points, you can draw a unique degree-(m-1) polynomial which fits the points exactly, as long as the points are not degenerate: no two points may share the same x value. Because such an exact fit can overfit, we perform regularization.

problem with Vandermonde matrix

At higher powers, the columns of squared (and higher) terms tend to become more parallel. This is bad because small adjustments to the target then require humongous parameter values, and if two columns of the matrix are parallel, the rank is at most n-1, meaning the columns do not span \mathbb{R}^{n}, so we may not have a solution: some target outputs in \mathbb{R}^{n} cannot be hit, while others can be hit in many ways by trading off the parallel vectors. In general, if any columns become linearly dependent, they may be combined in an infinite number of ways; that is, we want our Vandermonde matrix to have full rank.

near-singular matrix problem

Importantly, even if the matrix is merely close to being singular, we will have this problem. With limited precision, we struggle when columns, or linear combinations of columns, are too close to parallel: the matrix may not be computationally invertible. Condition numbers can help judge how close our matrix is to being non-invertible.
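The identity and lower-triangular structure of the Lagrange and Newton bases can be checked numerically. A sketch with NumPy, using made-up nodes:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])       # made-up interpolation nodes
m = len(x)

# Lagrange basis: phi_k(t) = prod_{i != k} (t - x_i) / prod_{i != k} (x_k - x_i)
def lagrange_phi(k, t):
    others = [i for i in range(m) if i != k]
    return (np.prod([t - x[i] for i in others])
            / np.prod([x[k] - x[i] for i in others]))

# Evaluating every basis function at every node gives the identity matrix,
# so the coefficients are read off directly: c_k = y_k.
A_lagrange = np.array([[lagrange_phi(k, t) for k in range(m)] for t in x])
ok_lagrange = np.allclose(A_lagrange, np.eye(m))

# Newton basis: phi_k(t) = prod_{i=1}^{k-1} (t - x_i); the first basis
# function is the empty product, i.e. 1. At the nodes, A is lower triangular.
def newton_phi(k, t):                    # k is 0-based here
    return np.prod([t - x[i] for i in range(k)])

A_newton = np.array([[newton_phi(k, t) for k in range(m)] for t in x])
ok_newton = np.allclose(A_newton, np.tril(A_newton))

print(ok_lagrange, ok_newton)  # True True
```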
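The conditioning problem can be observed directly by computing condition numbers of growing Vandermonde matrices. The regularization shown is ridge (Tikhonov), an assumed choice since the note does not name a specific method:

```python
import numpy as np

# Condition number of the Vandermonde matrix as the number of points grows.
conds = []
for n in (5, 10, 15, 20):
    t = np.linspace(0.0, 1.0, n)
    conds.append(np.linalg.cond(np.vander(t, increasing=True)))
print(conds)
# Near-parallel columns drive the condition number up rapidly, so the
# solve becomes fragile long before the matrix is exactly singular.

# Ridge (Tikhonov) regularization -- an assumption, not the note's stated
# method -- penalizes huge coefficient values: min ||Vc - y||^2 + lam*||c||^2.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
V = np.vander(x, increasing=True)        # 10x10: exact-fit system
c_exact = np.linalg.solve(V, y)          # interpolates exactly
lam = 1e-3
c_ridge = np.linalg.solve(V.T @ V + lam * np.eye(10), V.T @ y)
print(np.linalg.norm(c_exact), np.linalg.norm(c_ridge))
```

The ridge solution trades a small fit error for a much smaller coefficient norm, which is exactly the "humongous parameter values" problem being tamed.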
