constituents

Let’s stack all $n$ training examples as the rows of a design matrix, and collect the corresponding targets in a vector:
\begin{equation} X = \mqty( - {x^{(1)}}^{T} - \\ \vdots \\ - {x^{(n)}}^{T} - ) \end{equation}
\begin{equation} y = \mqty(y^{(1)} \\ \vdots \\ y^{(n)}) \end{equation}
requirements

The least-squares error becomes:
\begin{equation} J\left(\theta\right) = \frac{1}{2} \sum_{i=1}^{n} \left(h\left(x^{(i)}\right) - y^{(i)}\right)^{2} = \frac{1}{2} \left(X \theta - y\right)^{T} \left(X \theta - y\right) \end{equation}
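As a quick numerical check, the sketch below evaluates $J(\theta)$ both as a per-example sum and in vectorized form. Both forms carry the $\frac{1}{2}$ factor. It assumes a linear hypothesis $h(x) = \theta^{T}x$, which is what the product $X\theta$ implies. The toy data and the names `cost_sum` and `cost_matrix` are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: n = 4 examples, 2 features (first column is a bias term).
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0],
              [1.0, 7.0]])
y = np.array([1.0, 2.0, 2.5, 4.0])
theta = np.array([0.5, 0.4])


def cost_sum(theta, X, y):
    """J(theta) as an explicit sum over examples, assuming h(x) = theta^T x."""
    return 0.5 * sum((x_i @ theta - y_i) ** 2 for x_i, y_i in zip(X, y))


def cost_matrix(theta, X, y):
    """J(theta) in vectorized form: 1/2 (X theta - y)^T (X theta - y)."""
    residual = X @ theta - y
    return 0.5 * (residual @ residual)


# The two forms should agree up to floating-point error.
print(np.isclose(cost_sum(theta, X, y), cost_matrix(theta, X, y)))
```

The vectorized form is the one usually preferred in practice, since it replaces the Python-level loop with a single matrix-vector product.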
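We can minimize $J$ exactly. Taking the gradient of $J$ with respect to $\theta$, $\nabla_{\theta} J = X^{T}X\theta - X^{T}y$, and setting it to $0$ (the condition for a minimum), we obtain, assuming $X^{T}X$ is invertible: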
\begin{equation} \theta = \left(X^{T}X\right)^{-1} X^{T}y \end{equation}
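The closed form above can be sketched in NumPy as follows. This is a minimal illustration on synthetic, noiseless data, not a definitive implementation: the helper name `normal_equation`, the data, and `true_theta` are all hypothetical. It solves the linear system $X^{T}X\theta = X^{T}y$ rather than forming the inverse explicitly, which gives the same result when $X^{T}X$ is invertible and is generally better behaved numerically.

```python
import numpy as np


def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical synthetic data: 50 examples, 3 features, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta

theta = normal_equation(X, y)

# Cross-check against NumPy's general least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimizer, the gradient X^T (X theta - y) should vanish.
gradient = X.T @ (X @ theta - y)

print(np.allclose(theta, true_theta), np.allclose(gradient, 0.0))
```

When $X^{T}X$ is singular or badly conditioned (for example, with linearly dependent features), `np.linalg.lstsq` is the safer choice, since it does not require the inverse to exist.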
additional information