constituents requirements A Newton step is:

\begin{align} \Delta x_{nt} = -\nabla^{2}f\left(x\right)^{-1}\nabla f\left(x\right) \end{align}
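As a sketch (the function and its derivatives here are toy examples, not from the notes), the step is computed by solving the linear system $\nabla^{2}f(x)\,\Delta x = -\nabla f(x)$ rather than forming the inverse explicitly:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton step: solve H @ dx = -g instead of inverting H."""
    g = grad(x)
    H = hess(x)
    return np.linalg.solve(H, -g)

# Toy example: f(x) = x0^2 + 2*x1^2, whose minimum is at the origin.
grad = lambda x: np.array([2 * x[0], 4 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])

x = np.array([3.0, -1.0])
x = x + newton_step(grad, hess, x)  # exact for a quadratic: lands on the minimum
```

For a quadratic the second-order Taylor model is exact, so a single full step reaches the minimizer.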

additional information Newton’s method is affine invariant! convergence The number of steps until convergence for Newton’s method relates to the third derivative, because the step changes as a function of how the second derivative changes. The number of iterations until f\left(x\right) - p^{*} \leq \epsilon is bounded above by:

\begin{align} \frac{f\left(x^{(0)}\right) - p^{*}}{\gamma} + \log \log \left(\frac{\epsilon_{0}}{\epsilon}\right) \end{align}

Here \gamma and \epsilon_{0} are constants that depend on the strong convexity constant, the Lipschitz constant, etc. You basically never know these constants, so give up on evaluating the bound in practice; you would also need to know p^{*}.

Newton’s decrement

\begin{align} \lambda \left(x\right) = \left(\nabla f\left(x\right)^{T} \nabla^{2}f\left(x\right)^{-1} \nabla f\left(x\right)\right)^{\frac{1}{2}} \end{align}

is a measure of the distance between x and x^{*}.

implementing Newton’s Method Two basic steps: form the Hessian, then solve the Newton system. We evaluate the derivatives and solve:

\begin{align} H\Delta x = -g \end{align}

where H = \nabla^{2} f\left(x\right) and g = \nabla f\left(x\right). Naively we would solve with a Cholesky factorization H = LL^{T}, giving

\begin{align} \Delta x_{nt} = -L^{-T} L^{-1} g, \qquad \lambda \left(x\right) = \left\| L^{-1}g \right\|_{2} \end{align}

and we can further exploit structure in H as needed.

205L Notation The second-order Taylor expansion is

\begin{equation} f(x) \approx f(x_{t-1}) + (x-x_{t-1}) f'(x_{t-1}) + \frac{(x-x_{t-1})^{2}}{2} f''(x_{t-1}) \end{equation}

Differentiating this with respect to x, we obtain:

\begin{equation} f'(x_{t-1}) + (x-x_{t-1}) f''(x_{t-1}) \end{equation}

Setting this derivative to zero and solving for x, we obtain the update:

\begin{equation} x = x_{t-1} - \frac{f'(x_{t-1})}{f''(x_{t-1})} \end{equation}
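A minimal sketch of this 1-D update (function names and the example are illustrative, not from the notes):

```python
def newton_minimize(df, d2f, x, iters=10):
    """Iterate x <- x - f'(x)/f''(x); converges quadratically near a
    stationary point where f'' is well behaved."""
    for _ in range(iters):
        x = x - df(x) / d2f(x)
    return x

# f(x) = x^4 - 3x^2 has minima at x = +/- sqrt(3/2).
df = lambda x: 4 * x**3 - 6 * x     # f'
d2f = lambda x: 12 * x**2 - 6       # f''
x_star = newton_minimize(df, d2f, x=2.0)  # converges to +sqrt(1.5)
```

Starting close enough to the positive minimum, a handful of iterations reach machine precision, which is the quadratic convergence claimed above.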

This converges quadratically to minima/maxima. In the multivariate case, for finding minima/maxima of J via its gradient:

\begin{equation} x_{t} = x_{t-1} - \mathbf{H}^{-1}\nabla J \end{equation}

for Hessian \mathbf{H} and gradient \nabla J. For finding ZEROS instead of minima:

\begin{equation} x = x_{t-1} - \frac{f(x_{t-1})}{f'(x_{t-1})} \end{equation}
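The same update for root finding, sketched (the names and the example are illustrative, not from the notes):

```python
def newton_root(f, df, x, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until |f(x)| is small."""
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            break
        x = x - f(x) / df(x)
    return x

# Find sqrt(2) as the positive zero of f(x) = x^2 - 2.
root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x=1.0)
```

Note the numerator is f, not f': for zeros we linearize f itself, whereas for minima we linearize f' (equivalently, find a zero of the derivative).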

graphical intuition We drop a tangent line from the current point down until it hits 0, move to that point, evaluate the function there, and draw a tangent line again. Namely, we write:

\begin{equation} f'\left(x^{(j)}\right) = \frac{f\left(x^{(j)}\right)}{\delta} \end{equation}

for small \delta. This means \delta = \frac{f\left(x^{(j)}\right)}{f'\left(x^{(j)}\right)}, which is the gap between successive steps. We then update by exactly this ratio: x \leftarrow x - \delta.

Failure Case If the function is near an inflection point (i.e. the quadratic approximation is bad), you may converge to a point that does not satisfy the SONC (i.e. you will find an inflection point, not a minimum).
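A sketch of this failure mode using the 1-D minimization update x \leftarrow x - f'(x)/f''(x) from above (the example function is mine, not from the notes):

```python
def newton_min_step(df, d2f, x):
    """One 1-D Newton minimization step: x <- x - f'(x)/f''(x)."""
    return x - df(x) / d2f(x)

# f(x) = x^3 has no minimum at all, yet Newton's update on
# f'(x) = 3x^2 with f''(x) = 6x just halves x each step:
#   x - 3x^2 / (6x) = x / 2
x = 1.0
for _ in range(50):
    x = newton_min_step(lambda t: 3 * t**2, lambda t: 6 * t, x)
# x has shrunk toward 0, a stationary point of x^3 that is an
# inflection point, not a minimum.
```

The iterates converge cleanly, which is exactly the danger: the algorithm gives no sign that the limit point is an inflection rather than a minimum.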
