Work@Microsoft    Live@Seattle

ML101: Gradient Descent vs. Normal Equation

This post covers when you should use Gradient Descent and when you should use the Normal Equation, along with the advantages and disadvantages of each.

Let’s say that you have m training samples and n features.

Gradient Descent

In the Gradient Descent algorithm, we minimize the cost function J(θ) iteratively: the algorithm takes many steps, each one an iteration of gradient descent, until it converges to a local minimum. (For linear regression, J(θ) is convex, so that minimum is also the global one.)

  1. Disadvantage: Need to choose the learning rate α
    This means that you need to run the algorithm a few times with different values of the learning rate α, and choose a value small enough that the cost function J(θ) decreases after every iteration.
  2. Disadvantage: Need many iterations to reach convergence
  3. Advantage: Works well even when n is very large.
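
To make the learning-rate point concrete, here is a minimal sketch of batch gradient descent for linear regression in NumPy. It assumes the usual squared-error cost J(θ) = (1/2m) Σ (h(x) − y)², which this post does not spell out. The function name `gradient_descent`, the toy data, and the two α values are made up for illustration.

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for linear regression (illustrative sketch).

    X is an (m, n) feature matrix and y an (m,) target vector. A column of
    ones is prepended so that theta[0] acts as the intercept.
    Returns the learned theta and the cost J(theta) seen at each iteration.
    """
    m = X.shape[0]
    Xb = np.column_stack([np.ones(m), X])       # shape (m, n + 1)
    theta = np.zeros(Xb.shape[1])
    costs = []
    for _ in range(num_iters):
        error = Xb @ theta - y                  # residuals, shape (m,)
        costs.append(error @ error / (2 * m))   # J(theta) before this update
        theta = theta - alpha * (Xb.T @ error) / m
    return theta, costs


# Toy data (assumed for illustration): y = 4 + 3x with no noise.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 4.0 + 3.0 * X[:, 0]

# A small enough alpha converges; a too-large alpha makes the cost blow up.
theta_small, costs_small = gradient_descent(X, y, alpha=0.5, num_iters=5000)
theta_large, costs_large = gradient_descent(X, y, alpha=2.5, num_iters=50)

print(theta_small)                        # close to [4, 3]
print(costs_large[-1] > costs_large[0])   # cost grew instead of shrinking
```

Plotting `costs_small` against the iteration number is a simple way to check that J(θ) really does go down at every step for the α you picked.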


Normal Equation

In contrast, the Normal Equation gives us a method to solve for θ analytically: rather than running an iterative algorithm such as gradient descent, we can solve for the optimal value of θ in one step.

  1. Advantage: No need to choose the learning rate α
  2. Advantage: No need to iterate to reach convergence
  3. Disadvantage: Need to compute (XᵀX)⁻¹, which is slow (O(n³)) if n is too large, e.g. n > 10,000
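
For comparison, here is a minimal sketch of the Normal Equation in NumPy, assuming the standard closed form θ = (XᵀX)⁻¹Xᵀy for linear regression. The function name `normal_equation` and the toy data are made up for illustration. The sketch computes θ twice: once with an explicit inverse, as in the formula, and once with `np.linalg.solve`, which solves the same linear system without forming the inverse and is usually the numerically safer choice. Both still scale roughly as O(n³) in the number of features.

```python
import numpy as np


def normal_equation(X, y, use_inverse=False):
    """Closed-form linear regression: theta = (X^T X)^-1 X^T y (sketch).

    X is an (m, n) feature matrix and y an (m,) target vector. A column of
    ones is prepended so that theta[0] acts as the intercept.
    """
    m = X.shape[0]
    Xb = np.column_stack([np.ones(m), X])   # shape (m, n + 1)
    A = Xb.T @ Xb                           # square matrix, size n + 1
    b = Xb.T @ y
    if use_inverse:
        # Literal translation of the formula: invert A, then multiply.
        return np.linalg.inv(A) @ b
    # Swapped-in alternative: solve A @ theta = b directly.
    return np.linalg.solve(A, b)


# Toy data (assumed for illustration): y = 4 + 3x with no noise.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 4.0 + 3.0 * X[:, 0]

theta = normal_equation(X, y)
theta_inv = normal_equation(X, y, use_inverse=True)
print(theta)       # close to [4, 3]
print(theta_inv)   # close to [4, 3]
```

There is no α to tune and no loop here, which is exactly the trade-off described above: one expensive matrix computation instead of many cheap iterations.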

