This post covers when you should use Gradient Descent and when you should use the Normal Equation, along with the advantages and disadvantages of each.
Let’s say that you have m training samples and n features.
Gradient Descent
In the Gradient Descent algorithm, we minimize the cost function J(θ) iteratively: the algorithm takes many small steps, each one an iteration of gradient descent, until it converges to the minimum. For linear regression J(θ) is convex, so the minimum it reaches is the global one.
 Disadvantage: Need to choose the learning rate α
This means you need to run the algorithm a few times with different values of α and pick one small enough that the cost function J(θ) decreases on every iteration, but not so small that convergence becomes needlessly slow.
 Disadvantage: Needs many iterations to reach convergence
 Advantage: Works well even when n is very large.
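To make the trade-off concrete, here is a minimal sketch of batch gradient descent for linear regression in plain Python. The function name `gradient_descent`, the toy data, and the particular values of α are assumptions chosen for illustration; they are not from the original post. Each row of X starts with a 1.0 so that θ_0 acts as the intercept.

```python
def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for linear regression (illustrative sketch).

    X: list of m rows, each with n values (first value 1.0 for the intercept).
    y: list of m targets.
    Returns the final theta and the cost J(theta) recorded at every iteration.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    history = []
    for _ in range(iterations):
        # Prediction error h_theta(x_i) - y_i for every training sample
        errors = [sum(t * x for t, x in zip(theta, row)) - target
                  for row, target in zip(X, y)]
        # J(theta) = 1/(2m) * sum of squared errors
        history.append(sum(e * e for e in errors) / (2 * m))
        # Gradient of J with respect to each theta_j: (1/m) * sum(error_i * x_ij)
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m
                for j in range(n)]
        # Simultaneous update of all parameters
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta, history


# Toy data generated from y = 1 + 2x (assumed for this example)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

# A small enough alpha: the cost shrinks and theta approaches [1, 2]
theta, history = gradient_descent(X, y, alpha=0.1, iterations=2000)

# An alpha that is too large for this data: the cost grows instead
_, diverged = gradient_descent(X, y, alpha=0.6, iterations=50)
```

Plotting or printing `history` for a few candidate values of α is one simple way to carry out the tuning step described above: keep a value for which the cost falls steadily, and discard any for which it rises.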
Normal Equation
In contrast, the normal equation gives us a way to solve for θ analytically: rather than running an iterative algorithm like gradient descent, we solve for the optimal value of θ in one go.
 Advantage: No need to choose the learning rate α
 Advantage: Don’t need to iterate to reach convergence
 Disadvantage: Need to compute (X^{T}X)^{-1}, the inverse of an n × n matrix that appears in θ = (X^{T}X)^{-1}X^{T}y, which is slow (O(n^{3})) if n is too large, e.g. n > 10,000
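As a rough illustration of the analytical route, the sketch below solves the system (X^T X) θ = X^T y with Gaussian elimination in plain Python. Solving the system directly instead of forming the inverse explicitly is a choice made here for simplicity, and the function name `normal_equation` and the toy data are assumptions for this example. The elimination step has three nested loops over n, which is where the O(n^3) cost comes from.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y by Gaussian elimination (illustrative sketch).

    X: list of m rows, each with n values (first value 1.0 for the intercept).
    y: list of m targets.
    """
    m, n = len(X), len(X[0])
    # Build the n x n matrix A = X^T X and the length-n vector b = X^T y
    A = [[sum(X[i][r] * X[i][c] for i in range(m)) for c in range(n)]
         for r in range(n)]
    b = [sum(X[i][r] * y[i] for i in range(m)) for r in range(n)]

    # Forward elimination with partial pivoting: O(n^3) work
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular, e.g. redundant features or m < n")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]

    # Back substitution on the resulting upper-triangular system
    theta = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(A[row][k] * theta[k] for k in range(row + 1, n))
        theta[row] = (b[row] - known) / A[row][row]
    return theta


# Same toy data as a line y = 1 + 2x (assumed for this example)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = normal_equation(X, y)  # close to [1.0, 2.0], with no alpha and no iterations
```

In practice you would more likely hand this step to a linear algebra library, for example NumPy's `numpy.linalg.solve`, but the cost still grows roughly with the cube of the number of features.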

Does the analytical solution (normal equation) give a global or local minimum when the dimension of X is greater than 2?