
ML101: Overfitting vs Underfitting


Overfitting (a.k.a. high variance)

If we have too many features, the learned hypothesis may fit the training set very well (the training error can be driven close to zero), but fail to generalize to new examples.  Overfitting occurs when a statistical model or machine learning algorithm captures the noise in the data; intuitively, the model fits the training data too well.  Specifically, an overfit model shows low bias but high variance.  Overfitting is often the result of an excessively complicated model, and it can be prevented by fitting multiple models and using validation or cross-validation to compare their predictive accuracies on held-out data.
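As a minimal sketch of that last point (scikit-learn, the synthetic data, and the chosen degrees are my assumptions, not part of the original post), cross-validation can compare polynomial regression models of increasing complexity:

```python
# Sketch: compare model complexities with 5-fold cross-validation.
# (Hypothetical example -- library, data, and degrees are assumptions.)
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)  # noisy sine wave

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Mean squared error averaged over 5 held-out folds (sklearn reports
    # it negated, so flip the sign).
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree:2d}  cross-validated MSE={mse:.3f}")
```

The intermediate-degree model typically attains the best held-out score; the degree-15 model fits each training fold almost perfectly yet scores worse on the folds it did not see, which is exactly what the cross-validation comparison exposes.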


Underfitting (a.k.a. high bias)

Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data; intuitively, the model does not fit the data well enough.  Specifically, an underfit model shows low variance but high bias.  Underfitting is often the result of an excessively simple model.
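A quick diagnostic, sketched below under assumptions of my own (scikit-learn and a synthetic quadratic data set, neither from the post): an underfit model performs poorly even on the very data it was trained on, so a large training error is itself the warning sign.

```python
# Sketch: a straight line fit to a quadratic trend -- high bias.
# (Hypothetical example; the data and library choice are assumptions.)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.2, size=200)  # quadratic trend

line = LinearRegression().fit(X, y)  # a line cannot bend with the trend
print("training MSE:", mean_squared_error(y, line.predict(X)))
# The training MSE lands far above the noise floor (0.2**2 = 0.04): the
# model misses the underlying trend even on data it has already seen.
```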


Both overfitting and underfitting lead to poor predictions on new data sets.  For example, in Linear Regression problems, a straight line fit to data with a curved trend underfits (high bias), while a high-degree polynomial can pass through every training point yet oscillate wildly between them (high variance); a model of intermediate complexity generalizes best.
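The sketch below (my own construction with scikit-learn; the cosine data, the split, and the degrees are assumptions, not the post's figures) makes the contrast concrete: training error keeps falling as the polynomial degree grows, but validation error falls and then rises again.

```python
# Sketch: train vs. validation error for rising polynomial degree.
# (Hypothetical example; data, split, and degrees are assumptions.)
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(2)
X = rng.uniform(0, 1, size=(60, 1))
y = np.cos(1.5 * np.pi * X).ravel() + rng.normal(scale=0.1, size=60)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree),
                          LinearRegression()).fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    va = mean_squared_error(y_va, model.predict(X_va))
    print(f"degree={degree:2d}  train MSE={tr:.3f}  val MSE={va:.3f}")
# Degree 1: both errors high (underfitting).  Degree 15: tiny training
# error but a much larger validation error (overfitting).
```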

Take Logistic Regression as another example: a linear decision boundary underfits classes separated by a curved boundary, while an overly flexible boundary that bends around every training point overfits; again, an intermediate level of complexity generalizes best.
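As a final sketch (again my assumptions: scikit-learn's make_moons toy data, hand-picked degrees, and weak regularization), the same pattern shows up in classification accuracy:

```python
# Sketch: logistic regression with polynomial features of rising degree.
# (Hypothetical example; data set and settings are assumptions.)
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 12):
    clf = make_pipeline(
        PolynomialFeatures(degree),
        LogisticRegression(C=1e4, max_iter=5000),  # large C: little regularization
    ).fit(X_tr, y_tr)
    print(f"degree={degree:2d}  train acc={clf.score(X_tr, y_tr):.2f}"
          f"  test acc={clf.score(X_te, y_te):.2f}")
# Degree 1 draws a straight decision boundary (underfits the moons);
# degree 12 with almost no regularization bends around noisy points,
# so training accuracy rises while test accuracy stalls or drops.
```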

