Work@Microsoft    Live@Seattle

ML101: How to Choose a Machine Learning Algorithm for Regression Problems

In regression, the target variable is continuous or takes ordered whole-number values. To solve a regression problem, we typically choose one of the following supervised learning algorithms.


Algorithm | Accuracy / Training Time / Linearity | Parameters
Linear Regression | ★★★★ | 4
Bayesian Linear Regression | ★★☆★★ | 2
Decision Forest Regression | ★★★★☆ | 6
Boosted Decision Tree Regression | ★★★★☆ | 5
Fast Forest Quantile Regression | ★★★★☆ | 9
Neural Network Regression | ★★ | 9
Poisson Regression | ★★ | 5
Ordinal Regression | | 0


  • Accuracy: Getting the most accurate answer possible isn’t always necessary. Sometimes an approximation is adequate, depending on what you want to use it for. If that’s the case, you may be able to cut your processing time dramatically by sticking with more approximate methods. Another advantage of more approximate methods is that they naturally tend to avoid overfitting.
  • Training Time: The number of minutes or hours necessary to train a model varies a great deal between algorithms. Training time is often closely tied to accuracy; one typically accompanies the other. In addition, some algorithms are more sensitive to the number of data points than others. When time is limited, it can drive the choice of algorithm, especially when the data set is large.
  • Linearity: Lots of machine learning algorithms make use of linearity. Linear regression algorithms assume that data trends follow a straight line. These assumptions aren’t bad for some problems, but on others they bring accuracy down. Despite their dangers, linear algorithms are very popular as a first line of attack. They tend to be algorithmically simple and fast to train.
  • Parameters: Parameters are the knobs a data scientist gets to turn when setting up an algorithm. They are numbers that affect the algorithm’s behavior, such as error tolerance or number of iterations, or options between variants of how the algorithm behaves. The training time and accuracy of the algorithm can sometimes be quite sensitive to getting just the right settings. Typically, algorithms with large numbers of parameters require the most trial and error to find a good combination. The upside is that having many parameters typically indicates that an algorithm has greater flexibility: it can often achieve very good accuracy, provided you can find the right combination of parameter settings.
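
To make the linearity and parameter trade-offs above concrete, here is a minimal sketch in plain Python. It is not tied to any particular toolkit. The helper names (`fit_line`, `fit_line_gd`, `rmse`), the synthetic data, and the grid of settings are all assumptions made for illustration. The first part fits a straight line to data that is linear and to data that is curved, to show how the linearity assumption can bring accuracy down. The second part trains the same linear model by gradient descent and tries a few learning rates and iteration counts, to show how accuracy and training time depend on those knobs.

```python
import random
import time


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_line_gd(xs, ys, learning_rate, n_iterations):
    """Same linear model, fitted by batch gradient descent on mean squared error.

    learning_rate and n_iterations are the tunable knobs in this sketch.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iterations):
        grad_a = 2 / n * sum((a * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2 / n * sum((a * x + b - y) for x, y in zip(xs, ys))
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


def rmse(model, xs, ys):
    """Root mean squared error of the line (a, b) on the given points."""
    a, b = model
    return (sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Synthetic data (an assumption for this demo): one linear trend, one curved trend.
rng = random.Random(0)  # fixed seed so runs are repeatable
xs = [i / 199 for i in range(200)]
ys_linear = [3 * x + 1 + rng.gauss(0, 0.1) for x in xs]
ys_curved = [4 * (x - 0.5) ** 2 + rng.gauss(0, 0.1) for x in xs]

# Linearity: the same straight-line model does well on one data set, poorly on the other.
linear_rmse = rmse(fit_line(xs, ys_linear), xs, ys_linear)
curved_rmse = rmse(fit_line(xs, ys_curved), xs, ys_curved)
print(f"straight line on linear data: RMSE = {linear_rmse:.3f}")
print(f"straight line on curved data: RMSE = {curved_rmse:.3f}")

# Parameters and training time: try every combination of the two knobs.
results = []
for learning_rate in (0.01, 0.1, 0.5):
    for n_iterations in (50, 500):
        start = time.perf_counter()
        model = fit_line_gd(xs, ys_linear, learning_rate, n_iterations)
        elapsed = time.perf_counter() - start
        error = rmse(model, xs, ys_linear)
        results.append((error, elapsed, learning_rate, n_iterations))
        print(f"lr={learning_rate:<5} iters={n_iterations:<4} "
              f"RMSE={error:.4f} time={elapsed * 1000:.1f} ms")

best = min(results)   # lowest error among the settings tried
worst = max(results)  # highest error among the settings tried
```

Even with only two knobs, six training runs are needed to cover this small grid. An algorithm with nine parameters would need far more trial and error, which is the cost of the extra flexibility described above.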
