ML101: Regression vs Classification vs Clustering Problems


In regression, the target variable is continuous or takes ordered whole-number values.

For example, suppose you are working on stock market prediction and would like to predict the price of a particular stock tomorrow (measured in dollars). This is a regression problem because the target variable (stock price) is continuous.

To solve a regression problem, we typically use supervised learning algorithms.
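As a minimal sketch of what a supervised regression algorithm does, the snippet below fits a straight line to a handful of known (day, price) pairs using ordinary least squares and then predicts a price for a new day. The data, the variable names, and the helper `fit_line` are invented for illustration; real stock prices are far noisier than this toy series, and a straight line is only one of many possible models.

```python
# Toy training data (invented for illustration): day index -> price in dollars.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(days, prices)

# The prediction is a continuous number, which is what makes this regression.
predicted_price = slope * 6 + intercept
print(f"Predicted price on day 6: ${predicted_price:.2f}")
```

The key point is the type of the output: a number on a continuous scale, learned from examples where the true value was known.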




In classification, the target variable is categorical and unordered.

For example, classifying tweets as positive, negative, or neutral is a classification problem.

To solve a classification problem, we typically use supervised learning algorithms.  We have labels for some points, and we want a ‘rule’ that will accurately assign labels to new points.
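One very simple 'rule' of this kind is the nearest-neighbor rule: give a new point the label of the closest labeled point. The sketch below assumes each tweet has already been turned into two hypothetical features (a count of positive words and a count of negative words); those features, the toy data, and the function name `classify` are assumptions made for illustration, not a recommended sentiment model.

```python
# Labeled examples (invented for illustration):
# (positive_word_count, negative_word_count) -> sentiment label.
labeled = [
    ((3, 0), "positive"),
    ((4, 1), "positive"),
    ((0, 3), "negative"),
    ((1, 4), "negative"),
    ((0, 0), "neutral"),
    ((1, 1), "neutral"),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def classify(point, examples):
    """1-nearest-neighbor rule: return the label of the closest labeled example."""
    _, nearest_label = min(examples, key=lambda ex: sq_dist(point, ex[0]))
    return nearest_label


# New, unlabeled points receive one of the known categories.
label_a = classify((5, 0), labeled)
label_b = classify((0, 5), labeled)
print(label_a, label_b)
```

Unlike regression, the output here is one of a fixed set of categories rather than a number on a continuous scale.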




In clustering, you group (cluster) unlabeled data into some number of groups (clusters).

For example, segmenting a customer database based on similar buying patterns is a clustering problem.

To solve a clustering problem, we typically use unsupervised learning algorithms. We identify structure in the data by grouping points into clusters based on how ‘near’ they are to one another.
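To make the idea of 'nearness' concrete, here is a minimal sketch of k-means, one common clustering algorithm. It alternates between assigning each point to its nearest cluster center and moving each center to the mean of its points. The customer features (visits per month, average spend), the toy data, the choice of two clusters, and the starting centers are all assumptions made for illustration.

```python
# Unlabeled customers (invented for illustration):
# (visits_per_month, average_spend_in_dollars).
customers = [(1, 10), (2, 12), (1, 11), (9, 95), (10, 100), (8, 90)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        centroids = new_centroids
    return assignments, centroids


# Start from two of the data points as initial centers (an arbitrary choice).
assignments, centroids = kmeans(customers, [customers[0], customers[3]])
print(assignments)
print(centroids)
```

No labels are given to the algorithm. The cluster numbers it returns are arbitrary identifiers, and it is up to us to interpret what each group of customers has in common.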