Get Started with Azure Machine Learning in 9 Easy Steps

Even if you have never done machine learning before, you can build a machine learning model, understand how it works, and solve a real-world problem in only 9 steps!  In this blog post, I will walk through accessing the ML Studio environment, exploring and visualizing data in Azure Machine Learning, and creating a simple predictive model.

A Microsoft account is required to access an Azure Machine Learning workspace.  If you don’t already have a Microsoft account, you can obtain one for free at the following link: https://www.microsoft.com/en-us/account/default.aspx

 

Business Case

Consumers often evaluate similar products by specific metrics of interest to them. In the auto industry, Miles per Gallon (MPG) always comes up as an important metric for consumers. How do manufacturers know what an acceptable MPG will be for the vehicle they are producing? Using advanced analytics, auto manufacturers can use vehicle attributes and MPG for similar automobiles in the market to predict what an acceptable MPG will be for their car coming off the assembly line.

In this blog post I will be working with a dataset that includes various information about automobiles from the 1970s and early 1980s. The dataset includes attributes like miles per gallon (MPG), horsepower, acceleration, weight, etc.  I will use a linear regression algorithm to predict an acceptable MPG for an automobile. Linear regression is used to predict a single, numeric value based on one or many independent variables. It does this by fitting a representative line, or function, to a collection of observed data points. This line/function can then be used to predict future values based on new input data.
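
To make the idea of fitting a line concrete, here is a minimal sketch in Python of least-squares fitting with a single input variable. The weight and MPG values are made-up numbers for illustration, not rows from the sample dataset, and the function name fit_line is my own. ML Studio handles all of this for you; the sketch only shows the kind of calculation involved.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example values: vehicle weight (lbs) and MPG.
weights = [2000, 2500, 3000, 3500, 4000]
mpgs = [34.0, 29.0, 24.0, 19.0, 14.0]

slope, intercept = fit_line(weights, mpgs)

# Use the fitted line to predict MPG for a new, unseen weight.
predicted_mpg = slope * 3200 + intercept
print(f"MPG = {slope:.4f} * weight + {intercept:.1f}")
print(f"Predicted MPG at 3200 lbs: {predicted_mpg:.1f}")
```

The fitted slope is negative here because, in these toy values, heavier vehicles get fewer miles per gallon.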

image

 

1. Create/Access an Azure Machine Learning Workspace

To get started, I will need to create and log in to a free Azure Machine Learning workspace.  A workspace is like an all-inclusive development environment with the tools to create, manage, and publish machine learning models.

  1. Go to the ML Studio website http://studio.azureml.net
  2. Click Sign In on the top right corner of the web page.
    image
  3. Enter the email address and password associated with your Microsoft ID, and click the Sign In button.
    image
  4. If a Welcome video is displayed after you log in (it usually appears on the first login), click the X at the top right of the video to close it.
    image
  5. If the Microsoft Samples dialogue box is displayed (it usually appears on the first login), close it by clicking the X in the top right corner of the pane.
    image
  6. You are now logged into the free workspace associated with your Microsoft Account.
    image

 

2. Create a Blank Experiment

Next, I will create the first experiment.  An experiment is a collection of data, tasks, and machine learning algorithms that make up a model.

  1. Click the NEW button in the bottom left corner of the page.
    image
  2. Make sure EXPERIMENT is highlighted in the NEW dialogue window, and click the Blank Experiment pane.
    image
  3. You are now in the ML Studio.
    image
    Notice:

    • The Canvas in the center of the screen. This is where you will drag and drop modules and string them together to create a data flow for your experiment.
      image
    • The navigation icons on the far left of the site, which allow you to browse back to your Workspace.
      image
    • The Modules pane down the left side of the Canvas.  Modules are the individual components that make up your Experiment.
      image
    • The Properties pane down the right side of the Canvas.  This is where you will configure the properties of the different Modules used in your Experiment.
      image
  4. At the top of the Canvas, highlight and delete the text that reads Experiment created on…, and replace it with “First Time Using Azure Machine Learning”.
    image

 

3. Input Sample Data

Azure Machine Learning offers several ways to connect to and import data.  Here I will work with one of the sample datasets included with Azure Machine Learning.

  1. On the Modules panel, click Saved Datasets, and then Samples.  This expands all of the sample datasets included in ML Studio.
    image
  2. Scroll until you find MPG data for various automobiles.
  3. Click on the MPG dataset and notice the description also shows up at the bottom of the Properties pane.
  4. Click and drag the MPG dataset onto the Canvas.  Notice the Properties pane is now reflecting information about the dataset.
    image
    Notice at the bottom of the MPG dataset module on the Canvas, there is a small circle called a port.  Ports on the top of modules are called input ports, and ports on the bottom of modules are output ports.  These ports are used to connect modules to one another and to provide a menu of additional options for the module.
    image

 

4. Explore the Input Data

A common task in any advanced analytics workflow is to analyze and profile the data you are working with.  The following set of steps highlights some of the ways we can explore and visualize the data we just imported.

  1. Click the output port at the bottom of the MPG dataset module, and select Visualize from the menu that is displayed.
    image
    The resulting dialogue box provides the number of rows and columns in the dataset as well as the first 100 rows and first 100 columns in the dataset with a histogram for each column.
    image
  2. Click anywhere in the first column, MPG, to highlight the column.
    image
    Notice on the right side of the dialogue box, there is now information in the Statistics pane and Visualizations pane about MPG (you might need to use the horizontal scroll bar in the dialogue box to scroll all the way to the right if Statistics and Visualizations are not visible).
    image
  3. In the Visualizations pane, change the compare to dropdown box from None to Horsepower.
    image
    Notice the histogram changed to a ScatterPlot comparing MPG to Horsepower.
    image
  4. Next, change the compare to dropdown option from Horsepower to Model.  Notice the resulting chart is now a MultiboxPlot with an MPG boxplot displayed for each of the values in the Model column.
    image
  5. Click the X in the top right corner of the Visualize dialogue box to return to the Canvas.
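
If you are curious what kind of numbers a statistics summary like this contains, the following Python sketch computes a few common ones for a single column. The MPG values are made up for illustration, the function name summarize is my own, and I use the sample standard deviation; I have not verified that ML Studio reports exactly the same set of statistics or the same variant of standard deviation.

```python
import statistics


def summarize(values):
    """Summary statistics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
        "unique_values": len(set(present)),
        "missing_values": len(values) - len(present),
    }


# Made-up MPG values for illustration; None stands in for a missing entry.
mpg_column = [18.0, 15.0, 18.0, 16.0, None, 24.0, 22.0, 27.0]

for name, value in summarize(mpg_column).items():
    print(f"{name}: {value}")
```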

 

5. Split Input Data into Train and Test Data Sets

Now that I have explored the data, I am ready to create a predictive model.  The first thing I will do is split the original dataset into 2 datasets: one will be used for training the model, and the other for testing it (a model should generally be tested on data it was not trained on, so that the results reflect how it handles rows it has never seen).

  1. In the search box at the top of the Modules pane, type the word split and hit Enter.
    image
  2. Click and drag the Split module onto the Canvas anywhere under the MPG dataset.
    Notice the Split module has 1 input port and 2 output ports.  The Properties pane displays properties that can be modified for this module.  There is also a description of the module at the bottom of the Properties pane with a (more help…) link.  Clicking this link opens a page with more details about the module and its configurable properties.
    image
  3. Click and drag the output port from the MPG dataset module to the input port of the Split module.
    image
  4. In the Properties pane, type 0.75 in the Fraction of rows in the first output dataset textbox.   This configures the module to split 75% of the input rows to the left output port, and 25% of the input rows to the right output port.
  5. Click RUN at the bottom of the Canvas.
    image
    The experiment will now execute each module in order starting from the first module in the workflow. When the experiment is done executing, the words Finished running will display in the top right corner of the Canvas. Notice the Split module has a green check mark indicating it completed successfully.
    image
  6. Click the left output port on the Split module, and select Visualize from the menu that is displayed.
    image
    Notice only 294 of the original 392 rows (75%) have been routed to the left output port.  The remaining 98 rows (25%) have been routed to the right output port.
  7. Click the X in the top right corner to close the Visualize dialogue box.
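
The arithmetic behind the split can be sketched in a few lines of Python. This is only an illustration, assuming the rows are shuffled before being divided; the function name split_rows and the seed parameter are my own, and the list of row numbers stands in for the real dataset.

```python
import random


def split_rows(rows, fraction, seed=0):
    """Shuffle rows, then return (first, second) where first holds `fraction` of them."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 392-row dataset: row numbers instead of real records.
rows = list(range(392))
train_rows, test_rows = split_rows(rows, 0.75)

print(len(train_rows), len(test_rows))
```

With 392 rows and a fraction of 0.75, this prints 294 and 98, matching the row counts seen in the Visualize dialogue box.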

 

6. Train a Predictive Model

Next, I will use a common Linear Regression algorithm to train a model that will predict an automobile’s MPG.

  1. Type train in the search box at the top of the Modules pane.
  2. Find the Train Model module, and click and drag it onto the Canvas below the Split module.
    image
  3. Connect the left output port from the Split module to the right input port on the Train Model module.
    image
  4. In the Properties pane, click the Launch column selector button.  This launches the Select Column dialogue box.  Here, we will select the column we want the model to predict.
  5. Click the text box with the red circle in it, and select MPG from the list of columns.  Click anywhere in the white space above the column names text box to collapse the list of columns.
    image
  6. Click the OK button to save the selection and close the dialogue box.
  7. Clear the search box in the Modules pane and hit Enter.  In the Modules pane, find and click to expand Machine Learning, and then click Initialize Model, and then click Regression.
    image
  8. Click and drag the Linear Regression module onto the Canvas just above and to the left of the Train Model module.
  9. Connect the output port of the Linear Regression module to the left input port of the Train Model module.
    image
    You might notice there are several parameters that can be modified in the Properties pane for the Linear Regression module.  Here we will use the defaults.
  10. Click RUN at the bottom of the Canvas to run the experiment and train the model.
    image
    The model will be trained to predict the MPG column using the other fields in the dataset with the Linear Regression algorithm.
  11. When the experiment finishes running, if a CREATE SCORING EXPERIMENT COMMAND box pops up, click the X at the top right corner of this box to close it.
    image
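
For readers who want to see what training on more than one column can look like, here is a rough Python sketch that fits a linear regression with two input columns by solving the normal equations. This is one standard textbook approach, and I am not claiming it is what the Linear Regression module does internally. The function names, the two chosen columns, and all of the numbers are made up for illustration; the labels are generated from a known formula so the result can be checked.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the caller's data is not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column into the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def train_linear_regression(features, labels):
    """Return [intercept, coef_1, ..., coef_k] minimising the squared error."""
    # Prepend a constant 1 to each row so the first coefficient is the intercept.
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, labels)) for i in range(k)]
    return solve(xtx, xty)


# Made-up training rows: (weight in 1000s of lbs, horsepower in 100s).
train_features = [(2.0, 0.9), (2.5, 1.0), (3.0, 1.5), (3.5, 1.4), (4.0, 2.0), (4.5, 1.8)]
# Labels generated from MPG = 50 - 6*weight - 4*horsepower, so the fit is checkable.
train_labels = [50 - 6 * w - 4 * h for w, h in train_features]

intercept, weight_coef, hp_coef = train_linear_regression(train_features, train_labels)
print(f"MPG = {intercept:.2f} + {weight_coef:.2f}*weight + {hp_coef:.2f}*horsepower")
```

Because the toy labels follow the formula exactly, the fit recovers its coefficients; real data is noisy, so a trained model only approximates the underlying relationship.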

 

7. Test the Predictive Model

Next, I will use the test dataset I created to test the newly trained model.  I will do this by using the new model to predict the MPG for each row in the test dataset.

  1. In the search box at the top of the Modules pane, type the word score.
  2. Find the Score Model module, and click and drag it onto the Canvas under the Train Model module.
    image
  3. Connect the output port on the Train Model module to the left input port on the Score Model module.
  4. Connect the right output port on the Split module to the right input port on the Score Model module.
    image
  5. Click RUN at the bottom of the Canvas to run the experiment and score the test dataset with the trained Linear Regression model.
    image
  6. After the experiment has finished running, click the output port on the Score Model module and select Visualize from the displayed menu.
  7. In the list of columns, scroll to the right until Scored Labels is visible, and click Scored Labels to select it.
    image
    The Scored Labels column represents the predicted MPG for each row in the test dataset.  Notice the Statistics pane and histogram in the Visualizations pane on the right side of the Visualize dialogue box.
  8. In the Visualizations pane, change the compare to dropdown option to MPG.
    image
    The resulting ScatterPlot compares the Scored Labels (predicted MPG) with the actual MPG for each row in the test dataset.
  9. Click the X in the top right corner to close the Visualize dialogue box.
  10. Click SAVE to save the experiment.
    image
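
Conceptually, scoring means applying the trained model to each test row and storing the prediction next to the actual value. The Python sketch below illustrates this with a hypothetical model: the intercept, coefficients, and test rows are all made up, and the function name score_rows is my own.

```python
def score_rows(rows, intercept, coefficients, feature_names):
    """Return copies of rows with a 'Scored Labels' column holding the prediction."""
    scored = []
    for row in rows:
        prediction = intercept + sum(
            coefficients[name] * row[name] for name in feature_names
        )
        scored.append({**row, "Scored Labels": prediction})
    return scored


# Hypothetical trained model: these coefficients are made up for illustration.
intercept = 46.0
coefficients = {"Weight": -0.006, "Horsepower": -0.04}

# Made-up test rows; MPG is the actual value the prediction is compared against.
test_rows = [
    {"MPG": 30.0, "Weight": 2000, "Horsepower": 75},
    {"MPG": 18.0, "Weight": 3500, "Horsepower": 150},
]

for row in score_rows(test_rows, intercept, coefficients, ["Weight", "Horsepower"]):
    print(f"actual={row['MPG']:.1f}  predicted={row['Scored Labels']:.1f}")
```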

8. Evaluate the Test Result

Finally, we will evaluate how well the model performed against the test dataset using a set of standard metrics for measuring regression model performance.

  1. In the search box at the top of the Modules pane, type the word evaluate.
  2. Find the Evaluate Model module, and click and drag it onto the Canvas below the Score Model module.
    image
  3. Connect the output port on the Score Model module to the left input port on the Evaluate Model module.
    image
  4. Click RUN to run the experiment.
    image
  5. When the experiment has finished running, click the output port on the Evaluate Model module and select Visualize from the displayed menu.
    image
    The columns and values in the Visualize dialogue box represent common metrics for evaluating the performance of a Linear Regression model.  The metrics are calculated using the results of the Score Model module.  Many of the metrics are based on the Error, which is the difference between the Scored Labels (predicted values) and the actual values.  At this point, you can assess whether or not your model performs at a satisfactory level.  If not, you could go back and tweak parameters, add new features, or try a different machine learning algorithm to improve the model's performance.
  6. Click the X in the top right corner to close the Visualize dialogue box.  Click SAVE to save your experiment.
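
To show how metrics like these are derived from the Error, here is a small Python sketch that computes several standard regression metrics from actual and predicted values. The numbers are made up for illustration, the function name regression_metrics is my own, and the metric names are the standard ones, so they may not match the column headings in the dialogue box exactly.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression metrics from actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    abs_error = sum(abs(e) for e in errors)
    sq_error = sum(e * e for e in errors)
    # Baseline: the errors you would get by always predicting the mean actual value.
    abs_baseline = sum(abs(a - mean_actual) for a in actual)
    sq_baseline = sum((a - mean_actual) ** 2 for a in actual)

    return {
        "Mean Absolute Error": abs_error / n,
        "Root Mean Squared Error": math.sqrt(sq_error / n),
        "Relative Absolute Error": abs_error / abs_baseline,
        "Relative Squared Error": sq_error / sq_baseline,
        "Coefficient of Determination": 1 - sq_error / sq_baseline,
    }


# Made-up actual MPG values and Scored Labels for illustration.
actual_mpg = [30.0, 18.0, 24.0, 20.0]
scored_labels = [31.0, 19.0, 22.0, 20.0]

for name, value in regression_metrics(actual_mpg, scored_labels).items():
    print(f"{name}: {value:.3f}")
```

Lower error values are better, while a Coefficient of Determination closer to 1 means the predictions explain more of the variation in the actual MPG values.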

 

Through these steps, I have created a model that can be used by an auto manufacturer to predict an acceptable MPG for any new automobile coming off the assembly line.  As a next step, I could upload new data to be scored by the model, or I could even publish my model as a web service.  A web service gives me the ability to pass individual rows in and get predicted values (MPG) out.
