Friday, April 22, 2016

Before You Start on Your Own Experiment

In my last post, I walked you through the process of running an experiment with Azure Machine Learning.  Before you jump into your own experiment, you would do well to check out some examples first.  Microsoft has provided many examples of experiments, and you may be able to find something similar to what you want to accomplish.

A list of sample experiments can be found here.

You can browse for Trending Experiments, or Microsoft Examples.  Either way, a search panel with filters is available to find an experiment that is similar to what you want to run.  Once you find what you are looking for, you can open it in ML Studio to see how it was built.  You can choose to run it, or even modify it to see what happens.
When you are finished playing with the samples, you can use them as a template for your own experiment.  Again, all this can be done in your free Studio workspace.

One of the trickier aspects of running predictive analytics is choosing the correct algorithm.  Microsoft has provided a Cheat Sheet that will help you choose the right one.  You can download the Cheat Sheet here.

You can keep it on your computer, or print it; it is sized for an 11” x 17” Tabloid sheet.  This Cheat Sheet will hopefully point you in the right direction.  Because it was designed for people who already have a firm understanding of machine learning, it provides only a generalized overview that should give some guidance.  It will not point you to the specific algorithm you need.  In fact, most of the available algorithms are not even listed in the Cheat Sheet.

Choosing the right algorithm is a process of trial and error.  Even data scientists cannot always predict which algorithm will work best.  The factors that come into play when choosing the right algorithm include the size, quality, and nature of the data being analyzed.  How you intend to use the answer will also play an important role.  With the Cheat Sheet, you should be able to narrow it down to a few candidates.  Then you will have to run some experiments in order to find which algorithm provides you with the best solution.

On the Cheat Sheet, working from the START button, read the path and algorithm labels like this:  “For <path label>, use <algorithm>.”
This isn’t supposed to give you the exact answer, only point you in the right direction.  Data scientists will tell you that the only sure way to find the best algorithm is to try them all and compare the results.
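
To make the "try a few candidates and compare" idea concrete, here is a minimal sketch in plain Python.  It is not Azure ML Studio code (Studio experiments are built by dragging modules onto a canvas), and everything in it is an assumption made for illustration: the toy data set, the three stand-in algorithms, and the choice of RMSE on a held-out split as the score.

```python
import math
import random
import statistics


def make_data(n=200, seed=0):
    """Hypothetical toy data: y depends linearly on x, plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1.0)
        rows.append((x, y))
    return rows


def fit_mean(train):
    """Baseline: always predict the average training target."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_linear(train):
    """Ordinary least squares with a single feature."""
    mean_x = statistics.fmean(x for x, _ in train)
    mean_y = statistics.fmean(y for _, y in train)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_neighbor(train):
    """Predict the target of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def rmse(model, test):
    """Root mean squared error on held-out rows (lower is better)."""
    return math.sqrt(statistics.fmean((model(x) - y) ** 2 for x, y in test))


def compare(candidates, rows, train_fraction=0.7):
    """Train every candidate on the same split and rank them by RMSE."""
    split = int(len(rows) * train_fraction)
    train, test = rows[:split], rows[split:]
    scores = {name: rmse(fit(train), test) for name, fit in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# The short list you might arrive at after consulting the Cheat Sheet.
CANDIDATES = {
    "mean baseline": fit_mean,
    "linear regression": fit_linear,
    "nearest neighbor": fit_nearest_neighbor,
}

if __name__ == "__main__":
    for name, score in compare(CANDIDATES, make_data()):
        print(f"{name:20s} RMSE = {score:.3f}")
```

The point of the sketch is the shape of the loop: every candidate sees the same training rows and is scored on the same held-out rows, so the ranking reflects the algorithms and not the split.  In ML Studio the equivalent would be training several models side by side in one experiment and comparing their evaluation results.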
