To make it easy for you to work with Azure ML, Microsoft
built the Azure Machine Learning Studio (ML Studio). This is a drag-and-drop environment where you
build, test, and run your predictive analytics experiments.
Data can be loaded in one of several ways. One way is to use the Reader module, which gives you quick and easy access
to a variety of persistent storage locations, including your Azure SQL
database.
An easier way, assuming your data is already available as a
saved dataset, is to simply drag and drop it onto the experiment canvas. However, data is rarely clean and
well defined enough to simply drag and drop.
That is why Microsoft provides you with a Data Transformation tool. This tool makes it easy to clean, normalize,
partition, or sample data. You
can even use this tool to combine multiple datasets. Once your data is ready, you can take
advantage of drag-and-drop.
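
To make the clean, normalize, and partition steps concrete, here is a minimal sketch in plain Python. It is not ML Studio code: the function names, the sample rows, and the 75/25 split are assumptions chosen purely for illustration, and min-max scaling is only one of several ways to normalize.

```python
import random

def clean(rows):
    """Drop rows that contain a missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]

def normalize(rows):
    """Min-max scale each column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def partition(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical raw data: two numeric columns, one row with a missing value.
raw = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [5.0, 20.0], [4.0, 50.0]]

cleaned = clean(raw)              # the row with None is removed
scaled = normalize(cleaned)       # every column now lies in [0, 1]
train, test = partition(scaled)   # 3 training rows, 1 test row
```

In ML Studio the same steps are done by connecting modules on the canvas rather than by writing code; the sketch only shows what each step does to the data.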
When you have your data dropped into ML Studio, you can use
the built-in tools to engineer the best predictive model. Normally, if you are building your model, you
would include all the relevant features and exclude all the features you
consider irrelevant. While it is
intuitive to include as much data and as many datasets as you perceive to be relevant,
this doesn’t necessarily create the best predictive model.
All too often, the best solutions are
counterintuitive. Using ML Studio, you can
quickly and easily run a variety of experiments to find those
counterintuitive correlations that will give you the best predictive model.
Again, you simply drag and drop the analysis module that you
want to run. With your data and analysis
module connected on the canvas, you run the experiment in ML Studio. Once it has run, you can save the results,
edit your experiment, and run a new experiment for comparison.
Using the ML Studio, you can create multiple copies of your
data. Then, using the Execute R Script
module, you create a variety of derived features to include in your base
dataset. You can then choose the
appropriate built-in algorithm to analyze each augmented dataset. If appropriate, you can adjust many of the
parameters in the algorithm you choose.
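
The idea of copying the base dataset and adding a different derived feature to each copy can be sketched as follows. In ML Studio this would be done inside the Execute R Script module; the sketch uses plain Python instead, and the column names (`height_m`, `weight_kg`) and the derived features are hypothetical, chosen only to illustrate the pattern.

```python
import copy

# Hypothetical base dataset: one dictionary per row.
base = [
    {"height_m": 1.80, "weight_kg": 81.0},
    {"height_m": 1.60, "weight_kg": 64.0},
]

def add_bmi(rows):
    """Return a copy of the rows with a derived body-mass-index column."""
    out = copy.deepcopy(rows)
    for row in out:
        row["bmi"] = row["weight_kg"] / row["height_m"] ** 2
    return out

def add_weight_per_metre(rows):
    """Return a copy of the rows with a derived weight-to-height ratio."""
    out = copy.deepcopy(rows)
    for row in out:
        row["kg_per_m"] = row["weight_kg"] / row["height_m"]
    return out

# One augmented copy per candidate feature; the base data is left untouched.
augmented = {
    "base": base,
    "with_bmi": add_bmi(base),
    "with_kg_per_m": add_weight_per_metre(base),
}
```

Each entry in `augmented` can then be fed to the same algorithm, so that any difference in the results comes from the derived feature rather than from the rest of the data.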
Once you have run the algorithm, the results can be tested
against known outcomes from a separate dataset in order to choose which set of
derived features leads to the best predicted outcomes. Using the built-in modules, Score Model and
Evaluate Model, you can quickly determine which dataset produced the most accurate
predictions.
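
As a rough illustration of what scoring and evaluating mean, the sketch below applies two stand-in models to held-out rows with known outcomes and compares their accuracy. It does not reproduce the Score Model or Evaluate Model modules: the helper functions, the test rows, and the two threshold "models" are assumptions made for the example, and accuracy is only one of the metrics you might compare.

```python
def score(model, rows):
    """Apply a trained model to each row and return its predictions."""
    return [model(row) for row in rows]

def evaluate(predictions, known_outcomes):
    """Return accuracy: the fraction of predictions that match known outcomes."""
    matches = sum(p == k for p, k in zip(predictions, known_outcomes))
    return matches / len(known_outcomes)

# Hypothetical held-out data: (feature_a, feature_b) rows with known labels.
test_rows = [(1, 9), (2, 1), (8, 2), (9, 8)]
known = [0, 0, 1, 1]

# Two stand-in "trained models", each relying on a different feature.
candidates = {
    "uses_feature_a": lambda row: int(row[0] > 5),
    "uses_feature_b": lambda row: int(row[1] > 5),
}

# Score each candidate on the same held-out rows, then compare accuracy.
results = {
    name: evaluate(score(model, test_rows), known)
    for name, model in candidates.items()
}
best = max(results, key=results.get)
```

Because every candidate is scored against the same held-out rows, the accuracies are directly comparable, which is the point of keeping the test dataset separate from the training data.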
When you are happy with the results, ML Studio will help
you publish your model as a web service so that others can use it.
There are plenty of sample datasets already loaded in ML
Studio. There are also sample
experiments that you can use as a template, or for learning.