Monday, May 30, 2016

Algorithms Part 2 - Neural Networks, SVMs and Bayesian Methods.

To round out the algorithms, this week we will look more deeply at Neural Network, SVM and Bayesian algorithms.

Neural Network algorithms are used when the result you are looking for is a moving target and there are a large number of possible inputs, none of which has a strong individual correlation to the final result.  Programmers may use them to teach a computer how to recognize writing or images.  They are also commonly used for fraud detection.

There are innumerable varieties of Neural Network algorithms, but Azure ML only deals with Directed Acyclic Graphs (“DAGs”).  All Decision Trees are DAGs, but not all DAGs are Decision Trees.  “Directed” means that each connection between two nodes has a direction.  “Acyclic” means that no matter which node you start from, if you walk through the nodes following the directions, you will never return to the node you started with.  “Graph” here means a set of nodes joined by connections, not a chart or picture.  So, for example, a family tree is directed because parents lead to children, and cannot go the other way.  It is acyclic because your ancestors can never be your descendants.  A first year university student is faced with a DAG problem.  They must choose subjects that follow requirements.  You cannot take a Data Science course until you have taken a prerequisite like R programming.  By adding all the subjects with their prerequisites into a graph, you will have a DAG.

By combining many simple calculations, your model is able to learn sophisticated class boundaries and data trends.  Neural Network algorithms are what is used for the deep learning behind the artificial intelligences popping up in many areas.  Deep learning models can take a long time to train.
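To make the course prerequisite example concrete, here is a minimal sketch in Python using the standard library's graphlib module (Python 3.9 or later).  The course names and the helper functions study_order and is_dag are made up for illustration; this shows what a DAG is, not how Azure ML builds a neural network.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical course catalogue: each course maps to the set of its prerequisites.
prerequisites = {
    "R Programming": set(),
    "Statistics": set(),
    "Data Science": {"R Programming", "Statistics"},
    "Machine Learning": {"Data Science"},
}


def study_order(prereqs):
    """Return one valid order in which the courses can be taken."""
    # static_order() yields every node after all of its prerequisites.
    return list(TopologicalSorter(prereqs).static_order())


def is_dag(prereqs):
    """A directed graph is acyclic exactly when such an order exists."""
    try:
        study_order(prereqs)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(study_order(prerequisites))
    print(is_dag(prerequisites))
    # Two courses that require each other form a cycle, so this is not a DAG.
    print(is_dag({"A": {"B"}, "B": {"A"}}))
```

Any order the function returns places R Programming before Data Science.  If two courses ever required each other, no valid order would exist, which is exactly what “acyclic” rules out.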

Support Vector Machines (SVMs) are supervised learning methods that are used for classification, regression, and outlier detection.  When two classes of data cannot easily be separated, an SVM will find the boundary line that separates them with the widest possible margin.  The standard SVM in Azure ML will only separate two classes using a straight line.  Because this ML algorithm is kept simple, it is fast and not prone to overfitting.  Azure ML has a second kind of SVM called the two-class locally deep SVM.  This ML algorithm combines several small linear SVM problems to produce a non-linear separation boundary.  This algorithm is particularly useful for detecting anomalies in your data.
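The idea of the “widest possible margin” can be sketched in a few lines of plain Python.  The data points, the two candidate lines and the margin helper below are all invented for illustration.  A real SVM searches over every possible line, while this sketch only compares two hand-picked ones.

```python
import math


def margin(w, b, points):
    """Smallest signed distance from any labelled point to the line w.x + b = 0.

    Each point is ((x1, x2), label) with label +1 or -1.  A negative result
    means at least one point lies on the wrong side of the line.
    """
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * x[0] + w[1] * x[1] + b) / norm
               for x, label in points)


# Made-up toy data: two points per class.
points = [
    ((1.0, 1.0), -1),
    ((2.0, 1.0), -1),
    ((4.0, 4.0), +1),
    ((5.0, 4.0), +1),
]

# Two hand-picked candidate lines, stored as (w, b); both separate the classes.
candidates = {
    "x1 = 3": ((1.0, 0.0), -3.0),
    "x1 + x2 = 5.5": ((1.0, 1.0), -5.5),
}

# Prefer the candidate whose nearest point is farthest away.
best = max(candidates, key=lambda name: margin(*candidates[name], points))

if __name__ == "__main__":
    for name, (w, b) in candidates.items():
        print(name, round(margin(w, b, points), 3))
    print("wider margin:", best)
```

Both lines classify every point correctly, but the second one leaves more room between the line and the nearest point, so a margin-based method would prefer it.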

Bayesian ML algorithms are excellent for guarding against overfitting your data.  Azure ML has two types of Bayesian models:  the two-class Bayes point machine, and Bayesian linear regression.  Two-class Bayes point machines were originally developed at Microsoft, so they are particularly robust in Azure and definitely a source of pride at Microsoft.  Bayesian statistics treats quantities of interest as random variables.  While the core ideas of Bayesian analysis are centuries old, they have had their biggest impact in the last 20 years through ML.  The flexibility of Bayesian analysis allows for structured models of real world phenomena.  It allows you to trade off complexity in exchange for some degree of structure or fit.
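As a small illustration of treating a quantity of interest as a random variable, the sketch below updates a Beta prior on an unknown rate with observed counts.  The functions beta_posterior and beta_mean and the numbers are assumptions chosen for the example; this is the textbook Beta-Binomial update, not the method used inside the Azure ML Bayesian modules.

```python
def beta_posterior(prior_a, prior_b, successes, failures):
    """Update a Beta(prior_a, prior_b) belief about a rate with observed counts."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Made-up observation: 3 flagged transactions out of 3 inspected.
successes, failures = 3, 0

# The raw frequency jumps straight to 100% on only three data points.
raw_estimate = successes / (successes + failures)

# Starting from a uniform Beta(1, 1) prior gives a more cautious estimate.
a, b = beta_posterior(1, 1, successes, failures)
bayes_estimate = beta_mean(a, b)

if __name__ == "__main__":
    print(raw_estimate)    # 1.0
    print(bayes_estimate)  # 0.8
```

With only three observations the prior keeps the estimate away from the extreme value, which is the same pull toward caution that helps Bayesian models avoid overfitting.  As more data arrives, the prior matters less and the two estimates converge.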

If what you are looking for in your data is very specific, Microsoft has a few specialized algorithms that will get you the information quickly and easily.  For example, if you are looking for anomalies in your data, instead of using a generic SVM, you can use an SVM in Azure ML that is designed specifically to root them out.  Other algorithms have been tuned to find specific kinds of items, making your job of finding them that much easier.
