Tuesday, May 23, 2017

Azure Data Catalog – Glossary setup

Microsoft in the News

If you are like me, you saw the upcoming augmented reality technology as mainly something for gaming, or for consumers looking for additional information and connection – think Oculus, Magic Leap, and Google Glass.  Then I read that the Swedish company Tetra Pak is using Microsoft’s HoloLens to help its Service Engineers diagnose and fix Tetra Pak machines.
In my last blog post I talked about the IoT revolution.  Tetra Pak is taking it one step further.  To start, Tetra Pak packaging machines are loaded with IoT devices that communicate with the Cloud so that maintenance needs are dealt with before they become problems.  Sensors throughout the machines monitor the various functions and predict when equipment needs maintenance.  In this way, production lines can avoid many breakdowns.
Now add a HoloLens.  As a Service Engineer is viewing the machine, feedback is given via the HoloLens to help them quickly diagnose and fix any issues.  It got me thinking …
I can imagine that in the near future, you will be able to connect your HoloLens to the cloud and ask for assistance in doing virtually anything.  Want to change the brakes on your car?  Input the make and model and get a step-by-step walkthrough right in front of your eyes.  Simply look at the wheel, and arrows point to the item you need to remove or add, along with tips and tricks.  Want to build a deck?  Again, step-by-step instructions appear before your eyes.  Any hobby or task could be made accessible to anyone, because an AI and an instruction package could walk them through every step.  It will be like having a professional guiding your hand in anything you want to accomplish.
The same will eventually apply to professions.  Surgeons are already using augmented reality to overlay computer-generated enhancements on a patient.  Before too long, the lack of a super steady hand and a stomach for blood may be the only things keeping you and me from being able to perform surgery.

 ADC – Glossary setup

As mentioned in my last post, we will look at the glossary in this post.  In my post on tagging, I talked about how tagging in the free edition is free-form.  There are no constraints or limitations on what you can pin as a tag, other than ADC blocking duplicates.  The glossary addresses the challenges that this freedom can pose.
The glossary allows an organization to document key business terms and their definitions to create a custom business vocabulary.  This enables consistency in the data usage across the entire catalog.  Once you have set up the terms, they can be used in tagging.  This enforces a governed approach to what tags are used.

If the glossary has not been set up, you get the following message when you choose Glossary from the menu.


This is because the free edition of Data Catalog does not include the glossary.  To upgrade your subscription, go to Settings, scroll down to Pricing, and choose “Standard Edition”.

Then choose Save.  Now when I choose Glossary, I get the glossary page instead of the upgrade message.
Now you are able to start adding to your glossary.  Note that only glossary administrators and catalog administrators can create new terms.

The very first term you enter does not need a parent term.  To choose a parent, the parent term must already exist so that the new term can be associated with it.
  
Term Name – the name of the tag.  This should be exactly what you want the tag to look like, so if you want it to be an acronym, use the acronym here.

Parent Term – the name of the existing term that you want this term to be a child of.

Definition – the definition of the business term.

Description – this differs from the definition in that it describes the intended use of the term.

Stakeholders – This is where you can tag your subject matter experts - people who know the most about a term.
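To make the rules above concrete, here is a minimal Python sketch of a glossary as an in-memory model, assuming terms are keyed by their exact Term Name. The `GlossaryTerm` and `Glossary` classes, the `is_admin` flag, and `is_valid_tag` are hypothetical illustrations of the behaviour described in this post (administrator-only creation, parents that must exist first, tags drawn from the glossary), not the Data Catalog API. `is_valid_tag` takes a strict reading of "governed" tagging, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GlossaryTerm:
    name: str                     # Term Name: exactly how the tag should appear
    definition: str               # Definition of the business term
    parent: Optional[str] = None  # Parent Term: must already exist if given
    description: str = ""         # Description: intended use of the term
    stakeholders: List[str] = field(default_factory=list)  # subject matter experts


class Glossary:
    """Hypothetical in-memory model of the glossary rules in this post."""

    def __init__(self) -> None:
        self.terms: Dict[str, GlossaryTerm] = {}

    def add_term(self, term: GlossaryTerm, is_admin: bool) -> None:
        # Only glossary or catalog administrators may create terms.
        if not is_admin:
            raise PermissionError("only administrators can create terms")
        # Refusing to overwrite a term also rules out parent/child cycles.
        if term.name in self.terms:
            raise ValueError(f"term {term.name!r} already exists")
        # The parent has to exist before a child can be attached to it.
        if term.parent is not None and term.parent not in self.terms:
            raise ValueError(f"parent term {term.parent!r} does not exist yet")
        self.terms[term.name] = term

    def path(self, name: str) -> List[str]:
        """Return the chain of terms from the top-level parent down to `name`."""
        chain: List[str] = []
        current: Optional[str] = name
        while current is not None:
            chain.append(current)
            current = self.terms[current].parent
        return chain[::-1]

    def is_valid_tag(self, tag: str) -> bool:
        # Assumed strict governance: only terms defined in the glossary are accepted.
        return tag in self.terms


glossary = Glossary()
glossary.add_term(GlossaryTerm("Finance", "Money-related business terms"), is_admin=True)
glossary.add_term(GlossaryTerm("AR", "Accounts receivable", parent="Finance"), is_admin=True)
print(glossary.path("AR"))  # ['Finance', 'AR']
```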

In the next post, we will look at the glossary in more detail.

Tuesday, May 16, 2017

Azure Data Catalog – Tagging

Microsoft in the News

Did you know that Microsoft has something called “Cognitive Services”?  Microsoft is currently working on 25 Cognitive Services that it will be making available to you soon.  These services allow you to build apps on Microsoft’s machine learning algorithms using just a few lines of code.  As of April 19, three of these services are either generally available or released as a preview.  The first three Cognitive Services are:
1. Face API:  This service allows you to detect and compare human faces, organize them into groups, and identify people you have tagged.
2. Content Moderator:  This service allows you to automatically quarantine images, text and video for review prior to publication.  The video moderator is currently only available as a preview, not for general use.
3. Custom Speech Service API:  This service is currently only available as a preview.  It allows you to customize Microsoft’s speech-to-text engine for your application.
A handwriting detection and conversion API will soon be available for preview.
The full suite of Cognitive services can be found here:  https://www.microsoft.com/cognitive-services

ADC – Tagging

Tagging is the action of assigning a keyword or term to a piece of information.  Tags describe meta data items so that they can be easily searched, and they are chosen by whoever adds them.  In ADC, the tags are located in the information pane.  When you hover over a tag, ADC shows information about the tag, including who added it.


Note that anything can be added as a tag, even non-words.  When you add a tag that already exists on that asset, ADC automatically blocks the addition.  This is true even when the spelling differs only in case: ADC sees “WWWImporters” and “wwwimporters” as the same tag and will not include the second value.
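The duplicate check behaves as if tags on an asset are compared without regard to case. Here is a minimal Python sketch of that rule, assuming a simple case-insensitive comparison; the `add_tag` helper is hypothetical and not part of ADC.

```python
from typing import List


def add_tag(existing: List[str], new_tag: str) -> bool:
    """Add new_tag unless a tag differing only in case is already present.

    Returns True if the tag was added, False if it was blocked as a duplicate.
    """
    # casefold() is a stronger form of lower() intended for caseless comparison.
    if new_tag.casefold() in {tag.casefold() for tag in existing}:
        return False
    existing.append(new_tag)
    return True


tags = ["WWWImporters"]
print(add_tag(tags, "wwwimporters"))  # False: same tag, different case
print(add_tag(tags, "Sales"))         # True
print(tags)                           # ['WWWImporters', 'Sales']
```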


Even without duplicates, the example above shows how sort order can cause issues with tags.  The value 11 sorts before 2, and AAA sorts before aa.  Even though I cannot add aaa because AAA already exists, that does not remove the risk of ending up with a mess in your tags.  The best way to limit this issue, and others, is to have a business glossary.  In the next post, we will look at setting up a business glossary in ADC.
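The ordering in that example is what a plain character-by-character sort produces: digits come before uppercase letters, and uppercase before lowercase, so "11" lands ahead of "2". The Python sketch below reproduces that order and, for contrast, shows a hypothetical `natural_key` that compares digit runs as numbers and ignores case. It only illustrates why the order looks odd; nothing in this post suggests ADC lets you change its sort, which is why a glossary is the better fix.

```python
import re
from typing import List, Union

tags = ["2", "aa", "11", "AAA"]

# Plain string sort compares character codes, so "1" < "2" and "A" < "a".
print(sorted(tags))  # ['11', '2', 'AAA', 'aa']


def natural_key(tag: str) -> List[Union[int, str]]:
    """Hypothetical sort key: digit runs compare as numbers, letters ignore case."""
    # re.split with a capture group alternates text and digit runs, so ints and
    # strings always sit at the same positions when two keys are compared.
    return [int(part) if part.isdecimal() else part.casefold()
            for part in re.split(r"(\d+)", tag)]


print(sorted(tags, key=natural_key))  # ['2', '11', 'aa', 'AAA']
```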

Tuesday, May 2, 2017

ADC – Saved search

Microsoft in the News

The Internet of Things (“IoT”) ranks right up there with 3D printing and AI as something that is going to transform the way we live.  Today, opening a bottle of Johnnie Walker Blue Label activates a sensor built into the label, which sends a message to your smartphone using Near Field Communication (NFC) technology.


That is an example of how IoT is being integrated into low-tech items.  At the other end of the spectrum, modern airplanes are laced with so many IoT sensors that a Boeing 787 produces over half a terabyte of data per flight.


In an attempt to keep ahead of the curve, on April 20, Microsoft launched a new SaaS offering called Microsoft IoT Central.  Microsoft recognized that the technology has advanced so quickly that it has left a shortage of skilled people who are able to take advantage of it.  So, IoT Central is their attempt to simplify the management of this transformative technology.
IoT Central is, of course, built on the Azure platform and integrates with their platform-as-a-service offering, Azure IoT Suite.
If the IoT is or will be in your wheelhouse, you can sign up for updates here:  

ADC – Saved search

Now that you know how to search for and find what you need in your catalog, it is often worthwhile to save a search.  When you search for just a word, or write a simple comparison search, you may not need to save it.  When you write a complex search or query of the data catalog, however, it can be helpful to save that search for use at another time.  When you save your search by choosing the Save button in the search panel, as depicted below, ADC saves your entire search criteria.


The entire criteria include any filters that were applied, as well as the specific operations listed in the search bar.  Once you choose “Save”, you are given an opportunity to name the search to make it easier to find.  By default, all saved searches are specific to the individual who created them.  Should you wish to share a search with the entire company, be sure to check the Share with company check box.


Once saved, the search shows up in my saved searches.


The cog symbol gives you access to the saved search menu for that search, which allows you to manage the search.  You can rename or delete the search, or save it as your default.  Save as default is a particularly handy feature when your catalog gets quite large and you want to focus on a particular area of interest.
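The saved-search behaviour described above can be modelled in a few lines. This Python sketch is hypothetical: `SavedSearch`, `SavedSearches`, and their fields and methods are illustrative names, search names are assumed to be unique, and it says nothing about how ADC actually stores searches. It shows that a saved search keeps the whole criteria (query text plus filters), stays private unless shared with the company, and can be renamed, deleted, or made a user's default.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SavedSearch:
    name: str
    owner: str
    query: str  # text from the search bar, operators included
    filters: Dict[str, List[str]] = field(default_factory=dict)  # e.g. {"Objects": ["Table"]}
    shared_with_company: bool = False  # off by default: private to the owner


class SavedSearches:
    """Hypothetical store mirroring the save / rename / delete / default options."""

    def __init__(self) -> None:
        self._searches: Dict[str, SavedSearch] = {}
        self._defaults: Dict[str, str] = {}  # user -> name of their default search

    def save(self, search: SavedSearch) -> None:
        self._searches[search.name] = search

    def rename(self, old: str, new: str) -> None:
        search = self._searches.pop(old)
        search.name = new
        self._searches[new] = search
        # Keep any default that pointed at the old name.
        for user, name in list(self._defaults.items()):
            if name == old:
                self._defaults[user] = new

    def delete(self, name: str) -> None:
        del self._searches[name]
        self._defaults = {u: n for u, n in self._defaults.items() if n != name}

    def set_default(self, user: str, name: str) -> None:
        self._defaults[user] = name

    def default_for(self, user: str) -> Optional[SavedSearch]:
        name = self._defaults.get(user)
        return self._searches.get(name) if name else None

    def visible_to(self, user: str) -> List[str]:
        # Users see their own searches plus any shared with the company.
        return [s.name for s in self._searches.values()
                if s.owner == user or s.shared_with_company]
```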


Monday, April 17, 2017

Azure Data Catalog – Search functionality

Microsoft in the News

J.D. Power and Associates recently released its survey on tablet satisfaction.  Microsoft’s Surface tablet came out on top in many categories, but most surprising was that, overall, consumers were more satisfied with their Surface than they were with iPads.
According to J.D. Power, the Microsoft Surface has “expanded what tablets can do, and it sets the bar for customer satisfaction.”
The areas in which the Surface was rated best were internet connectivity, availability of official accessories, and the variety of pre-installed applications.

ADC – Search functionality        

In the last installment of this series on ADC, we looked at the search menu, which has some basic search options.  When you are looking for very specific data assets, it is worth looking at the full suite of search functions available.  The search bar can be used for basic searches, such as a word search for “finance”, or for more complex searches, such as a property with a specific term, “department:finance”.  Both of these are common in Google, Windows, and other programs with search capabilities.  ADC expands on these with three additional options for searching:
Boolean – This is used to narrow a search “finance NOT head office”
Grouping – This is done by using parentheses to group logic “finance and (“head office” OR “main office”)”
Comparison – This is used to make comparisons “(starttime > “18/04/2017”) AND department:finance”

Matching operators

Symbol – Explanation
: – Items where a specific property has the search term, e.g. department:finance
=, <, > – Comparison operators, used to compare values; they can be combined, e.g. <=
“ ” – Quotation marks group a string as a single unit value, e.g. “finance department”
NOT, AND, OR – Boolean conditions
has: – An existence search.  If a given property has at least one element, the asset is returned by this operator, e.g. has:description
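If you build the same kinds of queries often, it can help to compose them from small pieces rather than typing them by hand. The Python helpers below (`quote`, `prop`, `has`, `negate`, `all_of`, `any_of`) are hypothetical; they simply produce strings in the syntax from the table above, with Boolean operators in capitals for readability, for pasting into the ADC search bar. Straight double quotes are assumed.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in quotes so they are searched as a single unit."""
    return f'"{term}"' if " " in term else term


def prop(name: str, term: str) -> str:
    """Property search, e.g. department:finance."""
    return f"{name}:{quote(term)}"


def has(name: str) -> str:
    """Existence search: assets where the property has at least one value."""
    return f"has:{name}"


def negate(part: str) -> str:
    return f"NOT {part}"


def all_of(*parts: str) -> str:
    return " AND ".join(parts)


def any_of(*parts: str) -> str:
    # Parentheses group the OR so it combines safely with other conditions.
    return "(" + " OR ".join(parts) + ")"


print(all_of("finance", any_of(quote("head office"), quote("main office"))))
# finance AND ("head office" OR "main office")
print(all_of(prop("department", "finance"), has("description")))
# department:finance AND has:description
```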

The search operators are not case sensitive; however, capitalizing the Boolean conditions makes the search easier to read and understand.  If you limit your search too much, or your search returns no data, you will be given this message:


Choosing the Reset query box removes your entire query.  If you have simply made a typo, it is much easier to click in the search box and correct it.  That way you do not lose your query and have to retype it just to fix a small error.
Currently, exact-match searching is not available.  ADC uses prefix match semantics.  This means that when you search for “Sale”, your search will return Sale, Sales, SaleData, and Salesman.
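Prefix match semantics can be illustrated with a short Python sketch. It is a simplification under stated assumptions: each name is treated as a single word and matching ignores case. ADC's real tokenizing and ranking are not described here, so treat `prefix_match` as hypothetical. Names that merely contain the term, such as Wholesale, are not returned, and a search for Sale cannot be narrowed to Sale alone, which is what the lack of exact matching means in practice.

```python
from typing import Iterable, List


def prefix_match(query: str, names: Iterable[str]) -> List[str]:
    """Return names that start with the query, ignoring case (an assumption)."""
    q = query.casefold()
    return [name for name in names if name.casefold().startswith(q)]


names = ["Sale", "Sales", "SaleData", "Salesman", "Wholesale", "Resale"]
print(prefix_match("Sale", names))
# ['Sale', 'Sales', 'SaleData', 'Salesman']
```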
More information on the algorith

Thursday, March 30, 2017

Azure Data Catalog – Search function Basics

Microsoft in the News

Online video games used to treat perinatal stroke

Globally, 3.5 million babies suffer from a perinatal stroke every year.  These strokes cause a varying level of motor impairments.  Francesca and Roberto had a son, Mario, who ended up being one of these 3.5 million babies.  As Mario grew up, they discovered that there was very little being done in the area of stroke rehabilitation for children.  Using Microsoft Azure as their base and Microsoft’s Kinect motion sensing input device for Xbox and Windows, this Italian couple have developed a rehabilitation program designed specifically for kids.

Mirrorable is an interactive game that allows children affected by perinatal strokes to do rehabilitation sessions at home.  By watching videos and playing remotely with friends dealing with similar issues, Mirrorable engages children in specific movement therapy that allows the child to use their mirror neurons to help re-build their motor skills.  Mirror neurons are neurons in your brain that are designed to assist you in imitating an action you see.

The child watches a magic trick on the screen and then the magician explains exactly how it was done.  The child is then asked to perform the magic trick themselves.  Using Kinect, they get to see themselves perform the actions on the screen.  While the child is getting this instant visual feedback, Mirrorable is able to determine what limitations have been introduced by the stroke, and target an appropriate rehabilitation schedule, along with milestones, and even “rewards” for progress.

Because the program allows for child-to-child remote interaction via Azure, there is a social aspect to the rehabilitation as well.

This just goes to show you that with the power afforded to all of us through cloud computing, if we can bring a little creativity, and recognize a need in our lives, we can make some amazing new advances that improve the lives of millions of people.

Azure Data Catalog – Search function Basics 

In the last week of posts, we have looked at the different pieces of Azure Data Catalog (ADC).  The overview was important for this next piece, searching.  It is difficult to search for something when you do not know what you are looking for, and this becomes more obvious the more data we need to search.  If you search Amazon for “jeans”, you will find jeans, but how much more accurate would your search be if you searched for “jeans black boot cut”?  You are far more likely to get a relevant list with the second search.  Searching in ADC is no different.  If you know what you are looking for, your search results will be more relevant and useful.

The main screen of ADC has two locations for easy searching.  The first is at the top of the screen on the left.  It is marked by the familiar magnifying glass that many applications use to denote search capabilities.

 The second is located on the left side of the screen.  This search option is a menu of search capabilities.
Above is what it looks like when all the menu functions are minimized.  I will talk about each individually.  Below is an image of what the menu looks like when it is expanded out.

The first section is Current Search.  This is identical to the search bar on the main page and contains any search that is currently active.  The criteria in your current search determine the filters available to you.  You can change or even save your current search from this menu.  This is a free-form text search, which makes it more difficult to use because you have to know the syntax, but it can be extremely useful once you become more comfortable with the syntax.  My blog post next week will cover some of the capabilities of this search bar.

The Filters section of the search is much simpler, since it is just a series of check boxes grouped under objects, tags, experts, and sources.  The filters are presented as a way to limit your search results, but as a Data Professional, I see them more as a general grouping of the data.  The filters are predetermined based on the type of meta data you have and can change based on the current search you are looking at.
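One way to picture how the filters depend on the current search is as facets computed from the results. In this hypothetical Python sketch, `available_filters` counts the values of each facet (Objects, Tags, Experts, Sources) across the current results, and `apply_filters` keeps the assets that match the checked boxes. It assumes that checked boxes within one facet are combined with OR and different facets with AND; this post does not say how ADC combines them.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical shape of a search result: facet name -> values for that asset.
Asset = Dict[str, List[str]]


def available_filters(results: List[Asset]) -> Dict[str, Counter]:
    """Count each facet value across the current results; only these can be offered."""
    facets: Dict[str, Counter] = {}
    for asset in results:
        for facet, values in asset.items():
            facets.setdefault(facet, Counter()).update(values)
    return facets


def apply_filters(results: List[Asset], selected: Dict[str, List[str]]) -> List[Asset]:
    """Keep assets matching at least one checked value in every facet with a selection."""
    return [asset for asset in results
            if all(set(asset.get(facet, [])) & set(values)
                   for facet, values in selected.items() if values)]


results = [
    {"Objects": ["Table"], "Tags": ["finance"], "Sources": ["SQL Server"]},
    {"Objects": ["View"], "Tags": ["finance", "sales"], "Sources": ["SQL Server"]},
    {"Objects": ["Table"], "Tags": ["sales"], "Sources": ["Azure Blob"]},
]
print(available_filters(results)["Objects"])  # Counter({'Table': 2, 'View': 1})
```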

This is the end of the Intro to Azure Data Catalog.  You are now aware of the basic elements and tools to get you started.  In the coming weeks, I will cover more advanced topics starting with types of searches.



Wednesday, March 29, 2017

Azure Data Catalog – Information panel Tabs

Microsoft in the News - Azure Functions

On February 23, Microsoft announced preview support for the Azure Functions Serverless Framework plugin.  Azure Functions allows you to easily run small pieces of code (“functions”) in the cloud.  Using Azure Functions, you can quickly write the code you need, without worrying about the application as a whole.  The language you choose to write your code in can be C#, F#, Node.js, Python, PHP, batch, bash, or any executable.

Azure Functions is designed to be a solution for processing data, integrating systems, working with the internet-of-things (IoT), and building simple APIs and microservices.  Tasks like image or order processing, file maintenance, or any task that you want to run on a schedule are easily tackled using Azure Functions.  It also integrates easily with Event Hubs to extend your functionality.

If you are interested in learning how to use this tool, Microsoft has created an Azure Functions Code Challenge.  Follow the link below and then log into your Azure account.  The Challenge will take you through a series of problems that will test your coding skills as you learn to build solutions in Azure Functions.


Azure Data Catalog – Information panel Tabs


In my last post, I looked at the details pane.  The focus was on the properties panel.  There are 4 additional panels in that pane.  Let’s take a look at them. 

The panels are: Properties, Preview, Columns, Data Profile, and Documentation.

When viewing any of the panels, you can hover your mouse over the left edge of the panel, where it meets the main tile space, until you see a double arrow “<->”.  You can use this to adjust the size of the panel.

The Preview pane is just as the name implies, a preview of the data that is in the asset you have selected.   


This pane allows you to view the data that is in the asset.  As I mentioned before, if you do not have security access to the underlying data, you will not be able to see the data here.  This is a view-only screen, and you cannot enter any data.


The Columns pane allows you to add a description of the various details that make up the asset.  In this case, the asset is Customer.  The Customer asset contains details that are set out in rows, each row representing a column in the customer table.  The columns of the table, such as Customer key, WWI Customer ID, and Customer, are set out as separate rows.  Each of these rows can be annotated here with tags and a description.


The Profile panel contains additional meta data.  (Reminder: meta data is data about data.)  For our Customer asset, it shows some details and even some statistics about the data, describing what is in the table.  This can be very useful, particularly to those who do not have permission to view the actual data.  The top row describes the table with details such as:
  • Number of Rows – how much data is in the table
  • Size – how much space it takes up on disk
  • Last data update -  the last time the data in the table was updated
  • Last Schema Update -  this is when the last structural change was made to the table, so for example a column was added or the size of a column was changed.


The next section is all about the individual columns of data.  This pane outlines the data type, which you would expect to learn from meta data, but it goes much further by giving you some actual analysis of the data.  We can see how many null values are in each column, how many distinct values there are, and the minimum and maximum values, as well as the average and standard deviation for any numeric columns.
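To see what those column statistics mean, here is a minimal Python sketch that computes the same kinds of figures for a single column held in memory. The `profile_column` function is hypothetical and makes no claim about how ADC builds its profile; it assumes `None` stands for a null, that a column holds values of one type, and that the sample standard deviation is wanted. ADC may use a different definition.

```python
import statistics
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Summarise one column: nulls, distinct values, min/max, and numeric stats."""
    non_null = [v for v in values if v is not None]  # None stands in for NULL
    profile: Dict[str, Any] = {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
    # Average and standard deviation only make sense for numeric columns.
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if is_numeric:
        profile["avg"] = statistics.mean(non_null)
        # Sample standard deviation needs at least two values.
        profile["stdev"] = statistics.stdev(non_null) if len(non_null) > 1 else 0.0
    return profile


print(profile_column([10, 20, None, 20, 30]))
print(profile_column(["Tailspin Toys", "Wingtip Toys", None]))  # no avg/stdev for text
```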


The final panel is the Documentation panel.  It is free-form and has capabilities similar to other Microsoft note-taking applications.  You will find the familiar basic formatting.  In addition, there are buttons for:
Creating tables
Adding hyperlinks
If your organization already has a start on the documentation for an asset, it is reassuring that it can still be referenced and used by linking to it here, making all documentation easier for everyone in the organization to find.

Now that we are familiar with the asset information we have, we can move on to searching this information, which I will cover in my next post.

Tuesday, March 28, 2017

Azure Data Catalog – Details Panel

Microsoft in the news - The SharePoint Virtual Summit

For those of you who live on SQL, but also work with SharePoint, Microsoft is putting on a FREE SharePoint Virtual Summit on May 16, 2017.  This year, the emphasis will be on how to take advantage of SharePoint, OneDrive and the rest of the Office 365 collaboration toolkit. 

Last year, this event drew over 50,000 people worldwide.  To reserve your spot at this year’s event, you need to register at:  https://resources.office.com/ww-landing-sharepoint-virtual-summit-2017

ADC – Details Panel Information pane


In my last post, we looked at the asset tiles.  Now let’s look at the details contained in those tiles.  Once in Data Catalog, you can open the details pane by simply clicking a tile.  Regardless of where the tile is located, when you choose it, the right-hand side of the application expands to show you the details of that asset.


 The tile that is highlighted in blue is the tile that has its details pane exposed on the right of the screen.  The universal icon for information (top left of the details pane) identifies the information panel.  We will look at the information panel in sections as each section has a specific purpose.


As I mentioned in my last post, the Name defaults to the name of the asset found in the meta data during the registration process.  This is not always a very descriptive name, so the Friendly Name is used to give a more descriptive one.
For example, you may have a table named DIMAccount.  This may be meaningful to a technology person but not necessarily to a business person, whereas the friendly name “Account dimension” will be more useful and descriptive to more people.  In addition to the Friendly Name, there is space for a longer description.  This description contains only text values: if you put a link here, it displays only as the text of the link.  We will talk about how to add hyperlinks in the next post.

The next section allows you to modify the experts and tags that were entered in the registration phase.  If you want to add additional experts, you can do it here.  This is also where you can include additional tags for your assets.  Tags added here are unique to this asset and do not have to apply across all assets in this data source.

The next panel in the information pane is dedicated to connection information.  This portion of the screen tells you where the data resides, and you can add information on how to get access to the data.  A link labeled “View Connection Strings” supplies multiple options for connection string data.




When you choose the View Connection Strings link, you are presented with all the different options for connecting to this data.  The small layered-paper icon at the top right of each connection string is a widget that automatically copies that connection string to your clipboard.

Note the importance of the placeholders {User_Name} and {Your_Password_Here}.  They remind you where to put this information, and because the connection string does not default to the current user’s administrative name and password, they prevent any back-door access to the data.
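The placeholders also suggest a safe habit when you paste a copied connection string into your own code: keep the template as copied and fill in the credentials at run time rather than saving them in the string. The Python sketch below is hypothetical: the `fill_placeholders` helper, the server name, and the database name are illustrative, and the template only imitates the shape of a copied connection string. It replaces each `{Name}` placeholder and fails loudly if a value is missing, so an unfilled placeholder is never sent to the server.

```python
import re
from typing import Mapping

# Matches placeholders such as {User_Name} or {Your_Password_Here}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_]+)\}")


def fill_placeholders(template: str, values: Mapping[str, str]) -> str:
    """Replace every {Name} placeholder with values[Name]; raise if one is missing."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {{{key}}}")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template in the shape of a copied connection string.
template = ("Server=tcp:example.database.windows.net,1433;Database=WideWorldImporters;"
            "User ID={User_Name};Password={Your_Password_Here};")

# Demo values only; real credentials belong in an environment variable or secret store.
print(fill_placeholders(template, {"User_Name": "analyst", "Your_Password_Here": "example-only"}))
```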
This last section is automatically updated by Data Catalog.  It tells you who last updated this asset, and when.

My next post will cover the other fun information available in the details pane beyond the information tab.