Monday, April 17, 2017

Azure Data Catalog – Search functionality

Microsoft in the News

J.D. Power and Associates recently released their survey on tablet satisfaction.  Microsoft’s Surface tablet came out on top in many categories, but most surprising was that, overall, consumers were more satisfied with their Surface than with their iPads.
According to J.D. Power, the Microsoft Surface has “expanded what tablets can do, and it sets the bar for customer satisfaction.”
The areas in which the Surface was rated best were internet connectivity, availability of official accessories, and the variety of pre-installed applications.

ADC – Search functionality        

In the last installment of this series on ADC, we looked at the search menu, which offers some basic search options.  When you are looking for very specific data assets, it is worth knowing the full suite of search functions available.  The search bar can be used for a basic word search, such as “finance”, or for a more specific search on a property with a given term, such as “department:finance”.  Both are common in Google, Windows and other programs with search capabilities.  ADC expands on these with three additional options for searching:
  • Boolean – narrows a search, e.g. “finance NOT head office”
  • Grouping – uses parentheses to group logic, e.g. “finance AND (“head office” OR “main office”)”
  • Comparison – compares values, e.g. “(starttime > “18/04/2017”) AND department:finance”
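
The three kinds of query above can also be assembled as plain strings before you paste them into the search bar.  Here is a minimal Python sketch of that idea.  The helper names (quote, prop, group) are my own invention for illustration, and nothing here calls Data Catalog itself.

```python
# Hypothetical helpers that assemble ADC-style search strings.
# Nothing here talks to Azure; the output is plain text for the search bar.

def quote(phrase: str) -> str:
    """Wrap a multi-word phrase in quotes so it is treated as one unit."""
    return f'"{phrase}"'

def prop(name: str, term: str) -> str:
    """Property match, e.g. department:finance."""
    return f"{name}:{term}"

def group(*terms: str, op: str = "OR") -> str:
    """Join terms with a Boolean operator and wrap them in parentheses."""
    return "(" + f" {op} ".join(terms) + ")"

# Boolean: narrow a search by excluding a phrase.
boolean_query = "finance NOT " + quote("head office")

# Grouping: parentheses around the alternatives.
grouping_query = "finance AND " + group(quote("head office"), quote("main office"))

# Comparison: a grouped comparison combined with a property match.
comparison_query = group('starttime > "18/04/2017"') + " AND " + prop("department", "finance")

for query in (boolean_query, grouping_query, comparison_query):
    print(query)
```

Keeping the Boolean words in capitals inside the helpers makes the generated queries easier to read, even though the search itself does not care about case.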

Matching operators

  • Property match (:) – returns items where a specific property has the search term, e.g. department:finance
  • Comparison operators – used to compare values and can be combined, e.g. <=
  • Quotations (“ ”) – used to group strings as a single unit value, e.g. “finance department”
  • Boolean conditions – AND, OR and NOT, as in the Boolean and Grouping examples above
  • has – an existence search.  If a given property has at least one element, the item will be returned by this operator, e.g. has:description

The search operators are not case sensitive, however capitalizing the words for the Boolean conditions does make the search easier to read and understand. Should you limit your search too much or your search returns no data you will be given this message:

Choosing the reset query box removes your entire query.  If you think you have simply made a typo, it is much easier to click in the search box and make your change.  That way you keep your query and correct the small error instead of retyping the whole thing.
Currently exact match searching is not available.  ADC uses Prefix Match Semantics.  This means that when you search for “Sale” your search will return Sale, Sales, SaleData, and Salesman. 
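
To make the idea of prefix matching concrete, here is a small Python sketch that imitates it on a local list of names.  This is only an illustration of the concept, assuming a simple case-insensitive "starts with" test.  It is not the algorithm ADC actually runs, and the asset names are made up.

```python
# Illustrative only: a local imitation of prefix matching, not ADC's algorithm.

def prefix_match(term, names):
    """Return every name that starts with the search term, ignoring case."""
    needle = term.lower()
    return [name for name in names if name.lower().startswith(needle)]

# Made-up asset names for the example.
assets = ["Sale", "Sales", "SaleData", "Salesman", "Resale", "Customer"]

matches = prefix_match("Sale", assets)
print(matches)  # ['Sale', 'Sales', 'SaleData', 'Salesman']
```

In this sketch "Resale" is not returned, because the term has to appear at the start of the name rather than anywhere inside it.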
More information on the algorith

Thursday, March 30, 2017

Azure Data Catalog – Search function Basics

Microsoft in the news

Online video games used to treat perinatal stroke

Globally, 3.5 million babies suffer from a perinatal stroke every year.  These strokes cause varying levels of motor impairment.  Francesca and Roberto had a son, Mario, who ended up being one of these 3.5 million babies.  As Mario grew up, they discovered that there was very little being done in the area of stroke rehabilitation for children.  Using Microsoft Azure as their base and Microsoft’s Kinect motion sensing input device for Xbox and Windows, this Italian couple have developed a rehabilitation program designed specifically for kids.

Mirrorable is an interactive game that allows children affected by perinatal strokes to do rehabilitation sessions at home.  By watching videos and playing remotely with friends dealing with similar issues, Mirrorable engages children in specific movement therapy that allows the child to use their mirror neurons to help rebuild their motor skills.  Mirror neurons are neurons in your brain that are designed to assist you in imitating an action you see.

The child watches a magic trick on the screen and then the magician explains exactly how it was done.  The child is then asked to perform the magic trick themselves.  Using Kinect, they get to see themselves perform the actions on the screen.  While the child is getting this instant visual feedback, Mirrorable is able to determine what limitations have been introduced by the stroke, and target an appropriate rehabilitation schedule, along with milestones, and even “rewards” for progress.

Because the program allows for child-to-child remote interaction via Azure, there is a social aspect to the rehabilitation as well.

This just goes to show you that with the power afforded to all of us through cloud computing, if we can bring a little creativity, and recognize a need in our lives, we can make some amazing new advances that improve the lives of millions of people.

Azure Data Catalog – Search function Basics 

In the last week of posts, we have looked at the different pieces of Azure Data Catalog (ADC).  The overview was important to this next piece, Searching.  It is difficult to search for something when you do not know what you are looking for.  This becomes more obvious the more data we need to search.  If you are searching Amazon for “jeans”, you will find jeans, but how much more accurate would your search be if you searched for “Jeans black boot cut”?  You are far more likely to get a relevant list with the second search.  Searching on ADC is no different.  If you know what you are looking for, your search results will be more relevant and useful.

The main screen of the ADC has two locations for easy searching.  The first is at the top of the screen on the left.  It is marked by the familiar magnifying glass used in many applications to denote search capabilities.

 The second is located on the left side of the screen.  This search option is a menu of search capabilities.
Above is what it looks like when all the menu functions are minimized.  I will talk about each individually.  Below is an image of what the menu looks like when it is expanded out.

The first section is Current Search.  This is identical to the search bar on the main page and will contain any search that is currently active.  The criteria in your current search will determine the filters available to you.  You can change or even save your current search from this menu.  This is a free-form text search, which makes it more difficult to use because you have to know the syntax, but it can be extremely useful once you have had your coffee and are more comfortable with the syntax.  My blog next week will be on some of the capabilities of this search bar.

The Filters section of the search is much simpler, since it is just a series of check boxes for objects, tags, experts, and sources.  The filters are presented as a way to limit what you get in your search results.  As a Data Professional, I see them more as a general grouping of the data.  The filters are predetermined based on the type of meta data you have and can change based on the current search you are looking at.

This is the end of the Intro to Azure Data Catalog.  You are now aware of the basic elements and tools to get you started.  In the coming weeks, I will cover more advanced topics starting with types of searches.

Wednesday, March 29, 2017

Azure Data Catalog – Information panel Tabs

Microsoft in the News - Azure Functions

On February 23, Microsoft announced preview support for the Azure Functions Serverless Framework plugin.  Azure Functions allows you to easily run small pieces of code (“functions”) in the cloud.  Using Azure Functions, you can quickly write the code you need, without worrying about the application as a whole.  The language you choose to write your code in can be C#, F#, Node.js, Python, PHP, batch, bash, or any executable.

Azure Functions is designed to be a solution for processing data, integrating systems, working with the internet-of-things (IoT), and building simple APIs and microservices.  Tasks like image or order processing, file maintenance, or anything that you want to run on a schedule are easily tackled using Azure Functions.  It also integrates easily with Event Hubs to extend your functionality.

If you are interested in learning how to use this tool, Microsoft has created an Azure Functions Code Challenge.  Follow the link below and then log into your Azure account.  The Challenge will take you through a series of problems that will test your coding skills as you learn to build solutions in Azure Functions.

Azure Data Catalog – Information panel Tabs

In my last post, I looked at the details pane.  The focus was on the properties panel.  There are 4 additional panels in that pane.  Let’s take a look at them. 

              Properties       Preview        Columns         Data Profile      Documentation

When viewing any of the panels, you can hover your mouse over the edge of the panel on the left, where it meets the main tile space, until you see a double arrow “<->”.  You can use this to adjust the size of the panel.

The Preview pane is just as the name implies, a preview of the data that is in the asset you have selected.   

This pane allows you to view the data that is in the asset.  As I mentioned before, if you do not have security access to the underlying data, you will not be able to see the data here.  This is a view-only screen and you cannot enter any data.

The Columns pane allows you to describe the individual columns that make up the asset.  In this case, the asset is the Customer table.  Each column of the table, such as Customer key, WWI Customer ID, Customer, etc., is set out as a separate row in this pane, and each of these rows can be annotated with tags and a description.

The Profile panel contains additional meta data.  (Reminder: meta data is data about data.)  In the case of our Customer asset, this shows some details and even some statistics about the data.  It describes what is in the table.  This can be very useful, particularly to those who do not have permission to view the actual data.  The top row of details describes the table with details such as:
  • Number of Rows – how much data is in the table
  • Size – how much space it takes up on disk
  • Last Data Update – the last time the data in the table was updated
  • Last Schema Update – the last time a structural change was made to the table, for example when a column was added or the size of a column was changed

The next section is all about the individual columns of data.  This pane lists the data type of each column, which you would expect to learn from meta data, but it goes much further and gives you some actual analysis of the data.  We can see how many null values and how many distinct values are in each column, the minimum and maximum values, and the average and standard deviation for any numeric columns.
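
If you want a feel for what these column statistics represent, the Python sketch below computes the same kinds of figures for a single column held in a list.  It is only an approximation under my own assumptions: the function name and sample values are made up, and I have used the sample standard deviation, which may not be the variant Data Catalog reports.

```python
from statistics import mean, stdev

def profile_column(values):
    """Summarize one column: null count, distinct count, and numeric statistics."""
    present = [v for v in values if v is not None]
    summary = {
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    # Average and standard deviation only make sense for numeric columns.
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if present and len(numeric) == len(present):
        summary["average"] = mean(numeric)
        # The sample standard deviation needs at least two values.
        summary["stdev"] = stdev(numeric) if len(numeric) > 1 else None
    return summary

credit_limit = [1000, 2500, None, 2500, 4000]  # made-up sample data
profile = profile_column(credit_limit)
print(profile)
```

A text column run through the same function gets the null, distinct, minimum and maximum figures but no average or standard deviation, which mirrors the way the panel reserves those for numeric columns.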

The final panel is the Documentation panel.  It is free-form and has similar capabilities to other note applications from Microsoft.  You will find the familiar basic formatting.  In addition, there are buttons for:
  • Creating tables
  • Adding hyperlinks
If your organization already has a start on the documentation for an asset, it is reassuring that it can still be referenced and used by linking to it here, making all documentation easier for everyone in the organization to find.

Now that we are familiar with the asset information we have, we can move on to searching this information, which I will cover in my next blog.

Tuesday, March 28, 2017

Azure Data Catalog – Details Panel

Microsoft in the news - The SharePoint Virtual Summit

For those of you who live on SQL, but also work with SharePoint, Microsoft is putting on a FREE SharePoint Virtual Summit on May 16, 2017.  This year, the emphasis will be on how to take advantage of SharePoint, OneDrive and the rest of the Office 365 collaboration toolkit. 

Last year, this event drew over 50,000 people worldwide.  To save your space for this year’s event, you need to register at:

ADC – Details Panel Information pane

In my last post, we looked at the asset tiles.  Now let’s look at the details contained in those tiles.  Once in Data Catalog you can enter the details pane by simply clicking a tile.  Regardless of the location of the tile when you choose it, the right-hand side of the application expands out to show you the details of that asset.

 The tile that is highlighted in blue is the tile that has its details pane exposed on the right of the screen.  The universal icon for information (top left of the details pane) identifies the information panel.  We will look at the information panel in sections as each section has a specific purpose.

As I mentioned in my last post, the Name defaults to the name of the asset found in the meta data during the registration process.  This is not always very descriptive, so the Friendly Name is used to give a more descriptive name.
For example, you may have a table named DIMAccount.  This may be meaningful to a technology person but not necessarily to a business person.  However, the friendly name “Account dimension” will be more useful and descriptive to more people.  In addition to the Friendly Name, there is a space provided for you to enter a longer description.  This description contains only text values.  If you put a link in here, it will display only the text of the link.  We will talk about how to put in hyperlinks in the next post.

The next section allows you to update the experts and tags that were entered in the registration phase.  If you want to add additional experts, you can do it here.  This is also where you can include additional tags for your assets.  Tags added here are unique to this asset and do not have to apply across all assets in this data source.

The next panel in the information pane is dedicated to connection information.  This portion of the screen tells you where the data resides.  You can add in information on how to get access to the data.  There is also a link, “View Connection Strings”, that supplies multiple options for connection string data.

When you choose the View Connection Strings link, you are presented with all the different options for connecting to this data.  The small layered-paper icon at the top right of each connection string is a widget that will automatically copy the connection string to your clipboard.

Note the placeholders for {User_Name} and {Your_Password_Here}.  They are important not only as a reminder of where to place this information, but also because the connection string does not default to the current user’s administrative name and password, which prevents any back-door access to the data.
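
Once a connection string has been copied, the two placeholders have to be swapped for real credentials before it can be used.  The Python sketch below shows one way to do that.  The template is a made-up string in the general style of a SQL Server connection string, the server name is invented, and the environment variable names DB_USER and DB_PASSWORD are my own assumptions.

```python
import os

# Hypothetical template in the style of a copied connection string.
# The server name is a placeholder for illustration only.
template = (
    "Server=tcp:example-server,1433;Database=WideWorldImportersDW;"
    "User ID={User_Name};Password={Your_Password_Here};"
)

def fill_connection_string(template, user, password):
    """Replace the two placeholders with the caller's own credentials."""
    return (template
            .replace("{User_Name}", user)
            .replace("{Your_Password_Here}", password))

# Read credentials from environment variables rather than hard-coding them.
user = os.environ.get("DB_USER", "demo_user")
password = os.environ.get("DB_PASSWORD", "demo_password")

connection_string = fill_connection_string(template, user, password)
```

Reading the values from the environment keeps the password out of the script, which fits the point above about not exposing credentials by default.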
This last section is automatically updated by Data Catalog.  It tells you who last updated this asset and when they did it.

My next post will cover the other fun information available in the details pane beyond the information tab.

Monday, March 27, 2017

Azure Data Catalog – Asset Tiles

Microsoft in the news - Did you know Microsoft makes a desktop computer?
As many of you know, I live and work on my Surface Book laptop.  I have had it for almost a year now and if it were ever lost or stolen, I would not hesitate to buy another one.  I have put it through its paces in numerous ways and in numerous environments and it has never even hinted that it might let me down.
About four months ago, Microsoft introduced their first desktop computer – the Microsoft Surface Studio.  Like their tablet and Surface Book, it isn’t for everyone since it too is priced for high end users who are not faint of heart.  Starting at $2,999 US, there are many people who will quickly give it a pass.  However, if you are looking for a high-end computer, you are now aware of this one.
It comes with a 28-inch touch screen display, wireless keyboard, wireless mouse and a Surface Pen that magnetically clips to the side of the screen.  The screen folds down to only 20 degrees so you can use it like a tablet. 
I don’t have one myself, but from what I have seen of the reviews, people love it.  If it is anywhere close to the quality they put into their Surface Book, it would definitely be worth looking at if you are looking for a high-end desktop.  There are lots of videos on YouTube and lots of reviews out there.  So, don’t just take my word for it.

Azure Data Catalog – Asset Tiles

In my last post, we looked at the tool bar.  Although the tool bar dominates the real estate at the top of the page, the majority of your window consists of the tiles that display your assets.  These are located conveniently in the middle of the page.
Each tile in the main section of the page is an asset.  An asset is any object that is a part of the data source.  When we first registered a data source, Data Catalog read the meta data and created the tiles to represent the assets.  Let’s take a closer look at what details are in each of these tiles.

When you click on a tile, you will see a small check box appear in the top right.

Here highlighted in blue is the check box.  Once the tile is clicked it will show the check mark and open a details panel.  There is a lot of information in the details panel, so we will cover that in a separate post.   
The first line of the tile is the name of the asset that was registered; in this case “Customer”. This name was automatically pulled from the meta data in the source.
The next line is a handy prompt to add a description.  Clicking it opens the same details panel that we will discuss in my next post.
Because I denoted myself as the expert on this source during registration, Data Catalog automatically listed me as the expert when it created the tile, which saved me typing the information.
Below this is the name WWWImporters.  This is the tag I entered during the registration process.  I only entered one, but I could have entered as many as were appropriate for every asset in the data source.
In this case, our source is the database WideWorldImportersDW, a data warehouse in SQL Server.  The type of source is shown below the name.
On the bottom row are some action items.  The first is Open In…, which is where you can choose to view the data in a separate application.

The ability to view this data is only available to you if you have access to the original source.
The middle menu, Explore Database, opens a new window which gives you additional information about the source of the data, in this case a database.

The details window gives you a high-level overview of all the assets in the source.  In this case, it tells you there are 29 tables and 24 stored procedures in the database.  The ← Back to Catalog link at the top of the box allows you to easily go back to your tile view.
On the far right of the tile there is a pin.  By choosing the pin, Data Catalog will pin the asset tile to your application page.  This makes it very easy to find later and gives you quick access to the assets you use often.
My next post will look at the detail in the information panel.

Saturday, March 25, 2017

Azure Data Catalog - Toolbar

Microsoft in the News - Microsoft is helping with your customer churn rate.

As any business manager worth their weight in spam knows, keeping a customer is a lot less expensive than finding a new customer.  In fact, it has been estimated that in the retail business, it is five times less expensive.  That is why business owners can become obsessed with their customer churn rate.

Churn Rate:  The annual percentage rate at which customers stop being customers.
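
As a quick worked example of that definition, here is a small Python sketch.  It assumes the simplest common form of the calculation, customers lost during the year divided by customers at the start of the year, and the numbers are made up.

```python
def annual_churn_rate(customers_at_start, customers_lost):
    """Percentage of the starting customers who left during the year."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start

# Made-up figures: 2,000 customers on January 1, 150 of them gone by year end.
rate = annual_churn_rate(customers_at_start=2000, customers_lost=150)
print(f"{rate:.1f}%")  # 7.5%
```

Businesses vary in how they count the starting base and new customers gained during the year, so treat this as one reasonable reading of the definition rather than the only one.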

Predicting which customers are going to churn has been a highly sought after goal for a long time.  Businesses have been collecting customer data for years, but turning it into something useful, like predicting who will churn, has been difficult at best and elusive at worst.

But guess what recently invented tool is good at predictions:  AI

On March 22, Microsoft publicly unleashed their AI, Cortana, on the subject of Customer Churn.

If you, or someone you know, is concerned about Customer Churn, you can now create an on-premises solution using SQL Server R Services utilizing the power of Cortana.  You can find more information on this here:  Customer Churn Prediction Template with SQL Server R Services

A deployment guide can be found here:  Technical Deployment Guide

And guidance on how to build your model can be found here: Retail Customer Churn Prediction Template

Azure Data Catalog - Toolbar

My last post was on the Azure Data Catalog Dashboard.  One of my favorite features of the dashboard was its simple and clean look.  The same is true about the Data Catalog Application in general.  Keeping things consistent between screens, Microsoft has the same tool bar in both the dashboard and throughout the application.
The tool bar contains all the administrative features of the data catalog.

The home button will take you back to the dashboard at any time.

The Publish button will take you to the Publish your data now! screen that we discussed in an earlier post.  These buttons are more for navigation than for manipulation of any data.

 The glossary is where the Business Glossary resides (in the Standard Edition, not the free basic edition of Data Catalog).  The Business Glossary allows the user to define business terms and create a common lexicon to be used throughout the Data Catalog.  I will cover the Glossary in detail in a future post.

Settings tools are the tools that we use to set the general parameters when creating the catalog.  These can also be changed in your Azure Portal, but it is convenient to have access to them directly in the application.  From here, you can upgrade your subscription or change the location of the meta data for the catalog.  You can also upgrade to the Standard Edition from this screen when you realize how badly you want that Glossary.  All other settings that were created in our original How to Create a Data Catalog blog post are here so you can make any modifications needed.

 Just to have some fun I added a Portal Title to my Catalog and saved it in the settings.  The change shows up immediately displaying my company name. Now I have a branded Catalog!

The final button on the toolbar is the User button, which displays details about the user and catalog you are using.  The key features of this menu are the ability to clear Search History and Sign Out.

My next post will explore the attributes of an Asset.

Friday, March 24, 2017

Azure Data Catalog – The dashboard

Microsoft in the news - Custom visuals for Power BI

Back in July 2015, Microsoft announced the addition of custom visuals for Power BI.  They have since run several contests to encourage the community to develop and make available their own custom visuals.  There are now more than 80 visuals available.

As of March 22, these custom visuals are available for download in the Office store.

They haven’t all been onboarded yet, but they should all be available within the next couple weeks.  Once they are all there, you will be able to search for exactly what you need either directly, or by perusing the categories.

Azure Data Catalog – The dashboard

In my last blog post we looked at how to access the data catalog.  Now let’s look at the dashboard.  When you go to your catalog from the Azure portal as in my last post you will land on your Azure Data Catalog dashboard.

This is a cleanly designed dashboard with a few basic metrics about your data.  This one is very simple due to the limited amount of data I have loaded.  The first item, located at the top of the dashboard, is the register more data button.

This allows you to complete another round of registration for an additional data source.  This button conveniently takes you back to the Publish your data now! screen we talked about in an earlier post.
Below this are some Key Performance Indicators (KPIs) about your data.  The idea of these metrics is to give you, at first glance, an overview of your entire Catalog as it currently exists.  As you add more data and make changes and annotations to the data, these metrics will change.

The middle of the dashboard covers metrics about the assets in the catalog, who is using them, and how they are using them.  The Top Tags highlight what is being made note of in the catalog and the top ways people are annotating the meta data.

The final piece of the dashboard displays what assets have the highest volume, which experts are listed most often and what the top sources of data are and what type they are. 

One of the better features of the dashboard is the search bar. 

This image does not do it justice.  What makes the search bar fantastic is that it is easy to find at the top of the screen in the center and you can do useful searches directly from your dashboard.  You do not have to go into the main body of the application to do a search.  The additional feature I love is that it saves my last few searches so I do not have to retype them.

My next blog will cover the toolbar at the top of the dashboard and the top of the application screen.