SQL's Melody

Tuesday, May 30, 2017

Azure Data Catalog – Glossary

Microsoft in the News

Finally! Microsoft has added the ability to convert PDF documents into editable Word documents. Assuming you are using MS Office 365, you can now open a PDF in Word the same way you would open a Word doc. You will get a message saying that it may take a little time to open (converting a lot of graphics takes time) and that the converted document may not be exactly as shown in the PDF – nothing new there.

The ability to convert to a Word document will depend on how the PDF was created. If the PDF is an image (in other words, it was scanned in), then it cannot be converted. On the other hand, if the PDF was created by saving a Google doc or Word doc as a PDF, then all the required information is there for conversion.

Life just got a little easier.

ADC – Glossary

As mentioned in my last post, the glossary allows an organization to document key business terms and their definitions to create a custom business vocabulary. This enables consistency in the data usage across the entire catalog. Once you have set up the terms they can be used in tagging.

The Glossary can be set up as a hierarchy of items to show classifications. There is no limit to the number of levels in a hierarchy; however, it is strongly suggested to keep the hierarchy to fewer than three levels so it stays easy to understand. It is also acceptable to have no hierarchy at all by simply leaving the parent term blank. Once you have added a few items, you can review them on the main glossary page.

The left side of the glossary screen shows the terms you have added.

Here we see that a checking account is a child item of the CIF. I use this example to show a few things. CIF is an acronym for Customer Information File. However, it is almost never referred to as that in a bank; it is simply called a CIF. This is a good example of how an asset can be tagged as a CIF without the entire name being needed. It also standardizes the tag as CIF, instead of the numerous variations that people may come up with, such as cif, C.I.F, etc. This is important when searching tags. Consistent tags make searches more consistent.
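To make the point about tag variations concrete, here is a minimal Python sketch of folding free-form variants into one glossary term. It is only an illustration of the idea, not how ADC works internally; the `GLOSSARY` dictionary and the `normalize_tag` helper are hypothetical names made up for this example.

```python
import re

# Hypothetical glossary: canonical term -> definition (illustrative only).
GLOSSARY = {"CIF": "Customer Information File"}


def _key(text: str) -> str:
    """Comparison key that ignores case, periods, and whitespace."""
    return re.sub(r"[.\s]", "", text).upper()


def normalize_tag(raw: str) -> str:
    """Map a free-form tag to its canonical glossary term when one matches.

    Unknown tags are returned with surrounding whitespace stripped.
    """
    for term in GLOSSARY:
        if _key(term) == _key(raw):
            return term
    return raw.strip()


variants = ["cif", "C.I.F", " Cif ", "CIF"]
canonical = {normalize_tag(v) for v in variants}  # every variant collapses to one term
```

With a governed glossary, a search on the single tag CIF finds every asset, instead of missing the ones tagged with a variant spelling.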

There is additional information about the terms displayed in a table. The column sizes can be adjusted to allow you to view as much or as little of each column as you prefer. Above this table are the glossary filter and a Results per page setting. As your glossary expands, both of these will become very important in managing it.

The box to the left of each item can be checked to display the details of the item, which will be displayed in a panel on the right.

Note that at the top of the screen, under the main Data Catalog menu, is the glossary menu:

New term – allows you to create a new term

Add child – adds a child to an existing term. This brings up a new term window with the parent term already filled in with the term you were on when you chose Add child.

Add admin – this allows you to add a glossary administrator. Note that security groups need to be set up for this to work.

Edit term – will allow you to edit the current term you are viewing.

Delete – this will delete the current term. Note that if the term is a parent then all children must be deleted before you can delete the parent.

Toggle – this button expands and minimizes the detail view pane.
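The rule noted under Delete (children must go before their parent) can be illustrated with a small Python sketch. This is an assumption-laden model of the behaviour described, not ADC code; the `parents` dictionary and both function names are hypothetical.

```python
# Hypothetical in-memory glossary: term name -> parent name (None for top level).
parents = {"CIF": None, "Checking Account": "CIF"}


def children_of(term: str) -> list[str]:
    """Return the direct children of a term."""
    return [name for name, parent in parents.items() if parent == term]


def delete_term(term: str) -> None:
    """Delete a term, refusing while it still has children."""
    kids = children_of(term)
    if kids:
        raise ValueError(f"Delete children first: {', '.join(kids)}")
    del parents[term]


try:
    delete_term("CIF")           # blocked: "Checking Account" is still a child
    blocked = False
except ValueError:
    blocked = True

delete_term("Checking Account")  # children go first
delete_term("CIF")               # now the parent can be removed
```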

It is interesting to note that the glossary gives you not only information about the term but also how it is used and what it is related to, by listing the relationships and assets associated with the term.

Next we will look at how these are used in tags.

Tuesday, May 23, 2017

Azure Data Catalog – Glossary setup

Microsoft in the News

If you are like me, you saw the upcoming augmented reality technology as mainly something for gaming or for consumers looking for additional information and connection – think Oculus, Magic Leap, and Google Glass. Then I read that the Swedish company Tetra Pak is using Microsoft’s HoloLens to help their Service Engineers diagnose and fix Tetra Pak machines.

In my last blog I talked about the IoT revolution. Tetra Pak is taking it one step further. To start, Tetra Pak packaging machines are loaded with IoT devices that communicate with the Cloud to ensure that maintenance issues are dealt with before they become a problem. Sensors throughout the machines monitor the various functions and will predict when equipment needs maintenance. In this way, production lines can avoid many breakdowns.

Now add a HoloLens. As a Service Engineer is viewing the machine, feedback is given via the HoloLens to help them quickly diagnose and fix any issues. It got me thinking …

I can imagine that in the near future, you will be able to connect your HoloLens to the cloud and ask for assistance in doing virtually anything. Want to change the brakes on your car? Input the make and model and get a step-by-step walk-through right in front of your eyes. Simply look at the wheel and arrows point to the item you need to remove or add, along with tips and tricks. Want to build a deck? Again, step-by-step instructions before your eyes. Any hobby or task could be made accessible to anyone because they could have an AI and an instruction package walking them through every step. It will be like having a professional guiding your hand in anything you want to accomplish.

The same will eventually apply to professions. Surgeons are already using augmented reality to overlay a patient with computer generated enhancements. Before too long, the lack of a super steady hand and a stomach for blood may be the only thing keeping you and me from being able to perform surgery.

ADC – Glossary setup

As mentioned in my last post, we will look at the Glossary in this post. In my post on tagging, I talked about how tagging in the free version is free form. There are no constraints or limitations on what you can pin as a tag, other than that duplicates are blocked. The glossary addresses the challenge that this freedom can pose.

The glossary allows an organization to document key business terms and their definitions to create a custom business vocabulary. This enables consistency in the data usage across the entire catalog. Once you have set up the terms, they can be used in tagging. This enforces a governed approach to what tags are used.

When the glossary is not set up, you get this message when you choose it from the menu.

This is because the free edition of Data Catalog does not include the glossary. To update your subscription, go to Settings, scroll down to Pricing, and choose “Standard Edition”.

Then choose save. Now when I choose glossary I get this.

Now you are able to start adding to your glossary. Note that only glossary administrators and catalog administrators can create new terms.

The very first term you enter will not have a parent term. To choose a parent, the parent term must already exist so that the new term can be associated with it.

Term Name – is the name of the Tag. This should be exactly what you want the tag to look like, so if you want it to be an acronym, the acronym should be used here.

Parent Term – this is the name of the term you want the term to be a child of.

Definition – this is the definition of the business term.

Description – This is different from the definition in that it is a description of the intended use of the term

Stakeholders – This is where you can tag your subject matter experts - people who know the most about a term.
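The fields above, together with the rule that a parent has to exist before a child can point to it, can be modelled with a short Python sketch. Everything here is hypothetical and for illustration only: the `GlossaryTerm` class, the `add_term` helper, and the placeholder definition text are not part of ADC.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlossaryTerm:
    """Illustrative record mirroring the fields on the new-term form."""
    name: str                       # Term Name: exactly how the tag should appear
    definition: str                 # Definition of the business term
    description: str = ""           # Intended use of the term
    parent: Optional[str] = None    # Parent Term; None for a top-level term
    stakeholders: list[str] = field(default_factory=list)  # subject matter experts


def add_term(glossary: dict[str, GlossaryTerm], term: GlossaryTerm) -> None:
    """Add a term, requiring that any named parent already exists."""
    if term.parent is not None and term.parent not in glossary:
        raise KeyError(f"Parent term {term.parent!r} must be created first")
    glossary[term.name] = term


glossary: dict[str, GlossaryTerm] = {}
# The first term has no parent; the definitions are placeholder text.
add_term(glossary, GlossaryTerm("CIF", "Customer Information File"))
add_term(glossary, GlossaryTerm("Checking Account", "A deposit account", parent="CIF"))
```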

In the next blog, we will look at the glossary in more detail.

Tuesday, May 16, 2017

Azure Data Catalog – Tagging

Microsoft in the News

Did you know that Microsoft has something called “Cognitive Services”? They are currently working on 25 Cognitive Services that they will be making available to you soon. These services allow you to build apps using Microsoft’s machine learning algorithms, using just a few lines of code. For now, three of these services are, as of April 19, either generally available or released as a preview. The first three Cognitive Services are:

1. Face API: This service allows you to detect and compare human faces, organize them into groups, and identify people you have tagged.

2. Content Moderator: This service allows you to automatically quarantine images, text and video for review prior to publication. The video moderator is currently only available as a preview, not for general use.

3. Custom Speech Service API: This service is currently only available as a preview. It allows you to customize Microsoft’s speech-to-text engine for your application.

Soon to be available for preview is a Handwriting detection and conversion API.

The full suite of Cognitive services can be found here: https://www.microsoft.com/cognitive-services

ADC – Tagging

Tagging is the action of assigning a keyword or term to a piece of information. Tags are used to describe meta data items so that they can be easily searched. They are chosen and applied by the creator. In ADC, the tags are located in the information pane. When you hover over a tag, it gives information about the tag, including who added it.

Note that anything can be added as a tag, even non-words. When adding a tag, if that tag already exists on that asset, ADC will automatically block the addition. This is true even when there is a change in case in the spelling of that word. ADC sees “WWWImporters” and “wwwimporters” as the same tag and will not include the second value.

Even without duplicates, the above example shows how the sort order can cause issues with tags. The value 11 comes before 2, whereas AAA comes before aa. Even though I cannot add aaa because we already have AAA, that does not remove the risk of ending up with a mess in your tags. The best way to limit this issue and others is to have a business glossary. In the next post, we will look at setting up a business glossary in ADC.
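Both behaviours described in this post, the case-insensitive duplicate check and the surprising sort order, can be reproduced in a few lines of Python. This is a sketch under assumptions: `add_tag` is a hypothetical helper, and Python's default string ordering is used only because it happens to give the same order described here, not because ADC is known to sort this way.

```python
def add_tag(tags: list[str], new_tag: str) -> bool:
    """Append new_tag unless a case-insensitive match is already present."""
    if any(t.casefold() == new_tag.casefold() for t in tags):
        return False
    tags.append(new_tag)
    return True


tags: list[str] = []
add_tag(tags, "WWWImporters")
added_again = add_tag(tags, "wwwimporters")  # rejected as a duplicate

# Plain string sorting compares character by character, so "11" sorts
# before "2" and uppercase "AAA" sorts before lowercase "aa".
ordered = sorted(["aa", "2", "AAA", "11"])
```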

Tuesday, May 2, 2017

ADC – Saved search

Microsoft in the News

The Internet of Things (“IoT”) ranks right up there with 3D printing and AI as something that is going to transform the way we live. Today, opening a bottle of Johnnie Walker Blue Label will activate a sensor built into the label which sends a message to you using a smartphone’s Near Field Communication technology.

http://www.cio.co.uk/it-strategy/johnnie-walker-joins-internet-of-things-3613305/

That is an example of how IoT is being integrated into low-tech items. At the other end of the spectrum, modern airplanes are laced with so many IoT sensors that a Boeing 787 produces over half a terabyte of data per flight.

http://www.computerworlduk.com/data/boeing-787s-create-half-terabyte-of-data-per-flight-says-virgin-atlantic-3433595/

In an attempt to keep ahead of the curve, on April 20, Microsoft launched a new SaaS offering called Microsoft IoT Central. Microsoft recognized that the technology has advanced so quickly that it has left a vacuum in terms of skilled people who are able to take advantage of it. So, IoT Central is their attempt to simplify the management of this transformative technology.

IoT Central is, of course, built on the Azure platform and integrates with their platform-as-a-service offering, Azure IoT Suite.

If the IoT is or will be in your wheelhouse, you can sign up for updates here:

https://www.microsoft.com/en-us/internet-of-things/iot-central-saas-solutions

ADC – Saved search

Now that you have determined how to search and find what you need in your catalog, it is often worthwhile to save that search. There are times when you search for just a word, or write a simple comparison search, and you may not need to save those. When you do write a complex search or query of the data catalog, it can be helpful to save that search for use at another time. When you save your search by choosing the save button in the search panel, as depicted below, it saves your entire search criteria.

The entire criteria includes any filters that were included as well as the specific operations that are listed in the search bar. Once you choose “save” you are given an opportunity to add a name to the search to make it easier to find. By default, all searches are specific to the individual who created them. Should you wish to share them with the entire company then be sure to check the share with company check box.

Once saved, the search shows up in my saved searches.

The cog symbol allows you to access the saved search menu for that search. This menu allows you to manipulate the search. You can rename, delete or save this search as your default. The save as default is a particularly handy feature for when your catalog gets quite large and you want to focus on a particular area of interest.

Monday, April 17, 2017

Azure Data Catalog – Search functionality

Microsoft in the News

J.D. Power and Associates recently released their survey on Tablet satisfaction. Microsoft’s Surface tablet came out on top in many categories, but most surprising was that overall, consumers were more satisfied with their Surface than they were with iPads.

According to J.D. Power, the Microsoft Surface has “expanded what tablets can do, and it sets the bar for customer satisfaction.”

The areas in which the Surface was rated the best were internet connectivity, availability of official accessories, and the variety of pre-installed applications.

ADC – Search functionality

In the last installment of this series on ADC, we looked at the search menu. This menu has some basic search options. When faced with looking for very specific data assets, it is worth looking at the full suite of search functions available. The search bar can be used for basic searches such as a word search, “finance”, or you can make it more complex, such as searching for a property with a specific term, “department:finance”. Both of these are common in Google, Windows, and other programs with search capabilities. ADC expands on these with three additional options for searching:

Boolean – This is used to narrow a search, e.g. “finance NOT head office”

Grouping – This is done by using parentheses to group logic, e.g. “finance AND (“head office” OR “main office”)”

Comparison – This is used to make comparisons, e.g. “(starttime > “18/04/2017”) AND department:finance”

Matching operators

Symbol	Explanation
:	Matches items where a specific property contains the search term, e.g. department:finance
=, <, >	Comparison operators. These are used to compare values and can be combined, e.g. <=
“ ”	Quotation marks group a string as a single value, e.g. “finance department”
NOT, AND, OR	Boolean conditions.
has:	Existence search. An item is returned if the given property has at least one element, e.g. has:description
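If you build search strings in a script before pasting them into the search bar, the operators in the table combine naturally. The Python sketch below only assembles text in the syntax shown in this post; the helper names are hypothetical, and it does not call ADC in any way.

```python
def quote(value: str) -> str:
    """Wrap a value in double quotes when it contains a space."""
    return f'"{value}"' if " " in value else value


def prop(name: str, value: str) -> str:
    """Property match, e.g. department:finance."""
    return f"{name}:{quote(value)}"


def has(name: str) -> str:
    """Existence check, e.g. has:description."""
    return f"has:{name}"


def all_of(*parts: str) -> str:
    """Join conditions with AND."""
    return " AND ".join(parts)


def any_of(*parts: str) -> str:
    """Join alternatives with OR inside a parenthesised group."""
    return "(" + " OR ".join(parts) + ")"


query = all_of(
    prop("department", "finance"),
    any_of(quote("head office"), quote("main office")),
    has("description"),
)
```

Here `query` ends up as `department:finance AND ("head office" OR "main office") AND has:description`, which uses straight quotes rather than the curly quotes a word processor may substitute.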

The search operators are not case sensitive; however, capitalizing the Boolean conditions does make the search easier to read and understand. Should you limit your search too much, or should your search return no data, you will be given this message:

Choosing the reset query box removes your entire query. If you feel you have simply made a typo, it is much easier to simply click in the search box and make your change. This will ensure you do not lose your query, forcing you to retype it instead of simply correcting a small error.

Currently exact match searching is not available. ADC uses Prefix Match Semantics. This means that when you search for “Sale” your search will return Sale, Sales, SaleData, and Salesman.
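Prefix matching is easy to picture with a small Python sketch. This is a simplified illustration, not ADC's actual algorithm: the `prefix_match` function and the asset names are made up, and it assumes matching is case-insensitive and applied to whole names.

```python
def prefix_match(term: str, names: list[str]) -> list[str]:
    """Return the names that start with term, ignoring case."""
    needle = term.casefold()
    return [n for n in names if n.casefold().startswith(needle)]


assets = ["Sale", "Sales", "SaleData", "Salesman", "Resale", "Customer"]
hits = prefix_match("Sale", assets)  # "Resale" contains "sale" but does not start with it
```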

More information on the algorith

Thursday, March 30, 2017

Azure Data Catalog – Search function Basics

Online video games used to treat perinatal stroke

Globally, 3.5 million babies suffer from a perinatal stroke every year. These strokes cause varying levels of motor impairment. Francesca and Roberto had a son, Mario, who ended up being one of these 3.5 million babies. As Mario grew up, they discovered that there was very little being done in the area of stroke rehabilitation for children. Using Microsoft Azure as their base and Microsoft’s Kinect motion sensing input device for Xbox and Windows, this Italian couple have developed a rehabilitation program designed specifically for kids.

Mirrorable is an interactive game that allows children affected by perinatal strokes to do rehabilitation sessions at home. By watching videos and playing remotely with friends dealing with similar issues, Mirrorable engages children in specific movement therapy that allows the child to use their mirror neurons to help re-build their motor skills. Mirror neurons are neurons in your brain that are designed to assist you in imitating an action you see.

Microsoft in the news

The child watches a magic trick on the screen and then the magician explains exactly how it was done. The child is then asked to perform the magic trick themselves. Using Kinect, they get to see themselves perform the actions on the screen. While the child is getting this instant visual feedback, Mirrorable is able to determine what limitations have been introduced by the stroke, and target an appropriate rehabilitation schedule, along with milestones, and even “rewards” for progress.

Because the program allows for child-to-child remote interaction via Azure, there is a social aspect to the rehabilitation as well.

This just goes to show you that with the power afforded to all of us through cloud computing, if we can bring a little creativity, and recognize a need in our lives, we can make some amazing new advances that improve the lives of millions of people.

Azure Data Catalog – Search function Basics

In the last week of posts, we have looked at the different pieces of Azure Data Catalog (ADC). The overview was important to this next piece, Searching. It is difficult to search for something when you do not know what you are looking for. This becomes more obvious the more data we need to search. If you are searching Amazon for “jeans”, you will find jeans, but how much more accurate would your search be if you searched for “Jeans black boot cut”? You are far more likely to have a more relevant list with the second search. Searching on ADC is no different. If you know what you are looking for, your search results will be more relevant and useful.

The main screen of the ADC has two locations for easy searching. The first is at the top of the screen on the left. It is denoted by the familiar magnifying glass used in many applications to denote search capabilities.

The second is located on the left side of the screen. This search option is a menu of search capabilities.

Above is what it looks like when all the menu functions are minimized. I will talk about each individually. Below is an image of what the menu looks like when it is expanded out.

The first section is Current Search. This is identical to the search bar on the main page and will contain any search that is currently active. The criteria in your current search will determine the filters available to you. You can change or even save your current search from this menu. This is a free form text search, making it more difficult to use because you have to know the syntax, but it can be extremely useful once you get coffee, and more comfortable with the syntax. My blog next week will be on some of the capabilities of this search bar.

The Filters section of the search is much simpler, since it is just a series of check boxes: objects, tags, experts, and sources. The filters are presented as a way to limit what you get in your search results. As a Data Professional, I see it more as a general grouping of the data. The filters are predetermined based on the type of meta data you have and can change based on the current search you are looking at.

This is the end of the Intro to Azure Data Catalog. You are now aware of the basic elements and tools to get you started. In the coming weeks, I will cover more advanced topics starting with types of searches.

Wednesday, March 29, 2017

Azure Data Catalog – Information panel Tabs

Microsoft in the News - Azure Functions

On February 23, Microsoft announced preview support for the Azure Functions Serverless Framework plugin. Azure Functions allows you to easily run small pieces of code (“functions”) in the cloud. Using Azure Functions, you can quickly write the code you need, without worrying about the application as a whole. The language you choose to write your code in can be C#, F#, Node.js, Python, PHP, batch, bash, or any executable.

Azure Functions is designed to be a solution for processing data, integrating systems, working with the internet-of-things (IoT), and building simple APIs and microservices. Tasks like image or order processing, file maintenance, or anything that you want to run on a schedule are easily tackled using Azure Functions. It is also easily integrated with Event Hubs to extend your functionality.

If you are interested in learning how to use this tool, Microsoft has created an Azure Functions Code Challenge. Follow the link below and then log into your Azure account. The Challenge will take you through a series of problems that will test your coding skills as you learn to build solutions in Azure Functions.

https://functionschallenge.azurewebsites.net/login?WT.srch=1&WT.mc_id=AID529444_SEMR9qRdV7vWT.srch=1&wt.mc_id=AID529444_SEM_

Azure Data Catalog – Information panel Tabs

In my last post, I looked at the details pane. The focus was on the properties panel. There are 4 additional panels in that pane. Let’s take a look at them.

Properties, Preview, Columns, Data Profile, Documentation

When viewing any of the panels you can hover your mouse over the edge of the panel on the left where it meets the main tile space until you see a double arrow “<->”. You can use this to adjust the size of the panel.

The Preview pane is just as the name implies, a preview of the data that is in the asset you have selected.

This pane allows you to view the data that is in the asset. As I mentioned before, if you do not have security access to the underlying data, you will not be able to see the data here. This is a view-only screen, and you cannot enter any data.

The Columns pane allows you to add a description of the various details that make up the asset. In this case we have the customer as the asset. The Customer asset contains details that are set out in rows, with each row representing a column in the customer table. The columns of the table, such as Customer key, WWI Customer ID, Customer, etc., are set out as separate rows. Each of these rows can be annotated here with tags and a description.

The Profile panel contains additional meta data. (Reminder: meta data is data about data.) In the case of our Customer Asset, this shows some details and even some statistics about the data. It describes what is in the table. This can be very useful, particularly to those who do not have permission to view the actual data. The top row of details describes the table with details such as:

Number of Rows – how much data is in the table
Size – how much space it takes up on disk
Last data update – the last time the data in the table was updated
Last Schema Update – when the last structural change was made to the table, for example when a column was added or the size of a column was changed

The next section is all about the individual columns of data. This pane outlines the data type, which you would expect to learn from meta data, but this panel goes much further, giving you some actual analysis of the data. We can see how many Null values are in each column, as well as how many distinct values, the minimum and maximum values, and the averages and standard deviation for any numeric columns.

The final panel is the documentation panel. It is free form and has similar capabilities to other note applications from Microsoft. You will find the familiar basic formatting. In addition, there are buttons for:
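To show what these per-column statistics mean, here is a minimal Python sketch that computes the same kinds of figures for one numeric column. It is only an illustration: the `profile_column` function and the sample values are made up, and the sample standard deviation is an assumption, since the post does not say which formula ADC uses.

```python
import statistics
from typing import Optional


def profile_column(values: list[Optional[float]]) -> dict[str, float]:
    """Compute profile-style statistics for one numeric column.

    Needs at least two non-null values, because the sample standard
    deviation is undefined for fewer.
    """
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "average": statistics.mean(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
    }


# Made-up sample values; None stands in for a NULL.
credit_limit = [1000.0, 2500.0, None, 2500.0, 4000.0]
profile = profile_column(credit_limit)
```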

Creating Tables

Adding Hyper Links

If your organization already has a start on the documentation for an asset, it is reassuring that it can still be referenced and used by linking to it here, making all documentation easier for everyone in the organization to find.

Now that we are familiar with what asset information we have, we can move on to searching this information, which I will cover in my next blog.