Not to be confused with a “catalogue,” which is some form of ancient paper-based device, a “catalog” is a collection of metadata. It is a directory of information that describes where a data set, file, or database entity is located. It may also include additional information about the data, such as the producer, content, quality, condition, and any other pertinent characteristic.
It is a tool that allows an analyst to find the data they need. There may be solutions hidden in your data. At the very least, a data catalog will tell you where to look.
In any organization, data is collected and stored across different departments, in multiple databases, and in a variety of formats. In banking, for example, the customer information that a bank manager sees isn’t the same as what the Finance Department sees. In fact, the bank manager is likely unaware that a separate, unique data source about their clients exists. Registering these sources in a catalog makes people aware of data they may find useful.
Suppose you are at the library and you want to hold in your hands a map with information about Hole-in-the-Wall Falls in Oregon. You could look at numerous maps and not find anything. The first map you pick up may be a highway map. If the library catalog includes map descriptions, it will save you a lot of searching. The catalog may describe the map you are looking for as a topographical map showing hydrology for the state of Oregon, and tell you which library holds it. Now, instead of travelling from library to library looking through a variety of maps of Oregon, you can focus your attention on tracking down the single map with the information you need.
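The library example can be sketched in a few lines of Python. This is only an illustration of the idea, not anything from ADC: the `MapRecord` class, the `find_maps` function, and the sample records are all made up for this post.

```python
from dataclasses import dataclass


@dataclass
class MapRecord:
    """A catalog card: it describes a map and says where it is kept."""
    title: str
    description: str
    library: str  # hypothetical location field


def find_maps(catalog, *keywords):
    """Return records whose title or description mentions every keyword."""
    wanted = [k.lower() for k in keywords]
    return [
        rec for rec in catalog
        if all(k in (rec.title + " " + rec.description).lower() for k in wanted)
    ]


# Made-up sample records, for illustration only.
catalog = [
    MapRecord("Oregon highways", "Road map of state and interstate highways", "Library A"),
    MapRecord("Oregon topography", "Topographical map showing hydrology for the state", "Library B"),
]

matches = find_maps(catalog, "oregon", "hydrology")
for rec in matches:
    print(rec.title, "is held at", rec.library)
```

Searching the descriptions narrows two Oregon maps down to the one worth tracking down, without opening either map.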
Microsoft’s Azure Data Catalog (“ADC”) is a fully managed service. When you register a data source with ADC, you point to the source of the data and ADC automatically extracts its structural metadata. The data source itself does not have to be in the cloud.
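To give a feel for what “structural metadata” means, here is a minimal sketch that reads table and column names out of a SQLite database using Python’s standard `sqlite3` module. It is not how ADC works internally; the `extract_structure` function and the `customers` table are assumptions made up for this example.

```python
import sqlite3


def extract_structure(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite database."""
    structure = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        structure[table] = [(col[1], col[2]) for col in columns]
    return structure


# A throwaway in-memory database standing in for a registered data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, branch TEXT)")

structure = extract_structure(conn)
print(structure)
```

Note that only the shape of the data is collected (tables, columns, types), not the rows themselves.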
Once a data source is registered, its catalog card can be used by anyone with access. Others can then annotate it to enrich the metadata. ADC allows for crowdsourcing of metadata, which makes for a catalog rich in detail. Tags can include, for example, descriptions of how the registered data can be used to find what might otherwise be obscure or unique solutions.
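Crowdsourced annotation can be pictured as many users attaching tags to the same registered source. The sketch below is a hypothetical in-memory model, not ADC’s API; `annotate`, `tags_for`, and the user and source names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical store: data source name -> {tag: set of users who added it}
annotations = defaultdict(lambda: defaultdict(set))


def annotate(source, user, tag):
    """Record that `user` attached `tag` to a registered data source."""
    annotations[source][tag.strip().lower()].add(user)


def tags_for(source):
    """List tags for a source, most widely endorsed first, then alphabetically."""
    tags = annotations[source]
    return sorted(tags, key=lambda t: (-len(tags[t]), t))


annotate("customer_accounts", "bank_manager", "Customer contact details")
annotate("customer_accounts", "finance_analyst", "customer contact details")
annotate("customer_accounts", "finance_analyst", "Quarterly revenue")

print(tags_for("customer_accounts"))
```

Tags that several people agree on float to the top, which is one simple way a shared catalog can become richer than any single person’s description.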
Because the source of the data is registered in the catalog, a user can connect directly to that data source through the catalog. If the data shouldn’t be freely shared throughout an organization, ADC allows the registrant to restrict access by defining ownership of the data and authorization requirements for access.
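The ownership and authorization idea can be sketched as a simple check made before a connection is handed out. ADC’s actual permission model may well differ; the `RegisteredSource` class, its fields, and the connection string below are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RegisteredSource:
    name: str
    connection: str  # hypothetical connection string
    owner: str
    allowed_users: Set[str] = field(default_factory=set)  # empty means unrestricted

    def can_connect(self, user):
        """Owners always have access; others must be listed once access is restricted."""
        if user == self.owner or not self.allowed_users:
            return True
        return user in self.allowed_users


payroll = RegisteredSource(
    name="payroll",
    connection="server=hr-db;database=payroll",
    owner="hr_lead",
    allowed_users={"finance_analyst"},
)

print(payroll.can_connect("finance_analyst"))
print(payroll.can_connect("bank_manager"))
```

Here the catalog entry stays visible to everyone, while the connection itself is only available to the owner and the users the owner has authorized.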
Organizations produce data at an enormous rate. Storage for that data is likely to run the full gamut, from an individual computer to the cloud, with locations anywhere on the planet. This growth in data and data sources makes a data catalog a valuable tool for making that data available to everyone within the organization. With ADC, you can actually find that needle in the haystack.
Some links to get you started:
You can find a series of “how to” links at the end of this Data Catalog intro article: https://azure.microsoft.com/en-us/documentation/articles/data-catalog-what-is-data-catalog/