SQL's Melody: Event Hubs

I am sure you have heard by now of the IoT, or the Internet of Things. With a plethora of IoT devices sending data to a variety of applications, have you ever wondered how that data is managed? Event hubs perform this task. They take in data from everything from telemetry and mobile app data to gaming data, and organize it so it is easier to consume and use. Event Hub does this by managing the flow of data as it is received.

Let’s begin with an example. We have sensors at bank branches that log when customers enter and leave the branch. These sensors are the IoT devices, and they are called event publishers. The publishers send the events (a customer entering or leaving the branch) to the hub, where they are organized and made available for applications to use. These applications are referred to as consumers. Consumers can be many different types of applications, and they often need different parts of the data.
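To make the publisher, hub, and consumer roles concrete, here is a minimal in-memory sketch of the bank branch example. It is a toy model, not the Azure Event Hubs API: `Event`, `ToyEventHub`, `door_sensor`, and `current_occupancy` are hypothetical names. The sensors only publish to the hub, and the consumer only reads from it, so neither side needs to know about the other.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """One sensor reading: a customer entering or leaving a branch."""
    branch: str
    action: str  # "entry" or "exit"
    timestamp: datetime


@dataclass
class ToyEventHub:
    """Hypothetical in-memory stand-in for the hub; not the Azure API."""
    events: list = field(default_factory=list)

    def publish(self, event: Event) -> None:
        # Publishers only append; they never talk to consumers directly.
        self.events.append(event)

    def read_all(self) -> list:
        # Consumers get a copy so they cannot alter what other consumers see.
        return list(self.events)


def door_sensor(hub: ToyEventHub, branch: str, action: str) -> None:
    """Event publisher: a door sensor reporting one entry or exit."""
    hub.publish(Event(branch, action, datetime.now(timezone.utc)))


def current_occupancy(events: list, branch: str) -> int:
    """Event consumer: count customers currently inside one branch."""
    inside = 0
    for event in events:
        if event.branch == branch:
            inside += 1 if event.action == "entry" else -1
    return inside


if __name__ == "__main__":
    hub = ToyEventHub()
    door_sensor(hub, "downtown", "entry")
    door_sensor(hub, "downtown", "entry")
    door_sensor(hub, "downtown", "exit")
    door_sensor(hub, "uptown", "entry")
    print(current_occupancy(hub.read_all(), "downtown"))  # prints 1
```

A second consumer, for example one that reports traffic per hour, could read the same events without affecting the occupancy count.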

In this example, all the data comes in from a single type of publisher, the sensors, but often there is more than one type of data. It could be location data, such as how far from the branch the customer lives, or basic status data, such as whether the customer is currently in the branch, along with many other types of data. That data can arrive as fast as the devices send it, and in this type of environment there are so many different types of data that, at large volume, it is essentially unusable in a traditional way. This is where Event Hub comes in: it is used to sort and organize the data in a way that consumers can use.

How the data will be used determines how it should be organized. Event Hub uses partitions to do this.

Event Hub organizes the data using what Microsoft calls a partitioned consumer model. This means that multiple consumers can read from the same event hub at the same time, each working through its own partitions, which lets processing scale with the volume. As you can imagine, data streams in at a highly variable rate, in a wide variety of ways, and over a surfeit of protocols. As the data enters the workflow, Event Hub segments it into partitions and appends each new event to the end of a partition as it arrives. Events in each partition are kept for a retention time that is configured at the Event Hub level. Events cannot be deleted; they expire on a time basis.
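The append-only, expire-by-time behavior of a partition can be sketched as a small log structure. This is a simplified model under stated assumptions, not the service's implementation: `ToyPartition` and its methods are hypothetical, and it assumes events are appended in enqueue-time order, which lets expiry simply trim from the front. Note that there is no way to delete a single event; it only disappears once it is older than the retention period.

```python
from collections import deque
from datetime import datetime, timedelta


class ToyPartition:
    """Hypothetical append-only partition log (not the Azure SDK).

    There is deliberately no delete method: events only leave the log
    when they are older than the retention period.
    """

    def __init__(self, retention: timedelta):
        self.retention = retention
        self._log = deque()  # entries: (sequence_number, enqueued_time, body)
        self._next_sequence = 0

    def append(self, body: str, enqueued_time: datetime) -> int:
        # New events always go to the end; sequence numbers only increase.
        sequence = self._next_sequence
        self._log.append((sequence, enqueued_time, body))
        self._next_sequence += 1
        return sequence

    def expire(self, now: datetime) -> None:
        # The log is in enqueue order, so expired events are all at the front.
        cutoff = now - self.retention
        while self._log and self._log[0][1] < cutoff:
            self._log.popleft()

    def read_from(self, sequence_number: int) -> list:
        # Return retained (sequence_number, body) pairs at or after the start point.
        return [(seq, body) for seq, _, body in self._log if seq >= sequence_number]


if __name__ == "__main__":
    partition = ToyPartition(retention=timedelta(days=1))
    start = datetime(2017, 1, 6, 9, 0)
    partition.append("entry", start)
    partition.append("exit", start + timedelta(hours=2))
    partition.append("entry", start + timedelta(hours=30))
    partition.expire(now=start + timedelta(hours=30))
    print(partition.read_from(0))  # prints [(2, 'entry')]
```

The surviving event keeps its sequence number 2 after the older events expire, so a reader's position stays meaningful even as the front of the log is trimmed.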

The number of partitions is also set when the Event Hub is created. Partitions are the key to organizing the data for any downstream workflow, and their number should be based on the degree of downstream parallelism your consumers will require. A good rule is to make the number of partitions equal to the highest expected number of concurrent consumers.
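The reasoning behind the "one partition per concurrent consumer" rule can be sketched in a few lines. Everything here is an assumption for illustration: `PARTITION_COUNT`, `partition_for`, and `assign_partitions` are hypothetical, CRC-32 stands in for whatever hash the service uses internally, and the round-robin assignment is just one simple way to spread partitions over readers.

```python
import zlib

# Hypothetical partition count, fixed when the toy hub is "created".
PARTITION_COUNT = 4


def partition_for(partition_key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a key (for example, a branch ID) to a partition.

    CRC-32 is a stand-in chosen because it is stable across runs;
    it is not the hash Event Hubs uses internally.
    """
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count


def assign_partitions(partition_count: int, consumers: list) -> dict:
    """Round-robin partitions across consumers so each partition has one reader."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(partition_count):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # The same key always lands in the same partition, preserving its order.
    print(partition_for("branch-042") == partition_for("branch-042"))  # prints True
    # Four partitions, four consumers: one partition each.
    print(assign_partitions(4, ["c1", "c2", "c3", "c4"]))
    # Four partitions, six consumers: c5 and c6 have nothing to read.
    print(assign_partitions(4, ["c1", "c2", "c3", "c4", "c5", "c6"]))
```

With fewer partitions than consumers, the extra consumers sit idle; with more partitions than consumers, some consumers carry several partitions. Because the count cannot be changed after creation in this model, it pays to size it for the highest parallelism you expect.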

Any entity, regardless of type, that reads event data is called an event consumer. All consumers read the event stream through partitions in a consumer group. Within a consumer group, the work is divided by partition, so each concurrent consumer reads its own share of the partitions. The consumer connects to a session in which the events are delivered as they become available, so it does not need to poll to determine whether data is available. The consumer group is what enables subscribing to the Event Hub and viewing it: each group gives its consumers a separate view of the entire event data stream. This separation allows each consuming application to process the data at a rate that is appropriate for it and from its own offsets (its position within each partition).
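The idea that each consumer group has its own view and its own offsets can be illustrated with a toy model. `ToyConsumerGroup` is a hypothetical name, offsets are plain list indexes as a simplification, and reading is done with an explicit `read` call, whereas the real service delivers events over a session as they become available. Two groups read the same shared partition data, but advancing one group's offset has no effect on the other.

```python
class ToyConsumerGroup:
    """Hypothetical consumer group: its own offsets over shared partitions.

    Offsets here are plain list indexes, a simplification of the real
    service's per-partition positions.
    """

    def __init__(self, name: str, partitions: list):
        self.name = name
        self._partitions = partitions  # shared event data; never modified here
        self.offsets = {p: 0 for p in range(len(partitions))}

    def read(self, partition: int, max_events: int) -> list:
        # Read from this group's own offset, then advance only this group's offset.
        start = self.offsets[partition]
        batch = self._partitions[partition][start:start + max_events]
        self.offsets[partition] = start + len(batch)
        return batch


if __name__ == "__main__":
    partitions = [["entry-1", "exit-1", "entry-2"], ["entry-3"]]
    dashboard = ToyConsumerGroup("dashboard", partitions)
    archive = ToyConsumerGroup("archive", partitions)
    print(dashboard.read(0, max_events=3))  # prints ['entry-1', 'exit-1', 'entry-2']
    print(archive.read(0, max_events=1))    # prints ['entry-1']
    print(dashboard.offsets[0], archive.offsets[0])  # prints 3 1
```

Here a fast "dashboard" application has caught up on partition 0 while a slower "archive" application is still near the start, and both see every event.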

To learn more about Event Hubs and how to set them up, check out the programming guide at this link: http://bit.ly/2hWEUob

Friday, January 6, 2017