I am sure you have heard by now of the IoT, or the Internet of Things. With a plethora of different IoT devices sending data to a variety of applications, did you ever wonder how that data is managed? Event Hubs performs this task. It takes in data from everything, from telemetry and mobile app data to gaming data, and organizes it so it is easier to consume and use. Event Hubs does this by managing the flow of data as it is received.
Let’s begin with an example. We have sensors at bank branches that log when customers come and go from the branch. These sensors are the IoT devices, and in Event Hubs terms they are called event publishers. The publishers send the events, each entry to and exit from the bank, to the hub, where they are organized and made available for applications to use. These applications are referred to as consumers. Consumers can be many different types of applications, and they often need different parts of the data.
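To make the bank example concrete, here is a minimal sketch in plain Python. It is not the Event Hubs API: the `BranchEvent` fields and the two consumer functions are hypothetical names chosen for illustration, and the "hub" is just a list. It only shows the idea that one stream of entry and exit events can serve consumers that care about different parts of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchEvent:
    """One sensor reading (hypothetical shape, not an Event Hubs type)."""
    branch_id: str
    kind: str          # "entry" or "exit"
    timestamp: float


def occupancy(events, branch_id):
    """Consumer 1: how many customers are currently inside one branch."""
    count = 0
    for event in events:
        if event.branch_id == branch_id:
            count += 1 if event.kind == "entry" else -1
    return count


def total_visits(events):
    """Consumer 2: number of entries per branch, ignoring exits."""
    visits = {}
    for event in events:
        if event.kind == "entry":
            visits[event.branch_id] = visits.get(event.branch_id, 0) + 1
    return visits


# The publishers (sensors) produce a single stream of events.
stream = [
    BranchEvent("branch-1", "entry", 1.0),
    BranchEvent("branch-2", "entry", 2.0),
    BranchEvent("branch-1", "entry", 3.0),
    BranchEvent("branch-1", "exit", 4.0),
]

print(occupancy(stream, "branch-1"))
print(total_visits(stream))
```

Both consumers read the same events, but each one keeps only what it needs.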
In this example all the data comes from a single type of publisher, the sensors, but often there is more than one type of data. It could be location data, such as how close a customer lives to the branch, or basic data, such as whether the customer is currently in the branch, along with many other types. Event Hubs can take in all the data as fast as it arrives. Unfortunately, in this type of environment there are so many different types of data that, at a large volume, it is essentially unusable in a traditional way. This is where Event Hubs comes in: it sorts and organizes the data in a way that the consumer can use.
How the data will be used determines how it should be organized. Event Hubs uses partitions to do this.
The formal name for how Event Hubs organizes the data is the partitioned consumer model. This means that multiple consumers can read from the same event hub at the same time, each reading its own partition of the stream, which lets the hub scale with volume. As you can imagine, data streams in at a highly variable rate, in a wide variety of ways, and over several protocols. As the data enters the workflow, Event Hubs uses partitions to segment it, appending each new event to the end of a partition as it arrives. Events are kept for a configured retention time that is set at the event hub level. Events cannot be deleted individually; they simply expire once the retention time has passed.
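The retention behavior can be sketched as a small append-only log. This is a toy model in plain Python, not how the service is implemented: the `Partition` class, its method names, and the use of seconds for retention are all assumptions made for illustration.

```python
class Partition:
    """Toy append-only log: events are added at the end and expire by age."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.events = []          # tuples of (sequence_number, enqueued_time, body)
        self.next_sequence = 0

    def append(self, body, now):
        """Add an event at the end of the log and return its sequence number."""
        sequence = self.next_sequence
        self.events.append((sequence, now, body))
        self.next_sequence += 1
        return sequence

    def expire(self, now):
        """Drop events older than the retention window; there is no per-event delete."""
        cutoff = now - self.retention_seconds
        self.events = [event for event in self.events if event[1] >= cutoff]

    def read_from(self, sequence_number):
        """Return every retained event at or after the given sequence number."""
        return [event for event in self.events if event[0] >= sequence_number]


partition = Partition(retention_seconds=60)
partition.append("entry", now=0)
partition.append("exit", now=50)
partition.expire(now=70)          # the event enqueued at time 0 is now too old
print(partition.read_from(0))
```

Note that sequence numbers keep counting up after older events expire, so a reader's position stays meaningful.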
The number of partitions is also set when the event hub is created. Partitions are the key to organizing the data for any downstream workflow, and the number should be based on the degree of downstream parallelism you will require for the consumers. A good rule is to make the number of partitions equal to the expected number of concurrent consumers.
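A quick way to see why that rule helps is to hand out partitions to consumers and look at who ends up with what. The `assign_partitions` function below is a hypothetical helper, a simple round-robin written for this post, and not the assignment logic the service or its client libraries actually use.

```python
def assign_partitions(partition_count, consumers):
    """Round-robin partition ids across consumers (illustrative only)."""
    assignment = {name: [] for name in consumers}
    for partition_id in range(partition_count):
        owner = consumers[partition_id % len(consumers)]
        assignment[owner].append(partition_id)
    return assignment


# Four partitions and four concurrent consumers: one partition each.
print(assign_partitions(4, ["a", "b", "c", "d"]))

# Two partitions and three consumers: consumer "c" has nothing to read.
print(assign_partitions(2, ["a", "b", "c"]))
```

With too few partitions, the extra consumers sit idle, which is why the partition count should follow the parallelism you expect downstream.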
Any entity that reads event data, regardless of type, is called an event consumer. All consumers read the event stream through partitions in a consumer group. Within a consumer group, the partitions are divided among the concurrent consumers. The consumer connects to a session in which the events are delivered as they become available, so it does not need to poll to determine whether data is available. The consumer group controls both the subscription to the event hub and the view of it. Each group gives its consumers a separate view of the event data stream. This separation allows consumers to process the data at a rate that is appropriate for them and to keep their own offsets (an offset marks a reader's position within a partition).
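The idea of separate views can be sketched with two readers that share one partition but each keep their own position. Again, this is plain Python for illustration: `ConsumerGroupView` and the group names are hypothetical, and a real consumer would receive events from the service rather than slice a list.

```python
class ConsumerGroupView:
    """Toy reader: a shared partition plus a private offset into it."""

    def __init__(self, name, partition):
        self.name = name
        self.partition = partition   # shared, never modified by readers
        self.offset = 0              # position of the next event to read

    def receive(self, max_events):
        """Return up to max_events from the current offset and advance it."""
        batch = self.partition[self.offset:self.offset + max_events]
        self.offset += len(batch)
        return batch


shared_partition = ["e0", "e1", "e2", "e3"]
dashboard = ConsumerGroupView("dashboard", shared_partition)
archiver = ConsumerGroupView("archiver", shared_partition)

fast_batch = dashboard.receive(3)   # a fast consumer reads ahead
slow_batch = archiver.receive(1)    # a slow consumer takes its time

print(dashboard.offset, archiver.offset)
```

Neither reader affects the other: the slower one still sees every event, just later.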
To learn more about Event Hubs and how to set them up, check out the programming guide at this link: http://bit.ly/2hWEUob