Azure Event Hubs and Amazon Kinesis Side by Side

Azure Event Hubs and Amazon Kinesis are two competing cloud services that serve the same purpose – reliably collecting and processing massive amounts of data with low latency and at low cost. Although both services provide similar functionality, there are significant differences to be aware of when architecting a solution. This article compares various aspects of Azure Event Hubs and Amazon Kinesis and is intended to assist in the software architecture decision-making process.

Key concepts

Amazon Kinesis streams use shards as the base throughput unit. Each shard provides a capacity of 1MB/sec data input and 2MB/sec data output, and supports up to 1,000 PUT records per second and up to 5 read transactions per second. The default shard limit depends on the region and is either 25 or 50 shards per region, but you can request an increase; there is no upper limit to the number of shards in a stream or account.

Azure Event Hubs stream throughput capacity is controlled by throughput units. One throughput unit includes up to 1MB/sec ingress and up to 2MB/sec egress, and supports up to 1,000 events per second. Event Hubs also introduce the concept of partitions – a data organization mechanism designed to support multiple concurrent readers. A single partition has a maximum scale of one throughput unit. There’s a default limit of 20 throughput units per Azure account and 32 partitions per Event Hub, but both limits can be increased by request.
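As a quick sizing sketch, the required number of shards or throughput units can be estimated from the peak ingress rate; the workload numbers below are hypothetical:

import math

# Hypothetical workload: 8 MB/sec peak ingress, 6,000 records/sec.
peak_mb_per_sec = 8
peak_records_per_sec = 6000

# One Kinesis shard or one Event Hubs throughput unit accepts up to
# 1 MB/sec and 1,000 records (events) per second, so size for whichever
# dimension needs more capacity.
units_needed = max(math.ceil(peak_mb_per_sec / 1.0),
                   math.ceil(peak_records_per_sec / 1000.0))
print(units_needed)  # 8 shards or throughput units (and at least 8 partitions)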

Data input

The Amazon Kinesis API uses the HTTPS protocol for all operations, and every put request must be signed using an access key. You can control the level of access to Amazon Kinesis resources using AWS Identity and Access Management (IAM); for example, data producers can be restricted by IAM policies that allow only write operations on specific streams. Producers can also use the Amazon Kinesis Producer Library (KPL) to simplify producer application development. The maximum size of a data blob (the data payload before Base64-encoding) is 1MB.
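As an illustration, a minimal record put with the boto3 Python SDK might look like the sketch below; the stream name and region are placeholders:

import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

record = {"DeviceId": "dev-01", "Temperature": 37.0}

# The partition key determines which shard receives the record; records
# with the same key always land on the same shard.
response = kinesis.put_record(
    StreamName="my-stream",  # placeholder stream name
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["DeviceId"],
)
print(response["ShardId"], response["SequenceNumber"])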

Azure Event Hubs support the HTTPS and AMQP 1.0 protocols for event publishing. Event publishers use Shared Access Signature (SAS) tokens for authentication, and SAS tokens can be created with send-only privileges on a specific Event Hub. .NET developers can take advantage of the EventHubClient class for publishing events, and the Apache Qpid project can be used to send messages over AMQP from a variety of platforms and languages. You can send up to 256KB of event data in a single request. Publisher policies are a distinctive feature of Azure Event Hubs designed to facilitate large numbers of independent event producers.
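For comparison, a hedged sketch using the azure-eventhub Python SDK (a library that postdates this article and is assumed here purely for illustration; the connection string and hub name are placeholders) publishes over AMQP:

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<send-only SAS connection string>",  # placeholder
    eventhub_name="myeventhub",                    # placeholder
)

# Batching amortizes the AMQP round trip; the SDK rejects events that
# would push the batch past the hub's maximum message size.
with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"DeviceId": "dev-01", "Temperature": 37.0}'))
    producer.send_batch(batch)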

Data processing

Amazon Kinesis consumer applications can read data from streams using either the Amazon Kinesis API or the Amazon Kinesis Client Library (KCL). The KCL makes it easier to build robust applications that read and process stream data by handling the complexities typically associated with distributed stream processing. The Amazon Kinesis Connector Library helps you integrate Amazon Kinesis with other AWS services and third-party tools, providing connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch. The Amazon Kinesis Storm Spout library helps Java developers integrate Amazon Kinesis with Apache Storm.
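At the raw API level, reading a shard is a loop of get_records calls driven by a shard iterator. A minimal boto3 sketch (stream name and region are placeholders):

import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Read the oldest available records from the first shard.
shard_id = kinesis.describe_stream(
    StreamName="my-stream")["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="my-stream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    result = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in result["Records"]:
        print(record["Data"])
    iterator = result.get("NextShardIterator")
    time.sleep(0.2)  # stay under the 5 read transactions/sec shard limit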

Azure Event Hubs consumers connect via an AMQP 1.0 session, in which events are delivered as they become available. Consumer groups allow multiple consuming applications to read the same stream independently at their own pace; you can create up to 20 consumer groups per Event Hub. The EventProcessorHost class can significantly simplify distributed partition processing for .NET clients. The Azure Stream Analytics service provides out-of-the-box integration with Event Hubs and can be used to process ingested events in real time. Stream Analytics supports Azure SQL Database, Blob storage, Event Hubs, Table storage, and Power BI output sink options.
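For non-.NET clients, the same receive-per-partition idea can be sketched with the azure-eventhub Python SDK (again a modern library assumed purely for illustration; names are placeholders):

from azure.eventhub import EventHubConsumerClient

client = EventHubConsumerClient.from_connection_string(
    conn_str="<listen SAS connection string>",  # placeholder
    consumer_group="$Default",
    eventhub_name="myeventhub",                 # placeholder
)

def on_event(partition_context, event):
    # Each callback is scoped to a single partition, mirroring the
    # one-reader-per-partition model described above.
    print(partition_context.partition_id, event.body_as_str())

with client:
    client.receive(on_event=on_event, starting_position="-1")  # read from start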

Monitoring

Amazon Kinesis integrates with the Amazon CloudWatch service – a reliable, scalable, and flexible monitoring solution that enables you to collect, view, and analyze metrics for your Amazon Kinesis streams.

At the time of writing, Azure Event Hubs don’t provide a built-in monitoring and notification mechanism beyond the basic metrics available on the Azure management portal.

Capacity management

Amazon Kinesis stream throughput is limited by the number of shards within the stream. To increase (split) or decrease (merge) the number of shards, you must perform a resharding operation. Stream data records are accessible for a maximum of 24 hours from the time they are added to the stream.
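For instance, a hedged boto3 sketch that splits the first shard at the midpoint of its hash-key range (stream name and region are placeholders):

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

shard = kinesis.describe_stream(
    StreamName="my-stream")["StreamDescription"]["Shards"][0]
key_range = shard["HashKeyRange"]

# Splitting at the midpoint yields two child shards that each
# cover half of the parent's hash-key range.
midpoint = (int(key_range["StartingHashKey"]) +
            int(key_range["EndingHashKey"])) // 2
kinesis.split_shard(
    StreamName="my-stream",
    ShardToSplit=shard["ShardId"],
    NewStartingHashKey=str(midpoint),
)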

Azure Event Hubs throttle ingress and limit egress to the capacity provided by the number of throughput units assigned to the namespace. Throughput units are provisioned on a best-effort basis and may not always be available for immediate purchase. The default Event Hubs message retention period is 24 hours, but the Event Hubs Standard tier supports a maximum retention period of 7 days.

Pricing

Amazon Kinesis uses simple pay-as-you-go pricing based on two dimensions: shard hours and PUT payload units (25KB payload chunks). Pricing varies by region; in US East it is $0.015 per shard-hour and $0.014 per million PUT payload units.

Azure Event Hubs use a tiered pricing model and charge by the number of assigned throughput units and by ingress events (units of data 64KB or smaller). In the Central US region, the Event Hubs Basic tier costs $0.015/hr per throughput unit and $0.028 per million events, while the Standard tier costs $0.03/hr per throughput unit and $0.028 per million events. Service Bus brokered connections (AMQP connections) are billed separately, but the first 100 concurrent connections are free for every Basic Event Hubs namespace, and the first 1,000 concurrent connections per subscription are free for Standard Event Hubs.
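To make the pricing dimensions concrete, here is a back-of-the-envelope monthly estimate for a hypothetical workload – 1,000 records per second, each under 25KB (and under 64KB), on a single shard or throughput unit for a 30-day month – at the rates quoted above, with brokered connection charges excluded:

hours = 24 * 30              # 720 shard or throughput-unit hours
records = 1000 * 86400 * 30  # ~2.59 billion records per month

# Amazon Kinesis: shard hours plus PUT payload units (one unit per
# record here, since every record fits in a single 25KB chunk).
kinesis_monthly = hours * 0.015 + records / 1e6 * 0.014     # ~$47.09

# Event Hubs Basic: throughput unit hours plus ingress events.
event_hubs_monthly = hours * 0.015 + records / 1e6 * 0.028  # ~$83.38

print(round(kinesis_monthly, 2), round(event_hubs_monthly, 2))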

Simplify Real-Time Event Processing with Azure Stream Analytics

Scalable and reliable real-time event processing has never been a simple task. Software architects and developers involved in such projects often spend the majority of their time building custom solutions to satisfy the reliability and performance requirements of their applications, and significantly less time on the actual business logic. The introduction of cloud services made it easier to scale real-time event processing applications, but properly distributing the application load across multiple worker nodes still requires complex solutions. For example, the EventProcessorHost class provides a robust way to process Event Hubs data in a thread-safe, multi-process-safe manner, but you are still responsible for hosting and managing the worker instances. Wouldn’t it be nice not to have to worry about the event processing infrastructure and to focus on the business logic instead?

Azure Stream Analytics service introduction

Azure Stream Analytics is a fully managed, scalable, and highly available real-time event processing service capable of handling millions of events per second. It works exceptionally well in combination with Azure Event Hubs and enables two main scenarios:

  1. Perform real-time data analytics and immediately detect and react to special conditions.
  2. Save event data to persistent storage for archival or further analysis.

Each Stream Analytics job consists of one or more inputs, a query, and one or more outputs. At the time of writing, the available input sources are Event Hubs and Blob storage, and the output sink options consist of SQL Database, Blob storage, Event Hubs, Power BI, and Table storage. Pay special attention to the Power BI output option (currently in preview), which allows you to easily build real-time Power BI dashboards. The diagram below provides a graphical representation of the available input and output options.

Figure 1 – Azure Stream Analytics input and output options

Queries are written in the Stream Analytics Query Language – a SQL-like language specifically designed for performing transformations and computations over streams of events.
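For example, a query along the following lines (the input alias, output alias, and field names are hypothetical) counts events per device over 30-second tumbling windows:

SELECT DeviceId, COUNT(*) AS EventCount
INTO MyOutput
FROM MyEventHubInput
GROUP BY DeviceId, TumblingWindow(second, 30)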

Step-by-step Event Hub data archival to Table storage using Azure Stream Analytics

For this walkthrough, let’s assume that there’s an Event Hub called MyEventHub that receives location data from connected devices in the following JSON format:

{ "DeviceId": "mwP2KNCY3h", "EpochTime": 1436752105, "Latitude": 41.881832, "Longitude": -87.623177 }
  1. Navigate to the Azure management portal and create a new Stream Analytics job.
  2. Open the MyStreamAnalytics job and add a new Event Hub input.
  3. Configure the Event Hub settings.
  4. Keep the default serialization settings.
  5. Add a new Table storage output.
  6. Configure the Table storage settings.
  7. Use the DeviceId field as the PartitionKey and EpochTime as the RowKey.
  8. Adjust the query to use the correct input and output alias names (see the sample query after this list).
  9. Finally, start the job and watch your output storage table get populated with the Event Hub stream data without writing a single line of code!
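The default query simply copies every event from the input to the output. With hypothetical alias names matching the input and output configured in this walkthrough, the adjusted query might look like this:

SELECT * INTO MyTableOutput FROM MyEventHubInput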

Azure Event Hub Message Publishing from Connected Devices

Building an application for a connected device like a mobile phone, tablet, or a microcomputer is easier than ever these days. A variety of tools and supported programming languages allow developers to quickly build applications that collect sensor data, capture telemetry information or location data, and send the collected information to backend services for processing.

Connected devices communication challenges

A functional prototype can often be built in a matter of days or even hours, but a number of challenges must be addressed in order to prepare the application for wider distribution. Some of the most common challenges are:

  1. Platform support – applications running on various platforms must be able to send information.
  2. Reliability – the backend service must be able to reliably accept information.
  3. Security – only authorized devices should be able to send information and the information must be protected from unauthorized access and manipulation.
  4. Latency – network calls must be as quick as possible to minimize exposure to network interruptions and to preserve system resources and battery power.
  5. Scalability – in many cases, the backend service must be able to handle massive amounts of data.

It is certainly possible to build a custom solution that satisfies all of your application-specific requirements, but would doing so bring you any business value, or would you rather focus on the actual application functionality? Luckily, there are cloud services specifically designed for these types of scenarios, and they can significantly simplify your development efforts.

Azure Event Hubs service introduction

Meet Azure Event Hubs – a highly scalable, low-latency, and highly available event ingestor service capable of reliably handling millions of events per second and buffering them for further processing for anywhere from 24 hours up to 7 days. The service supports the HTTPS and AMQP 1.0 protocols, which makes it an attractive option for a wide variety of devices and platforms.

The Event Hubs REST API is straightforward and easy to use: simply submit an HTTP POST request to your Event Hub endpoint and set the request body to a JSON-encoded string that contains one or more messages. For example, below is a sample Send Event request, courtesy of the Event Hubs REST API reference on MSDN.

POST https://your-namespace.servicebus.windows.net/your-event-hub/messages?timeout=60&api-version=2014-01 HTTP/1.1
Authorization: SharedAccessSignature sr=your-namespace.servicebus.windows.net&sig=tYu8qdH563Pc96Lky0SFs5PhbGnljF7mLYQwCZmk9M0%3d&se=1403736877&skn=RootManageSharedAccessKey
Content-Type: application/atom+xml;type=entry;charset=utf-8
Host: your-namespace.servicebus.windows.net
Content-Length: 42
Expect: 100-continue

{ "DeviceId":"dev-01", "Temperature":"37.0" }

Generating a request like this should not require much effort in any programming language on any platform, be it a mobile app or a Node.js application running on a Raspberry Pi. The only part that deserves special attention is the Authorization request header, which contains a Shared Access Signature (SAS) token. A SAS token can be generated by any client that has access to the signing key specified in the shared access authorization rule. Below are some best practices to follow when generating SAS tokens for connected devices, followed by a short sketch that puts them into practice.

Azure Event Hubs SAS token generation best practices

  1. Never use the RootManageSharedAccessKey shared access policy to generate SAS tokens for your connected devices since it’s a highly privileged policy. Follow the principle of least privilege and always create a new policy with Send-only permission for each event hub.
  2. Never send or store your policy keys on connected devices as it may expose your keys for unauthorized access and makes it difficult to rotate or revoke them. Always generate your SAS tokens on the server and only store the generated tokens on the devices.
  3. Never generate SAS tokens and submit HTTP POST requests to the common event hub URI, because the same SAS token would be valid for any device. Always target unique device-specific event hub endpoints when generating tokens and publishing events.
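Putting these practices together, here is a minimal Python sketch of server-side token generation and a device-side publish to a device-specific publisher endpoint. The namespace, hub, policy, and device names are placeholders; the signing scheme follows the published SAS token format (HMAC-SHA256 over the URL-encoded resource URI and expiry):

import base64
import hashlib
import hmac
import time
import urllib.parse

import requests

def generate_sas_token(resource_uri, policy_name, policy_key, ttl_seconds=3600):
    # Sign the URL-encoded resource URI plus the expiry timestamp
    # with the policy key, per the SAS token format.
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    signature = base64.b64encode(
        hmac.new(policy_key.encode("utf-8"),
                 (encoded_uri + "\n" + expiry).encode("utf-8"),
                 hashlib.sha256).digest())
    return ("SharedAccessSignature sr={}&sig={}&se={}&skn={}"
            .format(encoded_uri, urllib.parse.quote_plus(signature),
                    expiry, policy_name))

# Server side: issue a token scoped to one device's publisher endpoint,
# signed with a send-only policy (placeholders throughout).
device_uri = ("https://your-namespace.servicebus.windows.net/"
              "your-event-hub/publishers/dev-01")
token = generate_sas_token(device_uri, "SendOnlyPolicy", "<policy key>")

# Device side: publish an event using only the pre-issued token,
# so the policy key itself never leaves the server.
response = requests.post(
    device_uri + "/messages",
    headers={"Authorization": token,
             "Content-Type": "application/atom+xml;type=entry;charset=utf-8"},
    data='{ "DeviceId":"dev-01", "Temperature":"37.0" }',
)
print(response.status_code)  # 201 Created on success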
