Amazon Kinesis Data Firehose is a fully managed service for streaming data from virtually any source into AWS storage and analytics services. It provides near-real-time ingestion for building data-driven applications. In this article, we’ll look at how Amazon Kinesis Data Firehose works and the use cases it’s best suited for.
Amazon Kinesis Data Firehose is a fully managed streaming ETL (extract, transform, and load) service that can ingest streaming data from various sources, optionally transform it into new formats, and deliver it to data lakes and data warehouses for analysis. It can combine data from multiple sources such as Amazon EC2 or Amazon DynamoDB, transform it for analytics purposes, and then load it into destination services such as Amazon S3, Amazon Redshift, Splunk, or other HTTP endpoint partners.
Let’s dive into what happens during the extract, transform, and load processes.
Amazon Kinesis Data Firehose can capture logs, financial data, sales orders, and other types of data. Data sources can include logs coming from Amazon EC2 instances, or data from mobile apps and IoT devices.
There are different ways to connect data sources to Amazon Kinesis Data Firehose:
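For example, applications can write to a delivery stream directly through the AWS SDK using the Direct PUT option. Below is a minimal sketch using boto3; the stream name and the event payload are hypothetical, and the actual API call is commented out since it requires AWS credentials:

```python
import json


def build_record(event: dict) -> dict:
    """Serialize an event as a newline-delimited JSON record for Firehose."""
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


# Hypothetical stream name and payload; uncomment to send with real credentials:
# import boto3
# firehose = boto3.client("firehose")
# firehose.put_record(
#     DeliveryStreamName="my-delivery-stream",
#     Record=build_record({"ticker": "AMZN", "price": 120.5}),
# )
```

Appending a newline to each record is a common convention so that the objects Firehose delivers to S3 contain one JSON document per line.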
If needed, customers can transform the extracted data into a specific format to make it more usable. Again, there are various ways to do that.
Firstly, Amazon Kinesis Data Firehose has built-in transformation options for raw and JSON data. This data can be converted into columnar formats such as Apache Parquet and Apache ORC.
Developers can also use AWS Lambda functions to transform raw data when creating a new delivery stream. You can create a function from scratch or leverage Lambda blueprints provided by AWS.
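A transformation Lambda receives a batch of base64-encoded records and must return each record with a status of `Ok`, `Dropped`, or `ProcessingFailed`. The sketch below follows that contract; the `message` field it uppercases is just a hypothetical example of a transformation:

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation Lambda: uppercase a hypothetical 'message' field."""
    output = []
    for record in event["records"]:
        # Records arrive base64-encoded; decode, transform, re-encode.
        payload = json.loads(base64.b64decode(record["data"]))
        payload["message"] = payload.get("message", "").upper()
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```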
Lastly, the platform can compress data before loading it into Amazon S3. With dynamic partitioning, customers can also partition data by keys within the records to optimize it for analytics tools.
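To get a feel for why compression before delivery pays off, here is a quick local illustration using Python's standard gzip module on sample JSON lines (the record shape is made up for the demonstration; Firehose applies the compression for you when GZIP is enabled on the destination):

```python
import gzip
import json

# Generate sample JSON lines similar to what a delivery stream might buffer.
records = [json.dumps({"event_id": i, "status": "ok"}) for i in range(1_000)]
raw = ("\n".join(records) + "\n").encode("utf-8")

compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```

Repetitive log-style data typically compresses very well, which directly reduces S3 storage and scan costs.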
After extracting and transforming data, Amazon Kinesis Data Firehose can deliver it to various destinations for analytics purposes, including:
For organizations looking for a streamlined ETL solution, Amazon Kinesis Data Firehose can offer a lot of benefits with no code required.
Because Amazon Kinesis Data Firehose is a fully managed service, customers don’t have to worry about its underlying infrastructure. The ETL service can automatically scale to capture and load more data, and it also offers high availability by replicating data across three data centers in the same AWS region.
You can get started for free with Amazon Kinesis Data Firehose and only pay for the following four types of on-demand usage:
Amazon Kinesis Data Firehose supports encryption both for data at rest and for data in transit. Data is in transit while it moves from its source to a destination, such as an Amazon S3 bucket.
When configuring a delivery stream, you can choose to encrypt your data using an AWS Key Management Service (KMS) key, which you control and manage. While encrypting your data is optional, it’s recommended to improve security.
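Server-side encryption with a customer-managed key can also be configured through the API. The sketch below shows the relevant parameter for `create_delivery_stream`; the stream name and key ARN are placeholders, a destination configuration would also be required, and the call itself is commented out since it needs AWS credentials:

```python
# Hypothetical names/ARNs for illustration only.
stream_config = {
    "DeliveryStreamName": "my-encrypted-stream",
    "DeliveryStreamType": "DirectPut",
    # Use a customer-managed KMS key instead of the default AWS-owned key.
    "DeliveryStreamEncryptionConfigurationInput": {
        "KeyType": "CUSTOMER_MANAGED_CMK",
        "KeyARN": "arn:aws:kms:us-east-1:123456789012:key/example-key-id",
    },
}

# A destination (e.g. ExtendedS3DestinationConfiguration) must be added before
# this call will succeed; it is omitted here to keep the sketch short.
# import boto3
# boto3.client("firehose").create_delivery_stream(**stream_config)
```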
To show you how to get started with this platform, I’ll explain how to create a Kinesis Data Firehose delivery stream to send data to the destination of your choice.
To create a Kinesis Data Firehose delivery stream, we’ll need to choose a data source and a delivery stream destination. Before you get started, however, here’s what you’ll need:
Now, the first step is to log in to the AWS Management Console and open the Kinesis console as shown below.
[Image2]
Note: I’ll be using demo data from AWS for this tutorial, but the Direct PUT option lets you use data from various AWS services, agents, and open-source services. You can find more details on this support page.
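If you’d rather send your own test data than use the AWS demo data, Direct PUT also supports batching with `put_record_batch`, which accepts up to 500 records per call. A small sketch of the batching logic (the stream name is hypothetical, and the AWS call is commented out):

```python
import json


def chunk_records(events, max_batch=500):
    """Split events into Firehose-sized batches (PutRecordBatch takes up to 500 records)."""
    batch = []
    for event in events:
        batch.append({"Data": (json.dumps(event) + "\n").encode("utf-8")})
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:
        yield batch


# With real credentials, each batch could be sent like this:
# import boto3
# client = boto3.client("firehose")
# for batch in chunk_records(demo_events):
#     client.put_record_batch(DeliveryStreamName="mydeliverystream", Records=batch)
```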
Now that our delivery stream has been created successfully, we can start sending data to our S3 bucket.
In this tutorial, I configured the destination for my delivery stream as an Amazon S3 bucket. In the Delivery stream details, I can check whether everything is working properly.
Now, I’ll open the Amazon S3 bucket that I configured as a destination to verify that the logs have been successfully delivered. There is a folder named 2023 for the current year, with subfolders for the month and date, and I’ll download and open the file named “mydeliverystream-1-2023-06-02-21-31-53-69ee0d29-aaaf-4dc9-9591-5f7738b6ee86stored” to see what the logs look like.
In this article, I gave you an overview of how Amazon Kinesis Data Firehose works and I showed you how you can use this ETL platform for your own projects. This AWS service offers lots of benefits if you’re already using the AWS cloud, but it can also ingest and transform data from other sources. With its pay-as-you-go pricing model, you can easily get started without worrying about the required infrastructure.