If you were talking to someone whose organization is considering Amazon Kinesis, what would you say?
How would you rate it and why? Any other tips or advice?
The question of whether I would recommend Amazon Kinesis over Azure Event Hub is tricky. While both have their advantages and I consider them to be almost equal, we feel the latter to be better suited to our environment, which is why we went with it. The data transferring policies and associated costs of Amazon were the deciding factors for me. I rate Amazon Kinesis as an eight or nine out of ten.
With my limited exposure to Kinesis, and with the pain points and probably not using it properly, we did see that it was successful. Having said all that, and the pain points that we went through, on a scale of one to ten I would give Kinesis an eight out of 10.
It's important to think about how you are going to fix the end points that connect to your Kinesis files. I would rate this solution a nine out of 10.
I have a lot of experience in Kinesis and data analytics including in networking in the Amazon AWS environment. My experience is as a big data architect. I draw all environments in AWS. On a scale from one to ten, I would rate the solution at a six. It's pretty good, and great for big environments, however, you do need to be well versed in the product to set it up.
My advice for anybody who is implementing this product is to start by reading through the Amazon documentation, as well as go through some videos on YouTube or Pluralsight just to get a high-level idea of what's going on. Then, start experimenting and trying to figure out how it works. From there, try to figure out how to choose your optimal sharding strategy, like how many shards do you need within the stream and how you want to partition the data within it. I think from there, you need to look at your production and consumption rates on the stream. This is how much data you are putting onto the stream and at what kind of rate. You need to make sure that you're consuming data off of the stream, also, and look at that rate too. The ideal use case is to be able to consume data faster than producing because then you're able to control things. If you're not able to do that, then you could get overwhelmed. The biggest lesson that I learned from using this product is that it's a whole new world of processing big data. I come from a traditional data warehousing background where everything is batch-oriented. So for this, this is a whole new ball game in terms of how to process data. It's a new mechanism for harnessing the power of data. A traditional data warehouse could not analyze, for example, what is going on in real-time on a racing car. It's not scalable and it's not going to work. However, something like this is dynamic and big enough to handle this kind of application. This is a pretty good product, albeit I don't have much to compare it with. That said, I don't have any problems with it. It's done what it's asked and it's easy to use. I would rate this solution a nine out of ten.
My advice to anyone thinking about Amazon Kinesis, is that if they have ClickStream or any streaming data which varies from megabytes to gigabytes, they can definitely go for Amazon Kinesis. If they want to do data processing, or batch or streaming analytics, they can choose Amazon Kinesis. And if you want to enable database stream events in Amazon DynamoDB, then you can definitely go for Amazon Kinesis. I don't see any better option for these other than Amazon Kinesis. You can use Amazon Kinesis Data Analytics Tool to detect an anomaly before you process the data. That's one more beauty. The first things we need to determine are the source and the throughput of the data and the latency you want. On a scale of one to ten I would rate Amazon Kinesis a nine.
Kinesis has the best of Amazon: data streaming, building processes, data analytics, data in real-time are very good. The output and monitoring are easy. It has good performance. I would rate it a nine out of ten.
My recommendation for Data Streams is to do a deep dive into the documentation before implementing to avoid what we did at the beginning. You try to process record by record or push record by record into Kinesis and then realize that it is not cost effective or even efficient. So you need to know that you need to aggregate your data before you push it into Kinesis. So documenting yourself about the best practices in using Kinesis is definitely something I would recommend to anyone. For Kinesis Analytics, I was actually surprised at how easy it is to use an application with such power. I would say with a trial, users will realize that for for such a fairly complex application such as Kinesis Analytics, it is something that you can do very quickly with minimal resources and it gives you a lot of value for specific use cases. On a scale of one to ten, I would give Amazon Kinesis a nine. I don't have much to complain about Kinesis.
If you want to use a stream solution you need to evaluate your needs. If your needs are really performance-based, maybe you should go with Kafka, but for near, real-time performance, I would recommend Amazon Kinesis. If you need more than one destination for the data that you are ingesting in the stream, you will need to use Amazon Kinesis Data Streams rather than Firehose. If you only want to integrate from one point to another, then Kinesis Firehose is a considerably cheaper option and is much easier to configure. From using Kinesis, I have learned a lot about the synchronous way of processing data. We always had a more sequential way of doing things. On a scale from one to ten, I would give this solution a rating of eight.
It's nice to deploy this with the Amazon goodness of Cloud Formation and Terraform, to have it all deployed in a repeatable way. I know that it's easy to go into the console and do it manually, but it's best to do infrastructure as code, in particular with Kinesis. I would rate this solution a nine out of 10.
What do you like most about Amazon Kinesis?
Thanks for sharing your thoughts with the community!
What features are most important to compare when selecting a cloud ETL tool?