What needs improvement with Amazon Kinesis? What are its weaknesses, and what would you like to see changed in a future version?
Amazon Kinesis is not a bad product, but Azure Event Hub gives us certain operational advantages because our focus is on Microsoft-related development, which is why we use .NET on the backend. While we could use either Azure Event Hub or Amazon Kinesis for this, I feel Kinesis is less tailored to serverless programming. Kinesis is less intuitive and less easy to use than Azure Event Hub, and it involved a more complex setup and configuration.
In general, our pain point was that once data got into Kinesis, we had no way of understanding what was happening, because Kinesis divides everything into shards. If we wanted to know what was happening with a particular shard, such as whether a record had been published or not, we could not find out. Even our logging was tied to the shard. We thought we needed that visibility at the time, but later we realized that Kinesis was not built for it: we used Kinesis for one purpose and expected a feature that may not have been part of the service's design. AWS may well have improved this by now; I have not worked with AWS for the last five or six months, since I joined an organization that uses Azure, so I did not get to experiment much more with Kinesis.
It was almost like a black box for us. Once the data came in, we had to rely on the Lambda function to tell us what had arrived: as each Kinesis record was processed, the Lambda logged it, and that is how we knew, "Okay, this one has come in, this one has come in." We hoped for a better way to track which shard was being processed or how records streamed within Kinesis. That was not available then, and it may not be available now. That was the only thing we felt was lacking; our use case may not have been the ideal one, and otherwise we did not have many qualms with Kinesis. In hindsight, we could have simplified the entire design by simply using SNS and SQS, because we have much better visibility into what happens within SNS and SQS.
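Part of the "black box" feeling comes from not knowing which shard a record lands on. Kinesis Data Streams maps each partition key through an MD5 hash to a 128-bit integer, and each shard owns a contiguous hash key range, so you can work out the target shard yourself. The sketch below assumes the digest is read as a big-endian integer and uses a hypothetical `even_ranges` helper that splits the key space equally; for a real stream, the ranges should come from the `HashKeyRange` values that `ListShards` returns, since resharding changes them.

```python
import hashlib
from typing import List, Tuple

# The 128-bit hash key space that shards partition between them.
HASH_SPACE = 2 ** 128


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit big-endian integer (assumption)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def even_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: equal, contiguous (start, end) ranges, inclusive.

    Only an approximation of a freshly created, never-resharded stream.
    """
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder up to the top of the space.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the range that contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by the given ranges")
```

Logging the computed shard index alongside each put gives a producer-side record of where data went, which can be compared with what the consuming Lambda reports.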
They have recently expanded the feature set, but when we were implementing it, it could only deliver to one platform. I'm not sure where it stands now, but support for multiple platforms would be beneficial. I'd also like the ability to do first-in, first-out queuing: if I put several messages into Firehose, there's no guarantee that they will be processed in the order they were sent.
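When the delivery path does not preserve order, one common workaround is to have the producer stamp each message with its own sequence number and let the consumer reorder. The sketch below assumes a gap-free integer sequence (a hypothetical field, not something Firehose adds) and holds early arrivals in a dictionary until every earlier number has been seen.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def reorder(records: Iterable[Tuple[int, Any]], first_seq: int = 0) -> Iterator[Tuple[int, Any]]:
    """Yield (seq, payload) pairs in sequence order.

    Assumes the producer assigned gap-free, unique integer sequence numbers
    starting at first_seq. Records after a missing number stay buffered.
    """
    pending: Dict[int, Any] = {}
    next_seq = first_seq
    for seq, payload in records:
        pending[seq] = payload
        # Release every record that is now contiguous with what was emitted.
        while next_seq in pending:
            yield next_seq, pending.pop(next_seq)
            next_seq += 1
```

A missing sequence number blocks everything after it, so a production version would need a timeout or gap policy. Kinesis Data Streams itself keeps records with the same partition key in order within a shard, which may be an alternative when strict ordering matters.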
The automation could be better, and the solution needs to be better at capturing information. Some jobs have limitations that can make the process challenging. For a successful setup, the person handling the implementation needs to know the solution very well; you can't come into it blind, with little to no experience.
I'm currently trying to figure out production and consumption rates for our data. Better documentation on optimal sharding strategies would be helpful.
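A back-of-the-envelope way to turn production and consumption rates into a shard count is to check each per-shard quota separately and take the largest result. The constants below are the commonly documented provisioned-mode quotas (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared reads); treat them as assumptions to verify against the current AWS documentation, and note that the sketch treats 1 MB as 1,024 KB.

```python
import math

# Per-shard quotas in provisioned mode (assumed; verify in current docs).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0  # shared across consumers without enhanced fan-out


def shards_needed(avg_record_kb: float, records_per_second: int, consumers: int) -> int:
    """Smallest shard count that satisfies write bytes, write records, and reads."""
    write_mb = avg_record_kb * records_per_second / 1024
    # Each shared-throughput consumer reads the full stream.
    read_mb = write_mb * consumers
    return max(
        1,
        math.ceil(write_mb / WRITE_MB_PER_SHARD),
        math.ceil(records_per_second / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb / READ_MB_PER_SHARD),
    )
```

For example, 2,000 records/s of 1 KB each needs only two shards for writes, but five consumers reading the same stream push the estimate to five shards, which is where the read side tends to dominate.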
Kinesis Data Analytics needs some improvement. It is SQL-based, but it is not as user-friendly as MySQL or Athena. That's the one improvement I'm expecting from Amazon; apart from that, everything is fine.
Kinesis is good for Amazon Cloud but not as suitable for other cloud vendors.
In terms of what can be improved, within Data Streams you have a variety of ways to interact with the data: the Kinesis Client Library (KCL) and the Kinesis Agent. When we were developing our architecture a couple of years back, the libraries for aggregating data were very problematic. The Kinesis Aggregator, which improves performance and cost by aggregating individual records into bigger ones, had a lot of room for improvement and refinement. At the time, I ran into a couple of limitations that I had to work around, so I definitely see room for improvement there. Something else to mention is that we use Kinesis with Lambda a lot, and the fact that you can only connect one stream to one Lambda is a limiting factor. I would definitely recommend removing that constraint.
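The cost side of aggregation comes from how writes are billed: in provisioned mode, each record is charged in 25 KB PUT payload units, so many small records each consume a full unit. The sketch below packs small records into size-bounded, newline-delimited blobs to show the effect. It is a hypothetical illustration only: it is not the KPL/Kinesis Aggregator record format, and the 25 KB unit size is an assumption to check against current pricing.

```python
import math
from typing import List

PUT_PAYLOAD_UNIT = 25 * 1024  # bytes per billing unit (assumption)


def aggregate(records: List[bytes], max_bytes: int) -> List[bytes]:
    """Pack records into newline-delimited blobs of at most max_bytes.

    Hypothetical format for illustration. Assumes records are non-empty,
    contain no newline bytes, and each fits within max_bytes on its own.
    """
    batches: List[bytes] = []
    current = b""
    for record in records:
        candidate = record if not current else current + b"\n" + record
        if len(candidate) > max_bytes and current:
            # Adding this record would overflow: close the current blob.
            batches.append(current)
            current = record
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def payload_units(blobs: List[bytes]) -> int:
    """Billing units consumed if each blob is sent as one record."""
    return sum(max(1, math.ceil(len(b) / PUT_PAYLOAD_UNIT)) for b in blobs)
```

Ten 5,000-byte records cost ten units sent individually but two units once packed into 25 KB blobs; the trade-off is that consumers must de-aggregate, which is the part the reviewer found rough in the official libraries.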
The default limit, which is currently 5,000 records per second (I'm talking about Kinesis Firehose, a specialized form of the Amazon Kinesis service), seems too low. In the first week after we deployed it into production, we had to roll it back and ask Amazon to increase the default limits. The limit is mentioned in the documentation, but I think the default settings are far too low. That first week, it was extremely slow because records were not being properly ingested into the stream, so we had to try again. After we talked with Amazon, they increased our throttling limit to 10,000 records per second, and now it works fine.
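Raising the quota was the fix here, but batch writes to Firehose can also partially fail when throttled (the `PutRecordBatch` response reports a failed count and per-record errors), so producers usually resend only the rejected records with backoff. The sketch below uses capped exponential backoff with full jitter; `send` is a hypothetical callable standing in for the real API call, returning the positions of rejected records, and the `base`/`cap` values are arbitrary.

```python
import random
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def put_with_retries(
    records: Sequence[T],
    send: Callable[[List[T]], List[int]],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> List[T]:
    """Resend rejected records with capped exponential backoff and full jitter.

    `send` (hypothetical) returns the indices, within the list it was given,
    of records that were rejected. Returns the records still failing.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        failed = send(pending)
        pending = [pending[i] for i in failed]
        if not pending:
            return []
        if attempt < max_attempts - 1:
            # Full jitter: a random delay up to the capped exponential bound.
            sleep(rng() * min(cap, base * 2 ** attempt))
    return pending
```

Injecting `sleep` and `rng` keeps the function testable; in production the defaults apply, and anything still returned after the last attempt should be logged or sent to a dead-letter store rather than dropped.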
The solution does support sharding, so you can do a lot of work in parallel. However, I think the way sharding works could be simplified, with features that make it easier to scale in parallel.