How to Implement Real-Time Analytics on Hadoop
One of the biggest challenges with Hadoop is speed. How do you get real-time analytics performance out of a technology that was designed to trade off performance for scalability? Technologies like Hive, Presto, Parquet, ORC and others have delivered improvements, but none of them provided near-real-time, sub-second performance at scale. That changed with Apache Druid. Druid, which is included as part of Cloudera HDP, has been widely used to deliver real-time performance for reporting and ad-hoc analytics in data lake deployments.
Learn how companies have accelerated Hadoop analytics using Druid and moved toward real-time analytics using message buses like Kafka or Amazon Kinesis. This paper explains why delivering real-time analytics on Hadoop is so hard, which approaches companies have taken to accelerate Hadoop, and how they used Druid with Hadoop to create end-to-end real-time analytics architectures.