VIRTUAL DRUID SUMMIT
April 15, 2020

The Virtual Druid Summit is a free half-day of online sessions that includes lightning talks from important Apache Druid adopters plus an interactive "Ask Us Anything" with Druid authors.

Virtual Druid Summit Session Full Descriptions:



Apache Druid Vision and Roadmap
Gian Merlino, Apache Druid PMC Chair
8:00am - 8:30am PT

Gian will offer his reflections on the Druid journey to date, plus describe his vision for what Druid will become. He will lay out the near-term Druid roadmap and take your questions.


Automating CI/CD for Druid Clusters at Athena Health
Shyam Mudambi, Sr. Architect, Athena Health
9:00am - 9:30am PT

At Athena Health, we are creating a new performance management application for our clients, and one of its key components is Apache Druid. Since we are deploying this new application in the cloud, we needed an automated, CI/CD-based approach to create, update, and delete Druid clusters, as well as to scale different node groups within a cluster based on expected load. In this talk, we will go over how we implemented this process on AWS, using Terraform to deploy and update clusters within minutes.


Druid for Anti-Money Laundering (AML) Investigation
Arpit Dubey, SVP, Big Data Platform Lead & Architect, DBS
10:00am - 10:30am PT

DBS uses Druid to support anti-money laundering (AML) investigations for its compliance team. The AML workflow generates alerts, which are tracked in Druid. Transactional data is exported from an RDBMS to S3 and then ingested into Druid at regular intervals. Investigators can now slice and dice millions of records with low latency. Currently, over 4 million transactions per day are recorded in Druid.

In addition to this, the following use cases are powered by Druid:

  1. Real-time dashboards built on Druid for monitoring.
  2. Druid also serves the aggregated data required for machine learning modelling in the AML module; no other data store provides similar performance.


How Druid Powers Real-Time Analytics at BT
Pankaj Tiwari, Head of Engineering, BT
11:00am - 11:30am PT

We began this journey in Q2 2019 by asking Imply to help us onboard an in-house Network Performance Management project, and it has been an amazing journey with its fair share of ups and downs. Druid has plenty of features we could talk about; however, the ones that led us to choose Druid as our database are:

  • Highly distributed and horizontally scalable architecture
  • Shared-nothing architecture
  • Support for aggregation, post-aggregation, and statistical functions


Analytics over Terabytes of Data
Swapnesh Gandhi, Senior Software Engineer, Twitter
12:00pm - 12:30pm PT

MoPub, a Twitter company, provides monetization solutions for mobile app publishers and developers around the globe. MoPub receives over 33 billion ad requests per day, generating over 200 TB of raw logs every day. We built MoPub Analytics as the analytics platform, using Druid + Imply, for our end users: publishers, demand-side partners, and internal users.

We will talk about the architecture of the analytics platform, our Druid cluster setup, hardware choices, monitoring, use cases, limiting factors, challenges with lookups, and the solutions we used.


Using Druid for Network Monitoring and Trust Analytics at Cisco
TJ Giuli, Principal Engineer, and Abhishek Balaji Radhakrishnan, Software Engineer, Cisco Systems
1:00pm - 1:30pm PT

At Cisco's Crosswork Cloud, we use Apache Druid for several use cases, including monitoring internet routing updates, tracking device inventory statistics, and ingesting trusted device events. In our talk, we share our experiences and insights on how we deploy, monitor, and integrate Druid with our applications. We describe the technical challenges that led us to migrate from a key-value data store to Druid, our pipeline architecture, and an overview of our streaming and batch workloads.

Key takeaways:

  • Experiences deploying, running, and monitoring Druid in production at Cisco.
  • Methods for safely querying multi-tenant data sources.
  • Techniques for using code generation to manage ingestion and data source schemas and to provide a strongly typed end-to-end data flow throughout our system.


Apache Druid Fireside Chat (Ask Us Anything)
Fangjin Yang, Druid co-author, Gian Merlino, Apache Druid PMC Chair, Vadim Ogievetsky, Imply Chief Product Officer and Druid Contributor
2:00pm - 2:45pm PT

Take advantage of a rare opportunity to talk to some of the world's most adept Druid experts. During this session, we open the mic to take your questions. Ask us anything. Really.
