OpsClarity CEO Dhruv Jain Explains How He is Bringing Machine Learning to DevOps

Posted 12/3/2015 8:04:28 AM by STUART PARKERSON, Publisher Emeritus

OpsClarity just announced the launch of its Intelligent Operations Platform, which offers a suite of tools for visualizing, understanding, and troubleshooting performance issues for complex web-scale applications and infrastructure. We recently spoke with Dhruv Jain, CEO and co-founder of OpsClarity, about the new company and its platform.

Prior to founding OpsClarity, Dhruv worked in Corporate Strategy and Business Development at Cisco Systems, where he helped to drive technology and product strategy for the $7B+ Data Center, Virtualization and Cloud business. Dhruv holds an MS in Electrical Engineering from Stanford University, a BTech from the Indian Institute of Technology, Bombay, and an MBA from Wharton.

ADM: Who is OpsClarity and what are you launching?

Jain: OpsClarity’s vision is to build intelligent, data-driven software for today’s DevOps teams. The rise of cloud-native services and web-scale application architectures has resulted in massive change, scale and complexity challenges for operations. Managing this requires a next-generation solution that can automatically adapt to, and understand, dynamic application environments and proactively surface actionable insights. 

We are leveraging our roots in data science and large-scale streaming analytics to build a smart solution that brings new levels of visibility, focus and productivity to modern software operations. 

We are launching the OpsClarity Intelligent Operations Platform, a system that adapts to, and learns, every component of a company’s dynamic software environment, proactively detects system anomalies, and automates many of the manual tasks performed by operations engineers. 

ADM: What problem does OpsClarity solve?  How does it help DevOps teams?

Jain: The age of web-scale applications has created a massively complex challenge for modern software operations. Today’s applications are cloud-native and powered by containerized microservices – and the number of microservices and the number of instances per microservice are both expanding exponentially.

All of this has created an environment where applications and infrastructures are constantly changing and there is an explosion of operational data to understand and analyze – including metrics, events, and alerts from cloud infrastructures, containers, microservices, open-source frameworks, and custom applications.

We have developed a library of complex algorithms based on the context of a company's specific metrics (which we are constantly learning) to automatically understand, synthesize, and baseline this data and present it in a way that drives immediate understanding and focus for DevOps teams. This is coupled with advanced visualizations that guide Ops teams to focus on what matters rather than having them stare at, and sift through, hundreds of graphs and try to make sense out of them. 

In short, we help modern software operations teams instantly understand the health of their entire environment and efficiently troubleshoot by identifying the most important anomalies and providing deep contextual information. 

ADM: What is unique in the way OpsClarity helps modern software teams monitor and troubleshoot their operations environment?

Jain: Having built navigation systems in the past, our team realized they could apply this visualization model to operations data. Our hierarchical view provides a layered visualization of application topology, overlaid with component health for an immediate understanding of overall system health. This visual paradigm helps Ops teams identify what matters most and what requires their most immediate attention. 

Think of it as a guide navigating you through the troubleshooting process instead of having to rely on intuition or searching for needles in a haystack. 

ADM: Can you explain the core technology and features of your platform?

Jain: At the heart of the OpsClarity platform is the Operational Knowledge Graph (OKG), a powerful engine with the accumulated knowledge of the most experienced operations engineers. OKG speeds the configuration and analysis of your software environment by understanding and constantly learning operational data models, application and service topologies, critical metrics and events for each service, and failure patterns and propagation. The OKG is leveraged by all features of the platform, including:

Application service map: All services on every host are automatically discovered and clustered to create a logical application topology view. Connections between service clusters show relationships and dependencies, giving every team member a real-time, architect-level view of the entire application hierarchy and topology.

Unified visualization: A host-oriented view provides a powerful, real-time visualization of your hosts and the services they are running, all with a health status overlay. Choose to group hosts using built-in tags like “service type” or custom tags like “application name,” and you can quickly understand how each host powering each function is performing — at a glance. 

Timeline and event replay: Quickly drill into an interactive timeline to understand when and where a specific failure occurred. Use the DVR-like feature to help you better understand the root cause of the failure, how it propagated from one service to another, and when it started to affect service health.

Powerful event correlation: Data-science-driven event correlation provides a real-time running log of the most important issues requiring attention. Events are generated from automatic anomaly detection – that learns system behavior without static thresholds – as well as traditional availability checks and metric threshold violations. You can then filter by importance, type, and scope to find and diagnose problems, fast. 

Drill-down dashboards: Service-level dashboards present fine-grained metrics for each service alongside system metrics of the underlying hosts. Drill down to get clear confirmation of any problem you’re troubleshooting, as well as detailed health data that your application operations team can use every day.
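
To make the idea of anomaly detection "without static thresholds" concrete, here is a minimal Python sketch. It is not OpsClarity's algorithm, which the interview does not describe; it assumes a simple rolling mean and standard deviation as the learned baseline, and the names `RollingBaseline`, `detect_anomalies`, `window`, `z_threshold`, and `min_samples` are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Flags points that deviate from a baseline learned from recent history.

    Illustrative only: a rolling z-score stands in for whatever model a real
    monitoring product would use.
    """

    def __init__(self, window=30, z_threshold=3.0, min_samples=10):
        self.history = deque(maxlen=window)  # most recent "normal" values
        self.z_threshold = z_threshold       # deviations (in sigmas) that count as anomalous
        self.min_samples = min_samples       # points needed before judging anything

    def observe(self, value):
        """Return True if value is anomalous relative to the learned baseline."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.z_threshold
            else:
                # A perfectly flat history: any change at all is a deviation.
                anomalous = value != mu
        # Only fold normal points into the baseline so a spike does not skew it.
        if not anomalous:
            self.history.append(value)
        return anomalous


def detect_anomalies(series, **kwargs):
    """Return the indices of anomalous points in a metric series."""
    baseline = RollingBaseline(**kwargs)
    return [i for i, value in enumerate(series) if baseline.observe(value)]
```

Because the check is relative to each metric's own recent behavior, the same code works for a latency measured in milliseconds and a queue depth measured in thousands, with no per-metric threshold to configure. That is the property the feature description points at, though a production system would presumably also account for trends and seasonality.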

ADM: What is the single most important advantage OpsClarity delivers to today’s DevOps teams?

Jain: The OpsClarity platform helps Ops teams understand everything happening in their environment - instantly. It provides the most complete view and the fastest path to troubleshooting. 

To do that, we use a data science approach coupled with advanced layered visualizations to guide DevOps teams to focus on issues that really need their attention, rather than having to sift through multiple dashboards, each with tens or hundreds of graphs and charts.

This ability to get a complete overview of overall system health, combined with contextual information from drill-down dashboards, timelines and alert logs, gives DevOps teams a radically different approach to troubleshooting.

ADM: What do you do differently than the other players in your space?

Jain: In this new world, applications and their underlying infrastructure have become more complex and distributed, but the monitoring and troubleshooting tools available to operations teams still depend on manual analysis of thousands of metrics across hundreds of graphs displayed on crowded dashboards. 

Our data-science-driven approach makes this whole process more efficient by using sophisticated algorithms to automatically detect application architecture and system relationships, apply built-in domain knowledge about modern application stacks and relevant service metrics, and understand and visualize real-time operational data in a powerful and intuitive way.
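
As a rough illustration of what automatically detecting application architecture can involve, the Python sketch below groups hosts into service clusters and rolls host-to-host connections up into service-level edges. It is an assumption-laden toy, not OpsClarity's method: the function `build_service_map` and its inputs (`host_services`, a host-to-service mapping, and `connections`, a list of observed host pairs) are hypothetical.

```python
from collections import defaultdict


def build_service_map(host_services, connections):
    """Cluster hosts by service and derive service-level dependency edges.

    host_services: dict of host name -> service name (hypothetical input)
    connections:   iterable of (source_host, destination_host) pairs
                   (hypothetical input, e.g. from observed network traffic)
    """
    # Group hosts running the same service into one logical cluster.
    clusters = defaultdict(set)
    for host, service in host_services.items():
        clusters[service].add(host)

    # Count host-level connections per (source service, destination service).
    edges = defaultdict(int)
    for src, dst in connections:
        src_service = host_services.get(src)
        dst_service = host_services.get(dst)
        # Skip unknown hosts and traffic that stays inside one cluster.
        if src_service is None or dst_service is None or src_service == dst_service:
            continue
        edges[(src_service, dst_service)] += 1

    return dict(clusters), dict(edges)
```

Given two web hosts that both call one API host, which in turn calls a database host, this yields three clusters and two edges (web to API, API to database), which is the kind of logical topology a service map would draw.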

ADM: What is your funding to date?  Who led it?

Jain: OpsClarity has raised $11M in Series A funding, led by NEA, one of the world’s largest and most active venture capital firms. Additional top-tier investors include Pinnacle Ventures, AME Cloud, Morado Venture Partners and other well-known angels.

ADM: What are your plans for the future?

Jain: Our first solutions are focused on automating operational monitoring and troubleshooting of web-scale applications, but our platform can be applied to many more aspects of modern software operations. As we expand the use cases for our platform, we will continue to look for ways to apply our data-science approach and advanced visualizations to other areas, including deployment monitoring, capacity planning, and correlation with business events and logs.

Read More http://www.opsclarity.com/...


About the author: STUART PARKERSON, Publisher Emeritus

Stuart Parkerson has an extensive background in niche technology publishing.
