DevOps for Big Data with Pepperdata

Wednesday, April 19, 2017

DevOps for Big Data matters more than ever as a way to reduce bottlenecks between development and operations.

Despite huge investments in big data applications, there’s still a bottleneck as developers and operators try to find effective and efficient ways to adjust and correct their big data applications’ code. As a result, companies deploying applications suffer from decreased developer productivity and cluster efficiency, a critical flaw when trying to keep up with today’s big data influx. Enter DevOps for Big Data.

Founded in 2012 by industry veterans Chad Carson and Sean Suchter, Pepperdata is a Big Data startup based in Cupertino, CA. Companies such as Comcast, Philips Wellcentive, and Zillow depend on Pepperdata to manage and improve the performance of Hadoop and Spark.

Enterprise customers use Pepperdata’s products and services to troubleshoot performance problems in production, increase cluster utilization, and enforce policies to support multi-tenancy. Pepperdata also works with customers’ Big Data systems both on-premises and in the cloud.

We recently sat down with Ash Munshi, CEO of Pepperdata, to chat about the initiative and all things DevOps.

ADM: Who are your biggest competitors today?

Our biggest competition is from in-house developed solutions generally cobbled together using tools such as Ambari, Grafana, and Nagios.

ADM: What’s the current state of DevOps?

DevOps for Big Data is very much an evolving story. While tools for development, build, test, and release are shared with classical DevOps, many more tools are specific to Big Data. As a result, there is little standardization, particularly when it comes to integrating performance information into the DevOps chain. With Big Data, performance can mean the difference between business critical and business useless. Pepperdata is focused on bringing performance feedback into the entire DevOps chain for Big Data.

ADM: What are the common misconceptions of DevOps?

Is it process? Is it culture? Is it a tool/software? These are simple questions at the heart of much of the misconception around what DevOps is. Most organizations fall short because they don’t truly understand what DevOps means or how to align their IT organizations to successfully adopt it. The reality is that DevOps is all of the above and more. DevOps is a holistic approach to delivering high quality products and services to customers as efficiently as possible. It requires everyone involved in the process to rethink how they work and collaborate around a common set of goals.

ADM: Why do Big Data applications need DevOps?

DevOps is the modern standard for application development and delivery. DevOps fosters collaboration and communication between developers, quality assurance, and IT operations professionals. DevOps tool chains improve and automate stages and feedback loops within the DevOps cycle of plan, code, build, test, release, deploy, operate, and monitor. DevOps can shorten time to delivery, improve user satisfaction, deliver a better-quality product, improve productivity and efficiency, and better meet user needs by allowing faster experimentation.

DevOps is a part of many successful Big Data environments, even if it’s not always recognized as such today. DevOps-style rapid iteration, feedback, and release cycles are clearly used in many Big Data environments, and companies today are actively recruiting and hiring staff for these roles. DevOps for Big Data uses many of the same tools as traditional DevOps environments, such as source code management, bug tracking, continuous integration, and deployment tools. Some examples of Big Data-specific DevOps tools include Anaconda, Apache Zeppelin, and Jupyter notebooks.

Pepperdata expects there to be increased focus on DevOps for Big Data. New practices, technology, and software will emerge to better support DevOps for Big Data - and Pepperdata will be contributing to that with a focus on performance aspects of DevOps for Big Data.

ADM: Where do you see the future of Big Data (and why should DevOps fit into that)?

Big Data is on an accelerated path to becoming more and more mainstream. At Pepperdata our focus is companies that are running production Big Data environments to drive their business. We work with some of the top companies in the world that have strategically and operationally mastered Big Data.

Over the past 18 months this trend has accelerated, and as Operations teams have become responsible for running Big Data deployments, we see two key trends. First, Big Data must fit into the way IT teams work today, and that means DevOps. DevOps is the modern standard for application development and delivery. Second, as Big Data becomes more integrated into the business processes and operations of a company, there will be an accompanying technical convergence of Big Data architectures and workloads with mainstream data center technology.

Big Data has traditionally added significant complexity to IT environments because it comes with different technologies, different tools, and different skillsets required to support it. Over time, you will have Big Data and traditional workloads running side by side on the same orchestration framework, with a common set of management tools.

Ash Munshi, CEO of Pepperdata

ADM: Given your experience, how has the technology industry shifted in the last few years?

When we started Pepperdata in 2012, it was already clear that the kinds of technology we’d been building and using at online giants like Microsoft and Yahoo were being adopted by other companies and industries. Using large distributed systems to solve problems with huge amounts of data was starting to have real business impacts.

In the years since then, we’ve only seen this trend accelerate. Big Data applications are increasingly moving from R&D projects into production, which means these large distributed systems and the applications that run on them must be efficient and easy to operate. We’ve also seen companies move some or all of their Big Data infrastructure onto the cloud.

Finally, over the past few years we’ve seen an explosion in Spark adoption and the use of Machine Learning.

ADM: What technical innovations have you made with the release of Application Profiler?

Pepperdata Application Profiler is built on Dr. Elephant, an open-source project developed by LinkedIn. We’ve integrated Dr. Elephant’s open-source code into a Software-as-a-Service offering. With Application Profiler, we now bring performance feedback all the way back to developers. Application Profiler analyzes all Hadoop and Spark jobs running on the cluster and provides developers with technical insights on how each job performed. It gives developers actionable recommendations for tuning jobs and lets them validate tuning changes made to applications, with a before-and-after comparison. It also lets operators quickly greenlight new jobs before they move to the production cluster.
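
Editor's note: to make the idea of a before-and-after comparison concrete, here is a minimal sketch in Python. It does not use the Pepperdata or Dr. Elephant APIs. The `JobRun` fields, the `compare_runs` helper, and the sample numbers are all hypothetical, chosen only to illustrate how a tuning change might be validated by comparing metrics from two runs of the same job.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """Hypothetical summary of one run of a Hadoop or Spark job."""
    runtime_s: float        # wall-clock runtime in seconds
    peak_memory_mb: float   # peak memory used across tasks
    spilled_mb: float       # data spilled to disk


def compare_runs(before: JobRun, after: JobRun) -> dict:
    """Return the percent change per metric; negative means the metric dropped."""
    changes = {}
    for name in ("runtime_s", "peak_memory_mb", "spilled_mb"):
        old = getattr(before, name)
        new = getattr(after, name)
        # A percent change is undefined when the baseline metric was 0.
        changes[name] = None if old == 0 else round((new - old) / old * 100, 1)
    return changes


# Illustrative numbers only, not measurements from a real cluster.
before = JobRun(runtime_s=1200, peak_memory_mb=8000, spilled_mb=500)
after = JobRun(runtime_s=900, peak_memory_mb=6000, spilled_mb=0)
report = compare_runs(before, after)
for metric, pct in report.items():
    print(f"{metric}: {pct}%")
```

In a real workflow the two `JobRun` records would come from whatever job history or metrics store the cluster exposes, and an operator could use the resulting report as one input when deciding whether a tuned job is ready for production.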