Big Data

The impact of fast data on the modern application stack

Monday, October 9, 2017

How fast data and data streaming capabilities are becoming increasingly important to the modern application stack.

Lightbend has announced its Fast Data Platform to help operationalize applications built for streaming data use cases. The new distribution aims to support organizations as they design, build and run fast data applications, as new use cases such as real-time decisioning, real-time personalization and IoT shift big data requirements away from their legacy batch/Hadoop roots and toward “data in motion” streaming requirements. Brad Murdoch, Lightbend's VP of corporate development and business development, explains how fast data is disrupting the modern big data application stack and why Lightbend's new distribution matters.

ADM: How is the way that developers work with data different today?

Murdoch: Business requirements are forcing developers to deal with more data, closer to real time. The concept of "fast data" represents that shift. It is driven by several ideas with overlapping definitions (real-time, 'data in motion', streaming data) and covers how everything from framework choices to application design now supports a constant flow of data through the application. What we're seeing is an evolution where applications aren't just being integrated with big data; they are increasingly being built around the data use cases.

"Real-time" of course has different meanings for different use cases. For some industries, real-time merely means cutting data response times from days to hours, while for others it truly means sub-millisecond latency. But the pressure to get faster with data is one that a growing share of developers face in the applications and systems they build.

ADM: Based on your recent survey, what new frameworks and adoption choices is Lightbend seeing emerge in response to these requirements?

Murdoch: There are so many choices out there. On the one hand, it's great to have so many options at every layer of the application stack, and there's a lot of excitement around Kafka, Akka Streams, Spark Streaming, Apache Flink, Apache Beam and the new crop of machine learning libraries. But that range of choice can also make it very difficult for developers to pinpoint the right technology for their use case. For most enterprise fast data use cases, developers need to choose tools by weighing the tradeoffs between latency, data volume, transformation needs and integration requirements.

ADM: There are roughly 10 million JVM language developers. How well or poorly do you think Java EE has met these new requirements?

Murdoch: Java EE, which is now moving to the Eclipse Foundation (so we'll see how it evolves), has focused overwhelmingly on "data at rest" and on accessing external data through JDBC and JPA. We haven't really seen Java EE keep up with the new use cases that are pushing Java developers toward fast data requirements, and in some cases these newer frameworks arose precisely because Java EE lacked capabilities for specific streaming data use cases. For example, Apache Kafka has become extremely popular as a de facto messaging standard precisely because JMS has not evolved. Historically, the JCP hasn't been able to keep up with the evolution of streaming.

ADM: Lightbend advocates that Reactive is a key goal for these data-driven systems. Explain.

Murdoch: It used to be that when you built Hadoop jobs, they ran for a few hours or maybe a day. Now imagine that I have to deploy something that's going to run continuously for days, months or even longer. Now I have to think about inevitable hardware failures and spikes in traffic. That raises the bar for what it means to build services that are extremely resilient, super responsive and highly elastic. In other words, these services need to be Reactive! That hasn't traditionally been a requirement in the big data world, but our friends in the microservices world have been doing it for a long time. So we think one of the ways people can succeed with streaming is to learn the lessons of the Reactive application and microservices world.

ADM: Tell us about Lightbend's Fast Data Platform and what you are trying to enable for developers tackling these use cases.

Murdoch: If you think about the day-to-day world of developers, architects and operations teams, their work really falls into three phases: design, build and operate. Lightbend Fast Data Platform is our distribution that addresses each of those areas.

When you’re thinking about design, you’re thinking about which tools and frameworks you should use, and we’re giving developers recipes, including sample applications, that match common requirements. When you’re thinking about build, you’re thinking about developer productivity, and the Fast Data Platform comes pre-integrated, which saves significant developer time and makes installation trivial. For the operate phase, our distribution also provides the first viable toolset for real DevOps across an entire fast data system, including how to monitor and detect problems when they occur, or prevent them from happening in the first place. And for both developers and ops teams, we offer expert support from a single vendor to quickly resolve blockers and production issues across all your fast data frameworks. That’s a very important part of what we offer with Lightbend Fast Data Platform.

Brad Murdoch, VP of Corporate and Business Development, Lightbend

ADM: Machine learning and AI have captured a ton of interest, but it still appears to be very early days in terms of developers actually creating ML- and AI-based applications. What do you think is the long-term potential, and when will we start to see ML and AI go mainstream?

Murdoch: One of the main reasons for the dearth of real-world ML-based applications is just how hard it has been to build a run-the-business system on ML and AI, and a big reason we built Fast Data Platform is to make this much easier. From real-time fraud detection to personalization to predictive maintenance, machine learning models are at the foundation of the most interesting fast data use cases. We're at a critical juncture with machine learning today: it is evolving out of its data science roots and urgently needs to be operationalized. We are also starting to see some advanced use cases that leverage deep learning, but that is still pretty nascent.

ADM: What happens to all of the Hadoop/batch architectures? Are they being end-of-lifed, or will they live alongside this new crop of fast data frameworks?

Murdoch: It's a misconception that fast data will replace the entire Hadoop ecosystem. For starters, "real-time" means different things to different developers. For many, simply speeding up batch jobs in Hadoop can cut a job from a day to an hour, and that's "fast enough." Not every use case or industry is going to require sub-second response times. And there are many scenarios where the business case will require infrastructure that includes both Hadoop and fast data. So it’s not that we’re going to turn off our Hadoop clusters or stop doing data warehousing; they’re obviously still very important. But increasingly, getting value out of our data as quickly as possible opens up a world of new opportunities.
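The difference between batch processing over "data at rest" and streaming over "data in motion" can be illustrated with a small Java sketch. It is not Lightbend code and does not use any of the frameworks mentioned in this interview; the class and method names (BatchVsStreaming, batchAverage, RunningAverage, onEvent) are hypothetical. The batch method needs the whole dataset to be collected before it can answer, while the streaming version keeps a small piece of state and has an up-to-date answer after every event.

```java
import java.util.List;

public class BatchVsStreaming {

    // Batch style (hypothetical helper): the full dataset must be at rest
    // before anything is computed, and the answer arrives only at the end.
    static double batchAverage(List<Double> readings) {
        double sum = 0.0;
        for (double r : readings) {
            sum += r;
        }
        return readings.isEmpty() ? 0.0 : sum / readings.size();
    }

    // Streaming style (hypothetical class): state is updated as each event
    // arrives, so a current answer is available after every event.
    static final class RunningAverage {
        private long count = 0;
        private double mean = 0.0;

        double onEvent(double reading) {
            count++;
            mean += (reading - mean) / count; // incremental mean update
            return mean;
        }
    }

    public static void main(String[] args) {
        List<Double> readings = List.of(10.0, 20.0, 30.0, 40.0);

        // One answer, computed after all readings have been collected.
        System.out.println("batch: " + batchAverage(readings));

        // An answer after every reading, as the data flows in.
        RunningAverage live = new RunningAverage();
        for (double r : readings) {
            System.out.println("after " + r + ": " + live.onEvent(r));
        }
    }
}
```

Running main prints the batch result once and the running average after each reading; with this input both end at the same value. Real streaming frameworks add the hard parts the interview alludes to, such as fault tolerance, state recovery and elastic scaling, which this sketch deliberately leaves out.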

About Brad Murdoch

Brad brings more than 25 years of enterprise experience to Lightbend. He has held a number of executive roles in business development, marketing, strategy and operations at early- and growth-stage companies in the professional open source, cybersecurity and enterprise mobility markets, including JBoss (acquired by Red Hat), Prevoty, Framehawk (acquired by Citrix) and Nukona (acquired by Symantec). Brad holds an honors degree in Computer Science from the University of Glasgow.