Posted Tuesday, October 20, 2015 by Apurva Dave
READ MORE: http://www.jut.io/...
All Hail Point-and-Click Apps!
A few years ago consumerization of everything was the rage. The belief was that enterprises would become more productive by rethinking every one of their processes and applying consumer principles to them. Setting up a new datacenter might be like scheduling your laundry service via your phone.
Not surprisingly, data analytics were swept up in that trend as well. It just so happened that the big data trend coincided with the consumerization trend. So … big data should just be a string of point-and-clicks that magically get you to the answer, regardless of who you are. CEO, product manager, production manager, IT operations - you’re a few simple clicks from processing terabytes of data and having the correct answer pop on to the screen of your phone.
All kidding aside, the consumerization movement has dramatically improved the accessibility of data. A focus on the workflows of users vs the data itself has helped give individuals more access to relevant data than they’ve ever had before. It’s a step in the direction of turning your organization into a data-driven business. All good, right?
Well, here’s the flip side: In order to simplify the complex task of correlating and calculating insights, many applications need to make fundamental assumptions about your data and what’s important to your business. What you typically see is that:
1. Products have strict requirements about what data is ingested and how it should be structured, regardless of whether that fits your ideal data structure.
2. The application layers underlying assumptions onto your data that your data consumers might not even know about.
There’s a way around these problems. It sounds contrarian, but hear the case out: Analytics should be treated as code.
Not only can those analytics be more powerful and do a better job of representing your data accurately, code can also make your data more accessible throughout your organization than a typical application can.
The requirements of analytics as code
Despite the consumerization movement, code doesn’t need to be a bad word. A good model for analytics as code is what happened in server infrastructure and management. What used to be managed by a slew of vendor-specific applications became “infrastructure as code” – lightweight scripts that could quickly and easily deploy new software in the cloud or in data centers.
Instead of depending on a specific vendor app to know exactly how you’d like to deploy, how you’d like to customize your deployment, and even what third-party software you’d like to install, you just write a simple script that manages deployment. That script is reusable by others (for the same or different use cases), it’s versioned, and it’s more reliable than having a bunch of knowledge locked up in a set of siloed management apps.
There are three core requirements for this shift that aren't obvious from the phrase "analytics as code" alone, so let's lay them out here:
1. Analytics as code must be high-level, declarative code. Think of SQL, but designed for streaming data. It's easy to describe what you want, not how the underlying systems implement the query.
2. The code must cover everything from getting the data, to analyzing the data, to visualizing the output. There is no need to implement separate libraries, frameworks, or apps for any part of the analytics puzzle.
3. You must be able to hide the code, while giving non-technical users a way to interact, filter, and transform the data.
Analytics as code: a quick example
Analytics as code doesn’t have to mean low-level, functional MapReduce projects written in Java. In fact, the way to make code more accessible to data developers is to build it around a domain-specific language designed for big data. Then, much like infrastructure as code, analytics as code can be defined as lightweight scripting that quickly gets any developer, DevOps engineer, or analyst to an answer. For example, the basic structure of an analytics query “script” looks like:
read data | analyze | visualize
This structure is what Jut uses in its language, Juttle, for streaming analytics processing. A real-world query looks like this:
read -last :15 minutes: metric_name="response_time"
| reduce -every :10 seconds: maximum = max(value) by host
// graph max response time per host over the last 15 minutes, in 10-second intervals
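For readers who don't know Juttle, the reduce step in the query above can be approximated in plain Python. This is only a sketch of the idea, not how Juttle executes the query: the function name `max_per_host`, the `(timestamp, host, value)` tuple layout, and the sample points are all assumptions made for illustration.

```python
from collections import defaultdict

def max_per_host(points, every=10):
    """Group (timestamp_seconds, host, value) points into fixed windows
    of `every` seconds and keep the maximum value per (window, host)."""
    buckets = defaultdict(lambda: float("-inf"))
    for ts, host, value in points:
        # Align each timestamp to the start of its window.
        window_start = int(ts // every) * every
        key = (window_start, host)
        buckets[key] = max(buckets[key], value)
    return dict(sorted(buckets.items()))

# Hypothetical response_time samples: (seconds, host, milliseconds)
sample = [
    (0, "web1", 120), (3, "web1", 180), (7, "web2", 90),
    (11, "web1", 150), (14, "web2", 210), (19, "web2", 95),
]

result = max_per_host(sample)
for (window_start, host), maximum in result.items():
    print(window_start, host, maximum)
```

The declarative query says only what is wanted (the maximum per host every 10 seconds); the bucketing loop above is the kind of detail a high-level language keeps out of sight.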
So why analytics as code?
Code-driven analytics allows you to open the hood and see the assumptions being used to create the data. Code allows you to tap into all of the data you have stored in the system, without regard to what someone else’s pre-defined system says you “should” be able to do. This gives you the power to do things like:
- Try and correlate two distinct data streams that your vendor didn’t think were related
- Convert one data type (like event messages) into another (metrics)
- Easily "join" external datasets in ways that make sense for your analysis
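Two of the items above, converting events into metrics and joining an external dataset, can be sketched in a few lines of Python. Everything here is hypothetical and chosen for illustration: the event tuples, the `error_count` metric name, the `host_region` lookup table, and both function names.

```python
from collections import Counter

def events_to_metric(events, every=60):
    """Convert (timestamp, host, message) events into an error-count
    metric per `every`-second window and host."""
    counts = Counter()
    for ts, host, message in events:
        if "error" in message.lower():
            counts[(int(ts // every) * every, host)] += 1
    return [
        {"time": t, "host": h, "name": "error_count", "value": n}
        for (t, h), n in sorted(counts.items())
    ]

def join_region(metrics, lookup):
    """Enrich each metric point with a field from an external dataset."""
    return [dict(m, region=lookup.get(m["host"], "unknown")) for m in metrics]

# Hypothetical log events and an external host-to-region table.
events = [
    (5, "web1", "ERROR timeout"),
    (42, "web2", "ok"),
    (61, "web1", "error: disk full"),
    (65, "web2", "Error 500"),
]
host_region = {"web1": "us-east", "web2": "eu-west"}

joined = join_region(events_to_metric(events), host_region)
for point in joined:
    print(point)
```

Neither step depends on a vendor having anticipated that log lines and a region table belong together; the relationship is stated in the code itself.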
At a high level, analytics as code excels because:
Code is more powerful.
Apps have pre-defined analytics that make it easy for you to get to an answer, but is it the right answer? Analytics as code allows you to define how you want to look at the data. Should you be looking at raw data? The average? 90th percentile? Or the rate of change?
Should you be simultaneously graphing the data and looking at the results in a table? Treating analytics as code makes this all customizable in a way that makes the data more relevant to your business and to your users.
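To see why the choice of aggregation matters, here is a small Python sketch comparing the average, the 90th percentile, and the maximum of the same data. The latency values are made up, and `percentile` is a simple nearest-rank helper written for this example rather than any particular product's definition.

```python
import math
from statistics import mean

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value that has at least
    pct percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds, with one slow outlier.
latencies_ms = [100, 105, 98, 110, 102, 97, 101, 99, 103, 950]

avg = mean(latencies_ms)
p90 = percentile(latencies_ms, 90)
worst = max(latencies_ms)

print("average:", avg)
print("90th percentile:", p90)
print("maximum:", worst)
```

A single slow request pulls the average well above what most requests experienced, while the 90th percentile stays close to the typical value. An app that silently picks one of these for you has already made a decision about your data.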
Code is reusable.
A set of analytics like the ones above can easily be cloned, versioned and reused by anyone on your team. For example, here’s a set of Juttle analytics code that calculates the rate of change for time-series data. https://gist.github.com/welch/8ace31194b91a9a86a8a
Naturally you could fork this code on Github as needed. But what’s more powerful is to be able to apply programming-like philosophies to your analytics. You could write your own analytics but leverage someone else’s code like this:
import "https://gist.github.com/welch/8ace31194b91a9a86a8a" as trend
read -last :15 minutes: "your data"
| reduce -every :2 seconds: trendline = trend (value)
There you go - you just created a trend analysis by reusing someone else’s code. Note that this idea isn’t anything new - it’s just doing what coders do all the time anyway.
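The same reuse pattern looks familiar in any general-purpose language. The Python sketch below is not the contents of the gist linked above; it is a generic rate-of-change function, with hypothetical names and sample data, written once and applied to two unrelated series.

```python
def rate_of_change(points):
    """Return (timestamp, delta_value / delta_time) for each pair of
    consecutive (timestamp, value) points."""
    rates = []
    prev = None
    for ts, value in points:
        # Skip the first point and any repeated timestamp.
        if prev is not None and ts != prev[0]:
            rates.append((ts, (value - prev[1]) / (ts - prev[0])))
        prev = (ts, value)
    return rates

# Two hypothetical series that reuse the same analytics function.
cpu_percent = [(0, 10.0), (2, 14.0), (4, 13.0)]
request_count = [(0, 100), (10, 150)]

cpu_trend = rate_of_change(cpu_percent)
request_trend = rate_of_change(request_count)

print(cpu_trend)
print(request_trend)
```

In a real project the function would live in a shared, versioned module that teammates import, much as the Juttle example imports `trend`.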
Code can make data more accessible, not less
Code is generally associated only with developers. Even if code is powerful, that doesn’t sound very accessible. But frankly, some off-the-shelf application that has specific, opinionated assumptions built into its framework is also not accessible - it’s just that data consumers might not realize that their data is being filtered, aggregated or somewhat limited before they even touch it.
It might be counter-intuitive, but high-level code can make data more accessible.
1. Coding makes data more accessible to data developers. Developers, DevOps engineers, data scientists, and business analysts can all take a code-like approach. Not all of them will be coding MapReduce jobs, but if analytics can instead be accessed at a high level - like the scripting approach above - it’s accessible to anyone who works with something like SQL or is generally familiar with scripting. It’s in fact a preferred choice for many people who like to roll up their sleeves and get dirty with data.
2. The result of coding is more accessible to data consumers. Data developers can build - quickly and iteratively - analytics and lightweight applications that are custom-tailored to a use case or an environment. There will still be assumptions built into the data - but now they’re made in a way that makes sense for your business and your users. The business user will still interact with data visualizations and use controls to slice and filter the data. But now, when users want to see the result differently - or even peer into why and how a program works the way it does - high-level analytics code can be shared and quickly adapted or changed. It's a huge step forward from heavyweight functional code that requires multiple developers and long development cycles.
For organizations that are serious about data, perhaps it's time to rethink those off-the-shelf applications in favor of analytics as code.