Data science and the currency of the future
Richard Harris in Big Data, Wednesday, February 6, 2019
Data science, or the ability to manage large amounts of data, is a skill set that is still evolving as our data consumption grows year by year and we come to understand the ramifications of mass quantities of data. But it's not just about the data; it's about the data quality, if you want to land on your feet.
A recent study performed by IDC and Seagate forecasted that by 2025, global data will grow to 163 zettabytes. For scale, that’s approximately 1 trillion gigabytes per zettabyte. As the amount of available data grows, businesses are clamoring for larger and larger pieces of the pie. However, when it comes to leveraging data, success doesn’t rest on the quantity you have access to, but the quality of insights you can pull from it.
Raghu Chakravarthi, SVP of R&D and Support Services at Actian, understands that the key to digital transformation lies in data quality, and in a recent conversation with ADM he answered several questions about what businesses should be doing to boost it. The topics included implementing an operational data warehouse (using real-time analytics to make decisions in the moment), getting comfortable with data at the edge (using edge computing to change the way we do business globally), and ensuring data is always secure (maintaining the safety of hybrid data in new terrain).
ADM: Data scientists say that data is the new currency. Do you agree? Is there really that much data being generated every day?
Chakravarthi: Data is fueling the new economy and analytics is the engine behind the change. Google, Facebook, and the like have been successful in offering free services for consumers. Meanwhile, these free services are supported by the data produced by these consumers, hence making the consumers the product. So, really, data is the new currency and analytics is the engine behind creating new revenue streams. With the proliferation of devices everywhere, there is a high volume of data generated by machine logs, smart sensors, and related technology, and it’s getting to be unmanageable. With the number of smart sensors in my home (from CO detectors and security cameras to the solar panel generation sensors), I’m able to produce 1 GB of data a day. The key factor to consider here is the quality of the data produced and how to correlate the data from various sources to make sense of it.
ADM: Why is data quality important? Why is it important to have fresh data and use real-time analytics?
Chakravarthi: Data analysts and scientists spend two-thirds of their time creating quality data that can be consumed downstream, whether as dashboards or through model development. That is an enormous waste of time for high-priced individuals and kills agility in producing insights. The majority of the time, bad data is not caught and results in incorrect decisions.
Fresh data is key because, in this fast-paced world of innovation, yesterday’s data does not reflect what is happening now, nor does it give insight into the future. Real-time data feeds are the focus of the next generation of targeting for use cases such as hyper-personalization.
ADM: How can businesses improve their data quality? Why is it important to run data quality checks?
Chakravarthi: Collecting metadata about data, mining real-time data with anomaly detection techniques to find the outliers, and applying machine learning to cleanse data are the ways to improve data quality. Gone are the days of an ETL process performing the cleansing. Another important technique is to track data lineage to understand the source of the data and categorize the accuracy of the sources and the transformations that were performed on the data originating from that source.
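Editor's illustration: the outlier check described above can be sketched in a few lines of Python. This is only a minimal example of one common technique (a modified z-score based on the median absolute deviation), not the method Chakravarthi or Actian uses. The function name `flag_outliers`, the threshold of 3.5, and the sample temperature readings are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(readings, threshold=3.5):
    """Return the indices of readings that look like outliers.

    Uses the modified z-score, 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation. The 3.5 cutoff is a commonly used
    default, chosen here as an assumption.
    """
    med = median(readings)
    abs_dev = [abs(x - med) for x in readings]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; fall back to flagging anything that differs from it.
        return [i for i, x in enumerate(readings) if x != med]
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical home temperature sensor feed with one bad reading.
readings = [21.0, 21.4, 20.9, 21.2, 85.0, 21.1, 21.3]
print(flag_outliers(readings))  # index of the 85.0 reading
```

A median-based score is used here instead of a mean and standard deviation because a single extreme value distorts the mean, which can hide the very outlier the check is meant to catch.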
ADM: How can implementing an operational data warehouse (ODW) centered on using real-time analytics help businesses sort through the influx of data and provide actionable insights? How is an ODW solution different from a traditional Hadoop-based data warehouse solution that we find in the market today?
Chakravarthi: As I mentioned earlier, data quality is the key to producing accurate insights. The core functionality that an operational data warehouse provides is updating incorrect data that was sourced and pushing the fixes back to the source systems. This functionality really helps the productivity of the analyst or the data scientist. The current traditional data warehouses are notoriously slow or incapable of providing this data cleansing functionality. This is a key difference that will make the Operational Data Warehouse the preferred choice for business analysts and data scientists in the near future.
ADM: How does edge computing factor in?
Chakravarthi: Edge computing is the holy grail in analytics. For analytics to truly scale for every human, performing complex computation at the edge is essential. Edge computing relies on a secure, fast and reliable connection to the aggregation points to function. With the proliferation of various hacking methods, such as the recent SQLite bug that could lead to remote code execution, there has never been a greater need to have a secure computing platform at the edge. Another factor is that, given the limitations in connectivity and bandwidth, the edge computing platform should provide a standardized interface to perform data cleansing and analytics at the edge.
ADM: How important is data security when working with fresh, quality data?
Chakravarthi: Data security is being scrutinized by government agencies and has led to heavy penalties for violators. This is the result of the consumer demand for privacy. Real-time use cases such as hyper-personalization rely on fresh, recent data to drive a different consumer experience that the new world demands. In implementing use cases such as hyper-personalization, one must really secure the data that the consumer is producing because, due to the time- or location-specific behavioral data, it can cause a lot of damage if it gets into the wrong hands.
ADM: Digital transformation seems to be the topic everybody is talking about right now. How does fresh, quality data factor into making this successful?
Chakravarthi: Digital transformation is making every business unit rethink every business process it executes to be data-driven. New revenue streams are being mandated to replace disappearing existing ones. To enable this, every business is transforming itself into a data-generating business, and they are using the generated data to figure out the behavior of their users and consumers. The behavioral analytics assumes that the data is fresh and new and also of decent quality so that it can predict with a certain level of accuracy the next action to be taken by the consumer.