Study Shows Companies Struggle with Big Data Management Performance Issues Because of Bad Data
Thursday, June 23, 2016
StreamSets has announced the results of a survey that examined the challenges bad data poses for data management performance. The survey was conducted by Dimensional Research and included responses from 314 data management professionals globally.
The primary research goal was to capture how companies manage the flow of big data. The research also investigated and documented current tools’ capabilities, data quality and efforts to maintain big data pipelines and infrastructure.
The survey revealed pervasive data pollution, with companies facing a range of data flow performance management challenges, from stopping bad data to keeping data flows operating effectively. Ninety percent of respondents report flowing bad data into their data stores, while just 12 percent consider themselves good at the key aspects of data flow performance management.
Among the findings:
- Ensuring data quality is the most common challenge faced when managing big data flows (68 percent).
- 74 percent of organizations reported currently having bad data in their stores, despite cleansing data throughout the data lifecycle.
- While 69 percent of organizations consider the ability to detect diverging data values in flow as "valuable" or "very valuable," only 34 percent rated themselves as "good" or "excellent" at detecting those changes.
- 12 percent of respondents rate themselves as "good" or "excellent" across five performance management areas, namely detecting the following events: pipeline down, throughput degradation, error rate increases, data value divergence and personally identifiable information (PII) violations.
- Respondents felt weakest at detecting performance degradation (44%), error rate increases (44%) and divergent data (34%).
- Detecting a "pipeline down" event was the only metric where a large majority felt positively about their capabilities (66%).
- On pipeline changes driven by data drift, 85 percent said that unexpected changes to data structure or semantics create a substantial operational impact. Over half (53%) reported that they have to alter each data flow pipeline several times a month, with 23% making changes several times a week or more.
- Nearly two-thirds of respondents use ETL/data integration tools and 77 percent use hand coding to design their data pipelines.
Read more: https://19ttqs47cfw33zkecq3dz58m-wpengine.netdna-s...