RAPIDS data access acceleration comes to MapR
Christian Hargrave in Big Data, Friday, October 12, 2018
RAPIDS data access acceleration comes to MapR, giving data scientists faster and easier access to data.
MapR Technologies, Inc. has announced support within the MapR Data Platform for accelerating data access and production deployments for data science through the RAPIDS open-source software. MapR helps data scientists reach the training data they need sooner by easing the on-boarding, cleansing, and cataloging of data and by feeding it at high performance to GPUs and NVIDIA DGX systems. The MapR solution also manages the deployment of multiple models into production to speed business impact.
"The challenge for most data scientists is the data logistics to locate, prep and access the right data for training. In many cases, 90 percent of the time is spent data wrangling," said Anil Gadre, EVP and chief product officer, MapR Technologies. "MapR complements RAPIDS with a data management and logistics fabric to accelerate the high-scale processing and access of disparate data across geographies. The same fabric also speeds the deployment of models into production and coordinates the continuous deployment and updating of multiple models to impact business in real-time at scale."
Central to the solution is the ability to coordinate data flows from across the enterprise and, through a pre-built MapR container for GPUs, integrate them easily into NVIDIA's complete end-to-end data science training pipelines.
The MapR Data Platform for RAPIDS enables data scientists to:
- Collect data at scale from a variety of sources and preserve raw data so that potentially valuable features are not lost
- Make input and output data available to many independent applications even across geographically distant locations, on premises, in the cloud or at the edge
- Manage multiple models during development and roll them into production easily
- Improve evaluation methods for comparing models during development and production, including the use of a reference model for baseline successful performance
- Support rapid stream-based delivery of standard file formats, including Parquet, ORC, JSON, Avro, and CSV, directly into RAPIDS
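The idea of using a reference model as a baseline when comparing models can be illustrated with a small sketch. This is not MapR's or NVIDIA's implementation; the function names, the toy threshold models, and the accuracy metric are all assumptions chosen for illustration. Each candidate is scored on the same held-out data as the reference model and is flagged only if it beats the reference by a chosen margin.

```python
from typing import Callable, Dict, Sequence, Tuple

# A "model" here is just a callable from a feature vector to a class label.
# This type and every name below are hypothetical, for illustration only.
Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, samples, labels) -> float:
    """Fraction of samples for which the model predicts the given label."""
    correct = sum(1 for x, y in zip(samples, labels) if model(x) == y)
    return correct / len(labels)


def compare_to_reference(
    reference: Model,
    candidates: Dict[str, Model],
    samples,
    labels,
    margin: float = 0.0,
) -> Tuple[float, Dict[str, Tuple[float, bool]]]:
    """Score every candidate on the same data as the reference model.

    Returns the reference accuracy and, per candidate, a pair of
    (accuracy, beats_reference), where beats_reference means the
    candidate exceeds the reference accuracy by more than `margin`.
    """
    baseline = accuracy(reference, samples, labels)
    results = {}
    for name, model in candidates.items():
        score = accuracy(model, samples, labels)
        results[name] = (score, score > baseline + margin)
    return baseline, results


def threshold_model(cutoff: float) -> Model:
    """Toy classifier: predict 1 when the first feature exceeds the cutoff."""
    return lambda x: 1 if x[0] > cutoff else 0


# Toy held-out data: one feature per sample, binary labels.
samples = [[0.1], [0.4], [0.6], [0.8]]
labels = [0, 0, 1, 1]

baseline, results = compare_to_reference(
    reference=threshold_model(0.3),
    candidates={
        "candidate_a": threshold_model(0.5),
        "candidate_b": threshold_model(0.7),
    },
    samples=samples,
    labels=labels,
)
```

In this toy run the reference model misclassifies one of the four samples, candidate_a classifies all four correctly, and candidate_b only ties the reference, so only candidate_a is flagged as an improvement. Keeping the reference model fixed gives every comparison, in development or in production, the same yardstick.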
"MapR's work with NVIDIA in the RAPIDS ecosystem is helping make broad adoption in the enterprise easy for the largest breadth of workloads," said Clément Farabet, vice president of AI infrastructure at NVIDIA. "MapR's ability to span on-prem and cloud, from IoT edge to core with a scalable, high-performance common platform means that more data can be fed to GPUs and more innovative applications can be created by data scientists faster."