Artificial Intelligence

Machine learning expands with new ArangoML Pipeline Cloud

Friday, January 31, 2020

ArangoML Pipeline Cloud on ArangoDB Oasis provides a fully-hosted, fully-managed common metadata layer for Machine Learning pipelines

ArangoDB announced the release of ArangoML Pipeline Cloud, a fully-hosted, fully-managed common metadata layer for production-grade data science and Machine Learning (ML) platforms. ArangoML Pipeline Cloud runs on ArangoDB Oasis, ArangoDB’s recently released cloud service, and is the latest offering in ArangoDB’s ML extension, ArangoML.

ArangoML Pipeline Cloud serves two groups: data scientists, who are concerned with the quality of the data, feature training, and model results, and DevOps teams, who need to manage which datasets and deployments are in use, how they perform, and how they are being deployed. It centralizes the metadata produced across the ML pipeline and provides a common interface that shows the relationships among the data, features, and model training results, as well as the deployment, management, and serving logistics. The ArangoML Pipeline solution is pipeline-agnostic, allowing any combination of pipeline components to be connected. As a cloud-based service, it can also be up and running in just a few clicks.

“Common metadata is an often overlooked aspect when building production-grade ML pipelines, but is equally as important as good training data,” said Jörg Schad, Head of Engineering and Machine Learning at ArangoDB. “It is not only crucial for DataOps teams when looking for reproducible builds, audit trails, or compliance with privacy regulations, but extremely valuable for data scientists as well -- allowing them to easily grasp the lineage of models, what artifacts are involved, and also enabling performance comparisons across different models and approaches.”

As a multi-model database, ArangoDB can easily accommodate and unite unstructured, highly interlinked data, such as inference and model descriptions, and allows the relationships between them to be stored as a graph that the DevOps engineer can manage and the data scientist can use at the same time.
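
To make the idea of pipeline metadata stored as a graph more concrete, here is a minimal sketch in plain Python. It does not use the ArangoML Pipeline API or ArangoDB itself; the `MetadataGraph` class, the node keys, and the relation names (`serves`, `trained_on`, `derived_from`) are all hypothetical and chosen only for illustration. It shows how linking a deployment to its model, feature set, and dataset lets a reader trace lineage with a simple traversal.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Toy in-memory metadata store (hypothetical, not the ArangoML API)."""

    def __init__(self):
        self.nodes = {}                 # key -> metadata document
        self.edges = defaultdict(list)  # source key -> [(relation, target key)]

    def add_node(self, key, kind, **attrs):
        # Each node is a small document, e.g. a dataset or a model description.
        self.nodes[key] = {"kind": kind, **attrs}

    def add_edge(self, source, relation, target):
        # Only allow edges between nodes that are already registered.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be registered before linking")
        self.edges[source].append((relation, target))

    def lineage(self, key):
        """Return all upstream artifacts reachable from `key`, breadth-first."""
        visited = {key}
        order = []
        queue = deque([key])
        while queue:
            current = queue.popleft()
            for _relation, target in self.edges.get(current, []):
                if target not in visited:
                    visited.add(target)
                    order.append(target)
                    queue.append(target)
        return order


# Illustrative metadata for one pipeline run (all names are made up).
graph = MetadataGraph()
graph.add_node("dataset/customers-2020-01", "dataset", rows=10000)
graph.add_node("featureset/churn-features", "featureset", columns=12)
graph.add_node("model/churn-v1", "model", algorithm="logistic regression")
graph.add_node("deployment/churn-v1", "deployment", environment="staging")

graph.add_edge("featureset/churn-features", "derived_from", "dataset/customers-2020-01")
graph.add_edge("model/churn-v1", "trained_on", "featureset/churn-features")
graph.add_edge("deployment/churn-v1", "serves", "model/churn-v1")

# Trace what a deployment depends on, nearest artifact first.
for artifact in graph.lineage("deployment/churn-v1"):
    print(artifact, graph.nodes[artifact]["kind"])
```

In this sketch, a DevOps engineer would register and link the deployment, while a data scientist could call `lineage` on the same graph to see which model, feature set, and dataset stand behind it. A graph database performs the same kind of traversal with a query rather than hand-written code.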