Data Catalog is now available in public beta
Richard Harris in Big Data, Tuesday, July 2, 2019
At Google Cloud Next '19 in San Francisco, Google introduced Data Catalog, a fully managed data discovery and metadata management service that lets you quickly discover, manage, and understand your data in Google Cloud. We're announcing that Data Catalog is now available in public beta.
Simple and powerful data discovery
Data analysts can now use Data Catalog to search for tables in Google BigQuery or topics in Cloud Pub/Sub across all the cloud projects they can access. Data Catalog uses the same search technology that supports Gmail and Google Drive, so analysts can quickly find data by table name, column name, or business metadata in tags, and narrow the results with filters. Integration with the access controls defined in Cloud Identity & Access Management (IAM) means that a search returns only data you already have access to, which reduces the need to configure additional permissions within Data Catalog.
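To make the idea of access-controlled search concrete, here is a toy in-memory sketch. It is not the Data Catalog API: the `Entry` class and the `search` function are hypothetical names chosen for illustration, and the sketch assumes access is granted per project, which is a simplification of how IAM works.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A searchable data asset (hypothetical model, not the Data Catalog API)."""
    project: str
    name: str
    columns: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)


def search(entries, term, accessible_projects):
    """Return entries matching `term` that live in projects the caller can read."""
    term = term.lower()
    results = []
    for entry in entries:
        # Access check first: entries outside the caller's projects are never returned.
        if entry.project not in accessible_projects:
            continue
        # Match against the table name, column names, tag keys, and tag values.
        haystack = [entry.name, *entry.columns, *entry.tags.keys(),
                    *(str(v) for v in entry.tags.values())]
        if any(term in text.lower() for text in haystack):
            results.append(entry)
    return results


catalog = [
    Entry("sales-prod", "orders", ["order_id", "customer_email"], {"category": "FINANCE"}),
    Entry("hr-prod", "employees", ["employee_id", "customer_email"], {"category": "HR"}),
    Entry("sales-prod", "refunds", ["refund_id"], {"category": "FINANCE"}),
]

# The caller can read only the sales project, so the HR table is filtered out
# even though it also has a customer_email column.
hits = search(catalog, "customer_email", accessible_projects={"sales-prod"})
hit_names = [e.name for e in hits]
```

Filtering on access before matching keeps inaccessible assets out of the result set entirely, which is the property the IAM integration is described as providing.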
“Data Catalog gives us the flexibility we need in metadata management,” says Crystal Widjaja, SVP, Business Intelligence & Growth at Go-Jek. “Integration with Cloud Identity and Access Management (IAM) means that data discovery is ACL-ed through the Data Catalog search index, giving us peace of mind.”
Understand your data with schematized business metadata
Data Catalog allows data stewards to tag data assets with metadata and then search across those tags. You define business metadata using tag templates and apply them to various data assets. Data Catalog extends the traditional business glossary concept by supporting doubles, booleans, and enumerated types in addition to storing metadata as strings. For example, you can assign a business category to a data asset as an enumerated type drawn from a preset list of categories, which ensures that consistent categories are used when capturing metadata. Data Catalog also provides API options that augment the UI. With the API, you can bulk attach tags as part of a data processing pipeline as soon as a table is created in BigQuery, storing information such as the last ETL update time as a tag.
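The following minimal sketch models how a typed tag template can enforce consistent metadata. It is plain Python, not the Data Catalog client library: `FieldSpec`, `TagTemplate`, `make_tag`, and all field names are hypothetical and exist only to illustrate string, double, boolean, and enumerated fields.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FieldSpec:
    """Type description for one template field (hypothetical model)."""
    kind: str             # "string", "double", "bool", or "enum"
    allowed: tuple = ()   # permitted values, used only when kind == "enum"


class TagTemplate:
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields  # maps field name -> FieldSpec

    def make_tag(self, **values):
        """Validate values against the template and return them as a tag dict."""
        tag = {}
        for key, value in values.items():
            if key not in self.fields:
                raise KeyError(f"unknown field: {key}")
            spec = self.fields[key]
            if spec.kind == "enum":
                ok = value in spec.allowed
            elif spec.kind == "bool":
                ok = isinstance(value, bool)
            elif spec.kind == "double":
                # bool is a subclass of int in Python, so exclude it explicitly.
                ok = isinstance(value, (int, float)) and not isinstance(value, bool)
            else:
                ok = isinstance(value, str)
            if not ok:
                raise ValueError(f"{key}={value!r} is not a valid {spec.kind}")
            tag[key] = value
        return tag


template = TagTemplate("data_governance", {
    "business_category": FieldSpec("enum", ("FINANCE", "MARKETING", "HR")),
    "has_pii": FieldSpec("bool"),
    "quality_score": FieldSpec("double"),
    "last_etl_update": FieldSpec("string"),
})

# A pipeline step could build a tag like this right after loading a table.
tag = template.make_tag(
    business_category="FINANCE",
    has_pii=False,
    quality_score=0.97,
    last_etl_update=datetime.now(timezone.utc).isoformat(),
)
```

A value outside the preset list, such as `business_category="Finance Dept"`, raises `ValueError` in this sketch, which mirrors how an enumerated field keeps categories consistent across assets.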
Automatically detect and classify sensitive data with Cloud Data Loss Prevention (DLP)
In recent years, increased regulatory and compliance requirements have driven companies toward data governance solutions. The Cloud DLP integration enables data governors to create jobs that scan hundreds of tables for sensitive data and attach the results as tags in Data Catalog. This allows you to find tables that contain sensitive data types and classify them with DLP-generated tags across all your data on Google Cloud, giving you a richer set of metadata out of the box and complementing other tagging processes. With DLP, you can also configure periodic scans to keep the tags up to date, which helps you stay compliant as your data changes.
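As a rough illustration of the scan-then-tag flow, the sketch below uses two simple regular expressions as a stand-in for Cloud DLP. It does not call Cloud DLP or Data Catalog: the `DETECTORS` table, the `classify_table` function, the label names, and the sample rows are all assumptions made for this example, and real sensitive-data detection is far more thorough than these patterns.

```python
import re

# Toy detectors standing in for a real inspection service (labels are illustrative).
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN_LIKE": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_table(rows):
    """Map each column name to the sorted list of sensitive types found in its values."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            for info_type, pattern in DETECTORS.items():
                if pattern.search(str(value)):
                    findings.setdefault(column, set()).add(info_type)
    return {column: sorted(types) for column, types in findings.items()}


sample_rows = [
    {"id": 1, "contact": "ana@example.com", "note": "SSN 123-45-6789 on file"},
    {"id": 2, "contact": "no email given", "note": "n/a"},
]

# The result could be attached to the table as tags and refreshed on each periodic scan.
tags = classify_table(sample_rows)
```

Running the same function on a schedule and overwriting the stored tags is one simple way to picture how periodic scans keep classifications current.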
To use Data Catalog, open the GCP Console and click Data Catalog in the left navigation panel. All your BigQuery tables are automatically indexed and searchable, and data stewards can define business tag templates to apply to datasets. To learn more about using Data Catalog for data discovery and metadata management, check out the overview of Data Catalog or the documentation.