Boost Data Quality and Trust

Learning Objectives

After completing this unit, you’ll be able to:

Describe how to bring metadata into the catalog in the Metadata Command Center.
List Metadata Command Center use cases.
Distinguish between business and technical assets.
Explain how to monitor data quality in Cloud Data Governance and Catalog.

Metadata Command Center

In the modern enterprise, data is the lifeblood of business. Yet without governance, an organization’s data is a massive, stadium-sized junk drawer. To turn this chaos into a competitive advantage, organizations rely on Cloud Data Governance and Catalog (CDGC).

But how does CDGC know what’s inside the junk drawer? To use this environment, you must understand the machinery that powers it, the language it speaks, and the health monitors that keep it running.

Enter the Metadata Command Center (MCC). If CDGC is the clean storefront that Maria, the marketing manager, uses to search for data, MCC is the administrative hub where Mateo, the data architect, configures scanners. Think of these scanners as automated sorting machines that Mateo sends into the various corners of the junk drawer, whether those corners are in the cloud (AWS S3, Azure Data Lake), in traditional databases (Oracle, SQL Server), or in business intelligence (BI) tools (Tableau, Power BI).

The MCC page where you configure a scanner.

Alpine Group just moved its customer loyalty data to the cloud. Mateo is excited to use MCC, because that means he doesn’t have to manually document 500 new tables. To get started in MCC, Mateo selects the Snowflake Scanner and provides his credentials.

Then MCC scans the source and automatically performs the following tasks.

Extracts metadata: It pulls the data about the data—table names, column types, and creation dates.
Profiles the data: It verifies that a column named Phone actually contains phone numbers.
Discovers lineage: It identifies that the Customer_Gold table in Snowflake originated from a CSV file in an S3 bucket.
Classifies data: It uses AI to examine the metadata and automatically tag it, so the system can classify data without manual rules.
Discovers relationships: It automatically uncovers hidden links between tables and columns by analyzing their actual values. Instead of guessing how datasets fit together, Maria gets instant suggestions on how to join them.
Associates glossaries: It automatically links business terms from the glossary to Maria’s columns and tables, which helps assign user-friendly names to technical data without any manual effort.

Once the scan is finished, this digital DNA is published into the data catalog, making it searchable for everyone.
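To make the profiling step more concrete, here is a minimal Python sketch of the kind of check involved in verifying that a column named Phone actually contains phone numbers. It is an illustration only, not how MCC is implemented: the `PHONE_PATTERN` regular expression, the `phone_match_ratio` function, and the sample values are all hypothetical.

```python
import re

# Hypothetical pattern (an assumption for this sketch): an optional leading "+"
# followed by 7 to 15 digits, checked after common separators are stripped.
PHONE_PATTERN = re.compile(r"^\+?\d{7,15}$")


def phone_match_ratio(values):
    """Return the fraction of non-empty values that look like phone numbers."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return 0.0
    matches = 0
    for value in non_empty:
        # Remove spaces, hyphens, parentheses, and dots before matching.
        normalized = re.sub(r"[\s\-().]", "", value)
        if PHONE_PATTERN.match(normalized):
            matches += 1
    return matches / len(non_empty)


# Made-up sample values standing in for a scanned Phone column.
sample = ["+1 (415) 555-0100", "415-555-0101", "n/a", "", "4155550102"]
ratio = phone_match_ratio(sample)
print(f"{ratio:.0%} of non-empty values look like phone numbers")
```

A high ratio suggests the column name matches its contents. A low ratio is a hint that the column is mislabeled or holds poor-quality data.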

Additional MCC Features

MCC includes a wide range of features beyond catalog source scanning, including the following.

Lookup tables: Upload and manage lists of known values, such as lists of countries or codes, to help the AI accurately identify data and match it to the right business terms.
Job monitoring: Check the status of background tasks to determine whether your data is being successfully scanned and updated.
Lineage management: Connect different data sources to create a visual map that shows exactly where your data comes from and where it goes.
Access control: Set up permissions and roles to make sure only the right people can view or change specific information.
Workflow management: Build and track approval processes to ensure the right experts review and approve changes.
Asset groups: Group related assets together to simplify management and control access permissions collectively in MCC.
Custom attributes: Create additional properties that describe specific data.
Classification rule building: Create simple or complex rules to classify data while bringing it into the catalog.

Bridge the Data Language Gap

One of the biggest hurdles in any company is the language barrier. Remember those cryptic, faded labels in the data junk drawer? They highlight a massive challenge in data governance—Mateo speaks in code, but Maria speaks in outcomes. They’re speaking different dialects when reviewing the same piece of data. To organize the catalog, you need to understand the two types of assets inside it.

Technical assets (Mateo’s language): These assets are the how and where. They are the physical data structures as the computer system understands them, such as tables, columns, views, ETL jobs, and API endpoints. To Mateo, a technical asset might be named CUST_LTY_V3_FINAL. This is useful for knowing which server it sits on, but it tells the average person nothing about what the data represents.
Business assets (Maria’s language): These assets are the what and why, and they provide the context humans need to get work done. They are the real-world concepts, policies, and metrics that the data actually represents, for example, net profit, stakeholders, and domains. Unlike Mateo, Maria doesn’t care about SQL table names. She just wants to find the customer loyalty scores to build her marketing campaign.

If Maria searches the catalog for loyalty members, but the system only knows the technical name (CUST_LTY_V3_FINAL), she’ll never find it.

This is where the business glossary in CDGC comes in. It acts as the ultimate translation dictionary, connecting Mateo’s technical assets to Maria’s business needs. When Maria types in her business term, the catalog automatically translates it, searches the technical assets, and gives her what she needs. Thanks to CDGC, IT and business users are finally speaking the same language!
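The translation idea can be sketched in a few lines of Python. This is a simplified illustration, not CDGC’s actual search logic: the `GLOSSARY` dictionary, the `TECHNICAL_ASSETS` list, and the `search_catalog` function are hypothetical names, and the term-to-asset links are assumed for the example.

```python
# Hypothetical glossary: business term -> linked technical asset names.
GLOSSARY = {
    "loyalty members": ["CUST_LTY_V3_FINAL"],
    "customer loyalty score": ["CUST_LTY_V3_FINAL"],
}

# Hypothetical list of technical assets already published to the catalog.
TECHNICAL_ASSETS = ["CUST_LTY_V3_FINAL", "Customer_Gold"]


def search_catalog(query):
    """Return technical assets that match the query by name or by glossary link."""
    term = query.strip().lower()
    results = []
    # 1. Direct match on the technical name (how Mateo would search).
    for name in TECHNICAL_ASSETS:
        if term in name.lower():
            results.append(name)
    # 2. Translation through the glossary (how Maria would search).
    for asset in GLOSSARY.get(term, []):
        if asset not in results:
            results.append(asset)
    return results


print(search_catalog("Loyalty Members"))
print(search_catalog("gold"))
```

Without the glossary step, the search for "Loyalty Members" would return nothing, because the phrase never appears in a technical name.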

Ensure Data Quality

Having data is great; having accurate data is essential. Using “dirty” data to make business decisions is like grabbing a rusted, dead battery from the junk drawer—it might look like the exact item you need, but it won’t power your project. CDGC integrates with the data quality service to provide a rigorous, automated framework to measure, monitor, and improve the reliability of your information.

Define the Rules of Health

Within CDGC, you don’t just hope the data is good; you define the exact criteria for “good” data using Data Quality Rules. For example, to ensure Alpine Group’s marketing lists are valid, Mateo creates a rule for the Email Address column stating, “Every entry must contain an @ symbol and a valid domain extension.”

Once rules are set, CDGC’s engine (through MCC) runs these checks automatically. It flags any information that doesn’t follow Mateo’s specific rules. For example, it checks if there are empty cells in a Customer ID column, if a shipping address actually exists in the real world, or if there are duplicate records for the same person.

There are countless checks you can perform to ensure your data is exactly how it should be—complete, reliable, and trustworthy.
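As a rough illustration of what such rules check, the following Python sketch covers three of the examples above: the email format rule, empty Customer ID cells, and duplicate records. It is not the syntax CDGC or the data quality service uses. The function names, the `EMAIL_PATTERN` regular expression, and the sample rows are hypothetical. Verifying that a shipping address exists in the real world needs an external reference service, so it is left out.

```python
import re

# Hypothetical pattern (an assumption): something before "@", something after it,
# and a domain extension of at least two letters.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def is_valid_email(value):
    """True if the value contains an @ symbol and a plausible domain extension."""
    return bool(value) and EMAIL_PATTERN.match(value) is not None


def find_empty(rows, column):
    """Return the indexes of rows where the column is missing or blank."""
    return [i for i, row in enumerate(rows) if not (row.get(column) or "").strip()]


def find_duplicates(rows, column):
    """Return the indexes of rows that repeat an earlier value (case-insensitive)."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        key = (row.get(column) or "").strip().lower()
        if not key:
            continue
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


# Made-up sample rows for illustration.
rows = [
    {"customer_id": "C001", "email": "maria@example.com"},
    {"customer_id": "", "email": "mateo.example.com"},
    {"customer_id": "C003", "email": "Maria@example.com"},
]

invalid_emails = [i for i, row in enumerate(rows) if not is_valid_email(row["email"])]
print("Invalid emails at rows:", invalid_emails)
print("Empty customer IDs at rows:", find_empty(rows, "customer_id"))
print("Duplicate emails at rows:", find_duplicates(rows, "email"))
```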

Review the Quality Scorecard

The most engaging part of CDGC is the scorecard. Much like a health app on your phone, it gives each dataset a health score, such as 92%.

Imagine Sunil, a financial analyst at Alpine Group, is about to run the end-of-quarter reports. He finds the Quarterly Sales table in the catalog. Before he exports the data, he notices a critically low data quality score of 65%.

He clicks the score and discovers that 35% of the Tax Amount column is missing. Because he caught this before generating the final report, he can alert the designated data steward to fix the source system. Governance just saved Sunil and the chief financial officer from presenting incorrect numbers to the board of directors!
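The arithmetic behind Sunil’s score can be sketched as follows. This assumes, for illustration only, that the score is the percentage of rows that pass a single completeness rule on the Tax Amount column. How CDGC actually calculates and weights its scores may differ, and the `quality_score` function and sample data are hypothetical.

```python
def quality_score(values):
    """Return the percentage of values that are present (not None or blank)."""
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None and str(v).strip() != "")
    return round(100 * present / len(values), 1)


# Made-up data: 100 rows where 35 Tax Amount values are missing.
tax_amounts = [12.5] * 65 + [None] * 35
score = quality_score(tax_amounts)
print(f"Tax Amount completeness: {score}%")
```

With 35 of 100 values missing, 65 rows pass the rule, which gives the 65% score in Sunil’s example.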

Secure the Foundation

By using MCC to automate discovery, bridging the language gap between business and technical assets, and monitoring data quality, CDGC ensures Alpine Group’s data is searchable, understandable, and trustworthy.

However, now that you’ve built this high-speed digital library, the final crucial step is ensuring that the right people, and only the right people, can get inside and view only what they’re authorized to see.

A Data Quality Scorecard for an asset in CDGC.

You’ve now learned how the catalog is populated and governed. Next, you’ll learn how to secure it so that only authorized people can access specific assets. How do you ensure that your sensitive data is handled appropriately? Move on to the next unit to learn about data access control and data access management.
