Data governance

Intelligent and unified data governance in the multi-cloud era

Today, it is imperative for organizations to adapt to an increasingly data-driven world and develop analytical agility. However, this is easier said than done, given the diverse sources of information that organizations manage and the complex mechanics of data processing: moving data, discovering it, cleansing it, and preparing reliable data for analysis. The challenge is compounded when you don’t know where your data comes from or what it means. At the Data Engineering Summit 2022, Kirthi Ganapathy, Customer Engineering Manager at Google Cloud, shared insights, key learnings, and best practices around intelligent metadata management, security, and governance in a diverse and widely distributed data environment.

What is Data Governance?

Data governance, at its most basic level, is the practice of improving an organization’s data to make it accessible, understood, protected, and trusted. Every business needs to think about the entire data lifecycle, starting with data receipt and ingestion, and continuing through cataloging, persistence, curation, storage, management, sharing, archiving, backup, recovery, and finally disposal and deletion.

The data governance framework is based on four main pillars:

  1. Data discovery: data classification, data lineage, metadata and cataloging, and data quality
  2. Data management: lifecycle and records management, master data, and SRE
  3. Data protection: masking, encryption, access management, audit and compliance, residency and retrievability
  4. Data Accountability: Ownership, Policies and Standards, Domain-Based Governance, and Ethics
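The four pillars above can be made concrete as metadata attached to each dataset. Below is a minimal sketch of such a governance record; all names and fields are illustrative assumptions, not any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class GovernanceMetadata:
    """Illustrative governance record spanning the four pillars."""
    # Pillar 1 - discovery: classification, lineage
    classification: str                                 # e.g. "PII", "public"
    lineage: list = field(default_factory=list)         # upstream dataset IDs
    # Pillar 2 - management: lifecycle
    retention_days: int = 365
    # Pillar 3 - protection: encryption and masking
    encrypted: bool = True
    masked_columns: list = field(default_factory=list)
    # Pillar 4 - accountability: ownership
    owner: str = "unknown"

# Example: a customer table containing personal data, owned by one team.
customers = GovernanceMetadata(
    classification="PII",
    lineage=["raw.crm_export"],
    masked_columns=["email", "ssn"],
    owner="finance-team",
)
print(customers.classification)  # PII
```

In practice such records live in a data catalog, but even this toy structure shows how discovery, management, protection, and accountability attach to a single dataset.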

“Data governance encompasses the ways in which people, processes, and technology can work together to enable auditable compliance with defined and agreed-upon policies across different technical solutions and different infrastructure boundaries,” Kirthi said.

Data Priorities

“What organizations really want is to be able to learn from the data they have, without any restrictions, without necessarily moving it around, and in a way that makes sense to them,” Kirthi said.

An intelligent data fabric enables organizations to centrally manage, monitor, and govern data across data lakes, data warehouses, and data marts with consistent controls, providing access to trusted data and powering analytics at scale. It provides unified, metadata-driven data management through a single pane of glass; centralized security and governance that enables distributed ownership with global control; built-in intelligence to unify distributed data without data movement; and an open platform supporting open-source tools and a robust partner ecosystem.

What is a data mesh?

Data mesh is a type of data architecture that makes data accessible, available, discoverable, secure, and interoperable. It combines two principles: domain-driven decentralization and data-as-product.

In domain-driven decentralization, data belongs to the people who understand it best. For example, the finance team owns the financial data and the human resources team owns the human resources and employee data. Thus, no centralized entity owns the data of the entire organization.

In the second approach, data is treated as a product. A team owns its data just as it would own a service and the business around it. In other words, you treat other teams as internal customers of your data.
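The two principles can be sketched together as a thin contract that a domain team publishes for its internal consumers. Everything here (the class, field names, and the example values) is a hypothetical illustration, not a specific data mesh product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    """A domain team's published contract for one dataset."""
    name: str
    owner_domain: str          # decentralization: the team that knows the data
    schema: dict               # column name -> type, the consumer-facing contract
    sla_freshness_hours: int   # data-as-product: what consumers may rely on

# The finance domain owns and publishes its own data product;
# no central entity owns it on the organization's behalf.
revenue = DataProduct(
    name="monthly_revenue",
    owner_domain="finance",
    schema={"month": "DATE", "revenue_usd": "NUMERIC"},
    sla_freshness_hours=24,
)
```

Making the contract explicit (schema, owner, freshness SLA) is what turns a shared table into a product other teams can depend on.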

Now let’s see how to create a data mesh architecture. Building a data mesh involves:

  1. Organize data to map it to your business: organize data logically according to how it is used rather than where it is stored.
  2. Uniformly manage and govern data: establish standardized policies for access control, data quality, classification, and lifecycle management.
  3. Access data from a variety of tools: access distributed data from cloud-native and open-source tools, with automatic metadata propagation and a unified experience.
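Step 2 above, uniform governance across decentralized domains, can be sketched as a single org-wide policy that every domain's datasets are validated against. The policy shape and tag names are assumptions for illustration.

```python
# Illustrative org-wide policy applied uniformly to all domain-owned data.
POLICY = {
    "required_tags": {"owner", "classification"},
    "allowed_classifications": {"public", "internal", "confidential", "pii"},
}

def violates_policy(dataset_tags):
    """Return a list of policy violations for one dataset's tags."""
    problems = []
    missing = POLICY["required_tags"] - dataset_tags.keys()
    if missing:
        problems.append(f"missing tags: {sorted(missing)}")
    cls = dataset_tags.get("classification")
    if cls and cls not in POLICY["allowed_classifications"]:
        problems.append(f"unknown classification: {cls}")
    return problems

print(violates_policy({"owner": "hr"}))
# ["missing tags: ['classification']"]
print(violates_policy({"owner": "hr", "classification": "pii"}))
# []
```

Each domain team manages its own datasets, but the policy check is the same everywhere, which is what "uniformly manage and govern" means in practice.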

Google Cloud Way

“We have three data domains here — sales data, CRM data or customer data, and product data — each of which can be implemented as a different data lake, with their respective data pipelines, allowing the respective product teams to implement very fine-grained permission controls, including at a sub-lake or zone level on each of these data lakes independently, as defined by organizational best practices,” Kirthi said.
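The per-lake, zone-level permission model Kirthi describes can be sketched generically. This is an illustration of the idea only; the structure and names below are assumptions, not Google Cloud's actual API.

```python
# Each domain's lake carries its own zone-level ACLs, managed
# independently by the owning team (illustrative, not a real cloud API).
lakes = {
    "sales": {
        "raw_zone":     {"readers": {"sales-eng"}, "writers": {"sales-etl"}},
        "curated_zone": {"readers": {"analysts", "sales-eng"},
                         "writers": {"sales-eng"}},
    },
    "crm": {
        "raw_zone":     {"readers": {"crm-eng"}, "writers": {"crm-etl"}},
    },
}

def can_read(principal, lake, zone):
    """Check zone-level read access within one domain's lake."""
    acl = lakes.get(lake, {}).get(zone)
    return acl is not None and principal in acl["readers"]

print(can_read("analysts", "sales", "curated_zone"))  # True
print(can_read("analysts", "crm", "raw_zone"))        # False
```

Because each lake's ACLs are a separate structure, the sales team can grant analysts access to its curated zone without touching, or being able to touch, the CRM domain's permissions.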

She further stated that with this architecture:

  1. Your organization gets the freedom to store data wherever it wants, choose the best analytics tools, and retain flexibility in pricing and consumption models to meet financial governance needs.
  2. Built-in data intelligence leverages the best AI/ML capabilities to automate data management and reduce manual tasks.
  3. Metadata, security policies, and data classification are standardized and unified.