The idea that “data is the new oil” has transformed the way modern enterprises perceive and approach data. But amassing petabytes of data would be for naught if that data goes unchecked and ungoverned.
That’s why you need data governance. While data management is purely administrative (collecting, storing, validating, and maintaining data), data governance is more strategic and holistic. A data governance framework prescribes, among other things, the definitions, roles and responsibilities, and access and security of data. Put simply, it drives the strategy to govern data through people, processes, and policies.
The goals of data governance are to improve data quality, make data accessible to every level of the workforce, and reduce data storage and management costs. Together, these lead to better-informed decisions and more favorable outcomes for your organization.
Approaches to data governance vary, and each may address specific areas such as defining roles and ownership or setting up data cataloguing and standards.
How to Improve Data Quality Through Governance
But let’s first zoom in on the most critical and practical aspect: maintaining and upholding data quality within your system. Today, most companies are moving from having no governance at all toward active and passive governance processes. So, let’s examine the interplay between these two approaches across the end-to-end data quality process.
What is Passive Governance?
This approach applies data quality checks as the final step of data processing. Essentially, data is created and maintained in the system first. Then, data quality analytics tools are run to surface discrepancies. At this juncture, data stewards step in to either fix the errors or escalate them to data owners for further action.
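The flow described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the record fields (material_id, unit_price), the two rules, and the function name scan_for_discrepancies are all hypothetical and do not come from any particular product. The point it shows is that the records are already stored when the check runs, and the output is a list of findings for a data steward to act on.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record shape: a plain dict of field name to value.
Record = dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


# Illustrative rules only; real business rules would be defined by data owners.
RULES = [
    Rule("material_id present", lambda r: bool(r.get("material_id"))),
    Rule(
        "unit_price is non-negative",
        lambda r: isinstance(r.get("unit_price"), (int, float)) and r["unit_price"] >= 0,
    ),
]


def scan_for_discrepancies(stored: list[Record], rules: list[Rule]) -> list[tuple[int, str]]:
    """Passive check: runs over data that is already in the system."""
    findings = []
    for index, record in enumerate(stored):
        for rule in rules:
            if not rule.check(record):
                # Record the position and the failed rule for a steward to review.
                findings.append((index, rule.name))
    return findings


# These records are already live, so users may have read them before the scan.
stored_records = [
    {"material_id": "M-001", "unit_price": 12.5},
    {"material_id": "", "unit_price": 3.0},
    {"material_id": "M-003", "unit_price": -1},
]

findings = scan_for_discrepancies(stored_records, RULES)
for index, rule_name in findings:
    print(f"record {index} failed rule: {rule_name}")
```

Nothing in this sketch stops a bad record from being read before the scan runs, which is exactly the trade-off of the passive model.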
It’s easy to see the flaw of this model. People could be accessing the system and using the data even before data quality checks begin, risking the consumption of bad or incomplete data. The worst-case scenario is when this misleads them into making ill-informed, high-stakes decisions.
And if the errors are widespread, extensive data cleansing becomes inevitable, rendering some datasets or even the whole system unavailable for a prolonged period.
Of course, this process can be further refined by adopting measures like running data quality checks as end-of-day activities or only making critical data available after remediation.
On the plus side, as the approach is reactive in nature, only a small team consisting of data stewards and data owners is needed to handle the setup, monitor data validation, and fix errors. The process and people run in the background without involving other teams in the value chain.
Is Active Governance the Better Approach?
Active governance is also known as “data quality at the source,” so you can probably guess how it works. All the required data from multiple sources is collected and validated against a set of rules before it enters your primary system. It automates the data collection and validation process, not just remediation, as is the case with passive governance.
With active governance, you’re not directly fixing data; you’re fixing the process or rules that caused the errors in the first place.
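To make “data quality at the source” concrete, here is a minimal Python sketch of a validation gate. Everything in it is an assumption made for illustration: the field names (vendor_id, country), the two entry rules, and the admit function are hypothetical, and a real deployment would use the business rules your teams define. What it shows is that a record reaches the primary system only after passing every rule, and a rejected record carries the names of the rules it failed so the upstream process can be corrected.

```python
from typing import Callable

# Hypothetical record shape and rule type, for illustration only.
Record = dict[str, object]
Rule = tuple[str, Callable[[Record], bool]]

ENTRY_RULES: list[Rule] = [
    ("vendor_id present", lambda r: bool(r.get("vendor_id"))),
    (
        "country is a 2-letter code",
        lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2,
    ),
]


def admit(record: Record, rules: list[Rule], system: list, rejected: list) -> bool:
    """Active check: validate a record before it is written to the system."""
    failures = [name for name, check in rules if not check(record)]
    if failures:
        # Keep the record out and report which rules failed, so the source
        # process (not just this one record) can be fixed.
        rejected.append((record, failures))
        return False
    system.append(record)
    return True


primary_system: list = []
rejected: list = []

incoming = [
    {"vendor_id": "V-100", "country": "MY"},
    {"vendor_id": "V-101", "country": "Malaysia"},
]

for record in incoming:
    admit(record, ENTRY_RULES, primary_system, rejected)

print(f"admitted: {len(primary_system)}, rejected: {len(rejected)}")
```

Because the gate sits in front of the system, users never see the rejected record. The cost is that every rule has to be agreed on up front.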
Companies operating in highly regulated industries like finance and pharmaceuticals will find it imperative to employ active governance to control and validate data before it enters the system.
Although this approach is more effective, it requires more people across different teams to define comprehensive business rules and to instil the discipline of merging data from various sources and validating it. Top-down executive sponsorship and change management play a critical role in encouraging the adoption of active governance.
What If We Combined Both?
The better approach here is to combine both passive and active governance in your data processes. Passive governance can be useful in scenarios where you’re migrating to a new enterprise system, like SAP S/4HANA. Migrating legacy data would be pointless if its quality isn’t taken care of. Passive governance plays a central role here in validating and cleansing legacy data within the new system using the defined business rules and validations.
Active governance then handles new data sets, validating them at one central point before they enter your system. Remember that this approach only pays off when you apply rigor in defining the business rules and policies to address all exceptions and scenarios. It helps reduce the need for mid-stream data cleansing, which can be disruptive to your operations.
Achieve End-to-end Data Governance with MDO
MDO has both active and passive governance models to handle different facets of your data quality challenge.
MDO Data Intelligence Workbench (DIW) can cleanse and enrich legacy data that is migrated to your new system. While user-defined business rules are used for validation, DIW can also leverage machine learning (ML) models that learn from the inputs and errors in huge datasets. The remediation process still goes through approvals, with logs and audit trails turned on for traceability. This way, data stewards can still oversee the whole process.
As for the treatment of new data, you can set up MDO as the point of data entry. MDO contains pre-defined business rules to validate data before it enters your system. It also gives you the option to configure your own business rules according to specific requirements.
What’s more, you can leverage external content and industry standards such as UNSPSC and ISO 14224 for data enrichment. Essentially, you’ll have accurate, validated, standardized, and enriched data in your system at all times.
With end-to-end governance, people throughout your organization can start to trust and use data. And you’ll be on the right track to building a data-driven culture.
Written by: Shigim Yusof