How Many People Are Working Here?
Set aside the jokes about IT garbagemen, undertakers and the Grim Reaper – this blog is about the differences between good and bad data governance, and the cost to organizations of mitigating bad data governance. This article focuses on what good data governance looks like and what is needed to implement it.
The intention of the article isn’t to define in detail the components of data governance – there are many articles and other materials available for that. It intends to give a viewpoint on what good looks like based on decades of dealing with the bad. And I’m not interested in debating whether data domain, data set or another term is the best descriptor for the concept – close is good enough.
To illustrate the concepts let’s look at the question posed in the title: “how many people are working here?”
Fundamentals of Good Data Governance

Firstly, successful data governance doesn’t happen organically – it requires strong sponsorship from the top of the IT organization. There must be investment in process, people and technology to support the data governance, and the business needs to be sold on the benefits of the financial and people investments required.
The foundational elements required for good data governance are:
- Corporate data governance standard.
- Master data management.
- Data domains.
- Application governance process.
- Metadata governance process.
- Application data governance plan and data mapping.
- Corporate data store/presentation layer.
So how does this foundation help address the question of “how many people work here”?
Understanding the Question
Those who have dealt with HR and finance may already know this is a trick (or tricky) question. In fact, there are many different questions depending on who is asking. The context of the question defines what is actually being asked, so it is an excellent example of why data domains are critical and must be precise and comprehensive to serve their intended purpose.
So what are these hidden questions? Or more precisely, what is the full question that needs to be asked to get the desired answer, and what variations could still exist?
- HR: how many distinct employees do we have?
  - How are interns, short-term disability, long-term disability, part-time and job-share employees handled?
- HR/Finance: how many employee FTEs do we have?
  - Same variants as above, but now part-time and job-share scenarios may provide a different answer.
- Line Managers: how many employees do I have in my team?
- Line Managers: how many people do I have to pay for?
  - This might be employees only or may be a combination of HR and purchase orders.
- Facilities: how many people do we have to accommodate at a site?
- Facilities: how many people are permitted to enter a site? Alternatively, how many people can enter a secure location within a site?
- Facilities/Emergency Services: how many people are on the site at the moment?
- IT: how many laptops do we need to issue?
- IT: how many system IDs are needed?
- Investor relations: how many FTEs do we have?
  - Like the HR/Finance employee FTEs question, but can also include classes of contractors who back-fill employees.
- Procurement: how many contract resources do we have?
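To make the point concrete, here is a minimal Python sketch in which one small roster gives four different "headcounts" depending on who is asking. Everything in it is an assumption made for illustration: the `Person` fields, the sample roster and the counting rules (for example, that HR excludes interns and that Facilities excludes people on leave) are hypothetical, and a real organization would settle those rules when defining its data domains.

```python
from dataclasses import dataclass


# Hypothetical, simplified roster record. The field names are illustrative
# assumptions, not taken from any real HR or procurement system.
@dataclass(frozen=True)
class Person:
    person_id: str
    worker_type: str  # "employee", "intern" or "contractor"
    fte: float        # 1.0 = full time, 0.5 = one half of a job share
    on_leave: bool    # e.g. long-term disability
    site: str


ROSTER = [
    Person("p1", "employee", 1.0, False, "HQ"),
    Person("p2", "employee", 0.5, False, "HQ"),   # job share
    Person("p3", "employee", 0.5, False, "HQ"),   # job share
    Person("p4", "employee", 1.0, True, "HQ"),    # long-term disability
    Person("p5", "intern", 1.0, False, "HQ"),
    Person("p6", "contractor", 1.0, False, "HQ"),
    Person("p7", "contractor", 1.0, False, "Plant"),
]


def hr_distinct_employees(roster):
    """HR view: distinct employees (assumed to exclude interns and contractors)."""
    return len({p.person_id for p in roster if p.worker_type == "employee"})


def finance_employee_ftes(roster):
    """HR/Finance view: employee FTEs, so two job-share halves count as one."""
    return sum(p.fte for p in roster if p.worker_type == "employee")


def facilities_to_accommodate(roster, site):
    """Facilities view: people needing space at a site (assumed to exclude those on leave)."""
    return sum(1 for p in roster if p.site == site and not p.on_leave)


def procurement_contract_resources(roster):
    """Procurement view: contract resources only."""
    return sum(1 for p in roster if p.worker_type == "contractor")
```

With this sample roster the four functions return 4, 3.0, 5 (for the "HQ" site) and 2 respectively. Same people, same data, four defensible answers.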
Planning For Success
So how does good data governance address understanding and answering the question?
Data Governance Standard
As previously stated, data governance can only occur with strong sponsorship from the top of both IT and business functions. This sponsorship must be accompanied by process, people and technology to implement and sustain the data governance, and the data governance standard documents the process. It defines to Product Owners and Application Owners their responsibilities in managing their applications, data standards and documentation from a data governance perspective. Critically it will set the corporate expectations for the quality of data (master data management) and the availability of data to other applications. Additionally, it defines the role of the enterprise architecture function in building good data governance practices into the design of applications.
Compliance with the data governance standard cannot be optional or it will be the first shortcut that Product Owners take when budgets are tight, and a partial implementation will never be successful in the long term.
Master Data Management
Master data management is the ongoing process to implement and sustain data quality and standards across the organization, and typically is supported by a dedicated organization. This organization is the steward of the data governance standard, ensuring adherence to all elements of data governance.
Data Domains
It is easy to misunderstand data domains as an academic exercise to define data elements in a hierarchical manner, particularly as your data domains will be similar to those of peers in the same industry.
The value in creating data domains is that the process creates not only the data hierarchy but also a deep understanding of your data and how it is used. Answering the question of “how many people are working here” requires:
- Understanding the business groups that will be asking the question and what problem they are looking for information to help solve.
- Agreeing a common frame of reference to define data elements across disparate applications across the company.
- Identifying the data elements needed to answer the questions.
- Identifying the applications that will source the data elements.
Far from being an academic exercise, data domains are the foundation of good data governance, the skeleton that the body will be built around. They define not only what data exists in the company but what it means, anticipating the variations in the “how many people are working here” question.
For the geeks out there, they are the Babel fish of data governance.
Application Governance Process
What is an application? Simple enough question, right? Unfortunately, no.
If you implement a single application across 20 business functions and each has its own distinct instance is that one application or 20?
If an application is migrated from an on-premises instance to a SaaS offering is that a new application or a continuation of the same? What if both the on-premises and SaaS instances run concurrently while different business functions transition over?
If an application’s front end is shut down but the data persists for reporting is that application still live or not?
Dealing with these questions consistently is critical for good data governance. Just as defining and implementing data domains is necessary to understand data, so is a consistent application governance process. Without application governance the definition of what an application is can vary wildly, and understanding the application landscape becomes far more complicated.
Application Data Governance Plan and Data Mapping
This is where the rubber meets the road. You’ve invested in developing data domains and have a robust application governance process. Each Product Owner intimately knows the business processes that their applications support and the data within those applications. There are accurate and comprehensive requirement and design specifications describing the application’s implementation, and each interface into and out of the application is equally well documented.
Job done? Good data governance achieved?
No.
- The data within the application is not understood from the perspective of its value to the entire company, only in the context of the Product itself.
- Other Product Owners have no visibility into the meaning of the data, nor possibly to the data itself.
- Typically each interface is a point solution serving a specific data need as opposed to a comprehensive strategy for making the data readily available to the entire company.
- From a company level it is not clear where data is created, supplemented and consumed, nor is it apparent which application(s) are system of record for specific data elements.
- Requirement, design and interface specifications are typically developed in Word or Excel and cannot be readily mined for metadata.
The scenario described is a classic example of a low maturity of data governance and will almost certainly lead to the failure of any data governance initiative.
When implementing an application a data governance plan must be created. This will define the data in the application in terms of your company’s data domains, identifying which data elements are consumed, created or supplemented by the application, as well as for which data elements the application is the system of record. It will define the strategy for consuming and presenting the data to the rest of the corporation. Where data in the application is managed in a different manner to the data domains the data governance plan defines how this will be addressed.
The application data mapping is a more detailed analysis that goes down to the application and data domain field level to document how the mappings and transformations are implemented. It can be included in the same document (or metadata store) as the data governance plan, but both levels of analysis must be addressed.
For our question of “how many people are working here?” the application data governance plan and data mapping ensure that the metadata governance application is populated with the necessary information to identify the data needed to respond to the question.
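As a rough illustration of what a field-level data mapping might capture, the Python sketch below maps a proprietary HR record onto corporate data domain elements. All names are hypothetical assumptions made for the example: the source fields (`EMP_NO`, `WK_HRS`, `STAT_CD`, `COST_CTR`), the domain element names, the status codes and the 40-hour full-time week are not taken from any real product.

```python
from dataclasses import dataclass
from typing import Callable


# Every name below is a hypothetical illustration: the application's field
# names and the corporate domain element names are assumptions.
@dataclass(frozen=True)
class FieldMapping:
    source_field: str                      # proprietary field in the application
    domain_element: str                    # element in the corporate data domain
    role: str                              # "created", "consumed" or "supplemented"
    system_of_record: bool                 # is this application the system of record?
    transform: Callable[[object], object]  # normalizes the proprietary value


def hours_to_fte(weekly_hours):
    """Assumed proprietary convention: weekly hours, with 40 hours as full time."""
    return round(float(weekly_hours) / 40.0, 2)


# Assumed proprietary status codes and their domain-level meanings.
STATUS_CODES = {"A": "active", "L": "on_leave", "T": "terminated"}

HR_APP_MAPPING = [
    FieldMapping("EMP_NO", "worker.id", "created", True, str),
    FieldMapping("WK_HRS", "worker.fte", "created", True, hours_to_fte),
    FieldMapping("STAT_CD", "worker.status", "created", True, STATUS_CODES.__getitem__),
    FieldMapping("COST_CTR", "finance.cost_center", "consumed", False, str),
]


def to_domain(record, mapping):
    """Translate one proprietary record into corporate data domain terms."""
    return {m.domain_element: m.transform(record[m.source_field]) for m in mapping}


def system_of_record_elements(mapping):
    """List the domain elements for which this application is the system of record."""
    return [m.domain_element for m in mapping if m.system_of_record]


raw = {"EMP_NO": 1042, "WK_HRS": "20", "STAT_CD": "A", "COST_CTR": "CC-310"}
normalized = to_domain(raw, HR_APP_MAPPING)
```

Here `normalized` holds the record in domain terms (a `worker.fte` of 0.5 rather than 20 weekly hours), and `system_of_record_elements(HR_APP_MAPPING)` answers the plan-level question of which elements this application owns. Keeping the mapping as structured data rather than in a Word or Excel document is what makes it loadable into a metadata store.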
Metadata Governance Process
The data domains, application data governance plans and application data mappings are critical to the success of a data governance strategy, but the information gathered as part of these must also be treated as an asset unto itself. If Word, Excel or similar documents are used then the information will effectively be hidden.
There are many available metadata management applications, and I don’t intend to assess them here but only to assert the need for a robust implementation and the mandatory use of the tool throughout the company.
A successful metadata governance process will not only document the data domains, but will identify for each domain, sub-domain and element such information as:
- The description of the data in a non-vendor proprietary format.
- Application(s) that create the data element.
- Application(s) that are the system of record for the data element.
- References to similar or related data elements.
- If a data object could have distinct types (e.g. a Purchase Order could be for direct or indirect material, utilities, consulting services), these should be identified, along with the implications of the distinct types (required fields, source applications, etc.).
- If a derived field, the logic for creating the element and the contributing elements.
- Data type.
- If a closed data set then a listing of acceptable values.
- Data parameters (max/min values, length, level of precision).
- If a text field then the supported languages.
For each application it will define:
- The complete list of data elements the application contains.
- For each element is the element consumed, created or supplemented?
- Is the application the system of record for the data element?
- What transformations are required from the data domain to proprietary data formats?
Good data governance cannot occur if metadata is siloed or inaccessible. A metadata governance process and application are critical.
For our question of “how many people are working here?” we need to know what data elements we need, from which applications they can be sourced and the exact meaning of each data element. The metadata governance process allows the data elements to be identified and selected.
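The element-level metadata listed above can be pictured as a small, queryable registry. The Python sketch below is only that, a sketch: the `ElementMetadata` fields, the element names and the application names ("HR-Core", "Procurement", "Facilities-Access") are hypothetical assumptions, not the schema of any particular metadata management product.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical metadata record; field names and values are illustrative
# assumptions, not the schema of any real metadata management tool.
@dataclass(frozen=True)
class ElementMetadata:
    name: str
    description: str
    data_type: str
    created_by: tuple                       # application(s) that create the element
    system_of_record: tuple                 # application(s) that are system of record
    allowed_values: Optional[tuple] = None  # only for closed data sets
    min_value: Optional[float] = None
    max_value: Optional[float] = None


REGISTRY = {
    e.name: e
    for e in [
        ElementMetadata(
            "worker.fte", "Contracted fraction of a full-time role", "decimal",
            ("HR-Core",), ("HR-Core",), min_value=0.0, max_value=1.0,
        ),
        ElementMetadata(
            "worker.type", "Employment relationship with the company", "string",
            ("HR-Core", "Procurement"), ("HR-Core",),
            allowed_values=("employee", "intern", "contractor"),
        ),
        ElementMetadata(
            "badge.site_access", "Sites a badge holder may enter", "string",
            ("Facilities-Access",), ("Facilities-Access",),
        ),
    ]
}


def sources_for(elements):
    """Return the system(s) of record for each data element a question needs."""
    return {name: REGISTRY[name].system_of_record for name in elements}


def violations(name, value):
    """List the ways a value breaks the documented parameters for an element."""
    meta = REGISTRY[name]
    problems = []
    if meta.allowed_values is not None and value not in meta.allowed_values:
        problems.append(f"{value!r} is not an accepted value for {name}")
    if meta.min_value is not None and value < meta.min_value:
        problems.append(f"{value!r} is below the minimum for {name}")
    if meta.max_value is not None and value > meta.max_value:
        problems.append(f"{value!r} is above the maximum for {name}")
    return problems
```

Someone answering the HR/Finance FTE question could call `sources_for(["worker.fte", "worker.type"])` to find out which applications to pull from, and `violations("worker.fte", 1.5)` shows how the documented data parameters can double as a data quality check.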
Corporate Data Store/Presentation Layer
A key piece of advice I was given early in my career was to always be aware of not only your own strengths but also your own weaknesses. I am not an expert in how to manage data at scale so will not try to give guidance.
What is critical is that as part of the corporate data governance standard every application must present its data for consumption through whatever process or technology is selected. This presentation of data must be part of the architectural evaluation of applications before their selection and cannot be optional nor can it be implemented as an afterthought.
Enterprise Architecture
As IT Garbagemen we are kept in business by cleaning up after Product Owners and Application Owners who haven’t bothered to manage their data correctly. On one hand – thanks – love the business. But joking aside a company should be channeling time and effort into data governance activities that deliver value, not just mitigating risk.
This is where an enterprise architecture function is critical – to be involved at the planning phase of new application implementations with a view to enforcing a cradle-to-grave approach to data governance aligned with the corporate data governance standard.
And teeth. Lots of teeth. The enterprise architecture function must have the authority to shut down application implementations or heavily fine Product Owners who do not comply with the corporate data governance standard. This is particularly important when dealing with SaaS vendors, who typically are highly intractable – once poor data governance is accepted it is extremely costly and difficult to reverse.
There will always be pressure from the Product Owner to get a new application live as soon as possible, and the question of deferring data governance deliverables to post go-live will be raised. Answering “Ok, but you will be on a report quarterly to your VP” will guarantee it never gets done. “Ok, but you will have 6.25% of the application implementation cost deducted from your budget quarterly (25% per annum) until you are compliant” will get attention.
As I said, lots of teeth.
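To put a number on those teeth, here is the arithmetic as a short Python sketch. It assumes a flat deduction calculated on the original implementation cost each quarter (no compounding), which is what makes four quarters of 6.25% add up to 25% per annum. The function name and the 800,000 implementation cost are hypothetical.

```python
QUARTERLY_RATE = 0.0625  # 6.25% per quarter, the figure quoted above


def cumulative_deduction(implementation_cost, quarters_non_compliant):
    """Total budget deducted so far, assuming a flat (non-compounding) quarterly rate."""
    return implementation_cost * QUARTERLY_RATE * quarters_non_compliant


# Hypothetical application that cost 800,000 to implement.
one_quarter = cumulative_deduction(800_000, 1)   # 50,000
one_year = cumulative_deduction(800_000, 4)      # 200,000, i.e. 25% of the cost
```

A Product Owner who defers compliance for a year on that hypothetical application gives up 200,000 of budget, which tends to focus the mind more than a line on a quarterly report.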
Tying it Together
OK – so to recap the elements of good data governance and how it drives value.
The corporate data governance standard sets the benchmarks for how data is viewed and managed across the company, as well as defining Product Owners’ and Application Owners’ obligations. It will mandate that as part of the application implementation all data in the application that is part of the corporate data domains is presented for consumption throughout the company.
The master data management process and organization ensure adherence to the data governance standard.
The data domains define a company’s information assets, both identifying the data elements that are created, maintained and consumed and providing a common reference language to clearly define data objects and their meaning. Data domains are solution agnostic so will not change when business or metadata applications are changed.
The application governance process tracks all applications in the company’s landscape. It is supported by strict policies that ensure that the applications are defined and managed consistently across the company.
The metadata governance process tracks where data is created, supplemented and consumed across the company. For each data element it defines the application(s) that are the system of record.
The application data governance plan defines the data created, consumed and supplemented by the application, and for which data elements the application is the system of record. It will state how the data created or supplemented in the application will be presented for consumption by other applications, including the technologies used and frequency of updates. The application data mapping is the bridge from the corporate data domains to the application proprietary data structures. It is not only an element-to-element mapping but also addresses transformations necessary to normalize from the proprietary data structures to the corporate data domains. Both the application data governance plan and data mapping can be standalone documents, be part of a design specification or, in a fully mature data governance program, be part of the data management application.
Finally, the corporate data store. I’m an old school, data warehouse guy so I’m at least 20 years behind on this. I’ll just say it must exist and not attempt to suggest the best way to implement it!