Updated: Apr 2
Data Lineage provides the ability to trace data through the organisation, from its entry point through to the internal and external reports on which key business decisions are based.
Data Lineage helps to ensure that the data driving these decisions is trusted and that the reports to regulators and auditors can be substantiated. It can also help create a common language and understanding between business and IT to improve data quality.
In this video we explore how organisations can derive the most value from their Data Lineage Program.
We create 2.5 quintillion bytes of data every day. Our data generation has exploded to such gargantuan levels that 90% of the world’s data was created in the last two years alone.
Given the data oceans every firm has built up over time, it becomes even more important to know which sources each piece of data comes from, where it lives, what transformations it undergoes, who uses it and for what purposes and, lastly, what happens to it when it is no longer in use.
Creating data lineage is supposed to be ‘THE ANSWER’ to all of these questions, yet we see firms grappling with the challenge of depicting an end-to-end pictorial view of their data throughout its lifecycle.
Whether you’re embarking on a journey to create lineage or are in the middle of one, make sure you can answer the questions below to confirm that you are heading towards scalable data lineage.
What are the data elements you are creating lineage for?
Not all data are created equal, so make it easy on yourself and only measure what matters. You must know which data elements you should care about. Call them critical data elements (CDEs) or key data elements (KDEs). They could be critical because of their usage in certain business processes, key projects, regulatory reporting or analytics, or because they have legal or financial implications for your business. To arrive at your CDEs or KDEs, identify the parameters on which you want to rate your data elements. Remember that arriving at the bare minimum number of CDEs is an iterative process; once you have that list, derive lineage for only those CDEs.
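As a rough illustration of rating data elements against chosen parameters, here is a minimal Python sketch. The parameter names, weights, 0 to 3 rating scale, threshold and example elements are all assumptions made up for this example; a real programme would agree these with the business and revisit them over several iterations.

```python
from dataclasses import dataclass

# Hypothetical rating parameters and weights; replace with your own.
WEIGHTS = {
    "regulatory_reporting": 3,
    "financial_impact": 3,
    "legal_impact": 2,
    "key_business_process": 2,
    "analytics_usage": 1,
}


@dataclass
class DataElement:
    name: str
    ratings: dict  # parameter -> rating from 0 (irrelevant) to 3 (critical)


def criticality(element):
    """Weighted sum of the element's ratings; unrated parameters count as 0."""
    return sum(weight * element.ratings.get(param, 0)
               for param, weight in WEIGHTS.items())


def select_cdes(elements, threshold):
    """Names of elements scoring at or above the threshold, highest first."""
    scored = sorted(((criticality(e), e.name) for e in elements), reverse=True)
    return [name for score, name in scored if score >= threshold]


# Illustrative elements only.
elements = [
    DataElement("customer_tax_id",
                {"regulatory_reporting": 3, "legal_impact": 3, "financial_impact": 1}),
    DataElement("account_balance",
                {"regulatory_reporting": 3, "financial_impact": 3, "key_business_process": 2}),
    DataElement("marketing_nickname", {"analytics_usage": 1}),
]

cdes = select_cdes(elements, threshold=15)
print(cdes)
```

Raising or lowering the threshold between iterations is one simple way to converge on a bare minimum list of CDEs.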
With a list of CDEs in hand, your next question on data lineage should be:
Why create Data Lineage?
Are you doing it because of a regulatory push, or do you see real value in carrying out this enterprise-wide exercise? While doing a bare minimum of lineage to achieve compliance sounds like an easy way out, you may not want to miss out on the immense benefits that could be derived from full-fledged data lineage.
Once you know where your data is, what stages it goes through, which systems it passes through and what purposes it is used for, the opportunities to leverage that knowledge are unlimited. It’s like knowing where the gold mine is: all you must do is go there and dig!
You can assess data quality at a granular level to carry out cleansing, implement controls to minimize data risks, assign ownership of data, perform reconciliation and resolve breaks with a shorter turnaround time. Having a full view of, and control over, organization-wide data can open an ocean of opportunities.
You may be wondering...
If data lineage is so powerful, then who should be driving it? Business or IT?
Business, in no uncertain terms! The business should drive the lineage design exercise, with the support of the IT department. While IT can provide insights into how data hops between systems and how its schema is conceptualized, the real meaning of data and its importance can only be assessed by the business, which is also responsible for implementing the enterprise-wide data strategy.
So at what levels should you design your lineage?
Typically, there are two levels of data lineage, and which one is helpful depends on who is looking.
A chief data officer may only be interested in seeing how a data element is utilized, which systems it passes through and where it originates.
Such a view is often called schema-level or coarse-grained lineage, where only the system-level data flow is shown. The same view can be used to drill down to the actual table and column level, where one can also capture the transformations the data undergoes, all the splits and merges, and its usage at multiple stages of its lifecycle. Such zoomed-in levels are commonly called fine-grained or physical lineage.
Finally, what other purposes does lineage serve, and should you leverage technology to create your data lineage?
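The relationship between the two levels can be sketched in Python: fine-grained lineage is stored as column-to-column edges with their transformations, and the coarse-grained view is derived by collapsing those edges to system level. This is only an illustrative sketch; the `system.table.column` naming convention and every system, table and column name below are assumptions, not taken from any particular tool.

```python
from collections import defaultdict

# Fine-grained edges: (source column, target column, transformation).
# Names follow a hypothetical "system.table.column" convention.
COLUMN_EDGES = [
    ("crm.customers.first_name", "dwh.dim_customer.full_name", "concat with last_name"),
    ("crm.customers.last_name", "dwh.dim_customer.full_name", "concat with first_name"),
    ("dwh.dim_customer.full_name", "reporting.kyc_report.customer_name", "copy"),
]


def system_of(column):
    """The system is the part of the name before the first dot."""
    return column.split(".", 1)[0]


def coarse_lineage(edges):
    """Collapse column-level edges into distinct system-to-system flows."""
    flows = set()
    for src, tgt, _transformation in edges:
        if system_of(src) != system_of(tgt):
            flows.add((system_of(src), system_of(tgt)))
    return sorted(flows)


def upstream_columns(column, edges):
    """All columns that feed the given column, directly or indirectly."""
    parents = defaultdict(set)
    for src, tgt, _transformation in edges:
        parents[tgt].add(src)
    seen, stack = set(), [column]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(coarse_lineage(COLUMN_EDGES))
print(sorted(upstream_columns("reporting.kyc_report.customer_name", COLUMN_EDGES)))
```

Keeping the column-level edges as the single source and deriving the system-level view from them means the two views cannot drift apart.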
Lineage, when done right, offers endless possibilities for value extraction. It is not a mere pictorial diagram of data flow. The diagram is just the beginning, and we always advise against drawing it in traditional static tools like Microsoft Visio, because the potential of lineage can only be realized on systems that allow supplementary information to be layered on top of lineage views.
There are various tools available that can facilitate the creation of lineage by harvesting your systems’ metadata. These solutions also deliver value beyond lineage. For instance, on a lineage view one can also depict the data quality (DQ) rules implemented, roll DQ levels up to system level, show security and data control points at system level to ensure data privacy and integrity, support system reconciliation, identify sources of truth and golden sources, and so on.
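To make the idea of rolling data quality levels up to system level concrete, here is a small Python sketch. The rule results, the tuple layout and the choice of a simple pass rate as the DQ measure are all assumptions for illustration; lineage tools typically have their own scoring models.

```python
from collections import defaultdict

# Hypothetical rule results:
# (system, column, rule name, records checked, records passed).
RULE_RESULTS = [
    ("crm", "customers.email", "not_null", 1000, 950),
    ("crm", "customers.email", "valid_format", 1000, 900),
    ("dwh", "dim_customer.full_name", "not_null", 1000, 1000),
]


def dq_by_system(results):
    """Pass rate per system: total records passed / total records checked."""
    checked = defaultdict(int)
    passed = defaultdict(int)
    for system, _column, _rule, n_checked, n_passed in results:
        checked[system] += n_checked
        passed[system] += n_passed
    return {system: passed[system] / checked[system] for system in checked}


scores = dq_by_system(RULE_RESULTS)
for system, score in sorted(scores.items()):
    print(f"{system}: {score:.1%}")
```

A score like this could then be displayed on the corresponding system node of a coarse-grained lineage view.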
A lineage diagram can also help mitigate data risks arising from manual interventions: lineage points out manual touch points, which can then be tactically removed by automating the processes involved.
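If each hop in the lineage records how the data moves, manual touch points can be listed with a simple filter. The sketch below assumes a hypothetical `Hop` record with an `is_manual` flag; the systems and mechanisms shown are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    source: str
    target: str
    mechanism: str
    is_manual: bool  # hypothetical flag captured while documenting lineage


# Illustrative hops only.
HOPS = [
    Hop("crm", "dwh", "scheduled ETL job", False),
    Hop("dwh", "finance_workbook", "spreadsheet export", True),
    Hop("finance_workbook", "regulatory_report", "copy and paste", True),
]


def manual_touch_points(hops):
    """Hops where a person moves or edits the data by hand."""
    return [hop for hop in hops if hop.is_manual]


for hop in manual_touch_points(HOPS):
    print(f"{hop.source} -> {hop.target}: {hop.mechanism}")
```

The resulting list is a natural starting backlog for automation.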
The value extracted from lineage is limited only by your own imagination. In some industries data lineage is enforced by regulators; for example, banks need to create lineage in order to comply with the CPG 235 or BCBS 239 guidelines. However, a conscious effort to design firm-wide lineage, with support from board members, adds value to the business in the long run.