Data Lineage and Traceability: Using Data to Improve your Data
October 19, 2016
This blog post by Xenomorph CEO Brian Sentance is entitled Data Lineage and Traceability: Using Data to Improve your Data and was originally published on Data Management Review.
Enterprise Data Management (EDM) is, to state the obvious, a data-centric process (the clue is in the name!). But while financial institutions have placed ever more emphasis on ensuring the accuracy of their reference and market data, the need for clearer insight into EDM systems and processes has also become a growing priority. It is no wonder, then, that data lineage, or data traceability, is a topic that has grown in importance.
Maintaining accurate data isn’t just important for guiding trading and investment decisions. Accurate data can be just as important to guide EDM decisions. Every time an item of data is sourced and run through validation checks, it will trigger a workflow. Recording the details of that workflow can be essential. In particular, this can help an enterprise accomplish two key goals.
First, recording workflows gives management the information it needs to keep improving processes. For example, firms may want to study how long it takes their staff to correct an erroneous price, and in doing so make a judgement on which of their data analysts are performing best. They may also look at the source of erroneous prices and make a judgement on which vendor tends to offer the most reliable data for certain instruments, asset classes or data types.
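As a rough illustration of the kind of management information described above, the sketch below summarises a small workflow log in Python. The log layout, the field names (vendor, analyst, raised, resolved) and the sample rows are all hypothetical; a real EDM platform would expose its own audit records.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical workflow log: one row per corrected price exception.
WORKFLOW_LOG = [
    {"vendor": "VendorA", "analyst": "alice", "raised": "2016-10-03T09:00", "resolved": "2016-10-03T09:20"},
    {"vendor": "VendorB", "analyst": "bob", "raised": "2016-10-03T09:05", "resolved": "2016-10-03T10:05"},
    {"vendor": "VendorA", "analyst": "bob", "raised": "2016-10-04T11:00", "resolved": "2016-10-04T11:30"},
    {"vendor": "VendorB", "analyst": "alice", "raised": "2016-10-04T14:00", "resolved": "2016-10-04T14:10"},
    {"vendor": "VendorB", "analyst": "alice", "raised": "2016-10-05T08:00", "resolved": "2016-10-05T08:30"},
]


def minutes_to_resolve(row):
    """Elapsed minutes between an exception being raised and resolved."""
    raised = datetime.fromisoformat(row["raised"])
    resolved = datetime.fromisoformat(row["resolved"])
    return (resolved - raised).total_seconds() / 60


def mean_resolution_by_analyst(log):
    """Average correction time in minutes, keyed by analyst."""
    durations = defaultdict(list)
    for row in log:
        durations[row["analyst"]].append(minutes_to_resolve(row))
    return {analyst: mean(values) for analyst, values in durations.items()}


def errors_by_vendor(log):
    """Number of erroneous prices traced back to each vendor."""
    counts = defaultdict(int)
    for row in log:
        counts[row["vendor"]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(mean_resolution_by_analyst(WORKFLOW_LOG))
    print(errors_by_vendor(WORKFLOW_LOG))
```

Raw error counts only tell part of the story: to compare vendors fairly, the counts would normally be divided by the number of prices each vendor supplied, which this sketch leaves out.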
Some of our clients have even started taking the next step – not only using management information to improve processes, but also using other sources of data to prioritise which potentially erroneous data items to fix first. One of our customers linked its trade position databases into our architecture to rank the potential dollar impact associated with each exception. Another wanted its exception handling prioritised based on its clients’ holdings: exceptions that potentially impacted their largest, most important clients would be the first to be investigated.
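One simple way to picture this kind of prioritisation is sketched below in Python. It is an assumption-laden illustration, not a description of any client's implementation: the PriceException class, the POSITIONS table and the impact formula (position size multiplied by the gap between the suspect price and the last accepted price) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PriceException:
    instrument: str
    suspect_price: float    # the price that failed validation
    last_good_price: float  # the last price that was accepted


# Hypothetical position quantities, as might be read from a trade position database.
POSITIONS = {"BOND_X": 1_000_000, "EQUITY_Y": 5_000, "FUT_Z": 200}

# Hypothetical open exceptions awaiting investigation.
EXCEPTIONS = [
    PriceException("EQUITY_Y", 52.0, 50.0),
    PriceException("BOND_X", 101.5, 101.0),
    PriceException("FUT_Z", 1300.0, 1200.0),
]


def dollar_impact(exc, positions):
    """Assumed impact measure: position size times the absolute price gap."""
    quantity = positions.get(exc.instrument, 0)
    return abs(quantity) * abs(exc.suspect_price - exc.last_good_price)


def prioritise(exceptions, positions):
    """Order exceptions so the largest potential impact is investigated first."""
    return sorted(exceptions, key=lambda e: dollar_impact(e, positions), reverse=True)


if __name__ == "__main__":
    for exc in prioritise(EXCEPTIONS, POSITIONS):
        print(exc.instrument, dollar_impact(exc, POSITIONS))
```

Ranking by client holdings instead would follow the same pattern, with the sort key swapped for a measure of how much of the affected instrument the firm's largest clients hold.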
It’s an old cliché, but the saying ‘you can’t manage what you can’t measure’ still holds a lot of truth. Accurate data and analytics about the data management process itself can guide management decisions and drive better results, whether that means more efficient processes or higher quality data.
Second, recording workflows can provide data consumers with better insights into the content they are sourcing. If everyone within an enterprise knew where each data item was sourced from, which validation rules it passed and failed, how it was derived, interpolated or normalised, and who executed those processes, it could help clear up a lot of confusion.
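To make the idea concrete, the lineage details listed above could be captured in a small record attached to each data item. The Python sketch below is purely illustrative: the LineageRecord and ValidationResult classes, their field names and the sample values are assumptions, not the schema of any particular EDM product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    rule: str
    passed: bool


@dataclass
class LineageRecord:
    instrument: str
    value: float
    source: str       # where the data item was sourced from
    executed_by: str  # who ran the process
    validations: List[ValidationResult] = field(default_factory=list)
    transformations: List[str] = field(default_factory=list)  # e.g. derived, interpolated, normalised

    def failed_rules(self):
        """Names of the validation rules this data item did not pass."""
        return [v.rule for v in self.validations if not v.passed]

    def summary(self):
        """One-line description a data consumer could read next to the value."""
        steps = ", ".join(self.transformations) or "none"
        failed = ", ".join(self.failed_rules()) or "none"
        return (f"{self.instrument}={self.value} from {self.source}; "
                f"transformations: {steps}; failed rules: {failed}; "
                f"executed by {self.executed_by}")


# Hypothetical example record.
RECORD = LineageRecord(
    instrument="BOND_X",
    value=101.25,
    source="VendorA",
    executed_by="alice",
    validations=[
        ValidationResult("price_tolerance_check", True),
        ValidationResult("stale_price_check", False),
    ],
    transformations=["interpolated", "normalised"],
)

if __name__ == "__main__":
    print(RECORD.summary())
```

Storing the record alongside the value, rather than in a separate log, means a consumer who spots a mismatch with another source can see straight away how the approved figure was produced.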
Even though a data item may have been approved as a golden record, some may still want to know the process it went through to achieve that status, particularly when they see another source that doesn’t match.
Keeping accurate records of data lineage can also enable certain groups to agree to disagree. Ultimately, the value of any asset can be subjective. Slight nuances in models and assumptions can result in more pronounced discrepancies in valuations. Simply having insight into how a price was calculated can therefore help reconcile discrepancies between different values for the same instrument.
So it is not just management that can benefit from insights into data processes. End users of the data can also benefit.
Visualising the Benefits
As industry participants begin collecting more data to guide data management processes, and more data to describe the origins of gold copy data, we will also need better tools to visualise all of it. Visualisation is really about helping end users (or management) spot patterns and anomalies in datasets, and in doing so, turning data into more insightful information on which they can base decisions.
Ultimately, the combination of data, metadata and visualisation tools can create a positive feedback loop, with the end goal being an improvement in both the quality of our market and reference data, and our processes.