The Missing Data Gap
October 16, 2012
Getting to the heart of "Data Management for Risk", PRMIA held an event entitled "Missing Data for Risk Management Stress Testing" at Bloomberg's New York HQ last night. For those of you who are unfamiliar with the topic of "Data Management for Risk", then the following diagram may help to further explain how the topic is to do with all the data sets feeding the VaR and scenario engines.
I have a vested interest in saying this (and please forgive the product placement in the diagram above, but hey this is what we do…), but the topic of data management for risk seems to fall into a functionality gap between: i) the risk system vendors who typically seem to assume that the world of data is perfect and that the topic is too low level to concern them and ii) the traditional data management vendors who seem to regard things like correlations, curves, spreads, implied volatilities and model parameters as too business domain focussed (see previous post on this topic) As a result, the risk manager is typically left with ad-hoc tools like spreadsheets and other analytical packages to perform data validation and filling of any missing data found. These ad-hoc tools are fine until the data universe grows larger, leading to the regulators becoming concerned about just how much data is being managed "out of system" (see past post for some previous thoughts on spreadsheets).
The Crisis and Data Issues. Anyway enough background above and on to some of the issues raised at the event. Navin Sharma of Western Asset Management started the evening by saying that pre-crisis people had a false sense of security around Value at Risk, and that crisis showed that data is not reliably smooth in nature. Post-crisis, then questions obviously arise around how much data to use, how far back and whether you include or exclude extreme periods like the crisis. Navin also suggested that the boards of many financial institutions were now much more open to reviewing scenarios put forward by the risk management function, whereas pre-crisis their attention span was much more limited.
Presentation. Don Wesnofske did a great presentation on the main issues around data and data governance in risk (which I am hoping to link to here shortly…)
Issues with Sourcing Data for Risk and Regulation. Adam Litke of Bloomberg asked the panel what new data sourcing challenges were resulting from the current raft of regulation being implemented. Barry Schachter cited a number of Basel-related examples. He said that the costs of rolling up loss data across all operations was prohibitative, and hence there were data truncation issues to be faced when assessing operational risk. Barry mentioned that liquidity calculations were new and presenting data challenges. Non centrally cleared OTC derivatives also presented data challenges, with initial margin calculations based on stressed VaR. Whilst on the subject of stressed VaR, Barry said that there were a number of missing data challenges including the challenge of obtaining past histories and of modelling current instruments that did not exist in past stress periods. He said that it was telling on this subject that the Fed had decided to exclude tier 2 banks from stressed VaR calculations on the basis that they did not think these institutions were in a position to be able to calculate these numbers given the data and systems that they had in place.
Barry also mentioned the challenges of Solvency II for insurers (and their asset managers) and said that this was a huge exercise in data collection. He said that there were obvious difficulties in modelling hedge fund and private equity investments, and that the regulation penalised the use of proxy instruments where there was limited "see-through" to the underlying investments. Moving on to UCITS IV, Barry said that the regulation required VaR calculations to be regularly reviewed on an ongoing basis, and he pointed out one issue with much of the current regulation in that it uses ambiguous terms such as models of "high accuracy" (I guess the point being that accuracy is always arguable/subjective for an illiquid security).
Sandhya Persad of Bloomberg said that there were many practical issues to consider such as exchanges that close at different times and the resultant misalignment of closing data, problems dealing with holiday data across different exchanges and countries, and sourcing of factor data for risk models from analysts. Navin expanded more on his theme of which periods of data to use. Don took a different tack, and emphasised the importance of getting the fundamental data of client-contract-product in place, and suggested that this was a big challenge still at many institutions. Adam closed the question by pointing out the data issues in everyday mortgage insurance as an example of how prevalant data problems are.
What Missing Data Techniques Are There? Sandhya explained a few of the issues her and her team face working at Bloomberg in making decisions about what data to fill. She mentioned the obvious issue of distance between missing data points and the preceding data used to fill it. Sandhya mentioned that one approach to missing data is to reduce factor weights down to zero for factors without data, but this gave rise to a data truncation issue. She said that there were a variety of statistical techniques that could be used, she mentioned adaptive learning techniques and then described some of the work that one of her colleagues had been doing on maximum-likehood estimation, whereby in addition to achieving consistency with the covariance matrix of "near" neighbours, that the estimation also had greater consistency with the historical behaviour of the factor or instrument over time.
Navin commented that fixed income markets were not as easy to deal with as equity markets in terms of data, and that at sub-investment grade there is very little data available. He said that heuristic models where often needed, and suggested that there was a need for "best practice" to be established for fixed income, particularly in light of guidelines from regulators that are at best ambiguous.
I think Barry then made some great comments about data and data quality in saying that risk managers need to understand more about the effects (or lack of) that input data has on the headline reports produced. The reason I say great is that I think there is often a disconnect or lack of knowledge around the effects that input data quality can have on the output numbers produced. Whilst regulators increasingly want data "drill-down" and justfication on any data used to calculate risk, it is still worth understanding more about whether output results are greatly sensitive to the input numbers, or whether maybe related aspects such as data consistency ought to have more emphasis than say absolute price accuracy. For example, data quality was being discussed at a recent market data conference I attended and only about 25% of the audience said that they had ever investigated the quality of the data they use. Barry also suggested that you need to understand to what purpose the numbers are being used and what effect the numbers had on the decisions you take. I think here the distinction was around usage in risk where changes/deltas might be of more important, whereas in calculating valuations or returns then price accuracy might receieve more emphasis.
How Extensive is the Problem? General consensus from the panel was that the issues importance needed to be understood more (I guess my experience is that the regulators can make data quality important for a bank if they say that input data issues are the main reason for blocking approval of an internal model for regulatory capital calculations). Don said that any risk manager needed to be able to justify why particular data points were used and there was further criticism from the panel around regulators asking for high quality without specifying what this means or what needs to be done.
Summary – My main conclusions:
- Risk managers should know more of how and in what ways input data quality affects output reports
- Be aware of how your approach to data can affect the decisions you take
- Be aware of the context of how the data is used
- Regulators set the "high quality" agenda for data but don't specify what "high quality" actually is
- Risk managers should not simply accept regulatory definitions of data quality and should join in the debate
Great drinks and food afterwards (thanks Bloomberg!) and a good evening was had by all, with a topic that needs further discussion and development.