Big Data – What is its Value to Risk Management?
February 8, 2013
A little late on these notes from this PRMIA Event on Big Data in Risk Management that I helped to organize last month at the Harmonie Club in New York. Big thank you to my PRMIA colleagues for taking the notes and for helping me pull this write-up together, plus thanks to Microsoft and all who helped out on the night.
Navin Sharma (of Western Asset Management and Co-Regional Director of PRMIA NYC)
introduced the event and began by thanking Microsoft for its support in
sponsoring the evening. Navin outlined how he thought the advent of “Big Data”
technologies was very exciting for risk management, opening up opportunities to
address risk and regulatory problems that previously might have been considered
out of reach.
Navin defined Big Data as structured or unstructured data received at high
volumes and requiring very large data storage. Its characteristics include a
high velocity of record creation, extreme volumes, a wide variety of data
formats, variable latencies, and complexity of data types.
Additionally, he noted that relative to other industries, financial services
has historically created perhaps the largest sets of data and continues to
create enormous amounts of data on a daily or moment-by-moment basis. Examples
include options data, high frequency trading, and unstructured data such as
that from social media. Its usage provides potential competitive advantages in
trading and investment management. Also, by using Big Data it is possible to
have faster and more accurate recognition of potential risks via seemingly
disparate data, leading to timelier and more complete risk management of
investments and firms’ assets. Finally, the use of Big Data technologies is in
part being driven by regulatory pressures from Dodd-Frank, Basel III, Solvency
II, and the Markets in Financial Instruments Directives (MiFID 1 & 2).
Navin also noted that the event would seek to answer questions such as:
- What is the impact of Big Data on asset management?
- How can Big Data enhance risk management?
- How is Big Data used to enhance operational risk management?
Presentation 1: Big Data – What Is It and Where Did It Come From?
The first presentation was given by Michael Di Stefano (of Blinksis
Technologies), and was titled “Big Data. What is it and where did it come
from?”.
You can find a copy of Michael’s presentation here.
In summary, Michael started by saying that there are many definitions of Big
Data, but that it is mainly defined as technology that deals with data problems
that are either too large, too fast or too complex for conventional database technology.
Michael briefly touched upon the many different technologies within Big Data,
such as Hadoop and MapReduce, and databases such as Cassandra and MongoDB. He
described some of the origins of Big Data technology in internet search, social
networks and other fields. Michael described the “4 V’s” of Big Data: Volume,
Velocity, Variety and Value; a key point from Michael was “time to Value” in
terms of what you are using Big Data for. Michael concluded his talk with some
business examples around the use of sentiment analysis in financial markets and
the application of Big Data to real-time trading surveillance.
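As a quick illustration of the MapReduce programming model Michael mentioned, here is a minimal sketch in Python; the sample records are hypothetical message text, not data from the talk:

```python
from collections import defaultdict

# Toy MapReduce-style word count, the canonical example of the
# programming model that Hadoop popularized. Illustrative only.

def map_phase(records):
    """Emit (word, 1) pairs for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the counts for each distinct word (the 'shuffle' and
    'reduce' steps collapsed into one in-memory pass)."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Hypothetical message records standing in for the social media
# data discussed in the talk.
records = [
    "market sentiment positive on earnings",
    "negative sentiment on market volatility",
]
print(reduce_phase(map_phase(records)))
# e.g. {'market': 2, 'sentiment': 2, 'positive': 1, ...}
```

In a real Hadoop deployment the map and reduce steps would run in parallel across a cluster, with the framework handling the distribution of data and the shuffling of intermediate pairs.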
Presentation 2: Big Data Strategies for Risk Management
The second presentation, “Big Data Strategies for Risk Management”, was
introduced by Colleen Healy of Microsoft.
Colleen started by saying expectations of risk management are rising, and that
prior to 2008 not many institutions had a good handle on the risks they were
taking. Risk analysis needs to be done across multiple asset types, more
frequently and at ever greater granularity. Pressure is coming from everywhere
including company boards, regulators, shareholders, customers, counterparties
and society in general. Colleen used to head investor relations at Microsoft
and put forward a number of points:
- A long line of sight on one risk factor does not mean that we have a line of sight on other risks around it.
- Good risk management should be based on simple principles.
- Reliance on 3rd parties for understanding risk should be minimized.
- Understand risk not just at the individual asset level, but also at the correlated asset level.
- The world is full of fast markets, driving even more need for risk control.
- Intraday and real-time risk measurement is now becoming necessary for line of sight and for dealing with the regulators.
- There is now a need to look at risk management at a more granular level.
Colleen explained some of the reasons why good risk management remains a work
in progress, noting that data is a key foundation for better risk management
but has been hard to access, analyze, visualize and understand. She used this
to link to the next part of the presentation, given by Denny Yu of Numerix.
Denny explained that new regulations involving measures such as Potential
Future Exposure (PFE) and Credit Value Adjustment (CVA) were moving the number
of calculations needed in risk management to a level well above that required
by methodologies such as Value at Risk (VaR). Denny illustrated how a typical
VaR calculation on a reasonably sized portfolio might need 2,500,000 instrument
valuations, whereas PFE might require as many as 2,000,000,000. He then
explained more of the architecture he would see as optimal for such a process
and illustrated some of the analysis he had done using Excel spreadsheets
linked to Microsoft’s high performance computing technology.
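To give a feel for where these orders of magnitude come from, a back-of-the-envelope sketch is shown below; the portfolio size, scenario count, path count and time steps are assumptions chosen to reproduce the headline numbers, not figures from Denny’s talk:

```python
# Hypothetical illustration of why PFE dwarfs VaR computationally.
# All counts below are assumed for illustration only.

instruments = 5_000            # assumed portfolio size

# Historical-simulation VaR: revalue every instrument under each
# historical scenario (e.g. roughly two years of daily moves).
var_scenarios = 500
var_valuations = instruments * var_scenarios
print(f"VaR valuations: {var_valuations:,}")   # 2,500,000

# PFE: revalue every instrument on every Monte Carlo path at every
# future time step out to the portfolio's horizon.
mc_paths = 2_000
time_steps = 200
pfe_valuations = instruments * mc_paths * time_steps
print(f"PFE valuations: {pfe_valuations:,}")   # 2,000,000,000
```

The extra factors of paths and time steps are what push exposure measures like PFE and CVA into territory where Big Data and high performance computing approaches start to make sense.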
Presentation 3: Big Data in Practice – Unintentional Portfolio Risk
Kevin Chen of Opera Solutions gave the third presentation, titled
“Unintentional Risk via Large-Scale Risk Clustering”. You can find a copy of
the presentation here.
In summary, the presentation was quite visual, illustrating how large-scale
empirical analysis of portfolio data could produce some interesting insights
into portfolio risk and how risks become “clustered”. In many ways the analysis
was reminiscent of an empirical form of principal component analysis, i.e. one
where you can see and understand more about your portfolio’s risk without
necessarily being able to relate the main factors directly to any traditional factors.
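For readers unfamiliar with the technique, below is a minimal sketch of principal component analysis applied to simulated returns; it is illustrative only and not Opera Solutions’ method. Large leading eigenvalues of the correlation matrix signal groups of assets moving together, i.e. clustered risk:

```python
import numpy as np

# Minimal sketch of empirical factor extraction via PCA.
# The returns are simulated; in practice these would come from
# large-scale portfolio or market data.
rng = np.random.default_rng(0)
n_days, n_assets = 500, 20

# Plant one hidden common factor plus idiosyncratic noise so that
# a "risk cluster" exists for the PCA to find.
common = rng.normal(size=(n_days, 1))
loadings = rng.normal(size=(1, n_assets))
returns = common @ loadings + 0.5 * rng.normal(size=(n_days, n_assets))

# Eigen-decompose the correlation matrix; np.linalg.eigh returns
# eigenvalues in ascending order, so reverse for the largest first.
corr = np.corrcoef(returns, rowvar=False)
eigvals, _ = np.linalg.eigh(corr)
explained = eigvals[::-1] / eigvals.sum()
print("Variance explained by top 3 components:", explained[:3].round(2))
```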
Panel Discussion
Brian Sentance of Xenomorph and the PRMIA NYC Steering Committee then moderated
a panel discussion. The first question was directed at Michael: “Is the
relational database dead?” Michael replied that in his view relational
databases were not dead; indeed, for problems well-suited to relational
representation they were still, and would continue to be, very good. Michael
said that NoSQL/Big Data technologies were complementary to relational
databases, dealing with new types of data and new sizes of problem that
relational databases are not well designed for. Brian asked Michael whether the
advent of these new database technologies would drive the relational database
vendors to extend the capabilities and performance of their offerings. Michael
replied that he thought this was highly likely, but that only time would tell
whether this approach would be successful given the innovation in the market at
the moment. Colleen Healy added that the advent of Big Data did not mean
throwing out established technology, but rather integrating established
technology with the new, such as Microsoft SQL Server working with the Hadoop
framework.
Brian asked the panel whether they thought visualization would make a big
impact within Big Data. Ken Akoundi said that the front-end applications used
to make the data/analysis more useful would evolve very quickly. Brian asked
whether this would be reminiscent of the days when VaR first appeared, when a
single number arguably became a false proxy for risk measurement and
management. Ken replied that the size of the data problem had increased
massively from when VaR was first used in 1994, and that visualization and
other automated techniques were very much needed if the headache of capturing,
cleansing and understanding data was to be addressed.
Brian asked whether Big Data would address the data integration issue of siloed
trading systems. Colleen replied that Big Data needs to work across all the
silos found in many financial organizations, or it isn’t “Big Data”. There was
general consensus from the panel that legacy systems and people politics were
also behind some of the difficulties in addressing data silos.
Brian asked if the panel thought the skills needed in risk management would
change due to Big Data. Colleen replied that effective Big Data solutions
require all kinds of people, with skills across a broad range of specific
disciplines such as visualization. Generally the panel thought that data and
data analysis would play an increasingly important part in risk management. Ken
put forward his view that all Big Data problems should start with a business
problem, not just a technology focus; for example, are there better ways to
predict stock market movements based on the consumption of larger and more
diverse sources of information? In terms of risk management skills, Denny said
that risk management of 15 years ago was based on relatively simple
econometrics. Fast forward to today, and risk calculations such as CVA are
statistically and computationally very heavy, and trading is increasingly
automated across all asset classes. As a result, Denny suggested that even the
PRMIA PRM syllabus should change to focus more on data and data technology,
given the importance of data to risk management.
Asked how best Big Data should be applied, Denny echoed Ken in saying that
understanding the business problem first was vital, but that obviously Big Data
opened up the capability to aggregate and work with larger datasets than ever
before. Brian then asked what advice the panel would give to risk managers
faced with an IT department about to embark upon using Big Data technologies.
Michael said that, assuming the business problem is well understood, the
business needs some familiarity with the broad concepts of Big Data, what it
can and cannot do, and how it fits with more mainstream technologies. Colleen
said that there are some problems
that only Big Data can solve, so understanding the technical need is a first
checkpoint. Obviously IT people like working with new technologies and this
needs to be monitored, but so long as the business problem is defined and valid
for Big Data, people should be encouraged to learn new technologies and new
skills. Kevin also took a very positive view that IT departments should be encouraged to experiment with these new
technologies and understand what is possible, but that projects should have
well-defined assessment/cut-off points as with any good project management to
decide if the project is progressing well. Ken put forward that many IT staff
were new to the scale of the problems being addressed with Big Data, and that
his own company, Opera Solutions, had an advantage in its deep expertise in
large-scale data integration, enabling it to deliver more quickly on project timelines.
There then followed a number of audience questions. The first few related to other ideas/kinds of problems that could be
analyzed using the kind of modeling that Opera had demonstrated. Ken said
that there were obvious extensions that Opera had not got around to doing just
yet. One audience member asked how well all the Big Data analysis could be
aggregated and presented so as to be understandable and usable to humans. Denny
suggested that it was vital that such analysis was made accessible to the user,
and there was general consensus across the panel that man vs. machine was an
interesting issue to develop in considering what is possible with Big Data. The
next audience question was
around whether all of this data analysis
was affordable from a practical point of view. Brian pointed out that there was
a lot of waste in current industry practice, with duplication of ticker plants
and other data infrastructure across many financial institutions, large and
small. This duplication was driven primarily by the perceived need to implement
each institution’s proprietary analysis techniques, and that kind of
customization was not yet available from the major data vendors, though it
would become more possible as cloud technology such as Microsoft’s Azure
developed further. There was a lot of audience interest in whether Big Data could lead to better
understanding of causal relationships in markets rather than simply
correlations. The panel responded that causal relationships were harder to
understand, particularly in a dynamic market with dynamic relationships, but
that insight into correlation was at the very least useful and could lead to
better understanding of the drivers as more datasets were analyzed.