"There will be financial crashes as long as there are financial markets." When the market falls, most people lose money... those who can predict it can preserve their investments or take hazardous short positions to profit (a nevertheless stressful situation to be in, as depicted in the Big-short).
A market asset is linked to a dynamical system, whose price changes depending on the information provided. A wide range of information determines the price of an item in a financial market, and in the efficient market hypothesis, a simple change in the information is promptly priced in.
Financial systems have dynamics that are similar to those of physical systems.
We can distinguish a regular market regime from a chaotic one, much as phase transitions separate solids, liquids, and gases.
Observations suggest that financial crashes are preceded by a period of increasing oscillation in asset prices. This phenomenon abnormally changes the geometric structure of the time series.
In this post, we'll show how to use topological data analysis (TDA) to capture these geometric changes in time series and create a reliable stock market crash detector. Thanks to Giotto-TDA, an open-source library for topological data analysis, the code implements Gidea and Katz's ideas.
On the surface, stock market crashes appear as a sharp drop in asset prices. The drop is caused by widespread asset sales, as investors attempt to liquidate their holdings before prices fall further.
Markets crash when there is awareness of a massive speculative bubble (as in the case of subprime mortgages) or when a catastrophic event occurs. Two big crashes have occurred in the past two decades: the dot-com crash of 2000 and the global financial crisis of 2008.
From 1980 to the present, we look at daily prices of the S&P 500 index. The S&P 500 is a stock index that tracks the performance of 500 large-cap US firms and is widely used to gauge the state of the financial market.
Compared to a simple baseline, we find that topological signals tend to be robust to noise and hence less prone to produce false positives.
This exemplifies one of TDA's main motivations: topology and geometry may be a strong tool for abstracting subtle structure in complex data.
Let's take a closer look at the two approaches.
Because market crashes are marked by a sharp drop in stock prices, one simple method for detecting them is to track the first derivative of average price values over a rolling window. Indeed, as seen in the graph below, this naive technique already catches the 1987 Black Monday crash, the bursting of the dot-com bubble (2000–2004), and the financial crisis (2007–2008).
By normalising this time series to take values in the [0, 1] interval, we can use a threshold to flag points on our original time series where a crash happened.
Following this advice would result in over-panicking and selling your assets too quickly, as too many points are labelled as crashes. Let's see if TDA can help us reduce the noise in the signal and build a more reliable detector!
The mathematics underpinning TDA is complex and will not be explored in this post; instead, we recommend reading this summary. For our purposes, TDA can be thought of as a way of extracting informative features that can be used for downstream modelling.
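As a rough illustration, here is a minimal sketch of this baseline in Python, assuming the closing prices live in a pandas Series called prices (the function name, window length, and threshold are illustrative, not taken from the original code):

```python
import pandas as pd

def baseline_crash_indicator(prices: pd.Series, window: int = 30, threshold: float = 0.3) -> pd.Series:
    """Flag crash-like points from the first derivative of a rolling-mean price."""
    smoothed = prices.rolling(window).mean()      # average price over a rolling timeframe
    derivative = smoothed.diff()                  # first derivative: day-to-day change
    drops = derivative.clip(upper=0.0).abs()      # keep only the downward moves
    normalised = (drops - drops.min()) / (drops.max() - drops.min())  # rescale to [0, 1]
    return normalised > threshold                 # boolean crash flags
```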
The pipeline we developed consists of the following steps: 1) embedding the time series into a point cloud and constructing sliding windows of point clouds, 2) building a filtration on each window to obtain an evolving structure encoding its geometrical shape, 3) extracting the relevant features of those windows using persistent homology, 4) comparing each window to the next by measuring the difference between those features, and 5) constructing a crash indicator based on those distances.
The generation of a simplicial complex from a point cloud is a common starting point in a TDA pipeline. In time series applications, the key question is therefore how to construct such point clouds. Discrete time series, such as the ones we're looking at, are usually visualised as two-dimensional scatter plots. By scanning the plot from left to right, this representation makes it easy to track the time series' local behaviour. However, it is often poor at conveying significant effects that may occur over longer time spans.
Fourier analysis is a well-known set of techniques for capturing periodic behaviour. The discrete Fourier transform of a temporal window over a time series, for example, can reveal if the signal in that window is the result of the sum of a few simple periodic signals.
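For instance, the discrete Fourier transform of a window takes a single NumPy call. In this toy sketch (the signal and window length are illustrative), the transform recovers the two periodic components the window was built from:

```python
import numpy as np

t = np.arange(256)
signal = np.sin(2 * np.pi * t / 32) + 0.5 * np.sin(2 * np.pi * t / 8)  # sum of two simple periodic signals
window = signal[:128]                              # one temporal window over the series
spectrum = np.fft.rfft(window)                     # discrete Fourier transform of the window
top_bins = np.argsort(np.abs(spectrum))[::-1][:2]  # the two strongest frequency bins
print(top_bins)  # [ 4 16]: periods of 128/4 = 32 and 128/16 = 8 samples
```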
For our purposes, we propose an alternative way of encoding a time-evolving process. It is based on the idea that some key features of the dynamics can be revealed more clearly in higher dimensions. We'll start by showing how to represent a univariate time series as a point cloud, i.e. a set of vectors in a Euclidean space of arbitrary dimension.
The procedure is as follows: we choose two integers, d and τ. For each time t_i (t_0, t_1, ...), we collect the values of the variable y at d distinct times, evenly spaced by τ and starting at t_i, and present them as a vector with d entries, namely:

Y_i = (y(t_i), y(t_i + τ), ..., y(t_i + (d − 1)τ))

The result is a set of vectors in d-dimensional space! Here d is called the embedding dimension, and τ the time delay parameter.
This time-delay embedding approach is also known as Takens' embedding, after Floris Takens, who established its importance in the context of nonlinear dynamical systems with a famous theorem.
Finally, applying this procedure to each sliding window independently over the entire time series produces a time series of point clouds (one per sliding window) with potentially interesting topologies. In the GIF below, you can see how a 2-dimensional point cloud is created.
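As a concrete sketch, the time-delay embedding itself is a few lines of NumPy (the parameter values here are illustrative; Giotto-TDA also ships this transformation, including automatic search for d and τ):

```python
import numpy as np

def takens_embedding(y: np.ndarray, dimension: int, delay: int) -> np.ndarray:
    """Map a 1-D series y to vectors (y[i], y[i+delay], ..., y[i+(dimension-1)*delay])."""
    n_vectors = len(y) - (dimension - 1) * delay
    return np.stack(
        [y[k : k + n_vectors] for k in range(0, dimension * delay, delay)], axis=1
    )

y = np.sin(np.linspace(0, 8 * np.pi, 200))          # a toy periodic series
cloud = takens_embedding(y, dimension=2, delay=10)  # 2-D point cloud tracing a closed loop
print(cloud.shape)  # (190, 2)
```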
What can we do with this information now that we know how to create a time series of point clouds? Enter persistent homology, which searches for topological features of a simplicial complex that persist across a range of parameter values. Typically, a feature such as a hole will be absent at first, then appear, and finally disappear again at some larger parameter value.
Given two windows and their corresponding persistence diagrams, we can calculate a number of distance metrics between them. Here we compare two such distances: one based on the persistence landscape concept, the other on Betti curves.
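As a sketch of this step, assuming Giotto-TDA's VietorisRipsPersistence transformer (a documented part of the library, though the parameters here are illustrative rather than the authors' exact settings), persistence diagrams for a batch of windowed point clouds can be computed as follows:

```python
import numpy as np
from gtda.homology import VietorisRipsPersistence

# clouds: one point cloud per sliding window, shape (n_windows, n_points, d)
clouds = np.random.random((10, 50, 2))  # toy stand-in for the embedded windows

# Track connected components (H0) and loops (H1) across the filtration
persistence = VietorisRipsPersistence(homology_dimensions=[0, 1])
diagrams = persistence.fit_transform(clouds)
print(diagrams.shape)  # (n_windows, n_features, 3): birth, death, homology dimension
```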
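Continuing from the previous sketch, and again assuming Giotto-TDA's PairwiseDistance transformer, whose documented metric options include "landscape" and "betti" (computing the full pairwise matrix is wasteful when only consecutive windows matter, but it keeps the sketch short):

```python
import numpy as np
from gtda.diagrams import PairwiseDistance

# diagrams: the persistence diagrams computed in the previous sketch, one per window
landscape_dist = PairwiseDistance(metric="landscape").fit_transform(diagrams)
betti_dist = PairwiseDistance(metric="betti").fit_transform(diagrams)

# The distance between consecutive windows gives a 1-D signal over time
consecutive = np.array([landscape_dist[i, i + 1] for i in range(len(landscape_dist) - 1)])
```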
We can deduce from these figures that the landscape distance measure is less noisy than the one based on Betti curves.
Just as we did with the baseline model, it's straightforward to normalise the landscape distance between consecutive windows and use it as our topological feature (a sketch follows below). The detection of stock market crashes for the dot-com bubble and the global financial crisis is shown below. Compared to our simple baseline, using topological features appears to reduce the noise in the signal of interest.
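To close the loop, the same normalise-and-threshold step as the baseline applies to the topological signal (the threshold value is again illustrative), continuing from the consecutive landscape distances above:

```python
# Continuing from the consecutive landscape distances computed earlier
normalised = (consecutive - consecutive.min()) / (consecutive.max() - consecutive.min())
crash_flags = normalised > 0.3  # illustrative threshold on the topological signal
```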
Our findings show that the high-volatility periods preceding a crash produce geometric signals that can be recognised more reliably with topological data analysis. However, because these findings are limited to a single market and a relatively short time period, more research into the procedure's robustness across different markets and threshold choices is needed. Nonetheless, the results are encouraging and suggest several promising directions for further research.