Johansen Test in Mathematica

Posted 7 years ago

A post from five years ago, How to run Johansen Test in Mathematica, requested the code for the Johansen test in Mathematica. However, the verbeia.com code that was offered had problems (incorrectly normalized eigenvectors, computational errors). As a better alternative, I'd like to post my Johansen test code here, which I believe is correct. I've compared the output of this code with the output of the MATLAB Johansen code in the Spatial Econometrics library, and they agree. I've added my Mathematica code as an attachment to this post, "JohansenTest.nb".

The code includes a few subroutines that allow the output from the Johansen test to be displayed in a nice tabular form, such as:

[Image: Johansen Test Output]

This table shows the results for a cointegrated portfolio of three Exchange Traded Funds (ETFs). Both the trace and eigen statistics reject the null hypotheses r <= 0 and r <= 1, indicating two cointegrating relationships (at > 99% confidence, except for the eigen statistic for r <= 0, which is significant at > 95% confidence).
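
For reference, these are the standard Johansen statistics. With $T$ effective observations and estimated eigenvalues $\hat\lambda_1 \ge \dots \ge \hat\lambda_n$ of the Johansen eigenproblem (here $n = 3$ ETFs), the statistics for the null hypothesis "rank $\le r$" are:

$$
\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\left(1 - \hat\lambda_i\right),
\qquad
\lambda_{\max}(r) = -T \ln\left(1 - \hat\lambda_{r+1}\right)
$$

Each statistic is compared with its critical value for the chosen deterministic terms; rejecting $r \le 0$ and then $r \le 1$ is what indicates the two cointegrating relationships reported in the table.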

I use this code to generate the weights for a cointegrated portfolio of ETFs, which I've been trading profitably for several months now. I usually set order = 2 and detrend = 1; that seems to give the best results for the portfolios I've looked at. As in Ernie Chan's Algorithmic Trading: Winning Strategies and Their Rationale, I apply a Kalman filter to the ETF data and Johansen weights to improve the trading algorithm's performance. If there is interest, I can discuss that in future posts as well. (Chan's Kalman filter discussion is very incomplete, in my opinion.)

I've left a few optional "debug" statements in the code to allow you to check that the matrices are properly normalized. These lines can be deleted. Note that the Johansen weights are the rows of the eigenvector matrix, not the columns (as in the Spatial Econometrics code). I feel this is more consistent with the way that Mathematica handles vectors and matrices.
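
To make the row/column convention concrete, here is a small sketch with made-up numbers (`evecCols` and `prices` are hypothetical, not output of the attached notebook). A matrix whose columns are cointegrating vectors, as in the Spatial Econometrics code, is transposed so that each row is a weight vector, and the first row is then dotted with a price matrix laid out as rows = ETFs, columns = time steps.

```mathematica
(* hypothetical eigenvector matrix in the column convention: each COLUMN is a cointegrating vector *)
evecCols = {{1., 0.3}, {-1., 0.5}};
(* row convention: each ROW is a vector of portfolio weights *)
weights = Transpose[evecCols];
(* hypothetical prices: rows = ETFs, columns = time steps *)
prices = {{10., 10.5, 10.2}, {9.8, 10.4, 10.1}};
(* portfolio price series built from the first cointegrating vector *)
portfolio = weights[[1]] . prices
```

Because `Dot` treats a vector on the left of a matrix as a row vector, the row convention lets you write `weights[[1]] . prices` directly, which is the same pattern used later in this thread (`JT[[2, 1]].isdata`).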

For detail on the equations on which this code is based, see this 2005 article by B.E. Sorenson: Cointegration.
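
For readers who want to see the core computation without opening the attachment, here is a minimal sketch under stated assumptions. It is not the code in JohansenTest.nb: `johansenSketch` is a hypothetical name, it supports only an unrestricted constant (no detrend option), and it computes no critical values. It follows the textbook steps: regress ΔX_t and X_(t-1) on a constant and k - 1 lagged differences, form the residual moment matrices S00, S01, S11, solve the generalized eigenproblem S10 S00^-1 S01 v = λ S11 v, and scale each eigenvector so that v.S11.v = 1 (the normalization issue mentioned above for the verbeia.com code). Weights are returned as rows.

```mathematica
(* Hypothetical sketch, NOT the attached JohansenTest.nb.
   data: n x T matrix (rows = series, columns = time steps)
   k: VAR order in levels, so k - 1 lagged differences are used
   Deterministic term: an unrestricted constant only. *)
johansenSketch[data_?MatrixQ, k_Integer?Positive] :=
 Module[{y, dy, tt, z0, z1, z2, proj, r0, r1, m, s00, s01, s11,
   a, uinv, c, vals, vecs, ord, v},
  y = N[Transpose[data]];              (* T x n, one row per time step *)
  dy = Differences[y];                 (* dy[[j]] = y[[j + 1]] - y[[j]] *)
  tt = Length[y];
  z0 = dy[[k ;; tt - 1]];              (* Delta X_t *)
  z1 = y[[k ;; tt - 1]];               (* X_(t-1), aligned with z0 *)
  (* short-run regressors: constant plus k - 1 lagged differences *)
  z2 = Table[Join[{1.}, Join @@ Table[dy[[j - i]], {i, 1, k - 1}]], {j, k, tt - 1}];
  proj = PseudoInverse[z2];
  r0 = z0 - z2 . (proj . z0);          (* residuals of Delta X_t *)
  r1 = z1 - z2 . (proj . z1);          (* residuals of X_(t-1) *)
  m = Length[r0];
  s00 = Transpose[r0] . r0/m; s00 = (s00 + Transpose[s00])/2;
  s11 = Transpose[r1] . r1/m; s11 = (s11 + Transpose[s11])/2;
  s01 = Transpose[r0] . r1/m;
  (* solve S10 S00^-1 S01 v == lambda S11 v using S11 == Transpose[u].u *)
  a = Transpose[s01] . Inverse[s00] . s01;
  uinv = Inverse[CholeskyDecomposition[s11]];
  c = Transpose[uinv] . a . uinv;
  {vals, vecs} = Eigensystem[(c + Transpose[c])/2];
  ord = Ordering[vals, All, Greater];  (* largest eigenvalue first *)
  vals = vals[[ord]];
  v = (uinv . #) & /@ vecs[[ord]];     (* map back to the original coordinates *)
  v = #/Sqrt[# . s11 . #] & /@ v;      (* normalize so that v.S11.v == 1 *)
  <|"Eigenvalues" -> vals,
    "Weights" -> v,                    (* one cointegrating vector per row *)
    "Trace" -> Table[-m Total[Log[1 - vals[[r + 1 ;;]]]], {r, 0, Length[vals] - 1}],
    "MaxEigen" -> -m Log[1 - vals],
    "S11" -> s11|>]
```

For example, `johansenSketch[data, 2]["Weights"][[1]] . data` gives the portfolio price series for the first cointegrating vector.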

I welcome any feedback.

POSTED BY: Amanda Gerrish
17 Replies
Posted 7 years ago
POSTED BY: Amanda Gerrish
POSTED BY: Jonathan Kinlay
Posted 6 years ago
POSTED BY: Amanda Gerrish
Posted 6 years ago
POSTED BY: Per Ravn

Amanda has correctly anticipated the direction I was headed in, i.e., to show that regardless of how small the out-of-sample (OOS) period is relative to the in-sample (IS) period, the Johansen procedure by itself is unable to produce a cointegrating vector capable of yielding a portfolio price process that is stationary out of sample. But her iterative Kalman Filter approach is able to cure the problem.

I don't want to gloss over this finding, because it is very important. In our toy problem we know the out-of-sample prices of the constituent ETFs, and can therefore test the stationarity of the portfolio process out of sample. In a real-world application, that discovery could only be made in real time, as the unknown future ETF prices are formed. In that scenario, all the researcher has to go on are the results of the in-sample cointegration analysis, which demonstrate that the first cointegrating vector consistently yields a portfolio price process that is, with high probability, stationary in sample.

The researcher might understandably be persuaded, wrongly, that the same is likely to hold true in future. Only when the assumed cointegration relationship falls apart in real time will the researcher discover that it is not true, incurring significant losses in the process (assuming the research has been translated into some kind of trading strategy).

A great many researchers have been down exactly this path, learning this important lesson the hard way. Nor do additional "safety checks" such as, for example, also requiring high levels of correlation between the constituent processes add much value. They might offer the researcher comfort that a "belt and braces" approach is more likely to succeed, but in my experience it is not the case: the problem of non-stationarity in the out of sample price process persists.

For a more detailed discussion of the problem see this post: Why Statistical Arbitrage Breaks Down

I was hitherto unaware of any methodology for tackling this problem, which is why Amanda's discovery is so important. As she demonstrates in her latest post, the iterative Kalman Filter approach is capable of producing a stationary out of sample process, based on the initial estimates of the cointegrating vector derived from the Johansen procedure.

In fact, Amanda's discovery is important in two fields of econometric research: cointegration theory and the theory of Kalman Filters in modeling inter-asset relationships where, as with the Johansen procedure, KF models have traditionally suffered from difficulties associated with nonstationarity in the out of sample period.

It's a tremendous achievement.

So, despite the fact that Amanda has leapt ahead to the finish line, I shall continue to plod along because, firstly, only by implementing the methodology can I be sure that I have properly and fully understood it and, secondly, as one discovers as one progresses in the field of quantitative research, fine details are often very important. So I am hoping that Amanda will provide additional guidance if I stray too far off piste in the forthcoming exposition.

POSTED BY: Jonathan Kinlay
Posted 7 years ago
POSTED BY: Amanda Gerrish

Before we delve into the Kalman Filter model, it's worth pointing out that the nonstationarity of the out-of-sample estimated portfolio values is not mitigated by adding more in-sample data points and re-estimating the cointegrating vector(s):

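```mathematica
(* data: rows = ETFs, columns = time steps; n = number of in-sample time steps *)
IterateCointegrationTest[data_, n_] :=
  Module[{isdata, osdata, JT, isportprice, osportprice},
   isdata = Take[data, All, n];      (* first n columns: in-sample *)
   osdata = Drop[data, 0, n];        (* remaining columns: out-of-sample *)
   JT = JohansenTest[isdata, 2, 1];  (* order = 2, detrend = 1; JT[[2, 1]] is the first cointegrating vector *)
   isportprice = JT[[2, 1]].isdata;
   osportprice = JT[[2, 1]].osdata;
   (* UnitRootTest returns the p-value for the null hypothesis of a unit root *)
   {UnitRootTest[isportprice], UnitRootTest[osportprice]}];
```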

We continue to add more in-sample data points, reducing the size of the out-of-sample dataset correspondingly. But none of the tests for any of the out-of-sample datasets is able to reject the null hypothesis of a unit root in the portfolio price process:

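```mathematica
(* in-sample length grows from 52*14 + 1 to 52*14 + 50; each point is a unit-root p-value *)
ListLinePlot[
 Transpose@
  Table[IterateCointegrationTest[stockprices, 52*14 + i], {i, 1, 50}],
 PlotLegends -> {"In-Sample", "Out-of-Sample"}]
```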


POSTED BY: Jonathan Kinlay
Posted 7 years ago

A problem with out-of-sample testing is that market structure can shift, so relationships (such as cointegration) may start to break down. One way to minimize this effect is to update your Johansen coefficients more frequently. In backtesting, I update the Johansen coefficients weekly, being careful to use only past data to calculate the current portfolio weights at any time point. (I think this is called "walk forward".) This reflects how I actually use the function in practice. In effect, my out-of-sample period is always one time step. This gives better backtest results, and because I'm avoiding look-ahead bias, they are still valid. That's what I did in the backtest I described in a previous reply. You can even track the trace/eigen statistics over time to make sure that the cointegration is not falling apart.
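
The walk-forward scheme described above can be sketched as follows. `walkForward` is a hypothetical helper, not Amanda's backtest code: at each time step t it uses only columns strictly before t, re-estimates every `step` time steps from a trailing window of `window` columns, and carries the last estimate forward in between. For weekly updates on daily data, step = 5 would be a natural (assumed) choice.

```mathematica
(* Hypothetical walkForward: data is n x T (rows = series, columns = time steps).
   At time t only columns t - window .. t - 1 are used, so there is no look-ahead. *)
walkForward[data_?MatrixQ, window_Integer?Positive, step_Integer?Positive, estimator_] :=
 Module[{tt = Length[First[data]], current = Missing["NotEstimated"]},
  Table[
   (* re-estimate at the first usable time step and every `step` steps after that *)
   If[Mod[t - window - 1, step] == 0,
    current = estimator[data[[All, t - window ;; t - 1]]]];
   {t, current},  (* estimate in force at time t, built from past data only *)
   {t, window + 1, tt}]]
```

With the attached notebook loaded, the estimator could be something like `JohansenTest[#, 2, 1][[2, 1]] &` (assuming, as in `IterateCointegrationTest` in this thread, that `JT[[2, 1]]` is the first cointegrating vector). Having the estimator return the trace/eigen statistics as well would let you track them over time.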

Also, the Kalman filter dynamically adjusts the Johansen weights so that the weighted price series is more stationary.
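
Amanda's own Kalman filter code is not public, so the following is only a generic sketch of the dynamic-regression Kalman filter of the kind described in Chan's book, not her iterative method. The hypothetical `kalmanHedge` treats one price series y as a linear function of the other series plus an intercept, with coefficients that follow a random walk; `delta` sets the state noise and `ve` the observation noise (both are tuning assumptions). The one-step forecast error plays the role of the spread, and its predicted standard deviation can be used to scale entry/exit thresholds.

```mathematica
(* Hypothetical kalmanHedge: generic dynamic regression, NOT Amanda's method.
   y: price series of one asset (length T)
   x: m x T matrix of the other assets' prices (rows = assets, columns = time steps)
   State beta = {intercept, coefficients...} is assumed to follow a random walk. *)
kalmanHedge[y_?VectorQ, x_?MatrixQ, delta_, ve_] :=
 Module[{nb = Length[x] + 1, vw, beta, p, out},
  vw = delta/(1 - delta) IdentityMatrix[nb];  (* state-noise covariance *)
  beta = ConstantArray[0., nb];
  p = ConstantArray[0., {nb, nb}];
  out = Table[
    Module[{xt = Prepend[x[[All, t]], 1.], r, q, e, k},
     r = p + vw;                       (* predicted state covariance *)
     q = xt . r . xt + ve;             (* predicted variance of y[[t]] *)
     e = y[[t]] - xt . beta;           (* one-step forecast error ("spread") *)
     k = r . xt/q;                     (* Kalman gain *)
     beta = beta + k e;                (* updated state estimate *)
     p = r - Outer[Times, k, xt] . r;  (* updated state covariance *)
     {beta, e, Sqrt[q]}],
    {t, Length[y]}];
  <|"Beta" -> out[[All, 1]], "Error" -> out[[All, 2]], "ErrorSD" -> out[[All, 3]]|>]
```

The parameter values in a call such as `kalmanHedge[ys, {xs}, 0.0001, 0.001]` are only illustrative; as the thread stresses, choosing the variance/covariance settings is the critical part.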

POSTED BY: Amanda Gerrish
POSTED BY: Jonathan Kinlay

Amanda, I think you may have hit on something very important. As you point out, the determination of the variance/covariances is critical and the adaptive tuning procedure you recommend appears very successful in stabilizing the portfolio, making it suitable for a stat-arb strategy.

As you saw, I did not use MMA in my own implementation because I felt that Wolfram's approach was somewhat unsympathetic to the needs of the economic researcher (vs. say the requirements of an engineer), compared to the available alternatives. I see that I am not entirely alone in that assessment: here, for instance. So I am delighted that you have successfully implemented this in MMA, presumably using KalmanEstimator(?). Or did you build the model from scratch?

I will run a few tests on your Johansen code and attempt to build a KF model in MMA using some of the ETF pairs/triplets Ernie discusses in his book and compare the results.

Meanwhile, I wondered if you could comment on the following:

1) While the initial trading performance appears very encouraging, what kind of performance results did the backtest produce out of sample?

2) You mention that you update the model using weekly data and then trade it intraday during the following week. So presumably you are getting real-time market data into MMA somehow: via the Finance Platform, perhaps? And do you trade the signals via that platform, or some other way (manually)?

3) One extension that I found quite useful in my own research was to fit a GARCH model to the residuals and use it to determine the trade entry/exit points. But that procedure was probably only useful because of the nonstationarity in the portfolio returns process. If you have succeeded in dealing with that key issue at a more fundamental level, a GARCH extension is probably superfluous.
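
As a rough illustration of point 3, here is a sketch of GARCH(1,1)-scaled entry bands on a spread series `e`, assuming the spread is centered at zero. `garchBands` is hypothetical, the parameters {ω, α, β} are taken as given (they could be fitted separately, for example with `EstimatedProcess` and `GARCHProcess`), and the variance recursion is seeded with the mean of e². The signal is -1 (short the spread) above the upper band, +1 (long) below the lower band, and 0 otherwise.

```mathematica
(* Hypothetical garchBands: e is a zero-centered spread series,
   {omega, alpha, beta} are GARCH(1,1) parameters assumed known, z is the band width *)
garchBands[e_?VectorQ, {omega_, alpha_, beta_}, z_] :=
 Module[{s2, sig, signal},
  (* conditional variance: s2[t] = omega + alpha e[t-1]^2 + beta s2[t-1] *)
  s2 = FoldList[omega + alpha #2^2 + beta #1 &, Mean[e^2], Most[e]];
  sig = Sqrt[s2];
  (* trade only when the spread is more than z conditional standard deviations from zero *)
  signal = MapThread[If[Abs[#1] > z #2, -Sign[#1], 0] &, {e, sig}];
  <|"Sigma" -> sig, "Signal" -> signal|>]
```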

POSTED BY: Jonathan Kinlay
Posted 3 years ago

Hi Amanda,

This post is several years old now, so I don't know if you still follow it, but I'm curious how your strategy performed and whether you've made any modifications or changes to your methodology.

Also, could you implement your iterative weight estimation procedure with Kalman Filter using Mathematica's built-in KalmanEstimator function?

Thank you, Reid

POSTED BY: Reid Frasier
Posted 6 years ago

Amanda - Your help regarding the implementation of the Kalman filter would be greatly appreciated and I fully understand that you don't want to publish your code - I wouldn't either! I'm just about to finalize the index arbitrage backtesting and I'll let you know whether there is any value to be gained. Then I'll start working on your Kalman idea, expecting to get stuck rather soon(!). So if you don't mind, I'll contact you again once I'm on the move with that.

Per

POSTED BY: Per Ravn
Posted 6 years ago

Per,

Thanks for your post. I suppose that large excursions from equilibrium are a risk with mean-reversion strategies. (The underlying distribution is not strictly normal, so "fat tails" mean that large excursions occur more frequently than one might expect.)
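
To put a rough number on the fat-tails point, the snippet below compares the probability of a move larger than four standard deviations under a normal distribution and under a Student t distribution with 4 degrees of freedom rescaled to unit variance. The t distribution is only an illustrative stand-in for fat-tailed spread changes, not a fitted model.

```mathematica
(* two-sided probability of a move larger than 4 standard deviations *)
normalTail = 2 SurvivalFunction[NormalDistribution[0, 1], 4.];
(* StudentTDistribution[4] has variance 4/(4 - 2) = 2, so 4 standard deviations is 4 Sqrt[2] *)
tTail = 2 SurvivalFunction[StudentTDistribution[4], 4. Sqrt[2]];
{normalTail, tTail, tTail/normalTail}
```

The ratio comes out at roughly 76: under this fat-tailed model a four-sigma excursion is about 76 times more likely than the normal distribution suggests.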

In reply to your points:

  1. I'm using the same ETF triplet for my trading because it has a high Johansen score (>> 99%), a modest half-life for mean-reversion, and the three ETFs are very liquid (which is also very important!). I can imagine that trading multiple portfolios (or perhaps larger portfolios of more than three ETFs) would likely reduce risk, but it would also increase transaction costs. My funds are limited (< $100,000), so I haven't pursued that option.

  2. As to arbitrage between an ETF and its components, I would imagine that there would be only limited arbitrage opportunities (because the ETFs track their components pretty closely) which would limit profits, as you suggest. However, I would certainly expect high cointegration. It's just the small excursions from equilibrium that would limit profitability.

  3. Combining strategies is probably a good idea. I hear that that's what the large quant hedge funds do. They have multiple quants pursuing different strategies, and when one strategy is not working, others are. I actually have a trend-following algorithm that I've been using with cryptocurrencies over the past seven months, so I suppose I am "combining strategies" -- even though my total investment in cryptocurrencies is small ($5,000). Unfortunately, the cryptocurrencies had a horrendous sell-off last year. Nevertheless, my algo limited my maximum drawdown to around 20% by mostly keeping me out of the market. I'm hoping that the period of relative stability in the cryptocurrencies in the past few months is a prelude to stronger prices. I'm actually starting to make a small amount of money in the cryptos.
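
Point 1 mentions a modest half-life for mean reversion. One common way to estimate it (the approach popularized in Chan's books; the thread does not say exactly how Amanda computes it) is to regress the change in the portfolio price on its lagged level and take half-life = -ln 2 / λ, where λ is the fitted slope. `halfLife` is a hypothetical helper.

```mathematica
(* Hypothetical halfLife: y is a portfolio price series *)
halfLife[y_?VectorQ] :=
 Module[{dy = Differences[y], lag = Most[y], design, c, lambda},
  (* regress dy[t] = c + lambda y[t-1] by ordinary least squares *)
  design = Transpose[{ConstantArray[1., Length[lag]], lag}];
  {c, lambda} = LeastSquares[design, dy];
  (* lambda < 0 for a mean-reverting series *)
  -Log[2]/lambda]
```

For the noise-free series y_t = 0.9 y_(t-1), this gives 10 ln 2 ≈ 6.93 steps, while the exact discrete half-life is ln 2 / -ln 0.9 ≈ 6.58, so the regression figure is best treated as an approximation.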

I agree with your comments about Chan. I'm grateful that he's illuminated the basic concepts and strategies. His book inspired me to study how to best implement the Kalman filter when trading a cointegrated portfolio, which I decided to share with others. If you have difficulty implementing the Kalman filter strategy, let me know. I can help with explanations, but I won't post my Kalman filter code on a public forum. I put too much effort into that to just give it away. I'm sure you understand.

Amanda

POSTED BY: Amanda Gerrish
Posted 6 years ago

Amanda - thanks for taking the time to write such an elaborate answer. To me this is extremely interesting, and I had a similar experience at the beginning of this year in one of my cointegration baskets with European stock index futures, where the basket wandered off on a really long adverse excursion before eventually reverting at a loss. This has become quite a lengthy post, so I apologize in advance.

There are a few core concepts associated with this type of trading that I'm constantly working on, in addition to refining the mathematical procedure of constructing a stationary portfolio. It would be interesting to hear your views on these as well:

  1. Selecting the ETFs? Are you using the same ETFs, or do you continuously screen for new combinations with potentially better cointegration statistics? I have relied on a basket with the same set of stock index futures, reasoning that the European economies are fundamentally interlinked at some level, and indeed this can be validated statistically for extended periods. But not always… and there's the problem.

  2. Arbitrage between the ETF and its constituents? This is also briefly described by Chan, but of course any practical implementation comes with a heap of issues not covered in the book. I alluded to this in my first post, and I think it is quite an interesting approach. The point here is that the ETF is perfectly cointegrated with its portfolio of weighted constituents by construction, not through a hidden set of underlying factors, etc. The task and the challenge here is to find a subset of constituents with strong enough cointegration properties in combination with sufficient variance to overcome transaction costs. I'm exploring this approach again for stock indices and their constituents, where I periodically reconstruct the constituent subset basket. I would imagine it to be quite straightforward to apply your existing model to this approach as well?

  3. Combining strategies? Whether they are statistical flukes or cointegration breakdowns, one of the cures for painful or even catastrophic drawdowns is to maintain a portfolio of different strategies with limited correlation. I tend to focus more on methods to construct a portfolio of cointegration baskets than on going all in on one or a few of them. What are your thoughts on this?

Regarding Chan's book, I think the depth of your research is on a completely different level. You are rebuilding the concepts from scratch, finding and solving issues not even mentioned in the book. There are clearly shortcuts and maybe even mistakes in the book, but in all fairness I think Ernie's doing a great job explaining the basic concepts and setting the scene for further investigation.

Your post on StackExchange is what led me to the Wolfram site in the first place. I'm already working on incorporating your procedure in Python.

POSTED BY: Per Ravn