Johansen Test in Mathematica

Posted 7 years ago

A post from five years ago, How to run Johansen Test in Mathematica, requested the code for the Johansen test in Mathematica. However, the verbeia.com code that was offered had problems (incorrectly normalized eigenvectors, computational errors). As a better alternative, I'd like to post my Johansen test code here which I believe is correct. I've compared the output of this code with the output of the Matlab Johansen code in the Spatial Econometrics library and they agree. I've added my Mathematica code as an attachment to this post, "JohansenTest.nb".

The code includes a few subroutines that allow the output from the Johansen test to be displayed in a nice tabular form, such as:

[Image: Johansen Test Output]

This table shows the results for a cointegrated portfolio of three Exchange Traded Funds (ETFs), having two cointegrating relationships (r <= 0 and r <= 1) for both the trace and eigen statistics (at > 99% confidence, except for the eigen statistic for r <= 0, which is > 95% confidence).

I use this code to generate the weights for a cointegrated portfolio of ETFs, which I've been trading profitably for several months now. I usually set order = 2, and detrend = 1. That seems to give the best results for the portfolios I've looked at. As in Ernie Chan's Algorithmic Trading: Winning Strategies and Their Rationale, I apply a Kalman filter to the ETF data and Johansen weights to improve the trading algorithm performance. If there is interest, I can discuss that in future posts, as well. (Chan's Kalman filter discussion is very incomplete, in my opinion.)

I've left a few optional "debug" statements in the code to allow you to check that the matrices are properly normalized. These lines can be deleted. Note that the Johansen weights are the rows of the eigenvector matrix, not the columns (as in the Spatial Econometrics code). I feel this is more consistent with the way that Mathematica handles vectors and matrices.
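As a rough illustration of the row convention (not taken from the attached notebook), the sketch below forms a portfolio series from the first row of an eigenvector matrix. The matrix `evecs` and the `prices` array are made-up numbers, and the layout of one price series per row is an assumption.

```mathematica
(* Hypothetical eigenvector matrix: each ROW is one cointegrating vector *)
evecs = {{1.0, -0.6, -0.3}, {0.2, 1.0, -0.9}, {0.5, 0.4, 1.0}};

(* Made-up prices: one ROW per instrument, one column per time step *)
prices = {{20.1, 20.4, 20.2, 20.6},
          {27.3, 27.9, 27.5, 28.0},
          {35.0, 35.8, 35.1, 36.2}};

weights = evecs[[1]];          (* first row = first cointegrating vector *)
portfolio = weights . prices   (* one portfolio value per time step *)
```

With column-oriented weights, as in the Matlab convention mentioned above, the equivalent extraction would be `evecs[[All, 1]]` instead.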

For detail on the equations on which this code is based, see this 2005 article by B.E. Sorenson: Cointegration.

I welcome any feedback.

Attachments:
POSTED BY: Amanda Gerrish
17 Replies
Posted 6 years ago

Amanda - Your help regarding the implementation of the Kalman filter would be greatly appreciated and I fully understand that you don't want to publish your code - I wouldn't either! I'm just about to finalize the index arbitrage backtesting and I'll let you know whether there is any value to be gained. Then I'll start working on your Kalman idea, expecting to get stuck rather soon(!). So if you don't mind, I'll contact you again once I'm on the move with that.

Per

POSTED BY: Per Ravn
Posted 6 years ago

Per,

Thanks for your post. I suppose that large excursions from equilibrium are a risk with mean-reversion strategies. (The underlying statistics are not strictly a normal distribution, and so "fat tails" imply that large excursions occur more frequently than one might expect.)

In reply to your points:

  1. I'm using the same ETF triplet for my trading because it has a high Johansen score (>> 99%), a modest half-life for mean-reversion, and the three ETFs are very liquid (which is also very important!). I can imagine that trading multiple portfolios (or perhaps larger portfolios of more than three ETFs) would likely reduce risk, but it would also increase transaction costs. My funds are limited (< $100,000), so I haven't pursued that option.

  2. As to arbitrage between an ETF and its components, I would imagine that there would be only limited arbitrage opportunities (because the ETFs track their components pretty closely) which would limit profits, as you suggest. However, I would certainly expect high cointegration. It's just the small excursions from equilibrium that would limit profitability.

  3. Combining strategies is probably a good idea. I hear that that's what the large quant hedge funds do. They have multiple quants pursuing different strategies, and when one strategy is not working, others are. I actually have a trend-following algorithm that I've been using with cryptocurrencies over the past seven months, so I suppose I am "combining strategies" -- even though my total investment in cryptocurrencies is small ($5,000). Unfortunately, the cryptocurrencies had a horrendous sell-off last year. Nevertheless, my algo limited my maximum drawdown to around 20% by mostly keeping me out of the market. I'm hoping that the period of relative stability in the cryptocurrencies in the past few months is a prelude to stronger prices. I'm actually starting to make a small amount of money in the cryptos.

I agree with your comments about Chan. I'm grateful that he's illuminated the basic concepts and strategies. His book inspired me to study how to best implement the Kalman filter when trading a cointegrated portfolio, which I decided to share with others. If you have difficulty implementing the Kalman filter strategy, let me know. I can help with explanations, but I won't post my Kalman filter code on a public forum. I put too much effort into that to just give it away. I'm sure you understand.

Amanda

POSTED BY: Amanda Gerrish
Posted 3 years ago

Hi Amanda,

This post is several years old now and so I don't know if you still follow it but I'm curious how your strategy performed and if you've made any modifications or changes to your methodology.

Also, could you implement your iterative weight estimation procedure with Kalman Filter using Mathematica's built-in KalmanEstimator function?

Thank you, Reid

POSTED BY: Reid Frasier
Posted 6 years ago
POSTED BY: Per Ravn
Posted 6 years ago

Hi Amanda - not sure whether you still monitor this thread, but I'm curious if your algorithm is still performing? I think your findings are remarkable to say the least. I stumbled on this post because I'm trying to do something very similar, using another idea of Ernie's. My model trades the spread between an index and a basket of its constituents where the basket is reconstructed periodically using the Johansen procedure. It suffers from the same OOS stationarity issue as Kinlay describes and I will certainly try to apply your model if I can interpret the details correctly.
If you could share any details on the production performance of your model, it would be very interesting.

POSTED BY: Per Ravn
Posted 6 years ago

Hi Per. I got an email today notifying me of your post to this discussion. I've been trading this algorithm during this entire time, as well as having a lot of email exchanges with Jonathan Kinlay regarding how to implement both the Johansen code and the Kalman filter in a manner similar to Chan. The algorithm was working well for me until about 2 months ago when one of the components of my triplet started rising in price in a way that appears to violate the cointegration. The result was that I waited 9 weeks for mean reversion. The portfolio finally mean-reverted, but by that time I had taken a significant hit on my profits. As a result my total trading profit over this time is much smaller than it was before this event -- around 10% or so over the past 9 months.

The past 2 months are clearly an outlier as compared to past algorithm performance. I can tell that simply by looking at the variation of the z-score over the past 10 years. The average half-life for mean-reversion of the z-score was 1.17 weeks. On a few occasions it took as long as 4 or 5 weeks to mean revert. (Remember, I don't make a profit until the portfolio value -- and thus the z-score -- mean-reverts.) Over the past 2 months or so, it took 9 weeks for my portfolio to mean revert. Is the cointegration breaking down, or was this just a one-time statistical fluke?
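For readers unfamiliar with the half-life figure, one common estimate is the regression approach described in Chan's book: fit the change in the series against its lagged level and convert the slope to a half-life. The sketch below assumes the series behaves like a discretized mean-reverting (AR(1)-type) process; `halfLife` is a hypothetical helper, and whether it matches the calculation behind the 1.17-week figure is not confirmed.

```mathematica
(* Sketch: half-life of mean reversion from an OLS fit of dz(t) on z(t-1).
   Assumption: dz(t) = mu + lambda z(t-1) + noise, with lambda < 0 *)
halfLife[z_List] := Module[{lagged, dz, fit, lambda, x},
  lagged = Most[z];                 (* z(t-1) *)
  dz = Differences[z];              (* z(t) - z(t-1) *)
  fit = LinearModelFit[Transpose[{lagged, dz}], x, x];
  lambda = fit["BestFitParameters"][[2]];   (* slope on the lagged level *)
  -Log[2]/lambda]

(* Usage on a simulated mean-reverting series (made-up data) *)
SeedRandom[1];
zsim = FoldList[0.6 #1 + #2 &, 0.,
   RandomVariate[NormalDistribution[0, 1], 500]];
halfLife[zsim]
```

The result is in units of the sampling interval, so weekly data gives a half-life in weeks. A positive or near-zero slope means the estimate is not meaningful.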

The problem is that when I re-calculate the Kalman filter parameters each weekend, the z-score often shifts in such a way that losses can accumulate if the mean reversion is delayed for too long. For example, at the close on Friday, I may show a z-score of around 1.0. Over the weekend I re-calculate the Johansen test + Kalman filter parameters, and then when I run the algo on Monday morning, the z-score is significantly lower (in the range 0.5 - 0.8), even if the market is essentially unchanged. Thus, even if mean reversion occurs that day, I don't make as much profit as I'd hoped. This isn't a big deal if the portfolio mean reverts within 3 weeks -- I still make a profit. But if mean reversion doesn't happen for more than 3 or 4 weeks, I may end up with a loss. Waiting 9 weeks for mean reversion accumulated a 12% loss, which was more than half my profit over the past 9 months.

I'm still trading with this algorithm, but I'm waiting for a higher z-score before I "pull the trigger" on my trades, in order to increase the probability of a quicker mean-reversion. This means I may miss some trades, but that's OK. Also, I scale in my buys. (Partial buy at, say, z-score = 1.0, and another partial buy at z-score = 1.5, etc.)
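The scale-in idea can be sketched as a small rule that maps the current z-score to a target fraction of the full position. This is only an illustration: `targetFraction` is a hypothetical name, the third threshold (2.0) is made up, and the sign convention (short the portfolio when the z-score is positive) is an assumption.

```mathematica
(* Hypothetical scale-in rule: one equal tranche per threshold crossed.
   Negative result = short the portfolio, positive = long *)
targetFraction[z_?NumericQ, thresholds_List : {1.0, 1.5, 2.0}] :=
  -Sign[z]*Count[thresholds, t_ /; Abs[z] >= t]/Length[thresholds]

targetFraction[1.2]    (* one of three tranches, short *)
targetFraction[-1.7]   (* two of three tranches, long *)
targetFraction[0.4]    (* below the first threshold: no position *)
```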

If you're interested in how I implement the Kalman filter -- which is significantly different from Chan's approach -- I wrote a detailed post on StackExchange - Quantitative Finance:

Does Chan use the wrong state transition model in his Kalman filter code?

Using a careful analysis, I argued in my post that Chan uses the wrong state transition matrix, i.e., the identity matrix, in his Kalman filter. I showed how to calculate the correct state transition matrix for a cointegrated portfolio, as well as how to initialize the Kalman filter using an adaptive tuning method. I received positive feedback from the readers of the forum, and one reader emailed Ernie Chan. This precipitated an email exchange between Ernie and me. Ernie basically said that his treatment was meant to be more general and so he didn't assume cointegration. I didn't want to argue, so I let it go. (My method actually works when there's no cointegration -- you basically get the identity matrix solution that Chan uses in that case.) Chan replied that "a stationary [cointegrated] portfolio can be more profitable", and that "your analysis may be correct". I didn't press him any further. I know what I did is correct, and none of the readers of my StackExchange post criticized my analysis. I received a fair number of up-votes.

I hope this is helpful.

POSTED BY: Amanda Gerrish

Before we delve into the Kalman Filter model, it's worth pointing out that the problem with the nonstationarity of the out-of-sample estimated portfolio values is not mitigated by adding more in-sample data points and re-estimating the cointegrating vector(s):

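(* Estimate Johansen weights on the first n observations, then return the
   unit root test p-values for the in-sample and out-of-sample portfolios *)
IterateCointegrationTest[data_, n_] := 
  Module[{isdata, osdata, JT, isportprice, osportprice},
   isdata = Take[data, All, n];      (* first n observations of each series *)
   osdata = Drop[data, 0, n];        (* remaining observations *)
   JT = JohansenTest[isdata, 2, 1];
   isportprice = JT[[2, 1]].isdata;  (* weights = first cointegrating vector *)
   osportprice = JT[[2, 1]].osdata;
   {UnitRootTest[isportprice], UnitRootTest[osportprice]}];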

We continue to add more in-sample data points, reducing the size of the out-of-sample dataset correspondingly. But none of the tests for any of the out-of-sample datasets is able to reject the null hypothesis of a unit root in the portfolio price process:

ListLinePlot[
 Transpose@
  Table[IterateCointegrationTest[stockprices, 52*14 + i], {i, 1, 50}],
  PlotLegends -> {"In-Sample", "Out-of-Sample"}]

[Image: plot of in-sample and out-of-sample unit root test p-values]

POSTED BY: Jonathan Kinlay
Posted 7 years ago

You're correct. I was thinking about the dynamic weights from the Kalman filter. However, when we use the static weights from the Johansen test, we lose the stationarity for out-of-sample data. So, for example, when I apply the unit root test to my weighted portfolio, using the Johansen (static) weights, I get:

In-sample data length = 289, Johansen weights
p = 0.0718

However, when I calculate the Johansen coefficients using only the first 189 data points, and then look at unit root test, I get:

In-sample data length = 189, Johansen weights
p = 0.109
Out-of-sample data length = 100, Johansen weights
p = 0.587

Clearly, the out-of-sample period cannot be considered stationary. The situation is not helped by going to a larger in-sample (smaller out-of-sample) period, as you point out.

Now, however, let's look at the same situation except using the dynamic weights from the Kalman filter. For the full sample length:

In-sample data length = 289, Kalman weights
p = 5.5 x 10^-11

Much higher confidence of stationarity! Now, for in-sample/out-of-sample:

In-sample data length = 189, Kalman weights
p = 1.76 x 10^-9
Out-of-sample data length = 100, Kalman weights
p = 3.722 x 10^-7

Still, very good. However, it may be argued that I'm cheating here because I used the entire array of data to calculate the Kalman filter parameters (transition matrix, noise variance/covariances, initial state covariance). So I re-calculated the in-sample/out-of-sample weights using only in-sample data to calculate these parameters:

In-sample data length = 189, Kalman weights, revised Kalman parameter calculations
p = 0.000211

Out-of-sample data length = 100, Kalman weights, revised Kalman parameter calculations
p = 0.0871

The out-of-sample p-value for the unit root test is not as good, but still what I would consider stationary. Furthermore, let's look at a smaller out-of-sample (larger in-sample) period:

In-sample data length = 239, Kalman weights, revised Kalman parameter calculations
p = 7.1 x 10^-10
Out-of-sample data length = 50, Kalman weights, revised Kalman parameter calculations
p = 0.0000548

Using the Kalman filter weights, the stationarity of the out-of-sample period appears to be dependent on the size of the in-sample/out-of-sample periods. A shorter out-of-sample period gives a much smaller p-value for the unit root test. Now, considering that I update my Kalman filter parameters once per week, my out-of-sample period is only 1 time step. Therefore, the loss in stationarity should be very small.

POSTED BY: Amanda Gerrish
POSTED BY: Jonathan Kinlay

So I thought it might be useful to work through an example, to try to make the mechanics clear. I'll try to do this in stages so that others can jump in along the way, if they want to.

Start with some weekly data for an ETF triplet analyzed in Ernie Chan's book:

tickers = {"EWA", "EWC", "IGE"};
period = "Week";
nperiods = 52*15;
finaldate = DatePlus[Today, {-1, "BusinessDay"}];

After downloading the weekly close prices for the three ETFs we divide the data into 14 years of in-sample data and 1 year out of sample:

stockdata = 
  FinancialData[#, 
     "Close", {DatePlus[finaldate, {-nperiods, period}], finaldate, 
      period}, "DateList"] & /@ tickers;
stockprices = stockdata[[All, All, 2]];
isprices = Take[stockprices, All, 52*14];
osprices = Drop[stockprices, 0, 52*14];

We then apply Amanda's JohansenTest function:

JT = JohansenTest[isprices, 2, 1]

We find evidence of up to three cointegrating vectors at the 95% confidence level:

[Image: Johansen test results table]

Let's take a look at the vector coefficients (laid out in rows, in Amanda's function):

[Image: cointegrating vector coefficients]

We now calculate the in-sample and out-of-sample portfolio values using the first cointegrating vector:

isportprice = (JT[[2, 1]]*100).isprices;
osportprice = (JT[[2, 1]]*100).osprices;

The portfolio does indeed appear to be stationary, in-sample, and this is confirmed by the unit root test, which rejects the null hypothesis of a unit root:

ListLinePlot[isportprice]

[Image: plot of the in-sample portfolio price]

UnitRootTest[isportprice]

0.000232746

Unfortunately (and this is typically the case) the same is not true for the out of sample period:

ListLinePlot[osportprice]

[Image: plot of the out-of-sample portfolio price]

UnitRootTest[osportprice]    

0.121912

We fail to reject the null hypothesis of a unit root in the portfolio process, out of sample.

I'll press pause here before we go on to the next stage, which is Kalman Filtering.

POSTED BY: Jonathan Kinlay
Posted 7 years ago

A problem with out-of-sample testing is that market structure can shift so that relationships (such as cointegration) may start to break down. One way to try to minimize this effect is to update your Johansen coefficients more frequently. In backtesting, I update the Johansen coefficients weekly, being careful to use only past data to calculate the current portfolio weights at any time point. (I think this is called "walk forward".) This reflects how I actually use the function in practice. In effect, my out-of-sample period is always one time step. This gives better backtest results, but because I'm avoiding look-ahead bias, it's valid. That's what I did in the backtest I described in a previous reply. You can even track the trace/eigen-statistics over time to make sure that the cointegration is not falling apart.
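The walk-forward idea can be sketched generically: at each time step, the weights are re-estimated from past data only and then applied to the current prices. In the sketch below, `walkForward` and the `estimator` argument are hypothetical names, and using data strictly before step t (rather than up to and including t) is an assumption.

```mathematica
(* Sketch: walk-forward portfolio values.
   prices: one row per instrument, one column per time step.
   estimator: any function mapping past prices to a weight vector *)
walkForward[prices_, start_Integer, estimator_] :=
  Table[
    With[{w = estimator[prices[[All, ;; t - 1]]]},  (* only data before step t *)
      w . prices[[All, t]]],
    {t, start, Length[First[prices]]}]

(* Toy check with a placeholder estimator that ignores the data *)
walkForward[{{10., 11., 12., 13.}, {5., 5.5, 6., 6.5}}, 3, {1, -2} &]
```

With the function from this thread, the estimator might be `JohansenTest[#, 2, 1][[2, 1]] &`, following the indexing used elsewhere in the discussion. Re-running the test at every step is slow for long histories.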

Also, the Kalman filter dynamically adjusts the Johansen weights so that the weighted price series is more stationary.

POSTED BY: Amanda Gerrish
POSTED BY: Jonathan Kinlay
Posted 7 years ago
POSTED BY: Amanda Gerrish

Hi Amanda,

Your approach seems very promising.

On point 2: I made the assumption that you had to be getting (quasi) real-time data into MMA somehow and indeed this turns out to be the case - a creative solution to the problem.

Of course, since you are only updating the model weekly you wouldn't need to use MMA at all during the week. Some trading platforms will allow you to place bids and offers for a synthetic contract according to a simple formula, where the betas are fixed (for the week). In other cases a simple api interface is provided to something like Excel. That would enable you to recalculate the entry/exit prices automatically tick-by-tick, if you wanted to, and would also eliminate the need for manual trading as the orders could be fired into the trading platform via the api.

There are the usual practical considerations that apply to any stat arb strategy. For instance, do you try to enter passively, posting orders on the bid and ask prices of the portfolio (treating it as a single synthetic security)? Another approach is to post resting orders for the individual ETF components at appropriate price levels then cross the spread on the other ETFs if you get filled on one of them. These execution strategies tend to apply more in the case of pairs trading. For more complex strategies involving multiple securities like yours they can be very tricky to implement and traders typically cross the spread on entry and exit, which is what you are doing, I would guess.

Another question is how to treat open positions held over a w/e when models get updated. The original exit points will likely change. So you have some options there too: exit all positions by the end of the week; maintain the original exit prices (profit target and stop loss); or recalculate exit prices for existing positions once the models get updated.

Finally, one other important issue is whether to use prices or (log) returns in your cointegration model. I suspect you are using the former, as I did in my toy illustration. But the resulting portfolios are rarely dollar neutral and hence consume margin capital. On the other hand, if you use returns and create a dollar-neutral portfolio, rebalancing becomes more of an issue. In that case I suspect you would want to rebalance the portfolio at least once a day, or according to some more sophisticated rebalancing algorithm.

POSTED BY: Jonathan Kinlay
Posted 7 years ago

> Of course, since you are only updating the model weekly you wouldn't need to use MMA at all during the week.

There's a subtlety here. I update my Kalman filter parameters (noise variances/covariances, initial values, etc.) once per week. However, I calculate the Kalman filter weights (using these parameters) for the latest real-time data point in real-time. Basically, I append the latest real-time data point to the weekly data series and run a single iteration of the Kalman filter. This gives me optimal weights for the current prices.
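A single predict/update iteration of a Kalman filter can be sketched as follows. This is a generic textbook step, not the code discussed in this thread: it assumes a regression-style formulation in which a scalar observation `y` is modelled as `x . beta` plus noise, and the names `kalmanStep`, `trans`, `q`, and `r` (state transition matrix, state noise covariance, measurement noise variance) are placeholders for whatever the weekly calibration produces.

```mathematica
(* Sketch: one Kalman filter iteration for y = x . beta + noise.
   {beta, p}: previous state estimate and its covariance
   {x, y}:    latest regressor vector and scalar observation *)
kalmanStep[{beta_, p_}, {x_, y_}, trans_, q_, r_] :=
  Module[{betaPred, pPred, err, s, gain},
   betaPred = trans . beta;                   (* state prediction *)
   pPred = trans . p . Transpose[trans] + q;  (* covariance prediction *)
   err = y - x . betaPred;                    (* measurement prediction error *)
   s = x . pPred . x + r;                     (* innovation variance *)
   gain = pPred . x/s;                        (* Kalman gain *)
   {betaPred + gain*err, pPred - Outer[Times, gain, x] . pPred}]

(* Toy usage with made-up numbers *)
kalmanStep[{{0., 0.}, IdentityMatrix[2]}, {{1., 0.5}, 2.},
 IdentityMatrix[2], 0.01 IdentityMatrix[2], 1.]
```

Running the filter over a whole history is then `FoldList[kalmanStep[#1, #2, trans, q, r] &, {beta0, p0}, observations]`, and a real-time update amounts to one more call on the last state with the newest data point.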

As to order entry, I've actually written code to automate order entry and I've done some simulated trading which looked good. However, trusting the code with my funds makes me a bit nervous. I prefer to enter the orders manually for now, but I may experiment with automated order entry in the future. The ETFs I'm trading are pretty liquid, and it's important that all of the legs of the trade get executed simultaneously (otherwise you risk significant losses if only one or two legs of the trade execute), so I use market orders. I've been watching the fills that I get and they seem reasonable.

> Another question is how to treat open positions held over a w/e when models get updated.

Yes, this is a tricky issue. The problem is that the Kalman filter parameters change with the update, and so the statistic I'm using shifts a little between Friday close and Monday open. Therefore, if I have a decent profit near the close on Friday, I'll often sell my positions even if I haven't quite hit my "sell" limit. If I do hold the positions over the weekend, it's not catastrophic. I just sell when the limit is reached with the new statistic, although it may mean my profit is less than what I estimated it would be the previous week. On one occasion, I even had a loss as the statistic moved past my sell limit after the Monday open before the positions had turned profitable. One solution would be to update my Kalman filter parameters less often, say, once per month. However, that makes the weights more out-of-sample (as I get further into the month), which might reduce profitability. For now, once-per-week seems to be working OK.

Finally, I'm using prices, not log(prices), because my backtesting has indicated that using log(prices) is less profitable.

Thanks for bringing up these important practical considerations! I've thought about them, but I'm still getting a handle on all these issues.

POSTED BY: Amanda Gerrish

Nice work, Amanda.

Hopefully Wolfram will include more of these standard statistical tests in future releases, to bring MMA to parity with comparable math/stats software packages.

I have written a few posts about using the Kalman filter approach in pairs trading (stat arb), for instance here and here.

I would certainly be interested to get your own take on the subject and any trading results you might care to share.

POSTED BY: Jonathan Kinlay
Posted 7 years ago

Thanks, Jonathan.

I read through your two Kalman filter papers and I found them interesting. Good analysis. Your approach is similar to Ernie Chan's chapter on Kalman filter in the book I mentioned. I believe his "measurement prediction error", e(t), is the same as your alpha(t).

You've hit on a major challenge in applying the Kalman filter, namely, how to determine the noise variances/covariances, R and Q. Most coders seem to use values determined by trial-and-error. However, if you're interested, I've come up with a derivation based on the observed measurement errors for calculating R (what I call ve), and the observed variation in beta(t) for calculating Q (what I call vw). This necessitates an iterative approach -- using initial estimates for Q, R, and the initial state and state covariance, implementing the Kalman filter, calculating new estimates, and so on, until the estimates converge to stable values.

This approach is more math intensive, but it allows generalization beyond pairs to trading cointegrated portfolios of 3 or more financial instruments. (I prefer trading a portfolio of ETFs.) I've posted my method here on the Quantitative Finance area of the StackExchange website.

I've been trading a portfolio of 3 ETFs using this algo for three months and nearly all of my trades have been profitable. I use weekly data to calculate the z-score and I usually get one or two "buy" signals a week with a holding time that seems to vary from a few hours to two weeks, partly depending upon where I choose my "sell" level. After a couple dozen trades, profit per trade has been in the range 0.5% to 3%, except for one trade where I had a loss of about 1%. This should give me a good return over the next year, if I can maintain that performance. I'm still fine tuning the algo, especially with regard to where to set "buy" and "sell" limits.

Here's a plot of the z-score that my code produces: [Image: z-score plot]

This plot is only showing the last two years of weekly data but I use anywhere from 5 - 10 years of data in my algo. I'm displaying weekly data because I update my Kalman filter parameters every weekend. However, I actually calculate and plot the z-score in real-time during trading hours (using weights from the Kalman filter). When I hit a z-score of say, 1 (-1), I put on a short (long) portfolio position. I close the position when the z-score returns back to zero (or perhaps a little beyond zero). It's too early to say how profitable this will be over the long term. When I have more data perhaps I'll post my total returns.
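The entry and exit rule described above can be sketched as a small state machine over a z-score series. The function names are hypothetical, the entry level of 1 and the exit at zero follow the description in this post, and everything else (one full position at a time, no scaling in) is a simplifying assumption.

```mathematica
(* Sketch: position after seeing z-score z, given the previous position.
   -1 = short the portfolio, +1 = long, 0 = flat *)
nextPosition[pos_, z_, entry_ : 1.0] := Which[
  pos == 0 && z >= entry, -1,    (* z high: short the portfolio *)
  pos == 0 && z <= -entry, 1,    (* z low: go long *)
  pos == -1 && z <= 0, 0,        (* short closes when z falls back to zero *)
  pos == 1 && z >= 0, 0,         (* long closes when z rises back to zero *)
  True, pos]                     (* otherwise hold *)

positions[zs_List] := Rest@FoldList[nextPosition, 0, zs]

positions[{0.2, 1.1, 0.6, -0.1, -1.3, -0.4, 0.1}]
(* {0, -1, -1, 0, 1, 1, 0} *)
```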

I hope this is helpful.

POSTED BY: Amanda Gerrish