# Johansen Test in Mathematica

Posted 2 months ago
A post from five years ago, How to run Johansen Test in Mathematica, requested the code for the Johansen test in Mathematica. However, the verbeia.com code that was offered had problems (incorrectly normalized eigenvectors, computational errors). As a better alternative, I'd like to post my Johansen test code here, which I believe is correct. I've compared the output of this code with the output of the Matlab Johansen code in the Spatial Econometrics library, and they agree. I've added my Mathematica code as an attachment to this post, "JohansenTest.nb".

The code includes a few subroutines that allow the output from the Johansen test to be displayed in a nice tabular form, such as:

[table image in the original post]

This table shows the results for a cointegrated portfolio of three Exchange Traded Funds (ETFs), having two cointegrating relationships (r <= 0 and r <= 1) for both the trace and eigen statistics (at > 99% confidence, except for the eigen statistic for r <= 0, which is > 95% confidence).

I use this code to generate the weights for a cointegrated portfolio of ETFs which I've been trading profitably for several months now. I usually set order = 2 and detrend = 1; that seems to give the best results for the portfolios I've looked at. As in Ernie Chan's Algorithmic Trading: Winning Strategies and Their Rationale, I apply a Kalman filter to the ETF data and Johansen weights to improve the trading algorithm performance. If there is interest, I can discuss that in future posts as well. (Chan's Kalman filter discussion is very incomplete, in my opinion.)

I've left a few optional "debug" statements in the code to allow you to check that the matrices are properly normalized. These lines can be deleted. Note that the Johansen weights are the rows of the eigenvector matrix, not the columns (as in the Spatial Econometrics code). I feel this is more consistent with the way that Mathematica handles vectors and matrices.

For detail on the equations on which this code is based, see the 2005 article by B.E. Sorenson, Cointegration.

I welcome any feedback.

Attachments:
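For reference, a minimal calling pattern for the attached function (a sketch only; the argument order and the output layout, test statistics first and eigenvector rows second, are as used in the replies below):

    (* prices: a matrix with one row per asset, one column per date *)
    JT = JohansenTest[prices, 2, 1];  (* order = 2, detrend = 1 *)
    weights = JT[[2, 1]];             (* first cointegrating vector: a ROW of the eigenvector matrix *)
    portfolio = weights.prices;       (* candidate stationary price series *)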
Posted 8 days ago
Amanda has correctly anticipated the direction I was headed in, i.e., to show that regardless of how small the size of the OOS period relative to the IS period, the Johansen procedure by itself is unable to produce a cointegrating vector capable of yielding a portfolio price process that is stationary out of sample. But her iterative Kalman Filter approach is able to cure the problem.

I don't want to gloss over this finding, because it is very important. In our toy problem we know the out-of-sample prices of the constituent ETFs, and can therefore test the stationarity of the portfolio process out of sample. In a real-world application, that discovery could only be made in real time, when the unknown, future ETF prices are formed. In that scenario, all the researcher has to go on are the results of in-sample cointegration analysis, which demonstrate that the first cointegrating vector consistently yields a portfolio price process that is very likely stationary in sample. The researcher might understandably be persuaded, wrongly, that the same is likely to hold true in future. Only when the assumed cointegration relationship falls apart in real time will the researcher discover that it's not true, incurring significant losses in the process, assuming the research has been translated into some kind of trading strategy.

A great many researchers have been down exactly this path, learning this important lesson the hard way. Nor do additional "safety checks", such as also requiring high levels of correlation between the constituent processes, add much value. They might offer the researcher comfort that a "belt and braces" approach is more likely to succeed, but in my experience that is not the case: the problem of non-stationarity in the out-of-sample price process persists. For a more detailed discussion of the problem, see this post: Why Statistical Arbitrage Breaks Down.

I was hitherto unaware of any methodology for tackling this problem, which is why Amanda's discovery is so important. As she demonstrates in her latest post, the iterative Kalman Filter approach is capable of producing a stationary out-of-sample process, based on the initial estimates of the cointegrating vector derived from the Johansen procedure. In fact, Amanda's discovery is important in two fields of econometric research: cointegration theory and the theory of Kalman Filters in modeling inter-asset relationships where, as with the Johansen procedure, KF models have traditionally suffered from difficulties associated with nonstationarity in the out-of-sample period. It's a tremendous achievement.

So, despite the fact that Amanda has leapt ahead to the finish line, I shall continue to plod along because, firstly, only by implementing the methodology can I be sure that I have properly and fully understood it and, secondly, as one discovers as one progresses in the field of quantitative research, fine details are often very important. So I am hoping that Amanda will provide additional guidance if I stray too far off piste in the forthcoming exposition.
Posted 9 days ago
You're correct. I was thinking about the dynamic weights from the Kalman filter. However, when we use the static weights from the Johansen test, we lose the stationarity for out-of-sample data. So, for example, when I apply the unit root test to my weighted portfolio, using the Johansen (static) weights, I get:

- In-sample data length = 289, Johansen weights: p = 0.0718

However, when I calculate the Johansen coefficients using only the first 189 data points, and then look at the unit root test, I get:

- In-sample data length = 189, Johansen weights: p = 0.109
- Out-of-sample data length = 100, Johansen weights: p = 0.587

Clearly, the out-of-sample period cannot be considered stationary. The situation is not helped by going to a larger in-sample (smaller out-of-sample) period, as you point out.

Now, however, let's look at the same situation except using the dynamic weights from the Kalman filter. For the full sample length:

- In-sample data length = 289, Kalman weights: p = 5.5 x 10^-11

Much higher confidence of stationarity! Now, for in-sample/out-of-sample:

- In-sample data length = 189, Kalman weights: p = 1.76 x 10^-9
- Out-of-sample data length = 100, Kalman weights: p = 3.722 x 10^-7

Still very good. However, it may be argued that I'm cheating here, because I used the entire array of data to calculate the Kalman filter parameters (transition matrix, noise variances/covariances, initial state covariance). So I re-calculated the in-sample/out-of-sample weights using only in-sample data to calculate these parameters:

- In-sample data length = 189, Kalman weights, revised parameter calculation: p = 0.000211
- Out-of-sample data length = 100, Kalman weights, revised parameter calculation: p = 0.0871

The out-of-sample p-value for the unit root test is not as good, but still what I would consider stationary. Furthermore, let's look at a smaller out-of-sample (larger in-sample) period:

- In-sample data length = 239, Kalman weights, revised parameter calculation: p = 7.1 x 10^-10
- Out-of-sample data length = 50, Kalman weights, revised parameter calculation: p = 0.0000548

Using the Kalman filter weights, the stationarity of the out-of-sample period appears to be dependent on the size of the in-sample/out-of-sample periods. A shorter out-of-sample period gives a much smaller p-value for the unit root test. Now, considering that I update my Kalman filter parameters once per week, my out-of-sample period is only 1 time step. Therefore, the loss in stationarity should be very small.
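For concreteness, the dynamic-weight test above can be sketched like this (a sketch only: `kalmanWeights` stands for the per-date weight vectors produced by my Kalman filter, which is not listed here):

    (* prices: assets x dates matrix; kalmanWeights: one weight vector per date *)
    (* form the portfolio value pointwise with the time-varying weights, then
       apply the same unit root test used for the static Johansen weights *)
    portfolio = MapThread[Dot, {kalmanWeights, Transpose[prices]}];
    UnitRootTest[portfolio]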
Posted 9 days ago
Before we delve into the Kalman Filter model, it's worth pointing out that the problem with the nonstationarity of the out-of-sample estimated portfolio values is not mitigated by adding more in-sample data points and re-estimating the cointegrating vector(s):

    IterateCointegrationTest[data_, n_] :=
     Module[{isdata, osdata, JT, isportprice, osportprice},
      isdata = Take[data, All, n];
      osdata = Drop[data, 0, n];
      JT = JohansenTest[isdata, 2, 1];
      isportprice = JT[[2, 1]].isdata;
      osportprice = JT[[2, 1]].osdata;
      {UnitRootTest[isportprice], UnitRootTest[osportprice]}];

We continue to add more in-sample data points, reducing the size of the out-of-sample dataset correspondingly. But none of the tests for any of the out-of-sample datasets is able to reject the null hypothesis of a unit root in the portfolio price process:

    ListLinePlot[
     Transpose@
      Table[IterateCointegrationTest[stockprices, 52*14 + i], {i, 1, 50}],
     PlotLegends -> {"In-Sample", "Out-of-Sample"}]
Posted 10 days ago
> Of course, since you are only updating the model weekly you wouldn't need to use MMA at all during the week.

There's a subtlety here. I update my Kalman filter parameters (noise variances/covariances, initial values, etc.) once per week. However, I calculate the Kalman filter weights (using these parameters) for the latest real-time data point in real time. Basically, I append the latest real-time data point to the weekly data series and run a single iteration of the Kalman filter. This gives me optimal weights for the current prices.

As to order entry, I've actually written code to automate order entry, and I've done some simulated trading which looked good. However, trusting the code with my funds makes me a bit nervous. I prefer to enter the orders manually for now, but I may experiment with automated order entry in the future. The ETFs I'm trading are pretty liquid, and it's important that all of the legs of the trade get executed simultaneously (otherwise you risk significant losses if only one or two legs of the trade execute), so I use market orders. I've been watching the fills that I get and they seem reasonable.

> Another question is how to treat open positions held over a w/e when models get updated.

Yes, this is a tricky issue. The problem is that the Kalman filter parameters change with the update, and so the statistic I'm using shifts a little between Friday close and Monday open. Therefore, if I have a decent profit near the close on Friday, I'll often sell my positions even if I haven't quite hit my "sell" limit. If I do hold the positions over the weekend, it's not catastrophic. I just sell when the limit is reached with the new statistic, although it may mean my profit is less than what I estimated it would be the previous week. On one occasion, I even had a loss, as the statistic moved past my sell limit after the Monday open before the positions had turned profitable.

One solution would be to update my Kalman filter parameters less often, say, once per month. However, that makes the weights more out-of-sample (as I get further into the month), which might reduce profitability. For now, once per week seems to be working OK. Finally, I'm using prices, not log(prices), because my backtesting has indicated that using log(prices) is less profitable.

Thanks for bringing up these important practical considerations! I've thought about them, but I'm still getting a handle on all these issues.
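The "single iteration" step can be sketched as one textbook predict/update cycle of a linear Kalman filter (a generic formulation, not my actual routine; F, Q, h, and r stand for the parameters I re-estimate weekly, and z is the newly appended real-time observation):

    (* one predict/update cycle of a linear Kalman filter.
       x: state estimate (e.g. the dynamic hedge ratios), P: state covariance,
       F: transition matrix, Q: process noise covariance,
       h: observation vector (z is the scalar observation), r: observation noise variance *)
    KalmanStep[{x_, P_}, {F_, Q_, h_, r_, z_}] :=
     Module[{xp, Pp, k},
      xp = F.x;                     (* predicted state *)
      Pp = F.P.Transpose[F] + Q;    (* predicted covariance *)
      k = Pp.h/(h.Pp.h + r);        (* Kalman gain *)
      {xp + k (z - h.xp),           (* updated state = current weights *)
       Pp - Outer[Times, k, h].Pp}] (* updated covariance *)

Running this once on the latest price observation yields the updated weight estimate without re-running the whole filter.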
Posted 11 days ago
 A problem with out-of-sample testing is that market structure can shift so that relationships (such as cointegration) may start to break down. One way to try to minimize this effect is to update your Johansen coefficients more frequently. In backtesting, I update the Johansen coefficients weekly, being careful to use only past data to calculate the current portfolio weights at any time point. (I think this is called "walk forward".) This reflects how I actually use the function in practice. In effect, my out-of-sample period is always one time step. This gives better backtest results, but because I'm avoiding look-ahead bias, it's valid. That's what I did in the backtest I described in a previous reply. You can even track the trace/eigen-statistics over time to make sure that the cointegration is not falling apart. Also, the Kalman filter dynamically adjusts the Johansen weights so that the weighted price series is more stationary.
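In outline, the weekly walk-forward re-estimation looks something like this (a sketch; `JohansenTest` is the attached function, and the window length is illustrative):

    (* weights used at week t are fitted on data strictly before t: no look-ahead *)
    WalkForwardWeights[prices_, window_] :=
     Table[
      JohansenTest[prices[[All, t - window ;; t - 1]], 2, 1][[2, 1]],
      {t, window + 1, Dimensions[prices][[2]]}]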
Posted 11 days ago
Jonathan, I wrote my Kalman Filter routine in Mathematica, from scratch. This way I know exactly what it does. Regarding your questions:

1) My backtesting showed average yearly returns (AYRs) in the 30%-40% range (over a 6-year period), with a maximum drawdown under 10%. This was with fixed entry/exit limits, and 100% of my cash in and out. However, in live trading, what I do is put on 50% of my position when I cross one limit, another 25% when I cross another limit, etc., so that I reduce my drawdown if I get a large excursion in the statistic (say, 2 or 3 standard deviations), while capturing some returns on the smaller excursions (1 standard deviation). I really feel that I need to see how my track record goes with live trading. That's what counts.

2) I wrote a small routine to download real-time ETF data from nasdaq.com. Basically, I use the Mathematica URLRead function and screen-scrape for the real-time quote. I use the Mathematica Dynamic function to do this, and update the plot and recommended positions automatically once per minute. Real-time 1-minute data is good enough for my purposes. I enter the orders manually on a multi-order trading screen. I've got a system that keeps the lag to a few seconds. Again, good enough for my purposes.

3) Yes, GARCH can be useful to show changes in volatility. I haven't implemented that. However, I've recently applied the Mathematica functions HiddenMarkovProcess and FindHiddenMarkovStates to detect and display a shift from a low-volatility state to a high-volatility state (and vice versa) in my statistic. It's mainly for informational purposes. (I basically highlight areas of the plot with a white or light-gray background, depending on whether I'm in a low-volatility state or a high-volatility state.) It may affect when I place my trades. Too early to say yet. A big issue for me is how best to display the information so that I can easily and quickly react and trade when needed.
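The polling idea in point 2 can be sketched roughly as follows (the URL and the "lastPrice" extraction pattern are placeholders, not my actual nasdaq.com scrape):

    (* fetch a quote page and pull out the last price; the URL and field name
       are hypothetical stand-ins for the real page layout *)
    GetQuote[ticker_] :=
     Module[{body},
      body = URLRead[HTTPRequest["https://example.com/quote/" <> ticker], "Body"];
      First[
       ToExpression /@
        StringCases[body, "\"lastPrice\":" ~~ p : NumberString :> p],
       Missing["NotAvailable"]]]

    (* re-evaluate once per minute and refresh the display *)
    Dynamic[Refresh[GetQuote /@ {"EWA", "EWC", "IGE"}, UpdateInterval -> 60]]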
Posted 11 days ago
So I thought it might be useful to work through an example, to try to make the mechanics clear. I'll try to do this in stages so that others can jump in along the way, if they want to.

Start with some weekly data for an ETF triplet analyzed in Ernie Chan's book:

    tickers = {"EWA", "EWC", "IGE"};
    period = "Week";
    nperiods = 52*15;
    finaldate = DatePlus[Today, {-1, "BusinessDay"}];

After downloading the weekly close prices for the three ETFs, we divide the data into 14 years of in-sample data and 1 year out of sample:

    stockdata =
      FinancialData[#, "Close",
         {DatePlus[finaldate, {-nperiods, period}], finaldate, period},
         "DateList"] & /@ tickers;
    stockprices = stockdata[[All, All, 2]];
    isprices = Take[stockprices, All, 52*14];
    osprices = Drop[stockprices, 0, 52*14];

We then apply Amanda's JohansenTest function:

    JT = JohansenTest[isprices, 2, 1]

We find evidence of up to three cointegrating vectors at the 95% confidence level. Let's take a look at the vector coefficients (laid out in rows, in Amanda's function). We now calculate the in-sample and out-of-sample portfolio values using the first cointegrating vector:

    isportprice = (JT[[2, 1]]*100).isprices;
    osportprice = (JT[[2, 1]]*100).osprices;

The portfolio does indeed appear to be stationary in-sample, and this is confirmed by the unit root test, which rejects the null hypothesis of a unit root:

    ListLinePlot[isportprice]
    UnitRootTest[isportprice]

    0.000232746

Unfortunately (and this is typically the case) the same is not true for the out-of-sample period:

    ListLinePlot[osportprice]
    UnitRootTest[osportprice]

    0.121912

We fail to reject the null hypothesis of a unit root in the portfolio process, out of sample. I'll press pause here before we go on to the next stage, which is Kalman Filtering.
Posted 11 days ago
Amanda, I think you may have hit on something very important. As you point out, the determination of the variances/covariances is critical, and the adaptive tuning procedure you recommend appears very successful in stabilizing the portfolio, making it suitable for a stat-arb strategy.

As you saw, I did not use MMA in my own implementation because I felt that Wolfram's approach was somewhat unsympathetic to the needs of the economic researcher (vs., say, the requirements of an engineer), compared to the available alternatives. I see that I am not entirely alone in that assessment: here, for instance. So I am delighted that you have successfully implemented this in MMA, presumably using KalmanEstimator(?). Or did you build the model from scratch?

I will run a few tests on your Johansen code and attempt to build a KF model in MMA using some of the ETF pairs/triplets Ernie discusses in his book and compare the results. Meanwhile, I wondered if you could comment on the following:

1) While the initial trading performance appears very encouraging, what kind of performance results did the backtest produce, out of sample?

2) You mention that you update the model using weekly data and then trade it intraday during the following week. So presumably you are getting real-time market data into MMA somehow: via the Finance Platform, perhaps? And do you trade the signals via that platform, or some other way (manually)?

3) One extension that I found quite useful in my own research was to fit a GARCH model to the residuals and use this to determine the trade entry/exit points. But that procedure was probably only useful because of the nonstationarity in the portfolio returns process. If you have succeeded in dealing with that key issue at a more fundamental level, a GARCH extension is probably superfluous.
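For what it's worth, the GARCH extension in point 3 might be sketched with the built-in GARCHProcess (illustrative only; `spread` stands for the weighted portfolio price series):

    (* fit a GARCH(1,1) to the demeaned changes of the spread; the fitted process
       gives a conditional-volatility estimate that can scale entry/exit limits
       in place of a fixed number of standard deviations *)
    resid = Differences[spread] - Mean[Differences[spread]];
    proc = EstimatedProcess[resid, GARCHProcess[\[Kappa], {\[Alpha]}, {\[Beta]}]];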