A DL framework for financial time series using WT SAE and LSTM

Posted 4 months ago | 1215 Views | 18 Replies | 1 Total Likes

Before denoising OHLCV data with a Haar DWT, I am getting an error with AdjustedOHLCV despite using what appears to be the correct function and syntax. The first step is to apply a Haar wavelet to the OHLCV data before feeding the result to the technical indicators. Thank you in advance for any input on replicating this paper's results in Wolfram, if that is possible.

POSTED BY: Warren Tou
18 Replies
Posted 4 months ago

Hi Warren,

There is a syntax error, a missing comma, in the FinancialData expression. Try

data = FinancialData["SP500", "AdjustedOHLCV", {{2008, 7, 01}, {2016, 9, 30}}]
POSTED BY: Rohit Namjoshi

Hi Warren,

How is your research progressing? I read the paper and I am deeply sceptical. The results are too good to be true, and suggest some kind of feed-forward of information.

Here is a link to the stem paper: https://tinyurl.com/4nuvwb5x. As you can see, the approach is very similar.

I have no particular expertise in wavelet transforms, but I suspect the issue lies there. Neither the original research, nor this sequel, discussed the use of wavelet transforms in sufficient detail to be sure, but my guess is that they are preprocessing the entire dataset with wavelet transforms before applying it to the inputs of the stacked auto-encoders. This embeds future information into the wavelet transforms, since the coefficients are estimated using the entire dataset.

If I am right, this is an elementary error disguised by the use of a fancy denoising procedure. You sometimes see a similar error where researchers standardize the data using the mean and standard deviation of the entire dataset, which embeds future information about the process into the transformed data. This can lead to spuriously accurate prediction results.
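To make the standardization leak concrete, here is a minimal sketch on a toy series. The ten-point ramp and the 6/4 split are assumptions chosen for illustration and have nothing to do with the paper's data.

```wolfram
(* Toy series with an upward drift; the first 6 points are "in-sample" *)
series = Range[10];
train = series[[;; 6]];
test = series[[7 ;;]];

(* Leaky: statistics computed over the whole series, test period included *)
leakyTrain = (train - Mean[series])/StandardDeviation[series];

(* Clean: statistics computed from the training window only *)
mu = Mean[train];
sigma = StandardDeviation[train];
cleanTrain = (train - mu)/sigma;
cleanTest = (test - mu)/sigma;

(* Compare the mean of the training data under the two scalings *)
N[{Mean[leakyTrain], Mean[cleanTrain]}]
```

With the leaky scaling the training data has a negative mean, which already tells the model that later values are higher. With the clean scaling the training mean is zero and the test values are simply expressed in training units.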

The correct procedure, of course, is to de-noise the in-sample data only, and separately de-noise the out-of-sample data used for testing purposes. The researchers do use a set of rolling train/test datasets, which is fine, except that each training & test set should be de-noised individually.
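A minimal sketch of that split-then-de-noise order follows. It assumes a synthetic random walk in place of the index data and a hypothetical helper `haarDenoise` (not taken from the paper), built from the built-in DiscreteWaveletTransform, WaveletThreshold and InverseWaveletTransform with default threshold settings. The window lengths are multiples of 4 so that a two-level Haar transform needs no padding.

```wolfram
(* Hypothetical helper: Haar DWT, default coefficient thresholding, inverse transform *)
haarDenoise[series_List, level_Integer : 2] :=
  InverseWaveletTransform[
   WaveletThreshold[
    DiscreteWaveletTransform[series, HaarWavelet[], level]]]

(* Synthetic random walk standing in for a price series (an assumption) *)
SeedRandom[1];
prices = 100 + Accumulate[RandomVariate[NormalDistribution[], 512]];

(* Split first, then de-noise each window on its own *)
inSample = prices[[;; 384]];
outOfSample = prices[[385 ;;]];
cleanIS = haarDenoise[inSample];
cleanOOS = haarDenoise[outOfSample];

(* For comparison only: de-noising everything at once, the pattern to avoid *)
leakyAll = haarDenoise[prices];

(* Diagnostic: how much the in-sample values change when later data is included *)
Max[Abs[leakyAll[[;; 384]] - cleanIS]]
```

The final expression is only a diagnostic. If it is non-zero, the de-noised in-sample values depend on whether the later data was present when the transform was applied, which would be the kind of leak described above.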

That's my initial take. Perhaps someone with specific expertise on wavelet transforms can chime in and give us their view.

PS: This paper echoes the concerns I expressed above about the incorrect use of wavelets in a forecasting context:

The incorrect development of these wavelet-based forecasting models occurs during wavelet decomposition (the process of extracting high- and low-frequency information into different sub-time series known as wavelet and scaling coefficients, respectively) and as a result introduces error into the forecast model inputs. The source of this error is due to the boundary condition that is associated with wavelet decomposition (and the wavelet and scaling coefficients) and is linked to three main issues: 1) using ‘future data’ (i.e., data from the future that is not available); 2) inappropriately selecting decomposition levels and wavelet filters; and 3) not carefully partitioning calibration and validation data.

POSTED BY: Jonathan Kinlay

8-20-2022 Hi Jonathan and Warren, Very interesting and hope it is OK for me to submit a few questions re. validation. Would the application of AI neuron layers also involve the use of training data followed by analysis of future incoming data? Could another AI then be trained to analyze for errors and inconsistencies?

I think what I am saying is that the whole methodology may be invalidated by the method used to de-noise the dataset. I don't see a way forward from there, other than to do the analysis properly, i.e. de-noise each individual in-sample and out-of-sample dataset independently. I rather doubt whether the results will be very interesting after that. I'm trying to summon the energy to do the analysis to demonstrate my point.

POSTED BY: Jonathan Kinlay

8-20-22 Perhaps from a statistical point of view, a more feasible smaller random sample could be analyzed that could supply a probability value for the findings?

Well ok, perhaps I'll take a look.

So I think something like this to start, except that, as I said before, we want to do this on each in-sample dataset separately, not on the whole dataset. The variables closely match those described in the paper (see headers for details). Of course, I don't know the parameter values they used for most of the technical indicators, but it possibly doesn't matter all that much.

Note that I am applying DWT using the Haar wavelet twice: once to the original data and then again to the transformed data. This has the effect of filtering out higher frequency "noise" in the data, which is the object of the exercise. If you follow this you will also see that the DWT actually adds noisy fluctuations to the US Dollar Index and 13-Week TBill series. So I'm thinking these should probably be excluded from the de-noising process. We don't have sufficient detail in the paper to know exactly what the researchers did originally.
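One possible reading of the two-pass procedure can be sketched as below. The helper names `haarPass`, `haarTwice` and `roughness`, the level-1 transform, the default thresholding and both synthetic series are assumptions for illustration, not the code behind the charts in this thread.

```wolfram
(* Hypothetical helper: one pass of level-1 Haar DWT de-noising, default thresholding *)
haarPass[series_List] :=
  InverseWaveletTransform[
   WaveletThreshold[DiscreteWaveletTransform[series, HaarWavelet[], 1]]]

(* Two passes: the second pass works on the output of the first *)
haarTwice[series_List] := Nest[haarPass, series, 2]

(* Roughness measured as the standard deviation of first differences *)
roughness[series_List] := StandardDeviation[Differences[series]]

(* Synthetic stand-ins: a noisy price-like walk and an already smooth series *)
SeedRandom[2];
noisy = 100 + Accumulate[RandomVariate[NormalDistribution[], 256]];
smooth = N[Table[1 + 0.2 Sin[2 Pi t/256], {t, 256}]];

(* Ratio below 1: the passes removed fluctuation; above 1: they added it *)
Map[roughness[haarTwice[#]]/roughness[#] &, {noisy, smooth}]
```

A ratio like this gives a quick way to decide, series by series, whether de-noising is helping or is adding fluctuations to a series that was already smooth.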

POSTED BY: Jonathan Kinlay

You can see how, for example, the DWT denoising process removes some of the higher frequency fluctuations from the opening price:

[Figure: effect of DWT denoising on the opening price]

POSTED BY: Jonathan Kinlay

While introducing unwanted fluctuations in the US Dollar Index:

[Figure: effect of DWT denoising on the US Dollar Index]

POSTED BY: Jonathan Kinlay

So it's increasingly clear that it makes no sense to use DWT to modify the technical indicators, or the US Dollar and TBill series - all that this achieves is to introduce cyclical fluctuations into series that are already smooth (i.e. much smoother than the price series).

Thus, although it doesn't say this in the paper, one should apply DWT transformations only to the O/H/L/C/V price series for the index of interest - DJ Industrial Average, in this case.

POSTED BY: Jonathan Kinlay

So this is a brief excursion into some of the rest of the methodology outlined in the paper. First up, we need to produce data for training, validation and testing. I am doing this for just the first batch of data. We would then move the window forward by 3 months, rinse and repeat.

Note that

(1) The data is being standardized. If you don't do this, the outputs from the autoencoders are mostly just 1s and 0s. The same happens if you use Min/Max scaling.

(2) We use the mean and standard deviation from the training dataset to normalize the test dataset. This is a trap that too many researchers fall into - standardizing the test dataset using the mean and standard deviation of the test dataset is feeding forward information.

In[33]:= startISIdx = 1;
endISIdx = -1 + 
      Position[wlDates, x_ /; x > DateObject[{2009, 12, 31}, "Day"]] //
      First // First // Quiet;
endISDate = wlDates[[endISIdx]];

In[134]:= startValIdx = endISIdx + 1;
startValDate = wlDates[[startValIdx]];
endValDate = DatePlus[startValDate, Quantity[3, "Months"]];
endValIdx = 
  Position[wlDates, x_ /; x >= endValDate] // First // First // 
   Quiet;
endValDate = wlDates[[endValIdx]];
startTestIdx = endValIdx + 1;
startTestDate = wlDates[[startTestIdx]];
endTestDate = DatePlus[startTestDate, Quantity[3, "Months"]];
endTestIdx = 
  Position[wlDates, x_ /; x >= endTestDate] // First // First // Quiet;
endTestDate = wlDates[[endTestIdx]];

In[147]:= 
(* Column means and standard deviations from the training window only *)
isData = transformedData[[startISIdx ;; endISIdx]];
isMean = Mean[isData];
isStDev = StandardDeviation[isData];
inputData = Standardize[isData];
(* Scale validation and test rows with the training statistics, never their own *)
valData = 
  Map[(# - isMean)/isStDev &, transformedData[[startValIdx ;; endValIdx]]];
testData = 
  Map[(# - isMean)/isStDev &, transformedData[[startTestIdx ;; endTestIdx]]];

In[116]:= Dimensions /@ {inputData, valData}

Out[116]= {{503, 17}, {63, 17}}
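
The "move the window forward by 3 months" step could be sketched roughly as follows. The helpers `lastIndexBefore` and `windowRanges` are hypothetical names, the 18/3/3-month split mirrors the dates used above, and the calendar-day list is an assumed stand-in for the trading dates in wlDates.

```wolfram
(* Hypothetical helper: number of dates strictly before a cutoff (dates sorted ascending) *)
lastIndexBefore[dates_List, cutoff_] :=
  LengthWhile[dates, AbsoluteTime[#] < AbsoluteTime[cutoff] &]

(* Hypothetical helper: {train, validation, test} index ranges for the k-th window;
   18 months of training, then 3 months each of validation and test *)
windowRanges[dates_List, firstDate_, k_Integer] :=
  Module[{start, b},
   start = DatePlus[firstDate, {3 k, "Month"}];
   b = Map[lastIndexBefore[dates, DatePlus[start, {#, "Month"}]] &,
     {0, 18, 21, 24}];
   {{b[[1]] + 1, b[[2]]}, {b[[2]] + 1, b[[3]]}, {b[[3]] + 1, b[[4]]}}]

(* Calendar days standing in for trading days (an assumption) *)
dates = DateRange[{2008, 7, 1}, {2011, 6, 30}];

(* Index ranges for the first three rolling windows *)
Table[windowRanges[dates, {2008, 7, 1}, k], {k, 0, 2}]
```

Each element of the result holds the start and end indices of the training, validation and test windows, so the de-noising, standardization and training steps can be repeated per window without touching data outside it.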
POSTED BY: Jonathan Kinlay

Next we build the stacked autoencoder network:

Auto-Encoders

In[117]:= AutoEncoders = {};
Trainers = {};
TrainedEncoders = {};

In[120]:= hiddenLayerSize = 10;
nvars = Last[Dimensions[inputData]];  (* number of input columns, 17 here *)

In[121]:= Do[If[i == 1, 
  AutoEncoders = {NetChain[{LogisticSigmoid, nvars, LogisticSigmoid, 
      hiddenLayerSize, LogisticSigmoid, nvars}]}, 
  AppendTo[AutoEncoders, 
   NetChain[{LogisticSigmoid, hiddenLayerSize, LogisticSigmoid, 
     hiddenLayerSize/2, LogisticSigmoid, hiddenLayerSize}]]];
 AppendTo[Trainers, 
  NetGraph[{AutoEncoders[[i]], 
    MeanSquaredLossLayer[]}, {1 -> NetPort[2, "Input"], 
    NetPort["Input"] -> NetPort[2, "Target"]}]], {i, 4}]

In[122]:= Do[If[i == 1, trainingData = inputData; 
   trainedAutoencoder = 
    NetTrain[Trainers[[i]], <|"Input" -> trainingData|>, 
     TargetDevice -> "GPU" ];
   TrainedEncoders = {NetTake[NetExtract[trainedAutoencoder, 1], 5]};,
   lastEncoder = TrainedEncoders[[i - 1]];
   trainingData = lastEncoder[trainingData];
   trainedAutoencoder = 
    NetTrain[Trainers[[i]], <|"Input" -> trainingData|>, 
     TargetDevice -> "GPU" ];
   AppendTo[TrainedEncoders, NetExtract[trainedAutoencoder, 1]]];,
 {i, 4}]
POSTED BY: Jonathan Kinlay

Now we can produce the output from the autoencoder stack for both the training and test data:

encodedInputData = 
  TrainedEncoders[[4]][
   TrainedEncoders[[3]][
    TrainedEncoders[[2]][TrainedEncoders[[1]][inputData]]]];

encodedTestData = 
  TrainedEncoders[[4]][
   TrainedEncoders[[3]][
    TrainedEncoders[[2]][TrainedEncoders[[1]][testData]]]];
POSTED BY: Jonathan Kinlay

Before we plow on any further, let's do a sanity test. We'll use the Predict function to see if we're able to get any promising-looking results. Here we are building a predictor that maps the autoencoded training data to the corresponding closing prices of the index, one step ahead.

pf = Predict[
  encodedInputData -> data[[startISIdx + 1 ;; endISIdx + 1, 4]] ]

Next we use the predictor on the test dataset to produce 1-step-ahead forecasts for the closing price of the index:

forecasts = pf[encodedTestData];

Finally, we construct a trading model, as described in the paper, in which we go long or short the index depending on whether the forecast is above or below the current index level. The results do not look good:

ListLinePlot@
 Accumulate[
  Sign[forecasts - data[[startTestIdx ;; endTestIdx, 4]]]* 
   Differences[data[[startTestIdx ;; 1 + endTestIdx, 4]]]]

[Figure: cumulative profit and loss of the trading model on the test data]

POSTED BY: Jonathan Kinlay

Now, admittedly, an argument can be made that a properly constructed LSTM model would outperform a simple gradient-boosted tree - but not by the amount that would be required to improve the prediction accuracy from around 50% to nearer 95%, the level claimed in the paper. At most I would expect to see a 1% to 5% improvement in forecast accuracy.
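For reference, directional accuracy of the kind quoted above can be measured with something like the sketch below. The helper name `hitRate` and the toy numbers are made up for illustration.

```wolfram
(* Hypothetical helper: share of days on which the forecast called the
   direction of the next move correctly *)
hitRate[fc_List, current_List, next_List] :=
  N@Mean@MapThread[
     Boole[Sign[#1 - #2] == Sign[#3 - #2]] &, {fc, current, next}]

(* Toy numbers, made up for illustration *)
toyCurrent = {100, 101, 99, 102};
toyNext = {101, 99, 102, 103};
toyForecasts = {100.5, 101.5, 100, 101};

hitRate[toyForecasts, toyCurrent, toyNext]  (* 0.5: two of four directions correct *)
```

With the variables defined earlier, the equivalent call would be hitRate[forecasts, data[[startTestIdx ;; endTestIdx, 4]], data[[startTestIdx + 1 ;; endTestIdx + 1, 4]]].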

So what this suggests to me is that the researchers have got something wrong, allowing forward information to leak into the modeling process. The most likely culprits are:

  1. Applying DWT transforms to the entire dataset, instead of the training and test sets individually
  2. Standardizing the test dataset using the mean and standard deviation of the test dataset, instead of the training dataset
POSTED BY: Jonathan Kinlay

There's a much more complete attempt at replicating the research in this Git repo

As the author writes:

My attempts haven't been successful so far. Given the very limited comments regarding implementation in the article, it may be the case that I am missing something important, however the results seem too good to be true, so my assumption is that the authors have a bug in their own implementation. I would of course be happy to be proven wrong about this statement ;-)

POSTED BY: Jonathan Kinlay

The moral of this story is as follows:

Mistrust any research that:

  1. Is not by recognized researchers
  2. Is not published in a reputable journal
  3. Uses lots of pretty, colored pictures but skimps on important details of the methodology
  4. Makes extraordinary claims

All four red flags are present in this case.

POSTED BY: Jonathan Kinlay
Posted 3 months ago

Hi Jonathan

Thank you for your detailed steps of the wavelet transform, links and comments. It's a big help, as a) I have very little prior experience with wavelet transforms, b) I have been struggling to make any progress with either Haar or Shannon DWT using Wolfram, and c) I did not expect anyone to respond!

Taken together, my sense of this paper is:

  1. DWT are not well suited to de-noising financial time series indicators.
  2. Similar to Fourier transforms, they introduce yet another layer of complexity and possible side effects/interactions that increase model risk. The charts showing the transformed series are one view of that.
  3. Bandpass/lowpass filters are still a very credible alternative for de-noising OHLCV by comparison.
  4. Without the full code supplied by the authors for this paper, look-ahead and peeking bias is the more likely driver of the very high predictive accuracy, as opposed to the efficacy or parameterization of the wavelet transforms. The generic set of indicators and the exclusion of any volume- or entropy-based indicators add weight to that.
  5. Too good to be true is usually...

I'm not done with filters/de-noising, so I will post follow-up work here even if it does not track this paper's workflow exactly.

Thank you again, Jonathan. Your help on this has been incredible and I really appreciate the time and energy taken to do this.

POSTED BY: Warren Tou