Hi Warren,
How is your research progressing? I read the paper and I am deeply sceptical. The results are too good to be true, and suggest some kind of feed-forward of future information into the model inputs, i.e. a look-ahead bias.
Here is a link to the original ("stem") paper that this one builds on: https://tinyurl.com/4nuvwb5x
As you can see, the approach is very similar.
I have no particular expertise in wavelet transforms, but I suspect the issue lies there. Neither the original research nor this sequel discusses the use of wavelet transforms in sufficient detail to be sure, but my guess is that they are preprocessing the entire dataset with wavelet transforms before feeding the transformed series to the inputs of the stacked auto-encoders. This embeds future information into the transformed data, since the wavelet coefficients are estimated from the entire dataset, including the observations that later serve as the test period.
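To make the suspected leak concrete, here is a minimal sketch in Python of what I think they may be doing, using the PyWavelets library. The wavelet family, decomposition level, and thresholding rule are my own assumptions, since neither paper specifies them:

    import numpy as np
    import pywt

    def wavelet_denoise(series, wavelet="db4", level=2):
        # Decompose, soft-threshold the detail coefficients, reconstruct.
        coeffs = pywt.wavedec(series, wavelet, level=level)
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745     # noise scale estimate
        thresh = sigma * np.sqrt(2 * np.log(len(series)))  # universal threshold
        coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
        return pywt.waverec(coeffs, wavelet)[: len(series)]

    prices = np.cumsum(np.random.randn(1000))  # stand-in for a price series

    # LEAKY: the full history is denoised in one pass, so the smoothed value
    # at time t is influenced by observations at t+1, t+2, ...
    denoised = wavelet_denoise(prices)
    train, test = denoised[:800], denoised[800:]  # split AFTER transforming

The problem is purely the order of operations: transform first, split second.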
If I am right, this is an elementary error disguised by the use of a fancy denoising procedure. You sometimes see a similar error where researchers standardize the data using the mean and standard deviation of the entire dataset, which embeds future information about the process into the transformed data. This can lead to spuriously accurate prediction results.
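The standardization version of the mistake, and its fix, look like this on toy data:

    import numpy as np

    x = np.cumsum(np.random.randn(1000))

    # LEAKY: mean and std computed over the full sample, test period included.
    z_leaky = (x - x.mean()) / x.std()

    # CORRECT: estimate the statistics on the training window only, then
    # apply those same statistics, unchanged, to the test window.
    train, test = x[:800], x[800:]
    mu, sd = train.mean(), train.std()
    z_train = (train - mu) / sd
    z_test = (test - mu) / sd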
The correct procedure, of course, is to denoise the in-sample data only, and separately denoise the out-of-sample data used for testing purposes. The researchers do use a set of rolling train/test datasets, which is fine, except that each training and test set should be denoised individually.
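Under my assumptions above, a leakage-free rolling procedure might look like the following sketch, reusing the wavelet_denoise() function and prices series from earlier (the window and horizon sizes are arbitrary):

    # Denoise each training window on its own; build the test inputs causally,
    # so the denoised value at time t uses no observation after t.
    window, horizon = 800, 100

    for start in range(0, len(prices) - window - horizon + 1, horizon):
        train_raw = prices[start : start + window]
        train_dn = wavelet_denoise(train_raw)  # in-sample data only

        # For the test segment, re-run the transform on an expanding history
        # and keep only the newest value. Crude and slow, but strictly causal.
        test_dn = np.array([
            wavelet_denoise(prices[start : start + window + i + 1])[-1]
            for i in range(horizon)
        ])
        # ... fit the model on train_dn, evaluate against test_dn ...

Even done this way, the coefficients near the right edge of each window are distorted by how the transform pads the boundary, which is exactly the boundary-condition issue raised in the paper I quote below.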
That's my initial take. Perhaps someone with specific expertise on wavelet transforms can chime in and give us their view.
PS: This paper echoes the concerns I expressed above about the incorrect use of wavelets in a forecasting context:
The incorrect development of these wavelet-based forecasting models occurs during wavelet decomposition (the process of extracting high- and low-frequency information into different sub-time series known as wavelet and scaling coefficients, respectively) and as a result introduces error into the forecast model inputs. The source of this error is due to the boundary condition that is associated with wavelet decomposition (and the wavelet and scaling coefficients) and is linked to three main issues: 1) using ‘future data’ (i.e., data from the future that is not available); 2) inappropriately selecting decomposition levels and wavelet filters; and 3) not carefully partitioning calibration and validation data.