Estimation of conditional density distributions

Posted 11 years ago
I just published a blog post with example calculations on a temperature time series for the general problem “How to estimate the conditional density of the predicted variable given a value of the conditioning covariate?”: http://mathematicaforprediction.wordpress.com/2014/01/13/estimation-of-conditional-density-distributions

The calculations and examples in the blog post can be summarized in the following five steps.

1. We start with a temperature time series like this one:


2. We make pairs of {yesterday's temperature, today's temperature}:


3. We fit regression quantiles through the temperature pairs data:


4. With the regression quantiles we can estimate the CDF and PDF functions for a given yesterday's temperature:


5. Observations and conclusions.
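
Steps 2 to 4 can be tried out with a minimal sketch like the one below. It is not the blog post's method: instead of regression quantiles it uses a cruder substitute (keep the pairs whose "yesterday" value is close to a chosen value, then take empirical quantiles and a kernel density estimate of the "today" values). The data is synthetic, and the names temps, pairs, y0, h, todayGivenY0 and condDist are made up for illustration.

```mathematica
(* Synthetic stand-in for a daily temperature series; NOT the blog post's data *)
SeedRandom[1];
temps = Table[
   15 + 10 Sin[2 Pi d/365.] + RandomVariate[NormalDistribution[0, 3]],
   {d, 1, 3*365}];

(* Step 2: {yesterday's temperature, today's temperature} pairs *)
pairs = Partition[temps, 2, 1];

(* Crude substitute for regression quantiles (an assumption of this sketch):
   keep the pairs whose "yesterday" value lies within h degrees of y0 *)
y0 = 20; h = 1.5;
todayGivenY0 = Select[pairs, Abs[First[#] - y0] <= h &][[All, 2]];

(* Empirical conditional quantiles of today's temperature *)
Quantile[todayGivenY0, {0.1, 0.5, 0.9}]

(* Kernel estimate of the conditional distribution, and its PDF *)
condDist = SmoothKernelDistribution[todayGivenY0];
Plot[PDF[condDist, t], {t, 5, 35}]
```

The window half-width h trades bias against noise: a wider window uses more points but blurs the dependence on yesterday's temperature, which is one reason to prefer regression quantiles over this kind of binning.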
POSTED BY: Anton Antonov
5 Replies
Very interesting.  It reminds me of the technique used to generate confidence levels for the results of Monte Carlo simulations in which some of the simulation parameters have uncertainty.  The simulation is run repeatedly, with the values of the uncertain parameters drawn from their distributions.  Quantiles representing various levels of uncertainty are then generated from the results.  However, that is simpler than what you did, since the Monte Carlo simulation results are equally spaced, so the quantiles can be generated simply by sorting.  In that language, what you have done is determine the uncertainty in the forecast for tomorrow's temperature, given today's temperature.
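
As a rough sketch of that Monte Carlo procedure (the toy model, its parameter distributions, and the names model, results and sorted are all hypothetical, chosen only for illustration):

```mathematica
(* Hypothetical toy model whose output depends on two uncertain parameters *)
model[a_, b_] := a*Exp[-b];

(* Run the simulation repeatedly, drawing the parameters from their distributions *)
SeedRandom[2];
n = 10000;
results = Table[
   model[RandomVariate[NormalDistribution[10, 1]],
    RandomVariate[UniformDistribution[{0.5, 1.5}]]],
   {n}];

(* Every run carries the same weight, so quantiles come from sorting alone:
   the q-th quantile is read off at position Ceiling[q n] *)
sorted = Sort[results];
q = {1/20, 1/2, 19/20};
sorted[[Ceiling[q n]]]

(* The built-in Quantile, for comparison *)
Quantile[results, q]
```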
POSTED BY: Frank Kampas
Thank you for the comments, Frank and Nicolas!
1. I am also considering using Monte Carlo simulations. I plan to combine them with Markov chains or prefix trees in order to predict the end part of a sequence of temperatures from its head part.
2. As for the symmetry of the plot around the line y = x, it should be expected, since the points were derived with
Partition[tempData[[All,1]],2,1]
That is, the x-coordinate of each point is the y-coordinate of another point, and, of course, the temperature data has seasonality.

3. I need to look into the proposed use of the inverted Clayton copula.
POSTED BY: Anton Antonov
Posted 11 years ago
Hello

I am surprised that the correlation plot is so symmetric around the identity line. This means that if you exchange the yesterday and today axes, the plot is almost the same. In turn, this means that the dynamics are symmetric: there is only a very weak time orientation!

Here is a suggestion for another way to create the model, using a bivariate probability distribution:

You could perhaps model the random variable (yesterday, today) with an inverted Clayton copula (also called HRT) with suitable marginal CDFs.
The model CDF will be of the form C[ F[ tT ] , G[ yT ] ]. Then, to get the conditional probability law, you cut the bivariate PDF model at a particular value of yesterday's temperature yT (i.e. hold that value constant). You get a univariate function of tT, the slice of the joint PDF at yT = const (it contains the copula density c[ F[ tT ] , G[ yT = const ] ]); dividing it by the probability density of yT at that value gives the probability density function you are looking for.
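
In symbols, and only as a sketch of the suggestion above: write F and G for the marginal CDFs of today's and yesterday's temperature, f and g for the corresponding marginal densities, c for the copula density (the mixed second derivative of C), and y_0 for the fixed value of yesterday's temperature. The names f, g, h and y_0 are introduced here for illustration. Assuming g(y_0) > 0, the joint density h and the conditional density are:

```latex
\begin{aligned}
h(t_T, y_T) &= c\bigl(F(t_T), G(y_T)\bigr)\, f(t_T)\, g(y_T), \\
f(t_T \mid y_T = y_0) &= \frac{h(t_T, y_0)}{g(y_0)}
                       = c\bigl(F(t_T), G(y_0)\bigr)\, f(t_T).
\end{aligned}
```

The marginal density of yesterday's temperature cancels, so the conditional density is the copula density along the slice multiplied by the marginal density of today's temperature.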

If it works, this approach could be much lighter and more general than modelling individual conditional probabilities with a family of distributions.

What is nice about copulas is that they usually have a simple symbolic expression, so if the marginals also have one, you get a symbolic closed form of your model.


Nicolas
POSTED BY: Nicolas Venuti
Posted 11 years ago
Hi Anton

2) I don't agree: the cloud of points is a Poincaré section of some periodic dynamic signal, and the symmetry is not mandatory.
For example, consider:
ListPlot[Partition[Table[Sin[t]/t, {t, 1, 100, 1}], 2, 1]]
which gives a kind of multispiral plot.

3) I did some trials with the Clayton copula, and I am no longer so sure it would give the best fit, but I am quite sure that the copula approach is worth trying.
The Clayton copula has higher correlation at the head of the marginals, but Mathematica does not have the inverted Clayton available. The Gumbel copula correlates both the head and tail ends, and it is easy to use in Mathematica.

Here is the idea with the Gumbel copula:
(I am not used to fitting data, so this only gives an idea and is far from a good fit.)
dist = CopulaDistribution[{"GumbelHougaard", 4}, {NormalDistribution[16, 8], NormalDistribution[16, 8]}];
ListPlot[RandomVariate[dist, 3000], PlotRange -> {{-10, 35}, {-10, 35}}]
For a yesterday temperature of 0°, the model gives a plot showing that the most probable temperature today is higher, which is consistent with your results:
Plot[Evaluate[PDF[dist, {Ty, Tt}]]/PDF[NormalDistribution[16, 8], Ty] /. Ty -> 0, {Tt, -5, 10}]

The overall behaviour of the distribution (symmetry in the middle range and a skew trend at high temperatures) seems to be correct;
nevertheless, the model is very approximate and far from being as accurate as your quantile regression package.
POSTED BY: Nicolas Venuti
Here are a couple of examples in support of my symmetry conjecture: time series with simple seasonality produce almost symmetric plots with Partition[#, 2, 1] & .
1. Sin with skew normal noise:
tsPoints = Table[{x, Sin[x] + RandomVariate[SkewNormalDistribution[0, 0.2, 0.5]]}, {x, 0, 5*2 Pi, 0.02}];
Grid[{{ListPlot[tsPoints, ImageSize -> 350],
ListPlot[Partition[tsPoints[[All, 2]], 2, 1], Axes -> False, Frame -> True, AspectRatio -> Automatic, ImageSize -> 350]}}]

2. Sin with lower frequency Cos with normal noise:
tsPoints = Table[{x, Sin[x] + Cos[x/4] + RandomVariate[NormalDistribution[0, 0.2]]}, {x, 0, 5*2 Pi, 0.02}];
Grid[{{ListPlot[tsPoints, ImageSize -> 350],
ListPlot[Partition[tsPoints[[All, 2]], 2, 1], Axes -> False, Frame -> True, AspectRatio -> Automatic, ImageSize -> 350]}}]
POSTED BY: Anton Antonov