I need to calculate values of the InverseCDF function of the noncentral Student T distribution. This is used for constructing statistical tolerance intervals. The values most commonly used are tabulated, but occasionally one needs values not found in the tables, or needs to be able to update the value when the parameters change.
The function to compute is: InverseCDF[NoncentralStudentTDistribution[s1-1,d],cl]
Note: Because of a symmetry in this function, one could equally well calculate: InverseCDF[NoncentralStudentTDistribution[s1-1,-d],1-cl]. The result would be the same. But this trick has not helped with the difficulties described below.
Here s1 is related to your sample size, but in some applications it won't be an integer (but it is positive). The order of s1 could be in the tens, hundreds, thousands, etc.
d is the non-centrality parameter that is related to the sample size and to the probability you are trying to cover (p). Normally d= Sqrt[s1]*InverseCDF[NormalDistribution[0,1],p], where p can be something like 0.9, 0.95, 0.99, etc. (less than 1). For these values of p, the InverseCDF of the normal distribution is usually between one and two. The calculation of the InverseCDF for the normal distribution is not the problem.
In some applications, the sample size used for calculating d will be different than s1 (say s2), but for this example I used the same.
cl is the confidence level one is trying to cover, something like 0.9, 0.95, 0.99, etc. (less than 1).
I was trying this using p=cl=0.95 and using the same sample size for d, so that the result is just a function of the number I use for s1 (I used integers in this example). I checked whether the results were correct by plugging the answer back into CDF and checking whether one gets 0.95 (I also used alternative checks to confirm that the errors I found are in the calculation of the InverseCDF, rather than in CDF itself). I noticed the following:
1) The calculation is fast and accurate for small sample sizes (<100).
2) For some values (e.g., 150) I get warnings about convergence issues in NIntegrate, but the result is accurate. The warning may go away for larger values (200). The calculation starts taking long (~ 1minute, although I did not time it accurately).
3) For larger values (420) I get inaccurate results (plugging into CDF gives ~0.503 instead of 0.95) with no warnings about convergence.
I tried adjusting the options in NIntegrate but it did not help.
I tried coding a simple bisection procedure to invert the CDF function and the results are much better. The calculation is faster and the results are accurate even when s1 is in the thousands, but I still get warnings about convergence. The difficulty in coding the bisection is finding an interval that always brackets the solution, but that is fairly tight. I found such bounds by looking into the theory of the non-central student distribution, but there is probably room for improvement.
I tried to clarify what is causing the numerical difficulties. The CDF is the integral of the PDF. As the sample size (s1) grows, the peak of the PDF is not getting narrower (I think I found that the half-width approaches a finite value), but the peak is moving farther and farther away from zero, so, it is becoming narrower relative to the intervals being considered in the integration.
Clearly this is a bug, since it gives an incorrect result without printing a warning. It is also clear that Mathematica could do the calculation better (as when I code a bisection procedure).
Is there a way to influence the way Mathematica inverts the CDF function?
Is there a different way to do this in the statistical packages (I am not very familiar with them myself)?
Another thing I plan to check is whether version 10 gives better results.
Thanks,
O.L.