Message Boards Message Boards

GROUPS:

Quantile regression through linear programming

Posted 8 years ago
24277 Views
|
13 Replies
|
21 Total Likes
|
Hi All,
I just published a blog post proclaiming the implementation of a Mathematica package for quantile regression through linear programming. The blog post has couple of examples and several reference links:

http://mathematicaforprediction.wordpress.com/2013/12/16/quantile-regression-through-linear-programming/

The implementation with Minimize is fairly straightforward. The linear programming implementation is slightly harder and also care has to be taken when the data has negative values. 



Mathematica code difnitions:
 QuantileRegressionFit::"nmat" = "The first argument is expected to be a matrix of numbers with two [/size][size=2]columns.";
 QuantileRegressionFit::"fvlen" = "The second argument is expected to be list of functions to be fitted with at least one element.";
 QuantileRegressionFit::"nvar" = "The third argument is expected to be a symbol.";
 QuantileRegressionFit::"nqntls" = "The fourth argument is expected to be a list of numbers representing quantiles.";
 QuantileRegressionFit::"nmeth" = "The value of the method option is expected to be LinearProgramming, Minimize, or a list with LinearProgramming or Minimize as a first element.";
 QuantileRegressionFit::"mmslow" = "With the method Minimize the computations can be very slow for large data sets.";
 
 
 Clear[QuantileRegressionFit]
Options[QuantileRegressionFit] = {Method -> LinearProgramming};
QuantileRegressionFit[data_, funcs_, var_, qsArg_, opts : OptionsPattern[]] :=
  Block[{mOptVal, qs},
   qs = If[NumericQ[qsArg], {qsArg}, qsArg];
   (*This check should not be applied because the first function can be a constant.*)
   (*!Apply[And,Map[!FreeQ[#,var]&,funcs]],Message[
   LPQuantileRegressionFit::\"fvfree\"],*)
   Which[
    ! (MatrixQ[data] && Dimensions[data][[2]] >= 2),
    Message[LPQuantileRegressionFit::"nmat"]; Return[{}],
    Length[funcs] < 1,
    Message[LPQuantileRegressionFit::"fvlen"]; Return[{}],
    Head[var] =!= Symbol,
    Message[LPQuantileRegressionFit::"nvar"]; Return[{}],
    ! VectorQ[qs, NumericQ[#] && 0 <= # <= 1 &],
    Message[LPQuantileRegressionFit::"nqntls"]; Return[{}]
   ];
   mOptVal = OptionValue[QuantileRegressionFit, Method];
   Which[
    mOptVal === LinearProgramming,
    LPQuantileRegressionFit[data, funcs, var, qs],
    ListQ[mOptVal] && mOptVal[[1]] === LinearProgramming,
    LPQuantileRegressionFit[data, funcs, var, qs, Rest[mOptVal]],
    mOptVal === Minimize,
    MinimizeQuantileRegressionFit[data, funcs, var, qs],
    ListQ[mOptVal] && mOptVal[[1]] === Minimize,
    MinimizeQuantileRegressionFit[data, funcs, var, qs, Rest[mOptVal]],
    True,
    Message[QuantileRegressionFit::"nmeth"]; Return[{}]
   ]
  ];


Clear[LPQuantileRegressionFit]
LPQuantileRegressionFit[dataArg_?MatrixQ, funcs_, var_Symbol, qs : {_?NumberQ ..}, opts : OptionsPattern[]] :=
  Block[{data = dataArg, yMedian = 0, yFactor = 1, yShift = 0, mat, n = Dimensions[dataArg][[1]], pfuncs, c, t, qrSolutions},
   If[Min[data[[All, 2]]] < 0,
    yMedian = Median[data[[All, 2]]];
    yFactor = InterquartileRange[data[[All, 2]]];
    data[[All, 2]] = Standardize[data[[All, 2]], Median, InterquartileRange];
    yShift = Abs[Min[data[[All, 2]]]];(*this is Min[
    dataArg\[LeftDoubleBracket]All,2\[RightDoubleBracket]]-Median[
    dataArg\[LeftDoubleBracket]All,2\[RightDoubleBracket]]*)
   
    data[[All, 2]] = data[[All, 2]] + yShift ;
   ];
   
   pfuncs = Map[Function[{fb}, With[{f = fb /. (var -> Slot[1])}, f &]], funcs];
   mat = Map[Function[{f}, f /@ data[[All, 1]]], pfuncs];
   mat = Map[Flatten, Transpose[Join[mat, {IdentityMatrix[n], -IdentityMatrix[n]}]]];
   
   qrSolutions =
    Table[
     c = Join[ConstantArray[0, Length[funcs]], ConstantArray[1, n] q, ConstantArray[1, n] (1 - q)];
     t = LinearProgramming[c, mat, Transpose[{data[[All, 2]], ConstantArray[0, n]}], opts];
     If[! (VectorQ[t, NumberQ] && Length[t] > Length[funcs]), ConstantArray[0, Length[funcs]], t]
     , {q, qs}];
   
   If[yMedian == 0 && yFactor == 1,
    Map[funcs.# &, qrSolutions[[All, 1 ;; Length[funcs]]]],
    Map[Expand[yFactor ((funcs.#) - yShift) + yMedian] &, qrSolutions[[All, 1 ;; Length[funcs]]]]
   ]
  ];


Clear[MinimizeQuantileRegressionFit]
MinimizeQuantileRegressionFit[data_?MatrixQ, funcs_, var_Symbol, qs : {_?NumberQ ..}, opts : OptionsPattern[]] :=
  Block[{minFunc, Tilted, QRModel, b, bvars, qrSolutions},
   
   If[Length[data] > 300,
    Message[QuantileRegressionFit::"mmslow"]
    ];
   
   bvars = Array[b, Length[funcs]];
   
   Tilted[t_?NumberQ, x_] := Piecewise[{{(t - 1) x, x < 0}, {t x, x >= 0}}] /; t <= 1;
   QRModel[x_] := Evaluate[(bvars.funcs) /. var -> x];
   
   qrSolutions =
    Table[
     minFunc = Total[(Tilted[q, #1[[2]] - QRModel[#1[[1]]]] &) /@ data];
     Minimize[{minFunc}, bvars, opts]
     , {q, qs}];
   Map[funcs.# &, qrSolutions[[All, 2, All, 2]]]
   ];

Data generation function definition:
Clear[LogarithmicCurveWithNoise]
LogarithmicCurveWithNoise[nPoints_Integer, start_?NumberQ, end_?NumberQ] :=
  Block[{data},
   data =
    Table[{t, 5 + Log[t] + RandomReal[SkewNormalDistribution[0, Log[t]/5, 12]]}, {t, Rescale[Range[1, nPoints],{1, nPoints}, {start, end}]}];
   data
  ];

Data generation:
data = LogarithmicCurveWithNoise[1200, 10, 200];
ListPlot[data]


Regression quantiles calculation:
qs = {0.05, 0.25, 0.5, 0.75, 0.95};
qrFuncs = QuantileRegressionFit[data, {1, x, Sqrt[x], Log[x]}, x, qs]

"Standard" fit:
fFunc = Fit[data, {1, x, Sqrt[x], Log[x]}, x]

Plot all the curves together:
grData = ListPlot[data, PlotRange -> All];
grQReg = Plot[Evaluate[MapThread[Tooltip[#1, #2] &, {Append[qrFuncs, fFunc], Append[Map["\[CurlyRho](" <> ToString[#] <> ",x)" &, qs], "Least squares"]}]], {x, Min[data[[All, 1]]], Max[data[[All, 1]]]}, PlotStyle -> Append[Table[{AbsoluteThickness[2], Blend[{Yellow, Darker[Red]}, i/Length[qs]]}, {i, Length[qs]}], {AbsoluteThickness[2], Darker[Green]}], PlotLegends -> SwatchLegend[Append[Map["\[CurlyRho](" <> ToString[#] <> ",x)" &, qs], "Least squares"]]];
Show[{grData, grQReg}, ImageSize -> 600]
13 Replies
Anton, this sounds great, thank you for sharing! If you wish you could make a post with description of or even introduction to the packages or other additional functionality you develop. We place such discussions in the group Mathematica Add-Ons, - please check it out and see what other power-users like yourself like to write. There you can also post all appropriate links to GitHub pages and other related information.
Anton, this looks very nice and I'd like to understand it better.  Could you post some code that illustrates the method for one data set?
HI Frank,

I can email you the current version of a Quantile Regression demonstation I am working on. I will post some code on this forum later today.

Also, I am very interested in implementing quantile regression with OPL / CPLEX.
I just uploaded a guide for using QuantileRegression to GitHub :
https://github.com/antononcube/MathematicaForPrediction/blob/master/Documentation/Quantile%20regression%20through%20linear%20programming.pdf
The guide has some theoretical explanations and shows how the quantile regression problem can be formulated as a linear programming problem.
Here is a link to another blog post with a couple of examples that demonstrate the robustness of quantile regression:
http://mathematicaforprediction.wordpress.com/2013/12/23/quantile-regression-robustness/ .
Three sets of data are considered: skewed noise over a logarithmic curve, and alternations of it with two types of outliers. (The data has simple heteroscedasticity in it.)

These plots demonstrate the robustness of quantile regression:

Frank I added code to the original post.
Thanks, I downloaded the pdf from github.
I implemented quantile regression with B-splines and added it to the package at GitHub:
https://github.com/antononcube/MathematicaForPrediction/blob/master/QuantileRegression.m

Because of the spline quantile regression implementation I had to rename the function QuantileRegression originally discussed in this post to QuantileRegressionFit. I have made the corresponding changes of the code in this post.
Below is a demonstration of using quatile regression with B-splines for outlier detection in time series.

First we generate some time series data:
tempData = WeatherData["Atlanta", "Temperature", {{2006, 1, 1}, {2013, 12, 31}, "Day"}];
tempPData = tempData;
tempPData[[All, 1]] = DateDifference[tempData[[1, 1]], #, "Day"] & /@ tempData[[All, 1]];
tempPData[[All, 1]] = tempPData[[All, 1, 1]];
tempPData[[All, 2]] = Sqrt[tempPData[[All, 1]]] + tempPData[[All, 2]];
(I took the temperature in Atlanta, GA, USA for the last 7 years and modified the data with the formula Yi = sqrt(Xi) + Yi .)



Next, we compute the regression quantiles with:
qFuncs = QuantileRegression[tempPData, 44, {0.02, 0.2, 0.4, 0.6, 0.8, 0.98}]

Here is a plot with five regression quantiles calculated using B-splines for the quantiles {0.02, 0.2, 0.4, 0.6, 0.8, 0.98}:


Here is another plot with the 0.02 and 0.98 regression quantiles outlining the data:


Note that in order to get satisfactory results, the only tuning value I had to choose is for the second parameter, which in this case specifies the number of B-spline basis knots. The second parameter can also be a list of knots. The option InterpolationOrder specifies the splines order. (By default InterpolationOrder->3.)
Anton, this looks excellent, thank you for sharing.
Thank you, Sam! 
It was really interesting to program quantile regression using first order B-splines in CPLEX's OPL. The code is really clean and concise (see my blog post http://mathematicaforprediction.wordpress.com/2014/01/10/opl-code-for-quantile-regression-with-b-splines/ .)
The reason my OPL implementation to have only first order splines is because OPL provides only one type of piecewise functions — linear piecewise functions. OPL piecewise functions are somewhat awkward to interpret, but relatively easy to specify, using slopes, x coordinates for slope end points, and one {x,y} point.

I published a blog post about Quantile Regression application to finding local extrema -- see Finding local extrema in noisy data using Quantile Regression .

The algorithm has the following steps:

  1. Fit a polynomial through the data (using QuantileRegression).

  2. Find the local extrema of the fitted polynomial. (We will call them fit estimated extrema.)

  3. Around each of the fit estimated extrema find the most extreme point in the data by nearest neighbors search (by using Nearest).

Here is an example that shows noisy oscillatory data, the fitted regression quantiles, and the estimated local extrema:

enter image description here

Here is another example with more timid data:

enter image description here

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract