Group Abstract Group Abstract

Message Boards Message Boards

Quantile regression through linear programming

Posted 13 years ago
Hi All,
I just published a blog post proclaiming the implementation of a Mathematica package for quantile regression through linear programming. The blog post has couple of examples and several reference links:

http://mathematicaforprediction.wordpress.com/2013/12/16/quantile-regression-through-linear-programming/

The implementation with Minimize is fairly straightforward. The linear programming implementation is slightly harder and also care has to be taken when the data has negative values. 



Mathematica code difnitions:
 QuantileRegressionFit::"nmat" = "The first argument is expected to be a matrix of numbers with two [/size][size=2]columns.";
 QuantileRegressionFit::"fvlen" = "The second argument is expected to be list of functions to be fitted with at least one element.";
 QuantileRegressionFit::"nvar" = "The third argument is expected to be a symbol.";
 QuantileRegressionFit::"nqntls" = "The fourth argument is expected to be a list of numbers representing quantiles.";
 QuantileRegressionFit::"nmeth" = "The value of the method option is expected to be LinearProgramming, Minimize, or a list with LinearProgramming or Minimize as a first element.";
 QuantileRegressionFit::"mmslow" = "With the method Minimize the computations can be very slow for large data sets.";
 
 
 Clear[QuantileRegressionFit]
Options[QuantileRegressionFit] = {Method -> LinearProgramming};
QuantileRegressionFit[data_, funcs_, var_, qsArg_, opts : OptionsPattern[]] :=
  Block[{mOptVal, qs},
   qs = If[NumericQ[qsArg], {qsArg}, qsArg];
   (*This check should not be applied because the first function can be a constant.*)
   (*!Apply[And,Map[!FreeQ[#,var]&,funcs]],Message[
   LPQuantileRegressionFit::\"fvfree\"],*)
   Which[
    ! (MatrixQ[data] && Dimensions[data][[2]] >= 2),
    Message[LPQuantileRegressionFit::"nmat"]; Return[{}],
    Length[funcs] < 1,
    Message[LPQuantileRegressionFit::"fvlen"]; Return[{}],
    Head[var] =!= Symbol,
    Message[LPQuantileRegressionFit::"nvar"]; Return[{}],
    ! VectorQ[qs, NumericQ[#] && 0 <= # <= 1 &],
    Message[LPQuantileRegressionFit::"nqntls"]; Return[{}]
   ];
   mOptVal = OptionValue[QuantileRegressionFit, Method];
   Which[
    mOptVal === LinearProgramming,
    LPQuantileRegressionFit[data, funcs, var, qs],
    ListQ[mOptVal] && mOptVal[[1]] === LinearProgramming,
    LPQuantileRegressionFit[data, funcs, var, qs, Rest[mOptVal]],
    mOptVal === Minimize,
    MinimizeQuantileRegressionFit[data, funcs, var, qs],
    ListQ[mOptVal] && mOptVal[[1]] === Minimize,
    MinimizeQuantileRegressionFit[data, funcs, var, qs, Rest[mOptVal]],
    True,
    Message[QuantileRegressionFit::"nmeth"]; Return[{}]
   ]
  ];


Clear[LPQuantileRegressionFit]
LPQuantileRegressionFit[dataArg_?MatrixQ, funcs_, var_Symbol, qs : {_?NumberQ ..}, opts : OptionsPattern[]] :=
  Block[{data = dataArg, yMedian = 0, yFactor = 1, yShift = 0, mat, n = Dimensions[dataArg][[1]], pfuncs, c, t, qrSolutions},
   If[Min[data[[All, 2]]] < 0,
    yMedian = Median[data[[All, 2]]];
    yFactor = InterquartileRange[data[[All, 2]]];
    data[[All, 2]] = Standardize[data[[All, 2]], Median, InterquartileRange];
    yShift = Abs[Min[data[[All, 2]]]];(*this is Min[
    dataArg\[LeftDoubleBracket]All,2\[RightDoubleBracket]]-Median[
    dataArg\[LeftDoubleBracket]All,2\[RightDoubleBracket]]*)
   
    data[[All, 2]] = data[[All, 2]] + yShift ;
   ];
   
   pfuncs = Map[Function[{fb}, With[{f = fb /. (var -> Slot[1])}, f &]], funcs];
   mat = Map[Function[{f}, f /@ data[[All, 1]]], pfuncs];
   mat = Map[Flatten, Transpose[Join[mat, {IdentityMatrix[n], -IdentityMatrix[n]}]]];
   
   qrSolutions =
    Table[
     c = Join[ConstantArray[0, Length[funcs]], ConstantArray[1, n] q, ConstantArray[1, n] (1 - q)];
     t = LinearProgramming[c, mat, Transpose[{data[[All, 2]], ConstantArray[0, n]}], opts];
     If[! (VectorQ[t, NumberQ] && Length[t] > Length[funcs]), ConstantArray[0, Length[funcs]], t]
     , {q, qs}];
   
   If[yMedian == 0 && yFactor == 1,
    Map[funcs.# &, qrSolutions[[All, 1 ;; Length[funcs]]]],
    Map[Expand[yFactor ((funcs.#) - yShift) + yMedian] &, qrSolutions[[All, 1 ;; Length[funcs]]]]
   ]
  ];


Clear[MinimizeQuantileRegressionFit]
MinimizeQuantileRegressionFit[data_?MatrixQ, funcs_, var_Symbol, qs : {_?NumberQ ..}, opts : OptionsPattern[]] :=
  Block[{minFunc, Tilted, QRModel, b, bvars, qrSolutions},
   
   If[Length[data] > 300,
    Message[QuantileRegressionFit::"mmslow"]
    ];
   
   bvars = Array[b, Length[funcs]];
   
   Tilted[t_?NumberQ, x_] := Piecewise[{{(t - 1) x, x < 0}, {t x, x >= 0}}] /; t <= 1;
   QRModel[x_] := Evaluate[(bvars.funcs) /. var -> x];
   
   qrSolutions =
    Table[
     minFunc = Total[(Tilted[q, #1[[2]] - QRModel[#1[[1]]]] &) /@ data];
     Minimize[{minFunc}, bvars, opts]
     , {q, qs}];
   Map[funcs.# &, qrSolutions[[All, 2, All, 2]]]
   ];

Data generation function definition:
Clear[LogarithmicCurveWithNoise]
LogarithmicCurveWithNoise[nPoints_Integer, start_?NumberQ, end_?NumberQ] :=
  Block[{data},
   data =
    Table[{t, 5 + Log[t] + RandomReal[SkewNormalDistribution[0, Log[t]/5, 12]]}, {t, Rescale[Range[1, nPoints],{1, nPoints}, {start, end}]}];
   data
  ];

Data generation:
data = LogarithmicCurveWithNoise[1200, 10, 200];
ListPlot[data]


Regression quantiles calculation:
qs = {0.05, 0.25, 0.5, 0.75, 0.95};
qrFuncs = QuantileRegressionFit[data, {1, x, Sqrt[x], Log[x]}, x, qs]

"Standard" fit:
fFunc = Fit[data, {1, x, Sqrt[x], Log[x]}, x]

Plot all the curves together:
grData = ListPlot[data, PlotRange -> All];
grQReg = Plot[Evaluate[MapThread[Tooltip[#1, #2] &, {Append[qrFuncs, fFunc], Append[Map["\[CurlyRho](" <> ToString[#] <> ",x)" &, qs], "Least squares"]}]], {x, Min[data[[All, 1]]], Max[data[[All, 1]]]}, PlotStyle -> Append[Table[{AbsoluteThickness[2], Blend[{Yellow, Darker[Red]}, i/Length[qs]]}, {i, Length[qs]}], {AbsoluteThickness[2], Darker[Green]}], PlotLegends -> SwatchLegend[Append[Map["\[CurlyRho](" <> ToString[#] <> ",x)" &, qs], "Least squares"]]];
Show[{grData, grQReg}, ImageSize -> 600]
POSTED BY: Anton Antonov
13 Replies
POSTED BY: Anton Antonov
It was really interesting to program quantile regression using first order B-splines in CPLEX's OPL. The code is really clean and concise (see my blog post http://mathematicaforprediction.wordpress.com/2014/01/10/opl-code-for-quantile-regression-with-b-splines/ .)
The reason my OPL implementation to have only first order splines is because OPL provides only one type of piecewise functions — linear piecewise functions. OPL piecewise functions are somewhat awkward to interpret, but relatively easy to specify, using slopes, x coordinates for slope end points, and one {x,y} point.
POSTED BY: Anton Antonov
Anton, this looks excellent, thank you for sharing.
POSTED BY: Sam Carrettie
Thank you, Sam! 
POSTED BY: Anton Antonov
Below is a demonstration of using quatile regression with B-splines for outlier detection in time series.

First we generate some time series data:
tempData = WeatherData["Atlanta", "Temperature", {{2006, 1, 1}, {2013, 12, 31}, "Day"}];
tempPData = tempData;
tempPData[[All, 1]] = DateDifference[tempData[[1, 1]], #, "Day"] & /@ tempData[[All, 1]];
tempPData[[All, 1]] = tempPData[[All, 1, 1]];
tempPData[[All, 2]] = Sqrt[tempPData[[All, 1]]] + tempPData[[All, 2]];
(I took the temperature in Atlanta, GA, USA for the last 7 years and modified the data with the formula Yi = sqrt(Xi) + Yi .)



Next, we compute the regression quantiles with:
qFuncs = QuantileRegression[tempPData, 44, {0.02, 0.2, 0.4, 0.6, 0.8, 0.98}]

Here is a plot with five regression quantiles calculated using B-splines for the quantiles {0.02, 0.2, 0.4, 0.6, 0.8, 0.98}:


Here is another plot with the 0.02 and 0.98 regression quantiles outlining the data:


Note that in order to get satisfactory results, the only tuning value I had to choose is for the second parameter, which in this case specifies the number of B-spline basis knots. The second parameter can also be a list of knots. The option InterpolationOrder specifies the splines order. (By default InterpolationOrder->3.)
POSTED BY: Anton Antonov
I implemented quantile regression with B-splines and added it to the package at GitHub:
https://github.com/antononcube/MathematicaForPrediction/blob/master/QuantileRegression.m

Because of the spline quantile regression implementation I had to rename the function QuantileRegression originally discussed in this post to QuantileRegressionFit. I have made the corresponding changes of the code in this post.
POSTED BY: Anton Antonov
Thanks, I downloaded the pdf from github.
POSTED BY: Frank Kampas
Frank I added code to the original post.
POSTED BY: Anton Antonov
Here is a link to another blog post with a couple of examples that demonstrate the robustness of quantile regression:
http://mathematicaforprediction.wordpress.com/2013/12/23/quantile-regression-robustness/ .
Three sets of data are considered: skewed noise over a logarithmic curve, and alternations of it with two types of outliers. (The data has simple heteroscedasticity in it.)

These plots demonstrate the robustness of quantile regression:

POSTED BY: Anton Antonov
I just uploaded a guide for using QuantileRegression to GitHub :
https://github.com/antononcube/MathematicaForPrediction/blob/master/Documentation/Quantile%20regression%20through%20linear%20programming.pdf
The guide has some theoretical explanations and shows how the quantile regression problem can be formulated as a linear programming problem.
POSTED BY: Anton Antonov
HI Frank,

I can email you the current version of a Quantile Regression demonstation I am working on. I will post some code on this forum later today.

Also, I am very interested in implementing quantile regression with OPL / CPLEX.
POSTED BY: Anton Antonov
Anton, this looks very nice and I'd like to understand it better.  Could you post some code that illustrates the method for one data set?
POSTED BY: Frank Kampas
Anton, this sounds great, thank you for sharing! If you wish you could make a post with description of or even introduction to the packages or other additional functionality you develop. We place such discussions in the group Mathematica Add-Ons, - please check it out and see what other power-users like yourself like to write. There you can also post all appropriate links to GitHub pages and other related information.
POSTED BY: EDITORIAL BOARD
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard