Fitting a Discrete Probability Distribution to Credit Card Fraud Event Data

Dear all, I have some sample data about the frequency of losses due to credit card fraud and I'd like to fit a discrete probability distribution to it for modeling and simulation purposes. The data is as follows:

data={0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 9, 12, 12, 13, 15, 15}

I fit a Poisson distribution first and then check how good the fit is:

estimatedpoisson = 
  EstimatedDistribution[data, PoissonDistribution[\[Mu]]];
\[ScriptCapitalP] = 
  DistributionFitTest[data, estimatedpoisson, 
   "HypothesisTestData"];
\[ScriptCapitalP]["TestDataTable", All]

The p-value is very close to 0, indicating a poor fit. I have also tried fitting both a binomial and a negative binomial, since the data exhibit overdispersion (the sample variance is about 3 times the mean), but both fits are still quite poor. Any other suggestions? Many thanks in advance, Ruben
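For anyone who wants to reproduce the overdispersion check, here is a minimal sketch. It rebuilds the sample from its value counts (so the block runs on its own), compares the sample mean and variance, and tabulates observed against expected counts for a fitted negative binomial. The names values, counts and nbFit, and the parameter symbols n and p, are illustrative and not from the original post.

```mathematica
(* Rebuild the sample from its value counts so this block is self-contained *)
values = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 15};
counts = {19, 16, 51, 9, 6, 5, 4, 6, 2, 1, 2, 1, 2};
data = Join @@ MapThread[ConstantArray, {values, counts}];

(* Sample mean, sample variance and their ratio;
   a ratio well above 1 signals overdispersion relative to a Poisson *)
N[{Mean[data], Variance[data], Variance[data]/Mean[data]}]

(* Maximum-likelihood negative binomial fit; n and p are free parameters *)
nbFit = EstimatedDistribution[data, NegativeBinomialDistribution[n, p]];

(* Observed vs. expected counts for k = 0, ..., 8 under the fitted model *)
TableForm[
 Table[{k, Count[data, k], N[Length[data] PDF[nbFit, k]]}, {k, 0, 8}],
 TableHeadings -> {None, {"k", "observed", "expected"}}]
```

The row for k = 2 is the one to look at: 51 of the 124 observations equal 2, a spike that a smooth unimodal family such as the Poisson or negative binomial is likely to have trouble matching.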

Hi there,

I downloaded all discrete distributions and tried to fit them, but nothing seemed to work well. Mathematica 10.1 now has the new (experimental) function FindDistribution.

FindDistribution[data]
NegativeBinomialDistribution[2, 0.407108]

which seems to be the best guess. Unfortunately,

DistributionFitTest[data, NegativeBinomialDistribution[2, 0.40710823909531507`], "ShortTestConclusion"]
(*"Reject"*)

it does not survive the hypothesis test. You can, of course, use the empirical distribution, which is closely related to the histogram.

dist = EmpiricalDistribution[data];
PDF[dist, x]
(*19/124 Boole[0 == x] + 4/31 Boole[1 == x] + 51/124 Boole[2 == x] + 
 9/124 Boole[3 == x] + 3/62 Boole[4 == x] + 5/124 Boole[5 == x] + 
 1/31 Boole[6 == x] + 3/62 Boole[7 == x] + 1/62 Boole[8 == x] + 
 1/124 Boole[9 == x] + 1/62 Boole[12 == x] + 1/124 Boole[13 == x] + 
 1/62 Boole[15 == x]*)

If you evaluate that at the integers from 1 to 15 (which leaves out x = 0):

Table[PDF[dist, x], {x, 1, 15, 1}]
(*{4/31, 51/124, 9/124, 3/62, 5/124, 1/31, 3/62, 1/62, 1/124, 0, 0, 1/62, 1/124, 0, 1/62}*)
ListLinePlot[%, PlotRange -> All]

(Image: line plot of the probabilities above for x = 1 to 15.)

This is basically the normalised histogram:

Histogram[data, {0.5, 15.5, 1}]

(Image: histogram of the data with unit-width bins from 0.5 to 15.5.)
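Since the goal is simulation, the empirical distribution can also be sampled directly. Here is a minimal sketch, with the sample rebuilt from its value counts so the block runs on its own; the names values, counts and sim, the seed and the sample size of 1000 are illustrative choices.

```mathematica
(* Rebuild the sample from its value counts so this block is self-contained *)
values = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 15};
counts = {19, 16, 51, 9, 6, 5, 4, 6, 2, 1, 2, 1, 2};
data = Join @@ MapThread[ConstantArray, {values, counts}];

dist = EmpiricalDistribution[data];

(* Fixed seed only so that repeated runs give the same draws *)
SeedRandom[1];
sim = RandomVariate[dist, 1000];

(* Empirical probability vs. simulated relative frequency for k = 0, ..., 15 *)
TableForm[
 Table[{k, N[PDF[dist, k]], N[Count[sim, k]/Length[sim]]}, {k, 0, 15}],
 TableHeadings -> {None, {"k", "empirical", "simulated"}}]
```

Keep in mind that draws from the empirical distribution can only take values that occur in the sample, so counts such as 10, 11 or 14 will never be generated.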

Cheers,

M.

POSTED BY: Marco Thiel