WOLFRAM COMMUNITY

GROUPS:

Business Analytics Data Science Finance Mathematica Statistics and Probability

Fitting a Discrete Probability Distribution to Credit Card Fraud Event Data

Ruben Garcia Berasategui, Jakarta International College

Posted 11 years ago

Dear all,

I have some sample data on the frequency of losses due to credit card fraud, and I'd like to fit a discrete probability distribution to it for modeling and simulation purposes. The data is as follows:

data = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 9, 12, 12, 13, 15, 15}

I fit a Poisson distribution first and then check how good the fit is:

estimatedpoisson = EstimatedDistribution[data, PoissonDistribution[\[Mu]]];
\[ScriptCapitalP] = DistributionFitTest[data, estimatedpoisson, "HypothesisTestData"];
\[ScriptCapitalP]["TestDataTable", All]

The p-value is very close to 0, indicating a poor fit. I have also tried to fit both a binomial and a negative binomial, since the data exhibits overdispersion, with the variance being about 4 times bigger than the mean, but both fits are still quite poor as well.

Any other suggestions?

Many thanks in advance,
Ruben
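For readers who want to reproduce the overdispersion check and the negative binomial attempt mentioned in the question, here is a minimal sketch. The `{value, count}` pairs are tallied from the data listed in the post; the names `counts`, `estimatednb`, `testnb`, `n` and `p` are illustrative and not from the original post.

```mathematica
(* Rebuild the sample from its tally: {value, count} pairs read off the posted data *)
counts = {{0, 19}, {1, 16}, {2, 51}, {3, 9}, {4, 6}, {5, 5}, {6, 4},
   {7, 6}, {8, 2}, {9, 1}, {12, 2}, {13, 1}, {15, 2}};
data = Flatten[ConstantArray @@@ counts];

(* Index of dispersion: equals 1 for a Poisson; values above 1 signal overdispersion *)
N[Variance[data]/Mean[data]]

(* Fit a negative binomial; n and p are illustrative symbol names *)
estimatednb = EstimatedDistribution[data, NegativeBinomialDistribution[n, p]];

(* Same goodness-of-fit report as used for the Poisson fit *)
testnb = DistributionFitTest[data, estimatednb, "HypothesisTestData"];
testnb["TestDataTable", All]
```

Building the sample from a tally keeps the block short; pasting the full `data` list from the post works just as well.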

POSTED BY: Ruben Garcia Berasategui

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 11 years ago

Hi there,

I downloaded all discrete distributions and tried to fit them, but nothing seemed to work well. Now with MMA 10.1 there is the new (experimental) function FindDistribution:

FindDistribution[data]
NegativeBinomialDistribution[2, 0.407108]

which seems to be the best guess. Unfortunately,

DistributionFitTest[data, NegativeBinomialDistribution[2, 0.40710823909531507`], "ShortTestConclusion"]
(*"Reject"*)

it does not survive the hypothesis test. You can, of course, use the empirical distribution, which is closely related to the histogram.

dist = EmpiricalDistribution[data];
PDF[dist, x]
(*19/124 Boole[0 == x] + 4/31 Boole[1 == x] + 51/124 Boole[2 == x] + 
 9/124 Boole[3 == x] + 3/62 Boole[4 == x] + 5/124 Boole[5 == x] + 
 1/31 Boole[6 == x] + 3/62 Boole[7 == x] + 1/62 Boole[8 == x] + 
 1/124 Boole[9 == x] + 1/62 Boole[12 == x] + 1/124 Boole[13 == x] + 
 1/62 Boole[15 == x]*)

If you evaluate that for the respective integers:

Table[PDF[dist, x], {x, 1, 15, 1}]
(*{4/31, 51/124, 9/124, 3/62, 5/124, 1/31, 3/62, 1/62, 1/124, 0, 0, 1/62, 1/124, 0, 1/62}*)
ListLinePlot[%, PlotRange -> All]

[Image: line plot of the empirical probabilities for x = 1 to 15]
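Note that the table above starts at x = 1, so the mass at zero is not shown. A small sketch, assuming the same data rebuilt from its tally (the names `counts` and `probs` are illustrative), that covers the full support and checks that the probabilities sum to one:

```mathematica
(* Rebuild the sample from its tally: {value, count} pairs read off the posted data *)
counts = {{0, 19}, {1, 16}, {2, 51}, {3, 9}, {4, 6}, {5, 5}, {6, 4},
   {7, 6}, {8, 2}, {9, 1}, {12, 2}, {13, 1}, {15, 2}};
data = Flatten[ConstantArray @@@ counts];
dist = EmpiricalDistribution[data];

(* Probabilities over the full support, including x = 0 *)
probs = Table[PDF[dist, x], {x, 0, 15}];

(* The exact rational probabilities should add up to 1 *)
Total[probs] == 1

(* Stem-style plot, arguably a more natural view of a discrete PMF than a line plot *)
DiscretePlot[PDF[dist, x], {x, 0, 15}, PlotRange -> All]
```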

This is basically the normalised histogram:

Histogram[data, {0.5, 15.5, 1}]

[Image: histogram of the data with unit-width bins from 0.5 to 15.5]
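Since the original question mentions simulation, the empirical distribution can also be sampled directly. This is only a sketch: the seed, the sample size and the names `counts` and `sim` are arbitrary choices for illustration.

```mathematica
(* Rebuild the sample from its tally: {value, count} pairs read off the posted data *)
counts = {{0, 19}, {1, 16}, {2, 51}, {3, 9}, {4, 6}, {5, 5}, {6, 4},
   {7, 6}, {8, 2}, {9, 1}, {12, 2}, {13, 1}, {15, 2}};
data = Flatten[ConstantArray @@@ counts];
dist = EmpiricalDistribution[data];

(* Fix the seed so the simulated sample is reproducible *)
SeedRandom[1234];

(* Draw simulated fraud-event counts from the empirical distribution *)
sim = RandomVariate[dist, 10000];

(* Compare the distribution's mean and variance with those of the simulated sample *)
N[{Mean[dist], Mean[sim]}]
N[{Variance[dist], Variance[sim]}]
```

One caveat with this approach: simulated values can only be counts that already occur in the data, so 10, 11, 14 or anything above 15 will never be drawn.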

Cheers,
M.

POSTED BY: Marco Thiel
