Message Boards Message Boards

Statistical Distributions of Areas of Voronoi Cells

Open code in Cloud | Download code to Desktop via Attachments Below

One of the brightest characteristics of the Wolfram Language (WL) is that it works easily across many different scientific fields (thanks to numerous built-in functions and data) and that due to the coherence of WL design, combining such different sciences via computation comes as natural as speaking to a human. Here WL will enable us to move quickly from computational geometry to machine learning to statistics on a quest of exploring an interesting subject. This seemingly simple question is not easy to answer without right framework and tools:

What is statistical distributions of areas of cells in Voronoi tessellation?

From a brief web search it appears to follow closely GammaDistribution. But in the absence of analytical proof, how can one quickly deduce this fact or verify it is good model? We will do exactly that below. For reference see also a relevant paper by TANEMURA and a real-interest question by actual user which prompted this exploration.

To start let's build a "large" VoronoiMesh on 5000 uniformly distributed points within a unit Disk:

pts = RandomPoint[Disk[], 5000];
mesh = VoronoiMesh[pts, Axes -> True]

We see the statistics we can gather from this will be obviously offset by the presence of "border" cells of much larger area than regular inner cells have:

enter image description here

Let's exclude large cells by selecting only those within original region of random points distribution - unit Disk:

vor=MeshPrimitives[mesh,2];
vor//Length

5000

vorInner=Select[vor,RegionWithin[Disk[],#]&];
vorInner//Length

4782

We got fewer elements of course and they all are nice regular cells without any large outliers:

Graphics[{FaceForm[None], EdgeForm[Gray], vorInner}]

enter image description here

Let's take a look at the statistics of areas' distribution:

areas = Area /@ vorInner;
hist = Histogram[areas, Automatic, "PDF", PlotTheme -> "Detailed"]

enter image description here

We can see that there is a minimum of distribution at zero area, which is intuitively correct. Close to zero-area cells of course are not present much with the finite number of points per region (or finite density). So there is some prevalent finite mean area there. We can find a very close simple analytic distribution using WL machine learning tools, which turns out to be GammaDistribution as discussed in some publications:

dis=FindDistribution[areas]

GammaDistribution[3.3458431368450587,0.00018456010158014188]

FindDistribution acted automatically without any additional options from our side. GammaDistribution matches very nicely the empirical histogram:

Show[hist, Plot[PDF[dis, x], {x, 0, .0015}]]

enter image description here

And now it is easy to find thing like:

{Mean[dis], Variance[dis], Kurtosis[dis]}

{0.0006175091492073445, 1.1396755130437449*^-7, 4.793269963533813`}

Probability[x > .0005, x \[Distributed] dis]

0.5740480899719699`

Attachments:
POSTED BY: Vitaliy Kaurov
10 Replies

You may be interested in this paper, which takes a semi-empirical approach to come up with a closed-form approximate formula:

At the time when it was written, Mathematica did not have VoronoiMesh, but one could "trick" it to quickly compute a Voronoi tessellation by using ListDensityPlot with InterpolationOrder -> 0 and extracting the polygons.

POSTED BY: Szabolcs Horvát
Posted 5 years ago

Very Nice. I am aware that the distribution of Lightning strike intensity from a Thunderstorm also closely follows a Gamma Distribution. Could this be due to cells of charge inside the cloud which organise like a Voronoi topology? Does a Gamma distribution also arise from a 3 D distribution of volumes of a Vornoi tesellation?

POSTED BY: BJ Miller
Posted 5 years ago

Thanks for sharing everybody! Maybe this can add some more information:

In the article "Statistical Distributions of Poisson Voronoi Cells in Two and Three Dimensions" by Masaharu Tanemura, mentioned by Marco Thiel, the distribution of the perimeter of the cells (and the number of edges etc.) is also analyzed. Here I adapted the code of Vitaly to perimeters and we see (as in the article) that now we have a normal distribution:

perimeters = Perimeter /@ vorInner;
hist = Histogram[perimeters, Automatic, "PDF", 
   PlotTheme -> "Detailed"];
dist = FindDistribution[perimeters]
Show[hist, Plot[PDF[dist, x], {x, 0, .1}]]

enter image description here

And for the number of edges, we find a BinomialDistribution with FindDistribution although the result look almost "normal" as we use EstimatedDitribution

numberOfEdges[poly_] := Length[First@poly]
edges = numberOfEdges /@ vorInner;
histE = Histogram[edges, Automatic, "PDF", PlotTheme -> "Detailed"];
distE = EstimatedDistribution[edges, 
  NormalDistribution[\[Mu], \[Sigma]]]
Show[histE, Plot[PDF[distE, x], {x, 0, 12}]]
distEB = FindDistribution[edges]
Plot[CDF[distEB, x], {x, 0, 12}, Filling -> Bottom]

enter image description here

POSTED BY: Erik Mahieu

Great, @Erik, thanks for exploring further and sharing!

POSTED BY: Vitaliy Kaurov

Dear All,

that is indeed interesting. I also believe that the exact distribution is unknown but it can be approximated by a GammaDistribution, see:

https://arxiv.org/pdf/1207.0608.pdf

https://arxiv.org/pdf/1612.02375.pdf

https://www.sciencedirect.com/science/article/pii/002608008990030X

http://www.scipress.org/journals/forma/pdf/1804/18040221.pdf

http://www.eurecom.fr/~arvanita/PVT.pdf

Interestingly, if you run the code in @Vitaliy Kaurov's cod for 50000 points you obtain:

dis = FindDistribution[areas]
(*GammaDistribution[3.2718, 0.0000197063]*)

I managed to get it done for 500000 points. I think that the edge effects should be getting smaller; as the area increases faster than the circumference.

So I obtain:

dis = FindDistribution[areas, 5]

(*
{MixtureDistribution[{0.658342, 0.341658}, {MaxwellDistribution[2.92588*10^-6], GammaDistribution[7.1867, 1.37091*10^-6]}], 

ExtremeValueDistribution[4.80446*10^-6, 2.73831*10^-6], 

GammaDistribution[2.94963, 2.17794*10^-6], 

BetaDistribution[2.94962, 459143.], 

MixtureDistribution[{0.687855, 0.312145}, {MaxwellDistribution[2.82857*10^-6], 
   LogNormalDistribution[-11.5031, 0.28619]}]}*)

This happens when we plot all of them:

Plot[Evaluate[PDF[#, x] & /@ dis], {x, 0, 0.00003}, PlotRange -> All, LabelStyle -> Directive[Bold, 16]]

enter image description here

If you compare that with the histogram:

Show[Histogram[areas, 200, "PDF"], Plot[Evaluate[PDF[#, x] & /@ dis], {x, 0, 0.00003}, PlotRange -> All,LabelStyle -> Directive[Bold, 16]]]

enter image description here

I would say that the second (ExtremeValueDistribution) fits best:

enter image description here

but it does not get the small areas quite right.

Regarding the boundary effects, one might also (similar to what Sander suggests) use the torus as the geometry:

disfun[x_, y_] := Sqrt[Min[Abs[x[[1]] - y[[1]]], 1 - Abs[x[[1]] - y[[1]]]]^2 + Min[Abs[x[[2]] - y[[2]]], 1 - Abs[x[[2]] - y[[2]]]]^2]

M = 50; nf = Nearest[Rule @@@ Transpose[{RandomReal[1, {M, 2}], Range[M]}], DistanceFunction -> disfun]

DensityPlot[First[nf[{x, y}]], {x, 0, 1}, {y, 0, 1}, PlotPoints -> 100, ColorFunction -> "TemperatureMap"]

enter image description here

Cheers,

Marco

POSTED BY: Marco Thiel

@Marco Thiel thank you for the references and insight! I believe FindDistribution, while finding correct distributions, needs some tune up to get the distribution parameters right. To test the waters, I tried using EstimatedDistribution and got a bit better results. Let's run this for 500,000 points:

pts=RandomPoint[Disk[],500000];
mesh=VoronoiMesh[pts,PerformanceGoal->"Speed"];
vor=MeshPrimitives[mesh,2]
vorInner=Select[vor,RegionWithin[Disk[],#]&];
areas=Area/@vorInner;
hist=Histogram[areas,200,"PDF",PlotTheme->"Detailed"];

and then find the parameters assuming distributions are known:

disGAM = EstimatedDistribution[areas, GammaDistribution[a, b]]

GammaDistribution[3.526166220777205, 1.7809806247800413*^-6]

disEXT = EstimatedDistribution[areas, ExtremeValueDistribution[a, b]]

ExtremeValueDistribution[4.775094275466527*^-6, 2.5746412541824736*^-6]

We see gamma (red) behaves a bit better now, both for zero and for the tail values:

Show[hist,Plot[{PDF[disEXT,x],PDF[disGAM,x]},{x,0,.00002},PlotStyle->{Blue,Red}]]

enter image description here

But while p-value of gamma is greater than extreme, it is still pretty low, not sure if there is a catch here somewhere:

DistributionFitTest[areas,disGAM]

0.000051399258187534436`

DistributionFitTest[areas,disEXT]

0.`

POSTED BY: Vitaliy Kaurov

I don't find it all that surprising that the result is a Gamma distribution. The 1-d case is well known as it is the distribution of interarrival times of a Poisson process and is exponentially distributed. The exponential distribution is a special case of the Gamma so maybe you're onto something.

POSTED BY: Carlo Barbieri

Very nice. Thanks for sharing. Some years ago I did this with 10^9 cells. If I recall correctly the left side is a bit higher (more probably) than a gamma distribution.

To remove any edge-effects, what people do is: take random points in a square, and then copy the square 9 times in a 3*3 grid, such as to emulate period boundary conditions. Then take only the polygons which center is in the center unit square…

Great stuff!

–SH

POSTED BY: Sander Huisman

Thanks! Awesome idea about edge-effects! Did you estimate any analytical models beyond Gamma or it was just pure empirical thing?

POSTED BY: Vitaliy Kaurov

I looked at different things: how many sides, angles, area, perimeter, and many conditionally averaged quantities. But area of the PDF as well. But no analytical shape is known for it. I believe that there is something known for perimeter.

POSTED BY: Sander Huisman
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract