# Statistical Distributions of Areas of Voronoi Cells

Posted 11 months ago
2622 Views
|
10 Replies
|
27 Total Likes
|
 Open code in Cloud | Download code to Desktop via Attachments BelowOne of the brightest characteristics of the Wolfram Language (WL) is that it works easily across many different scientific fields (thanks to numerous built-in functions and data) and that due to the coherence of WL design, combining such different sciences via computation comes as natural as speaking to a human. Here WL will enable us to move quickly from computational geometry to machine learning to statistics on a quest of exploring an interesting subject. This seemingly simple question is not easy to answer without right framework and tools: What is statistical distributions of areas of cells in Voronoi tessellation? From a brief web search it appears to follow closely GammaDistribution. But in the absence of analytical proof, how can one quickly deduce this fact or verify it is good model? We will do exactly that below. For reference see also a relevant paper by TANEMURA and a real-interest question by actual user which prompted this exploration.To start let's build a "large" VoronoiMesh on 5000 uniformly distributed points within a unit Disk: pts = RandomPoint[Disk[], 5000]; mesh = VoronoiMesh[pts, Axes -> True] We see the statistics we can gather from this will be obviously offset by the presence of "border" cells of much larger area than regular inner cells have: Let's exclude large cells by selecting only those within original region of random points distribution - unit Disk: vor=MeshPrimitives[mesh,2]; vor//Length  5000 vorInner=Select[vor,RegionWithin[Disk[],#]&]; vorInner//Length  4782 We got fewer elements of course and they all are nice regular cells without any large outliers: Graphics[{FaceForm[None], EdgeForm[Gray], vorInner}] Let's take a look at the statistics of areas' distribution: areas = Area /@ vorInner; hist = Histogram[areas, Automatic, "PDF", PlotTheme -> "Detailed"] We can see that there is a minimum of distribution at zero area, which is intuitively correct. Close to zero-area cells of course are not present much with the finite number of points per region (or finite density). So there is some prevalent finite mean area there. We can find a very close simple analytic distribution using WL machine learning tools, which turns out to be GammaDistribution as discussed in some publications: dis=FindDistribution[areas]  GammaDistribution[3.3458431368450587,0.00018456010158014188] FindDistribution acted automatically without any additional options from our side. GammaDistribution matches very nicely the empirical histogram: Show[hist, Plot[PDF[dis, x], {x, 0, .0015}]] And now it is easy to find thing like: {Mean[dis], Variance[dis], Kurtosis[dis]}  {0.0006175091492073445, 1.1396755130437449*^-7, 4.793269963533813} Probability[x > .0005, x \[Distributed] dis]  0.5740480899719699 Attachments:
10 Replies
Sort By:
Posted 11 months ago
 Very nice. Thanks for sharing. Some years ago I did this with 10^9 cells. If I recall correctly the left side is a bit higher (more probably) than a gamma distribution.To remove any edge-effects, what people do is: take random points in a square, and then copy the square 9 times in a 3*3 grid, such as to emulate period boundary conditions. Then take only the polygons which center is in the center unit square…Great stuff!–SH
Posted 11 months ago
 Thanks! Awesome idea about edge-effects! Did you estimate any analytical models beyond Gamma or it was just pure empirical thing?
Posted 11 months ago
 I looked at different things: how many sides, angles, area, perimeter, and many conditionally averaged quantities. But area of the PDF as well. But no analytical shape is known for it. I believe that there is something known for perimeter.
Posted 11 months ago
 I don't find it all that surprising that the result is a Gamma distribution. The 1-d case is well known as it is the distribution of interarrival times of a Poisson process and is exponentially distributed. The exponential distribution is a special case of the Gamma so maybe you're onto something.
Posted 11 months ago
 Dear All,that is indeed interesting. I also believe that the exact distribution is unknown but it can be approximated by a GammaDistribution, see:https://arxiv.org/pdf/1207.0608.pdfhttps://arxiv.org/pdf/1612.02375.pdfhttps://www.sciencedirect.com/science/article/pii/002608008990030Xhttp://www.scipress.org/journals/forma/pdf/1804/18040221.pdfhttp://www.eurecom.fr/~arvanita/PVT.pdfInterestingly, if you run the code in @Vitaliy Kaurov's cod for 50000 points you obtain: dis = FindDistribution[areas] (*GammaDistribution[3.2718, 0.0000197063]*) I managed to get it done for 500000 points. I think that the edge effects should be getting smaller; as the area increases faster than the circumference. So I obtain: dis = FindDistribution[areas, 5] (* {MixtureDistribution[{0.658342, 0.341658}, {MaxwellDistribution[2.92588*10^-6], GammaDistribution[7.1867, 1.37091*10^-6]}], ExtremeValueDistribution[4.80446*10^-6, 2.73831*10^-6], GammaDistribution[2.94963, 2.17794*10^-6], BetaDistribution[2.94962, 459143.], MixtureDistribution[{0.687855, 0.312145}, {MaxwellDistribution[2.82857*10^-6], LogNormalDistribution[-11.5031, 0.28619]}]}*) This happens when we plot all of them: Plot[Evaluate[PDF[#, x] & /@ dis], {x, 0, 0.00003}, PlotRange -> All, LabelStyle -> Directive[Bold, 16]] If you compare that with the histogram: Show[Histogram[areas, 200, "PDF"], Plot[Evaluate[PDF[#, x] & /@ dis], {x, 0, 0.00003}, PlotRange -> All,LabelStyle -> Directive[Bold, 16]]] I would say that the second (ExtremeValueDistribution) fits best:but it does not get the small areas quite right.Regarding the boundary effects, one might also (similar to what Sander suggests) use the torus as the geometry: disfun[x_, y_] := Sqrt[Min[Abs[x[[1]] - y[[1]]], 1 - Abs[x[[1]] - y[[1]]]]^2 + Min[Abs[x[[2]] - y[[2]]], 1 - Abs[x[[2]] - y[[2]]]]^2] M = 50; nf = Nearest[Rule @@@ Transpose[{RandomReal[1, {M, 2}], Range[M]}], DistanceFunction -> disfun] DensityPlot[First[nf[{x, y}]], {x, 0, 1}, {y, 0, 1}, PlotPoints -> 100, ColorFunction -> "TemperatureMap"] Cheers,Marco
Posted 11 months ago
 @Marco Thiel thank you for the references and insight! I believe FindDistribution, while finding correct distributions, needs some tune up to get the distribution parameters right. To test the waters, I tried using EstimatedDistribution and got a bit better results. Let's run this for 500,000 points: pts=RandomPoint[Disk[],500000]; mesh=VoronoiMesh[pts,PerformanceGoal->"Speed"]; vor=MeshPrimitives[mesh,2] vorInner=Select[vor,RegionWithin[Disk[],#]&]; areas=Area/@vorInner; hist=Histogram[areas,200,"PDF",PlotTheme->"Detailed"]; and then find the parameters assuming distributions are known: disGAM = EstimatedDistribution[areas, GammaDistribution[a, b]]  GammaDistribution[3.526166220777205, 1.7809806247800413*^-6] disEXT = EstimatedDistribution[areas, ExtremeValueDistribution[a, b]]  ExtremeValueDistribution[4.775094275466527*^-6, 2.5746412541824736*^-6] We see gamma (red) behaves a bit better now, both for zero and for the tail values: Show[hist,Plot[{PDF[disEXT,x],PDF[disGAM,x]},{x,0,.00002},PlotStyle->{Blue,Red}]] But while p-value of gamma is greater than extreme, it is still pretty low, not sure if there is a catch here somewhere: DistributionFitTest[areas,disGAM]  0.000051399258187534436 DistributionFitTest[areas,disEXT]  0.
Posted 11 months ago
 Thanks for sharing everybody! Maybe this can add some more information:In the article "Statistical Distributions of Poisson Voronoi Cells in Two and Three Dimensions" by Masaharu Tanemura, mentioned by Marco Thiel, the distribution of the perimeter of the cells (and the number of edges etc.) is also analyzed. Here I adapted the code of Vitaly to perimeters and we see (as in the article) that now we have a normal distribution: perimeters = Perimeter /@ vorInner; hist = Histogram[perimeters, Automatic, "PDF", PlotTheme -> "Detailed"]; dist = FindDistribution[perimeters] Show[hist, Plot[PDF[dist, x], {x, 0, .1}]] And for the number of edges, we find a BinomialDistribution with FindDistribution although the result look almost "normal" as we use EstimatedDitribution numberOfEdges[poly_] := Length[First@poly] edges = numberOfEdges /@ vorInner; histE = Histogram[edges, Automatic, "PDF", PlotTheme -> "Detailed"]; distE = EstimatedDistribution[edges, NormalDistribution[\[Mu], \[Sigma]]] Show[histE, Plot[PDF[distE, x], {x, 0, 12}]] distEB = FindDistribution[edges] Plot[CDF[distEB, x], {x, 0, 12}, Filling -> Bottom] 
Posted 11 months ago
 Great, @Erik, thanks for exploring further and sharing!
 You may be interested in this paper, which takes a semi-empirical approach to come up with a closed-form approximate formula: Jarai, Neda: On the size-distribution of Poisson Voronoi cells https://arxiv.org/abs/cond-mat/0406116 At the time when it was written, Mathematica did not have VoronoiMesh, but one could "trick" it to quickly compute a Voronoi tessellation by using ListDensityPlot with InterpolationOrder -> 0 and extracting the polygons.