DataDistribution internals?

Posted 2 months ago

I think I understand the internals of an "Empirical" DataDistribution (dd) with one exception: I don't know what dd[[2,3]], which evaluates to False, signifies. Can anyone tell me?

Here's the backstory:

I'm modeling conditional categorical distributions as nested Associations. For my purposes this implementation seems to be much faster than using the built-in CategoricalDistribution. The extensive functionality that exists for Associations provides the tools to do a variety of manipulations and do them quickly.
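
To make the nested-Association idea concrete, here is a minimal sketch. The data, the variable names `cond` and `prior`, and the helper `sampleGiven` are all hypothetical and not taken from my actual code; the sketch only shows the general shape of the approach using documented Association functions.

```wolfram
(* Hypothetical example data: P(activity | weather) as a nested Association. *)
cond = <|
   "sunny" -> <|"walk" -> 0.7, "drive" -> 0.3|>,
   "rainy" -> <|"walk" -> 0.2, "drive" -> 0.8|>
   |>;

(* Look up a single conditional probability, P(drive | rainy). *)
cond["rainy"]["drive"]

(* Draw one sample from the conditional distribution for a given condition.
   RandomChoice accepts the form weights -> items.
   sampleGiven is a hypothetical helper name. *)
sampleGiven[c_Association, given_] :=
  RandomChoice[Values[c[given]] -> Keys[c[given]]];
sampleGiven[cond, "sunny"]

(* Marginalize over a hypothetical prior on the conditioning variable.
   Multiplying an Association by a scalar threads over its values,
   and Merge with Total sums the weighted inner distributions key by key. *)
prior = <|"sunny" -> 0.6, "rainy" -> 0.4|>;
marginal = Merge[KeyValueMap[#2*cond[#1] &, prior], Total]
```

The appeal of this layout is that lookups, sampling, and marginalization reduce to ordinary Association operations such as Keys, Values, KeyValueMap, and Merge.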

I don't need DataDistributions but I'm toying with the idea of using them as containers. The "weights" are then categorical distributions represented as Associations (or perhaps even DataDistributions themselves). I have no problem setting this up and I can easily convert back and forth between DataDistributions and Associations by dipping into the internals of DataDistribution. But I'm curious as to what I might be getting involved with.

BTW, I know it's dangerous to rely on the internal structure of a built-in object. I've done that in the past and lived to regret it. Many years ago I relied on the internal structure of InterpolatingFunction to implement functionality that was absent at the time. If I remember correctly, the functionality was eventually added, but not before the internals were changed.

Posted 1 month ago

It looks like dd[[2,3]] is where the bandwidth is stored for a "SmoothKernel" DataDistribution. So False makes sense for an "Empirical" DataDistribution, which has no bandwidth.
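
A quick way to compare the two cases side by side is sketched below. The sample data is made up, and the part position [[2, 3]] is an undocumented internal detail taken from the discussion above, so it may differ between versions.

```wolfram
(* Hypothetical sample data. *)
data = {1, 2, 2, 3, 5};

(* Both constructors return a DataDistribution object. *)
emp = EmpiricalDistribution[data];
skd = SmoothKernelDistribution[data];
Head /@ {emp, skd}

(* Inspect the same internal position in both objects.
   Assumption: [[2, 3]] is the slot discussed in this thread;
   the internal layout is undocumented and not guaranteed to be stable. *)
{emp[[2, 3]], skd[[2, 3]]}
```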
