DataDistribution internals?

Posted 3 years ago

I think I understand the internals of an "Empirical" DataDistribution (dd) with one exception: I don't know what dd[[2,3]], which evaluates to False, signifies. Can anyone tell me?

Here's the backstory:

I'm modeling conditional categorical distributions as nested Associations. For my purposes this implementation seems to be much faster than using the built-in CategoricalDistribution. The extensive built-in functionality for Associations makes a variety of manipulations both easy and fast.
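
For illustration only, here is a minimal sketch of what such a nested-Association representation might look like. The names cond and prior and all the probabilities are hypothetical, made up for this example rather than taken from the actual model.

```mathematica
(* Hypothetical conditional distribution P(weather | season) as a nested Association *)
cond = <|
   "summer" -> <|"sun" -> 0.7, "rain" -> 0.3|>,
   "winter" -> <|"sun" -> 0.2, "rain" -> 0.8|>
   |>;

(* Hypothetical prior P(season) *)
prior = <|"summer" -> 0.5, "winter" -> 0.5|>;

(* Nested lookup: P(rain | summer) *)
cond["summer", "rain"]

(* Marginal P(weather): scale each conditional by its prior, then sum matching keys *)
marginal = Merge[KeyValueMap[prior[#1]*#2 &, cond], Total]

(* Draw 5 samples from the marginal with weighted RandomChoice *)
RandomChoice[Values[marginal] -> Keys[marginal], 5]
```

Everything here uses ordinary Association operations (lookup, KeyValueMap, Merge), which is the kind of manipulation the approach relies on.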

I don't need DataDistributions but I'm toying with the idea of using them as containers. The "weights" are then categorical distributions represented as Associations (or perhaps even DataDistributions themselves). I have no problem setting this up and I can easily convert back and forth between DataDistributions and Associations by dipping into the internals of DataDistribution. But I'm curious as to what I might be getting involved with.

BTW, I know it's dangerous to rely on the internal structure of a built-in object. I've done that in the past and lived to regret it. Many years ago I relied on the internal structure of InterpolatingFunction to implement functionality that was absent at the time. If I remember correctly, the functionality was eventually added, but not before the internals were changed.

POSTED BY: Mark Fisher
Posted 3 years ago

It looks like dd[[2,3]] is where the bandwidth is stored for a "SmoothKernel" DataDistribution. So False makes sense for an "Empirical" DataDistribution, which has no bandwidth.
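
A quick way to check this, sketched under the assumption that both distribution types expose the same part layout. The data is hypothetical, and because the internal structure is undocumented, the results may differ between versions.

```mathematica
(* Hypothetical sample data *)
data = {1., 2., 2., 3., 5.};

emp = EmpiricalDistribution[data];
skd = SmoothKernelDistribution[data];

InputForm[emp]   (* shows the full internal expression *)
emp[[2, 3]]      (* reported above as False for "Empirical" *)
skd[[2, 3]]      (* reported above as the bandwidth slot for "SmoothKernel" *)
```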

POSTED BY: Mark Fisher