Hi Mike,
Thanks for sharing.
Generating test data like this is a great idea, not just for TDD but also for expanding a small data sample into a larger one. For features in the data that are not correlated, I use FindDistribution or LearnDistribution to generate a distribution from the data sample, and then use RandomVariate to generate additional data according to that distribution.
I have used this a few times when a client provided a small data sample and I needed a much larger one to see how the solution would scale (SQL query performance, ML algorithms, ...).
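For anyone who wants to try the distribution-fitting approach, here is a minimal sketch. The variable names and the sample values are made up for illustration (the "client sample" is simulated with a normal distribution, which is purely an assumption), and it only covers columns that can be treated independently:

```wolfram
(* Hypothetical small sample standing in for one uncorrelated numeric column *)
sample = RandomVariate[NormalDistribution[50, 10], 200];

(* Fit a distribution to the small sample *)
dist = FindDistribution[sample];

(* Draw a much larger synthetic sample from the fitted distribution *)
bigSample = RandomVariate[dist, 100000];

(* LearnDistribution can also take rows that mix numeric and nominal values *)
rows = {{1.2, "A"}, {3.4, "B"}, {2.3, "A"}, {4.1, "B"}};
learned = LearnDistribution[rows];
moreRows = RandomVariate[learned, 1000];
```

With only a handful of rows the learned distribution will be rough, so it is worth comparing a histogram of the generated values against the original sample before relying on it.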
BTW, there is a mismatch between the function name randomDataset used in the example usage and the function name dsGenerateRandomDataset.