
Generating Random Datasets

Posted 5 years ago

POSTED BY: Mike Besso
POSTED BY: Anton Antonov
Posted 5 years ago

Hi Mike,

Thanks for sharing.

Generating test data like this is a great idea, not just for TDD but also for using a small data sample to generate a larger one. For features in the data that are not correlated, I use FindDistribution or LearnDistribution to derive a distribution from the data sample and then use RandomVariate to generate additional data according to that distribution.
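The approach above uses Wolfram Language's FindDistribution/LearnDistribution and RandomVariate. As a language-neutral illustration only, here is a minimal Python sketch of the same idea under stated assumptions: every column is numeric and independent of the others, and each is modeled as a normal distribution fitted with `statistics.NormalDist.from_samples`. That normal family is a stand-in, since FindDistribution would select the family itself. The function names `fit_feature_distributions` and `generate_rows` are hypothetical.

```python
import random
from statistics import NormalDist


def fit_feature_distributions(sample):
    """Fit one distribution per column of a small sample (list of dicts).

    Assumption: each column is normal (a stand-in for FindDistribution,
    which would choose the family). Needs at least two rows, because
    NormalDist.from_samples requires two or more data points.
    """
    columns = sample[0].keys()
    return {col: NormalDist.from_samples([row[col] for row in sample]) for col in columns}


def generate_rows(dists, n, seed=None):
    """Draw n synthetic rows, sampling each column independently.

    A single shared RNG is used so that columns do not receive identical
    random streams; correlations between features are NOT preserved.
    """
    rng = random.Random(seed)
    return [
        {col: rng.gauss(d.mean, d.stdev) for col, d in dists.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical small sample standing in for client-provided data.
    small_sample = [
        {"price": 10.0, "qty": 3.0},
        {"price": 12.5, "qty": 5.0},
        {"price": 9.0, "qty": 4.0},
        {"price": 11.0, "qty": 6.0},
    ]
    dists = fit_feature_distributions(small_sample)
    large_sample = generate_rows(dists, 10_000, seed=42)
    print(len(large_sample), large_sample[0])
```

Because each column is sampled on its own, any correlation present in the original sample is lost, which is why this only suits features that are not correlated.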

I have used this a few times when a client provided a small data sample and I needed a much larger sample to see how the solution would scale (SQL query performance, ML algorithms, ...).

BTW, there is a mismatch between the function name randomDataset used in the example usage and the function name dsGenerateRandomDataset.

POSTED BY: Rohit Namjoshi

A couple of updates:

POSTED BY: Anton Antonov
Posted 5 years ago
POSTED BY: Mike Besso
Posted 5 years ago

@Rohit:

Thank you for finding my typo and for the additional use-case suggestions.

I have updated the notebook to include the use of specific distributions.

Per your feedback, I will add the load and performance testing use cases in the next version.

THANKS

POSTED BY: Mike Besso