Give Classify a list of files on disk instead of importing all at once?

Posted 4 years ago

I am trying to use Classify to train an image classifier on a dataset of 30k+ images. I followed this tutorial and created a list of the form {File[…]->class,…}. I can use NetTrain with this list, but when I try Classify, it does not seem to import the image files; instead it treats the File references themselves as the inputs, as evidenced by the input type reading "Nominal". As expected, the result is terrible.

How can I pass a list of File references to Classify? Alternatively, how can I write some sort of data loader that imports only the images that are currently needed, and pass that to Classify?

Thanks in advance.

POSTED BY: Sepehr Elahi
5 Replies
POSTED BY: Teck Boon Lim
Posted 4 years ago
POSTED BY: Sepehr Elahi
Posted 4 years ago

For images with 8-bit color depth, isn't it 3.6 GB?

200 * 200 * 3 * 30000
(* 3600000000 *)

You did not answer the question about the number of classes and whether they are approximately equally represented in the dataset. If there is a large class imbalance, then randomly sample from each class a number of images equal to the size of the smallest class. Have you tried that?
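
The balancing step suggested above could be sketched roughly as follows. This is only an illustration, not something from the thread: `balanceClasses` is a hypothetical helper name, and it assumes the data is a list of rules of the form File[…] -> class, as described in the original question. The file names in the toy example are placeholders.

```wolfram
(* Hypothetical helper: downsample every class to the size of the smallest class. *)
(* Assumption: data is a list of rules, input -> class.                           *)
balanceClasses[data_List] := Module[{groups, n},
  groups = Values[GroupBy[data, Last]];  (* one sublist of rules per class label *)
  n = Min[Length /@ groups];             (* size of the smallest class *)
  (* draw n examples from each class without replacement, then shuffle the result *)
  RandomSample[Join @@ (RandomSample[#, n] & /@ groups)]
]

(* Toy usage with placeholder file names: three examples of "a", one of "b" *)
toy = {File["a1.png"] -> "a", File["a2.png"] -> "a",
       File["a3.png"] -> "a", File["b1.png"] -> "b"};

(* After balancing, each class should appear exactly once *)
Counts[Last /@ balanceClasses[toy]]
```

The balanced list keeps the same input -> class form, so it can be used wherever the original list was used.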

POSTED BY: Rohit Namjoshi
Posted 4 years ago

Thanks for your reply!

Even if each image is only 200x200 (a conservative estimate), the total size of the imported images would be 30000x200x200x8=21.6 GB! I tried training on a smaller subset of the dataset, but the results were not that good, so I want to train on the entire dataset.
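
For reference, the size estimate can be checked with a short calculation. This is a sketch under stated assumptions: the image count and resolution are the figures quoted in this thread, and the number of bytes per pixel is an assumption about how the imported images are stored in memory, which the thread does not settle.

```wolfram
(* Figures quoted in the thread: 30000 images at 200x200 pixels *)
pixels = 30000*200*200    (* 1200000000 pixels in total *)

(* Assumption: 8 bytes per pixel, as in the product written in the post *)
pixels*8                  (* 9600000000 bytes, i.e. 9.6 GB *)

(* Bytes per pixel that a total of 21.6 GB would imply *)
21.6*10^9/pixels          (* 18. *)
```

Under these assumptions, the total depends entirely on the bytes-per-pixel figure, so it is worth checking how the images are actually represented after import.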

POSTED BY: Sepehr Elahi
Posted 4 years ago
POSTED BY: Rohit Namjoshi