Give Classify a list of files on disk instead of importing all at once?

Posted 4 years ago
POSTED BY: Sepehr Elahi
5 Replies

I did a small project to classify certain tropical fish. Importing the images worked well, but I ran into a similar issue when handling large image archives.

(keep the three image classes in separate folders)

SetDirectory[NotebookDirectory[]];
C01 = FileNames[All, "Fishes\\Guppy"];
C02 = FileNames[All, "Fishes\\Sword"];
C03 = FileNames[All, "Fishes\\Zebra"];

(define the rules mapping each file path to its class)

Data01 = Thread[C01 -> "Guppy"];
Data02 = Thread[C02 -> "Sword"];
Data03 = Thread[C03 -> "Zebra"];

(union all defined associations)

Data = Union[Data01, Data02, Data03];

(Perform imports)

myData = Import@Data[[#]][[1]] -> Data[[#]][[2]] & /@Range[Length@Data];

(Train)

myClassify =  Classify[myData, TargetDevice -> "CPU"]
POSTED BY: Teck Boon Lim
Posted 4 years ago
POSTED BY: Sepehr Elahi
Posted 4 years ago

For images with 8-bit color depth, isn't it 3.6 GB?

200 * 200 * 3 * 30000
(* 3600000000 *)

You did not answer the question about the number of classes and whether they are approximately equally represented in the dataset. If there is a large class imbalance, then randomly sample from each class a number of images equal to the size of the least populated class. Have you tried that?
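The sampling suggested here could be sketched roughly as follows. This is only an illustration: the file names and class sizes are made up, and `data`, `byClass`, `n`, and `balanced` are hypothetical names, not something from the posts above.

```wolfram
(* Hypothetical labeled file list in "path -> class" form; the names are made up *)
data = Join[
   Thread[{"g1.jpg", "g2.jpg", "g3.jpg", "g4.jpg"} -> "Guppy"],
   Thread[{"s1.jpg", "s2.jpg"} -> "Sword"],
   Thread[{"z1.jpg", "z2.jpg", "z3.jpg"} -> "Zebra"]];

(* Group the rules by their class label (the right-hand side of each rule) *)
byClass = GroupBy[data, Last];

(* Size of the least populated class *)
n = Min[Length /@ Values[byClass]];

(* Draw n rules at random, without replacement, from every class *)
balanced = Catenate[RandomSample[#, n] & /@ Values[byClass]];

(* Check the class counts of the balanced set *)
Counts[Last /@ balanced]
```

Because the sampling is done on file paths rather than on imported images, only the files that end up in `balanced` would need to be imported afterwards.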

POSTED BY: Rohit Namjoshi
Posted 4 years ago
POSTED BY: Sepehr Elahi
Posted 4 years ago

Hi Sepehr,

Using Classify with 30K images should not require out-of-core support.

What are the dimensions of the images? There is usually no need to train on large images.
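One way to act on the point about image size is to shrink each image as it is imported, so the full-size version is never kept around. The sketch below is an assumption-laden example: `importSmall` is a hypothetical helper, the 64x64 target size is an arbitrary choice, and the folder layout is borrowed from the fish example elsewhere in this thread.

```wolfram
(* Hypothetical helper: import one image file and shrink it right away.
   The default 64x64 target is an assumed size, not a recommendation from the thread. *)
importSmall[path_String, size_ : {64, 64}] := ImageResize[Import[path], size]

(* Labeled file list in "path -> class" form; the folder names are assumptions *)
labeled = Thread[FileNames[All, FileNameJoin[{"Fishes", "Guppy"}]] -> "Guppy"];

(* Replace each path by its downsized image, keeping the class label *)
trainingData = importSmall[#1] -> #2 & @@@ labeled;
```

Note that `ImageResize` with an explicit `{width, height}` does not preserve the aspect ratio, so a different size specification may be preferable if the originals are not square.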

How many different classes are in the dataset? If the distribution of images among classes is fairly flat, maybe you do not need 30K images to train. If there is significant class imbalance, you will probably get better results by randomly sampling a set of images with minimal class imbalance.

POSTED BY: Rohit Namjoshi