# Is there a more efficient way to import a large number of files?

I'm working on an image classification machine learning project and have a somewhat large (~800 MiB) training data set. The data set is composed of individual PNG image files organized in directories. I tried to import them all with

```mathematica
loadedData = ParallelMap[Import, dataFiles]
```

but it takes an extremely long time (10+ minutes) and ends up using 21 GiB of memory. Obviously, something seems to be wrong. Is there a more efficient or more correct way to load all these images?
9 days ago
 Piotr Wendykier: Try:

```mathematica
loadedData = ParallelMap[Import[#, IncludeMetaInformation -> None] &, dataFiles]
```
8 days ago
 This appears not to have made any (or much of a) difference. I'm not on a machine capable of loading 21 GiB of data at the moment, but it's still using at least 16 GiB before failing. I'm bewildered as to where all of this apparent data is coming from.
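One likely explanation is that PNG files are compressed on disk, while an imported `Image` holds the fully decompressed pixel array, which can easily be an order of magnitude larger; on top of that, `ParallelMap` copies each result from the subkernels back to the main kernel. A minimal sketch to check the expansion factor for one file, assuming `dataFiles` is the list of PNG paths from the question:

```mathematica
(* Compare the on-disk (compressed) size of one PNG with the size of the
   decompressed Image object it produces in memory. dataFiles is assumed
   to be the list of file paths from the question. *)
file = First[dataFiles];
onDisk = FileByteCount[file];                 (* compressed size on disk *)
inMemory = ByteCount[Import[file, "Image"]];  (* decompressed Image *)
N[inMemory/onDisk]                            (* expansion factor *)
```

If the factor is large, the 21 GiB figure may simply reflect ~800 MiB of compressed PNG data expanded to raw pixels, and a plain `Map` on the main kernel may at least avoid the extra copies made by the parallel subkernels.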
8 days ago
 Shadi Ashnai: Also, if you want to train a neural net with those images, you can perform out-of-core training. See this example: http://www.wolfram.com/language/11/neural-networks/out-of-core-image-classification.html?product=language
8 days ago
 I'm aware of out-of-core training, but I had expected that it wouldn't be necessary for data of this size.