Hello everyone,
I need help with a speed or memory-management problem. I have a small function whose entire body is wrapped in a Module. It does the following:
- Reads the names of the small text files in a folder (there are thousands of them)
- Imports the files in batches of 100, extracts some information from them, and then exports the information extracted from each batch to a single CSV file
If there are 14,000 files in the source folder, the Do loop in my Module imports 140 batches of 100 files each and exports 140 CSV files, one per batch. No global variables are set inside the Module, and no variables accumulate data: each import overwrites the data from the previous one.
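A minimal sketch of the structure described above might look like the following. It is an illustrative reconstruction rather than the original function: `extractInfo` is a hypothetical placeholder for the extraction step, and the `*.txt` pattern, the output file names and the per-batch `AbsoluteTiming` printout are assumptions added only to show where per-batch times would be measured.

```mathematica
(* Hypothetical placeholder for the per-file extraction step:
   here it just returns one CSV row containing the text length. *)
extractInfo[text_String] := {StringLength[text]}

(* Illustrative sketch of the batch pipeline; not the original code. *)
processFolder[srcDir_String, outDir_String, batchSize_Integer : 100] :=
  Module[{files, batches, rows, time},
    (* full paths of the text files in the source folder *)
    files = FileNames["*.txt", srcDir];
    (* batches of batchSize files; the last batch may be shorter *)
    batches = Partition[files, batchSize, batchSize, 1, {}];
    Do[
      time = First @ AbsoluteTiming[
        (* rows is overwritten on every iteration, nothing accumulates *)
        rows = extractInfo[Import[#, "Text"]] & /@ batches[[i]];
        Export[
          FileNameJoin[{outDir, "batch_" <> IntegerString[i, 10, 4] <> ".csv"}],
          rows, "CSV"]
      ];
      Print["batch ", i, ": ", time, " s"],
      {i, Length[batches]}
    ];
    Length[batches] (* number of CSV files written *)
  ]
```

Called as `processFolder["/path/to/source", "/path/to/output"]` (hypothetical paths), a folder of 14,000 files would produce 140 CSV files, with one timing line printed per batch.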
The problem I notice while testing the script is that every time I run it within a given Mathematica session, the time taken to complete a batch of 100 files grows considerably. It takes 3 to 5 seconds per batch the first time I run it, and that grows to more than 50 seconds by the 7th or 8th run. When I quit Mathematica and restart it, the function is fast again on the first run. Also, within a single run, each new batch in the Do loop takes longer than the previous one.
Update (1). Using Block instead of Module speeds things up, but not by much.
Update (2). After more experimentation, my best guess is that Mathematica gets bogged down when a (custom) function is called repeatedly. Could it be that Mathematica somehow keeps track of the variables used in each call, and that this requires an ever-increasing amount of memory?
What is going on? Any advice would be most appreciated.
Gregory