Reduce time of importing WMLF files?

Posted 3 years ago

Hello, could someone help me with this? I have a program that imports 5000 matrices of size 780x280, and it takes a long time. I have used parallel functions to cut the time, but something strange happens: if the CPU runs close to 100%, the import takes about 20 s, but when it runs at normal load it takes 7 minutes, and that is far too long, because the process has to be repeated 100 times or more. And I cannot get the CPU to run at 100% more than once.

I have tried ParallelTable and ParallelMap. The files have a .WMLF extension, which was the format that imported and exported the fastest.
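(For reference, a comparison like that can be timed on a single matrix. This is a minimal sketch, assuming a random placeholder matrix and a hypothetical test file; it uses the MX format, which the reply below also recommends.)

    (* placeholder 780x280 matrix; "test.mx" is a hypothetical file name *)
    testMatrix = RandomReal[1, {780, 280}];
    (* time export and import separately; the "MX" format is inferred from the .mx extension *)
    exportTime = First[AbsoluteTiming[Export["test.mx", testMatrix]]];
    importTime = First[AbsoluteTiming[Import["test.mx"]]];
    {exportTime, importTime}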

The code used to import the files is as follows:

    campo[{i_, Shn_}] := Import[StringJoin["pmar", ToString[Shn], "-k=", ToString[i], ".WMLF"]];
    reverso[{i_, Shn_}] := Import[StringJoin["prmar", ToString[Shn], "-k=", ToString[i], ".WMLF"]];

These are the functions used to import; each one depends on two indices.

    n = 1;
    m = 2;
    nt = 5000;

These are the values of the parameters for the indices.

Then Table is used for the index sh and ParallelMap for the index i, to speed up the process, as follows:

    Print[Now];
    Table[
      LaunchKernels[];
      a = ParallelMap[campo, Table[{i, 1}, {i, 1, nt}]];
      b = ParallelMap[reverso, Table[{i, 1}, {i, 1, nt}]];
      Print[{Now, "import shot n", sh}];
      CloseKernels[];,
      {sh, n, m}]; // AbsoluteTiming

Print[Now] is used to measure the import time for each value of the index sh.

Each pmar or prmar matrix has size 780x280. When I run the program, the following happens:

If CPU usage is close to 100%:

    Mon 28 Jun 2021 09:54:26GMT-5.
   {Mon 28 Jun 2021 09:54:57GMT-5.,import shot n,1}

Here it took only 31 s to import the first 5000 files.

But then for sh = 2, CPU usage drops to 8%-20%, which slows the import a lot, as shown in the next block:

{Mon 28 Jun 2021 09:54:57GMT-5.,import  shot n,1}
{Mon 28 Jun 2021 10:01:58GMT-5.,import  shot n,2}

So for sh = 2, with 8%-20% CPU usage, importing the 5000 files takes 7 minutes, and it keeps that pace for the rest of the values of sh.

The CPU is an i7-9700K with 8 physical cores and 8 logical cores. The parallelization is configured with 16 kernels.
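(For reference, the kernel count can be checked against the core count directly; a minimal sketch, assuming the default parallel preferences.)

    $ProcessorCount      (* 8 on an i7-9700K *)
    LaunchKernels[];     (* with no argument, launches the configured default, typically $ProcessorCount kernels *)
    $KernelCount         (* launching 16 kernels oversubscribes the 8 cores *)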

The question is: is there a better way to import the files?

Is there a way to keep CPU usage close to 100%, so that each batch of 5000 files imports in under one minute, or at most three minutes?

I hope someone can help me.

I have attached a PDF file with all the code that I am running, but the delay is always in importing the files.

Thank you

POSTED BY: Leonardo Sanchez

One problem that you have to watch out for is that it takes time to communicate between the compute kernels and the control kernel. So the time you save by having multiple kernels import different files may be lost in communicating the results back to the control kernel. Worse than that, if they are communicating a lot, the control kernel may be busy talking to one compute kernel while another wants to speak, creating a traffic jam.

If everything is on the same file system, you might try asking each kernel to import the files, write them out to .mx files with DumpSave, and return nothing to the control kernel. Then have the control kernel pick up all the .mx files once the processing is done. .mx files are the fastest import available.
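A minimal sketch of that idea, reusing the file-naming scheme from the original code: convertToMX is a hypothetical helper, and it swaps in Export/Import with the "MX" format in place of DumpSave/Get, since that round-trips a bare expression without any symbol bookkeeping.

    (* each subkernel imports one WMLF file, writes it out as .mx, and returns *)
    (* only the new file name, so little data travels back to the control kernel *)
    convertToMX[{i_, shn_}] :=
      Module[{data, mxName},
        data = Import[StringJoin["pmar", ToString[shn], "-k=", ToString[i], ".WMLF"]];
        mxName = StringJoin["pmar", ToString[shn], "-k=", ToString[i], ".mx"];
        Export[mxName, data];   (* "MX" format is inferred from the .mx extension *)
        mxName];

    LaunchKernels[];
    DistributeDefinitions[convertToMX];
    mxFiles = ParallelMap[convertToMX, Table[{i, 1}, {i, 1, nt}]];

    (* afterwards the control kernel reads the .mx files itself *)
    matrices = Import /@ mxFiles;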

POSTED BY: Jon McLoone