Avoid oddities with parallel file access?

Consider the following code:

SetDirectory["~/Fun/RadioMags"];
fn = FileNames["*Electronics*"]
(* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
"Radio-Electronics-1962-11.pdf"} *)
Map[Identity, FileNames["*Electronics*"]]
(* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
"Radio-Electronics-1962-11.pdf"} *)

So far, so good, no surprises. But:

ParallelMap[Identity, FileNames["*Electronics*"]]
(* {"/Users/jpd/Fun/RadioMags/Electronics-World-1961-01.pdf", \
"/Users/jpd/Fun/RadioMags/Radio-Electronics-1961-04.pdf", \
"/Users/jpd/Fun/RadioMags/Radio-Electronics-1962-11.pdf"} *)

Huh? Here we get absolute pathnames. Must be that FileNames has some weird knowledge, because:

ParallelMap[Identity, fn]
(* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
"Radio-Electronics-1962-11.pdf"} *)

From this I conclude that parallel kernels do not inherit the parent's working directory, but that there is at least one kludgy, undocumented workaround. Are there other gotchas here?
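
One way to sidestep the question, sketched under the assumption that absolute paths are acceptable: give FileNames an explicit directory, so the resulting names do not depend on any kernel's working directory. The names dir and files are illustrative, and FileByteCount merely stands in for whatever per-file work is actually needed.

```mathematica
(* Build an absolute directory path; the layout under $HomeDirectory mirrors the example above *)
dir = FileNameJoin[{$HomeDirectory, "Fun", "RadioMags"}];

(* With a directory argument, FileNames returns paths that include that directory *)
files = FileNames["*Electronics*", dir];

(* Each parallel kernel receives a full path, so its own working directory no longer matters *)
ParallelMap[FileByteCount, files]
```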

POSTED BY: John Doty
24 days ago

I apologize to John Doty for the big mess I made in this thread. I removed my previous posts because they had some wrong information and they went into unnecessary details. I hope a moderator can help me clean them up. (Done - Moderator)

Here's a summary:

I looked at the implementation of ParallelMap. While I did not understand all the details, I found that:

  • The behaviour you observe, i.e. that FileNames gives full paths when used as the second argument of ParallelMap, is intentional.

  • FileNames is evaluated on the main kernel, and not on parallel kernels. ParallelMap (and other parallel functions) effectively transform FileNames[something] to FileNames[something, Directory[]].

  • Why is it like this? We can make guesses. Perhaps to avoid surprises when the parallel kernels have a different working directory than the main kernel.

  • ParallelMap is HoldRest precisely to allow special casing like the one done for FileNames. This is not the only special casing that parallel functions have.
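
The points above can be illustrated with a toy function. This is only a sketch of the general technique (a held argument that is pattern-matched before evaluation), not the actual ParallelMap source; toyMap is a hypothetical name and it uses plain Map so that it runs without parallel kernels.

```mathematica
(* Hold the second argument so that its unevaluated form can be inspected *)
ClearAll[toyMap];
SetAttributes[toyMap, HoldRest];

(* Special case: a literal one-argument FileNames call is rewritten
   to the two-argument form, which yields paths that include the directory *)
toyMap[f_, HoldPattern[FileNames[pat_]]] := Map[f, FileNames[pat, Directory[]]];

(* Everything else, including a symbol that holds a list, is evaluated normally *)
toyMap[f_, expr_] := Map[f, expr];

toyMap[Identity, FileNames["*"]]   (* takes the special-cased branch *)

fn = FileNames["*"];
toyMap[Identity, fn]               (* takes the general branch *)
```

In this sketch the two calls differ for the same reason as in the original question: the first one sees the FileNames expression itself, while the second sees only the symbol fn.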

Personally I do not like such "smart" implementations because they easily lead to bugs. There is in fact a bug with ParallelMap and FileNames: it doesn't work correctly when using FileNames[] without arguments. It's caused by this special-casing, and forgetting about the zero-argument syntax of FileNames[]. Also, what if FileNames[] gets new syntaxes in the future? Will the FileNames[] developer realize that the parallel tools also need updating, especially if the parallel tools are maintained by a different person?

Here's a demo of the bug:

In[25]:= FileNames[]    
Out[25]= {"a", "b", "c"}

In[26]:= Map[f, FileNames[]]    
Out[26]= {f["a"], f["b"], f["c"]}

In[27]:= ParallelMap[f, FileNames[]]    
Out[27]= {}

In[28]:= ParallelMap[f, FileNames["*"]]    
Out[28]= {f["/Users/szhorvat/test/par/a"], f["/Users/szhorvat/test/par/c"], f["/Users/szhorvat/test/par/b"]}
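
Going by the ParallelMap[Identity, fn] result earlier in the thread, one possible workaround is to evaluate FileNames[] first, so that ParallelMap sees a plain symbol rather than a FileNames call. This is a sketch only; files is an illustrative name and f is left undefined, as in the demo.

```mathematica
(* Evaluate on the main kernel and store the result *)
files = FileNames[];

(* ParallelMap now receives a symbol, so the FileNames special case should not apply *)
ParallelMap[f, files]
```

The names stay relative in this form, so if f opens the files, each parallel kernel's working directory still has to match the main kernel's.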
POSTED BY: Szabolcs Horvát
24 days ago

The problem that this kludge appears to address is that parallel kernels inherit the working directory of the master kernel when launched, but do not track subsequent SetDirectory invocations. Of course, once you've gone parallel, mutating global configuration like this is simply a bad thing to do. Attempting to automatically fix up undisciplined code causes trouble for disciplined code, but doesn't cover all the cases.
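
A quick way to check where each kernel currently is, and to bring the subkernels in line once after a SetDirectory call, might look like the sketch below. It assumes parallel kernels are available, and it is a one-time sync, not automatic tracking.

```mathematica
(* Working directory of the main kernel *)
Directory[]

(* Working directory of every parallel kernel, returned as a list *)
ParallelEvaluate[Directory[]]

(* Inject the main kernel's directory as a string, then set it on each subkernel *)
With[{dir = Directory[]}, ParallelEvaluate[SetDirectory[dir]]];

(* After the sync, the subkernels should report the same directory as the main kernel *)
ParallelEvaluate[Directory[]]
```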

POSTED BY: John Doty
23 days ago

"Attempting to automatically fix up undisciplined code causes trouble for disciplined code, but doesn't cover all the cases."

I very much agree.

POSTED BY: Szabolcs Horvát
23 days ago
