Avoid oddities with parallel file access?

Posted 9 months ago

Consider the following code:

    fn = FileNames["*Electronics*"]
    (* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
    "Radio-Electronics-1962-11.pdf"} *)
    Map[Identity, FileNames["*Electronics*"]]
    (* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
    "Radio-Electronics-1962-11.pdf"} *)

So far, so good, no surprises. But:

    ParallelMap[Identity, FileNames["*Electronics*"]]
    (* {"/Users/jpd/Fun/RadioMags/Electronics-World-1961-01.pdf", \
    "/Users/jpd/Fun/RadioMags/Radio-Electronics-1961-04.pdf", \
    "/Users/jpd/Fun/RadioMags/Radio-Electronics-1962-11.pdf"} *)

Huh? Here we get absolute pathnames. Something must be treating the literal FileNames call specially, because passing the already-evaluated list behaves normally:

    ParallelMap[Identity, fn]
    (* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
    "Radio-Electronics-1962-11.pdf"} *)

From this I conclude that parallel kernels do not inherit the parent's working directory, but that there is at least one kludgy, undocumented workaround. Are there other gotchas here?

3 Replies

I apologize to John Doty for the big mess I made in this thread. I removed my previous posts because they had some wrong information and they went into unnecessary details. I hope a moderator can help me clean them up. (Done - Moderator)

Here's a summary:

I looked at the implementation of ParallelMap. While I did not understand all the details, I found that:

  • The behaviour you observe, i.e. that FileNames gives full paths when used as the second argument of ParallelMap, is intentional.

  • FileNames is evaluated on the main kernel, and not on parallel kernels. ParallelMap (and other parallel functions) effectively transform FileNames[something] to FileNames[something, Directory[]].

  • Why is it like this? We can make guesses. Perhaps to avoid surprises when the parallel kernels have a different working directory than the main kernel.

  • ParallelMap is HoldRest precisely to allow special casing like the one done for FileNames. This is not the only special casing that parallel functions have.
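
To make the points above concrete, here is a minimal sketch using the file pattern from the original question. It is not taken from the ParallelMap source. It only shows how to check the attributes yourself and what the rewritten call amounts to if the description above is accurate.

    (* Inspect the attributes of ParallelMap in your own version *)
    Attributes[ParallelMap]

    (* The two-argument form of FileNames prefixes each result with the given
       directory, so with Directory[] it yields absolute paths *)
    FileNames["*Electronics*", Directory[]]

The second expression should return the same absolute paths that the ParallelMap call in the question produced, which is consistent with the rewrite described above.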

Personally I do not like such "smart" implementations because they easily lead to bugs. There is in fact a bug with ParallelMap and FileNames: it doesn't work correctly when using FileNames[] without arguments. It's caused by this special-casing, and forgetting about the zero-argument syntax of FileNames[]. Also, what if FileNames[] gets new syntaxes in the future? Will the FileNames[] developer realize that the parallel tools also need updating, especially if the parallel tools are maintained by a different person?

Here's a demo of the bug:

    In[25]:= FileNames[]
    Out[25]= {"a", "b", "c"}

    In[26]:= Map[f, FileNames[]]
    Out[26]= {f["a"], f["b"], f["c"]}

    In[27]:= ParallelMap[f, FileNames[]]
    Out[27]= {}

    In[28]:= ParallelMap[f, FileNames["*"]]
    Out[28]= {f["/Users/szhorvat/test/par/a"], f["/Users/szhorvat/test/par/c"], f["/Users/szhorvat/test/par/b"]}
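
A possible way to sidestep the special casing, sketched under the assumption that it only triggers when ParallelMap sees a literal FileNames[...] expression in its held argument. This matches the observation in the original question that passing a stored list gives ordinary results, but I have not verified it against every version. The symbol names files and f are just placeholders.

    (* Evaluate FileNames on the main kernel first, then pass the resulting list *)
    files = FileNames[];
    ParallelMap[f, files]

    (* Alternatively, Evaluate overrides the Hold attribute for that argument,
       so ParallelMap receives a plain list of strings *)
    ParallelMap[f, Evaluate[FileNames[]]]

Either form hands ParallelMap a plain list, so the file names stay relative and the zero-argument case should no longer be rewritten.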

The problem that this kludge appears to address is that parallel kernels inherit the working directory of the master kernel when launched, but do not track subsequent SetDirectory invocations. Of course, once you've gone parallel, mutating global configuration like this is simply a bad thing to do. Attempting to automatically fix up undisciplined code causes trouble for disciplined code, but doesn't cover all the cases.
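
For anyone who does need to change directories after the kernels are running, here is a small sketch of how one might check for and repair a mismatch by hand. It assumes parallel kernels are already launched and that they can see the same file system as the master kernel, which will not hold for remote kernels.

    (* Compare the master kernel's working directory with each subkernel's *)
    Directory[]
    ParallelEvaluate[Directory[]]

    (* With injects the master's directory as a literal string before
       ParallelEvaluate sends the expression to the subkernels *)
    With[{dir = Directory[]}, ParallelEvaluate[SetDirectory[dir]]]

    (* Or avoid depending on working directories at all: expand to absolute
       paths on the master kernel before distributing work *)
    files = ExpandFileName /@ FileNames["*Electronics*"];

The last option is the more disciplined one, since absolute paths do not depend on what any kernel's working directory happens to be.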

> Attempting to automatically fix up undisciplined code causes trouble for disciplined code, but doesn't cover all the cases.

I very much agree.
