
Avoid oddities with parallel file access?

Posted 6 years ago

Consider the following code:

SetDirectory["~/Fun/RadioMags"];
fn = FileNames["*Electronics*"]
(* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
"Radio-Electronics-1962-11.pdf"} *)
Map[Identity, FileNames["*Electronics*"]]
(* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
"Radio-Electronics-1962-11.pdf"} *)

So far, so good, no surprises. But:

ParallelMap[Identity, FileNames["*Electronics*"]]
(* {"/Users/jpd/Fun/RadioMags/Electronics-World-1961-01.pdf", \
"/Users/jpd/Fun/RadioMags/Radio-Electronics-1961-04.pdf", \
"/Users/jpd/Fun/RadioMags/Radio-Electronics-1962-11.pdf"} *)

Huh? Here we get absolute pathnames. Must be that FileNames has some weird knowledge, because:

ParallelMap[Identity, fn]
(* {"Electronics-World-1961-01.pdf", "Radio-Electronics-1961-04.pdf", \
"Radio-Electronics-1962-11.pdf"} *)

From this I conclude that parallel kernels do not inherit the parent's working directory, but that there is at least one kludgy, undocumented workaround. Are there other gotchas here?

POSTED BY: John Doty
3 Replies

I apologize to John Doty for the big mess I made in this thread. I removed my previous posts because they had some wrong information and they went into unnecessary details. I hope a moderator can help me clean them up. (Done - Moderator)

Here's a summary:

I looked at the implementation of ParallelMap. While I did not understand all the details, I found that:

  • The behaviour you observe, i.e. that FileNames gives full paths when used as the second argument of ParallelMap, is intentional.

  • FileNames is evaluated on the main kernel, and not on parallel kernels. ParallelMap (and other parallel functions) effectively transform FileNames[something] to FileNames[something, Directory[]].

  • Why is it like this? We can make guesses. Perhaps to avoid surprises when the parallel kernels have a different working directory than the main kernel.

  • ParallelMap is HoldRest precisely to allow special casing like the one done for FileNames. This is not the only special casing that parallel functions have.
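
To make the rewrite described above concrete, here is a rough sketch of what it amounts to on the main kernel. It is only an illustration of the transformed call, not the actual ParallelMap implementation, and the pattern is just the one from the question. The last two lines are an assumption about how one might check and align the working directories by hand; they are not something the parallel tools do automatically.

```wolfram
(* The transformed call: passing Directory[] as the second argument makes
   FileNames return paths prefixed with the main kernel's working directory,
   i.e. absolute paths like those in the ParallelMap output above. *)
FileNames["*Electronics*", Directory[]]

(* Inspect the working directory each parallel kernel is actually using. *)
ParallelEvaluate[Directory[]]

(* Hand-rolled alternative (an assumption, not part of ParallelMap):
   With injects the main kernel's directory as a literal string before
   ParallelEvaluate sends the expression to the subkernels. *)
With[{dir = Directory[]}, ParallelEvaluate[SetDirectory[dir]]]
```

If the subkernels are given the same working directory this way, relative names should resolve consistently on every kernel, though kernels launched later would need the same treatment.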

Personally I do not like such "smart" implementations because they easily lead to bugs. There is in fact a bug with ParallelMap and FileNames: it doesn't work correctly when using FileNames[] without arguments. It's caused by this special-casing, and forgetting about the zero-argument syntax of FileNames[]. Also, what if FileNames[] gets new syntaxes in the future? Will the FileNames[] developer realize that the parallel tools also need updating, especially if the parallel tools are maintained by a different person?

Here's a demo of the bug:

In[25]:= FileNames[]    
Out[25]= {"a", "b", "c"}

In[26]:= Map[f, FileNames[]]    
Out[26]= {f["a"], f["b"], f["c"]}

In[27]:= ParallelMap[f, FileNames[]]    
Out[27]= {}

In[28]:= ParallelMap[f, FileNames["*"]]    
Out[28]= {f["/Users/szhorvat/test/par/a"], f["/Users/szhorvat/test/par/c"], f["/Users/szhorvat/test/par/b"]}
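
A possible way around this, assuming the special-casing only triggers when FileNames[...] appears literally as the held argument (which the ParallelMap[Identity, fn] result in the question suggests), is to evaluate the file list before ParallelMap sees it. The names files and f are placeholders.

```wolfram
(* Evaluate FileNames[] on the main kernel first. With substitutes the
   resulting list into the body, so ParallelMap receives a plain list of
   strings rather than the unevaluated FileNames[] expression. *)
With[{files = FileNames[]},
  ParallelMap[f, files]
]

(* If the subkernels need to open the files, passing absolute paths avoids
   any dependence on each kernel's working directory. *)
With[{files = AbsoluteFileName /@ FileNames[]},
  ParallelMap[f, files]
]
```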
POSTED BY: Szabolcs Horvát
POSTED BY: John Doty

Attempting to automatically fix up undisciplined code causes trouble for disciplined code, but doesn't cover all the cases.

I very much agree.

POSTED BY: Szabolcs Horvát