There are many cases where you might want to run code on every cloud object in a cloud directory tree:
- apply permissions to all files in a cloud directory
- count all files in a cloud directory
- add up the total storage footprint of the directory tree
- look for files with a specific name in a cloud directory tree
All of these follow the same iteration structure, so I wrote a little utility that abstracts that structure into a function:
visitCloudDirectory::usage =
"visitCloudDirectory[dir, fileFn] runs fileFn on each non-directory file contained in the given cloud object directory tree. It supports a \"BeforeVisit\" option that is called as before[dir, contents] before each directory level is iterated, and an \"AfterVisit\" option that is called as after[dir] once that level has been iterated.";
Options[visitCloudDirectory] = {"BeforeVisit" -> (Null &), "AfterVisit" -> (Null &)};
visitCloudDirectory[dir_CloudObject, fileFn_, opts : OptionsPattern[]] :=
  Module[{contents},
    contents = CloudObjects[dir];
    OptionValue["BeforeVisit"][dir, contents];
    Do[
      If[DirectoryQ[obj],
        visitCloudDirectory[obj, fileFn, opts],
        (* Else it's a plain file, so run the fileFn *)
        fileFn[obj]
      ],
      {obj, contents}
    ];
    OptionValue["AfterVisit"][dir];
  ]
You could then count all files in a cloud directory tree like this:
dir = CloudObject["/Base"];
count = 0;
current = None;
Dynamic[Column[{count, current}]]
visitCloudDirectory[dir, Function[count++], "BeforeVisit" -> Function[d, current = d]]
count
The Dynamic output cell just helps you keep an eye on the progress of the function as it runs.
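The same pattern covers the storage-footprint case from the opening list. Below is a minimal sketch, assuming CloudObjectInformation[obj, "FileByteCount"] returns the size in bytes of a plain cloud file; the helper name cloudTreeByteCount is hypothetical. It keeps the running total in a Module-local variable instead of a global, so separate calls don't interfere with each other.

```mathematica
(* Hypothetical helper: total size in bytes of all plain files under dir.
   Assumes CloudObjectInformation[obj, "FileByteCount"] gives a file's size in bytes
   and that visitCloudDirectory (defined above) has already been evaluated. *)
cloudTreeByteCount[dir_CloudObject] :=
  Module[{total = 0},
    visitCloudDirectory[dir,
      Function[obj, total += CloudObjectInformation[obj, "FileByteCount"]]];
    total
  ]

(* Usage: cloudTreeByteCount[CloudObject["/Base"]] *)
```

If some object does not report a byte count, the result will not be a plain number; a more robust version could skip non-numeric values before adding them.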
Similarly, you could make sure permissions are assigned to every file in a tree:
visitCloudDirectory[CloudObject["/"],
  Function[obj, SetOptions[obj, Permissions -> {"alice@example.com" -> "Read"}]]]
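Searching for files by name, the last case in the opening list, can collect matches with Reap and Sow instead of a global variable. This is a sketch under two assumptions: that CloudObjectInformation[obj, "Name"] returns the object's file name as a string, and that namePattern is anything StringMatchQ accepts (for example "*.csv"). The helper name findCloudFiles is hypothetical.

```mathematica
(* Hypothetical helper: list the plain files under dir whose name matches namePattern.
   Assumes CloudObjectInformation[obj, "Name"] gives the file name as a string. *)
findCloudFiles[dir_CloudObject, namePattern_] :=
  Flatten @ Last @ Reap[
    visitCloudDirectory[dir,
      Function[obj,
        With[{name = CloudObjectInformation[obj, "Name"]},
          (* Skip objects whose name isn't available as a string *)
          If[StringQ[name] && StringMatchQ[name, namePattern], Sow[obj]]
        ]
      ]
    ]
  ]

(* Usage: findCloudFiles[CloudObject["/Base"], "*.csv"] *)
```

When nothing matches, Reap has nothing to return, so the result is simply an empty list.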
There are ways to improve this function, of course:
- add a function parameter that is applied to each subdirectory before recursing into it
- add a way to limit the depth to which the function recurses
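- add an option to control visit order, i.e. when recursion happens: either descend into each subdirectory as soon as it is encountered (as the current code does) or process all plain files at a level before descending into its subdirectories. Both orders are still depth-first; a true breadth-first traversal would need a queue of pending directories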
- add built-in progress monitoring
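As an illustration of the depth-limit idea, here is one possible sketch. It is a separate function, visitCloudDirectoryDepth, with a hypothetical "MaxDepth" option where depth 0 is the starting directory. The recursion goes through an internal helper, visitDepthHelper (also hypothetical), so the current depth can be passed along without exposing it as a user-facing option.

```mathematica
(* Hypothetical depth-limited variant of visitCloudDirectory.
   "MaxDepth" -> 0 visits only the files directly in dir; Infinity recurses fully. *)
Options[visitCloudDirectoryDepth] =
  {"MaxDepth" -> Infinity, "BeforeVisit" -> (Null &), "AfterVisit" -> (Null &)};

visitCloudDirectoryDepth[dir_CloudObject, fileFn_, opts : OptionsPattern[]] :=
  visitDepthHelper[dir, fileFn, 0, OptionValue["MaxDepth"],
    OptionValue["BeforeVisit"], OptionValue["AfterVisit"]]

(* Internal helper: depth is how many levels dir lies below the starting directory *)
visitDepthHelper[dir_, fileFn_, depth_, maxDepth_, before_, after_] :=
  Module[{contents = CloudObjects[dir]},
    before[dir, contents];
    Do[
      If[DirectoryQ[obj],
        (* Descend only while the subdirectory stays within maxDepth *)
        If[depth + 1 <= maxDepth,
          visitDepthHelper[obj, fileFn, depth + 1, maxDepth, before, after]],
        fileFn[obj]
      ],
      {obj, contents}
    ];
    after[dir];
  ]

(* Usage: visitCloudDirectoryDepth[CloudObject["/Base"], Print, "MaxDepth" -> 1] *)
```

Passing the hooks and limits as explicit helper arguments avoids re-resolving options at every level, at the cost of a slightly longer helper signature.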