FindFileNames, a wrapper around Unix find that replaces FileNames

Posted 2 months ago

Hello,

FindFileNames is a small wrapper around the Unix find utility that generates a list of file names matching a pattern (or a list of patterns).

I wrote it after running into a few limitations of the built-in FileNames function. First, FileNames takes much longer to run: the example below shows a 3x difference on a folder with 362k files. It also sometimes hangs (as in, kills the kernel) when run on larger directories. Note that the FileNames implementation does not appear to be strict about closing open files: it emits a few "Too many open files" warnings, along with a suspicious-looking "Cannot set current directory to "~/"" message.
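A comparison of this kind can be sketched as follows. This is only an illustration, not the original benchmark: it assumes the FileFinder` package from this post is installed and loadable, and the directory is a hypothetical placeholder. Note that FileNames[All, dir, Infinity] also returns directories, while FindFileNames returns only files by default, so the two result lists are not expected to have the same length.

```wolfram
(* Assumes the FileFinder` package from this post is on $Path. *)
Needs["FileFinder`"]

(* Hypothetical directory; substitute a large folder of your own. *)
dir = FileNameJoin[{$HomeDirectory, "Documents"}];

(* AbsoluteTiming returns {seconds, result}. *)
{tFind, viaFind} = AbsoluteTiming[FindFileNames[dir]];
{tBuiltin, viaBuiltin} = AbsoluteTiming[FileNames[All, dir, Infinity]];

(* Ratio of wall-clock times; values above 1 mean FindFileNames was faster. *)
tBuiltin/tFind
```

The ratio depends heavily on the file system, caching, and the number of files, so it is worth running each call more than once before drawing conclusions.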

Another benefit of wrapping find is the ability to prune paths from the search, which speeds things up considerably when you know you don't want to descend into specific folders. The following command completes in a reasonable time, whereas with FileNames you would need to filter the result yourself (and it never completes anyway!).
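A pruned search might look like the sketch below. The folder names and the file pattern are hypothetical, chosen only to illustrate the "Prune" and "Files" options, and the package from this post is assumed to be installed.

```wolfram
(* Assumes the FileFinder` package from this post is on $Path. *)
Needs["FileFinder`"]

(* Pass an absolute path: FindFileNames single-quotes its search roots,
   so the shell would not expand a leading "~". *)
notebooks = FindFileNames[$HomeDirectory,
   "Files" -> {"*.nb"},
   "Prune" -> {"Library", ".git"}];  (* hypothetical folders to skip *)

Length[notebooks]
```

With these options, the code in this post assembles a command of the form `find -L '<home>' -type d \( -name 'Library' -o -name '.git' \) -prune -o -name '*.nb' -type f -print`, where `<home>` stands for the value of $HomeDirectory. Because find never descends into the pruned directories, their contents are never read at all, which is where the time savings come from.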

Here is the code. Enjoy, and please leave your feedback in the comments.

BeginPackage["FileFinder`",  {"GeneralUtilities`"}]

SetUsage[FindFileNames,
"FindFileNames[searchRoots$, opts$] uses the system find to list all files under searchRoots$, which can be a directory or a list of directories.
opts$ can include any of the following:
* \"Files\": list of file name patterns to find (default is {}, meaning all files).
* \"Directories\": list of directory patterns to limit the search to (default is {}).
* \"Prune\": list of directory patterns not to descend into (default is {}).
* \"FileType\": what objects to return, either File, Directory, or All (default is File)."]

Begin["`Private`"]

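(* Run cmd through sh, redirect its stdout to a temporary file, and return the lines. *)
RunAndCaptureOutput[cmd_] := Module[{tmpfile, result},
    tmpfile = CreateFile[];
    (* Quote the temporary path in case it contains spaces. *)
    RunProcess[{"sh", "-c", StringRiffle[{cmd, ">", StringQuote[tmpfile]}]}];
    result = ReadList[tmpfile, "String"];
    DeleteFile[tmpfile];
    result
]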

OrFindConditions[cond : {___String}] :=
    Switch[Length[cond],
       0, "",
       1, First@cond,
       _, "\\( " <> StringRiffle[cond, " -o "] <> " \\)"]

StringQuote[s_String] := If[StringStartsQ[s, "'"], s, "'" <> s <> "'"]

FileNameFindCondition[name_String] := If[StringContainsQ[name, "/"], "-path", "-name"] <> " " <> StringQuote[name]

Options[FindFileNames] := {"Files" -> {}, "Directories" -> {}, "Prune" -> {}, "FileType" -> File}

FindFileNames::strlst = "Expect `1` to be a string or a list of strings, but found `2`"

FindFileNames::ftype = "Expect 'FileType' to be one of File, Directory, All but found `1`"

FindFileNames[searchRoots : (_String | {__String}) : ".", OptionsPattern[]] := Module[
    {dirArg, dirsOpt, filesOpt, pruneOpt, fileTypesOpt, cmd},

    If[!MatchQ[OptionValue["Files"], {___String}],
       Message[FindFileNames::strlst, "Files", OptionValue["Files"]];
       Return[$Failed]
    ];
    If[!MatchQ[OptionValue["Directories"], {___String}],
       Message[FindFileNames::strlst, "Directories", OptionValue["Directories"]];
       Return[$Failed]
    ];
    If[!MatchQ[OptionValue["Prune"], {___String}],
       Message[FindFileNames::strlst, "Prune", OptionValue["Prune"]];
       Return[$Failed]
    ];
    If[!MemberQ[{File, Directory, All}, OptionValue["FileType"]],
       Message[FindFileNames::ftype, OptionValue["FileType"]];
       Return[$Failed]];

    dirArg = StringRiffle[StringQuote /@ Flatten[{searchRoots}]];

    filesOpt = If[OptionValue["Files"] === {},
       "",
       " " <> OrFindConditions[FileNameFindCondition /@ OptionValue["Files"]]];

    dirsOpt = If[OptionValue["Directories"] === {}, 
       "", 
       " -type d " <> OrFindConditions[FileNameFindCondition /@ OptionValue["Directories"]]];

    pruneOpt = If[OptionValue["Prune"] === {},
       "", 
       " -type d " <> OrFindConditions[FileNameFindCondition /@ OptionValue["Prune"]] <> " -prune -o"];

    fileTypesOpt = Switch[OptionValue["FileType"],
       File, " -type f",
       Directory, " -type d",
       _, ""];

    RunAndCaptureOutput[
       "find -L " <> dirArg <> pruneOpt <> dirsOpt <> filesOpt <> fileTypesOpt <> " -print"]
]

Testing helpers

FileNamesOnly[dir_] := Quiet@Select[Sort@FileNames[All, dir, Infinity], FileType[#] === File&]

NotSocket[file_] := !StringContainsQ[RunProcess[{"file", file}, "StandardOutput"], "socket"]

TestIdentical[dir_] := Module[{my = Sort[FindFileNames[dir]], wolfram = FileNamesOnly[dir]},
    Complement[my, wolfram] === {} &&
    Select[Complement[wolfram, my], NotSocket] === {}
]

(*
<<FileFinder`
report =TestReport["/Users/victor/Documents/packages/FileFinder/Tests/TestFileFinder.wl"]
TabView[Column/@report["ResultsByOutcome"]]
*)

End[]
EndPackage[]
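As a possible alternative to the temporary file in RunAndCaptureOutput, the output can be captured directly through RunProcess, in the same way the NotSocket helper above does. The following is a minimal sketch of that swapped-in approach, not part of the package; captureLines is a hypothetical name, and for very large listings the temporary-file version may still be preferable because this one holds the whole output in a single string first.

```wolfram
(* Hypothetical helper: run a shell command and return its stdout as a list of lines. *)
captureLines[cmd_String] := Module[{out},
   out = RunProcess[{"sh", "-c", cmd}, "StandardOutput"];
   (* RunProcess returns $Failed if the process cannot be started. *)
   If[StringQ[out], StringSplit[out, "\n"], $Failed]
]

captureLines["printf 'a\\nb\\n'"]
(* {"a", "b"} *)
```

StringSplit drops the empty string produced by the trailing newline, so no extra cleanup of the last line is needed.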
POSTED BY: Victor Kryukov
Posted 2 months ago

Nice, thanks. You should consider submitting it to the Wolfram Function Repository.

POSTED BY: Rohit Namjoshi