
Manage and automate PDF file split and join operations

Simple program for splitting and grouping PDF files.

In my daily job I handle a large number of PDF files, so I wrote a notebook that uses the freeware PDF24 (https://tools.pdf24.org/) to automate the process. For this to work, the pdf24-DocTool.exe program must be on the Windows PATH, i.e. you need to be able to run it from the Windows Command Prompt.
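The PATH requirement can be checked from inside the notebook before anything else runs. The following is only a minimal sketch: `onPathQ` is a hypothetical helper name, not something from PDF24 or from my notebook, and it simply looks for a file with the given name in each PATH directory.

```mathematica
(* Hypothetical helper: True if a file called exeName sits in a PATH directory *)
onPathQ[exeName_String] := Module[{path, sep, dirs},
  path = Environment["PATH"];
  (* Windows separates PATH entries with ";", other systems with ":" *)
  sep = If[$OperatingSystem === "Windows", ";", ":"];
  (* Environment returns $Failed when the variable is not set *)
  dirs = If[StringQ[path], DeleteCases[StringSplit[path, sep], ""], {}];
  AnyTrue[dirs, FileExistsQ[FileNameJoin[{#, exeName}]] &]
]

onPathQ["pdf24-DocTool.exe"]
```

If this returns False, the RunProcess calls will most likely fail to find the program, so it is worth fixing the PATH first.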

The Problem:


I have a collection of folders, each containing several PDF files, and every file has exactly two pages. What I need is a single PDF file per folder containing only the second page of each file in that folder.

I tried Mathematica's built-in Import[ ] and Export[ ] functions; however, because of the way Mathematica renders the PDF, the result was an absurdly large file. The solution I found is to call an external program with the RunProcess[ ] function.

Split a PDF with PDF24.


To split the PDF into individual pages:

RunProcess[{"pdf24-DocTool.exe", "-splitByPage", "-outputFile", outFile, file1}, "StandardOutput"];

Where:

outFile = (*the header name for the resulting files, one per page, that PDF24 creates from the file being split*)

file1 = (*the path to the file to be split*)
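To apply the split to every PDF in a folder, the call can be wrapped as in the sketch below. The flags are the ones shown above; `splitCommand`, `splitHeader` and `splitFolder` are hypothetical names used for illustration, and the "_page" header naming scheme is an assumption (the post does not say whether PDF24 expects an extension in the header name or how it numbers the page files).

```mathematica
(* Argument list for splitting one PDF; flags copied from the call above *)
splitCommand[file_String, outFile_String] :=
  {"pdf24-DocTool.exe", "-splitByPage", "-outputFile", outFile, file}

(* Assumed naming scheme: "<folder>/<base name>_page" as the header name *)
splitHeader[file_String] :=
  FileNameJoin[{DirectoryName[file], FileBaseName[file] <> "_page"}]

(* Split every PDF found directly inside dir; requires PDF24 on the PATH *)
splitFolder[dir_String] :=
  RunProcess[splitCommand[#, splitHeader[#]], "StandardOutput"] & /@
    FileNames["*.pdf", dir]
```

Keeping the command construction separate from RunProcess makes it possible to inspect the argument list for one file before running it on a whole folder.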

Join PDFs with PDF24.


After running this process on all the PDF files in the current directory, I save the names of the files that interest me in a list and execute the following function:

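CMDJoinPDF[raiz_, salida_, lista_] := Module[{exe, borrar, destino},
   (* Full paths of the input files, so the call works from any working directory *)
   borrar = FileNameJoin[{raiz, #}] & /@ lista;
   destino = FileNameJoin[{raiz, salida}];

   (* Command line: program, options, output file, then the files to join *)
   exe = Join[{"pdf24-DocTool.exe", "-join", "-profile", "\"default/good\"",
               "-outputFile", destino}, borrar];

   RunProcess[exe, "StandardOutput"];

   (* Delete the inputs only if the joined file was actually written *)
   If[FileExistsQ[destino], DeleteFile[borrar]];
]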

Arguments:

raiz_   = (*Path to the directory that contains all the PDF files to be joined*)
salida_ = (*String with the desired name for the resulting joined file*)
lista_  = (*List of Strings where each element is the name of a file to be joined*)
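Because the function deletes the input files, it can be useful to look at the command before running it. The sketch below is a dry-run helper under stated assumptions: `joinCommand` is a hypothetical name, it takes the same three arguments, reuses the flags from the function above, and passes the input files as full paths (an assumption on my side; bare file names only work when the working directory is raiz). It runs nothing and deletes nothing.

```mathematica
(* Hypothetical dry-run helper: builds the argument list without calling PDF24 *)
joinCommand[raiz_String, salida_String, lista_List] :=
  Join[
    (* Program, options and output file, with the same flags as CMDJoinPDF *)
    {"pdf24-DocTool.exe", "-join", "-profile", "\"default/good\"",
     "-outputFile", FileNameJoin[{raiz, salida}]},
    (* Input files as full paths inside raiz *)
    FileNameJoin[{raiz, #}] & /@ lista]

(* Example with made-up folder and file names *)
joinCommand["C:\\work\\folder1", "joined.pdf", {"a_2.pdf", "b_2.pdf"}]
```

The result is the list of strings that would be handed to RunProcess, so a wrong path or a missing file name shows up before any file is removed.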

If anyone is interested, I can elaborate further on the full-blown notebook I created to tackle this problem.

Hope this is useful to somebody,

Ernesto P.

P.S. I'm a native Spanish speaker, which is why some names may "sound" weird.

POSTED BY: Vlad Palacios