Recently I was trying to process notebooks generated by the documentation tools and make small adjustments to them. This is a "soft question" about the best way to do such notebook manipulations. I am asking because, in practice, these tasks tend to turn out more difficult than one might initially expect, and it's hard to tell what the best approach is without practical experience.
Is it better to process notebooks with the Kernel or with the Front End? When and why?
Pattern matching with the kernel
One way is to get a Notebook[...] expression into the Kernel and rewrite it using Mathematica's strong pattern matching capabilities: use functions like Replace, ReplaceAll, DeleteCases, etc. While these functions are very general and powerful, there are many typical notebook processing tasks that still require quite a bit of work with this approach.
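As a minimal sketch of the kernel route, assuming a small hand-written Notebook expression (in practice it would more likely come from Get on a .nb file or from NotebookGet), options can be stripped and cells rewritten with ordinary rules. The cell contents and the tag "draft" are made up for illustration.

```mathematica
(* A toy notebook expression; contents and the tag are illustrative only *)
nbExpr = Notebook[{
    Cell["Some text", "Text", CellTags -> "draft", CellChangeTimes -> {3.6*^9}],
    Cell["1+1", "Input"]
   }];

(* Remove every CellChangeTimes option, at any depth *)
DeleteCases[nbExpr, CellChangeTimes -> _, Infinity]

(* Close Text cells whose CellTags option is exactly the string "draft";
   all existing options are kept *)
nbExpr /. Cell[c_, "Text", o1___, CellTags -> "draft", o2___] :>
  Cell[c, "Text", o1, CellTags -> "draft", o2, CellOpen -> False]
```

Note that the second rule does not match a cell with CellTags -> {"draft", "other"}. Covering that form needs an extra pattern or a condition, which is the kind of additional work this approach tends to require.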
Object references with the Front End
Another way is to use the Front End to change a notebook object. In this case we don't have a Mathematica expression to manipulate. Instead we have references to parts of notebooks: NotebookObject and CellObject. Cell references can be retrieved using Cells, which filters based on many properties such as style, cell tags, cell labels, etc. Once we have a reference, we can set or retrieve its properties using CurrentValue (or Options/SetOptions), delete the corresponding object, or move to the next, previous or parent cell.
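A rough sketch of this style of working, assuming a running front end and a throwaway invisible notebook (the cell contents and the tag "A" are made up):

```mathematica
(* Toy notebook, kept invisible; contents and the tag "A" are illustrative *)
nb = NotebookPut[Notebook[{
     Cell["A note", "Text", CellTags -> "A"],
     Cell["1+1", "Input"]
    }, Visible -> False]];

tagged = Cells[nb, CellTags -> "A"];   (* list of CellObject references *)
cell = First[tagged];

CurrentValue[cell, CellOpen] = False;  (* set a property in place *)
CurrentValue[cell, CellTags]           (* read a property back *)

next = NextCell[cell];                 (* reference to the following cell *)
NotebookDelete[next];                  (* delete that cell *)

NotebookClose[nb];
```

Nothing here touches the underlying Notebook[...] expression directly. Every step goes through a reference.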
I find this so much more convenient than using pattern matching.
I can take e.g. TaggingRules -> {"foo" -> 1} and use a simple CurrentValue[..., {TaggingRules, "bar", "baz"}] = 2 to transform it to {"foo" -> 1, "bar" -> {"baz" -> 2}}. This isn't nearly as easy with expression manipulation, though associations should help a bit.
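For comparison, here is one possible kernel-side sketch of the same nested update using associations. setNested is a hypothetical helper written for this illustration, not a built-in, and it assumes that every intermediate value along the path is a list of rules.

```mathematica
(* setNested is a hypothetical helper, not a built-in function *)
setNested[rules_List, {key_}, value_] :=
  Normal[Append[Association[rules], key -> value]];

setNested[rules_List, {key_, rest__}, value_] :=
  Normal[Append[Association[rules],
    (* descend into the existing sub-rules, or start from an empty list *)
    key -> setNested[Lookup[Association[rules], key, {}], {rest}, value]]];

setNested[{"foo" -> 1}, {"bar", "baz"}, 2]
(* {"foo" -> 1, "bar" -> {"baz" -> 2}} *)
```

If an intermediate value is not a list of rules (say "bar" -> 5), the recursive call stays unevaluated, so a real version would need to decide how to handle that case.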
I can reference the parent of a cell easily, or the following cell. This isn't possible with pattern matching without matching directly on the full parent cell expression. Think of e.g. modifying all cells with tag A, but only if they are within a cell with tag B.
The front end also maintains a consistent structure, e.g. I can count on {Cell["1+1", "Input"], Cell["2", "Output"]} getting grouped into Cell[CellGroupData[{Cell["1+1", "Input"], Cell["2", "Output"]}, ...]] right away, and open or close the group.
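To illustrate why the "tag A, but only inside tag B" case is awkward in the kernel, here is a sketch that matches on the full group expression. It assumes that "inside a cell with tag B" means "a direct member of a group whose first cell has tag B", and markCell is a hypothetical stand-in for whatever change is wanted.

```mathematica
(* markCell is a hypothetical placeholder for the actual modification *)
markCell[Cell[c_, style_, opts___]] :=
  Cell[c, style, opts, Background -> LightYellow];

nbExpr = Notebook[{
    Cell[CellGroupData[{
       Cell["Group heading", "Section", CellTags -> "B"],
       Cell["inside", "Text", CellTags -> "A"]}, Open]],
    Cell["outside", "Text", CellTags -> "A"]
   }];

(* Match the whole group, then rewrite only the tagged members within it *)
nbExpr /. Cell[CellGroupData[{head : Cell[__, CellTags -> "B", ___], rest___}, g___], o___] :>
  Cell[CellGroupData[{head,
      Sequence @@ ({rest} /.
         c : Cell[_, _String, ___, CellTags -> "A", ___] :> markCell[c])},
     g], o]
```

Only the "inside" cell is changed. The sketch ignores nested groups and tags given as lists, both of which would need further patterns.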
Problems with the Front End
If I see all these advantages, why shouldn't I just use the Front End? It turns out that there are a number of difficulties. I can't work around all of them and I am not confident that there aren't several more I haven't thought of yet.
If I create an actual notebook from a Notebook expression using NotebookPut, that has a number of immediate effects, some undesirable.
Dynamic evaluations may happen. We need to turn them off temporarily using CurrentValue[$FrontEndSession, DynamicUpdating] = False.
A number of options get immediately added to the notebook, or modified: WindowSize, WindowMargins.
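One possible workaround, sketched under assumptions: disable dynamic updating around the round trip, then strip the window options from the retrieved expression in the kernel. Resetting the session value to Inherited is my assumption for undoing the override, and the cleanup step also removes WindowSize and WindowMargins if the original notebook had set them deliberately.

```mathematica
nbExpr = Notebook[{Cell["1+1", "Input"]}, Visible -> False];  (* toy input *)

CurrentValue[$FrontEndSession, DynamicUpdating] = False;
nb = NotebookPut[nbExpr];
(* ... front end manipulations on nb would go here ... *)
result = NotebookGet[nb];
NotebookClose[nb];

(* Assumption: Inherited removes the session-level override again *)
CurrentValue[$FrontEndSession, DynamicUpdating] = Inherited;

(* Default level spec {1}: only the notebook's own options are affected *)
cleaned = DeleteCases[result, (WindowSize | WindowMargins) -> _]
```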
We still need to be careful when modifying notebook options. My example with CurrentValue[..., {TaggingRules, "bar", "baz"}] = 2 would also inherit all of the Front End's tagging rules if the notebook had no tagging rules at the beginning. Most people don't have any tagging rules set for their front end so they may not even realize this. Currently I do not know how to detect if a notebook option belongs to the notebook or whether it is inherited from the Front End. (At least not without retrieving the Notebook[...] expression.)
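The fallback of retrieving the expression could be wrapped up roughly like this. hasExplicitOptionQ is a hypothetical helper name, and the check transfers the whole notebook to the kernel, so it may be slow for large notebooks.

```mathematica
(* hasExplicitOptionQ is a hypothetical helper, not a built-in function *)
hasExplicitOptionQ[nb_NotebookObject, opt_] :=
  MatchQ[NotebookGet[nb],
   Notebook[_, ___, (Rule | RuleDelayed)[opt, _], ___]];

(* Usage: True only if TaggingRules appears in the notebook expression itself *)
hasExplicitOptionQ[EvaluationNotebook[], TaggingRules]
```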
There are also many notebook manipulation functions which move the selection to different places and may open cell groups in the process. Such changes are undesirable. This shouldn't be a big issue for as long as I am careful not to use such functions, but it's not yet clear to me what will and what won't cause cell groups to open. I have not yet found a way to write or delete things without opening the surrounding cell group.
Questions
I would like to know others' experiences with notebook manipulation and their advice on whether I should go with the kernel or front end route. Which one will cause fewer problems in the end? I lean towards the kernel way now because, while it has several problems I didn't mention, I feel like I am in full control. Many of the problems with the front end method were things I didn't expect.
Is there a way to have more control over the behaviour of notebooks? To prevent creating options such as WindowSize, etc.? I suspect that there is a way because Export[..., "NB"] does seem to go through the front end (it creates a proper notebook cache) and it doesn't add these options.
Do you already have ways to make the kernel method (transformations through pattern matching) more convenient, like the object references method is?
Typical tasks to accomplish would be:
- close the input cell of any input-output pair where the input cell has a certain cell tag
- remove CellChangeTimes from Text cells with a given tag in the second Section, and also remove that tag
- rasterize output cells at double resolution, except if they are in the first Section
- modify certain nested rules within tagging rules without affecting any other rules there