
Manipulating notebooks: Kernel vs Front End

Posted 7 years ago

Recently I was trying to process notebooks generated by the documentation tools and make small adjustments to them. This is a "soft question" on what is the best way to do such notebook manipulations. I am asking about this because I found that in practice these tasks tend to turn out to be more difficult than one might initially expect. It's hard to tell what the best approach is without having any practical experience.

Is it better to process notebooks with the Kernel or with the Front End? When and why?

Pattern matching with the kernel

One way is to get a Notebook[...] expression into the Kernel and rewrite it using Mathematica's strong pattern matching capabilities: use functions like Replace, ReplaceAll, DeleteCases, etc. While these functions are very general and powerful, there are many typical notebook processing tasks that still require quite a bit of work with this approach.
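
A minimal sketch of this kernel-side approach, using a small hypothetical Notebook expression built in memory (in practice it would come from something like Get on a notebook file). The cell contents and options are invented for illustration only.

```wolfram
(* Hypothetical notebook expression; Notebook and Cell are inert in the kernel *)
nbExpr = Notebook[{
    Cell["A title", "Title"],
    Cell["Some text", "Text", CellChangeTimes -> {3.7*^9}],
    Cell[BoxData["1+1"], "Input", CellChangeTimes -> {3.7*^9, 3.8*^9}]
   }, WindowSize -> {800, 600}];

(* Drop every CellChangeTimes option, at any depth *)
stripped = DeleteCases[nbExpr, CellChangeTimes -> _, Infinity];

(* Restyle all Text cells as Item cells, keeping their remaining options *)
restyled = stripped /. Cell[c_, "Text", opts___] :> Cell[c, "Item", opts];
```

Blanket rewrites like these are easy. The effort goes up as soon as the rewrite depends on where a cell sits in the notebook.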

Object references with the Front End

Another way is to use the Front End to change a notebook object. In this case we don't have a Mathematica expression to manipulate. Instead we have references to parts of notebooks: NotebookObject and CellObject. Cell references can be retrieved using Cells, which filters based on many properties such as style, cell tags, cell labels, etc. Once we have a reference, we can set or retrieve its properties using CurrentValue (or Options/SetOptions), delete the corresponding object, or move to the next, previous or parent cell.
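
A rough sketch of the reference-based workflow. It needs a running front end, and the scratch notebook, the tag "foo" and the background colour are all made up for illustration.

```wolfram
(* Put a small hypothetical notebook into the front end *)
nb = NotebookPut[Notebook[{
     Cell["Intro", "Text"],
     Cell[BoxData["1+1"], "Input", CellTags -> "foo"],
     Cell["Outro", "Text"]}]];

(* References to all Input cells carrying the tag "foo" *)
cells = Cells[nb, CellStyle -> "Input", CellTags -> "foo"];
c = First[cells];

(* Read and set properties through the reference *)
CurrentValue[c, CellTags]
CurrentValue[c, Background] = LightYellow;

(* Navigate relative to the cell *)
NextCell[c]
PreviousCell[c]

(* Delete the referenced cell from the notebook *)
NotebookDelete[c]
```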

I find this so much more convenient than using pattern matching.

I can take e.g. TaggingRules -> {"foo" -> 1} and use a simple CurrentValue[..., {TaggingRules, "bar", "baz"}] = 2 to transform it to {"foo" -> 1, "bar" -> {"baz" -> 2}}. This isn't nearly as easy with expression manipulation, though associations should help a bit.

I can reference the parent of a cell easily, or the following cell. This isn't possible with pattern matching without matching directly on the full parent cell expression. Think of e.g. modifying all cells with tag A, but only if they are within a cell with tag B.
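
To illustrate why this is awkward in the kernel, here is one possible pattern-based sketch of the "tag A inside tag B" case. It assumes "within a cell with tag B" means "inside a cell group whose first (heading) cell has tag B". The helper names hasTag and mark are hypothetical, and the modification is just setting a background.

```wolfram
(* Pattern for a cell carrying a given tag, alone or in a list of tags *)
hasTag[tag_] := Cell[___, CellTags -> (tag | {___, tag, ___}), ___];

(* Hypothetical notebook: one tagged cell inside a B group, one outside *)
nbExpr = Notebook[{
    Cell[CellGroupData[{
       Cell["Section B", "Section", CellTags -> "B"],
       Cell["inside", "Text", CellTags -> "A"]}, Open]],
    Cell["outside", "Text", CellTags -> "A"]}];

(* The modification to apply; here it only adds a background *)
mark[Cell[c_, rest___]] := Cell[c, rest, Background -> LightYellow];

(* Match the whole group expression, then rewrite tag-A cells in its body *)
result = nbExpr /.
   Cell[CellGroupData[{head : hasTag["B"], body___}, state___], o___] :>
    Cell[CellGroupData[
      Join[{head}, {body} /. cell : hasTag["A"] :> mark[cell]], state], o];
```

Only the "inside" cell is modified. The outer rule has to match the full group expression, which is exactly the burden described above.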

The front end also maintains a consistent structure, e.g. I can count on {Cell["1+1", "Input"], Cell["2", "Output"]} getting grouped into Cell[CellGroupData[{Cell["1+1", "Input"], Cell["2", "Output"]}, ...]] right away, and open or close the group.

Problems with the Front End

If I see all these advantages, why shouldn't I just use the Front End? It turns out that there are a number of difficulties. I can't work around all of them and I am not confident that there aren't several more I haven't thought of yet.

If I create an actual notebook from a Notebook expression using NotebookPut, that has a number of immediate effects, some undesirable.

Dynamic evaluations may happen. We need to turn dynamic updating off temporarily using CurrentValue[$FrontEndSession, DynamicUpdating] = False.

A number of options get immediately added to the notebook, or modified: WindowSize, WindowMargins.

We still need to be careful when modifying notebook options. My example with CurrentValue[..., {TaggingRules, "bar", "baz"}] = 2 would also inherit all of the Front End's tagging rules if the notebook had no tagging rules at the beginning. Most people don't have any tagging rules set for their front end so they may not even realize this. Currently I do not know how to detect if a notebook option belongs to the notebook or whether it is inherited from the Front End. (At least not without retrieving the Notebook[...] expression.)

There are also many notebook manipulation functions which move the selection to different places and may open cell groups in the process. Such changes are undesirable. This shouldn't be a big issue for as long as I am careful not to use such functions, but it's not yet clear to me what will and what won't cause cell groups to open. I have not yet found a way to write or delete things without opening the surrounding cell group.

Questions

I would like to know others' experiences with notebook manipulation and their advice on whether I should go the kernel or the front end route. Which one will cause fewer problems in the end? I lean towards the kernel way now because, while it has several problems I didn't mention, I feel like I am in full control. Many of the problems with the front end method were things I didn't expect.

Is there a way to have more control over the behaviour of notebooks? To prevent creating options such as WindowSize, etc.? I suspect that there is a way because Export[..., "NB"] does seem to go through the front end (it creates a proper notebook cache) and it doesn't add these options.

Do you already have ways to make the kernel method (transformations through pattern matching) more convenient, like the object references method is?

Typical tasks to accomplish would be:

  • close the input cell of any input-output pair where the input cell has a certain cell tag
  • remove CellChangeTimes from Text cells with a given tag in the second Section, and also remove that tag
  • rasterize output cells at double resolution, except if they are in the first Section
  • modify certain nested rules within tagging rules without affecting any other rules there
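
As a partial kernel-side sketch of the second task: the code below strips CellChangeTimes from Text cells carrying a given tag and removes that tag. It deliberately ignores the "in the second Section" restriction, which would need the kind of positional matching discussed earlier. The tag name "cleanup", the helper cleanCell and the sample notebook are hypothetical.

```wolfram
tag = "cleanup";  (* hypothetical tag name *)

(* Hypothetical notebook expression to process *)
nbExpr = Notebook[{
    Cell["keep me", "Text", CellTags -> "other", CellChangeTimes -> {3.7*^9}],
    Cell["clean me", "Text", CellTags -> {"cleanup", "other"},
     CellChangeTimes -> {3.7*^9}],
    Cell[BoxData["1+1"], "Input", CellTags -> "cleanup",
     CellChangeTimes -> {3.7*^9}]}];

(* Rewrite a Text cell only if it carries `tag`; otherwise return it unchanged *)
cleanCell[cell : Cell[c_, "Text", opts___?OptionQ]] :=
  With[{tags = Flatten[Cases[{opts}, (CellTags -> t_) :> t]]},
   If[MemberQ[tags, tag],
    Cell[c, "Text", Sequence @@ Join[
       (* drop the old CellTags and CellChangeTimes options *)
       DeleteCases[{opts}, (CellTags | CellChangeTimes) -> _],
       (* re-add CellTags only if other tags remain *)
       With[{kept = DeleteCases[tags, tag]},
        If[kept === {}, {}, {CellTags -> kept}]]]],
    cell]];
cleanCell[other_] := other;

result = nbExpr /. cell : Cell[_, "Text", ___] :> cleanCell[cell];
```
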
POSTED BY: Szabolcs Horvát

Like you, I tend to do both sorts of transformations, and I would not absolutely recommend one style over the other.

There's a third possibility that you missed. Start with a notebook template and create versions of the notebook using GenerateDocument. This isn't quite as general as the other two options, and is certainly not useful for the specific transformations you list, but you shouldn't discount it as a possibility for certain kinds of applications.

You ask many questions, and I'm not going to answer them all. Instead, I'll just describe some of my observations about doing notebook transformations which you may find interesting or not.

  • Get[] is fast. Super fast. If you open a notebook, the FE has to do a lot of work to set it up for display. The kernel doesn't.
  • NotebookImport[nb, _ -> "Cell"] is a nice way to pull in notebooks, and if you're pulling in raw cell content, it doesn't need to launch the FE. Why bother doing this as opposed to Get[]? Because it flattens cell groups, which can make processing of cells a lot simpler: you know that you have a simple list of all top-level cells, rather than an arbitrarily deep set of CellGroupData. Of course, this will destroy cell group data, so if preserving open/closed status is important, then this isn't useful to you.
  • Sometimes, I want to carve out subsets of notebook cells for processing. This ought to be easier using something like NotebookImport, but not yet. It's significantly easier to do, e.g., Cells[nb, CellTags->"foo"] than the equivalent Cases call (which is deceptively difficult to get right).
  • You can determine what options are set on a NotebookObject or CellObject by determining whether the option is listed in Options[obj]. There probably ought to be a better way of doing that.
  • CurrentValue is super nice for doing surgery on nested metadata. Being able to do, e.g., CurrentValue[obj, {TaggingRules, "sel1", "sel2"}] = value is incredibly nice. Before the introduction of associations, there was nothing like it in the kernel. But now, if you translated the lists of rules to associations, that could be made to work pretty well. E.g., assoc["sel1"]["sel2"] = value. Expect future versions of the FE to have more awareness of associations, incidentally.
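
One way the association idea might look in the kernel, as a sketch only: a hypothetical helper, setNested, that converts a list of rules to an association at each level, sets the nested key, and converts back. It mimics the CurrentValue[obj, {TaggingRules, ...}] = value behaviour on a plain rule list. Note that any intermediate value that is not a list of rules is overwritten.

```wolfram
(* setNested is a hypothetical helper, not a built-in function *)
setNested[rules_, {key_, rest___}, value_] :=
  Module[{assoc = Association[rules]},
   assoc[key] = If[{rest} === {},
     value,
     (* descend; anything that is not a list of rules is replaced by {} *)
     setNested[
      Replace[Lookup[assoc, key, {}], Except[{___Rule}] -> {}],
      {rest}, value]];
   Normal[assoc]];

tagging = setNested[{"foo" -> 1}, {"bar", "baz"}, 2]
(* {"foo" -> 1, "bar" -> {"baz" -> 2}} *)

tagging2 = setNested[tagging, {"bar", "qux"}, 3]
(* {"foo" -> 1, "bar" -> {"baz" -> 2, "qux" -> 3}} *)
```

The result can then be spliced back into the Notebook expression as the value of TaggingRules.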

That's my off-the-top-of-my-head list. :)

POSTED BY: John Fultz