Version control of Mathematica notebooks to include it in git repositories

Posted 2 years ago

To clean Mathematica notebooks so that they work well under git version control, I found the following code snippet here:

CleanNotebook[file_] := Module[{nb, contents, newcontents},
   nb = NotebookOpen[file];
   SetOptions[nb, "TrackCellChangeTimes" -> False, 
    PrivateNotebookOptions -> {"FileOutlineCache" -> False}];
   contents = NotebookGet[nb];
   newcontents = 
    contents /. {(CellChangeTimes -> _) -> 
       Sequence[], (CellTags -> _) -> Sequence[]};
   NotebookPut[newcontents, nb];
   NotebookSave[nb];
   NotebookClose[nb];
   ];

But in my tests, re-evaluating the notebook still changes the file a lot, even after applying the above code snippet.

So, I wonder if there is any good way to clean up Mathematica notebooks and track (by continuous commits) them in the git repository easily.

Regards, HZ

POSTED BY: Hongyi Zhao
30 Replies
Posted 2 years ago

Thank you for letting me know. BTW, I have come to the conclusion that notebooks are not suitable for version control; one should instead use .wl and other script/package/source-code formats.

POSTED BY: Hongyi Zhao
Posted 2 years ago

Hello Hongyi Zhao,

ResourceFunction["SaveReadableNotebook"] was updated recently, and it no longer has the problem we discussed above: evaluating

ResourceFunction[
  "SaveReadableNotebook"]["Debugging-and-semantic-logic-analysis.nb", 
"Debugging-and-semantic-logic-analysis (Git).nb", 
 "ExcludedCellOptions" -> {CellChangeTimes, ExpressionUUID, CellLabel}]

produces a formatted version of the Notebook which the FrontEnd opens without error messages. Pattern[sym,obj] is now written as Pattern[sym,obj], not as sym:obj, as it was earlier.

POSTED BY: Alexey Popkov
Posted 2 years ago

Thank you for your confirmation. I've informed the author about the bug in SaveReadableNotebook discussed here, using the link "Send a message about this function".

POSTED BY: Hongyi Zhao

Thank you for bringing this to my attention, Hongyi! I'm forwarding your feedback to the author now to make sure that your concerns are heard.

Posted 2 years ago

Thank you for your advice. BTW, it seems that the result of the SaveReadableNotebook function is more beautiful and readable than the result generated by your code snippet.

POSTED BY: Hongyi Zhao
Posted 2 years ago

I agree. As I wrote to you, SaveReadableNotebook looks like a more generic and advanced version of CleanNotebookForGit. It is a much more complicated function, as you can see from its source code.

POSTED BY: Alexey Popkov
Posted 2 years ago

Thank you for the explanation. I've confirmed that for the latest version of Mathematica, the only feasible way is to use your improved version of CleanNotebookForGit.

POSTED BY: Hongyi Zhao
Posted 2 years ago

At the bottom of the function's page is a link "Send a message about this function". You can use it to inform the author and the support about the bug.

POSTED BY: Alexey Popkov
Posted 2 years ago

I have run into a very strange problem. Even with ResourceFunction["SaveReadableNotebook"], the generated notebook still has a syntax error, as shown below:

[screenshot of the syntax error]

I attached all related notebooks, please check.

POSTED BY: Hongyi Zhao
Posted 2 years ago

The file "readable.nb" is obviously incomplete and contains obvious syntax errors. But evaluating the code

ResourceFunction["SaveReadableNotebook"][EvaluationNotebook[], "readable.nb" , 
 "ExcludedCellOptions" -> {CellChangeTimes, ExpressionUUID, CellLabel} ]

produces a correct output file without syntax errors.

But with your previous problematic file "Debugging-and-semantic-logic-analysis.nb" I get exactly the same syntax error as we got using the original version of my function CleanNotebookForGit. And the reason is exactly the same: ResourceFunction["SaveReadableNotebook"] writes Pattern[sym,obj] as sym:obj, and the FrontEnd treats the syntax sym:obj as a syntax error when opening the file as a Notebook. So ResourceFunction["SaveReadableNotebook"] also suffers from this bug...

POSTED BY: Alexey Popkov
Posted 2 years ago

Based on this discussion, I noticed another similar project called mathematica-notebook-filter, which also aims to solve the problems discussed here.

But this project is written in Rust and hasn't been updated for several years. So I think the method discussed here should be preferred. Basically, I suggest a workflow based on ResourceFunction["SaveReadableNotebook"], as described below:

  1. Implement the cleaning operation in a script that takes the notebook to be processed as an argument.
  2. Use a temporary file to store the result of the above script, and finally rename/move it over the original notebook file. In this way, only one notebook is kept.
  3. Integrate the above workflow with the git pre-commit hook, as described here, here, and here.
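
Steps 1 and 2 could be sketched roughly as follows. This is only a minimal sketch: it assumes the script is run with wolframscript, and that SaveReadableNotebook accepts a notebook file path as its first argument (as in the other calls in this thread). The script name clean-notebook.wls and the ".tmp" suffix are hypothetical, and I have not checked whether the resource function works without a FrontEnd.

```wolfram
#!/usr/bin/env wolframscript
(* clean-notebook.wls (hypothetical name): clean one notebook in place.
   Usage: wolframscript -file clean-notebook.wls path/to/notebook.nb *)

args = Rest[$ScriptCommandLine];  (* drop the script name itself *)
If[Length[args] != 1 || !FileExistsQ[First[args]],
  Print["Usage: clean-notebook.wls notebook.nb"]; Exit[1]];

file = ExpandFileName[First[args]];
tmp = file <> ".tmp";  (* hypothetical temporary file next to the original *)

(* Remove a stale temporary file left over from an earlier failed run *)
If[FileExistsQ[tmp], DeleteFile[tmp]];

(* Write the cleaned notebook to the temporary file first *)
ResourceFunction["SaveReadableNotebook"][file, tmp,
  "ExcludedCellOptions" -> {CellChangeTimes, ExpressionUUID, CellLabel}];

(* Replace the original only if the temporary file was actually written *)
If[FileExistsQ[tmp],
  DeleteFile[file]; RenameFile[tmp, file],
  Print["Cleaning failed; original left untouched: ", file]; Exit[1]];
```

A pre-commit hook (step 3) would then only need to call this script on each staged .nb file and stage the result again.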
POSTED BY: Hongyi Zhao
Posted 2 years ago

Thank you for your hint. Based on the insightful and straightforward examples there, the following code does the trick:

ResourceFunction["SaveReadableNotebook"][
 EvaluationNotebook[], "readable.nb" , 
 "ExcludedCellOptions" -> {CellChangeTimes, ExpressionUUID, 
   CellLabel} ]
POSTED BY: Hongyi Zhao
Posted 2 years ago

Yes. This does the trick. But diff gives the following results for repeated evaluation of the same NB file:

$ diff readable.nb readable1.nb 
50c50
<       CellLabel -> "In[18]:="
---
>       CellLabel -> "In[16]:="
55c55
<       CellLabel -> "Out[18]="
---
>       CellLabel -> "Out[16]="
84c84
<       CellLabel -> "In[19]:="
---
>       CellLabel -> "In[17]:="
89c89
<       CellLabel -> "Out[17]="
---
>       CellLabel -> "Out[15]="

As you can see, the In/Out cell numbers keep growing. This information is not conducive to git version control. It would be better if the numbers of these cells were not tracked in the result file.

POSTED BY: Hongyi Zhao
Posted 2 years ago

Please read carefully the Documentation page of this function! It has an option "ExcludedCellOptions"... Try it!

POSTED BY: Alexey Popkov
Posted 2 years ago

Thank you very much for telling me about this function. The biggest problem is that the first execution of this function is time-consuming because of the network query. But downloading the notebook and deploying it locally on the computer can solve this problem, as shown below:

[screenshot of the local deployment]

I tried some of the examples shown in the documentation of ResourceFunction["SaveReadableNotebook"], but encountered the following error:

FilePrint::badfile: The specified argument, FunctionRepository`$31760fa6b86145928a98537accf1a6b8`SaveReadableNotebook[nb,FileNameJoin[{/home/werner,readable.nb}]], should be a valid string or File.

See the attachment for more detailed information.

POSTED BY: Hongyi Zhao
Posted 2 years ago

A correct call to this function:

ResourceFunction["SaveReadableNotebook"][EvaluationNotebook[], "readable.nb"]
POSTED BY: Alexey Popkov
Posted 2 years ago

Hi Alexey Popkov,

This works. Some additional considerations:

  1. How can I script the above code snippet instead of running it from inside Mathematica?
  2. Why not make it a customized package, to facilitate loading and using it?
  3. Some advice about the default arguments: set inputFile's default value to the name of the notebook from which this function is called, and set outputFile's default value to something meaningful ending with "git" or the like.
POSTED BY: Hongyi Zhao
Posted 2 years ago
  1. Please read the Tutorial.
  2. and 3. The function is quite simple, so everyone can easily adapt it to suit their goals.

P.S. I just discovered that the WFR (Wolfram Function Repository) offers ResourceFunction["SaveReadableNotebook"], which at first glance looks like a more generic and advanced version of CleanNotebookForGit.

POSTED BY: Alexey Popkov
Posted 2 years ago

Here is an improved version of CleanNotebookForGit with a fix for the above-described Pattern export bug. Another improvement is that it doesn't strip options inside BoxData, where doing so can cause problems:

CleanNotebookForGit[inputFile_, outputFile_] := 
  Module[{contents = Get[inputFile], newcontents,
    uniqCellHeadName = "$$$MyUniqueCellHead$$$" <> ToString[RandomInteger[{10^8, 10^10}]],
    uniqPatternHeadName = 
     "$$$MyUniquePatternHead$$$" <> ToString[RandomInteger[{10^8, 10^10}]]},
   newcontents = contents /. {bd_BoxData :> bd,
      HoldPattern[
        CellLabel | CellChangeTimes | ExpressionUUID | WindowSize | WindowMargins -> _] :>
        Sequence[]};
   Export[outputFile,
    StringReplace[
     ExportString[
      newcontents /. {Cell -> Symbol[uniqCellHeadName], 
        Pattern -> Symbol[uniqPatternHeadName]},
      "Package", PageWidth -> Infinity, "Comments" -> None],
     {uniqCellHeadName -> "\nCell", uniqPatternHeadName -> "Pattern"}], "Text"]];
POSTED BY: Alexey Popkov
Posted 2 years ago

I tried to use the code snippet above from a new notebook to process the notebook in the attachment. But what it generates is a corrupt file with syntax errors.

Attachments:
POSTED BY: Hongyi Zhao
Posted 2 years ago

Something has changed in the "Package" format exporter since I developed the original implementation. With Mathematica 8.0.4 it still works correctly, but not with the latest versions. I'll investigate and try to fix it this week.

POSTED BY: Alexey Popkov
Posted 2 years ago

The immediate reason for the syntax error is that in recent Mathematica versions Export as "Package" writes Pattern[sym,obj] as sym:obj instead of Pattern[sym,obj] as it did in earlier versions. The FrontEnd treats the syntax sym:obj as a syntax error when opening the file as a Notebook. Exactly the same problem occurs with Put. Hence recent versions have broken the compatibility between Get and Put. I think that it is a bug.
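
A quick way to look at the difference, assuming that Put and the "Package" exporter write expressions in an InputForm-like syntax (my assumption, not something I have checked in their source): compare how a Pattern expression is rendered as a string in InputForm and in FullForm. The symbols sym, a, and b are arbitrary placeholders.

```wolfram
(* Hold keeps the expression inert; sym, a, b are placeholders *)
expr = Hold[Pattern[sym, a | b]];

(* InputForm may use the sym:obj shorthand, depending on the version *)
ToString[expr, InputForm]

(* FullForm always keeps the explicit Pattern head *)
ToString[expr, FullForm]
(* "Hold[Pattern[sym, Alternatives[a, b]]]" *)
```

If the first string contains sym: followed by the pattern, that version uses the shorthand which the FrontEnd rejects inside a notebook file.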

POSTED BY: Alexey Popkov
Posted 2 years ago

I want to execute the code snippet in the original nb file and write the processing results back to the same file. Is that possible?

POSTED BY: Hongyi Zhao
Posted 2 years ago

I want to execute the code snippet in the original nb file and write the processing results back to the same file. Is that possible?

No, it isn't possible, since the NB file stays locked until you close it.

POSTED BY: Alexey Popkov
Posted 2 years ago

Here is an improved version of the original code snippet which should fit your purposes much better:

CleanNotebookForGit[inputFile_, outputFile_] := 
  Module[{contents = Get[inputFile], newcontents, 
    uniqName = "$$$MyUniqueCellHead$$$" <> ToString[RandomInteger[{10^8, 10^10}]]},
   newcontents = 
    contents /. {HoldPattern[
        CellLabel | CellChangeTimes | ExpressionUUID | WindowSize | WindowMargins -> _] :>
        Sequence[]};
   Export[outputFile, 
    StringReplace[
     ExportString[newcontents /. Cell -> Symbol[uniqName], "Package", 
      PageWidth -> Infinity, "Comments" -> None], uniqName -> "\nCell"], "Text"]
   ];

An example of use:

CleanNotebookForGit["ExampleData/document.nb", "document (for Git).nb"] // SystemOpen
POSTED BY: Updating Name
Posted 2 years ago

This implementation doesn't require FrontEnd, so it can be used in a script.

POSTED BY: Alexey Popkov
Posted 2 years ago

@Alexey Popkov

Thank you for your wonderful comments and sharing your valuable experience. But your method seems quite complicated. I need to learn it. If there is any problem, I will give follow-up feedback.

POSTED BY: Hongyi Zhao
Posted 2 years ago

Hello Hongyi Zhao,

I think the following answer of mine can help you:

POSTED BY: Alexey Popkov
Posted 2 years ago

If you want version control use packages or save notebooks as .m / .wl and version control them.

But this approach cannot track images or other multimedia content.

POSTED BY: Hongyi Zhao
Posted 2 years ago

The expression for a notebook is quite verbose so a small change in the frontend appearance can result in many changes to the expression. It is not easy to understand what has changed by looking at the textual differences. Trying to resolve merge conflicts is very painful. If you want version control use packages or save notebooks as .m / .wl and version control them.

POSTED BY: Rohit Namjoshi