
About package initialization


I would like to share some thoughts about package initialization. As always, I post these ideas in the hope of stimulating discussion among package developers, getting some feedback, and learning something. It is more efficient to learn from each other than for each of us to hit the same stumbling blocks alone.

Introduction

Some packages only contain definitions. Complex packages typically contain some sort of initialization—code that evaluates when the package loads. There are many different reasons why this may be necessary:

  • Load external dependencies such as LibraryLink functions, installable MathLink programs, etc.
  • Load and process configuration files
  • Load data files (precomputed data, caches, etc.)
  • Verify and set up some external environment, e.g. what if the package needs to call external processes using RunProcess, or even modify the system PATH or LD_LIBRARY_PATH to make them work? (MaTeX needs something similar)

Things to keep in mind when the package has initialization code

Initialization code can affect loading performance. A polished, high-quality package loads without delay. This is especially important on slow platforms such as the Raspberry Pi computer. The currently released version of my IGraph/M package takes 30-60 seconds to load on the Raspberry Pi, which is just unacceptable. The development version fixes this.

Initialization code can cause serious problems when the package is loaded on kernel startup, not to mention slowing down kernel start. There are certain fundamental functions which do not work during kernel initialization, such as Throw and Catch. These are dependencies of many other common functions. In the end, even very bad things can happen: if Import is used during kernel initialization, not only will it fail once, it will also stop working for the rest of the session.

How might a package get loaded during kernel startup? A user may load it in their kernel init.m file. Or they may place it in their $UserBaseDirectory/Autoload directory. Or if the package is distributed as a paclet, Mathematica will offer a GUI setting to load it on startup. According to my experience, Murphy's laws apply everywhere: if you give your users even the slightest chance to mess up something, one of them certainly will.

Of course, these are minor issues. You do not need to solve them to write a useful package. But if you are aiming to create a high-quality package that will be used by many people, it is good to keep these things in mind.

Strategies for robust initialization

This is the section I would like some feedback on. My goal is to make initialization fast and robust so it is safe to run during kernel startup.

Prefer simple functions to complex ones

Try to use functions which run fast and are safe during initialization.

For reading and writing configuration files, prefer Get instead of Import[..., "Package"] and Put instead of Export[..., "Package"].
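
As a minimal sketch of the Put/Get approach: the file location and the setting names below are made up for illustration, and the configuration is kept as a plain list of rules so that it also works in older versions.

```wolfram
(* Hypothetical location of the configuration file; a real package would
   more likely keep it somewhere under $UserBaseDirectory. *)
configFile = FileNameJoin[{$TemporaryDirectory, "mypackage-config.m"}];

(* Hypothetical settings, stored as a plain list of rules. *)
config = {"CacheSize" -> 100, "Verbose" -> False};

(* Put writes the expression to the file in InputForm. *)
Put[config, configFile];

(* Get evaluates the file and returns the last expression in it. *)
loaded = Get[configFile];
```

Since Get evaluates whatever the file contains, this is only appropriate for files the package itself wrote.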

For reading text files, prefer ReadString instead of Import[..., "String"].

Generally, use low-level IO functions, such as Read, ReadList, etc., and do not forget to Close the stream.
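
A small sketch of the low-level route, assuming a data file with one number per line. The file name is hypothetical, and the file is created first only to keep the example self-contained.

```wolfram
(* Hypothetical data file, written here so the example can run on its own. *)
dataFile = FileNameJoin[{$TemporaryDirectory, "mypackage-data.txt"}];
stream = OpenWrite[dataFile];
WriteString[stream, "1\n2\n3\n"];
Close[stream];

(* Read the numbers back through an explicitly opened stream. *)
stream = OpenRead[dataFile];
numbers = ReadList[stream, Number];

(* ReadList does not close a stream that was already open, so close it here. *)
Close[stream];
```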

ReadString is new in version 10.0, and triggers loading ProcessLink (which contains functions like RunProcess). This means that we cannot use it in v9.0, and also its first use is slow because it has to load some dependencies. A faster and more compatible alternative is

readString[file_] :=
  Module[{stream, res},
    stream = OpenRead[file];
    (* with no record separators, the whole file is read as a single record *)
    res = Read[stream, Record, RecordSeparators -> {}];
    Close[stream];
    res
  ]

Delay initialization when possible

This is the most effective and most general tool when dealing with initialization. The usual idiom to define a symbol sym is

sym := sym = computeValue[];

Then sym will be computed the first time it is used and not when the package is loaded. sym could be a variable holding persistent configuration that is read from a file, precomputed data, or even a LibraryLink function:

libFun := libFun = LibraryFunctionLoad[...];

This idiom is extremely convenient because it can be applied to existing code with a minimal change.
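
To see that the value really is computed only once, here is a small sketch. The body of computeValue and the loadCount counter are stand-ins made up for this example; in a real package the expensive step might read a data file.

```wolfram
(* Counter used only to show how often the expensive step runs. *)
loadCount = 0;

(* Hypothetical stand-in for an expensive initialization step. *)
computeValue[] := (loadCount++; Range[5]^2);

(* Nothing is computed when this definition is evaluated. *)
data := data = computeValue[];

first = data;   (* triggers computeValue[] and stores the result in data *)
second = data;  (* uses the stored value; computeValue[] is not called again *)
```

After both uses, loadCount is 1: the first access replaces the delayed definition of data with the computed value.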

Use scheduled tasks to delay-load

I learnt this trick from @WReach on StackExchange and @Ilian Gachevski.

Use a scheduled task to delay initialization until kernel startup has finished:

RunScheduledTask[
  (* perform some complex initialization *)
; RemoveScheduledTask @ $ScheduledTask
, {0}
]

This is really just a hack and it is not fully reliable (perhaps there's a risk of race conditions). I needed to use a delay of 1 second instead of 0 seconds to make it work reliably.
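
The variant with a 1 second delay might look like the following sketch. The initialized flag is a hypothetical stand-in for the real initialization work.

```wolfram
(* Flag set by the stand-in initialization step. *)
initialized = False;

RunScheduledTask[
  initialized = True;                   (* placeholder for the real initialization *)
  RemoveScheduledTask[$ScheduledTask],  (* clean up the task once it has run *)
  {1}                                   (* run once, 1 second after scheduling *)
];
```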

I do not recommend using this in published packages, but the technique is very useful for personal packages. I use it to set up function argument completions in a personal package that I always auto-load.

If your users ask about reliably auto-loading your package, suggest DeclarePackage

This is just an idea: if you need to load a package at kernel startup (but not actually use it in kernel init files), then use DeclarePackage instead of Needs.
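
For example, a user's init.m could contain something like the line below. The context and symbol names are hypothetical placeholders. The package is not loaded at this point; Needs is called automatically the first time one of the listed symbols is used.

```wolfram
(* "MyPackage`" and the symbol names are placeholders for illustration. *)
DeclarePackage["MyPackage`", {"myFunction", "myOtherFunction"}]
```

The drawback is that the list of symbol names has to be kept in sync with what the package actually exports.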



Any comments or suggestions are most welcome!

POSTED BY: Szabolcs Horvát
1 year ago

Congratulations! This post is now a Staff Pick! Thank you for your wonderful contributions. Please, keep them coming!

POSTED BY: Moderation Team
1 year ago

Great post, I got all the information I needed about package initialization. This thread can also serve as a beginner's guide to packages.

POSTED BY: sxope consolidate
6 months ago
