Package development: How to spend less time creating a polished interface?

Posted 1 year ago

One of the great things about Mathematica is that it makes it so easy to do so much in so little code. It is partly because of this that I am so frustrated with the amount of time I need to spend developing the interface of package functions compared to their functionality.

Suppose you wrote a Wolfram Language function that does something useful and interesting. Now you want to wrap it up into a package, and make it usable by everyone. A polished package is expected to have functions that:

  • Will check their input for errors
  • Will report errors in an informative way
  • Will use messages appropriately (i.e. associate them with the correct symbol name)
  • Will adhere to the de-facto Mathematica interface conventions: proper use of optional arguments, options, option inheritance (as in Graphics -> Plot), default option value handling, use of Automatic, etc.
  • Have SyntaxInformation

I find that not infrequently I spend more time on making the function user-friendly than developing its functionality.

How do people generally deal with this task? How do you implement error checking and reporting in your packages?

To give an example, take a function as simple as a moving average calculator. It is really easy to implement:

movingAverage[vec_, n_] := Mean /@ Partition[vec, n, 1]

But to bring it to the quality of the built-in MovingAverage, it should at least:

  • check the number of arguments (precisely 2)
  • check the types of arguments (a list and an integer)
  • check the values of arguments for correctness (non-empty list and positive integer)
  • make sure that all these checks don't introduce severe performance degradation (such as array unpacking, which can even be triggered by an inefficient argument pattern)

This also involves the introduction of multiple messages (for each type of error) associated to movingAverage. If we now want to add a movingMedian, we will find that we will mostly need to carry out the same checks and report the same messages. There will be small differences though, e.g. average calculations are feasible for symbolic lists like {1,x}, but not median calculations. So the checks won't quite be identical. The messages will be mostly identical, but each function must associate messages to its own symbol, which means a lot of duplication.

So if we care about a high-quality interface and high-quality error reporting, we will end up writing considerably more code for this than for the function's core task. We will also end up with a lot of code duplication, which is frustrating and a maintenance burden. The whole thing ends up being a lot of work and not a lot of fun (which is not very Mathematica-like :-) )

Are there good ways to simplify these tasks? Option handling also used to be error-prone and frustrating, but the introduction of OptionsPattern[] and OptionValue[] made it much easier.


I can see that there are some built-in tools to ease these tasks, though they are mostly undocumented. One example is ArgumentCountQ, another is Developer`CheckArgumentCount, and there are some tools in GeneralUtilities`. I would love to hear from others about how they deal with the tasks I described, which internal functions they make use of, etc.

16 Replies

I am going to keep posting here as I explore this topic.

One useful built-in function seems to be ArgumentCountQ. Check what it does using ?ArgumentCountQ. Here's an example:

In[16]:= ArgumentCountQ[f, 3, 1, 2]

During evaluation of In[16]:= f::argt: f called with 3 arguments; 1 or 2 arguments are expected.

Out[16]= False

It returns a boolean value indicating whether the argument count is correct. If it is not, it also issues a correctly formatted and informative message.

One problem with this function is that it does not count arguments on its own. Counting them is not as trivial as a Length: when there are options present, we would not usually want to include these in the argument count. It is also somewhat unclear what should be considered an option, and what should be considered an argument that is a Rule (e.g. as in Replace).

The Developer`CheckArgumentCount function solves this. For example,

In[17]:= Developer`CheckArgumentCount[f[1, foo -> 2], 1, 1]

During evaluation of In[17]:= f::argx: f called with 2 arguments; 1 argument is expected.

Out[17]= False

In[18]:= Options[f] = {foo -> Automatic};
Developer`CheckArgumentCount[f[1, foo -> 2], 1, 1]

Out[19]= True

Since both of these functions return True/False, they can be used in Condition to easily report errors while keeping the input unevaluated.
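As a minimal sketch of that idea (the function name square is hypothetical, and ArgumentCountQ is used with the same argument order as in the example above), a catch-all definition can issue the message and then deliberately fail its condition:

```wolfram
ClearAll[square];

(* Normal definition: exactly one argument. *)
square[x_] := x^2

(* Catch-all for every other call. ArgumentCountQ issues the argument-count
   message when the count is wrong; the trailing False makes the condition
   fail in all cases, so the call is returned unevaluated. *)
square[args___] /; (ArgumentCountQ[square, Length[{args}], 1, 1]; False) := Null
```

With these definitions, square[3] evaluates normally, while square[1, 2] should issue an argument-count message and stay as square[1, 2].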

This is a general framework that can be used to define any production level function.

I am going to use the ThrowFailure mechanism and the SetUsage function so I need GeneralUtilities`

Needs["GeneralUtilities`"]

First thing: documentation!

SetUsage[movingAverage,
"
movingAverage[vec$, n$] computes the moving average of the numerical vector vec$ \
by taking the mean of subsets of length n$.
"
];

The function may need options, we define them here and sort them properly

Options[movingAverage] = {

} // SortBy[ToString @* First];

Some internal options that we may not want to expose can go here

movingAverageOptions = {

} // SortBy[ToString @* First];

The main entry point for our function. This will do basic argument parsing and counting (no validation) and return either the result or the unevaluated input, depending on the input or the execution. The key piece here is Arguments: it returns a list with two sub-lists (or whatever you specify in the optional third argument), one for the arguments and one for the options. Arguments are counted and the proper message is issued; options are verified against Options[] and an optional list in the fourth argument. If some arguments are expected to be rules, use ArgumentsWithRules.

movingAverage[args___] :=
Module[{a, res},
    a = System`Private`Arguments[movingAverage[args], 2, List, movingAverageOptions];
    res /; a =!= {} && !FailureQ[res = imovingAverage @@ a]
]

Now the main code for the function. I tend to define a getOption utility not to have to write that longish piece of code over and over.

imovingAverage[args_, opts_] :=
CatchFailureAsMessage[movingAverage,
Module[
    {getOption, vec, n},

    getOption[name_] := OptionValue[{movingAverage, movingAverageOptions}, opts, name];

    n = argumentTest["PositiveInteger", "sizeinv"][args[[2]]];
    vec = argumentTest[{"NumericList", n}, "numinv"][args[[1]]];

    Mean[Transpose[Partition[vec, n, 1]]]

]]

Argument validation can be done in countless ways, here's a sub-values based one I just came up with. It can be saved in a separate file (Common.wl, Test.wl, ...) to be used in many places. If the tests are standard one can probably avoid passing the message name.

argumentTest["PositiveInteger", message_] := 
Function[
    If[Internal`PositiveIntegerQ[#],
       #,
       ThrowFailure[message, #]
    ]
]

argumentTest[{"NumericList", n_Integer}, message_] := 
Function[
    If[VectorQ[#, Internal`RealValuedNumericQ] && Length[#] >= n,
       #,
       ThrowFailure[message, #, n]
    ]
]

And now the messages. For these kinds of standard tests, one can define General messages that can be used with any head.

movingAverage::numinv = "Expecting a list of numerical quantities of length greater than or equal to `2` instead of `1`.";
movingAverage::sizeinv = "Expecting a positive integer instead of `1`.";
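A small sketch of the General variant: the message text is defined once and then reused under other heads. The symbols movingMedian and movingMin are hypothetical and have no messages of their own here.

```wolfram
(* Define the text once, on General. *)
General::sizeinv = "Expecting a positive integer instead of `1`.";

(* Neither symbol defines its own ::sizeinv, so Message falls back to the
   General text while reporting it under the symbol named in the call. *)
Message[movingMedian::sizeinv, -3]
Message[movingMin::sizeinv, 1.5]
```

A function that needs different wording can still define its own f::sizeinv, which takes precedence over the General text.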

What does the

@* 

do in

SortBy[ToString @* Last]

? How do I get the documentation?

Not so easy to find the documentation for @*. I found it on the Documentation Center page tutorial/OperatorInputForms, where it says that @* is a special form for Composition (q.v.).

Then SortBy[ToString@*Last] is a shorthand for SortBy[Composition[ToString, Last]].

  1. Thanks for the information. I should have recognized that operator, but didn't and couldn't find the documentation. Since my Mathematica code tends to use composed functions often, I'm going to look harder for applications of Composition[] and RightComposition[]. Further, RightComposition[] is used often in calculations within Dataset[], and I should understand RightComposition[] better.

  2. I don't understand the behavior of

    SortBy[ToString@*Last]
    

when it is used in SortBy[]

Examples:

(ToString@*Last)["a"->10] yields "10"

(ToString@*First)["a" -> 10] yields "a"

So I would expect

SortBy[ToString@*Last][{"a" -> 10, "b" -> 20, "c" -> 3}] 

to sort by string versions of the numerical elements of the list. That does not happen, however.

SortBy[ToString@*Last][{"a" -> 10, "b" -> 20, "c" -> 3}] 

yields {"a" -> 10, "b" -> 20, "c" -> 3} rather than { "c" -> 3, "a" -> 10, "b" -> 20}

and

SortBy[ToString@*First][{"a" -> 10, "b" -> 20, "c" -> 3}] 

yields {"a" -> 10, "b" -> 20, "c" -> 3}

Given a choice, I'd prefer to sort by option names, using something like:

{"a" -> 10, "b" -> 20, "c" -> 3}//KeySort

or, if I wanted to sort by option values, maybe something like:

{"a" -> 10, "b" -> 20, "c" -> 3} // SortBy[Values]

and I'd recommend using //KeySort or, better yet, using //KeySort once to sort the Options list, then cutting that sorted list and pasting it right after

Options[<functionName>] =

@Bill Lewis, please, make sure you read the guidelines: https://wolfr.am/READ-1ST

The guidelines explain how to format your code properly. If you do not format code, it may become corrupted and useless to other members. Please EDIT your posts and make sure code blocks start on a new paragraph and look framed and colored like this.

int = Integrate[1/(x^3 - 1), x];
Map[Framed, int, Infinity]


Why do you expect strings to be sorted the same way as numbers?

In[813]:= Sort[{3, 10, 20}]

Out[813]= {3, 10, 20}

In[814]:= Sort[ToString /@ {3, 10, 20}]

Out[814]= {"10", "20", "3"}

I think this is a standard ordering for strings, unless you pad them with zeros on the left to match the length

In[815]:= Sort[IntegerString[#, 10, 2] & /@ {3, 10, 20}]

Out[815]= {"03", "10", "20"}

I am not using KeySort as it produces an association.

Good points.

  1. I should have proposed

Normal@KeySort[{"a"->3, "b"->20,"c"->1}]

to produce a list rather than an association.

  2. I should have asked a question, as follows:
    In option initialization, I usually put option names in alphabetical order and ignore option values. I do this because during code debugging and maintenance I usually start with the option name. I did not expect to see sorting by option value.

By the same sort of argument (convenience in code debugging and maintenance), I would prefer that both numerical and text values be ordered identically.

There are obviously advantages to SortBy[ToString@*Last] that I'm missing. Could you describe them?

There are no secret advantages, just a misunderstanding. Indeed I am not sorting by Last but by First (see my post).

I thought you were asking a general question about the sorting algorithms.

There is a list of operators here:

You might find it useful.

@* (Composition, as Murray said), was introduced in version 10.0. Normally the Documentation Centre will return the correct page when searching for things like @@, /@, /.. But it does not seem to know these new operators. Another new one is /*, RightComposition.


Update: It could be that the Doc Centre is having problems with @* because * is a wildcard symbol. It also doesn't find ** or *^. But it does find * (Times).
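For reference, a quick illustration of the two operators applied to a single rule (the variable name rule is only for this example):

```wolfram
rule = "a" -> 10;

(* Composition: the rightmost function is applied first, ToString[First[rule]]. *)
(ToString @* First)[rule]

(* RightComposition: the same computation written left to right. *)
(First /* ToString)[rule]
```

Both expressions return "a".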

Having trouble with the GeneralUtilities` package. Mathematica becomes unresponsive more or less unpredictably after the package is loaded.

@Giulio, I don't know how I missed this when you originally posted it. Thanks for this post.

One thing I observe here is the heavy reliance on internal and undocumented functionality. I mentioned before how this is basically unavoidable for package development (think e.g. Internal`WithLocalSettings).

There's GeneralUtilities, System`Private`Arguments, several Internal` context functions for verifying types, etc.

Since these are not documented, they are hard to discover. If I do discover them, I cannot be sure about how to use them. If I do use them, and they break, I can't get support, and my bug report might be rejected (so why even bother sending it?)

Of course, I still do use some of these because they are essential to package development. But the situation demonstrates well how WRI basically ignores (potential) package developers. There are very few quality packages that do not come from WRI directly, and this is surely the main reason.

Another thing I observe is that you construct several layers around the basic function. How will this affect performance? This function may later be used to implement other functions, which are then used to implement yet other functions, and all this checking overhead adds up. Paying attention to the error checking overhead is a critical aspect of package development.

I notice that you use VectorQ[..., Internal`RealValuedNumericQ]. What you did not mention is that this is basically the only right way to check if the input is a real vector, because it won't unnecessarily check every element of packed arrays, and thus will perform very well.

In[371]:= arr = RandomReal[1, 1000000];

In[372]:= VectorQ[arr, Internal`RealValuedNumericQ] // RepeatedTiming
Out[372]= {2.4*10^-7, True}

In[373]:= MatchQ[arr, {___?Internal`RealValuedNumericQ}] // RepeatedTiming
Out[373]= {0.15, True}

I'm not aware of this being documented anywhere in spite of how important it is—another example of how support for package developers is lacking.
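One way to look for accidental unpacking while developing such checks, as a rough sketch: confirm the input is packed, then turn on the "Packing" message group (assumed here to be available in your version) and evaluate each candidate test.

```wolfram
arr = RandomReal[1, 10^6];

(* RandomReal returns a packed array for input like this. *)
Developer`PackedArrayQ[arr]

(* With packing messages on, a check that unpacks its argument should
   report it when that line is evaluated. *)
On["Packing"];
VectorQ[arr, Internal`RealValuedNumericQ];
MatchQ[arr, {___?Internal`RealValuedNumericQ}];
Off["Packing"];
```

Any message that appears between On and Off points at the line that forced the array to be unpacked.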

Because these things are not documented, they are also more likely to be buggy (since they're not used by as many people). Example: in M10.3 and before (if I recall correctly) VectorQ[{}, Developer`MachineIntegerQ] returns False.

The point I am trying to make is this:

  1. If Mathematica is to stay relevant, there must be a vibrant package ecosystem

  2. Support for package development and package developers is currently next to non-existent. We are forced to use internal unsupported functions whose use is either reverse engineered by the community or "leaked" by developers (like you did here). As a consequence, there are few Mathematica packages and many are not of high quality.

  3. Functions that are not important for interactive use but critical for package development should be documented. There should be guides and tutorials on how to solve the basic problems that come up during package development. If there are standard solutions for the most common tasks, and both internal functions and third-party developers use these solutions, then we will see fewer bugs and better performance (both in Mathematica and in third-party packages).

  4. Finally, quality packages should be highlighted and promoted by WRI themselves, both as an example of a healthy package ecosystem and as an example for future package developers to follow. I am not saying this because I develop packages, but because I believe it is critical for the health of Mathematica, a system in which I invested heavily.

Seeing how you use GeneralUtilities, it would be useful to know:

  • How stable is this interface? If we use it today, can we count on it working in later versions?

  • From which version onwards can it be used? There's no ThrowFailure in v10.0.2.

  • How does it work? Looking at the definitions, it appears to use Throw/Catch, but Catch[ThrowFailure[], _] does not work, so there must be some more to it. This is important for those cases when we need to make a package compatible with older versions of Mathematica. Thus we either need to implement similar functionality, or re-implement a subset in a compatible way (as a fallback for older versions).
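For older versions, one possible fallback is a small tagged Throw/Catch pair. The sketch below is not how GeneralUtilities implements ThrowFailure; the names throwFailure, catchFailureAsMessage, failureTag and demo are all hypothetical, and it only mimics the "throw a message name, report it under a given head" behaviour using documented functions.

```wolfram
ClearAll[throwFailure, catchFailureAsMessage, demo];
SetAttributes[catchFailureAsMessage, HoldRest];

(* Throw the message name and its arguments under a private tag. *)
throwFailure[name_String, args___] := Throw[{name, args}, failureTag]

(* Evaluate body; if a failure was thrown, issue head::name and return $Failed. *)
catchFailureAsMessage[head_Symbol, body_] :=
  Catch[body, failureTag,
    Function[{value, tag},
      With[{name = First[value]},
        Message[MessageName[head, name], Sequence @@ Rest[value]]];
      $Failed]]

(* Usage with a hypothetical function and message. *)
demo::sizeinv = "Expecting a positive integer instead of `1`.";
demo[n_] := catchFailureAsMessage[demo,
  If[IntegerQ[n] && n > 0, n, throwFailure["sizeinv", n]]]
```

Here demo[5] returns 5, while demo[-3] issues demo::sizeinv and returns $Failed, which a wrapper like the FailureQ condition shown earlier can turn into an unevaluated result.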

This topic demands to be included in a Documentation Center Tutorial page!

I'm pretty sure there's no such page now, but then I don't know just what to look under: "interface"? "idiot-proofing"?

I have attached two files that might be considered a partial answer to this question.

One is "template for Package". This is a slightly changed version of Maeder's original suggestion for Mathematica package format, to include documentation and a layout that facilitates code maintenance and code readability. A full reference to Maeder's work appears in this file.

The other is template for function.nb. This is derivative of Maeder's format, intended to facilitate code curation, code maintenance, and code validation after Mathematica version changes.

Please feel free to change and improve these files. As an example of a needed change, template for Package.nb contains none of Mathematica's automated test functions. Since the skeleton of "templateForPackageV0R0.nb" dates back a bit, perhaps it is time to incorporate Wolfram functions and practices introduced since then.

@Szabolcs Horvát @Giulio Alessandrini

I don't recommend using GeneralUtilities for production. It is very nice, but it is a playground. E.g. 11.3's version does not export ThrowTaggedFailure anymore and I needed to patch it in my code.
