Suggestion: Better support for package development

Posted 9 years ago

This is in the spirit of the "New Functions I would like to see in future..." post.

I think that one area where Mathematica could improve considerably is supporting and encouraging third-party package development. Mathematica is great for end-users, but it feels like WRI isn't really thinking about making it truly extensible.

A few concrete problems I noticed:

  • Recent versions add new symbols to the System` namespace indiscriminately, and prioritize always-present built-in functions over user-loadable standard packages. At 6000 built-in symbols this is just not sustainable and it's hostile to third-party packages: at this rate of adding new symbols each new release is virtually guaranteed to break some package through symbol name conflict.

    Previous versions used to ship with a few new standard packages bundled with Mathematica. Packages felt like a first-class citizen of the Mathematica ecosystem as WRI was using them too. Recent versions never add anything that is clearly a separate package from the user's perspective and needs to be loaded. Instead everything is crammed into the default namespace. Technically, new packages are added in each release, but these are now always auto-loaded and appear as built-in from the user's perspective.

    With the amount of functionality Mathematica has, better namespace management is sorely needed!

    Even MATLAB, long without namespace support, is now adopting namespaces. Mathematica had them from the beginning but doesn't use them anymore, which seems like a step backwards.

    A particularly awful thing about cramming everything into the System` context is that naming conflicts actually break packages, not just cause shadowing. If two packages use the same symbol names, one will simply shadow the other one, but won't break it. If there's a conflict with a System` symbol, the package will need fixes to work again.

  • Features that would typically be used by package developers are not well documented or completely undocumented. There are many symbols in the Internal` context that are very useful, such as InheritedBlock, WithLocalSettings, PositiveMachineIntegerQ, etc. There are also many in the semi-documented Developer` context.

  • Formerly many features of Mathematica were designed to be extensible and came with some documentation on how to extend them. NDSolve and NIntegrate are good examples. Recent additions tend to be less extensible and more opaque in their working. Is there a way to extend machine learning functions with new algorithms? Or can I add new graph layouts? Can I re-use the built in packing methods for graph layouts in my own layout computation? These don't seem to be possible.

    A good example of how strongly the focus has shifted to the highest level use case while ignoring more sophisticated use cases is the recently added Dendrogram: it goes directly from data to plot, and supports only the most superficial use. The old HierarchicalClustering package has a special data structure for dendrograms which can be manipulated separately from their plot. We had separate functions for computing or plotting a dendrogram, thus we could add our own, separate computation or plotting methods (and I did in the IGraph/M package which does produce dendrograms in this format).

  • There are some things that would be really useful or even necessary for package development, but they are missing. For example, how can we clean up temporary files created by packages, on kernel exit? There's no robust solution for this, even though the need must have come up internally (Compile-generated shared libraries do get cleaned up on exit).

  • Wolfram Workbench, the only tool for creating proper documentation, is stagnating and the released version (2.0) is not compatible with the latest Mathematica. I am aware that we can request the 3.0 beta, but it seems to be taking forever to get released. Also, Workbench is only available to Premier Service members, and is thus difficult to get.

  • The licensing that Wolfram offers does not make it easy to create multiplatform packages. One license is for one computer. If a package needs to special case different operating systems (which is always the case for LibraryLink/MathLink stuff) then the only realistic way for most of us to develop a package is to have access to an academic site license, so we can install three copies on OS X/Windows/Linux.

    A company that makes and sells commercial packages can afford multiple licenses. But for a healthy package ecosystem we need lots of free and open source packages, which are not easy with Mathematica at the moment.
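As an illustration of the Internal` symbols mentioned in the list above, here is a minimal sketch of how Internal`InheritedBlock is commonly used. The symbol is undocumented, so this reflects how it is generally understood to behave rather than any official specification. The function f is hypothetical and exists only for this example.

```mathematica
(* f is a hypothetical function used only for this illustration *)
f[x_] := x^2;

(* Unlike Block, InheritedBlock keeps the existing definitions of f
   inside its body, and discards any changes made to f on exit *)
Internal`InheritedBlock[{f},
  f[0] = "special";      (* temporary extra definition *)
  {f[0], f[3]}           (* old and new definitions are both active here *)
]

f[0]   (* the temporary definition is gone after the block *)
```

The block evaluates to {"special", 9}, and the final f[0] gives 0 again. This is the kind of temporary, self-restoring modification that package code often needs, which is why better documentation of such symbols would help.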

To be fair, recent versions have also added very useful developer-oriented tools. LibraryLink seems to be evolving nicely: it got the very necessary "managed library expressions" feature. Associations are not purely developer-oriented, but they are extremely useful in this scenario. 10.4 added support for RawArrays in LibraryLink. I appreciate all these features very much. MUnit, a unit testing package, was updated and finally added to Mathematica in v10 (and doesn't require the separate Workbench), but unfortunately this update also brought quality and usability problems, so in the end I don't use this tool ... the old Workbench version seemed more practically useful.

The point I am trying to make with this post is that I have the impression that overall Mathematica encourages third-party package development less than it used to. This is a bit surprising because there has been a big push to market Mathematica outside its traditional academic niche, to programmers (all the Wolfram Language rebranding, Wolfram Development Platform, etc.). In 2016 a healthy package ecosystem is critical for software like Mathematica to survive. I am hoping that in the future there will be more emphasis on supporting and encouraging third-party package development.

POSTED BY: Szabolcs Horvát
14 Replies

Szabolcs, if you allow me, slightly related to this topic is the following post: "[not so] Elementary Introduction[s] to the W.L.” at the WTC2016

POSTED BY: Pedro Fonseca
Posted 9 years ago

Here are two things I think should be opened up to third party developers:

  • External services. Mathematica comes with built-in support for some popular APIs like Google Analytics, Dropbox etc. but what if I want to use another API? I would like to be able to write a package that I could share with others that adds a new API to the list of $Services. It would work just like an external service with built-in support.
  • I would like to be able to provide my own data source for entities. Imagine collaborating on some project where you need some data that suits the entity framework. Being able to set up your own data source, a database in the cloud say, would be very nice. I shouldn't have to write a lot of code myself to get it working; I should just have to configure the endpoint, where to find the entities, etc.
POSTED BY: Calle E

... in the meantime, I'm using the original 2011 version of what is now the ApplicationMaker by jmlopez. It's been working for my purposes in WL 8, 9 and 10. Then again, having to use a third-party application for a task like this is probably not the way to encourage users to develop and document as David described.

POSTED BY: Bianca Eifert

In reply to George Woodrow I have been documenting a rather large application, Grassmann Calculus, with a beta version of Workbench 3. And another developmental application also. It basically works, but it required a fair amount of effort, back and forth with support, and installing of obscure file types to get it working.

In Mathematica 5 it was possible for every user to do documentation. One just wrote a text file. Yes, the new paclet documentation is much better but now nobody except those who pay extra through Premier Service or otherwise can write documentation. And there is not even a standard facility for doing it. WRI should consider a possible application or program specialized for user documentation - either outside of Workbench or maybe a stripped down version of Workbench. It should be available to EVERY user. And they should listen carefully to users and third party developers because their needs are not exactly coincident with internal documentation. One important requirement is that user produced documentation should have the same 'look and feel' as standard Wolfram documentation.

The fact that ordinary users have been cut off from documentation since version 5 looks as if this is by deliberate decision. It is a HUGE design and business mistake.

People in the community should think more in terms of "applications" and a little less in terms of "packages". An application may or may not include packages. It may not initially include a package but add one later. It is just a good way to develop, organize, preserve and communicate your work around some particular subject matter. If you start out the right way it's easy to grow into the more advanced features as you are ready for them. This is discussed in my essay A Mathematica Style. It is something everybody should consider, not just "developers". Except for the ability to document it is already designed and provided by WRI.

It's basically easy to write packages and, if Wolfram would provide the facility, to write documentation. Writing documentation is not just some irksome chore. It's part of developing and preserving your own work. If it's difficult to document how your routine is used then maybe it's badly designed and you should change it. A year from now will you remember how some routine was used or where there were some examples? Documentation could also include test examples. The examples would probably be written at the time you developed the routine so why not just copy them to a documentation page?

Many more users should be writing packages and documentation and consider it as just part of standard Mathematica usage.

I am in complete agreement with this proposition.

As a long-time user of Mathematica, I can remember when a lot of functionality had to be loaded using packages. This was primarily due to the limited storage and RAM available to computers in the late 1980s and 1990s. Even so, you had to be careful or you would run out of RAM.

Even though it is now possible to load 'everything', I think that the idea of packages is just as important now as ever. It helps to keep functionality organized, and I would like the opportunity to just load what I need.

We have been 'promised' Workbench 3 for at least two years, and it is long overdue. Fooling around with Eclipse is not a satisfactory resolution.

It would be interesting to learn if anyone outside of Wolfram Research has built a large scale project -- commercial or otherwise -- that makes use of Wolfram Language since version 8 or 9. That was about the time Workbench broke.

I completely agree with most of what you've said. Not so sure how I feel about the System shadowing issue, but I see your point. (It's just that excessive namespacing drives me nuts. I don't want to declare or load a module to use something as basic as sine and pi.)

Anyway, a somewhat related point: PackageData.net is a nifty site, but a proper repository for packages would be really cool. (As in, a place where the packages can actually be hosted and automatically be installed from within a WL system, possibly even loaded directly from the cloud without a local installation.) It doesn't even have to have version control and other development features (not right away anyway), as long as it's a good place to put finished releases.

POSTED BY: Bianca Eifert

Hi Bianca,

On excessive namespacing:

If we have to load a "package" (or rather: add a namespace) to use Sin or Pi, that would be just ridiculous. I fully agree. Python does it, but Python is a general purpose language. Mathematica is used in a different way most of the time.

Also, namespacing doesn't have to be as fine grained as in Python. What I envision is something like this: There should be a (still large) core language which lives in System`. Major areas of functionality, such as image processing, control systems, even geometric regions, etc. should live in their own separate namespace. They would still be builtins, implemented in the kernel itself, but they will have their own context.

Now the most controversial point will be this: should these contexts be in $ContextPath by default or not? Some will say yes, some no. But either way, they wouldn't conflict with packages anymore because when we do BeginPackage["Pack`"], $ContextPath gets set to {"Pack`", "System`"}, and this extra functionality (image processing, etc.) would not be available to the package author within the package unless they explicitly request it using something like BeginPackage["Pack`", {"ImageProcessing`"}].
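The $ContextPath behavior described above can be checked directly. This is a minimal sketch using the placeholder package name Pack` from the paragraph above. The ImageProcessing` context is only a proposal in this discussion and does not exist, so it is not loaded here.

```mathematica
BeginPackage["Pack`"];
(* Inside the package, contexts added by other loaded packages
   are no longer on the search path *)
Print[$ContextPath];
EndPackage[];

(* After EndPackage, the previous $ContextPath is restored,
   with Pack` prepended to it *)
Print[First[$ContextPath]];
```

The first Print shows the reduced search path seen by package code, and the second shows that Pack` ends up at the front of the user's search path. Under the proposal, any functionality living outside System` would be invisible to the package body unless named in the second argument of BeginPackage.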

So a possible solution could be that:

  1. every function is available by default for interactive use, without adding contexts/namespaces
  2. package authors do need to add these extra contexts explicitly; but at the same time they don't need to worry about conflicts with namespaces they don't add

About a package repository:

Again, I fully agree that having this would be great. But to set this up as a community project, and actually make it successful, is very, very difficult. Somebody could step up and create such a repository, but will it become popular? Will people put their package in the correct format to work with a special installer? The challenge is to convince people to use the repository and its package format.

That's why I think the initiative should come from Wolfram Research. There should be an officially endorsed, well documented package format and Mathematica should have a built-in installer accessible both through the existing File -> Install... and programmatically. Actually there is already a package format (see PacletInfo.m used in standard packages), but most of it is not documented. The most important bits, such as versioning, dependencies, etc. are not public.

There have been efforts to make something like this. Leonid Shifrin has his ProjectInstaller package, which has a lot of nice functionality. This is the idealistic approach, it wants to be complete and it defines its own package format with dependencies, etc. But it didn't really catch on.

There are other, more practically-minded solutions like Rolf Mertig's MathematicaPackageInstall tool, which will just unzip something.

When PackageData.net was launched, some people said that it should really act as a repository too, with auto-installable packages that handle dependencies, etc. (like Python's pip and pypi). And that would be great, but if we want to do all that, then PackageData just wouldn't happen as a community project. As a Wolfram-supported project, yes, but as something done by one guy in their spare time, very unlikely.

My hope was that if PackageData is launched in its current form, it could gradually grow and get more popular, and eventually get all these extra features.

But realistically, such an effort, complete with a versioned, dependency-handling package format, has to come from Wolfram if it is to catch on.

POSTED BY: Szabolcs Horvát

namespacing

Yes, I figured that that would be the kind of namespacing you had in mind, I just wanted to caution against overshooting the target. There's probably a host of reasons why Python does this (I suspect the collaborative nature of development will be at the top of the list). It may very well be a reasonable path for many languages, and Python is a lovely language in the greater scheme of things; I'd just hate to see WL turn to this type of namespacing, not that I sense any immediate danger of that happening.

As you describe it though, I have to agree that there are portions of functionality that look as though they could be a clearly defined group of things. That might very well be feasible and would serve as a way to structure things a bit.

On the other hand, the result would be far from clash-proof, since any package can still be broken by any of the namespaces it uses. For instance, a package that is likely to be broken by new functions in the ImageProcessing namespace is also a package that very likely uses ImageProcessing already. Unless of course you want to get rid of $ContextPath altogether and force fully-qualified names for everything that's not System or Global. (I'm against that idea!)

repository

Regarding the package repository: I'm aware of most of the discussion (although I didn't participate), and I appreciate the difficulties. Of course something like this shouldn't be a community effort, that's just not how the Wolfram ecosystem is designed (and that's totally fine). My suggestion was really geared towards WRI, because honestly, the Library Archive doesn't cut it as a code repository.

In the meantime, you guys did a great job on PackageData and I'm definitely using it. I haven't tried the automatic installer solutions that have appeared recently because I have a peculiar setup and don't really use $UserBaseDirectory, but for the majority of use cases I can see the benefit of a unified installer.

prevention of extension

You also mentioned that a lot of functions are easy to use in their default mode, but really hard to extend (your examples were NIntegrate and Dendrogram). I just wanted to add that this is indeed also a pet peeve of mine, although I usually run into this with Plot and friends. I recall a talk by Stephen Wolfram where he said that people always ask for more customizability, but then a survey of real usage always shows that almost everyone uses the default even in places where it's easy enough to change (such as ViewPoint in Graphics3D). So maybe this reduction in extendability is actually a conscious decision.

POSTED BY: Bianca Eifert

Just a small correction: NIntegrate is actually extensible. Thanks @Anton Antonov !

POSTED BY: Szabolcs Horvát

On the name conflict with System. Two things: First, if one does not use an uppercase first letter for function and symbol names in the package (as they should ;) then the chance of conflict is almost zero. Second, I always think it is best to call a function in a package using the syntax

  packageName`foo

after loading the package, even though adding the context name is not needed. But suppose a package now had a function called NDSolve and one used

    packageName`NDSolve 

to call it; then there is no conflict, right? Would this not resolve the issue of name conflicts? Maybe Mathematica should make this a requirement for calling functions in non-system contexts, i.e. one must always add the context name. I do this myself all the time, just to make it clear in the code where a function came from, unless it is from System of course. This way one can write

      packageOne`foo 
      packageTwo`foo 

in the same code and know which package each function comes from. Do you think there will still be a problem if these two things are followed by package developers and users?
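A minimal sketch of this convention, using the placeholder names packageOne`, packageTwo` and foo from above. The function bodies are invented purely for illustration.

```mathematica
(* First hypothetical package: foo adds one *)
BeginPackage["packageOne`"];
foo::usage = "foo[x] adds 1 to x.";
Begin["`Private`"];
foo[x_] := x + 1;
End[];
EndPackage[];

(* Second hypothetical package: a different foo that doubles *)
BeginPackage["packageTwo`"];
foo::usage = "foo[x] doubles x.";
Begin["`Private`"];
foo[x_] := 2 x;
End[];
EndPackage[];

(* Fully qualified names select each definition unambiguously *)
{packageOne`foo[3], packageTwo`foo[3]}
```

The last line evaluates to {4, 6}. Loading the second package produces a shadowing warning for foo, but both definitions remain usable through their full names.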

POSTED BY: Nasser M. Abbasi

The conflict with System` symbols that I was referring to is the following. Suppose we have this package:

BeginPackage["MyPack`"];
DistanceMatrix;
Begin["`Private`"];
DistanceMatrix[pts_] := Outer[EuclideanDistance, pts, pts, 1]
End[];
EndPackage[];

Version 10.0 does not have System`DistanceMatrix, but it does have HierarchicalClustering`DistanceMatrix in the HierarchicalClustering package. If we load MyPack, it will work fine. If we now load HierarchicalClustering, then MyPack`DistanceMatrix will be shadowed, but it will still function correctly and can be called using the fully qualified name

MyPack`DistanceMatrix

HierarchicalClustering`DistanceMatrix and MyPack`DistanceMatrix can coexist peacefully; it's just necessary to use the full name to refer to them in an unambiguous manner.

But what if we load this package in 10.3, which does have System`DistanceMatrix? It simply won't work. We get the error

SetDelayed::write: Tag DistanceMatrix in DistanceMatrix[pts_] is Protected. >>

To fix it, we need to modify the package now to look like this:

BeginPackage["MyPack`"];
MyPack`DistanceMatrix; (* must give fully qualified name here *)
Begin["`Private`"];
DistanceMatrix[pts_] := Outer[EuclideanDistance, pts, pts, 1]
End[];
EndPackage[];

So perhaps in this new era where the System context is so crowded, it is indeed necessary to use a fully qualified name in packages in the section where symbols are made public, to prepare for shadowing by a System symbol in the future.

But currently this is not standard practice. The only time it is done even in standard packages is when the shadowing has already happened, such as in Combinatorica.

POSTED BY: Szabolcs Horvát
Posted 9 years ago

Nice points!

Actually I think a `DistanceMatrix would suffice instead of a fully qualified name. It shouldn't be such a breaking change to get used to adding a leading `.
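A sketch of this variant, adapted from the MyPack` example earlier in the thread. A name starting with a backtick is resolved relative to the current context, which is MyPack` right after BeginPackage.

```mathematica
BeginPackage["MyPack`"];
`DistanceMatrix; (* leading backtick: creates MyPack`DistanceMatrix *)
Begin["`Private`"];
(* MyPack` comes before System` on $ContextPath here, so the bare
   name resolves to the package's own symbol *)
DistanceMatrix[pts_] := Outer[EuclideanDistance, pts, pts, 1]
End[];
EndPackage[];

MyPack`DistanceMatrix[{{0, 0}, {3, 4}}]
```

The final call returns {{0, 5}, {5, 0}}. In versions that have System`DistanceMatrix, a shadowing warning is still to be expected, but the definition is attached to the package's own symbol rather than the protected System one.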

POSTED BY: Rui Rojo

Writing big packages and using full names everywhere just in case is something I can't agree on :-)

It will be a pain and it will be unreadable.

POSTED BY: Kuba Podkalicki

I agree completely. Support for 3rd party packages used to be done by a group of 3 people. Now it is a collateral duty for one person.

POSTED BY: Frank Kampas