Message Boards Message Boards

Cluster of the new functions in Mathematica 10


cover image

Now Mathematica 10.4 was released, I was excited and cannot wait going through all the new functions. Usually I do it from the "Summary of New Features in 10.4", which is quite a great collection of entrances to the new features' separate documentations with detailed category according to their functionalities. However, given the obvious naming convention of the built-in functions, I was always wondering another kind of "category": what it is alike to group them by the similarity of their names. Now of course it looks doable by iteratively applying the FindClusters with a DistanceFunction like EditDistance, but to really implement it will turn out to be a bit clumpy - especially comparing to what I used in the following text :)

Look at the Summary page, under the Mathematical Computation & Algorithms > Cluster Analysis, there is a new introduced function called ClusteringTree, whose function can be roughly considered as the "hierarchical version" of FindClusters, with some Graph-related wrapper. That is our star today.

Now our goal is to

  1. hierarchically group the function names according to their literal similarity,

  2. and to style the result to a more visually informative way.

With the help of ClusteringTree, the 1st goal can be accomplished fully automatically, while the 2nd one is also partly done.

Details, details ...

OK. For a step-by-step walk-through, we first extract all the newly introduced functions/symbols by scanning all the documentation notebooks with word "NEW IN xxx":

snapshot of NEW IN xxx

referencePath = "C:\\Program Files\\Wolfram Research\\Mathematica\\10.4\\Documentation\\English\\System\\ReferencePages\\";
filelst = FileNames["*.nb", {referencePath}, ∞];

taglst = Function[filename,
                        Module[{file, str},
                            file = OpenRead[filename];
                            str = ReadList[file, "String", 1000] // StringJoin;
                                str, ___ ~~ "\"\\<\\\"NEW IN " ~~ v : Shortest[___] ~~ 
                                        "\\\"\\>\"" ~~ ___ :> v]
                        ] /@ filelst; // AbsoluteTiming // First

(* Out: 22.2431 *)

Then we group them by the version number of their first debut:

newGroup = Select[
                        StringDrop[FileNameSplit[#][[-1]], -3] & /@ filelst,
                        taglst /. {} -> 0 /. {s_String} :> s
                #[[2]] =!= 0 &
                ] // GroupBy[Last];
newGroup // Dataset


From the statistics we can see the trend of new feature introduction since version 10:

Length /@ newGroup //
        Labeled[#2, #1] & @@@ # &,
                PlotTheme -> {"Business", "SizeScale"},
                PlotRange -> {{.5, 6.5}, All},
                ScalingFunctions -> "Log"
                ] &


As the new functions introduced in the version 10 is too large, we'll omit them herefrom (but please do check the 2MB PNG attachment at the end of the post!):

newFuncHistory = 
        newGroup[[{"10.0", "10.1", "10.3", "10.4", "10.2"}]] //
                Flatten[#, 1] &,
                MapIndexed[Join[##] &, #] &
        newFuncHistory[[1 ;; 3]],
        {Table["\[VerticalEllipsis]", 3]},
        newFuncHistory[[-3 ;;]]
        ] // Grid


At this stage we are all good for trying ClusteringTree:

ClusteringTree[newFuncHistory[[;; 50, 1]], ClusterDissimilarityFunction -> "Ward"]


Now the default appearance from ClusteringTree looks possibly not that good for larger tree like the full data of newFuncHistory, we are going to post-process it a bit.

First we would like to rotate the tree to grow from left to right:

rotateRule = {x_Real, y_Real} :> {-6. y, x};

Then we might wish to use a more curved edge style than straight-line:

bzcRule = {a : {x1_Real, y1_Real}, b : {x2_Real, y2_Real}} :>
            BezierCurve[{a, {Mean[{x1, x2}], y1}, {Mean[{x1, x2}], y2}, b}, SplineDegree -> 3];

Last, as the function names are going to be labels in our final graphics, we would like to style them according to their introduced versions. More specifically, we would like to place the newest ones on the toppest layers, render them with the biggest font-size and highlight with distinguishable colors:

keys = Keys[newGroup] // Sort;
colorRule = 
    MapIndexed[#1 -> Lighter[ColorData["Rainbow"][Rescale[#2[[1]], {1, Length@keys}]], .8] &, keys]
shadowRule = 
    MapIndexed[#1 -> Round[Rescale[#2[[1]], {1, Length@keys}, {1, 5}]] &, keys]
sizeRule = 
    MapIndexed[#1 -> Round[Rescale[Rescale[#2[[1]], {1, Length@keys}]^2, {0, 1}, {8, 15}]] &, keys]

styling rules

All of those styling rules go into this label rendering function:

fbFunc = Function[{str, ver},
                    Style[str, FontSize -> (ver /. sizeRule)],
                    FrameMargins -> None,
                    FrameStyle -> None,
                    Background -> (ver /. colorRule),
                    RoundingRadius -> 3
                    ] //
                If[ver === "10", #,
                        (* for the shadow effect: *)
                            FrameMargins -> ({{-1, 1.2 ver}, {ver, -1}} /. shadowRule),
                            FrameStyle -> None,
                            Background -> GrayLevel[0.1, 0.21],
                            RoundingRadius -> 5
                        ] &

Note here we used nested Frameds, with the outer one for simulating a drop-shadow effect. If you find the covering picture a little bit of 3D feeling, he's the magic. :)

Now we have the rendering function, we can construct a substitution rule fitting the data structure of the result returned by ClusteringTree:

styleRule = 
        StringJoin["\"", #1, "\""] -> {fbFunc[#1, #2], #3} & @@@ newFuncHistory;
labelFunc = Function[{str, pos},
              {Inset[#1, pos, {-1, 0}, BaseStyle -> "Graphics"], #2} & @@ (str /. styleRule)

Finally, we compose all the steps together, a "fancy" version of ClusteringTree is here (better right-click and open in a new tab for full size of view):

newFuncHistory[[;; , 1]] //
        (* here we only randomly take 100 samples as a test: *)
        RandomSample[#, 100] &,
        ClusteringTree[#, ClusterDissimilarityFunction -> "Ward"] &,
        #[[1, 2, 2, 1]] &,
        Cases[#, (LineBox | InsetBox)[__], ∞] &,
        # /. {
                    LineBox[pts_] :> 1[pts /. rotateRule],
                    InsetBox[BoxData[""], __] :> Sequence[],
                    InsetBox[FormBox[FrameBox[str_, __], TraditionalForm], pos_, 
                            BaseStyle -> "Graphics"] :> 2[{str, pos /. rotateRule}]
                    } &,
        GroupBy[Head -> First],
        MapAt[Join[{Hue[0.15, 0.09, 0.84], Thick}, 
                        Flatten[# /. bzcRule]] &, #, 1] &,
        MapAt[labelFunc @@@ # &, #, 2] &,
        MapAt[SortBy[Last], #, 2] &,
        MapAt[#[[;; , 1]] &, #, 2] &,
        Graphics[#, Background -> Hue[0.15, 0.03, 0.98]] &


By taking all the function names from 10.0 to 10.4, we can have a more comprehensive infograph:


And for those kind readers having reached the end of this post, an infograph counting from the initial version of 10.x (it's a thumbnail with a hyperlink to a 1.93MB PNG with size of 1939 x 14621, so you may want to right-click and open link in new tab! :)

NewFunctionSince10 - Thumbnail

POSTED BY: Silvia Hao
1 year ago

Very nice job! I like the Bezier curves! very neat!

POSTED BY: Sander Huisman
1 year ago

Thanks Sander! I hope those Bezier curves don't cause too much lag when resizing the graphics :)

POSTED BY: Silvia Hao
1 year ago

Thanks Silvio for sharing this gem of programming and visualization with the rest of the Community. It gives a terrific insight into the new functionality in M10.4


POSTED BY: Michael Kelly
1 year ago

This diagram is also a very good summary for WRI developers. I feel lost in a vast amount new functions added inside each major release.

POSTED BY: Shenghui Yang
1 year ago

Thanks Shenghui! I'm flattered to know that! I always consider pictures much more intuitive than plain text, and infograph is a really powerful thing!

POSTED BY: Silvia Hao
1 year ago

Thanks Michael! @Vitaliy Kaurov suggested me to take advantage of the computable WolframLanguageData, I believe that can bring better infograph, maybe from a different angle.

POSTED BY: Silvia Hao
1 year ago

enter image description here - another post of yours has been selected for the Staff Picks group, congratulations !

We are happy to see you at the tops of the "Featured Contributor" board. Thank you for your wonderful contributions, and please keep them coming!

POSTED BY: Moderation Team
1 year ago

Thanks! It's my pleasure to be able to sharing those ideas!

POSTED BY: Silvia Hao
1 year ago

Group Abstract Group Abstract