Sorting numerical data and the behavior of Sort
Mathematica has various ways of sorting data. Most commonly one wants to sort numbers. The command Sort can be used to do so:
Sort[{4, 2, 8, 1}]
returns:
{1, 2, 4, 8}
all very well. However, if one has symbolic answers, Sort might not do what you expect:
data3 = Sin[Range[10]]
data3 // N
Sort[data3]
gives:
{Sin[1],Sin[2],Sin[3],Sin[4],Sin[5]}
{0.841471,0.909297,0.14112,-0.756802,-0.958924}
{Sin[1],Sin[2],Sin[3],Sin[4],Sin[5]}
It looks like if Sort did not do anything! What is happening here? Well, Sort puts things in canonical order and decides whether they are ordered correctly using the default function Order (which works basically the same as OrderedQ):
OrderedQ[1,2] (* True, they are ordered *)
OrderedQ[2,1] (* False; they are not ordered *)
OrderedQ[Sin[2], Sin[3]] (* True, they are ordered! *)
That might be a bit unexpected. Sort will only look at the structure, and then 2 comes before 3. If we want to explicitly sort by numerical value (rather than canonical order) one needs to specify this:
data3 = Sin[Range[5]]
data3 // N
Sort[data3, Less] (* prior to V11.1 *)
N[%]
Sort[data3, NumericalOrder] (* V11.1 and up *)
N[%]
One can do this in the above two ways (using NumericalOrder rather than Order) or using the function Less which will evaluate the number numerically and then compare them. Since Version 11.1 there is also a new function called NumericalSort which just does that:
data3 = Sin[Range[5]]
data3 // N
NumericalSort[data3]
N[%]
Now that we know why Sort sometimes gives unexpected results, and that we have found alternate ordering functions, we find that this is generally not fast.
data = RandomReal[1, 10^6];
AbsoluteTiming[Sort[data, Less];]
It takes several seconds (5.3 seconds on my laptop) to sort a million numbers! Ok, what's going on here? The problem is that Sort with a second argument does pair-wise comparisons, such that the algorithm scales very poorly. If the problem can be re-stated such that each element in a list gets a numerical value which can be sorted using the default ordering function (Order), then this is much faster. We can then improve the performance considerably by using SortBy and a second argument:
AbsoluteTiming[SortBy[data, N];]
or equivalently in this case (we already have approximate numbers so N is not necessary):
AbsoluteTiming[SortBy[data, Identity];]
which only takes 0.17 seconds on my laptop. Since we have approximate numbers, Sort itself works correctly and is fast as well:
AbsoluteTiming[Sort[data];]
Takes 0.16 seconds on my laptop.
Summary
If one only has pure numbers (1337, 1.337 et cetera) use Sort (or from version 11.1 on use NumericalSort).
If one has numbers but in some kind of symbolic form (Sin[3], Sqrt[5], Pi, E, Exp[2], 10 Degree, Quantity[Sqrt[..],...] or ...) use NumericalSort (or SortBy[... , N] for versions prior to 11.1).
If your problem is not one of those, then check if the problem can be recast in one where a single function gives a 'value' to each element that then can be sorted efficiently, if that is possible use SortBy with the correct second argument
Compare the following two examples:
SortBy[{"abc","ab","e","cd","efg"},StringLength]
Sort[{"abc","ab","e","cd","efg"},StringLength[#1]<=StringLength[#2]&]
Give the same result but the first one is generally much faster.
- If such a rephrasing of the problem is not possible use the general Sort with appropriate ordering function.
Final words
Much more can be said about Sort (like the ordering of Sort for a mix of strings, symbols, and numbers). Feel free to comment on this post for additional things to consider when sorting...