Group Abstract

Message Boards

WOLFRAM COMMUNITY

2.1K Views

14 Replies

14 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Wolfram Language

Alphabetizing the Integers: Is there a better way?

Hans Havermann

Hans Havermann, Glad Hobo Express

Posted 1 year ago

This results from an idea by Philip Cohen, noted by A. Ross Eckler in 1981's Alphabetizing the Integers, which was used recently to formalize and extend an existing sequence in the OEIS. I struggled to make Mathematica do what it needed to do. Worse, it took me most of a day to generate the ten thousand terms that a co-author of the sequence managed in minutes using Python. I wanted something like SortBy[Range[13],IntegerName] {8,11,5,4,9,1,7,6,10,13,3,12,2} and then pick out the fixed points (integers at their sequence position: {4,7,12}). Unfortunately, this SortBy procedure doesn't do a true AlphabeticSort as evidenced by doing Range[800] where the 18 appears before the 800 in the output (an alphabetic sort should have a space precede any letter). We also need IntegerName to default to "Words", the full letters-only descriptor. Another caveat is that we need to strip the commas out of all our IntegerName conversions prior to sorting. Failing to do so might compromise the number of items in our word list (as commas are used as list delimiters). I finally settled on this clunker: Do[w=AlphabeticSort[Table[StringReplace[IntegerName[n,"Words"],{","->"","-"->"-"}],{n,i}]];k={};Do[If[StringReplace[IntegerName[j,"Words"],{","->"","-"->"-"}]==w[[j]],AppendTo[k,j]],{j,i}];Print[i," ",k],{i,13}] 1 {1} 2 {1,2} 3 {1} 4 {3} 5 {} 6 {} 7 {} 8 {6} 9 {} 10 {} 11 {4,7} 12 {4,7} 13 {4,7,12} In addition to stripping the commas from IntegerName, I replaced the special (character code 8208) hyphen with the generic (character code 45) hyphen. That's because I don't know how AlphabeticSort deals with hyphens in words and, anyhow, I have to share results with people who use generic hyphens. Once I have my alphabetically-sorted list, I have to recapture their numeric equivalents by doing the StringReplace all over again.

POSTED BY: Hans Havermann

14 Replies

Sort By:

Michael Rogers

Michael Rogers, Emory University

Posted 1 year ago

Apparently, I'm a slow thinker about fast computation. A refinement that came to me during breakfast: Length /@ With[{iNames = StringReplace[ IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]}, With[{orderingAll = Position[iNames, #] & /@ AlphabeticSort[iNames] // Flatten}, Table[ With[{seq = Pick[orderingAll, UnitStep[n - orderingAll], 1]}, Position[Range@n - seq, 0]], {n, Length@iNames}] ]]; // AbsoluteTiming (* {6.70907, {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *) 11 min. for 100K terms on a 8+2 core MacBook Pro M1 Max (used 40-50% CPU due to vectorization of some operations — unsure how that translates to other CPUs). :)

Apparently, I'm a slow thinker about fast computation. A refinement that came to me during breakfast:

Length /@ 
   With[{iNames = 
      StringReplace[
       IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]},
    With[{orderingAll = 
       Position[iNames, #] & /@ AlphabeticSort[iNames] // Flatten},
     Table[
      With[{seq = Pick[orderingAll, UnitStep[n - orderingAll], 1]},
       Position[Range@n - seq, 0]], {n, Length@iNames}]
     ]]; // AbsoluteTiming

(* {6.70907, 
    {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)

11 min. for 100K terms on a 8+2 core MacBook Pro M1 Max (used 40-50% CPU due to vectorization of some operations — unsure how that translates to other CPUs). :)

POSTED BY: Michael Rogers

Michael Rogers

Michael Rogers, Emory University

Posted 1 year ago

Somewhat better yet: Length /@ With[{iNames = StringReplace[ IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]}, With[{orderingAll = Position[iNames, #] & /@ AlphabeticSort[iNames] // Flatten}, Table[With[{seq = Select[orderingAll, # <= n &]}, Position[Range@n - seq, 0]], {n, Length@iNames}] ]] // AbsoluteTiming (* {30.4936, {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)

Somewhat better yet:

Length /@ 
  With[{iNames = 
     StringReplace[
      IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]},
   With[{orderingAll = 
      Position[iNames, #] & /@ AlphabeticSort[iNames] // Flatten},
    Table[With[{seq = Select[orderingAll, # <= n &]},
      Position[Range@n - seq, 0]], {n, Length@iNames}]
    ]] // AbsoluteTiming

(* {30.4936,
    {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)

POSTED BY: Michael Rogers

Michael Rogers

Michael Rogers, Emory University

Posted 1 year ago

To get the fixed points, remove `Length` from Length /@ With[ ...] Replace it with `Flatten` to get lists of fixed points: Flatten /@ With[ ...] Without `Flatten`, you get `Position` forms, instead of `Part` index forms, that is `{{4}, {7}, {12}}` instead of `{4, 7, 12}`. My approach is based on this elegant solution, which unfortunately is invalid because the default order puts 18 before 800: fixedPoints = Position[# - Ordering[IntegerName[#, "Words"]], 0] &; Range[13] // fixedPoints (* 27 sec for Range[10^6] ) ( {{4}, {7}, {12}} ) There is no built-in ordering for `AlphabeticSort[]`, which is unfortunate for your problem. So I made my own: alphabeticOrdering = Function[list, Position[list, #] & /@ AlphabeticSort[list] // Flatten]; Range[800] // alphabeticOrdering[IntegerName[#, "Words"]] & // Take[#, 5] & Range[800] // Ordering[IntegerName[#, "Words"]] & // Take[#, 5] & ( {8, 800, 18, 80, 88} {8, 18, 800, 80, 88} ) alphabeticFixedPoints = Position[# - alphabeticOrdering[IntegerName[#, "Words"]], 0] &; Range[800] // alphabeticFixedPoints ( {{15}} *) The rest came from your suggestion not to re-do the sorting each time. I couldn't find an efficient way to insert a new word in order (among built-in functions) — maybe there is one — so I tried sorting all once and filtering the sorted names. `UnitStep[a - b]` is a numeric way to compute `a < b` (return value 0) and `a >= b` (return value 1). It is vectorized, so it is fast for packed arrays `a` and `b`. `Pick[]` is also a fast way to filter packed lists. One way to break down code is to use a small example and examine the steps. I usually copy the whole code, then strip off the layers down to the core, and examine the output as I add a layer at a time back. For small codes, one might use `Echo[]` with labels to see the whole process. The following filters `Range[8]` from `Range[12]`: Length /@ With[{iNames = Echo[ StringReplace[ IntegerName[Range@12, "Words"], {"," -> "", "-" -> "-"}], "Initial"]}, With[{orderingAll = Echo[ Position[iNames, #] & /@ Echo[ AlphabeticSort[iNames], "Sorted"] // Flatten, "Ordering"]}, Table[ With[{seq = Echo[ Pick[orderingAll, Echo[ UnitStep[n - orderingAll], "UnitStep: < -> 0, >= -> 1"], 1], "Pick"]}, Echo[ Position[Range@n - seq, 0], "Fixed-point positions"]], {n, 8, 8}]]] HTH.

POSTED BY: Michael Rogers

Michael Rogers

Michael Rogers, Emory University

Posted 1 year ago

One can see from my two timings that an 10-fold increase in the problem size led to a 100-fold increase in time. If a reliable estimate, another 10-fold increase in the problem would take a minimum of 2000 minutes (20 min x 100), or 33-34 hours, provided it ran at the same efficiency. Since the vectorization relies on efficient cache operation, at some point I would expect some degradation of efficiency once the problem is too large. I'm not anywhere near an expert on CPU architecture, but I've observed such degradation in trial timings. If you pass that threshold, then expect it to take more than 34 hours. Also when I said 40-50%, I meant of the total, that is, 400+% CPU usage. I assume I didn't reach 800% or higher, because the vectorized operations alternate with other operations. For larger ranges, more time will be spent in vectorized operations, which may explain why you're getting 90% of max CPU.

POSTED BY: Michael Rogers

Michael Rogers

Michael Rogers, Emory University

Posted 1 year ago

Your idea is better than my other answer, I think, but I was hoping for faster: Length /@ With[{iNames = StringReplace[ IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]}, FoldList[ {Position[Take[iNames, Length@#] - #, 0], #} &@ AlphabeticSort@ Flatten@{Last[#], #2} &, {0, {}}, iNames] ][[2 ;;, 1]] // AbsoluteTiming (* {156.516, {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)

Your idea is better than my other answer, I think, but I was hoping for faster:

Length /@ With[{iNames =
      StringReplace[
       IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]},
    FoldList[
     {Position[Take[iNames, Length@#] - #, 0], #} &@
       AlphabeticSort@
        Flatten@{Last[#], #2} &,
     {0, {}}, iNames]
    ][[2 ;;, 1]] // AbsoluteTiming

(* {156.516,
    {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)

POSTED BY: Michael Rogers

Michael Rogers

Michael Rogers, Emory University

Posted 1 year ago

Maybe this: a = Function[n, IntegerName[Range@n, "Words"] // Position[# - AlphabeticSort[#], 0] &]; Length@a /@ Range[1000] // AbsoluteTiming ( {9.982, {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2, 3, 2, 0,...} *) `Range[2000]` takes 40 sec. so probably n^2 time complexity.

POSTED BY: Michael Rogers

Eric Rimbey

Posted 1 year ago

Maybe something along these lines? FixedIntegersByName[n_] := Count[ Transpose[{Range[n], SortBy[Range[n], IntegerName[#, "Words"] &, AlphabeticOrder]}], {i_, i_}] This reproduces the sequence shown at OEIS (up to 87) FixedIntegersByName /@ Range[87] (* {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2, 3, 2, 0, 1, 0, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 3, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 0, 1, 1, 2, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2} *)

Maybe something along these lines?

FixedIntegersByName[n_] := 
  Count[
    Transpose[{Range[n], SortBy[Range[n], IntegerName[#, "Words"] &, AlphabeticOrder]}], 
    {i_, i_}]

This reproduces the sequence shown at OEIS (up to 87)

FixedIntegersByName /@ Range[87]
(* 
  {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2, 3, 2, 0, 1, 0, 1, 1, 1, 1, 1, 2, 
  2, 2, 1, 1, 1, 2, 2, 3, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 2, 
  2, 2, 2, 0, 1, 1, 2, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2}
 *)

POSTED BY: Eric Rimbey

Hans Havermann

Hans Havermann, Glad Hobo Express

Posted 1 year ago

Thank you for this. One more thing, as your code is a little obscure to me in terms of how exactly it works. If I wanted the fixed points of a single number (i.e., {4,7,12} for 13), what needs to change?

POSTED BY: Hans Havermann

Hans Havermann

Hans Havermann, Glad Hobo Express

Posted 1 year ago

I have no sense how long my million-term run is supposed to take but it's been more than 24 hours. I should have started smaller! Also, the WolframKernel for this is running at over 900% CPU and I had no idea that this was going to happen. Finally, I've been relying on copy/paste to transfer these programs to my computer and I realize now that my intended StringReplace of the IntegerName hyphen (CharacterCode 8208) to a generic hyphen (CharacterCode 45) isn't actually in any of the code on this page because any pasting here of the former ends up looking like the latter.

POSTED BY: Hans Havermann

Hans Havermann

Hans Havermann, Glad Hobo Express

Posted 1 year ago

20 minutes for 100K terms on a 2020 iMac. It's a 10-core Intel i9 but I do have a bunch of other stuff running on it. I'm going to try for a million terms.

POSTED BY: Hans Havermann

Hans Havermann

Hans Havermann, Glad Hobo Express

Posted 1 year ago

Remarkable! I was able to generate 100000 terms in just over 2 hours.

POSTED BY: Hans Havermann

Michael Rogers

Michael Rogers, Emory University

Posted 1 year ago

Oops, forgot your `StringReplace[]`. Sorry about that.

POSTED BY: Michael Rogers

Hans Havermann

Hans Havermann, Glad Hobo Express

Posted 1 year ago

Your code reproduces the OEIS sequence up to 1000 in ~10 minutes on my (busy-with-other-stuff) 2020 iMac. That's hopeful. Up to 2000 in ~45 minutes, all terms correct!

POSTED BY: Hans Havermann

Hans Havermann

Hans Havermann, Glad Hobo Express

Posted 1 year ago

Here's a thought. We're doing an AlphabeticSort on the IntegerName of a large — say n — Range of numbers. This takes a certain length of time. Then we do the same thing for n+1, which should take slightly longer. Wouldn't it be much faster if there was an easy way to fit the IntegerName of n+1 into the already-sorted list for Range[n]? To example this, we've done a sort for Range[13] yielding {8,11,5,4,9,1,7,6,10,13,3,12,2}. Now, instead of doing a sort for Range[14], isn't it possible to deduce where in this list for Range[13] the 14 is to be inserted without re-sorting the entire list?

POSTED BY: Hans Havermann

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback