Message Boards Message Boards

1
|
445 Views
|
14 Replies
|
14 Total Likes
View groups...
Share
Share this post:

Alphabetizing the Integers: Is there a better way?

This results from an idea by Philip Cohen, noted by A. Ross Eckler in 1981's Alphabetizing the Integers, which was used recently to formalize and extend an existing sequence in the OEIS. I struggled to make Mathematica do what it needed to do. Worse, it took me most of a day to generate the ten thousand terms that a co-author of the sequence managed in minutes using Python.

I wanted something like

SortBy[Range[13],IntegerName]
{8,11,5,4,9,1,7,6,10,13,3,12,2}

and then pick out the fixed points (integers at their sequence position: {4,7,12}). Unfortunately, this SortBy procedure doesn't do a true AlphabeticSort as evidenced by doing Range[800] where the 18 appears before the 800 in the output (an alphabetic sort should have a space precede any letter). We also need IntegerName to default to "Words", the full letters-only descriptor. Another caveat is that we need to strip the commas out of all our IntegerName conversions prior to sorting. Failing to do so might compromise the number of items in our word list (as commas are used as list delimiters). I finally settled on this clunker:

Do[w=AlphabeticSort[Table[StringReplace[IntegerName[n,"Words"],{","->"","-"->"-"}],{n,i}]];k={};Do[If[StringReplace[IntegerName[j,"Words"],{","->"","-"->"-"}]==w[[j]],AppendTo[k,j]],{j,i}];Print[i," ",k],{i,13}]
1 {1}
2 {1,2}
3 {1}
4 {3}
5 {}
6 {}
7 {}
8 {6}
9 {}
10 {}
11 {4,7}
12 {4,7}
13 {4,7,12}

In addition to stripping the commas from IntegerName, I replaced the special (character code 8208) hyphen with the generic (character code 45) hyphen. That's because I don't know how AlphabeticSort deals with hyphens in words and, anyhow, I have to share results with people who use generic hyphens. Once I have my alphabetically-sorted list, I have to recapture their numeric equivalents by doing the StringReplace all over again.

POSTED BY: Hans Havermann
14 Replies

Somewhat better yet:

Length /@ 
  With[{iNames = 
     StringReplace[
      IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]},
   With[{orderingAll = 
      Position[iNames, #] & /@ AlphabeticSort[iNames] // Flatten},
    Table[With[{seq = Select[orderingAll, # <= n &]},
      Position[Range@n - seq, 0]], {n, Length@iNames}]
    ]] // AbsoluteTiming

(* {30.4936,
    {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)
POSTED BY: Michael Rogers

Apparently, I'm a slow thinker about fast computation. A refinement that came to me during breakfast:

Length /@ 
   With[{iNames = 
      StringReplace[
       IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]},
    With[{orderingAll = 
       Position[iNames, #] & /@ AlphabeticSort[iNames] // Flatten},
     Table[
      With[{seq = Pick[orderingAll, UnitStep[n - orderingAll], 1]},
       Position[Range@n - seq, 0]], {n, Length@iNames}]
     ]]; // AbsoluteTiming

(* {6.70907, 
    {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)

11 min. for 100K terms on a 8+2 core MacBook Pro M1 Max (used 40-50% CPU due to vectorization of some operations — unsure how that translates to other CPUs). :)

POSTED BY: Michael Rogers

One can see from my two timings that an 10-fold increase in the problem size led to a 100-fold increase in time. If a reliable estimate, another 10-fold increase in the problem would take a minimum of 2000 minutes (20 min x 100), or 33-34 hours, provided it ran at the same efficiency. Since the vectorization relies on efficient cache operation, at some point I would expect some degradation of efficiency once the problem is too large. I'm not anywhere near an expert on CPU architecture, but I've observed such degradation in trial timings. If you pass that threshold, then expect it to take more than 34 hours.

Also when I said 40-50%, I meant of the total, that is, 400+% CPU usage. I assume I didn't reach 800% or higher, because the vectorized operations alternate with other operations. For larger ranges, more time will be spent in vectorized operations, which may explain why you're getting 90% of max CPU.

POSTED BY: Michael Rogers

To get the fixed points, remove Length from

Length /@ With[ ...]

Replace it with Flatten to get lists of fixed points:

Flatten /@ With[ ...]

Without Flatten, you get Position forms, instead of Part index forms, that is {{4}, {7}, {12}} instead of {4, 7, 12}.


My approach is based on this elegant solution, which unfortunately is invalid because the default order puts 18 before 800:

fixedPoints = Position[# - Ordering[IntegerName[#, "Words"]], 0] &;
Range[13] // fixedPoints (* 27 sec for Range[10^6] *)
(*  {{4}, {7}, {12}}  *)

There is no built-in ordering for AlphabeticSort[], which is unfortunate for your problem. So I made my own:

alphabeticOrdering = Function[list, Position[list, #] & /@ AlphabeticSort[list] // Flatten];
Range[800] // alphabeticOrdering[IntegerName[#, "Words"]] & // Take[#, 5] &
Range[800] // Ordering[IntegerName[#, "Words"]] & // Take[#, 5] &
(*
{8, 800, 18, 80, 88}
{8, 18, 800, 80, 88}
*)

alphabeticFixedPoints = Position[# - alphabeticOrdering[IntegerName[#, "Words"]], 0] &;
Range[800] // alphabeticFixedPoints
(*  {{15}}  *)

The rest came from your suggestion not to re-do the sorting each time. I couldn't find an efficient way to insert a new word in order (among built-in functions) — maybe there is one — so I tried sorting all once and filtering the sorted names.

UnitStep[a - b] is a numeric way to compute a < b (return value 0) and a >= b (return value 1). It is vectorized, so it is fast for packed arrays a and b. Pick[] is also a fast way to filter packed lists.

One way to break down code is to use a small example and examine the steps. I usually copy the whole code, then strip off the layers down to the core, and examine the output as I add a layer at a time back. For small codes, one might use Echo[] with labels to see the whole process. The following filters Range[8] from Range[12]:

Length /@ With[{iNames =
    Echo[
     StringReplace[
      IntegerName[Range@12, "Words"], {"," -> "", "-" -> "-"}],
     "Initial"]},
  With[{orderingAll =
     Echo[
      Position[iNames, #] & /@
        Echo[
         AlphabeticSort[iNames],
         "Sorted"] //
       Flatten,
      "Ordering"]},
   Table[
    With[{seq =
       Echo[
        Pick[orderingAll,
         Echo[
          UnitStep[n - orderingAll],
          "UnitStep: < -> 0, >= -> 1"],
         1],
        "Pick"]},
     Echo[
      Position[Range@n - seq, 0],
      "Fixed-point positions"]],
    {n, 8, 8}]]]

enter image description here

HTH.

POSTED BY: Michael Rogers
Posted 23 days ago

Maybe something along these lines?

FixedIntegersByName[n_] := 
  Count[
    Transpose[{Range[n], SortBy[Range[n], IntegerName[#, "Words"] &, AlphabeticOrder]}], 
    {i_, i_}]

This reproduces the sequence shown at OEIS (up to 87)

FixedIntegersByName /@ Range[87]
(* 
  {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2, 3, 2, 0, 1, 0, 1, 1, 1, 1, 1, 2, 
  2, 2, 1, 1, 1, 2, 2, 3, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 2, 
  2, 2, 2, 0, 1, 1, 2, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2}
 *)
POSTED BY: Eric Rimbey

Maybe this:

a = Function[n,
   IntegerName[Range@n, "Words"] //
    Position[# - AlphabeticSort[#], 0] &];

Length@*a /@ Range[1000] // AbsoluteTiming
(*  {9.982, {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2, 3, 2, 0,...} *)

Range[2000] takes 40 sec. so probably n^2 time complexity.

POSTED BY: Michael Rogers

Your idea is better than my other answer, I think, but I was hoping for faster:

Length /@ With[{iNames =
      StringReplace[
       IntegerName[Range@10000, "Words"], {"," -> "", "-" -> "-"}]},
    FoldList[
     {Position[Take[iNames, Length@#] - #, 0], #} &@
       AlphabeticSort@
        Flatten@{Last[#], #2} &,
     {0, {}}, iNames]
    ][[2 ;;, 1]] // AbsoluteTiming

(* {156.516,
    {1, 2, 1, 1, 0, 0, 0, 1, 0, 0, 2, 2,..., 2, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}} *)
POSTED BY: Michael Rogers

Here's a thought. We're doing an AlphabeticSort on the IntegerName of a large — say n — Range of numbers. This takes a certain length of time. Then we do the same thing for n+1, which should take slightly longer. Wouldn't it be much faster if there was an easy way to fit the IntegerName of n+1 into the already-sorted list for Range[n]?

To example this, we've done a sort for Range[13] yielding {8,11,5,4,9,1,7,6,10,13,3,12,2}. Now, instead of doing a sort for Range[14], isn't it possible to deduce where in this list for Range[13] the 14 is to be inserted without re-sorting the entire list?

POSTED BY: Hans Havermann

Your code reproduces the OEIS sequence up to 1000 in ~10 minutes on my (busy-with-other-stuff) 2020 iMac. That's hopeful. Up to 2000 in ~45 minutes, all terms correct!

POSTED BY: Hans Havermann

Oops, forgot your StringReplace[]. Sorry about that.

POSTED BY: Michael Rogers

Remarkable! I was able to generate 100000 terms in just over 2 hours.

POSTED BY: Hans Havermann

20 minutes for 100K terms on a 2020 iMac. It's a 10-core Intel i9 but I do have a bunch of other stuff running on it. I'm going to try for a million terms.

POSTED BY: Hans Havermann

I have no sense how long my million-term run is supposed to take but it's been more than 24 hours. I should have started smaller! Also, the WolframKernel for this is running at over 900% CPU and I had no idea that this was going to happen. Finally, I've been relying on copy/paste to transfer these programs to my computer and I realize now that my intended StringReplace of the IntegerName hyphen (CharacterCode 8208) to a generic hyphen (CharacterCode 45) isn't actually in any of the code on this page because any pasting here of the former ends up looking like the latter.

POSTED BY: Hans Havermann

Thank you for this. One more thing, as your code is a little obscure to me in terms of how exactly it works. If I wanted the fixed points of a single number (i.e., {4,7,12} for 13), what needs to change?

POSTED BY: Hans Havermann
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract