Message Boards Message Boards

0
|
31315 Views
|
18 Replies
|
13 Total Likes
View groups...
Share
Share this post:

Find the largest element in a list and its position?

Posted 7 years ago

I want to find the largest element in a very long list. And its position in that list at the same time.

Of course I can do something like

{max}=TakeLargest[list,1];
{iMax}=FirstPosition[list,max];

But this is expensive if list is a complicated or long expression. Is there something like:

{max,iMax}=FindLargest[list]

I cannot find such a function within WL.

Any help?

POSTED BY: Werner Geiger
18 Replies
Posted 7 years ago

Thank you all very much for your contributions. I am impressed on your engagement and your results. I have learned a lot and will use the best of your hints. And if we can convince Wolfram to optimize some of its internal code - even better!

Just as a kind of sidestep concerning functional programming: I know of course that in such languages as WL one can achieve large performance gains by formulating complicated, very tricky one-liners. As you can see from my example code above, I do not really like that, except for special really performance-critical circumstances. The price you pay is readability and hence maintainability. I have a background as developer, project leader, architect of really big software systems with many developers and maintenance people. Within such contexts this kind of programming can be a nightmare, since nobody understands the code without reverse engineering. But for those kind of problems like the current, I would use it anyway with excessive comments explaining what it exactly does.

POSTED BY: Werner Geiger

@Sander Huisman ,

Nice approach. It seems odd to me that Pick (Pick[Range[Length[list]], list, Max[list]]) works faster than Position. I normally would have thought that generating the Range list and comparing one list to the other would take extra time. Position (Position[list,Max[list]]) seems the most direct approach (and usually the one function approach is most optimized in Mathematica). Can you explain why Pick is faster? Does this mean that the code for Position needs work?

Thank you.

POSTED BY: Neil Singer

Hi Neil,

I would've expected also that Position would be faster. I think it is entirely to 'blame' that Pick has a few optimisations for the case that the pattern (Max[list] in this case) is 'simple'. (i.e. a number), and that the 'list' and 'sel' lists (the first two arguments) are packed arrays. Position does not have such optimisations.

$HistoryLength = 1;
list = RandomInteger[10^3, 10^7];
list = Developer`FromPackedArray[list];
AbsoluteTiming[out1 = Pick[Range[Length[list]], list, Max[list]];]
AbsoluteTiming[out2 = Position[list, Max[list]];]
Developer`PackedArrayQ[out1]
Developer`PackedArrayQ[out2]

{0.676997, Null}
{0.379614, Null}
False
False

Explicitly converting it to a non-packed array, you see that Position is faster. Both output a non-packed array.

With packed arrays:

$HistoryLength = 1;
list = RandomInteger[10^3, 10^7];
AbsoluteTiming[out1 = Pick[Range[Length[list]], list, Max[list]];]
AbsoluteTiming[out2 = Position[list, Max[list]];]
Developer`PackedArrayQ[out1]
Developer`PackedArrayQ[out2]

{0.046818, Null}
{0.433556, Null}
True
False

Pick is much faster and outputs a packed array. I can't know for sure, but I think Pick has some special routines for the case the first the arguments are packed and the pattern 'simple'. I've tried using Trace[....,TraceInternal ->True] but Pick and Position seem impenetrable.

And yes creating the Range[Length[list]] takes considerable amount of time! creating that list takes (for the 10^8 case) about 70% of the time!

We might want to point the people at Wolfram at this, because this could have system-wide performance boosts because Position is used throughout the system... An efficient version of Position for special cases should be ~40x faster (perhaps more) than its current implementation.

POSTED BY: Sander Huisman

Sander,

Thank you for your excellent description. After reading your description I did some more reading here: wolfram docs. I now understand more about how Mathematica optimizes computation.

Thank you again

POSTED BY: Neil Singer

on my laptop this is faster by taking advantage of picking the max.

list = RandomInteger[10^4, 10^8]; (*large!*)

then with

With[{m=Max[list]},{m,Position[list,m]}]

it takes 8.81563 seconds

with

Block[{a = Max[list],  b = DeleteDuplicates[ UnitStep[list - Max[list]] Range[Length[list]]][[2 ;;]]}, {a, b}]

it takes 2.03651 seconds.

DeleteDuplicates is very fast with little variation. I substract the list with it's max value. UnitStep generates (very fast) a list with 0 and 1. The 1 is max position in the list which is multiplied with the Range[list//Length] and this generates for all the 1's the position. Then DeleteDuplicates and you're done.

POSTED BY: l van Veen

Very nice finding, very strange though! Because DeleteDuplicates does comparisons between elements which are totally not necessary!

Even using a much simpler DeleteCases[....,0] :

AbsoluteTiming[
 Block[{a = Max[list], 
    b = DeleteCases[UnitStep[list - Max[list]] Range[Length[list]], 
       0][[2 ;;]]}, {a, b}];]

which should be much faster, is not at all faster, actually 10x slower! strange strange! It probably has to do that DeleteDuplicates can handle packed arrays, while DeleteCases gives back a regular array:

a = RandomInteger[10, 100];
DeleteDuplicates[a];
Developer`PackedArrayQ[%]
DeleteCases[a, 0];
Developer`PackedArrayQ[%]

gives back:

True
False
POSTED BY: Sander Huisman

Now that I think of it, one should use your trick but then use Pick, so one does not need the multiplication, nor checking duplicates:

$HistoryLength=1;
list=RandomInteger[10^4,10^8];
AbsoluteTiming[Block[{a=Max[list],b=Pick[Range[Length[list]],list,Max[list]]},{a,b}]]
AbsoluteTiming[Block[{a=Max[list],b=Pick[Range[Length[list]],UnitStep[list-Max[list]] ,1]},{a,b}];]
AbsoluteTiming[Block[{a=Max[list],b=DeleteDuplicates[UnitStep[list-Max[list]] Range[Length[list]]][[2;;]]},{a,b}];]
AbsoluteTiming[With[{m=Max[list]},{m,Join@@Position[list,m]}];]
AbsoluteTiming[Block[{a=Max[list],b=DeleteCases[UnitStep[list-Max[list]] Range[Length[list]],0]},{a,b}];]

giving the following times:

0.592327
1.00116
1.87969
8.04148
17.3015

So even 3-4x faster than the method of I van Veen. Very surprising (at least to me).

POSTED BY: Sander Huisman

Fantastic! I never used Pick but this is great. Better because this works for any number! Good find. {0.53667, Null} {1.18056, Null} {2.08323, Null} {9.38183, Null} {21.0306, Null} "10.2.0 for Microsoft Windows (64-bit) (July 7, 2015)"

POSTED BY: l van Veen

It just came to mind because Pick does not unpack, unlike Cases, DeleteCases (probably because it uses the general pattern-matching capabilities). But like UnitStep, Times, and DeleteDuplicates.

POSTED BY: Sander Huisman

From your original post it was not clear if you wanted ALL the max value-positions. Unfortunately the easiest method is just:

With[{m=Max[list]},{m,Position[list,m]}]

without doing trickery. You can speed it up marginally by explicitly specifying a level specification and Head -> False.

Ordering performs a sort if you ask for all (not just -1 or 1, which performs linearly), which is much slower generally (n*log(n) scaling).

POSTED BY: Sander Huisman
Posted 7 years ago

This seems to do what you require

AbsoluteTiming[
 Position[a = RandomInteger[10^7, 10^6], b = Max[a]]; {b, c}]

{0.0947826, {9999991, {33357, 164350, 204442, 426967, 488483, 524782, 602322, 728467, 831565, 868132}}}

POSTED BY: Paul Cleary
Posted 7 years ago

Thanks to all for your hints. Maybe I was not really clear with my question. I want to get the maximum of a list and ALL its positions. Sander, I did not understand, how I could avoid Position. I made some test meanwhile:

What I really want to have is a WL-function that delivers the largest entry of a list and all their positions in that list. Something like {max,poss}=FindLargest[list], where poss is the list of positions with value max within list. But this does not seem to exist in WL. I made some tests with different solutions using TakeLargest, Max and Ordering together with Position. They are all similar fast, Max and Ordering being slightly faster than TakeLargest.

Programming findLargest manually ist easy, but very slow. See the last solution below.

Our example-list:

listLength = 10000000;
SeedRandom[471117];
list = RandomInteger[{1, Round[listLength/10]}, listLength];
Print["A list with ", listLength, " entries ranging from 1 to ", 
  Round[listLength/10], "\n", Short[list]];

A list with 10000000 entries ranging from 1 to 1000000
{784246,826384,700864,590699,641241,597870,<<9999988>>,115297,533997,724089,645211,308921,721824}

Show results t,max,poss from function findFunc on list list:

show[t_, max_, poss_List, findFunc_Symbol, findFuncTxt_String, list_List] :=
 Print[Column[{
    Style[Row[{findFunc, " (", findFuncTxt, "): ", Round[absTime, 0.001], 
       " sek. Max=", max, " at ", Length[poss], "/", Length[list], " poss:"}],
      Blue],
    Short[poss]
    }]
  ]

Use TakeLargest and Position. Pretty fast even though list is scanned twice:

findLargest1[list_List] := Module[{max, poss},
  (* Find largest element in list *)
  {max} = TakeLargest[list, 1];
  (* Find all positions of largest element in list *)
  poss = Flatten[Position[list, max]];
  (* Return *)
  {max, poss}
  ]

absTime = AbsoluteTime[];
{max, poss} = findLargest1[list];
absTime = AbsoluteTime[] - absTime;
show[absTime, max, poss, findLargest1, "TakeLargest & Position", list];

findLargest1 (TakeLargest & Position): 0.923 sek. Max=1000000 at 10/10000000 poss:
{39794,299330,1695095,3366172,3779495,4947531,5083363,5596437,7052596,8823588}

Use Max and Position. Not very different from TakeLargest. Pretty fast even though list is scanned twice:

findLargest2[list_List] := Module[{max, poss},
  (* Find largest element in list *)
  max = Max[list];
  (* Find all positions of largest element in list *)
  poss = Flatten[Position[list, max]];
  (* Return *)
  {max, poss}
  ]

absTime = AbsoluteTime[];
{max, poss} = findLargest2[list];
absTime = AbsoluteTime[] - absTime;
show[absTime, max, poss, findLargest2, "Max & Position", list];

findLargest2 (Max & Position): 0.863 sek. Max=1000000 at 10/10000000 poss:
{39794,299330,1695095,3366172,3779495,4947531,5083363,5596437,7052596,8823588}

Use Ordering and Position. Sounds very expensive since list seems to be fully sorted and scanned twice. But this is not true. It is not very different from the solutions with TakeLargest or Max. Unfortunately Ordering[list,-1] does not deliver all positions of the maximum, but only the first one:

findLargest3[list_List] := Module[{max, poss, pos},
  (* Find largest element in list *)
  {pos} = Ordering[list, -1];
  max = list[[pos]];
  (* Find all positions of largest element in list *)
  poss = Flatten[Position[list, max]];
  (* Return *)
  {max, poss}
  ]

absTime = AbsoluteTime[];
{max, poss} = findLargest3[list];
absTime = AbsoluteTime[] - absTime;
show[absTime, max, poss, findLargest3, "Ordering & Position", list];

findLargest3 (Ordering & Position): 0.846 sek. Max=1000000 at 10/10000000 poss:
{39794,299330,1695095,3366172,3779495,4947531,5083363,5596437,7052596,8823588}

What I really want to have is a WL-function that delivers the largest entry of a list and all their positions in that list. Something like the following manually programmed. But this is horribly expensive.:

findLargest[list_List] := Module[{max, poss, entry},
  (* Initialize *)
  max = -Infinity; (* the current maximum of all entries *)
  (* Run through all elements in list and collect the largest together with \
their positions *)
  For[i = 1, i <= Length[list], i++,
   entry = list[[i]]; (* the current element of the list *)
   Switch[Sign[entry - max],
    (* A new larger entry than max. 
    Start a new collection on poss and begin it with i *)
    +1, max = entry; poss = {i},
    (* entry is less than current max. Do nothing *)
    -1, , 
     (* entry is equal to max. Collect it into poss *)
    0, AppendTo[poss, i]
    ]
   ];
  (* Return *)
  {max, poss}
  ]

absTime = AbsoluteTime[];
{max, poss} = findLargest[list];
absTime = AbsoluteTime[] - absTime;
show[absTime, max, poss, findLargest, "My programmed solution", list];

findLargest (My programmed solution): 27.694 sek. Max=1000000 at 10/10000000 poss:
{39794,299330,1695095,3366172,3779495,4947531,5083363,5596437,7052596,8823588}
POSTED BY: Werner Geiger

I think that Frank and Sander's suggestion of Ordering will still work for you.

The last few elements of Ordering are the positions of ALL the maximum values. in my example below, the last two elements of ord point to the two locations of the maximum value (which is 5 in this example). You can reverse the ordering and stop when you get to a value that is no longer the maximum. you only go through the list once and you have to go through the last few elements of the output of Ordering once (either in a loop or a binary search if there are many duplicates).

In[1]:= lst = {1, 3, 4, 5, 3, 2, 5, 3, 2, 4}

Out[1]= {1, 3, 4, 5, 3, 2, 5, 3, 2, 4}

In[2]:= ord = Ordering[lst]

Out[2]= {1, 6, 9, 2, 5, 8, 3, 10, 4, 7}

In[3]:= lst[[ord[[-1]]]]. (* value of largest element *)

Out[3]= 5

In[4]:= lst[[ord[[-2]]]] (* value of 2nd largest element -- same as largest because there are two *)

Out[4]= 5

In[5]:= lst[[ord[[-3]]]] (* value of third largest element  -- not the same so we only have two *)

Out[5]= 4

So in this example, the last two elements in ord are the two positions (4 and 7) of the largest element (5).

I hope this helps.

POSTED BY: Neil Singer
Posted 7 years ago

Thanks Neil, this is clear.

But I think this could become expensive. Image a list of huge length with many (or even almost all) duplicates of the maximum value. Will this not finally result into a complete sort of that list?

Hence I had to do a full sort of that list and then collect all the positions with maximum value. Sounds not very effectiv. May be I will try it.

POSTED BY: Werner Geiger
Posted 7 years ago

You can try with lists functions Max or Min or Position...

     lna = RandomInteger[999, 600];

     mlna = Max[lna]

    Out[2]= 994

     Position[lna, mlna]

mxpo[lst_] := {Max[lst], Position[lst, Max[lst]], Min[lst],   Position[lst, Min[lst]]}
 mxpo[lna]

 {994, {{556}}, 1, {{538}}}
POSTED BY: Antxon Zeta

This will take even longer than the OP's method. Because Max will scan the full list (you can't get away with that!). But also Position will scan the entire list. Using Ordering, you only scan the list once.

POSTED BY: Sander Huisman

To be more precise:

list = RandomReal[{0, 999}, 100]
pos = Ordering[list, -1]
max = Extract[list, max]

use -1 as the second argument in Ordering.

POSTED BY: Sander Huisman
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract