Message Boards Message Boards

3
|
6174 Views
|
11 Replies
|
21 Total Likes
View groups...
Share
Share this post:

How to find a sequence in a list?

Hi, beginner here, again. Thanks for looking. I would like to "partition" this list from given

{1, 2, 3, 8, 4, 2, 6, 7, 3, 2, 1, 9, 8, 9, 1, 3, 1, 2, 9, 2, 8, 2, 7, 7, 1, 2, 3, 4, 3, 2, 1};

into output (for further processing), either or:

{{1, 2, 3}, {8, 4, 2, 6, 7, 3, 2, 1}, {9, 8, 9, 1, 3, 1}, {2, 9, 2, 8, 2, 7, 7, 1, 2, 3}, {4, 3, 2}, {1}};
{{1, 2, 3}, {8, 4, 2, 6, 7, 3, 2, 1}, {9, 8, 9, 1, 3, 1}, {2, 9, 2, 8, 2, 7, 7, 1, 2, 3}, {4, 3, 2}};

That is a clear partioning cut, whenever the last three numbers are "LessThan5", no matter if the next number(s) are <5 too, i.e. without Overlaps. My gut feeling is that we would use the SequenceCases function, or maybe SequenceSplit or SequenceReplace, but I can't figure out how. ((A different approach would be programming a short algorithm with iterations/procedures/loops, but let's figure out the functional approach first, maybe with conditional pattern matching.))

Thank You!

POSTED BY: Raspi Rascal
11 Replies

Here is a linear-time method. The idea is to Sow positions where a cleave should occur, then construct the sublists from those positions.

cleaveList[ll_List, val_, rlen_] := Module[
{j = 0, k = 0, posns, td, res},
posns = 
Flatten[Append[
Prepend[Reap[
Fold[(k++; 
If[#2 < val, j++; If[j == rlen, Sow[k]; j = 0], j = 0];) &,
0, ll]][[2]], 0], Length[ll]]];
res = Table[
ll[[posns[[j]] + 1 ;; posns[[j + 1]]]], {j, Length[posns] - 1}]]

The reference example:

In[368]:= cleaveList[{1, 2, 3, 8, 4, 2, 6, 7, 3, 2, 1, 9, 8, 9, 1, 3, 
  1, 2, 9, 2, 8, 2, 7, 7, 1, 2, 3, 4, 3, 2, 1}, 5, 3]

(* Out[368]= {{1, 2, 3}, {8, 4, 2, 6, 7, 3, 2, 1}, {9, 8, 9, 1, 3, 
  1}, {2, 9, 2, 8, 2, 7, 7, 1, 2, 3}, {4, 3, 2}, {1}} *)

It is straightforward to assess the speed.

In[369]:= Table[
 ll = RandomInteger[8, 2^n];
 Timing[cp2 = cleavePosns2[ll, 5, 3]; Length[cp2]]
 , {n, 14, 22}]

(* Out[369]= {{0.038943, 1532}, {0.050557, 3092}, {0.094123, 
  6107}, {0.189573, 12010}, {0.409559, 24282}, {0.811616, 
  48129}, {1.6438, 96797}, {3.02218, 193246}, {6.49907, 385671}} *)
POSTED BY: Daniel Lichtblau

Okay, now I am impressed. Just switching from AppendTo to Sow/Reap reduced the runtime from 49min down to 8.7sec, unbelievable!! Many thanks to @Benjamin Goodman for posting the tip in a lasting thread where the contributive post has an actual chance to get viewed eventually by the curious surfer.

t1 = SessionTime[];
Anzahlversuche := 40000;
p = 0.3;
versuchnr = 0; (* ID number, for checking purposes only *)
randlis = {}; (* complete list of random numbers, for checking purposes only *)
lis = {};
{randlis, lis} =
  Reap[
    Do[
     gewartet = 0;
     drittletzt = vorletzt = letzt = True;(* True means car has passed street *)
     While[(! drittletzt \[And] ! vorletzt \[And] ! letzt) == False, 
      drittletzt = vorletzt;
      vorletzt = letzt;
      If[Sow[RandomReal[], x] < p, letzt = True, letzt = False];
      gewartet++
     ];
     versuchnr++;
     Sow[{versuchnr, gewartet - 3}, y]
     , Anzahlversuche
     ]
    ][[2]];
SortBy[tall = Tally@lis[[All, 2]], First] /. {sek_, H_} :> {sek, N@(H/Anzahlversuche)};
Fold[#1 + #2[[2]]*#2[[1]] &, 0, tall]/Anzahlversuche // N
TableForm[%%, TableHeadings -> {None, {"k in [sec]", "h(k)"}}]
Total[tall[[All, 2]]] == Anzahlversuche
t2 = SessionTime[];
t2 - t1

Interestingly in this code example there is no major performance gain if I use Map instead of the pattern matching in the SortBy line. Session timings now on the Raspi are 8.7sec for 40000 (1.05sec at FREE Wolfram Cloud), 85sec for 400000 (10.4sec FWC), and 215sec for 1000000 street crossings (25.6sec FWC). On FWC 37.7sec for 1500000, and 50.6sec for 2000000, 58.3sec for 2300000 which is at the FWC time limit of 60sec ("Cloud: This computation has exceeded the time limit for your plan"). The Raspi takes 8min for the 2.3 million. So with this task the Raspi is consistently 8x slower than the FWC.

Thanks also @Hans Dolhaine and @Henrik Schachner for your contributions, much appreciated!

POSTED BY: Raspi Rascal

Here is another ( not intensively tested) version, avoiding pattern - matching and using "normal" If's (what pattern matching does for sure in the background as well). Perhaps this is faster with your "long" problems. But I wouldn't be astonished if it was not. Be sure to make hit long enough (here it has 100 entries)

list = {1, 2, 3, 8, 4, 2, 6, 7, 3, 2, 1, 9, 8, 9, 1, 3, 1, 2, 9, 2, 8,
    2, 7, 7, 1, 2, 3, 4, 3, 2, 1};

n = 0;
nf = 0;
hit = Table[0, {100}];
While[n <= Length[list] - 3,
 n = n + 1;
 If[list[[n]] < 5,
  If[list[[n + 1]] < 5,
   If[list[[n + 2]] < 5,
    nf = nf + 1; hit[[nf]] = n + 2; n = n + 2]
   ]
  ]
 ]

hit = Select[hit, # != 0 &];
hitL = Length[hit];

list1 = {
  Take[list, hit[[1]]],
  Table[Take[list, {hit[[j]] + 1, hit[[j + 1]]}], {j, 2, hitL - 1}],
  Take[list, {hit[[hitL]] + 1, Length[list]}]}
POSTED BY: Hans Dolhaine

Hi Raspi!

Ok, here comes an approach using SequenceSplit - I am not claiming that this is in some way elegant! Using from above

list = {1, 2, 3, 8, 4, 2, 6, 7, 3, 2, 1, 9, 8, 9, 1, 3, 1, 2, 9, 2, 8, 2, 7, 7, 1, 2, 3, 4, 3, 2, 1};

we already get a nice intermediate result like so:

imr = SequenceSplit[list, p : {_, _, _} /; (Max[p] <= 5) :> Splice[{"Join", p}]]
(* Out:   {"Join",{1,2,3},{8,4,2,6,7},"Join",{3,2,1},{9,8,9},"Join",{1,3,1},{2,9,2,8,2,7,7},"Join",{1,2,3},"Join",{4,3,2},{1}}  *)

The string "Join" is used here just as a marker where sublist should be joined:

SequenceReplace[imr, {a_, "Join", b_} :> Join[a, b]](* now cleaning up: *) /. "Join" -> Nothing
(*  Out:    {{1,2,3},{8,4,2,6,7,3,2,1},{9,8,9,1,3,1},{2,9,2,8,2,7,7,1,2,3},{4,3,2},{1}}  *)

I admit that I did not do any excessive testing, so there might well be some flaw! Regards -- Henrik

POSTED BY: Henrik Schachner
Posted 3 years ago

@Raspi:

Thank you for explaining the original challenge. This is a fascinating topic.

POSTED BY: Mike Besso

Don't assume that Functional programming will always give you a faster runtime. Also don't assume that all Wolfram functions are purely single stream with no loops under the covers. Much of Mathematica's performance comes from platform-compiled functions in the kernel. The front-end though is a multithreaded progressive interpreter with performance bound by the CPU.

I believe this problem is O(N) = N lnN, but perhaps someone can show otherwise?

POSTED BY: Richard Frost

@Hans Dolhaine That's beautiful and works perfectly as desired and I am learning from it, thank you @all for your attention, appreciated!

Comment: The pattern matching works well with "small" lists. The original problem of "pedestrian must wait before finally crossing street" can also be implemented with the procedural paradigm. My humble code:

t1 = SessionTime[];
Anzahlversuche := 10000; (* total number of street crossings to simulate *)
p = 0.3;
versuchnr = 0; (* ID of the street crossing, for control purposes *)
randlis = {}; (* full list of random numbers, for control purposes *)
lis = {};
Do[
  gewartet = 0;
  drittletzt = vorletzt = letzt = True; (* True = "car is passing street" *)
  While[(!drittletzt && !vorletzt && !letzt) == False, 
   jetzigeSek = RandomReal[];
   AppendTo[randlis, jetzigeSek];
   drittletzt = vorletzt;
   vorletzt = letzt;
   If[jetzigeSek < p, letzt = True, letzt = False];
   gewartet++
   ];
  versuchnr++;
  AppendTo[lis, {versuchnr, gewartet - 3}]
  , Anzahlversuche
  ];
SortBy[tall = Tally@lis[[All, 2]], First] /. {sek_, H_} :> {sek, N@(H/Anzahlversuche)};

Fold[#1 + #2[[2]]*#2[[1]] &, 0, tall]/Anzahlversuche // N (* output of mean waiting time *)
TableForm[%%, TableHeadings -> {None, {"k in [sec]", "h[k]"}}] (* output of tabulated waiting time *)
Total[tall[[All, 2]]] == Anzahlversuche (* True, output of a simple test *)
t2 = SessionTime[];
t2 - t1  (* Raspberry Pi 3B takes 49min for Anzahlversuche=40000 *)

This code works okayish up to 40000 street crossings (~250000 random numbers), but I am not impressed by the performance. There has to be a faster performing implementation for the problem, maybe with NestList FixedPointList NestWhileList FoldList instead of For While If Do. But maybe we should give up at this point?

POSTED BY: Raspi Rascal

@ Bill Nelson: Your solution is great.

But, in { 4, 3, 2, 1 } the first three figures match the condition to be less than five, so it should be split into { 4, 3, 2 } {1}. I think changing the pattern in head does the job

v = {1, 2, 3, 8, 4, 2, 6, 7, 3, 2, 1, 9, 8, 9, 1, 3, 1, 2, 9, 2, 8, 2,
    7, 7, 1, 2, 3, 4, 3, 2, 1};
{v} //. {head___, x_, y_, z_, tail__} :> 
  Sequence[{head, x, y, z}, {tail}] /; x < 5 && y < 5 && z < 5
POSTED BY: Hans Dolhaine

@Mike Besso Thanks for your time looking into this! The partitioning problem arises directly from using a super long sequence of RandomReal numbers for simulating stochastic processes like crossing the street ("A pedestrian needs 3 seconds to cross the street. The probability of a car passing in any given 1 second be p=0.3. Until a time slot of (at least) 3 seconds with no car in it finally arrives, the pedestrian is forced to wait. With the help of [0;1] random numbers, show that the arithmetic average waiting time is around 3.38 seconds." — Monte Carlo simulation)

From the list of random numbers one would identify all the sequences which correspond to a street crossing. And then from that sequence's length (minus 3) one would know how many seconds the pedestrian had to wait before he finally crossed the street. So the 3 numbers themselves (of the ",x,y,z"-delimiters) aren't needed later in the calculation, we can replace them with the meaningless word "del". Hence my coding workaround was:

p = 0.3;
randlis = RandomReal[{0, 1}, n = 1000];
dellis = SequenceReplace[randlis, {x_, y_, z_} /; (x>p && y>p && z>p) :> del];
SequenceCases[dellis, {head___, del} /; FreeQ[{head}, del], Overlaps -> False];
Length[#] - 1 & /@ %

Then i saw the much improved pattern matching construct by @Bill Nelson and @Hans Dolhaine which i included in my documentation of the solution of the pedestrian-crossing-street problem. It is interesting to learn, though, that the performance of pattern matching in a list Length>1000 is getting abysmal on an old PC. I would like to simulate 10000, 40000, or even 100000 street crossings (i.e. ~640000 random numbers). That's not viable with our proposed pattern matching constructs. But the performance limitation of pattern matching wasn't the point of the thread anyway.

We've learned something new about Sequence vocab items and that's the most important part to come out of this thread, thank you!!

POSTED BY: Raspi Rascal
Posted 3 years ago

This isn't functional programming, but perhaps rule based programming will be a start.

Please test this exhaustively on all kinds of different cases before you even think of trusting this.

v={1,2,3,8,4,2,6,7,3,2,1,9,8,9,1,3,1,2,9,2,8,2,7,7,1,2,3,4,3,2,1};
{v}//.{head__,x_,y_,z_,tail__}:>Sequence[{head,x,y,z},{tail}]/;x<5&&y<5&&z<5

That has two underscores after head and tail and one underscore elsewhere and returns

{{1,2,3,8,4,2,6,7,3,2,1},{9,8,9,1,3,1},{2,9,2,8,2,7,7,1,2,3},{4,3,2,1}}

In my limited testing that seems to correctly deal with all cases except your leading {1,2,3,...} case and there are reasons you might discover why it does not deal with that case. If you can find any other examples where this does not work then please let everyone know.

If you can convince yourself that you really understand exactly why this works then perhaps you can use that in the future on other problems.

POSTED BY: Bill Nelson
Posted 3 years ago

Raspi:

I've spent a couple of hours on this and could not get SequenceCases to work. And SequenceSplit removes the delimiter.

If no one posts a solution by tomorrow, I'll try to find some time to work on it some more.

Could you explain why you are trying to do this? It is a fascinating problem, but I can't imagine a use for it.

thanks

POSTED BY: Mike Besso
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract