Message Boards Message Boards

0
|
528 Views
|
10 Replies
|
7 Total Likes
View groups...
Share
Share this post:

How to specify incremental constraint in LongestOrderedSequence?

Posted 1 month ago

I'm looking for the longest ordered subsequences of integer pairs with specific incremental constraints. In these examples, I'd like the last value of each pair to be 1 less than the first value of the next pair. So far I've not discovered what pattern to specify.

Example 1.

LongestOrderedSequence[{{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16},
{65, 66}}, (#2[[1]] - #1[[2]] == 1) &]

{{4, 5}}

I was hoping for

{{11, 12}, {13, 14}, {15, 16}}

Example 2.

LongestOrderedSequence[{{2, 4}, {6, 8}, {9, 11}, {12, 14},
{98, 100}}, (#2[[1]] - #1[[2]] == 1) &]

{{2, 4}}

I was hoping for

{{6, 8}, {9, 11}, {12, 14}}
POSTED BY: Richard Frost
10 Replies

A fast but obscure code:

SeedRandom[1];
pairs = RandomInteger[100, {1000000, 2}];

Unitize[pairs[[2 ;;, 1]] - pairs[[;; -2, 2]] - 1] //
   SparseArray[#, Automatic, 1]["AdjacencyLists"] & //
  Function[adj,
   Switch[Length@adj,
    0, {} -> ArrayReshape[pairs, Insert[Dimensions@pairs, 1, 2]],
    1, {pairs[[First@adj ;; First@adj + 1]]},
    _, (pairs[[Span @@ (# + {1, 2})]] & /@ 
      MaximalBy[
       Transpose@{adj[[Prepend[# + 1, 1]]], adj[[Append[#, -1]]]} &@
         SparseArray[Differences@adj, Automatic, 1][
          "AdjacencyLists"] - 1, -Subtract @@ # &])
    ]
   ] // AbsoluteTiming
(*
{0.026446, 
   {{{0, 69}, {70, 62}, {63, 21}, {22, 3}}, 
    {{9, 9}, {10, 31}, {32, 49}, {50, 72}}}}
*)

MaximalBy[
  Split[
   pairs,
   (#2[[1]] - #1[[2]] == 1) &],
  Length] // AbsoluteTiming
(*
{1.05492, 
   {{{0, 69}, {70, 62}, {63, 21}, {22, 3}},
    {{9, 9}, {10, 31}, {32, 49}, {50, 72}}}}
*)
POSTED BY: Michael Rogers

Michael, thank you for these solutions. They are more general than my approach and thus have greater utility.

The pair data I'm working with is generated from StringPosition[] searches in plant chromosome data. For example:

(* chromosome 15 from Haplome A, www.rosaceae.org/Analysis/20220983 *)
pairs = StringPosition[chr15, "AGC"];
Length[pairs]

626785

The work-around I implemented looks like this:

start = AbsoluteTime[];
pairsCount = Length[pairs];
orderedSequencesDA = CreateDataStructure["DynamicArray"];
orderedSequencesDA["Append", {}];
prevPos = 1;
currentPos = 1;
While[currentPos < pairsCount,
  currentPos++;
  If[pairs[[currentPos, 1]] - pairs[[currentPos - 1, 2]] != 1,
   orderedSequencesDA["Append", pairs[[prevPos ;; currentPos - 1]]];
   prevPos = currentPos;
   ]
  ];
If[pairsCount > 0,
  orderedSequencesDA["Append", pairs[[prevPos ;; currentPos]]]
  ];
orderedSequences = Normal[orderedSequencesDA];
maxLengths = Max[Length[#] & /@ orderedSequences];
finish = AbsoluteTime[];
{finish - start, maxLengths}

{1.7654381, 7}

I was happy you demonstrated the use of MaximalBy[], a function I had no knowledge of. That solution is speedier than mine:

MaximalBy[Split[pairs, (#2[[1]] - #1[[2]] == 1) &], 
  Length] // AbsoluteTiming

{0.757178, {{{7740578, 7740580}, {7740581, 7740583}, {7740584, 
    7740586}, {7740587, 7740589}, {7740590, 7740592}, {7740593, 
    7740595}, {7740596, 7740598}}, {{51367577, 51367579}, {51367580, 
    51367582}, {51367583, 51367585}, {51367586, 51367588}, {51367589, 
    51367591}, {51367592, 51367594}, {51367595, 
    51367597}}, {{54035620, 54035622}, {54035623, 
    54035625}, {54035626, 54035628}, {54035629, 54035631}, {54035632, 
    54035634}, {54035635, 54035637}, {54035638, 54035640}}}}

And finally, your "obscure" solution will come in handy when searching for dozens of motifs (e.g. "AGC") in two dozen specimens with about a dozen chromosomes each -- a project starting next year.

Unitize[pairs[[2 ;;, 1]] - pairs[[;; -2, 2]] - 1] // 
   SparseArray[#, Automatic, 1]["AdjacencyLists"] & // 
  Function[adj, 
   Switch[Length@adj, 
    0, {} -> ArrayReshape[pairs, Insert[Dimensions@pairs, 1, 2]], 
    1, {pairs[[First@adj ;; 
        First@adj + 1]]}, _, (pairs[[Span @@ (# + {1, 2})]] & /@ 
      MaximalBy[
       Transpose@{adj[[Prepend[# + 1, 1]]], adj[[Append[#, -1]]]} &@
         SparseArray[Differences@adj, Automatic, 1][
          "AdjacencyLists"] - 
        1, -Subtract @@ # &])]] // AbsoluteTiming

{0.0599331, {{{7740578, 7740580}, {7740581, 7740583}, {7740584, 
    7740586}, {7740587, 7740589}, {7740590, 7740592}, {7740593, 
    7740595}, {7740596, 7740598}}, {{51367577, 51367579}, {51367580, 
    51367582}, {51367583, 51367585}, {51367586, 51367588}, {51367589, 
    51367591}, {51367592, 51367594}, {51367595, 
    51367597}}, {{54035620, 54035622}, {54035623, 
    54035625}, {54035626, 54035628}, {54035629, 54035631}, {54035632, 
    54035634}, {54035635, 54035637}, {54035638, 54035640}}}}
POSTED BY: Richard Frost

I think the problem is that the test function is not an order.

Perhaps your workaround was this:

MaximalBy[
 Split[
  {{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {65, 66}},
  (#2[[1]] - #1[[2]] == 1) &],
 Length]

(*  {{{11, 12}, {13, 14}, {15, 16}}}  *)

MaximalBy[
 Split[
  {{2, 4}, {6, 8}, {9, 11}, {12, 14}, {98, 100}},
  (#2[[1]] - #1[[2]] == 1) &],
 Length]

(*  {{{6, 8}, {9, 11}, {12, 14}}}  *)

It should return multiple lists when there are repeated maximal sequences.

POSTED BY: Michael Rogers

I think the problem is that the test function is not an order.

I guess this exactly describes the problem, simply try:

LongestOrderedSequence[{{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {65, 66}}, 
    (#2[[1]] <= #1[[2]] + 1) &]
(*  Out:   {{11, 12}, {13, 14}, {15, 16}}  *)
POSTED BY: Henrik Schachner

The trouble is that "the last value of each pair to be 1 less than the first value of the next pair" is not an order. The desire outcome of the following is {{11, 12}, {13, 14}, {15, 16}}, but the output includes an extra term:

LongestOrderedSequence[
 {{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {14, 66}},
 (#2[[1]] <= #1[[2]] + 1) &]

(*  {{11, 12}, {13, 14}, {15, 16}, {14, 66}}  *)
POSTED BY: Michael Rogers

Yes. Now you see my implicit assumption from StringPosition[] output that all pairs are "well-behaved".

POSTED BY: Richard Frost

Yes - thanks for spotting! Maybe nesting helps:

LongestOrderedSequence[
 LongestOrderedSequence[{{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {14, 66}}, 
        ((#2[[1]] > #1[[2]])) &], 
    (#2[[1]] <= #1[[2]] + 1) &]
(*  Out:   {{11,12},{13,14},{15,16}}  *)
POSTED BY: Henrik Schachner

Nice idea! I didn't think of that. I guess the question is, Does the first filtering always contain the longest desired sequence?

This seems to be a counterexample, in which the longest "increasing" run has jumps by more than 1:

LongestOrderedSequence[
 LongestOrderedSequence[
  {{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {14, 66},
   {1, 1}, {3, 4}, {5, 6}, {8, 7}, {9, 10}},
  ((#2[[1]] > #1[[2]])) &
  ],
 (#2[[1]] <= #1[[2]] + 1) &]

(*  {{3, 4}, {5, 6}}  *)

I think if we reverse the order of the filters, then we have the opposite problem: The longest sequence might be decreasing by arbitrary amounts.

I think one might have to find all increasing runs, not just the longest one, and then find the longest sequence in which the increment is 1.

POSTED BY: Michael Rogers

Henrik, Using "pairs" from the chromosome example above, I receive this result:

LongestOrderedSequence[
  pairs, (#2[[1]] <= #1[[2]] + 1) &] // AbsoluteTiming

{2.50395, {{561068, 561070}, {561071, 561073}, {561074, 
   561076}, {561077, 561079}}}
POSTED BY: Richard Frost

I have implemented a work-around that partitions the list into contiguous sublists and then computers the max sublist length.

POSTED BY: Richard Frost
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract