Group Abstract Group Abstract

Message Boards Message Boards

0
|
2.4K Views
|
10 Replies
|
7 Total Likes
View groups...
Share
Share this post:

How to specify incremental constraint in LongestOrderedSequence?

Posted 10 months ago

I'm looking for the longest ordered subsequences of integer pairs with specific incremental constraints. In these examples, I'd like the last value of each pair to be 1 less than the first value of the next pair. So far I've not discovered what pattern to specify.

Example 1.

LongestOrderedSequence[{{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16},
{65, 66}}, (#2[[1]] - #1[[2]] == 1) &]

{{4, 5}}

I was hoping for

{{11, 12}, {13, 14}, {15, 16}}

Example 2.

LongestOrderedSequence[{{2, 4}, {6, 8}, {9, 11}, {12, 14},
{98, 100}}, (#2[[1]] - #1[[2]] == 1) &]

{{2, 4}}

I was hoping for

{{6, 8}, {9, 11}, {12, 14}}
POSTED BY: Richard Frost
10 Replies

A fast but obscure code:

SeedRandom[1];
pairs = RandomInteger[100, {1000000, 2}];

Unitize[pairs[[2 ;;, 1]] - pairs[[;; -2, 2]] - 1] //
   SparseArray[#, Automatic, 1]["AdjacencyLists"] & //
  Function[adj,
   Switch[Length@adj,
    0, {} -> ArrayReshape[pairs, Insert[Dimensions@pairs, 1, 2]],
    1, {pairs[[First@adj ;; First@adj + 1]]},
    _, (pairs[[Span @@ (# + {1, 2})]] & /@ 
      MaximalBy[
       Transpose@{adj[[Prepend[# + 1, 1]]], adj[[Append[#, -1]]]} &@
         SparseArray[Differences@adj, Automatic, 1][
          "AdjacencyLists"] - 1, -Subtract @@ # &])
    ]
   ] // AbsoluteTiming
(*
{0.026446, 
   {{{0, 69}, {70, 62}, {63, 21}, {22, 3}}, 
    {{9, 9}, {10, 31}, {32, 49}, {50, 72}}}}
*)

MaximalBy[
  Split[
   pairs,
   (#2[[1]] - #1[[2]] == 1) &],
  Length] // AbsoluteTiming
(*
{1.05492, 
   {{{0, 69}, {70, 62}, {63, 21}, {22, 3}},
    {{9, 9}, {10, 31}, {32, 49}, {50, 72}}}}
*)
POSTED BY: Michael Rogers

Michael, thank you for these solutions. They are more general than my approach and thus have greater utility.

The pair data I'm working with is generated from StringPosition[] searches in plant chromosome data. For example:

(* chromosome 15 from Haplome A, www.rosaceae.org/Analysis/20220983 *)
pairs = StringPosition[chr15, "AGC"];
Length[pairs]

626785

The work-around I implemented looks like this:

start = AbsoluteTime[];
pairsCount = Length[pairs];
orderedSequencesDA = CreateDataStructure["DynamicArray"];
orderedSequencesDA["Append", {}];
prevPos = 1;
currentPos = 1;
While[currentPos < pairsCount,
  currentPos++;
  If[pairs[[currentPos, 1]] - pairs[[currentPos - 1, 2]] != 1,
   orderedSequencesDA["Append", pairs[[prevPos ;; currentPos - 1]]];
   prevPos = currentPos;
   ]
  ];
If[pairsCount > 0,
  orderedSequencesDA["Append", pairs[[prevPos ;; currentPos]]]
  ];
orderedSequences = Normal[orderedSequencesDA];
maxLengths = Max[Length[#] & /@ orderedSequences];
finish = AbsoluteTime[];
{finish - start, maxLengths}

{1.7654381, 7}

I was happy you demonstrated the use of MaximalBy[], a function I had no knowledge of. That solution is speedier than mine:

MaximalBy[Split[pairs, (#2[[1]] - #1[[2]] == 1) &], 
  Length] // AbsoluteTiming

{0.757178, {{{7740578, 7740580}, {7740581, 7740583}, {7740584, 
    7740586}, {7740587, 7740589}, {7740590, 7740592}, {7740593, 
    7740595}, {7740596, 7740598}}, {{51367577, 51367579}, {51367580, 
    51367582}, {51367583, 51367585}, {51367586, 51367588}, {51367589, 
    51367591}, {51367592, 51367594}, {51367595, 
    51367597}}, {{54035620, 54035622}, {54035623, 
    54035625}, {54035626, 54035628}, {54035629, 54035631}, {54035632, 
    54035634}, {54035635, 54035637}, {54035638, 54035640}}}}

And finally, your "obscure" solution will come in handy when searching for dozens of motifs (e.g. "AGC") in two dozen specimens with about a dozen chromosomes each -- a project starting next year.

Unitize[pairs[[2 ;;, 1]] - pairs[[;; -2, 2]] - 1] // 
   SparseArray[#, Automatic, 1]["AdjacencyLists"] & // 
  Function[adj, 
   Switch[Length@adj, 
    0, {} -> ArrayReshape[pairs, Insert[Dimensions@pairs, 1, 2]], 
    1, {pairs[[First@adj ;; 
        First@adj + 1]]}, _, (pairs[[Span @@ (# + {1, 2})]] & /@ 
      MaximalBy[
       Transpose@{adj[[Prepend[# + 1, 1]]], adj[[Append[#, -1]]]} &@
         SparseArray[Differences@adj, Automatic, 1][
          "AdjacencyLists"] - 
        1, -Subtract @@ # &])]] // AbsoluteTiming

{0.0599331, {{{7740578, 7740580}, {7740581, 7740583}, {7740584, 
    7740586}, {7740587, 7740589}, {7740590, 7740592}, {7740593, 
    7740595}, {7740596, 7740598}}, {{51367577, 51367579}, {51367580, 
    51367582}, {51367583, 51367585}, {51367586, 51367588}, {51367589, 
    51367591}, {51367592, 51367594}, {51367595, 
    51367597}}, {{54035620, 54035622}, {54035623, 
    54035625}, {54035626, 54035628}, {54035629, 54035631}, {54035632, 
    54035634}, {54035635, 54035637}, {54035638, 54035640}}}}
POSTED BY: Richard Frost

I think the problem is that the test function is not an order.

Perhaps your workaround was this:

MaximalBy[
 Split[
  {{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {65, 66}},
  (#2[[1]] - #1[[2]] == 1) &],
 Length]

(*  {{{11, 12}, {13, 14}, {15, 16}}}  *)

MaximalBy[
 Split[
  {{2, 4}, {6, 8}, {9, 11}, {12, 14}, {98, 100}},
  (#2[[1]] - #1[[2]] == 1) &],
 Length]

(*  {{{6, 8}, {9, 11}, {12, 14}}}  *)

It should return multiple lists when there are repeated maximal sequences.

POSTED BY: Michael Rogers

I think the problem is that the test function is not an order.

I guess this exactly describes the problem, simply try:

LongestOrderedSequence[{{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {65, 66}}, 
    (#2[[1]] <= #1[[2]] + 1) &]
(*  Out:   {{11, 12}, {13, 14}, {15, 16}}  *)
POSTED BY: Henrik Schachner

The trouble is that "the last value of each pair to be 1 less than the first value of the next pair" is not an order. The desire outcome of the following is {{11, 12}, {13, 14}, {15, 16}}, but the output includes an extra term:

LongestOrderedSequence[
 {{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {14, 66}},
 (#2[[1]] <= #1[[2]] + 1) &]

(*  {{11, 12}, {13, 14}, {15, 16}, {14, 66}}  *)
POSTED BY: Michael Rogers

Yes. Now you see my implicit assumption from StringPosition[] output that all pairs are "well-behaved".

POSTED BY: Richard Frost

Yes - thanks for spotting! Maybe nesting helps:

LongestOrderedSequence[
 LongestOrderedSequence[{{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {14, 66}}, 
        ((#2[[1]] > #1[[2]])) &], 
    (#2[[1]] <= #1[[2]] + 1) &]
(*  Out:   {{11,12},{13,14},{15,16}}  *)
POSTED BY: Henrik Schachner

Nice idea! I didn't think of that. I guess the question is, Does the first filtering always contain the longest desired sequence?

This seems to be a counterexample, in which the longest "increasing" run has jumps by more than 1:

LongestOrderedSequence[
 LongestOrderedSequence[
  {{4, 5}, {7, 8}, {11, 12}, {13, 14}, {15, 16}, {14, 66},
   {1, 1}, {3, 4}, {5, 6}, {8, 7}, {9, 10}},
  ((#2[[1]] > #1[[2]])) &
  ],
 (#2[[1]] <= #1[[2]] + 1) &]

(*  {{3, 4}, {5, 6}}  *)

I think if we reverse the order of the filters, then we have the opposite problem: The longest sequence might be decreasing by arbitrary amounts.

I think one might have to find all increasing runs, not just the longest one, and then find the longest sequence in which the increment is 1.

POSTED BY: Michael Rogers

Henrik, Using "pairs" from the chromosome example above, I receive this result:

LongestOrderedSequence[
  pairs, (#2[[1]] <= #1[[2]] + 1) &] // AbsoluteTiming

{2.50395, {{561068, 561070}, {561071, 561073}, {561074, 
   561076}, {561077, 561079}}}
POSTED BY: Richard Frost

I have implemented a work-around that partitions the list into contiguous sublists and then computers the max sublist length.

POSTED BY: Richard Frost
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard