Michael, thank you for these solutions. They are more general than my approach and thus have greater utility.
The pair data I'm working with is generated from StringPosition[] searches in plant chromosome data. For example:
(* chromosome 15 from Haplome A, www.rosaceae.org/Analysis/20220983 *)
pairs = StringPosition[chr15, "AGC"];
Length[pairs]
626785
The work-around I implemented looks like this:
start = AbsoluteTime[];
pairsCount = Length[pairs];
orderedSequencesDA = CreateDataStructure["DynamicArray"];
(* seeds the array with an empty run; it stays as the first element of the result *)
orderedSequencesDA["Append", {}];
prevPos = 1; (* index of the first pair in the current run *)
currentPos = 1;
While[currentPos < pairsCount,
  currentPos++;
  (* a run breaks when this match does not start right after the previous one ends *)
  If[pairs[[currentPos, 1]] - pairs[[currentPos - 1, 2]] != 1,
    orderedSequencesDA["Append", pairs[[prevPos ;; currentPos - 1]]];
    prevPos = currentPos;
  ]
];
(* flush the final run *)
If[pairsCount > 0,
  orderedSequencesDA["Append", pairs[[prevPos ;; currentPos]]]
];
orderedSequences = Normal[orderedSequencesDA];
maxLengths = Max[Length /@ orderedSequences];
finish = AbsoluteTime[];
{finish - start, maxLengths}
{1.7654381, 7}
I was glad you demonstrated MaximalBy[], a function I had not known about. That solution takes about 0.76 s against 1.77 s for mine, so it is roughly 2.3 times faster:
MaximalBy[Split[pairs, (#2[[1]] - #1[[2]] == 1) &], Length] // AbsoluteTiming
{0.757178, {{{7740578, 7740580}, {7740581, 7740583}, {7740584,
7740586}, {7740587, 7740589}, {7740590, 7740592}, {7740593,
7740595}, {7740596, 7740598}}, {{51367577, 51367579}, {51367580,
51367582}, {51367583, 51367585}, {51367586, 51367588}, {51367589,
51367591}, {51367592, 51367594}, {51367595,
51367597}}, {{54035620, 54035622}, {54035623,
54035625}, {54035626, 54035628}, {54035629, 54035631}, {54035632,
54035634}, {54035635, 54035637}, {54035638, 54035640}}}}
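For anyone following along, here is a minimal sketch of how the Split[] test groups adjacent matches. It uses a short made-up string rather than the chromosome data; toySeq, toyPairs and runs are illustrative names only, not part of the real analysis.

```wolfram
(* Hypothetical toy sequence, standing in for chr15 *)
toySeq = "AGCAGCAGCTTAGCTTTTTAGCAGC";

(* {start, end} positions of every "AGC" match *)
toyPairs = StringPosition[toySeq, "AGC"]
(* {{1, 3}, {4, 6}, {7, 9}, {12, 14}, {20, 22}, {23, 25}} *)

(* Split keeps two neighbours in the same run while the test is True,
   i.e. while the next match starts one position after the previous one ends *)
runs = Split[toyPairs, (#2[[1]] - #1[[2]] == 1) &]
(* {{{1, 3}, {4, 6}, {7, 9}}, {{12, 14}}, {{20, 22}, {23, 25}}} *)

(* MaximalBy returns every run that ties for the greatest Length *)
MaximalBy[runs, Length]
(* {{{1, 3}, {4, 6}, {7, 9}}} *)
```

Because MaximalBy[] returns all ties, the chromosome result above lists three separate runs of seven matches each.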
And finally, your "obscure" solution will come in handy when searching for dozens of motifs (e.g. "AGC") in two dozen specimens with about a dozen chromosomes each, a project starting next year. At about 0.06 s it is more than 12 times faster than the Split[] version:
Unitize[pairs[[2 ;;, 1]] - pairs[[;; -2, 2]] - 1] //
  SparseArray[#, Automatic, 1]["AdjacencyLists"] & //
  Function[adj,
   Switch[Length@adj,
    0, {} -> ArrayReshape[pairs, Insert[Dimensions@pairs, 1, 2]],
    1, {pairs[[First@adj ;; First@adj + 1]]},
    _, (pairs[[Span @@ (# + {1, 2})]] & /@
      MaximalBy[
       Transpose@{adj[[Prepend[# + 1, 1]]], adj[[Append[#, -1]]]} &@
         SparseArray[Differences@adj, Automatic, 1]["AdjacencyLists"] - 1,
       -Subtract @@ # &])]] // AbsoluteTiming
{0.0599331, {{{7740578, 7740580}, {7740581, 7740583}, {7740584,
7740586}, {7740587, 7740589}, {7740590, 7740592}, {7740593,
7740595}, {7740596, 7740598}}, {{51367577, 51367579}, {51367580,
51367582}, {51367583, 51367585}, {51367586, 51367588}, {51367589,
51367591}, {51367592, 51367594}, {51367595,
51367597}}, {{54035620, 54035622}, {54035623,
54035625}, {54035626, 54035628}, {54035629, 54035631}, {54035632,
54035634}, {54035635, 54035637}, {54035638, 54035640}}}}
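As far as I can tell, the first step of that pipeline finds the positions where one match starts immediately after the previous one ends, using whole-list arithmetic instead of a pairwise test. Here is a small sketch of that step on made-up pairs. It uses the documented Position[] in place of the undocumented "AdjacencyLists" property, so it illustrates the idea rather than reproducing your code; toyPairs, gaps, breaks and adj are illustrative names.

```wolfram
(* Hypothetical toy data in the same {start, end} form StringPosition returns *)
toyPairs = {{1, 3}, {4, 6}, {7, 9}, {12, 14}, {20, 22}, {23, 25}};

(* Gap between each match and the one before it; 0 means directly adjacent *)
gaps = toyPairs[[2 ;;, 1]] - toyPairs[[;; -2, 2]] - 1
(* {0, 0, 2, 5, 0} *)

(* Unitize maps every nonzero gap to 1, leaving 0 for adjacent matches *)
breaks = Unitize[gaps]
(* {0, 0, 1, 1, 0} *)

(* Indices i for which match i and match i + 1 are adjacent *)
adj = Flatten@Position[breaks, 0]
(* {1, 2, 5} *)
```

Consecutive values in adj (here 1, 2) belong to one run, so the longest stretch of consecutive indices marks the longest run of matches, in this case toyPairs[[1 ;; 3]].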