Message Boards Message Boards

3
|
652 Views
|
7 Replies
|
8 Total Likes
View groups...
Share
Share this post:

Optimization challenge: row and column total preserving matrix randomization

Hello! As part of my efforts to develop a public toolbox for doing ecology with the WL, I am looking to optimize an often-used matrix randomization algorithm. Simply put, the end goal is an effectively random binary matrix that preserves the row and column totals of a starting binary matrix. The standard algorithm uses 'quad flips'. A 'quad' is set of four locations in a rectangular arrangement (i.e., the intersections of two rows and two columns). If the elements of the quad are either {{1,0},{0,1}} or {{0,1},{1,0}}, then the elements can be flipped while preserving the row and column totals. Randomizing a matrix thus consists of searching for such quads, and flipping them, many times. Figuring out the number of flips required to asymptotically decouple the final matrix from the starting matrix is complicated, but it rises with the size and density of the matrix and can be in the thousands or tens of thousands for some kinds matrices in ecology. And, as these randomized matrices are typically used to generate a null destruction of some statistic of interest (say, nestedness), the whole process itself may nee to be repeated maybe 1000 times. So, an efficient algorithm is key!

I attach below my first attempt, which does a single flip. For each flip it randomly picks a '1', then picks the other '1' entries in a random sequence and checks whether each forms a flippable quad (the other two entries in the quad are 0). If they do, it makes the flip and returns the new matrix, otherwise it keeps looking until every other 1 has been tested.

matrixSwap[m_] := 
 Module[{mCoords, swapCandidate, quadCandidates, n, quadCandidate},
  total = Total[m, 2];
  mCoords = Position[m, 1];
  swapCandidate = RandomSample[mCoords, 1][[1]];
  quadCandidates = RandomSample[DeleteCases[mCoords, swapCandidate]];
  n = 1;
  Until[(quadCandidate = {{m[[swapCandidate[[1]], 
         swapCandidate[[2]]]], 
       m[[swapCandidate[[1]], 
         quadCandidates[[n, 2]]]]}, {m[[quadCandidates[[n, 1]], 
         swapCandidate[[2]]]], 
       m[[quadCandidates[[n, 1]], 
         quadCandidates[[n, 2]]]]}}; (swapCandidate[[1]] =!= 
        quadCandidates[[n, 1]] && 
       swapCandidate[[2]] =!= quadCandidates[[n, 2]] && 
       Total[quadCandidate] === {1, 1} && 
       Total[quadCandidate, {2}] === {1, 1}) || n == (total - 1)), n++];
  ReplacePart[m, 
   MapThread[#1 -> #2 &, {{{swapCandidate[[1]], 
       swapCandidate[[2]]}, {swapCandidate[[1]], 
       quadCandidates[[n, 2]]}, {quadCandidates[[n, 1]], 
       swapCandidate[[2]]}, {quadCandidates[[n, 1]], 
       quadCandidates[[n, 2]]}}, Flatten[Reverse[quadCandidate]]}]]
  ]

Here is a test, using Nest to apply it many times.

testMatrix =
  RandomBinaryMatrix[100, 100, 5000];
Timing[randomizedMatrix = Nest[matrixSwap, testMatrix, 1000];]
Image[testMatrix, ImageSize -> 150]
Image[randomizedMatrix, ImageSize -> 150]
{Total[testMatrix] === Total[randomizedMatrix], 
 Total[testMatrix, {2}] === Total[randomizedMatrix, {2}]}

For me it takes about 0.8 seconds to apply 1000* swaps. Some notes on this particular algorithm:

  1. My code implements a suggestion I got from Daniel Lichtblau when I asked a related question on stackexchange over 10 years ago! Basically to not generate random quads but start with pairs of ones, and to search iteratively (here using Until) for a flippable quad rather than generating all possible pairs. Good advice which I had forgotten about, but this time managed to come up more or less on my own. But, there is surely still room for improvement.
  2. If more than half the entries in the matrix are 1, then it is more efficient to switch to testing pairs of zero entries. I will make this simple improvement depending on what the final version looks like.
  3. The function only picks one starting '1' entry. It is possible, especially for a small matrix, that there are no flippable quads for a given starting point. That means that in n* iterations using Nest, there may be less than n actual swaps. This could be solved within the function by, if no flippable quad is found, trying again with a different starting entry. But the problem is likely not a big deal though, because small matrices are fast to randomize, so one could just make n quite large in the Nest. And large matrices are slower but almost always have quads.
  4. As this is a single flip function, there is some overhead in passing the matrix, and doing some of the initial processing, e.g., the Position command, which might be avoided by writing a function that does n flips internally,

Anyway, if anyone can see some ways to significantly speed this up, I would be most interested to hear about it. A challenge perhaps? Meanwhile I will continue to see what I can do. Perhaps there is a compiled version...

Gareth

POSTED BY: Gareth Russell
7 Replies

One possibility to consider is generating ab initio given the row sums and column sums. The code below will do this except it is not guaranteed to terminate (oh well). The idea is to generate rows that conform to the desired sums, then swap 0-1 in column pairs that have a deficit and an excess of 1s respectively, choosing a row with such a 0-1 pair. We put in place a stop after iterating a certain number of times so we don't get painted into a corner so to speak (this appens if for some pair of columns as described, there is no row with the requisite 0-1 pair to swap). When this happens we restart the process.

genBinaryMatrix[rsums_, csums_] := Catch[Module[
   {nr = Length[rsums], nc = Length[csums], mat, posns, cdiffs, 
    locols, hicols, done, clow, chigh, rrow},
   While[True,
    mat = Normal[SparseArray[Flatten@Table[
         posns = RandomSample[Range[nc], rsums[[i]]];
         Map[{i, #} -> 1 &, posns], {i, nr}]]];
    cdiffs = Total[mat] - csums;
    locols = Flatten@Position[cdiffs, _?Negative];
    hicols = Flatten@Position[cdiffs, _?Positive];
    If[(locols == {} && hicols == {}), Throw[mat, "done"]];
    done = (locols == {} && hicols == {});
    jj = 0;
    While[jj < nr*nc,
     jj++;
     clow = RandomChoice[locols];
     chigh = RandomChoice[hicols];
     rrow = RandomChoice[Flatten[Position[mat[[All, clow]], 0]]];
     If[mat[[rrow, chigh]] == 0, Continue[]];
     mat[[rrow, clow]] = 1;
     mat[[rrow, chigh]] = 0;
     cdiffs[[clow]]++;
     If[cdiffs[[clow]] == 0, 
      locols = DeleteElements[locols, {clow}]];
     cdiffs[[chigh]]--;
     If[cdiffs[[chigh]] == 0, 
      hicols = DeleteElements[hicols, {chigh}]];
     done = (locols == {} && hicols == {});
     If[(locols == {} && hicols == {}), Throw[mat, "done"]];
     ];
    ]], "done"]

Here is an experiment. We create a matrix, check the row and column sums, and generate a new one based thereon.

SeedRandom[1234];
mat = RandomInteger[1, {8, 8}]
rowsums = Map[Total, mat]
colsums = Total[mat]
newmat = genBinaryMatrix[rowsums, colsums]
Map[Total, newmat] == rowsums
Total[newmat] == colsums

(* Out[172]= {{1, 1, 1, 0, 0, 0, 0, 0}, {0, 1, 1, 0, 1, 0, 0, 1}, {0, 1, 
  1, 0, 1, 0, 1, 0}, {0, 0, 0, 0, 0, 1, 1, 1}, {0, 0, 0, 0, 0, 0, 0, 
  0}, {1, 0, 0, 0, 0, 1, 0, 0}, {0, 1, 0, 0, 1, 0, 0, 0}, {0, 1, 0, 1,
   1, 0, 0, 1}}

Out[173]= {3, 4, 4, 3, 0, 2, 2, 4}

Out[174]= {2, 5, 3, 1, 4, 2, 2, 3}

Out[175]= {{0, 0, 1, 0, 0, 1, 0, 1}, {0, 1, 1, 0, 1, 0, 0, 1}, {1, 1, 
  0, 1, 0, 0, 1, 0}, {1, 1, 1, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 
  0}, {0, 1, 0, 0, 1, 0, 0, 0}, {0, 0, 0, 0, 1, 1, 0, 0}, {0, 1, 0, 0,
   1, 0, 1, 1}}

Out[176]= True

Out[177]= True *)

To give an idea of speed, I show this for a 200 x 200.

SeedRandom[1234];
dims = 200;
mat = RandomInteger[1, {dims, dims}];
rowsums = Map[Total, mat];
colsums = Total[mat];
Timing[newmat = genBinaryMatrix[rowsums, colsums];]
Map[Total, newmat] == rowsums
Total[newmat] == colsums

(* Out[183]= {1.42914, Null}

Out[184]= True

Out[185]= True *)
POSTED BY: Daniel Lichtblau

Interesting! By starting with random elements within rows, and then adjusting them to create the necessary column sums, the resultant matrix will presumably be fully random within the row and column sum constraints. Therefore one can stop as soon as the constraints are achieved, rather than 'swapping' for a pre-determined number of times to move sufficiently far away from the original matrix, as in the algorithm I am using.

In practice, where my code takes 1.2 seconds to do 10,000 swaps, yours takes an average of 0.8 seconds to achieve the constraints, but with an exponential distribution of actual times. (The longest time was a bit over 4 seconds.)

I need to check if the long runs might have failed to meet the constraints... ;)

POSTED BY: Gareth Russell

What are the requirements of the result? Need it be a succession of these quad flips from a given matrix? Or is it acceptable simply to produce a new binary matrix with prescribed row and column sums?

POSTED BY: Daniel Lichtblau

Just a new matrix. I'm attaching a notebook with what I have so far. It is the MatrixSwap function that is at play here: the MatrixRandomize function calls that (many times) when the option to preserve row and column totals is selected. (Otherwise it calls various simpler functions which don't really need to be optimized further.) I implement the refinement to find quads based on the less common element (1 or 0).

POSTED BY: Gareth Russell

You might get a big speed boost if you use Compile and just rewrite in a procedural programming style.

As for the advice you received, I hope it was good. I’m not sure I’d have thought of it today (and certainly didn’t remember it).

POSTED BY: Daniel Lichtblau
Posted 1 month ago

I’m still thinking about your code and I’m not ready to suggest a new algorithm. But in any case speed will increase if instead of ordinary arrays use SparseArrays.

POSTED BY: Denis Ivanov

I think you are right in a sense: note that in the MatrixSwap function I pull the coordinates of the target elements (1 or 0) using Position, and thereafter work with that list, which is effectively a Sparse Array. (Although I look back at the original matrix for the 'opposite corners'. Maybe it would be faster to do that with a Sparse Array, but I suspect not.)

POSTED BY: Gareth Russell
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract