Constructing correlated vectors

Suppose one has a list of vectors v1, v2, ..., vn. One wants to construct a new vector w such that the correlation between w and v1 is r1, the correlation between w and v2 is r2, ..., and the correlation between w and vn is rn. How do you do it? Here's code, adapted from an R version, that does it for the special case n = 1, but I haven't figured out how to extend it to n = 2, n = 3, etc. I am pretty confident there will be vectors v1, v2, ..., vn and correlations r1, r2, ..., rn for which no solution exists, but I am also confident there will be situations where such a vector does exist.

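simcor[x_, ymean_, ysd_, \[Rho]_ : 0] :=
 Module[{y, z, lmf},
  (* random draw that supplies the part of the new vector unrelated to x *)
  y = RandomVariate[NormalDistribution[], Length[x]];
  (* regress y on a constant and x; the residuals are uncorrelated with x *)
  lmf = LinearModelFit[{Transpose[{ConstantArray[1, Length[x]], x}], 
     y}];
  (* mix standardized x and standardized residuals so the correlation is \[Rho] *)
  z = \[Rho]*Standardize[x] + 
    Sqrt[1 - \[Rho]^2] Standardize[lmf["FitResiduals"]]; 
  (* rescale to the requested mean and standard deviation *)
  ymean + ysd*z
  ]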
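
A short way to see why the n = 1 construction hits the target exactly, written with notation introduced here for illustration (m is the length of x, e is the vector of fit residuals, and a tilde marks a standardized vector):

```latex
\begin{aligned}
\tilde{x} &= \frac{x - \bar{x}}{s_x}, \qquad
\tilde{e} = \frac{e - \bar{e}}{s_e}, \qquad
z = \rho\,\tilde{x} + \sqrt{1-\rho^2}\,\tilde{e}, \\
\frac{\tilde{x}\cdot\tilde{e}}{m-1} &= 0
\quad\text{(residuals are orthogonal to the constant and to } x\text{)}, \\
\operatorname{Var}(z) &= \rho^2 + (1-\rho^2) = 1, \qquad
\operatorname{Corr}(x, z) = \frac{\tilde{x}\cdot z}{m-1} = \rho .
\end{aligned}
```

The random draw y only determines which of the many vectors with correlation ρ is returned; the correlation itself does not depend on it.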

Bonus: the new vector here is normally distributed, since it's the sum of normally distributed variables. But what if one wants the new vector to be a 0-1 variable instead? Any ideas, in either the one-dimensional or the multi-dimensional case, for how one could specify some correlation measure between the continuous variables v1, ..., vn and the binary variable w and have a vector w pop out with the requisite correlation measure(s)? I'm wondering whether one could do some sort of thresholding, or use logistic regression with some other form of residuals in place of linear regression and FitResiduals. But, as you can probably tell, I really have no idea.
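
One possible reading of the thresholding idea for the one-dimensional case, offered as a rough sketch rather than a settled answer: build a latent continuous vector as in simcor, cut it at a quantile so that a chosen fraction of entries become 1, and search over the latent correlation until the Pearson (point-biserial) correlation between x and the 0-1 vector is as close as possible to the target. The name binaryCor, the parameter p (desired fraction of ones, assumed strictly between 0 and 1), and the grid search are all assumptions made for this illustration.

```wolfram
(* binaryCor is a hypothetical helper: x is the fixed vector, p the desired
   fraction of ones (0 < p < 1), target the desired Pearson correlation *)
binaryCor[x_?(VectorQ[#, NumericQ] &), p_, target_] :=
 Module[{n = Length[x], y, lmf, sx, e, wOf, grid, best},
  y = RandomVariate[NormalDistribution[], n];
  lmf = LinearModelFit[{Transpose[{ConstantArray[1, n], x}], y}];
  sx = Standardize[x];
  e = Standardize[lmf["FitResiduals"]];
  (* latent vector with correlation r to x, thresholded at its (1 - p) quantile *)
  wOf[r_] := With[{z = r sx + Sqrt[1 - r^2] e},
    Boole[# > Quantile[z, 1 - p]] & /@ z];
  (* coarse grid search over the latent correlation *)
  grid = Range[-0.99, 0.99, 0.01];
  best = First[MinimalBy[grid, Abs[Correlation[x, wOf[#]] - target] &]];
  wOf[best]
  ]

(* usage: the result should have about 30% ones and a correlation near 0.4 *)
SeedRandom[2];
xs = RandomVariate[NormalDistribution[], 500];
wb = binaryCor[xs, 0.3, 0.4];
{N[Mean[wb]], Correlation[xs, wb]}
```

Thresholding attenuates correlation, so targets close to 1 or -1 are out of reach for a 0-1 vector; the search then simply returns the closest value it can find. The result is approximate, not exact, because the correlation changes in small jumps as entries cross the threshold.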

Note: I know one can use CopulaDistribution to generate vectors with various correlation matrices, and the resulting variables should be correlated. But that isn't quite my problem. One can't, so far as I know, use RandomVariate on a CopulaDistribution to hold all but one of the vectors fixed and generate a new vector.

All help appreciated!

POSTED BY: Seth Chandler
2 Replies

You are looking for a vector that forms angles with given cosines to given vectors. If you have n vectors of dimension n+1 then you can get exact (possibly complex-valued) solutions.

You can start by normalizing the given vectors and moreover enforce that the result is normalized. Here is a simple example. Start by creating a random pair.

vecsa = RandomReal[{-1, 1}, {2, 3}];
vecs = Map[#/Norm[#] &, vecsa]

(* Out[428]= {{0.987183, -0.0620831, 0.147023}, {0.727193, 0.259849, -0.635349}} *)

We'll work with random correlations as well.

corrs = RandomReal[{-1, 1}, 2]

(* Out[429]= {-0.74406, -0.658859} *)

Create a new vector and appropriate equations.

newvec = Array[x, 3];
polys = Flatten[{newvec . newvec - 1, vecs . newvec - corrs}]

(* Out[433]= {-1 + x[1]^2 + x[2]^2 + x[3]^2, 
 0.74406 + 0.987183 x[1] - 0.0620831 x[2] + 0.147023 x[3], 
 0.658859 + 0.727193 x[1] + 0.259849 x[2] - 0.635349 x[3]} *)

Now we can solve for this vector.

NSolve[polys == 0, newvec]

(* Out[436]= {{x[1] -> -0.776951, x[2] -> -0.620562, 
  x[3] -> -0.106063}, {x[1] -> -0.775027, x[2] -> 0.518085, 
  x[3] -> 0.361831}} *)
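
A note on matching Correlation exactly: Pearson correlation is the cosine of the angle between vectors after their means are subtracted, so the dot products above equal correlations only for centered vectors. A minimal sketch of the centered variant follows, with illustrative data chosen here (not from the example above). The extra equation Total[newvec] == 0 costs one dimension, so n vectors of length n + 2 give isolated solutions.

```wolfram
(* illustrative data chosen for this sketch *)
vecs = N[{{1, 2, 3, 4}, {1, -1, 2, 0}}];
corrs = {0.3, -0.2};

(* center, then normalize, so dot products become Pearson correlations *)
cvecs = Map[(# - Mean[#])/Norm[# - Mean[#]] &, vecs];

(* unit-length, zero-mean unknown vector with the prescribed dot products *)
newvec = Array[x, 4];
polys = Flatten[{newvec . newvec - 1, Total[newvec], cvecs . newvec - corrs}];
sols = NSolve[Thread[polys == 0], newvec, Reals];

(* each real solution should reproduce corrs, up to rounding *)
Table[Correlation[#, newvec /. s] & /@ vecs, {s, sols}]
```

Restricting NSolve to Reals drops the complex-valued solutions; an empty result then signals that no real vector has the requested correlations.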
POSTED BY: Daniel Lichtblau

Some additional research and some inspired guessing have led to progress. Here is a generalization of the code to the case of multiple vectors. If the vectors in v are perfectly orthogonal, then the code appears to generate a vector that has exactly the desired correlation coefficients. If the vectors in v are only roughly orthogonal to each other, the code generates a vector whose correlation coefficients are close to what was requested. I wish I could understand/explain better why this works.

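simcor2[v_?(MatrixQ[#, NumericQ] &), ymean_, 
  ysd_, \[Rho]_?(VectorQ[#, NumericQ] &)] :=
 (* v holds one vector per row; \[Rho] holds one target correlation per row *)
 Module[{n = Dimensions[v][[2]], y, z, lmf},
  y = RandomVariate[NormalDistribution[0, 1], n];
  (* regress y on a constant and every row of v; residuals are uncorrelated with all of them *)
  lmf = LinearModelFit[{Transpose[Prepend[v, ConstantArray[1, n]]], 
     y}];
  (* weighted sum of standardized rows and residuals; needs Total[\[Rho]^2] <= 1 to stay real *)
  z = Append[\[Rho], Sqrt[1 - Total[\[Rho]^2] ]] . 
    Map[Standardize, Append[v, lmf["FitResiduals"]]];
  ymean + ysd*z
  ]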
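
A possible explanation, and a sketch of a fix for non-orthogonal vectors. Write R for the correlation matrix of the rows of v. With the weights used in simcor2, the correlation between the result and row j works out to (R.ρ)_j / Sqrt[ρ.R.ρ + 1 - ρ.ρ], which equals ρ_j only when R is the identity matrix, that is, when the rows are mutually uncorrelated. Replacing the weights ρ by a = R^-1.ρ and the residual weight by Sqrt[1 - ρ.a] should make the correlations exact whenever ρ.a <= 1; when ρ.a > 1 no real vector has the requested correlations. The name simcorExact is hypothetical and the code is a sketch under these assumptions.

```wolfram
(* simcorExact is a hypothetical name; v holds one vector per row, as in simcor2 *)
simcorExact[v_?(MatrixQ[#, NumericQ] &), ymean_, ysd_,
  rho_?(VectorQ[#, NumericQ] &)] :=
 Module[{n = Dimensions[v][[2]], s, r, a, q, y, lmf, e},
  s = Standardize /@ v;            (* rows with mean 0 and sample sd 1 *)
  r = Correlation[Transpose[v]];   (* correlation matrix of the given vectors *)
  a = LinearSolve[r, rho];         (* weights that undo the cross-correlations *)
  q = rho . a;                     (* share of variance tied to the given vectors *)
  If[q > 1,
   $Failed,                        (* the requested correlations are infeasible *)
   y = RandomVariate[NormalDistribution[], n];
   lmf = LinearModelFit[{Transpose[Prepend[v, ConstantArray[1, n]]], y}];
   e = Standardize[lmf["FitResiduals"]];
   ymean + ysd (a . s + Sqrt[1 - q] e)]
  ]

(* usage: two deliberately correlated vectors *)
SeedRandom[1];
v1 = RandomReal[{0, 1}, 100];
v2 = v1 + RandomReal[{0, 1}, 100];
w = simcorExact[{v1, v2}, 0, 1, {0.5, 0.3}];
{Correlation[v1, w], Correlation[v2, w]}  (* should match {0.5, 0.3} up to rounding *)
```

When the rows of v are uncorrelated, R is the identity, a equals ρ, and the sketch reduces to simcor2.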

The bonus question is still very much up for grabs. How do I take real-valued vectors and generate a binary vector whose correlations (measured in some reasonable way) with those real-valued vectors meet some specification?

POSTED BY: Seth Chandler