Message Boards Message Boards

GROUPS:

Obtaining Gaussian expectation formulas with einsums and TensorSimplify

Posted 3 months ago
837 Views
|
1 Reply
|
7 Total Likes
|

I wanted to share an application of Carl Woll's TensorSimplify/einsum utilities, which I first discovered here and I think should be known more widely.

There's a number of special formulas involving expectations of Gaussian variables, see section 20.5.2 of Seber "A Matrix Handbook for Statisticians". For instance, the following holds for a general matrix $A$ and $x$ distributed as zero-centered Gaussian with covariance $\Sigma$:

$$E[(x'Ax)^3]=\left[\text{trace}(A\Sigma)\right]^3 + 6\ \text{trace}(A\Sigma)\text{trace}\left[(A\Sigma)^2\right]+8\ \text{trace}\left[(A\Sigma)^3\right]$$

These was painstakingly derived by hand, but a more general and gentler approach is possible with Mathematica.

As a motivating example, I'll show how to use Mathematica to get a formula for $E[xx'Axx']$ which I needed for my work but couldn't find in textbooks.

Use Einstein's summation notation to write:

$$E[xx'Axx']_{il}=M_{ijkl}A_{jk}$$

Where $M_{ijkl}$ is the 4th moment tensor $E[x_i x_j x_k x_l]$

The next trick is the use Isserlis/Wicks Theorem. For centered Gaussian variable $x$, Isserlis theorem tells us that

$$E[x_i x_j x_k x_l]=E[x_i x_k]E[x_j x_l] +E[x_i x_l]E[x_j x_k]+ E[x_i x_j]E[x_k x_l] $$

Right-hand side of this formula is easier to remember in diagrammatic notation enter image description here

Now substitute this into original formula involving $M$ to get three different einsums, and use TensorSimplify package to obtain matrix expressions for each of them.

However, if we need non-centered Gaussian variables, Isserlis theorem doesn't apply. In such case, use Mathematica to derive a more general form of Isserlis.

Isserlis theorem is a special case of a general law linking cumulants and moments together - the general compositional formula (section 2.3.4 of McCullough book). This fundamental equation, linking moments and cumulants, is implemented in Mathematica's MomentConvert function.

To use this function here use the fact that cumulants of order above 2 are zero in a Gaussian. This property is the defining property of a Gaussian, and number "2" is special. This property is the reason why Gaussian expectations have nice algebraic expressions.

To use it in practice, use Mathematica's MomentConvert function to convert your target moment to cumulants, set to zero all cumulants above order 2, MomentConvert each of the cumulant terms back to moments, then Simplify. In our example involves 4th moment tensor $M_{ijkl}$ so we start with Moment[{1,1,1,1}], and get the following fourth moment formula in terms of first and second moments:

$$E[x_i x_j x_k x_l]=E[x_i x_j]E[x_k x_l] + E[x_i x_k]E[x_j x_l]+E[x_i x_l]E[x_j x_k]-2 E[x_i]E[x_j]E[x_k]E[x_l] $$

Equivalently in diagrammatic notation: enter image description here

You can see that it is almost the Isserlis theorem for centered Gaussian, but with an extra correction term involving mean $E[x_i]=\mu$.

Now plug this formula into original einsum involving $M$, distribute summation to get 4 einsums, and use FromTensor from the TensorSimplify package to obtain matrix expressions for each of them.

For instance to get matrix expression for mean term, run this command: FromTensor@TensorReduce@ einsum[{{2, 3}, {1}, {2}, {3}, {4}} -> {1, 4}, A, mu, mu, mu, mu]; The result corresponds to the following identity:

$$\sum_{jk} A_{jk} \mu_i \mu_j \mu_k \mu_l=\mu^T A\mu (\mu\otimes \mu) $$

Running this for all 4 terms, we get the following formula

$$E[xx'Axx']=E[xx']AE[xx']+E[xx']A^T E[xx'] + E[xx'] \text{Tr}(A^T E[xx'])-2 \mu^T A\mu \mu\otimes \mu$$

As a correctness check, create a multivariate Gaussian with random integer parameters, use Expectation to compute expectations, and use Reduce to ensure exact equality with our formula.

This package saved me time in deriving formulas needed to obtain heuristics needed to properly set learning rate hyper-parameter for my neural network training experiments, here's a write-up of underlying theory.

Below are related useful identities I obtained using this approach (background: SE post,Google doc). $$ \begin{eqnarray} X^2&=&E[xx']\\ E[(xx')(xx')]&=&2X ^2+X^2 \text{Tr} X^2 -2\|\mu\|^2 \mu \mu'\\ E[(x'x)(xx')]&=&E[(xx')(xx')]\\ E[xx'\otimes xx']&=&2P_d( X^2\otimes X^2) + \text{vec} X^2 (\text{vec} X^2)'-2(\mu \mu')\otimes (\mu \mu')\\ E[\langle x, x\rangle^2]&=&2\text{tr}(X^2 X^2)+\text{tr}(X^2)^2-2\|\mu\|^4\\ E[\langle x, y\rangle^2]&=&\text{jointly Gaussian x,y: formula involving }E[xx'],E[yy'],E[xy'] \end{eqnarray} $$

This is a semi-automatic approach. A useful contribution would be a utility that could automatically derive such formulas (from this post and from Seber, Section 20.5)

Attachments:

enter image description here -- you have earned Featured Contributor Badge enter image description here Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and Your Profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract