Group Abstract

Message Boards

WOLFRAM COMMUNITY

7.3K Views

1 Reply

8 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Staff Picks Mathematics Mathematica Add-Ons Algebra Wolfram Language Packages Statistics and Probability Computer-Based Maths

Obtaining Gaussian expectation formulas with einsums and TensorSimplify

Yaroslav Bulatov

Yaroslav Bulatov, Together AI

Posted 4 years ago

I wanted to share an application of Carl Woll's TensorSimplify/einsum utilities, which I first discovered here and I think should be known more widely. There's a number of special formulas involving expectations of Gaussian variables, see section 20.5.2 of Seber "A Matrix Handbook for Statisticians". For instance, the following holds for a general matrix $A$ and $x$ distributed as zero-centered Gaussian with covariance $\Sigma$: $$E[(x'Ax)^3]=\left[\text{trace}(A\Sigma)\right]^3 + 6\ \text{trace}(A\Sigma)\text{trace}\left[(A\Sigma)^2\right]+8\ \text{trace}\left[(A\Sigma)^3\right]$$ These was painstakingly derived by hand, but a more general and gentler approach is possible with Mathematica. As a motivating example, I'll show how to use Mathematica to get a formula for $E[xx'Axx']$ which I needed for my work but couldn't find in textbooks. Use Einstein's summation notation to write: $$E[xx'Axx']_{il}=M_{ijkl}A_{jk}$$ Where $M_{ijkl}$ is the 4th moment tensor $E[x_i x_j x_k x_l]$ The next trick is the use Isserlis/Wicks Theorem. For centered Gaussian variable $x$, Isserlis theorem tells us that $$E[x_i x_j x_k x_l]=E[x_i x_k]E[x_j x_l] +E[x_i x_l]E[x_j x_k]+ E[x_i x_j]E[x_k x_l] $$ Right-hand side of this formula is easier to remember in diagrammatic notation Now substitute this into original formula involving $M$ to get three different einsums, and use TensorSimplify package to obtain matrix expressions for each of them. However, if we need non-centered Gaussian variables, Isserlis theorem doesn't apply. In such case, use Mathematica to derive a more general form of Isserlis. Isserlis theorem is a special case of a general law linking cumulants and moments together - the general compositional formula (section 2.3.4 of McCullough book). This fundamental equation, linking moments and cumulants, is implemented in Mathematica's `MomentConvert` function. To use this function here use the fact that cumulants of order above 2 are zero in a Gaussian. This property is the defining property of a Gaussian, and number "2" is special. This property is the reason why Gaussian expectations have nice algebraic expressions. To use it in practice, use Mathematica's `MomentConvert` function to convert your target moment to cumulants, set to zero all cumulants above order 2, `MomentConvert` each of the cumulant terms back to moments, then `Simplify`. In our example involves 4th moment tensor $M_{ijkl}$ so we start with `Moment[{1,1,1,1}]`, and get the following fourth moment formula in terms of first and second moments: $$E[x_i x_j x_k x_l]=E[x_i x_j]E[x_k x_l] + E[x_i x_k]E[x_j x_l]+E[x_i x_l]E[x_j x_k]-2 E[x_i]E[x_j]E[x_k]E[x_l] $$ Equivalently in diagrammatic notation: You can see that it is almost the Isserlis theorem for centered Gaussian, but with an extra correction term involving mean $E[x_i]=\mu$. Now plug this formula into original einsum involving $M$, distribute summation to get 4 einsums, and use `FromTensor` from the `TensorSimplify` package to obtain matrix expressions for each of them. For instance to get matrix expression for mean term, run this command: `FromTensor@TensorReduce@ einsum[{{2, 3}, {1}, {2}, {3}, {4}} -> {1, 4}, A, mu, mu, mu, mu];` The result corresponds to the following identity: $$\sum_{jk} A_{jk} \mu_i \mu_j \mu_k \mu_l=\mu^T A\mu (\mu\otimes \mu) $$ Running this for all 4 terms, we get the following formula $$E[xx'Axx']=E[xx']AE[xx']+E[xx']A^T E[xx'] + E[xx'] \text{Tr}(A^T E[xx'])-2 \mu^T A\mu \mu\otimes \mu$$ As a correctness check, create a multivariate Gaussian with random integer parameters, use `Expectation` to compute expectations, and use `Reduce` to ensure exact equality with our formula. This package saved me time in deriving formulas needed to obtain heuristics needed to properly set learning rate hyper-parameter for my neural network training experiments, here's a write-up of underlying theory. Below are related useful identities I obtained using this approach (background: SE post,Google doc). $$ \begin{eqnarray} X^2&=&E[xx']\\ E[(xx')(xx')]&=&2X ^2+X^2 \text{Tr} X^2 -2\\|\mu\\|^2 \mu \mu'\\ E[(x'x)(xx')]&=&E[(xx')(xx')]\\ E[xx'\otimes xx']&=&2P_d( X^2\otimes X^2) + \text{vec} X^2 (\text{vec} X^2)'-2(\mu \mu')\otimes (\mu \mu')\\ E[\langle x, x\rangle^2]&=&2\text{tr}(X^2 X^2)+\text{tr}(X^2)^2-2\\|\mu\\|^4\\ E[\langle x, y\rangle^2]&=&\text{jointly Gaussian x,y: formula involving }E[xx'],E[yy'],E[xy'] \end{eqnarray} $$ This is a semi-automatic approach. A useful contribution would be a utility that could automatically derive such formulas (from this post and from Seber, Section 20.5) Attachments: forum-tensor-reduce.nb

POSTED BY: Yaroslav Bulatov

1 Reply

Sort By:

EDITORIAL BOARD

EDITORIAL BOARD, WOLFRAM

Posted 4 years ago

-- you have earned *Featured Contributor Badge* Your exceptional post has been selected for our editorial column *Staff Picks* http://wolfr.am/StaffPicks and Your Profile is now distinguished by a *Featured Contributor Badge* and is displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback