Message Boards Message Boards

2
|
4417 Views
|
2 Replies
|
6 Total Likes
View groups...
Share
Share this post:

Discovering symbolic square roots?

Posted 2 years ago

Is there any technology in Mathematica (or even as pure research), that would let me discover formulas for square root of a matrix specified as a formula?

For example for any vector with positive entries adding up to 1, the following factorization seems to hold

$$A=\text{diag}(p)-pp'=B^T B$$

where

$$B=\text{diag}(\sqrt{p}) - (\sqrt{p}) p^T$$

As you can check with the code below: p = {1, 10, 3}; p = With[{q = Abs[p]}, q/Total[q]]; (* positive + normalized *) A = DiagonalMatrix[p] - Outer[Times, p, p]; B = DiagonalMatrix[Sqrt[p]] - Outer[Times, Sqrt[p], p]; A == Transpose[B] . B

I got this formula by expanding the components and staring at them for a while. This obviously doesn't scale. This was a factorization of cross-entropy hessian, and there are at least 20 more loss functions I'd like to do .... can anyone think of a way such formulas could be discovered automatically?

This begs the question of what "formula" is....if we restrict attention to factorizations of Hessians, a symbolic way to derive them is this . You can see that if your original loss was specified in terms of pointwise ops and some tensor contractions, then its Hessian will consist of a sum of terms with some pointwise ops and some tensor contractions.

The author posted formula for 3rd derivative of cross-entropy loss on Mathematica stack-exchange so you can see that small formulas remain small after differentiation.

POSTED BY: Yaroslav Bulatov
2 Replies

I think you are looking for symbolic matrix capabilities, and the Wolfram Language can do some nontrivial computations on this difficult area. There are purely symbolic capabilities based on group-theoretical algorithms in TensorReduce and TensorExpand, and then there are also indexed-differentiation capabilities in D itself.

I just posted the notebook https://wolframcloud.com/obj/b50c9dd5-bfec-4651-940d-ccf640238be3 in the Wolfram Cloud with some examples of indexed differentiation in relation to the question you mention at the end of your post. This is an area under development for us. The results can probably be expressed in a more compact way using symmetrizations.

Regarding square roots of matrices, I think the problem you pose is too general. Can you restrict it to some family of objects with some special properties that can help finding a solution? I do not understand immediately the information you mention about those loss functions.

Let us show the identity you found using the functions defined in the notebook above:

Define your initial and final objects:

A[i_, j_] := KroneckerDelta[i, j] p[i] - p[i] p[j]

B[i_, j_] := KroneckerDelta[i, j] Sqrt[p[i]] - Sqrt[p[i]] p[j]

Then this is the identity you have found, using the notations in the notebook, and with the simplification functions defined there:

In[]:= A[i, k] - sum[B[j, i] B[j, k], {j, n}] // SumExpand // DeltaSimplify // Simplify
Out[]= -p[i] p[k] (-1 + Inactive[Sum][p[j], {j, n}])

That's zero if p is normalized to have its components to add 1.

Thanks for the in-depth reply!

It's intriguing that your pSimplify can discover compact formulas for higher derivatives of cross-entropy, is there an explanation of how it works?

To respond to your question, I don't have a good definition of the scope, but roughly speaking, taking a second derivative of a small algebraic expression like cross-entropy above, is there way to obtain A=BB' expression in the example above automatically?

POSTED BY: Yaroslav Bulatov
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract