I think you are looking for symbolic matrix capabilities, and the Wolfram Language can do some nontrivial computations in this difficult area. There are purely symbolic capabilities based on group-theoretical algorithms in TensorReduce and TensorExpand, and there are also indexed-differentiation capabilities in D itself.
I just posted the notebook https://wolframcloud.com/obj/b50c9dd5-bfec-4651-940d-ccf640238be3 in the Wolfram Cloud with some examples of indexed differentiation related to the question you mention at the end of your post. This is an area under active development for us. The results can probably be expressed more compactly using symmetrizations.
Regarding square roots of matrices, I think the problem you pose is too general. Can you restrict it to some family of objects with special properties that could help in finding a solution? I do not immediately understand the information you mention about those loss functions.
Let us verify the identity you found, using the functions defined in the notebook above:
Define your initial and final objects:
A[i_, j_] := KroneckerDelta[i, j] p[i] - p[i] p[j]
B[i_, j_] := KroneckerDelta[i, j] Sqrt[p[i]] - Sqrt[p[i]] p[j]
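In matrix notation these definitions read A = diag(p) - p pᵀ and B = diag(√p) - √p pᵀ (with √p taken componentwise). As a quick sketch outside the Wolfram Language, here is a NumPy check of that equivalence; the particular vector p is my own choice, not from the notebook:

```python
import numpy as np

# Any positive vector p works here; this one is just an example.
p = np.array([0.2, 0.3, 0.5])
n = len(p)

# Componentwise definitions, mirroring A[i_, j_] and B[i_, j_] above.
A = np.array([[(i == j) * p[i] - p[i] * p[j] for j in range(n)] for i in range(n)])
B = np.array([[(i == j) * np.sqrt(p[i]) - np.sqrt(p[i]) * p[j] for j in range(n)]
              for i in range(n)])

# Matrix forms: A = diag(p) - p p^T, B = diag(sqrt(p)) - sqrt(p) p^T.
A_matrix = np.diag(p) - np.outer(p, p)
B_matrix = np.diag(np.sqrt(p)) - np.outer(np.sqrt(p), p)

assert np.allclose(A, A_matrix)
assert np.allclose(B, B_matrix)
```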
Then this is the identity you have found, using the notations in the notebook, and with the simplification functions defined there:
In[]:= A[i, k] - sum[B[j, i] B[j, k], {j, n}] // SumExpand // DeltaSimplify // Simplify
Out[]= -p[i] p[k] (-1 + Inactive[Sum][p[j], {j, n}])
That is zero if p is normalized so that its components add up to 1.
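The same conclusion can be cross-checked numerically outside the Wolfram Language. In matrix terms, sum[B[j, i] B[j, k], {j, n}] is the (i, k) entry of BᵀB, so the identity says A = BᵀB when the components of p add up to 1, with residual -p[i] p[k] (Sum[p[j]] - 1) otherwise. A NumPy sketch (the random vector and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.1, 1.0, size=6)
p /= p.sum()  # normalize so the components add up to 1

A = np.diag(p) - np.outer(p, p)                     # A[i, j]
B = np.diag(np.sqrt(p)) - np.outer(np.sqrt(p), p)   # B[i, j]

# With p normalized, A[i, k] == sum_j B[j, i] B[j, k], i.e. A == B^T . B:
assert np.allclose(A, B.T @ B)

# Without normalization, the residual is -p[i] p[k] (Sum[p[j]] - 1):
q = 2 * p  # components now add up to 2
Aq = np.diag(q) - np.outer(q, q)
Bq = np.diag(np.sqrt(q)) - np.outer(np.sqrt(q), q)
assert np.allclose(Aq - Bq.T @ Bq, -np.outer(q, q) * (q.sum() - 1))
```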