If you know the PDF of the prior, you can compute the posterior for a univariate distribution manually using Bayes' rule. For example, consider a normal distribution with fixed standard deviation \[Sigma] and unknown mean \[Mu], where \[Mu] has a normal prior with hyperparameters m and s. Given an observation x, the posterior is the product of the prior PDF and the likelihood, normalized over \[Mu]:
In[33]:= posterior=ProbabilityDistribution[PDF[NormalDistribution[m,s],\[Mu]]*Likelihood[NormalDistribution[\[Mu],\[Sigma]],{x}],{\[Mu],-\[Infinity],\[Infinity]},Method->"Normalize",Assumptions->(m|x)\[Element]Reals&&(\[Sigma]|s)\[Element]PositiveReals]
Out[33]= ProbabilityDistribution[(E^(-((\[FormalX]-m)^2/(2 s^2))-(-\[FormalX]+x)^2/(2 \[Sigma]^2)+(m-x)^2/(2 (s^2+\[Sigma]^2))) Sqrt[s^2+\[Sigma]^2])/(Sqrt[2 \[Pi]] s \[Sigma]),{\[FormalX],-\[Infinity],\[Infinity]},Assumptions->(m|x)\[Element]\[DoubleStruckCapitalR]&&(\[Sigma]|s)\[Element]\[DoubleStruckCapitalR]&&\[Sigma]>0&&s>0]
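This result is correct, but it is messy: the system returns a raw PDF expression rather than recognizing it as a normal distribution. Its mean and variance, at least, come out in a clean form: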
In[39]:= {Mean,Variance}[posterior] // Through
Out[39]= {(s^2 x+m \[Sigma]^2)/(s^2+\[Sigma]^2),(s^2 \[Sigma]^2)/(s^2+\[Sigma]^2)}
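The mean and variance above are those of the standard normal-normal conjugate update, so the posterior should itself be a normal distribution. As a quick check, the sketch below builds that normal distribution by hand and compares its PDF with the computed one. The name conjugate is introduced here for illustration only, and the assumptions mirror the ones passed to ProbabilityDistribution earlier. If the two densities agree, FullSimplify should reduce the comparison to True.

```wolfram
(* Hand-written conjugate posterior: a normal distribution whose mean and
   variance are the ones returned by {Mean, Variance}[posterior] above.
   NormalDistribution takes a standard deviation, hence the square root. *)
conjugate = NormalDistribution[
   (s^2 x + m \[Sigma]^2)/(s^2 + \[Sigma]^2),
   s \[Sigma]/Sqrt[s^2 + \[Sigma]^2]];

(* Compare the two densities symbolically under the same assumptions
   used when constructing posterior, plus \[Mu] being real. *)
FullSimplify[
 PDF[posterior, \[Mu]] == PDF[conjugate, \[Mu]],
 Assumptions -> (m | x | \[Mu]) \[Element] Reals && (\[Sigma] | s) \[Element] PositiveReals]
```

Once the match is confirmed, conjugate can be used in place of posterior wherever a named distribution is more convenient, for example when sampling or when chaining further updates.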
The Wolfram Language has no built-in knowledge of conjugate distributions, and outside of a few cases it does not seem able to recognize when a derived distribution has a known special form.
One feature that would make these kinds of calculations significantly easier, I believe, would be the ability to recognize exponential families automatically. This would allow the automatic calculation of many conjugate prior distributions, their update equations, and posterior predictive distributions.
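For reference, the textbook structure that such a feature would exploit can be written as follows. This is the standard exponential-family formulation, not anything taken from the Wolfram documentation, and the symbols \eta, T, A, h, \chi and \nu are notation introduced here.

```latex
\begin{align*}
% Likelihood in exponential-family form
p(x \mid \eta) &= h(x)\,\exp\!\big(\eta^{\top} T(x) - A(\eta)\big) \\
% Conjugate prior with hyperparameters \chi and \nu
p(\eta \mid \chi, \nu) &\propto \exp\!\big(\eta^{\top}\chi - \nu\,A(\eta)\big) \\
% Posterior after n observations: same form, updated hyperparameters
p(\eta \mid x_1,\dots,x_n,\chi,\nu) &\propto
  \exp\!\Big(\eta^{\top}\big(\chi + \textstyle\sum_{i=1}^{n} T(x_i)\big) - (\nu + n)\,A(\eta)\Big)
\end{align*}
```

The update is therefore just \chi \to \chi + \sum_i T(x_i) and \nu \to \nu + n. In the normal example with fixed \sigma, one choice is \eta = \mu/\sigma^2, T(x) = x and A(\eta) = \sigma^2 \eta^2 / 2, which makes the conjugate prior Gaussian in \mu, in line with the result computed above.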
Additionally, identifying conjugate priors would allow the system to automatically and efficiently fit posterior predictive distributions and more advanced clustering models using the EM algorithm.
One final note is that the Wolfram Language cannot do symbolic matrix calculus when the number of dimensions isn't specified, which makes working with multivariate distributions somewhat of a chore. For example, the same calculation for a multivariate normal specified using matrix algebra doesn't work, since the Wolfram Language can't integrate over symbolic tensors to obtain the normalizing constant.