# Message Boards

Answer
(Unmark)

GROUPS:

This project explores the potential of the Meijer-G function to learn a black-box process and perform symbolic regression on a given set of data generated by such a process. The black-box process can be a system for which we have limited understanding, such as a neural network, or it can be any natural or artificial phenomenon that leads to the generation of data and for which we wish to increase our comprehension. The basis for this work is the referenced article where the authors propose a novel method for modeling data that comprises of the repetitive fitting of Meijer-G terms in order to capture the behaviour present in the data. The use of Meijer-G as a ridge function is motivated by its excellent analytic characteristics and its great expressibility that, especially for common functions, is usually achieved through only a small number of parameters. The same characteristics allow the Meijer-G based models to learn and reveal hidden information about the process that generated the data, ranging from an explanation of how the different factors contribute to the result to the actual law that generates the data. All these provide excellent interpolation and extrapolation capabilities that are highly sought after in modeling and regression analysis. In this project, we implemented the algorithm presented in the referenced article and utilizing the advanced optimization capabilities of Mathematica, we explore some key examples that illustrate both the power of this approach as well as recognize important issues and challenges that lay ahead and may lead to even greater future improvements.
Theory of the Meijer-G Function The Meijer-G function is one of the most generic special functions that can express all the elementary functions as well as many advanced special functions like many kinds of Hypergeometric functions, Bessel functions etc. The main definition of the Meijer-G function employes a contour integration over the complex plane. mn G pq
1 2πi Γ(1- a 1 a n b 1 b m Γ( a n+1 a p b m+1 b q -s z where 0≤m≤q 0≤n≤p m n p q a i b j i∈{1,...,n} j∈{1,...,m} z≠0 Γ( b i i∈{1,...,m} Γ(1- a i i∈{1,...,n} In order to see the scope of this function, we present some of the better known expressions that cover the usual elementary functions. z 1,0 G 0,1
Sin(z)= π 1,0 G 0,2 2 z 4
Sinh(z)=- π 1,0 G 0,2 2 z 4
ArcTan(z)= 1 2 1,2 G 2,2 2 z
Log(1+z)= 1,2 G 2,2
Also, most of the advanced special functions are also supported like the lower and upper incomplete gamma functions γ(a,z) 1,1 G 1,2
Γ(a,z) 2,0 G 1,2
the Bessel function of the first and second kind along with their modified variants
as well as most hypergeometric functions like the Gauss’ hypergeometric function or even the generalized hypergeometric function p F q p>q+1
In addition to this wide functional coverage, Meijer-G is characterized by excellent analytic properties like the h-th order derivative, being simply h z h d h dz mn G pq
mn+1 G p+1q+1
as well as the celebrated convolution theorem ∞ ∫ 0 mn G pq
μν G στ
1 η n+μ,m+ν G q+σ,p+τ ω η
1 ω m+ν,n+μ G q+τ,q+σ η ω
that allows for direct definite integration of a vast range of functions.
Meijer-G as a Symbolic Model The properties presented in the previous section show that Meijer-G is an excellent function to be used for modeling, learning and symbolic regression purposes. In particular, when we are presented with a set of data, we are interested in finding a symbolic expression that fits the data as good as possible. Linear regression is the most prominent representative of this class of problems, where one tries to balance a line (or more generally a hyperplane) through the data in an optimal way to minimize the approximation error that is left out. Symbolic regression is a more advanced kind of regression where, instead of trying to fit a simple line, one tries to discover the best symbolic expression among a particular class, that best approximates the data. If this class is large enough and contains the usual functions that commonly appear in practice, one can even hope to discover the actual law that generated the data. In this sense, Meijer-G has all the desirable characteristics in order to achieve this goal. It has a vast coverage of functions and can express all the common ones typically encountered in practical problems. What is more, the Meijer-G is an analytic function over the whole complex domain, with the possible exception of the origin z=0 z=1 In this project, we implement the above algorithm in Wolfram Language and study it from the perspective of symbolic regression. In particular, we are interested in studying its behavior and assess its capability to model various kinds of data. Furthermore, we want to examine its capacity to discover the function that generated the data. For the purposes of this exposition within the time confines of the summer school, we limit ourselves to the study only some specific, simple examples that serve as a proof of concept for the suitability of Meijer-G in performing symbolic regression and also allow us to recognize some key issues, that need further investigation.
The Basic Theorem The fundamental result showed in the referenced paper is that it takes only a small number of Meijer-G families in order to cover all the important functions. In particular, the theorem states that: Consider the set of Meijer-G functions of the form f mn G pq r z
where a 1 a p, b 1 b q r,s∈ (m,n,p,q)∈={(1,0,0,2),(0,1,3,1),(2,1,2,3),(2,2,3,3),(2,0,1,3)} f(z)=Φ(w l z t z with w,l,t∈ Φ∈id,sin,cos,sinh,cosh,exp,log(1+.),arcsin,arctan, J n Y n I n 1 1+. J n Y n I n Γ In particular, the first family covers the functions sin,cos,sinh cosh id exp Γ log(1+.),arcsin arctan J n Y n I n
The Fitting Algorithm: Symbolic Pursuit The fitting algorithm is called symbolic pursuit and conceptually, it is very close to the well-known projection pursuit. The main idea is that we have an error or more properly a residual function whose norm expresses how far we are from actually capturing the data. What we to do then is an attempt to fit a model function to the residual, in order to “explain” a part of it, and make the residual as small as possible. The model function in our case is the Meijer-G function. Since our model function will typically be unable to fully explain the whole residual function, we repeatedly add more terms trying to further reduce (or “hunt down”) the residual, one step at a time and hopefully in the most optimal way possible. The process stops when the improvement in the residual has become too small (here we select a relative improvement of less that 10%), effectively meaning than what is left is probably beyond the reach of even a linear combination of our model functions. The reason for saying “probably” is because the optimization problem in each step is non-convex and the search surface may contain regions that have not been explored which may lead to better approximations than what we discovered.The way the method supports multidimensional data is by utilizing each Meijer-G function as a ridge function, passing the data first through an affine transformation before handing the over to the Meijer-G. The general form of the affine transformation is v 1 x 1 v n x n v i x i + <v,x> v n +
Implementing Symbolic Pursuit Our code first starts with initializing some helper structures and functions that allow us to easily build up and access the different Meijer-G families symbolically. In particular, H is a list of four associations giving the hyperparameters (m,n,q,p) H=Association@@MapThread[Rule,{{"m","n","p","q"},#}]&/@{{1,0,0,2},{0,1,3,1},{2,1,2,3},{2,2,3,3},{2,0,1,3}}; Then we construct the function meijerG that accepts a single argument from 1 to 5, returning the corresponding Meijer-G function, with hyperparameters set to their corresponding family values and containing the parameters in symbolic form. The function returned is a pure function. meijerG[f_]:=(MeijerG[{Table[ a i a i b i b i The parameters in symbolic form for each Meijer-G function (including the coefficients of the affine transformation) are generated using the parameters function which accepts as arguments the Meijer-G family as well as the dimensions of the data. parameters[f_,dim_]:=Flatten@{w,Table[ v i a i a i b i b i The final helper function is lossF that returns the loss for a particular Meijer-G term in symbolic form. The symbolic quantities are the Meijer-G parameters, the affine transform coefficients and the weight of the Meijer-G. The lossF function takes as arguments the explanatory variables, the corresponding value of the response variables (which typically are the values of the residual function) and the family of the used Meijer-G. Note that the Max function in the affine transformation is used to keep the computation away from a potential singularity at zero. The loss function that is evaluated and returned is the mean squared difference between the response and the estimated Meijer-G model values. lossF[xTrain_List,yTrain_List,f_Integer]:=Module{dim,n,vVec,zVec,affine},dim=Length[xTrain[[1]]];n=Length[xTrain];(*AffineTransformCoefficients*)vVec=Table[ v i z i vVec.zVec Norm[vVec]Sqrt[dim] 2 (yTrain[[i]]-wmeijerG[f][affine]/.MapThread[Rule,{zVec,xTrain[[i]]}]) The first important function of our implementation has the task to fit a single Meijer-G function to a particular set of data. It is called FitMeijer and expects as arguments the values of the explanatory variables in xTrain, the response (or residual function) values in yTrain, the families the Meijer-G over which to search for an optimal fit in fCheck and whether the search method should be done "Locally", "Globally" or using "Both" approaches. If these two arguments are not provided, the method will search among all the five families using both local and global search and return the best result it finds. Local and Global search is performed using Wolfram Languages’ own FindMinimum and NMinimize functions respectively. The authors of the referenced paper opt to use Conjugate Gradient Descent, but we choose to let Mathematica’s methods to decide the best optimization algorithm for each case. The user has also the option to define the accuracy and precision goals for the optimization methods using AG and PG, in order to tweak them towards better and slower or quicker but less accurate results. The default value we have selected for these arguments is 3 digits, which usually give a pretty good tradeoff between accuracy/precision and time. We give some additional explanations as comments in the code that follows. The function returns the chosen fitted term as a list containing the family, the loss, and a list of rules connecting the symbolic parameters to their optimal values. FitMeijerG[xTrain_List,yTrain_List,fCheck_:{1,2,3,4,5},opt_:"Both",AG_:3,PG_:3]:=Module[{dim,n,optimizeLocally,optimizeGlobally,optimizeList,weightedLoss,minPos},(*Datadimensions*)dim=Length[xTrain[[1]]];(*Numberofsamples*)n=Length[xTrain];(*Checkwhetherthealocalsearchshouldtakeplace.ThesymboliclossfunctionsreturnedfromlossFisgiventotheMathematica'sFindMinimumfunctionthatperformsalocalsearch.*)If[opt=="Both"||opt=="Locally",optimizeLocally=Table[Echo[{f}~Join~FindMinimum[lossF[xTrain,yTrain,Echo@f],parameters[f,dim],AccuracyGoal->AG,PrecisionGoal->PG]],{f,fCheck}],optimizeLocally={}];(*Checkwhethertheaglobalsearchshouldtakeplace.ThesymboliclossfunctionsreturnedfromlossFisgiventotheMathematica'sNMinimizefunctionthatperformsaglobalsearch.*)If[opt=="Both"||opt=="Globally",optimizeGlobally=Table[Echo[{f}~Join~NMinimize[lossF[xTrain,yTrain,Echo@f],parameters[f,dim],AccuracyGoal->10,PrecisionGoal->10]],{f,fCheck}],optimizeGlobally={}];(*Combinethepreviousresultinasinglelist.*)optimizeList=Join[optimizeLocally,optimizeGlobally];(*Ifsomewistoobig,weightthelossesappropriately.*)If[Norm[optimizeList[[All,3,1]][[All,2]]]>100,weightedLoss=optimizeList[[All,2]]*optimizeList[[All,3,1]][[All,2]];(*Ifsomeweightswarebig,weightthelossvaluesbythecorrespondingweightsandselectthebestresult.*)minPos=Flatten@Position[weightedLoss,Min[weightedLoss]];,(*Ifweightwisnotbig,returntheresultwiththeminimumlossvalue.*)minPos=Flatten@Position[optimizeList[[All,2]],Min[optimizeList[[All,2]]]];];Return[Flatten[optimizeList[[minPos]],1]];] We would like to comment on the last few lines of code here. We noticed that in some cases, the optimization algorithm may return a fitted Meijer-G with an exceedingly high value for the weight w w Finally, the second important function is LearnSymbolicModel. This is a rather long function whose purpose is to build upon FitMeijerG in order to successively add new Meijer-G terms and create the final model. Due to the term-by-term operation of the function and in order to pay closer attention to the inner-workings of each step, we chose to use mainly procedural programming for constructing it. The function accepts as arguments the data for the explanatory and the response variables as xTrain and yTrain respectively. The optional data are the same as the ones for FitMeijerG, with the only difference of selecting local search for the default method, since it tends to being faster and yielding better results. The basic operation LearnSymbolicModel does is to iteratively apply the FitMeijerG function, organize the returned results and keep a track of the progress made. Every new term is placed in the list curModelTerms and the family it belongs to is cataloged in fSet. preResidual and curResidual keep track of the progress and when its relative improvement becomes less that 10% the algorithm is terminated. This is being taken care of the While loop in the function. From the second model term onwards, backfitting is performed for each new term that gets in the model. The purpose of backfitting is to adjust the previous terms in the presence of the new term, in order to improve the overall model. Backfitting is performed by the For loop that resides inside the While loop of the function. In particular, what this does is the function visits each of the previous terms, takes away their contribution to minimizing the residual, and fitting them again to the resulting residual. After a term has been backfitted, its new contribution to the residual is being incorporated, in theory leading to a smaller residual. The reason we say “in theory” is that in practice, as we also comment in an example further down, we have observed that there are cases where the backfitting leads to worse results. We attribute this to the fact that, as it is implemented, backfitting does not start from the current set of parameters but, as customary, selects a random initial point and optimizes from there. This may lead the final result to a worse minimum compared to where it was. There are many ways that the implementation can be improved, starting from making sure the current state of the model parameters are taken into account, but for now we choose to discard the last iteration along with the new term should this happen, and recover its immediate previous step, stored in the curModelTermsBackup list. LearnSymbolicModel[xTrain_List,yTrain_List,fCheck_:{1,2,3,4,5},opt_:"Locally",AG_:3,PG_:3]:=Module{fSet,curResidual,preResidual,curModelTerms,fitResult,dim,n,vVec,zVec,affine,impPer,curModelTermsBackup,curResidualBack,j},(*Keeparecordofthefamiliesoftermspresentinthemodel.*)fSet={};(*Tworesidualtermstokeeptrackoftheimprovement.*)curResidual=yTrain;preResidual={};(*Alistofthetermscomprisingourmodel.*)curModelTerms={};Print["Fitting first term"];(*InvokeFitMeijerGtofitthefirsttermofthemodel.*)fitResult=FitMeijerG[xTrain,yTrain,fCheck,opt];Print["Selected: "<>ToString[First@fitResult]];(*Dimentionsoftheexplanatorydata*)dim=Length[xTrain[[1]]];(*Numberofsamples*)n=Length[xTrain];(*AffineTransformationCoefficients*)vVec=Table[ v i z i vVec.zVec Norm[vVec]Sqrt[dim] |