Predictive Analytics in Finance

We extend the discussion on machine learning one step further and focus on the predictive analysis offered in the ML domain. Prediction builds on the classification and clustering techniques discussed previously and uses pattern detection and similarity features in the data to estimate future outcomes. This is particularly relevant to finance, where the ability of data groups to predict the values of less-liquid instruments is of high interest. We demonstrate prediction using CDS data and show the application of non-regression models as superior methods for predictive analysis.

Introduction

Classification and clustering, which we discussed in previous installments, naturally extend into another field of data mining: prediction. It is well known that collecting massive amounts of data is of limited benefit if the data cannot be analysed, decomposed into signals and used to predict future behaviour. Data analysis in statistics and econometrics can be broken down into four major categories: (i) prediction, (ii) summarisation, (iii) estimation, and (iv) hypothesis testing. Machine learning, as we have stated previously, focuses primarily on data organisation and prediction. A closely related field is data summarisation, where pattern finding in the data, covered by classification, is of primary interest. Machine learning is therefore about developing high-performance computer systems that can provide useful predictions in the presence of challenging computational constraints. This whole area of data processing is today routinely described as data science, which looks primarily at prediction and summarisation but also extends into data manipulation, visualisation and other similar tasks.

In pure statistical terms, the prediction problem reduces to finding the conditional distribution of some variable $y$ given some other variables $x = (x_1, x_2, \dots, x_n)$. In data science methodology, the variables $x$ are called 'predictors' or 'features'. The point prediction for $y$ is generally taken to be the mean of the calibrated conditional distribution, which for a symmetric distribution such as the normal is also its peak.
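
As a compact restatement of the paragraph above (a sketch only, where $p(y \mid x)$ denotes the calibrated conditional density and $\hat{y}$ is notation introduced here for the point prediction):

$$\hat{y}(x) = \mathbb{E}[\,y \mid x\,] = \int y \, p(y \mid x) \, dy$$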

Prediction basics

Much of applied econometrics, economics and finance is concerned with detecting and summarising relationships in the data. The most common tool is (linear) regression analysis. Machine learning offers a richer set of tools that can usefully summarise various sorts of nonlinear relationships in the data. We will focus on these regression-like tools because they are the most natural for economic and financial applications.

The objective of prediction is to find some function that gives a good estimate of y as a function of x. In the machine learning context, the data we work with are generally assumed to be independent and identically distributed. They may be 'fat', which implies many predictors relative to observations, or 'tall', which indicates many observations relative to predictors.

When we build prediction routines, we start with some observations of y and x and then want to obtain the predicted point for a new value of x. A 'good' prediction involves finding a function that minimises some error measure. The sum of squared residuals and the mean residual value are the most frequently used measures in this context. The traditional method to solve this problem is regression, typically linear or logistic. However, machine learning offers better options, especially when a lot of data is available. These include trees, nearest neighbors, random forests and neural networks.
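
For the sum of squared residuals mentioned above, the fitting problem can be sketched as follows, where $N$ is the number of observations, $\mathcal{F}$ is the family of candidate functions (linear functions for regression, tree ensembles for random forests, and so on) and $\hat{f}$ is notation introduced here for the fitted function:

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2$$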

Working with predictions

The objective of prediction routines is to find good out-of-sample estimates for the data. This task is not always trivial, and regressions are generally known to be 'weak' in out-of-sample prediction. This is generally known as the 'overfitting' problem. Machine learning addresses this problem through:

  • Regularisation - preference is given to simpler models
  • Dataset division into (i) training and (ii) testing sets
  • Cross-validation of the dataset and the model
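
The second point in the list above can be illustrated with a small sketch. The data below are synthetic and purely hypothetical (a noisy sine curve, not CDS rates), and the names `toy`, `toyTrain`, `toyTest` and `toyPred` are introduced only for this illustration. The idea is to fit on one part of the data and compare the root-mean-square residual in-sample and out-of-sample; a large gap between the two would hint at overfitting.

```mathematica
SeedRandom[42];  (* fix the seed so the toy data are reproducible *)

(* Hypothetical toy data: one feature x, target Sin[4 x] plus Gaussian noise *)
toy = Table[
   With[{x = RandomReal[]},
    {x} -> Sin[4 x] + RandomVariate[NormalDistribution[0, 0.1]]],
   {300}];

(* First 200 observations for training, remaining 100 held out for testing *)
{toyTrain, toyTest} = TakeDrop[toy, 200];

toyPred = Predict[toyTrain, Method -> "RandomForest"];

(* "StandardDeviation" is the root mean square of the residuals *)
inSample = PredictorMeasurements[toyPred, toyTrain, "StandardDeviation"];
outSample = PredictorMeasurements[toyPred, toyTest, "StandardDeviation"];
{inSample, outSample}
```

The split sizes and the choice of "RandomForest" are arbitrary here; any method accepted by Predict could be substituted.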

Prediction relation to other data mining techniques

Irrespective of the method structure, there is a common confusion between explanatory and predictive models. In a predictive model, only the model's accuracy is important and the structure of the model is relatively unimportant, as long as it is robust. In general, models become more robust as the number of predictors declines, and hence there is a greater tendency towards parsimony in predictive models, often using variable selection algorithms.

However, if a model is developed for explanatory purposes, the identity and weights of the predictors are the most important part of the model, because these are used to infer the nature of the relationship between the class and the values of the predictor variables.

The contribution made by the predictors is therefore most important in an explanatory, rather than a predictive, model. In predictive models the role of individual predictors is less important than the accuracy of the prediction. Consequently, variable selection routines that produce a more parsimonious model are generally more acceptable in predictive models.

Prediction in Finance

The advantages offered by machine learning are well suited to data analysis in finance, particularly when larger datasets are available. For years, regression was the dominant technique in economics and finance, on the premise of a limited amount of data. Nowadays, in the data-rich era, there is no reason not to apply machine learning methods, with data separation or the use of validation sets, more frequently in the economic and financial industries.

Problem definition

Let's define the following problem: we have available a set of actively traded, liquid CDS rates (5 years of daily rates) that we would like to use to predict a less liquid CDS. This type of problem occurs frequently in the credit or equity markets, where the liquidity factor drives certain instruments into specific categories.

  • 3 CDS will serve as features
  • We split the data into three sets: (i) training, (ii) testing and (iii) validation
  • We apply several predictive routines to obtain the future value in the out-of-sample domain

Data generation

vols = {0.01, 0.008, 0.011, 0.008};
crl = ({
    {1, 0.35, 0.45, 0.36},
    {0.35, 1, 0.43, 0.32},
    {0.45, 0.43, 1, 0.46},
    {0.36, 0.32, 0.46, 1}
   });
cm = Table[vols[[i]]*vols[[j]]*crl[[i, j]], {i, 1, Length[vols]}, {j, 1, Length[vols]}]

{{0.0001, 0.000028, 0.0000495, 0.0000288}, {0.000028, 0.000064, 0.00003784, 0.00002048}, {0.0000495, 0.00003784, 0.000121, 0.00004048}, {0.0000288, 0.00002048, 0.00004048, 0.000064}}
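
The `Table` expression above builds the covariance matrix element by element from the volatilities and correlations. In matrix notation (with $\sigma_i$ for the entries of `vols` and $\rho_{ij}$ for the entries of `crl`, symbols introduced here for illustration) this reads

$$\Sigma_{ij} = \sigma_i \, \sigma_j \, \rho_{ij}, \qquad \Sigma = D \, R \, D, \quad D = \operatorname{diag}(\sigma_1, \dots, \sigma_4), \quad R = (\rho_{ij}),$$

so that, for example, $\Sigma_{12} = 0.01 \times 0.008 \times 0.35 = 0.000028$, in agreement with the output.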

init = {0.55, 0.72, 1.25, 1};
mn = MultinormalDistribution[{0, 0, 0, 0}, cm];
data = Accumulate[Prepend[RandomVariate[mn, 1500], init]];
ListLinePlot[Transpose[data], 
 PlotLegends -> {"cds 1", "cds 2", "cds 3", "cds 4"}, 
 PlotLabel -> Style["CDS of observed series", 15]]

enter image description here

traindata = data[[;; 600]];
validata = data[[601 ;; 900]];
testdata = data[[901 ;;]];

(* Convert each row {cds 1, cds 2, cds 3, cds 4} into a rule
   {cds 1, cds 2, cds 3} -> cds 4, i.e. features -> target.
   toRules is a helper name introduced here for readability. *)
toRules[d_] := (Most[#] -> Last[#]) & /@ d;
trainset = toRules[traindata];
testset = toRules[testdata];
validset = toRules[validata];

{ListLinePlot[Transpose[traindata], 
  PlotLegends -> {"cds 1", "cds 2", "cds 3", "cds 4"}, 
  PlotLabel -> Style["Training data", 15]], 
 ListLinePlot[Transpose[testdata], 
  PlotLegends -> {"cds 1", "cds 2", "cds 3", "cds 4"}, 
  PlotLabel -> Style["Testing data", 15]], 
 ListLinePlot[Transpose[validata], 
  PlotLegends -> {"cds 1", "cds 2", "cds 3", "cds 4"}, 
  PlotLabel -> Style["Validation data", 15]]}

enter image description here

We can investigate the predictors separately by looking at histograms of the training data:

edata = Transpose[Drop[traindata, None, -1]];
{Histogram[edata[[1]], 20, ChartStyle -> "Pastel", 
  PlotLabel -> "Feature 1"], 
 Histogram[edata[[2]], 20, ChartStyle -> "Rainbow", 
  PlotLabel -> "Feature 2"], 
 Histogram[edata[[3]], 20, ChartStyle -> "NeonColors", 
  PlotLabel -> "Feature 3"]}

enter image description here

Setting the predictors

Having defined the training, testing and validation sets, we can now set up the prediction model:

Take[trainset, 10]

{{0.55, 0.72, 1.25} -> 1, {0.571876, 0.719578, 1.24021} -> 0.997824, {0.577651, 0.719425, 1.23595} -> 1.00456, {0.573492, 0.706407, 1.23029} -> 1.00455, {0.564589, 0.721109, 1.2309} -> 0.999223, {0.582398, 0.73248, 1.2497} -> 1.00538, {0.583999, 0.727467, 1.25179} -> 0.999114, {0.592003, 0.721993, 1.24155} -> 0.990911, {0.592474, 0.718474, 1.23417} -> 0.985031, {0.603327, 0.710775, 1.22836} -> 0.98689}

Given the size of the training set, we select the Random Forest algorithm for the prediction routine:

pred = Predict[trainset, ValidationSet -> validset, Method -> "RandomForest", PerformanceGoal -> "Quality"]

enter image description here

and obtain information on the model:

PredictorInformation[pred]

enter image description here

The summary confirms Random Forest as the method used for this task.

Conditional distribution

We can obtain the information about the distribution of the predicted value:

dist = pred[Drop[init, -1], "Distribution"]

NormalDistribution[1.00262, 0.0288765]

This confirms that the predicted value is normally distributed, with a mean close to the observed value of 1 and low volatility.

PDF[dist, x]
Plot[PDF[dist, x], {x, 0.8, 1.25}, PlotStyle -> {Thick, Purple}, 
 FillingStyle -> LightGray, Filling -> Axis, 
 PlotLabel -> Style["Conditional PDF of the predicted value", 15]]

13.8155 E^(-599.627 (-1.00262 + x)^2)

enter image description here
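
Since the conditional distribution is available in closed form, it can also be used to quote an interval around the point prediction. The sketch below simply re-enters the distribution reported above; the 95% level and the names `predDist`, `interval` and `probAbove` are choices made here for illustration.

```mathematica
(* Conditional distribution as reported by the predictor in the text *)
predDist = NormalDistribution[1.00262, 0.0288765];

(* Central 95% interval: the 2.5% and 97.5% quantiles *)
interval = Quantile[predDist, {0.025, 0.975}]

(* Probability that the predicted CDS rate exceeds the observed value of 1 *)
probAbove = 1 - CDF[predDist, 1]
```

With these parameters the interval runs from roughly 0.95 to 1.06.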

Testing predictor accuracy

We may want to check how accurately the predictor works relative to the original data.

plotdata = Drop[traindata, None, -1];
adata = Transpose[Take[traindata, All, -1]] // Flatten;
pdata = Table[pred[plotdata[[i]]], {i, 1, Length[traindata]}];
ListLinePlot[{adata, pdata}, 
 PlotLabel -> Style["Actual and Predicted data", 15], 
 PlotLegends -> {"Actual", "Predicted"}, PlotStyle -> {Blue, Red}]

enter image description here

We can see that the predictor replicates the original training data quite well.

Verification with Testing data set

We can test and validate the predictor with the PredictorMeasurements command. For example, we can use the testing set defined above and look at a subset of it:

Take[testset, 10]

{{0.485579, 0.621485, 1.15959} -> 1.0466, {0.485904, 0.630628, 1.15086} -> 1.04336, {0.477018, 0.618847, 1.14833} -> 1.04416, {0.492894, 0.627871, 1.15071} -> 1.04294, {0.496431, 0.626622, 1.15025} -> 1.04363, {0.501258, 0.629071, 1.14728} -> 1.04505, {0.506147, 0.630316, 1.15071} -> 1.05803, {0.492122, 0.627663, 1.15387} -> 1.05949, {0.485113, 0.626491, 1.16395} -> 1.0621, {0.476587, 0.632491, 1.16424} -> 1.05147}

pm = PredictorMeasurements[pred, testset]

enter image description here

The predictor reproduces the values in the testing set with high precision.

  • Testing of residuals

    pm["ResidualPlot"]
    

enter image description here

  • Comparison of predicted and actual values

    pm["ComparisonPlot"]
    

enter image description here
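
The plots can be complemented with numerical accuracy measures. The sketch below is self-contained and uses hypothetical synthetic data (three random features and a made-up target), so the names `toyData`, `toyPredictor` and `toyMeasurements` are not part of the original analysis. The two properties queried, "StandardDeviation" and "RSquared", are standard PredictorMeasurements properties.

```mathematica
SeedRandom[1];

(* Hypothetical data: three features, target is a noisy nonlinear combination *)
toyData = Table[
   With[{f = RandomReal[{0, 1}, 3]},
    f -> f[[1]] + 0.5 f[[2]]^2 - 0.3 f[[3]] +
      RandomVariate[NormalDistribution[0, 0.05]]],
   {400}];

(* Train on the first 300 observations, measure on the remaining 100 *)
toyPredictor = Predict[toyData[[;; 300]], Method -> "RandomForest"];
toyMeasurements = PredictorMeasurements[toyPredictor, toyData[[301 ;;]]];

(* Root-mean-square residual and coefficient of determination on held-out data *)
{toyMeasurements["StandardDeviation"], toyMeasurements["RSquared"]}
```

In the setting of this post, the same two properties could be read directly from `pm`, i.e. `pm["StandardDeviation"]` and `pm["RSquared"]`.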

Predictor sensitivity to the features variability

We look at how the predictor reacts to changes in the feature inputs with respect to the calibrated conditional distribution:

{Plot[pred[{x, 0.7, 1.25}], {x, 0.4, 0.7}, PlotStyle -> Blue, 
  Filling -> Axis, FillingStyle -> LightYellow, 
  PlotLabel -> Style["1st CDS sensitivity", 13]], 
 Plot[pred[{0.6, x, 1.25}], {x, 0.6, 0.8}, PlotStyle -> Magenta, 
  Filling -> Axis, FillingStyle -> LightGray, 
  PlotLabel -> Style["2nd CDS sensitivity", 13]], 
 Plot[pred[{0.6, 0.7, x}], {x, 1.1, 1.3}, PlotStyle -> Green, 
  Filling -> Axis, FillingStyle -> LightBrown, 
  PlotLabel -> Style["3rd CDS sensitivity", 13]]}

enter image description here

As the charts above suggest, each feature affects the predictor differently:

  • The 1st CDS both decreases and increases the predicted value as the CDS moves upwards
  • The 2nd CDS increases the predicted value over the entire range, although the degree of impact varies
  • The 3rd CDS has less impact at the lower end of the range and then increases the predicted CDS rate

Predicting outcomes with new data

We now apply the predictor to the out-of-sample data based on the specification above:

  • Define new covariance matrix

    tvols = {0.015, 0.02, 0.03};
    tcorr = ({
        {1, 0.4, 0.5},
        {0.4, 1, 0.45},
        {0.5, 0.45, 1}
       });
    tcm = Table[ tvols[[i]]*tvols[[j]]*tcorr[[i, j]], {i, 1, Length[tvols]}, {j, 1,  Length[tvols]}]
    

{{0.000225, 0.00012, 0.000225}, {0.00012, 0.0004, 0.00027}, {0.000225, 0.00027, 0.0009}}

  • Generate the new CDS data for each feature

    newinit = {0.68, 0.93, 1.4};
    mn = MultinormalDistribution[{0, 0, 0}, tcm];
    tdata = Accumulate[Prepend[RandomVariate[mn, 150], newinit]];
    ListLinePlot[Transpose[tdata], 
     PlotLegends -> {"new cds 1", "new cds 2", "new cds 3"}, 
     PlotLabel -> Style["Out-of-sample CDS data", 15]]
    

enter image description here

(* pred is the Random Forest predictor trained above *)
newdata = Table[pred[tdata[[i]]], {i, 1, Length[tdata]}];
ListLinePlot[newdata, PlotTheme -> "Web", 
 PlotLabel -> 
  Style["Predicted value of CDS for out-of-sample input", 15]]

enter image description here

We have obtained the new predicted data which we can investigate further:

  • Data distribution

    edist = SmoothKernelDistribution[newdata]
    

enter image description here

    Plot[PDF[edist, x], {x, 1, 1.3}, 
     PlotLabel -> "Probability density of the predicted value", 
     Filling -> Axis, PlotRange -> All]

enter image description here

This is the estimated probability density from the predicted data.

  • Descriptive statistics

    stats = {Mean, Median, Variance, Min, Max, Skewness, Kurtosis};
    TableForm[Through[stats[newdata]], TableHeadings -> {stats, None}]
    

enter image description here

  • Histogram

    Histogram[newdata, 25, ChartStyle -> "Rainbow"]
    

enter image description here

Alternative prediction methods

We can test other prediction methods available in the Predict command:

  • Linear regression

    plinreg = 
     Predict[trainset, ValidationSet -> validset, 
      Method -> "LinearRegression", PerformanceGoal -> "Quality"]
    

enter image description here

    pmlin = PredictorMeasurements[plinreg, testset];
    pmlin["ResidualPlot"]

enter image description here

    plinreg[Drop[init, -1], "Distribution"]

NormalDistribution[1.09925, 0.0769071]

We observe a higher predicted value and higher volatility relative to Random Forest.

  • NearestNeighbors

    pnn = Predict[trainset, ValidationSet -> validset, 
      Method -> "NearestNeighbors", PerformanceGoal -> "Quality"]
    

enter image description here

    pnntest = PredictorMeasurements[pnn, testset];
    pnntest["ResidualPlot"]

enter image description here

pnn[Drop[init, -1], "Distribution"]

NormalDistribution[0.982606, 0.0304784]

NearestNeighbors produces an excellent fit to the data, with slightly higher volatility than Random Forest (0.0305 versus 0.0289).

  • Neural network

    pneural = 
     Predict[trainset, ValidationSet -> validset, 
      Method -> "NeuralNetwork", PerformanceGoal -> "Quality"]
    

enter image description here

    pneuraltest = PredictorMeasurements[pneural, testset];
    pneuraltest["ResidualPlot"]

enter image description here

pneural[Drop[init, -1], "Distribution"]

NormalDistribution[0.974001, 0.0461196]

Reasonable fit to the data but with higher volatility than Random Forest.
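
The method-by-method comparison above can also be condensed into a single table of out-of-sample errors. The following is a minimal sketch, not the procedure used in this post: `compareMethods` is a hypothetical helper introduced here, and the demonstration runs on synthetic data (the name `synthetic` is likewise an assumption) so that the block is self-contained.

```mathematica
(* Hypothetical helper: train one predictor per method and return the
   root-mean-square residual on the test set for each method *)
compareMethods[train_, valid_, test_, methods_List] :=
  AssociationMap[
   PredictorMeasurements[
     Predict[train, ValidationSet -> valid, Method -> #],
     test, "StandardDeviation"] &,
   methods];

SeedRandom[7];

(* Synthetic stand-in for the CDS data: three features, noisy nonlinear target *)
synthetic = Table[
   With[{f = RandomReal[{0, 1}, 3]},
    f -> f[[1]] + 0.5 f[[2]]^2 + RandomVariate[NormalDistribution[0, 0.05]]],
   {400}];

errors = compareMethods[
   synthetic[[;; 200]], synthetic[[201 ;; 300]], synthetic[[301 ;;]],
   {"LinearRegression", "NearestNeighbors", "RandomForest"}];

(* Methods ordered by increasing test error *)
Sort[errors]
```

With the variables of this post, the call would read `compareMethods[trainset, validset, testset, {"LinearRegression", "NearestNeighbors", "RandomForest"}]`; "NeuralNetwork" can be added to the list at the cost of a longer training time.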

Conclusion

Prediction is a useful technique for finding relationships in data, especially if we have a larger set of data at our disposal. Machine learning comes in particularly handy for detecting patterns better than the traditional linear model. Tools and methods such as decision trees, neural networks or nearest neighbors offer more flexibility to find complex relationships.

Prediction, as a method of estimating an outcome from supporting features, naturally offers practical solutions in finance and economics, where the estimation can be quite valuable. Market data generation, growth rate forecasts, marketing campaign model testing or inflation prediction are just a few examples where machine learning with a prediction focus becomes an essential tool for decision making. Richer content and better predictive ability make machine learning more powerful than traditional regression techniques.

POSTED BY: Igor Hlivka
2 years ago

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: Moderation Team
1 month ago

Very interesting and lovely presentation!

I get confused at the Prediction model part. I have no idea what the data you gave it represents.

You know a lot about prediction, so I'm hoping you can explain how to use Mathematica's "machine learning" (how to set up a prediction model?) and what it's doing in terms of what an older math book might say: what it can learn, and where it fails as opposed to the classical "do it all by hand" approach.

Is this "machine learning" Mathematica is doing similar to Expected Value, Ev = x1*p1 + x2*p2 + ... + xk*pk? If so, is that what the "priming data" is used for?

For counting I'm familiar with using the Fundamental Principle of Counting, Probability (event spaces, logic, dependency), Expected Value, and the many statistical techniques (which I don't put much weight on).

Great post!

POSTED BY: John Hendrickson
1 month ago

Hi John

Machine learning in general extends the traditional areas of probability and statistics into the subject of 'data science'. Probabilists would call this the 'non-parametric' field of distribution representation. I agree, it may be a bit confusing for traditional statisticians, but you may think of it as an 'extension' of the statistical sciences into adjacent fields where we model the future given observations of the past. The whole of time series analysis is built on this premise.

The entire theory of data science / machine learning rests on the concept of 'learning from data'. The idea as such is not new, but what is new is its transformation into a scientific subject. Progress on the hardware and software side made this transition feasible. Wolfram Research was one of the first to recognise this and entered the data science world with a set of tools and routines to make data science as smooth as possible.

What I particularly like about Mathematica is its approach of making the entire ML practice as simple as possible. All functions and algorithms are finely tuned and automated. You do not have to be a data science expert to start practising ML in Mathematica. My article was all about this: once you have your data, you can start analysing it and building prediction models quickly and efficiently. Mathematica will select the most suitable models for you to give decent predictions. Further fine-tuning is possible if you want to drive your predictions 'manually' or if you want to build predictions using a particular model.

When you look at the documentation on ML, you will see information on the models, methods and options available. Yes, you're right: the models are essentially expectations of the outcome given their calibration to past data. Models examine the data, detect relationships, patterns and features, and then set the parameters for future predictions. Regression in statistics is one of the 'early' examples of how ML works. However, ML is much richer and able to extract many more dependencies than traditional regression techniques in statistics.

I do encourage you to explore data science further; it is an exciting area of science, very dynamic and still evolving. People have started rediscovering the power of the information contained in data, so ML offers bright business prospects. With your background in statistics and number theory, the transition to a full data scientist profile should be smooth and quick. If you would like further guidance in this field, I will be glad to assist.

Best, Igor

POSTED BY: Igor Hlivka
1 month ago

I gather that Neural Networks are "pre-coded Predict" obtained by choosing a "Net Model" (and that these have complex Predict rules too lengthy to enter by hand), but that if one used the Predict facilities at length one could implement these models just the same. But I'm unsure whether Neural Networks are not some other kind of animal. I didn't see a tutorial on how to use mm's resources posted in mm's front-end help (perhaps I can do more looking on mm's website for tutorials).

"essentially expectations on the outcome given their calibration to the past data"

Ev=x1p1+x2p2+... here, x1 is the known weight (cost), and p1 is the calculated probability it will occur. so for a lottery giving 3 prizes of $2, $3, $4 and a 1/10000000000 chance of winning Ev= 2/e-100 + 3/e-100 + 4/e-1000

I think what you're saying is that there isn't a firm connection to the "Probability & Statistics" Mathematica package: while statistics is well implemented in the methods used, probability can only be used like any other Mathematica function. I mostly mean that there are no options to change the probabilities.

I don't see a way for Predict in->out to use functions (#+x)& rather than real values as inputs, except perhaps if ActivePrediction is used? It doesn't like either of these (noting I believe ActivePrediction does like them):

Predict[{x -> x + 1, x -> x*2}, Method -> "LinearRegression"] (* not allowed? *)
Predict[{# -> # + 1, # -> #*2}, Method -> "LinearRegression"] (* not allowed? *)

Anyway thank you for the reply that you already gave, I enjoyed.

POSTED BY: John Hendrickson
1 month ago
