Community RSS Feed
http://community.wolfram.com
RSS Feed for Wolfram Community showing any discussions in tag Staff Picks sorted by active

[WSC18] Classifying & Converting the Major Currencies By Using Neural Nets
http://community.wolfram.com/groups/-/m/t/1382598
![enter image description here][1]
#Introduction
Throughout the years, I've traveled around the world experiencing new places, eating great food, and learning about other cultures. In order to do some of these things, I needed to use the national currency of the country I was in. Whether it was using Euros in France to buy some macarons, or using the Pound in England to buy tea, I noticed that each currency is unique and its design is a part of the country's identity. My project allows a user to input any number of images of different coins and select a currency to convert them all to. The Microsite identifies what country each coin is from, the name of the currency, the value in native currency, and the converted value.
#Gathering Data
Since the Wolfram database didn't have a large dataset of foreign coin images, we had to gather some data manually. We started out by using WebImageSearch and keywords to pull images off of Bing. Since coin designs change every couple of years, we used en.ucoin.com as a reference for what the current design was. Then, we edited the set of images that WebImageSearch gave us. We exported the relevant images to a folder named after the coin's country of origin, and named each file after the coin's name and value. The files were saved as .wxf. We repeated this process for all 42 coins and additionally pulled more data from Google Images.
![enter image description here][2]
#Different Approaches to Feeding the Neural Network
Creating the right data set and matching it with a network took 4 attempts.
##Attempt 1
For the first try, we created training data by joining all of the data saved in the folders (at the time this was just Euros and Canadian Dollars) and randomized it. We threaded each folder and gave the images labels of the country. We tried to use the Classify function but found that it was not complex enough.
fullEUData =
Thread[Flatten[
Map[ImageResize[#, {50, 50}] &, Import[#]] & /@
FileNames["*", "coin_data/euro"]] -> "EU"];
fullCanadianData =
Thread[Flatten[
Map[ImageResize[#, {50, 50}] &, Import[#]] & /@
FileNames["*", "coin_data/canadian"]] -> "Canadian"];
trainingData = RandomSample@Join[fullEUData, fullCanadianData];
cf = Classify[trainingData]
![enter image description here][3]
##Attempt 2
Since I didn't have the skills to build my own neural network, my mentor suggested we use the ImageIdentify net. We were able to modify the net to fit the goal of the project and had great results with the net. However, during this attempt we found that the net was depending on the transparent or white backgrounds of the pure images.
![enter image description here][4]
##Attempt 3
After analyzing the results of the 2nd attempt, we decided to process and format the images. We removed the backgrounds from each image, randomized the brightness, contrast, and angles, and layered the coins onto a background of blended color and noise.
preprocess[image_ (*Removes background*)
] :=
ImagePad[RemoveBackground@image, 5, Padding -> None]
background // Clear;
background := (*Creates a random background*)
Blend[{ConstantImage[RandomColor[], {224, 224}],
RandomImage[1, {224, 224}, ColorSpace -> "RGB"]}, RandomReal[]];
randomLighting[
image_] := (*randomizes brightness and contrast of image*)
ImageAdjust[image, RandomReal[{-.25, .25}, 2]];
overlay[coin_Image, background_Image] := (*Layers everything together*)
ImageCompose[
ImageCrop[background, {224, 224}]
,
ImagePerspectiveTransformation[
randomLighting@ImageResize[coin, RandomReal[{60, 260}]],
IdentityMatrix[2] + RandomReal[{-.75, .75}, {2, 2}],
DataRange -> Full,
Background -> White
]
,
{RandomReal[{60, 190}], RandomReal[{70, 180}]}
]
For this attempt we also created data using a generator rather than having a pre-made dataset. This helped the net train much more quickly, and it was great at identifying 3 currencies. However, when we increased the number of currencies to 7, it struggled.
![enter image description here][5]
##Attempt 4 / Final Attempt
In the last 3 attempts we had assigned the images from each country the same label, e.g. "USA" or "UK". We realized that the net could get confused by being told that up to 7 different coins were all the same thing. So, we changed the labels from country to country and value, e.g. "USA_0.25". Then, we joined all of the countries' data to form the fullOriginalData set. This was the final attempt and worked the best because each coin of a specific value and country had its own label, avoiding confusion for the neural net.
fullOriginalData =
RandomSample[
Join[fullCanadianData, fullEUData, fullUKData, fullChinaData,
fullJapanData, fullSwissData, fullUSData, fullUSData]];
We used the same process to format the images with backgrounds, but instead of using a generator to create data, we used a pre-made set. To create test data, we split fullData 80%/20%.
createRandomData // ClearAll;
createRandomData[coin_ -> label_, background_] :=
Thread[overlay[preprocess@coin, background] -> label];
data1 = Table[createRandomData[RandomChoice[fullOriginalData], background],
Length[fullOriginalData]];
data2 = Table[createRandomData[RandomChoice[fullOriginalData], background],
Length[fullOriginalData]];
(*Random group of data*)
fullData = RandomSample@Flatten@Join[data1, data2];
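The 80%/20% split itself is not shown in the post. As a rough illustration only (written in Python rather than the Wolfram Language, and using the hypothetical helper name `split_train_test`), a shuffled split of labelled examples might be sketched like this:

```python
import random

def split_train_test(examples, train_fraction=0.8, seed=None):
    """Shuffle a copy of `examples` and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-ins for (image, label) pairs; not the real coin data.
examples = [(f"image_{i}", "USA_0.25") for i in range(10)]
train, test = split_train_test(examples, seed=0)
print(len(train), len(test))
```

Shuffling before cutting keeps every label represented in both parts on average, which matters here because the data was joined country by country.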
Later on, we increased the data set from 900 to 5,000.
trained =
NetTrain[new, trainingData, ValidationSet -> testdata,
BatchSize -> 20, MaxTrainingRounds -> 30]
##Accuracy
The results of the neural net were extremely accurate. With a 99.58% accuracy rate on training data and a 99.52% accuracy rate on test data, the program makes a correct guess in nearly every case.
![enter image description here][6]
##Assigning the Output with a List of Characteristics
The net outputs a label that gives the country and value, e.g. "USA_0.25". To make the microsite's output more readable, we created an association that maps each label to a list of written-out characteristics.
assignments =
Association[ {"CAN_1" -> {"Canada", "CanadianDollars", 1.00},
"CAN_2" -> {"Canada", "CanadianDollars", 2.00},
"CAN_0.50" -> {"Canada", "CanadianDollars", 0.50},
"CAN_0.25" -> {"Canada", "CanadianDollars", 0.25},
"CAN_0.10" -> {"Canada", "CanadianDollars", 0.10},
"CAN_0.05" -> {"Canada", "CanadianDollars", 0.05},
"CAN_0.01" -> {"Canada", "CanadianDollars", 0.01},
"EU_1" -> {"European Union" , "Euros", 1.00},
"EU_0.50" -> {"European Union", "Euros", 0.50},
"EU_0.20" -> {"European Union", "Euros", 0.20},
"EU_2" -> {"European Union", "Euros", 2.00},
"EU_0.10" -> {"European Union", "Euros", 0.10},
"EU_0.05" -> {"European Union", "Euros", 0.05},
"EU_0.02" -> {"European Union", "Euros", 0.02},
"EU_0.01" -> {"European Union", "Euros", 0.01},
"UK_1" -> {"United Kingdom", "BritishPounds", 1.00},
"UK_2" -> {"United Kingdom", "BritishPounds", 2.00},
"UK_0.50" -> {"United Kingdom", "BritishPounds", 0.50},
"UK_0.20" -> {"United Kingdom", "BritishPounds", 0.20},
"UK_0.10" -> {"United Kingdom", "BritishPounds", 0.10},
"UK_0.05" -> {"United Kingdom", "BritishPounds", 0.05},
"UK_0.02" -> {"United Kingdom", "BritishPounds", 0.02},
"UK_0.01" -> {"United Kingdom", "BritishPounds", 0.01},
"CHN_1" -> {"China", "ChineseYuan", 1.00},
"CHN_0.50" -> {"China", "ChineseYuan", 0.50},
"CHN_0.10" -> {"China", "ChineseYuan", 0.10},
"JPN_1" -> {"Japan", "Yen", 1}, "JPN_5" -> {"Japan", "Yen", 5},
"JPN_10" -> {"Japan", "Yen", 10},
"JPN_50" -> {"Japan", "Yen", 50},
"JPN_100" -> {"Japan", "Yen", 100},
"JPN_500" -> {"Japan", "Yen", 500},
"SUI_5" -> {"Switzerland", "SwissFrancs", 5.00},
"SUI_2" -> {"Switzerland", "SwissFrancs", 2.00},
"SUI_1" -> {"Switzerland", "SwissFrancs", 1.00},
"SUI_0.50" -> {"Switzerland", "SwissFrancs", 0.50},
"SUI_0.20" -> {"Switzerland", "SwissFrancs", 0.20},
"SUI_0.05" -> {"Switzerland", "SwissFrancs", 0.05},
"USA_0.25" -> {"United States", "USDollars", 0.25},
"USA_0.10" -> {"United States", "USDollars", 0.10},
"USA_0.05" -> {"United States", "USDollars", 0.05},
"USA_0.01" -> {"United States", "USDollars", 0.01}
}];
The second element in the sublist is the name of the currency saved in Mathematica. This string is an input for the function CurrencyConvert. Not only does the Microsite list this value as the name of the currency, but it also fetches it to use in the conversion feature.
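To make the lookup-then-convert flow concrete, here is a small Python sketch (not the Wolfram Language code behind the microsite). It copies three entries from the table above; the function names and the exchange rates are hypothetical placeholders, since the real site obtains rates through CurrencyConvert.

```python
# A three-entry excerpt of the label table; the full table is given above.
ASSIGNMENTS = {
    "USA_0.25": ("United States", "USDollars", 0.25),
    "EU_2": ("European Union", "Euros", 2.00),
    "JPN_100": ("Japan", "Yen", 100),
}

def describe(label):
    """Turn a net output label into written-out characteristics."""
    country, currency, value = ASSIGNMENTS[label]
    return {"country": country, "currency": currency, "value": value}

def total_converted(labels, rates_to_target):
    """Sum the coin values after converting each one to the target currency.

    `rates_to_target` maps a currency name to the number of target-currency
    units per one unit of that currency.
    """
    total = 0.0
    for label in labels:
        _, currency, value = ASSIGNMENTS[label]
        total += value * rates_to_target[currency]
    return total

# Placeholder rates for illustration only, NOT real exchange rates.
placeholder_rates = {"USDollars": 1.0, "Euros": 1.2, "Yen": 0.01}
print(describe("EU_2"))
print(total_converted(["USA_0.25", "EU_2", "JPN_100"], placeholder_rates))
```

Keeping the currency name in the table entry is what lets a single lookup serve both the display and the conversion step.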
##Deploying the Microsite
Using CloudDeploy and FormFunction, we made a microsite that allows the user to input as many images as they'd like from the 7 currencies into the drop box. The site also allows the user to select one of the 7 currencies in the dropdown menu to convert all of the coins to. The output gives the image, country, name of currency, value in native currency, and value in converted currency, all in table form. At the bottom of the page the output also neatly displays the total in original currencies and the total in the converted currency.
![enter image description here][7]
![enter image description here][9]
##Future Work
In the future, this project can be expanded by:

- Adding all of the currencies in the world
- Dating the currencies by year, since coin designs change often
- Creating a mobile app that allows the user to easily take a photo of the coins they have
##Acknowledgements
This project could not have been completed without the help and insight from my mentor, Rick Hennigan.
[Click Here to view the Microsite][10]
##Computational Essay Is Down Below
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.11.38PM.png&userId=1371841
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.22.37PM.png&userId=1371841
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.35.35PM.png&userId=1371841
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.38.33PM.png&userId=1371841
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.45.54PM.png&userId=1371841
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.59.43PM.png&userId=1371841
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-12at7.01.17PM.png&userId=1371841
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-12at7.01.35PM.png&userId=1371841
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at2.09.09PM.png&userId=1371841
[10]: https://www.wolframcloud.com/objects/kennethleeny/CoinIdentifier

Morgan Lee
2018-07-13T18:12:01Z

[WSC18] Music Sentiment Analysis through Machine Learning
http://community.wolfram.com/groups/-/m/t/1383518
![A Representation of the emotion categorization system][1]
----------
#Abstract
This project aims to develop a machine learning application to identify the sentiments in a music clip. The data set I used consists of one hundred 45-second clips from the Database for Emotional Analysis of Music and an additional 103 gathered by myself. I manually labeled all 203 clips and used them as training data for my program. This program works best with classical-style music, which is the main component of my data set, but also works with other genres to a reasonable extent.
#Introduction
One of the most important functions of music is to affect emotion, but the experience of emotion is ambiguous and subjective to the individual. The same music may induce a diverse range of feelings in people as a result of different context, personality, or culture. Some underlying features and elements of music, however, usually lead to the same effect on the human brain. For example, louder music often leads to more intense emotional responses from people. This consistency provides a foundation to train a supervised machine learning program based on music feature extraction.
#Background
This project is based on James Russell's circumplex model, in which a two-dimensional emotion space is constructed from the x-axis of valence level and y-axis of arousal level, as shown above in the picture. Specifically, valence is a measurement of an emotion's pleasantness, whereas arousal is a measurement of an emotion's intensity. Russell's model provides a metric on which different sentiments can be compared and contrasted, creating four main categories of emotion: Happy (high valence, high arousal), Stressed (low valence, high arousal), Sad (low valence, low arousal), and Calm (high valence, low arousal). Within these main categories there are various sub-categories, labeled on the graph above. Notably, "passionate" is a sub-category that does not belong to any main category due to its ambiguous valence value.
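The four main categories can be read off the two axes directly. The Python sketch below illustrates that mapping only; it assumes both axes are centred on a midpoint of 0 and that ties count as "high", neither of which is stated in the post, and the program itself uses trained classifiers rather than a hard threshold.

```python
def main_category(valence, arousal, midpoint=0.0):
    """Map a (valence, arousal) point to one of the four main categories.

    Assumption: values at or above `midpoint` count as "high".
    """
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint
    if high_valence and high_arousal:
        return "Happy"
    if not high_valence and high_arousal:
        return "Stressed"
    if not high_valence:
        return "Sad"
    return "Calm"

for point in [(0.7, 0.6), (-0.5, 0.8), (-0.4, -0.6), (0.5, -0.3)]:
    print(point, main_category(*point))
```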
----------
#Program Structure
The program contains a three-layer structure. The first layer is responsible for extracting musical features, the second for generating a list of numerical predictions based on different features, and the third for predicting and displaying the most probable emotion descriptors based on the second layer's output.
![enter image description here][2]
##First Layer
The first layer consists of 23 feature extractors that generate numerical sequences based on different features:
(*A list of feature extractors*)
feMin[audio_] := Normal[AudioLocalMeasurements[audio, "Min", List]]
feMax[audio_] := Normal[AudioLocalMeasurements[audio, "Max", List]]
feMean[audio_] := Normal[AudioLocalMeasurements[audio, "Mean", List]]
feMedian[audio_] := Normal[AudioLocalMeasurements[audio, "Median", List]]
fePower[audio_] := Normal[AudioLocalMeasurements[audio, "Power", List]]
feRMSA[audio_] := Normal[AudioLocalMeasurements[audio, "RMSAmplitude", List]]
feLoud[audio_] := Normal[AudioLocalMeasurements[audio, "Loudness", List]]
feCrest[audio_] := Normal[AudioLocalMeasurements[audio, "CrestFactor", List]]
feEntropy[audio_] := Normal[AudioLocalMeasurements[audio, "Entropy", List]]
fePeak[audio_] := Normal[AudioLocalMeasurements[audio, "PeakToAveragePowerRatio", List]]
feTCent[audio_] := Normal[AudioLocalMeasurements[audio, "TemporalCentroid", List]]
feZeroR[audio_] := Normal[AudioLocalMeasurements[audio, "ZeroCrossingRate", List]]
feForm[audio_] := Normal[AudioLocalMeasurements[audio, "Formants", List]]
feHighFC[audio_] := Normal[AudioLocalMeasurements[audio, "HighFrequencyContent", List]]
feMFCC[audio_] := Normal[AudioLocalMeasurements[audio, "MFCC", List]]
feSCent[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralCentroid", List]]
feSCrest[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralCrest", List]]
feSFlat[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralFlatness", List]]
feSKurt[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralKurtosis", List]]
feSRoll[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralRollOff", List]]
feSSkew[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralSkewness", List]]
feSSlope[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralSlope", List]]
feSSpread[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralSpread", List]]
feNovelty[audio_] := Normal[AudioLocalMeasurements[audio, "Novelty", List]]
<br/>
##Second Layer
Using data generated from the first layer, the valence and arousal predictors of the second layer provide 46 predictions for the audio input, based on its different features.
(*RMSAmplitude*)
(*Feature extractor*) feRMSA[audio_] := Normal[AudioLocalMeasurements[audio, "RMSAmplitude", List]]
dataRMSA = Table[First[takeLast[feRMSA[First[Take[musicFiles, {n}]]]]], {n, Length[musicFiles]}];
(*Generating predictor*) pArousalRMSA = Predict[dataRMSA -> arousalValueC]
![Sample predictor function][3]
<br/>
##Third Layer
The two parts of the third layer, main category classifier and sub-category classifier, each utilize the tensors generated in the second layer to make a prediction within their realm of emotion. The output consists of two parts, a main category emotion and a sub-category emotion.
(*Main*) emotionClassify1 = Classify[classifyMaterial -> emotionList1, PerformanceGoal -> "Quality"]
(*Sub*) emotionClassify2 = Classify[classifyMaterial -> emotionList2, PerformanceGoal -> "Quality"]
![enter image description here][4]
<br/>
##Output
If the program receives an input that is longer than 45 seconds, it automatically clips the audio file into 45-second segments and returns the result for each. If the last segment is shorter than 45 seconds, the program still works on it, though with reduced accuracy. The display for each clip includes a main-category and a sub-category descriptor, each printed with its associated probability.
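The clipping behaviour can be sketched in a few lines of Python. This is an illustration of the windowing logic only, with the hypothetical name `segment_bounds`; it works on durations in seconds and does not touch audio data.

```python
def segment_bounds(duration, window=45.0):
    """Return (start, end) pairs in seconds that cover [0, duration].

    Every segment is `window` seconds long except possibly the last one,
    which is whatever remains.
    """
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        bounds.append((start, end))
        start = end
    return bounds

print(segment_bounds(100.0))
```

For a 100-second input this yields two full 45-second segments followed by a 10-second remainder, which is the short final segment that the post says is handled with reduced accuracy.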
###Sample testing: Debussy's Clair de Lune
![enter image description here][5]
<br/>
----------
#Conclusion
The program gives very reasonable results for most music in the classical style. However, the program has three shortcomings that I plan to fix in later versions. Firstly, the program may give contradictory results (e.g. happy and depressed) if the sentiment changes dramatically in the middle of a 45-second segment, perhaps reflecting the music's changing emotional composition. The current 45-second clipping window is rather long and thus prone to capturing contradictory emotions. In the next version of this program, the window will probably be shortened to 30 or 20 seconds to reduce prediction uncertainty. Secondly, the program's processing speed has a lot of room for improvement. It currently takes about one and a half minutes to process a one-minute audio file. In future versions I will remove relatively ineffective feature extractors to speed things up. Lastly, the data used in creating this application was labeled solely by me, and is therefore prone to my own biases. I plan to expand the data set with more people's input and more genres of music.
I have attached the application to this post so that everyone can try out the program.
#Acknowledgement
I sincerely thank my mentor, Professor Rob Morris, for providing invaluable guidance to help me carry out the project. I also want to thank Rick Hennigan for giving me crucial support with my code.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=8714Emotion2DSpace.PNG&userId=1371765
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=2406DataStructure.PNG&userId=1371765
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=8990capture1.PNG&userId=1371765
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=5765capture2.PNG&userId=1371765
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=9894capture3.PNG&userId=1371765

William Yicheng Zhu
2018-07-14T02:40:20Z

[WSC18] Predicting the Halting Problem with Machine Learning
http://community.wolfram.com/groups/-/m/t/1384403
# A Machine Learning Analysis of the Halting Problem over the SKI Combinator Calculus
![A rasterised SK combinator with length 50, evaluated to 5 steps][16]
## Abstract
Much of machine learning is driven by the question: can we learn what we cannot compute? The learnability of the halting problem, the canonical undecidable problem, to an arbitrarily high accuracy for Turing machines was proven by Lathrop. The SKI combinator calculus can be seen as a reduced form of the untyped lambda calculus, which is Turing-complete; hence, the SKI combinator calculus forms a universal model of computation. In this vein, the growth and halting times of SKI combinator expressions are analysed and the feasibility of a machine learning approach to predicting whether a given SKI combinator expression is likely to halt is investigated.
## 1. SK Combinators
What we will refer to as 'SK Combinators' are expressions in the SKI combinator calculus, a simple Turing-complete language introduced by Schönfinkel (1924) and Curry (1930). In the same way that NAND gates can be used to construct any expression in Boolean logic, SK combinators were posed as a way to construct any expression in predicate logic, and being a reduced form of the untyped lambda calculus, any functional programming language can be implemented by a machine that implements SK combinators. While implementations of this language exist, these serve little functional purpose - instead, this language, a simple idealisation of transformations on symbolic expressions, provides a useful tool for studying complex computational systems.
### 1.1 Rules and Expressions
Formally, SK combinator expressions are binary trees whose leaves are labelled either '*S*', '*K*' or '*I*': each tree *(x y)* represents a function *x* applied to an argument *y*. When the expression is evaluated (i.e. when the function is applied to the argument), the tree is transformed into another tree, the 'value'. The basic 'rules' for evaluating combinator expressions are given below:
*k[x][y] := x*
The K combinator or 'constant function': when applied to *x*, returns the function *k[x]*, which when applied to some *y* will return *x*.
*s[x][y][z] := x[z][y[z]]*
The S combinator or 'fusion function': when applied to *x, y, z*, returns *x* applied to *z*, which is in turn applied to the result of *y* applied to *z*.
*i[x] := x*
The I combinator or 'identity function': when applied to *x*, returns *x*.
Note that the I combinator *I[x]* is equivalent to the function *S[K][a][x]*, as the latter evaluates to *x* in two steps:
*S[K][a][x]*
*= K[x][a[x]]*
*= x*
Thus the I combinator is redundant as it is simply 'syntactic sugar' - for the purposes of this exploration it will be ignored.
These rules can be expressed in the Wolfram Language as follows:
SKRules={k[x_][y_]:> x,s[x_][y_][z_]:> x[z][y[z]]}
### 1.2 Evaluation
The result of applying these rules to a given expression is given by the following functions:
SKNext[expr_]:=expr/.SKRules;
Returns the next 'step' of evaluation of the expression *expr* - evaluating all functions in *expr* according to the rules above without evaluating any 'new'/transformed functions.
SKEvaluate[expr_,n_]:=NestList[#1/.SKRules&,expr,n];
Returns the next *n* steps of evaluation of the expression *expr*
SKEvaluateUntilHalt[expr_,n_] := FixedPointList[SKNext,expr,n+1];
Returns the steps of evaluation of *expr* until either it reaches a fixed point or it has been evaluated for n steps, whichever comes first.
Note that, due to the Church-Rosser theorem (Church and Rosser, 2018), the order in which the rules are applied does not affect the final result, as long as the combinator evaluates to a fixed point / 'halts'. For combinators with no fixed point, which do not halt, the behaviour demonstrated as they evaluate could change based on the order of application of the rules - this is not explored here and is a topic for potential future investigation.
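For readers who want to experiment outside the Wolfram Language, the evaluation scheme can be approximated in Python. This is a sketch under stated assumptions, not a port of the functions above: expressions are nested pairs `(f, x)` standing for `f[x]`, the names `app`, `step` and `evaluate_until_halt` are made up for illustration, and `step` tries to imitate one pass of `expr /. SKRules` by rewriting the outermost matches and leaving each replacement untouched until the next pass.

```python
S, K = "s", "k"

def app(f, *args):
    """Build left-nested applications: app(f, a, b) represents f[a][b]."""
    for a in args:
        f = (f, a)
    return f

def step(expr):
    """Apply one rewriting pass, outermost matches first."""
    if isinstance(expr, tuple):
        f, z = expr
        # k[x][y] -> x
        if isinstance(f, tuple) and f[0] == K:
            return f[1]
        # s[x][y][z] -> x[z][y[z]]
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == S:
            x, y = f[0][1], f[1]
            return ((x, z), (y, z))
        # No match at this level: rewrite the head and the argument instead.
        return (step(f), step(z))
    return expr

def evaluate_until_halt(expr, max_steps):
    """Return the successive expressions, stopping at a fixed point or at max_steps."""
    history = [expr]
    for _ in range(max_steps):
        nxt = step(history[-1])
        if nxt == history[-1]:
            break
        history.append(nxt)
    return history

for e in evaluate_until_halt(app(S, K, "a", "x"), 10):
    print(e)
```

Unlike FixedPointList, this sketch does not repeat the final expression at the end of the list, so no trailing element needs to be dropped.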
### 1.3 Examples
The functions above can be used to evaluate a number of interesting SK combinator expressions:
Column[SKEvaluateUntilHalt[s[k][a][x],10][[1;;-2]]]
The *I* combinator
Column[SKEvaluateUntilHalt[s[k[s[i]]][k][a][b],10][[1;;-2]]]
The reversal expression - *s[k[s[i]]][k][a][b]* takes two terms, *a* and *b*, and returns *b[a]*.
## 2. Growth and Halting
### 2.1 Halting and Related Works
We will define a combinator expression to have halted if it has reached a fixed point - i.e. if no combinators in the expression can be evaluated, or if evaluating any of the combinators in the expression returns the original expression. As SK combinators are Turing-complete and so computationally universal, it is evident that the halting problem - determining whether or not a given SK combinator expression will halt - is undecidable for SK combinators. There are, however, patterns and trends in the growth of SK combinators, and it is arguably possible to speak of the probability of a given SK combinator expression halting.
Some investigations (Lathrop 1996) and (Calude and M. Dumitrescu 2018) have been made into probabilistically determining the halting time of Turing machines, with [2] proving that it is possible to compute some value K where for some arbitrary predetermined confidence *(1-δ)* and accuracy *(1-ε)*, a program that does
A. Input a Turing machine M and program I.
B. Simulate M on I for K steps.
C. If M has halted then print 1, else print 0.
D. Halt.
has a probability greater than *(1-δ)* of having an accuracy (when predicting whether or not a program will halt) greater than *(1-ε).* The key result of this is that, in some cases 'we can learn what we cannot compute' - 'learning' referring to Valiant's formal analysis as 'the phenomenon of knowledge acquisition in the absence of specific programming' (Valiant 1984).
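Steps A to D amount to a bounded simulation. The Python sketch below shows that shape only: `bounded_halting_guess` is a hypothetical name, "halted" is taken to mean that a further step leaves the state unchanged, and the bound `k` is simply passed in rather than computed from δ and ε as in Lathrop's result. The countdown machine is a toy stand-in for a Turing machine.

```python
def bounded_halting_guess(step, state, k):
    """Simulate `step` from `state` for at most k steps.

    Returns 1 if a fixed point is reached within k steps, else 0.
    """
    for _ in range(k):
        nxt = step(state)
        if nxt == state:
            return 1
        state = nxt
    return 0

def countdown(n):
    """Toy machine: counts down to 0 and then stays there."""
    return max(n - 1, 0)

print(bounded_halting_guess(countdown, 3, 10))
print(bounded_halting_guess(countdown, 100, 10))
```

The second call prints 0 even though the countdown does eventually halt, which is exactly the kind of error the accuracy bound is meant to limit.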
### 2.2 Definitions and Functions
The size of a combinator expression can either be measured by its length (total number of characters including brackets) or by its leaf size (number of 's' and 'k' characters). We use the former in most cases, and the latter when randomly generating combinator expressions.
The number of possible combinator expressions with leaf size *n* is given by
SKPossibleExpressions[n_]:=(2^n)*Binomial[2*(n-1),n-1]/n
(Wolfram, 2002), which grows exponentially.
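The count can be sanity-checked independently. An expression with *n* leaves is a binary tree shape, of which there are Binomial[2n-2, n-1]/n (a Catalan number), together with one of 2^n labellings of its leaves. The Python sketch below (function names are illustrative) compares that closed form with a direct recurrence: an expression with *n* leaves is an application of an expression with *i* leaves to one with *n-i* leaves.

```python
from functools import lru_cache
from math import comb

def count_closed_form(n):
    """2^n labellings times the Catalan number of tree shapes with n leaves."""
    return 2**n * comb(2 * n - 2, n - 1) // n

@lru_cache(maxsize=None)
def count_by_recurrence(n):
    """Count expressions by splitting into a function part and an argument part."""
    if n == 1:
        return 2  # a single leaf: 's' or 'k'
    return sum(count_by_recurrence(i) * count_by_recurrence(n - i)
               for i in range(1, n))

for n in range(1, 7):
    print(n, count_closed_form(n), count_by_recurrence(n))
```

For leaf sizes 1 to 6 both columns give 2, 4, 16, 80, 448 and 2688.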
#### 2.2.1 Visualisation
We define a function to visualise the growth of a combinator, *SKRasterize*:
SKArray[expr_,n_]:=Characters/@ToString/@SKEvaluate[expr,n];
SKArray[expr_]:=SKArray[expr,10];
Generates a list of the steps in the growth of a combinator, where each expression is itself a list of characters ('s', 'k', '[', ']')
SKGrid[exp_,n_]:=ArrayPlot[SKArray[exp,n],{ColorRules->{"s"->RGBColor[1,0,0],"k"->RGBColor[0,1,0],"["->RGBColor[0,0,1],"]"->RGBColor[0,0,0]},PixelConstrained->True,Frame->False,ImageSize->1000}];
SKGrid[exp_]:=SKGrid[exp,10];
Generates an ArrayPlot of a list given by SKArray, representing the growth of a combinator in a similar manner to that of cellular automata up to step n. The y axis represents time - each row is the next expression in the evaluation of an SK combinator. Red squares indicate 'S', green squares indicate 'K', blue squares indicate '[' and black squares indicate ']'.
SKRasterize[func_,n_]:=Image[SKGrid[func,n][[1]]];
SKRasterize[func_]:=SKRasterize[func,10];
Generates a rasterized version of the ArrayPlot.
A visualisation of a given combinator can easily be produced, as follows:
SKRasterize[s[s[s]][s][s][s][k],15]
![The longest running halting expression with leaf size 7, halting in 12 steps (Wolfram, 2002)][1]
The longest running halting expression with leaf size 7, halting in 12 steps (Wolfram, 2002)
#### 2.2.2 Halting graphs
We can create a table of the length (string length) of successive combinator expressions as they evaluate as follows:
SKLengths[exp_,n_]:=StringLength/@ToString/@SKEvaluate[exp,n];
Returns a list of the lengths of successive expressions until step *n*
These can be plotted as a graph (x axis number of steps, y axis length of expression):
SKPlot[expr_,limit_]:=ListLinePlot[SKLengths[expr,limit],AxesOrigin->{1,0},AxesLabel->{"Number of steps","Length of expression"}];
Thus, a graph of the above combinator can be produced:
SKPlot[s[s[s]][s][s][s][k],15]
![A graph of the above combinator][2]
It is evident from the graph that this combinator halts at 12 steps.
#### 2.2.3 Random SK combinators
To empirically study SK combinators, we need a function to randomly generate them. Two methods to do this were found:
RecursiveRandomSKExpr[0,current_]:=current;
RecursiveRandomSKExpr[depth_,current_]:=
RecursiveRandomSKExpr[depth-1,
RandomChoice[{
RandomChoice[{s,k}][current],
current[RecursiveRandomSKExpr[depth-1,RandomChoice[{s,k}]]]
}]
];
RecursiveRandomSKExpr[depth_Integer]:=RecursiveRandomSKExpr[depth,RandomChoice[{s,k}]];
A recursive method, repeatedly appending either a combinator to the 'head' of the expression or a randomly generated combinator expression to the 'tail' of the expression. (Hennigan)
replaceWithList[expr_,pattern_,replaceWith_]:=ReplacePart[expr,Thread[Position[expr,pattern]->replaceWith]];
treeToFunctions[tree_]:=ReplaceRepeated[tree,{x_,y_}:>x[y]];
randomTree[leafCount_]:=Nest[ReplacePart[#,RandomChoice[Position[#,x]]->{x,x}]&,{x,x},leafCount-2];
RandomSKExpr[leafCount_]:=treeToFunctions[replaceWithList[randomTree[leafCount],x,RandomChoice[{s,k},leafCount]]];
Random combinator generation based on generation of binary trees - each combinator can be expressed as a binary tree with leaves 'S' or 'K'. (Parfitt, 2017)
While the first method gives a large spread of combinators with a variety of lengths, and is potentially more efficient, for the purposes of this exploration the second is more useful, as it limits the combinators generated to a smaller, more controllable sample space (for a given leaf size).
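The tree-based method can be mirrored in Python to show why it fixes the leaf size exactly. This is an illustrative sketch with hypothetical names, not a translation checked against the Wolfram Language code: it starts from a two-leaf tree, splits a randomly chosen leaf `leaf_count - 2` times, then labels every leaf 's' or 'k' and reads each pair `(f, a)` as `f[a]`.

```python
import random

def random_tree_shape(leaf_count, rng):
    """Grow a binary tree by repeatedly splitting a random leaf (leaf_count >= 2)."""
    tree = ["x", "x"]
    for _ in range(leaf_count - 2):
        # Collect every leaf as a (parent list, index) pair.
        leaves = []
        stack = [tree]
        while stack:
            node = stack.pop()
            for i, child in enumerate(node):
                if isinstance(child, list):
                    stack.append(child)
                else:
                    leaves.append((node, i))
        parent, i = rng.choice(leaves)
        parent[i] = ["x", "x"]  # each split adds exactly one leaf
    return tree

def label_leaves(tree, rng):
    """Replace each placeholder leaf with 's' or 'k'; a pair (f, a) means f[a]."""
    if isinstance(tree, list):
        return (label_leaves(tree[0], rng), label_leaves(tree[1], rng))
    return rng.choice("sk")

def leaf_size(expr):
    """Number of 's' and 'k' leaves in an expression."""
    if isinstance(expr, str):
        return 1
    return leaf_size(expr[0]) + leaf_size(expr[1])

rng = random.Random(0)
expr = label_leaves(random_tree_shape(6, rng), rng)
print(expr, leaf_size(expr))
```

Because every split adds exactly one leaf, the result always has the requested leaf size, which is the controllability the second method is chosen for.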
### 2.3 Halting Graphs
All combinators of leaf sizes up to size 6 evolve to fixed points (NKS):
exprs = Table[RandomSKExpr[6],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]
![10 randomly generated combinators of size 6, with their lengths plotted until n=40.][3]
10 randomly generated combinators of size 6, with their lengths plotted until n=40.
As the leaf size increases, combinators take longer to halt, and some show exponential growth:
exprs = Table[RandomSKExpr[10],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]
![10 randomly generated combinators of size 10, with their lengths plotted until n=20.][4]
10 randomly generated combinators of size 10, with their lengths plotted until n=20.
exprs = Table[RandomSKExpr[30],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]
![10 randomly generated combinators of size 30, with their lengths plotted until n=40.][5]
10 randomly generated combinators of size 30, with their lengths plotted until n=40.
CloudEvaluate[exprs = Table[RandomSKExpr[50],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]]
![10 randomly generated combinators of size 50, with their lengths plotted until n=40.][6]
10 randomly generated combinators of size 50, with their lengths plotted until n=40.
After evaluating a number of these combinators, it appears that they tend to either halt or grow exponentially - some sources (Parfitt, 2017) reference linear growth combinators, however none of these have been encountered as yet.
### 2.4 Halting Times
With a random sample of combinators, we can plot a cumulative frequency graph of the number of combinators that have halted at a given number of steps:
SKHaltLength[expr_,n_]:=Module[{x},
x=Length[SKEvaluateUntilHalt[expr,n+1]];
If[x>n,False,x]
]
Returns the number of steps it takes the combinator *expr* to halt; if *expr* does not halt within n steps, returns *False*.
GenerateHaltByTable[depth_,iterations_,number_]:=Module[{exprs,lengths},
exprs = Monitor[Table[RandomSKExpr[depth],{n,number}],n];
lengths = Monitor[Table[SKHaltLength[exprs[[n]],iterations],{n,number}],n];
Return[lengths]
]
Generates a table of the halt lengths of *number* random combinator expressions (*False* if they do not halt within *iterations* steps) with leaf size *depth*.
GenerateHaltData[depth_,iterations_,number_]:=Module[{haltbytable,vals},
haltbytable = GenerateHaltByTable[depth,iterations,number];
vals = BinCounts[Sort[haltbytable],{1,iterations+1,1}];
Table[Total[vals[[1;;n]]],{n,1,Length[vals]}]
]
Generates a cumulative table: for each number of steps up to *iterations*, the count of the *number* random combinator expressions with leaf size *depth* that have halted by that step (expressions that do not halt within *iterations* steps are never counted).
GenerateHaltGraph[depth_,iterations_,number_]:=Module[{cumulative,f},
cumulative=GenerateHaltData[depth,iterations,number];
f=Interpolation[cumulative];
{ListLinePlot[cumulative,PlotRange->{Automatic,{0,number}},GridLines->{{},{number}},Epilog-> {Red,Dashed,Line[{{0,cumulative[[-1]]},{number,cumulative[[-1]]}}]},AxesOrigin->{1,0},AxesLabel->{"Number of steps","Number of combinators halted"}],cumulative[[-1]]}
]
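Plots a graph of the above data, and returns the plot together with the total number of combinators that halted within *iterations* steps.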
#### 2.4.1 Halting Graphs
We analyse halt graphs of random samples of 1000 combinators, each evaluated for up to 30 steps, at leaf sizes from 10 to 50:
CloudEvaluate[GenerateHaltGraph[10,30,1000]]
![Leaf size 10: almost all combinators in the sample (997) have halted (99.7%).][7]
Leaf size 10: almost all combinators in the sample (997) have halted (99.7%).
CloudEvaluate[GenerateHaltGraph[20,30,1000]]
![Leaf size 20: 979 combinators in the sample have halted (97.9%).][8]
Leaf size 20: 979 combinators in the sample have halted (97.9%).
CloudEvaluate[GenerateHaltGraph[30,30,1000]]
![Leaf size 30: 962 combinators in the sample have halted (96.2%).][9]
Leaf size 30: 962 combinators in the sample have halted (96.2%).
CloudEvaluate[GenerateHaltGraph[40,30,1000]]
![Leaf size 40: 944 combinators in the sample have halted (94.4%).][10]
Leaf size 40: 944 combinators in the sample have halted (94.4%).
CloudEvaluate[GenerateHaltGraph[50,30,1000]]
![Leaf size 50: 889 combinators in the sample have halted (88.9%).][11]
Leaf size 50: 889 combinators in the sample have halted (88.9%).
Evidently, the rate of halting of combinators in the sample decreases as the number of steps increases - the gradient of the graph is decreasing. As the graph levels out at around 30 steps, we will assume that the number of halting combinators will not increase significantly beyond this point.
As the leaf size increases, fewer combinators in the sample have halted by 30 steps - however, the graph still levels out, suggesting most of the combinators which have not halted by this point will never halt.
#### 2.4.2 Halting Times and Leaf Size
We can plot a graph of the number of halted combinators against leaf size:
CloudEvaluate[ListLinePlot[Table[{n,GenerateHaltGraph[n,30,1000][[2]]},{n,5,50,1}]]]
![A graph to show the number of combinators which halt within 30 steps in each of 46 random samples of 1000 combinators, with leaf size varying from 5 to 50.][12]
A graph to show the number of combinators which halt within 30 steps in each of 46 random samples of 1000 combinators, with leaf size varying from 5 to 50.
This graph shows that, despite random variation, the number of halted combinators decreases as the leaf size increases: curve fitting suggests that this follows a negative quadratic function.
FitData[data_,func_]:=Module[{fitd},fitd={Fit[data[[1,2,3,4,1]],func,x]};{fitd,Show[ListPlot[data[[1,2,3,4,1]],PlotStyle->Red],Plot[fitd,{x,5,50}]]}]
A curve-fitting function: plots the curve of best fit for *data* with some combination of functions *func*.
FitData[%,{1,x,x^2}]
![Curve-fitting on the data with a quadratic function][13]
{1012.07 - 1.18915 x - 0.0209805 x^2}
Curve-fitting on the data with a quadratic function yields a reasonably accurate curve of best fit.
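As a rough sanity check, the fitted quadratic can be compared against the halting counts reported in section 2.4.1. This is only a sketch: the coefficients are copied from the output above, the helper name `fitted` is made up for illustration, and the counts come from different random samples than the ones used for the fit, so some disagreement is expected.

```wolfram
(* Fitted quadratic, with coefficients copied from the FitData output *)
fitted[x_] := 1012.07 - 1.18915 x - 0.0209805 x^2;

(* {leaf size, halted count} pairs quoted from the captions in section 2.4.1 *)
observed = {{10, 997}, {20, 979}, {30, 962}, {40, 944}, {50, 889}};

(* Residual = observed count minus fitted value, for each leaf size *)
residuals = {#1, #2 - fitted[#1]} & @@@ observed
```

For these five points the residuals stay within about 13 combinators (out of 1000) of the fitted curve, with the largest deviations at leaf sizes 40 and 50.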
## 3. Machine Learning Analysis of SK Combinators
The graphs above suggest that the majority of halting SK combinators with leaf size <=50 will halt before ~30 steps. Thus we can state that, for a randomly chosen combinator, it is likely that if it does not halt before 40 steps, it will never halt. Unfortunately a lack of time prohibited a formal analysis of this, in the vein of Lathrop's work - this is an area for future research.
We attempt to use modern machine learning methods to predict the likelihood of a given SK combinator expression halting before 40 steps:
### 3.1 Dataset Generation
We implement a function *GenerateTable* to produce tables of random SK expressions:
SKHaltLength[expr_,n_]:=Module[{x},
x=Length[SKEvaluateUntilHalt[expr,n+1]];
If[x>n,False,x]
]
Returns the number of steps *expr* takes to halt if it halts within *n* steps, otherwise returns *False*.
GenerateTable[depth_,iterations_,number_]:=Module[{exprs,lengths},
exprs = Monitor[Table[RandomSKExpr[depth],{n,number}],n];
lengths = Monitor[Table[exprs[[n]]-> SKHaltLength[exprs[[n]],iterations],{n,number}],n];
lengths = DeleteDuplicates[lengths];
Return[lengths]
]
Returns a list of up to *number* rules (duplicates are removed), one per random expression with leaf size *depth*: each rule maps an expression to the number of steps it takes to halt if it halts within *iterations* steps, and to *False* otherwise.
*GenerateTable* simply returns tables of random SK expressions - as seen earlier, these tend to be heavily skewed datasets, as around 90% of the random expressions generated will halt. Thus we must process this dataset to create a balanced training dataset: this is done with the function *CreateTrainingData*:
CreateTrainingData[var_]:=Module[{NoHalt,Halt,HaltTrain,Train},
NoHalt = Select[var,#[[2]]==False&];
Halt = Select[var,#[[2]]==True&];
HaltTrain = RandomSample[Halt,Length[NoHalt]];
Train = Join[HaltTrain,NoHalt];
Return[Train]
];
Counts the number of non-halting combinators in *var* (assumption is this is less than number of halting combinators), selects a random sample of halting combinators of this length and concatenates the lists.
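The balancing step can be seen on a toy dataset. This sketch uses made-up string labels in place of SK expressions, and it adds a `Min` guard (not present in the function above) so that the sampling does not fail if the non-halting class happens to be the larger one.

```wolfram
(* Toy dataset: three halting entries and one non-halting entry (labels are placeholders) *)
toy = {"a" -> True, "b" -> True, "c" -> True, "d" -> False};

noHalt = Select[toy, Last[#] === False &];
halt = Select[toy, Last[#] === True &];

(* Sample as many halting entries as there are non-halting ones, never more than exist *)
sampleSize = Min[Length[halt], Length[noHalt]];
balanced = Join[RandomSample[halt, sampleSize], noHalt];

(* Class counts after balancing *)
Counts[Last /@ balanced]
(* <|True -> 1, False -> 1|> *)
```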
ConvertSKTableToString[sktable_]:=Table[ToString[sktable[[n,1]]]-> sktable[[n,2]],{n,1,Length[sktable]}];
Converts SK expressions in a table generated with *GenerateTable* to strings
We also implement a function to create rasterised training data (where instead of an individual SK combinator associated with either True or False, an image of the first 5 steps of evaluation of the combinator is associated with either True or False):
CreateRasterizedTrainingData[var_]:=Module[{NoHalt,Halt,HaltTrain,HaltTrainRaster,NoHaltTrainRaster,RasterTrain},
NoHalt = Select[var,#[[2]]==False&];
Halt = Select[var,#[[2]]==True&];
HaltTrain = RandomSample[Halt,Length[NoHalt]];
HaltTrainRaster=Monitor[Table[SKRasterize[HaltTrain[[x,1]],5]-> HaltTrain[[x,2]],{x,1,Length[HaltTrain]}],x];
NoHaltTrainRaster=Monitor[Table[SKRasterize[NoHalt[[x,1]],5]-> NoHalt[[x,2]],{x,1,Length[NoHalt]}],x];
RasterTrain = Join[HaltTrainRaster,NoHaltTrainRaster];
Return[RasterTrain]
];
Counts the number of non-halting combinators in *var* (assumption is this is less than number of halting combinators), selects a random sample of halting combinators of this length, evaluates and generates images of both halting and non-halting combinators and processes them into training data (image->True/False).
### 3.2 Markov Classification
#### 3.2.1 Training
As a first attempt, we generate 2000 random SK expressions with leaf size 5, 2000 expressions with leaf size 10 ... 2000 expressions with leaf size 50, evaluated up to 40 steps:
lengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]]
We convert all non-False halt lengths to 'True':
lengths = lengths/.(a_->b_)/;!(b===False):> (a->True);
We process the data and train a classifier using the Markov method:
TrainingData = CreateTrainingData[lengths];
TrainingData2 = ConvertSKTableToString[TrainingData];
HaltClassifier1 = Classify[TrainingData2,Method->"Markov"]
#### 3.2.2 Testing
We must now generate test data, using the same parameters for generating random combinators:
testlengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]]
testlengths = testlengths/.(a_->b_)/;!(b===False):> (a->True);
TestData = CreateTrainingData[testlengths];
TestData2 = ConvertSKTableToString[TestData];
The classifier can now be assessed for accuracy using this data:
TestClassifier1 = ClassifierMeasurements[HaltClassifier1,TestData2]
#### 3.2.3 Evaluation
A machine learning solution to this problem is only useful if the accuracy is greater than 0.5 (i.e. more accurate than a random coin flip). We test the accuracy of the classifier:
TestClassifier1["Accuracy"]
0.755158
This, while not outstanding, is passable for a first attempt. We find the training accuracy:
ClassifierInformation[HaltClassifier1]
![Classifier Information][14]
The training accuracy (71.3%) is slightly lower than the testing accuracy (75.5%) - this is surprising, and is probably due to a 'lucky' choice of testing dataset.
We calculate some statistics from a confusion matrix plot:
TestClassifier1["ConfusionMatrixPlot"]
![Confusion Matrix Plot][15]
Accuracy: 0.76
Misclassification rate: 0.24
Precision (halt): 0.722 (when 'halt' is predicted, how often is it correct?)
True Positive Rate: 0.83 (when the combinator halts, how often is it classified as halting?)
False Positive Rate: 0.32 (when the combinator doesn't halt, how often is it classified as halting?)
Precision (non-halt): 0.799 (when 'non halt' is predicted, how often is it correct?)
True Negative Rate: 0.68 (when the combinator doesn't halt, how often is it classified as not halting?)
False Negative Rate: 0.17 (when the combinator halts, how often is it classified as not halting?)
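Because *CreateTrainingData* balances the two classes, these statistics are tied together: the accuracy and both precisions can be recovered from the true positive and false positive rates alone. The sketch below assumes exactly balanced classes; the function name `balancedMetrics` is made up for illustration.

```wolfram
(* Derive summary statistics from TPR and FPR, assuming equally many
   halting and non-halting test cases *)
balancedMetrics[tpr_, fpr_] := <|
  "Accuracy" -> (tpr + (1 - fpr))/2,
  "PrecisionHalt" -> tpr/(tpr + fpr),
  "PrecisionNonHalt" -> (1 - fpr)/((1 - fpr) + (1 - tpr))
|>;

(* Rates read off the confusion matrix plot above *)
markov = balancedMetrics[0.83, 0.32]
```

With these inputs the three values come out at roughly 0.755, 0.722 and 0.80, in line with the figures listed above.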
A confusion matrix plot shows that the true positive rate is larger than the true negative rate - this would suggest that it is easier for the model to tell when an expression halts than when an expression does not halt. This could be due to the model detecting features suggesting very short run time in the initial string - for instance, a combinator k[k][<expression>] would evaluate immediately to k and halt - however, these 'obvious' features are very rare.
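To make the k[k][<expression>] example concrete, here is a minimal sketch of SK reduction by pattern replacement. It is an assumption about how such an evaluator might look, not the evaluator used elsewhere in this essay; the names `skRules` and `skStep` are hypothetical.

```wolfram
(* Standard reduction rules: k x y -> x and s x y z -> x z (y z) *)
skRules = {k[x_][y_] :> x, s[x_][y_][z_] :> x[z][y[z]]};

(* One pass of rewriting (every outermost match is replaced) *)
skStep[expr_] := expr /. skRules;

(* A combinator of the form k[k][<expression>] collapses to k in a single step *)
skStep[k[k][s[s][k]]]
(* k *)

(* Repeat the step until nothing changes, allowing at most 40 steps *)
FixedPointList[skStep, s[k][k][s], 40]
(* {s[k][k][s], k[s][k[s]], s, s} *)
```

A classifier working on the string alone could in principle pick up on a leading k[k][...] pattern like this one, without evaluating anything.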
### 3.3 Random Forest Classification on Rasterised Expression Images
Analysing strings alone, without any information about how they are actually structured or how they might evaluate, could well be a flawed method - one might argue that, in order to predict halting, one would need more information about how the program runs. Hence, another possible method is to generate a dataset of visualisations of the first 5 steps of a combinator's evaluation as follows:
SKRasterize[RandomSKExpr[50],5]
![A rasterised SK combinator with length 50, evaluated to 5 steps][16]
and feed these into a machine learning model. It might seem that this method is pointless: we are already evaluating the combinators to 5 steps, and we are training a model on a database of combinators evaluated to 40 steps to predict whether a combinator will halt in <=40 steps. However, the point of the exercise is less to create a useful resource than to investigate the feasibility of applying machine learning to this type of problem. If more computational power were available, a dataset of combinators evaluated to 100 steps (by which point even more combinators will have halted) could be created. In such a case, a machine learning model to predict whether or not a combinator will halt in <=100 steps would be a practical approach, as the time taken to evaluate a combinator to 100 steps can be exponentially longer than that taken to evaluate it to 5 steps.
#### 3.3.1 Training
We generate a dataset of 2000 random SK expressions with leaf size 5, 2000 expressions with leaf size 10 ... 2000 expressions with leaf size 50, evaluated up to 40 steps:
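rasterizedlengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]];
As before, we convert all non-False halt lengths to 'True', since *CreateRasterizedTrainingData* selects halting expressions by the value *True*:
rasterizedlengths = rasterizedlengths/.(a_->b_)/;!(b===False):> (a->True);
In order to train a model on rasterised images, we must evaluate all SK expressions in the dataset to 5 steps and generate rasterised images of these: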
RasterizedTrainingData = CreateRasterizedTrainingData[rasterizedlengths];
We then train a classifier on this data:
RasterizeClassifier=Classify[RasterizedTrainingData,Method->"RandomForest"]
#### 3.3.2 Testing
We must now generate test data, using the same parameters for generating random training data:
testrasterizedlengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]];
testrasterizedlengths = testrasterizedlengths/.(a_->b_)/;!(b===False):> (a->True);
TestRasterizedData = CreateRasterizedTrainingData[testrasterizedlengths];
The classifier can now be assessed for accuracy using this data:
TestRasterizeClassifier=ClassifierMeasurements[RasterizeClassifier,TestRasterizedData]
#### 3.3.3 Evaluation
A machine learning solution to this problem is only useful if the accuracy is greater than 0.5 (i.e. more accurate than a random coin flip). We test the accuracy of the classifier:
TestRasterizeClassifier["Accuracy"]
0.876891
This is significantly better than the Markov approach (75.5%). We find the training accuracy:
ClassifierInformation[RasterizeClassifier]
![enter image description here][17]
Again, the training accuracy (85.5%) is slightly lower than the testing accuracy (87.7%).
We calculate some statistics from a confusion matrix plot:
TestRasterizeClassifier["ConfusionMatrixPlot"]
![Confusion Matrix Plot][18]
Accuracy: 0.88
Misclassification rate: 0.12
Precision (halt): 0.911 (when 'halt' is predicted, how often is it correct?)
True Positive Rate: 0.83 (when the combinator halts, how often is it classified as halting?)
False Positive Rate: 0.08 (when the combinator doesn't halt, how often is it classified as halting?)
Precision (non-halt): 0.848 (when 'non halt' is predicted, how often is it correct?)
True Negative Rate: 0.92 (when the combinator doesn't halt, how often is it classified as not halting?)
False Negative Rate: 0.17 (when the combinator halts, how often is it classified as not halting?)
A confusion matrix plot shows that the false negative rate is larger than the false positive rate - the model more often mistakes a halting expression for a non-halting one than the reverse. Accordingly, the precision for halting is higher than the precision for non-halting, indicating that if the model suggests a program will halt, this is more likely to be correct than if it suggests that the program will not halt. An (oversimplified) way to look at this intuitively is to examine some graphs of lengths of random combinators:
![Random combinator length graphs][19]
Looking at combinators that halt (combinators for which the graph flattens out), some combinators 'definitely halt' - their length decreases until the graph flattens out:
!['definitely halts'][20]
'definitely halts' (1)
Some combinators have length that increases exponentially:
![exponentially increasing combinator length graph][21]
'possibly non-halting' (2)
And some combinators appear to have increasing length but suddenly decrease:
![increasing then decreasing combinator length graph][22]
'possibly non-halting' (3)
We do not know which features of the rasterised graphic the machine learning model extracts to make its prediction, but if, say, it was classifying based purely on length of the graphic, it would identify combinators like (1) as 'definitely halting', but would not necessarily be able to distinguish between combinators like (2) and combinators like (3), which both appear to be non-halting initially.
On a similar note, some functional programming languages (e.g. Agda - [7]) have the ability to classify a function as 'definitely halting' or 'possibly non-halting', just like our classifier, which is trained on a dataset of functions that either 'definitely halt' (halt in <= 40 steps) or are 'possibly non-halting' (do not halt in <= 40 steps - might halt later).
### 3.4 Table of Comparison
![A table comparing statistics for Markov and Random Forest models][23]
## 4. Conclusions and Further Work
### 4.1 Conclusions
The results of this exploration were somewhat surprising, in that a machine learning approach to determining whether or not a program will terminate appears to some extent viable - out of all the methods attempted, the random forest classifier applied to a rasterised image of the first five steps of the evaluation of a combinator achieved the highest accuracy of 0.88 on a test dataset of 1454 random SK combinator expressions. Note, though, that what is actually being determined here is whether or not a combinator will halt before some n steps (here, n=40) - we are classifying between combinators that 'definitely halt' and combinators which are 'possibly non-halting'.
### 4.2 Microsite
As an extension to this project, a Wolfram microsite was created and is accessible at [https://www.wolframcloud.com/objects/euan.l.y.ong/SKCombinators](https://www.wolframcloud.com/objects/euan.l.y.ong/SKCombinators) - within this microsite, a user can view a rasterised image of a combinator, a graph of the length of the combinator as it is evaluated, a statistical analysis of halting time relative to other combinators with the same leaf size and a machine learning prediction of whether or not the combinator will halt within 40 steps.
![Microsite Screenshot][24]
A screenshot of the microsite evaluating a random SK combinator expression
### 4.3 Implications, Limitations and Further Work
Although the halting problem is undecidable, the field of termination analysis - attempting to determine whether or not a given program will eventually terminate - has a variety of applications, for instance in program verification. Machine learning approaches to this problem would not only help explore this field in new ways but could also be implemented in, for instance, software debuggers.
The principal limitation of this method is that we are only predicting whether or not a combinator will halt in a finite number *k* of steps - while this could be a sensible idea if *k* is large, at present this system is very impractical due to small datasets and a small value of *k* used to train the classifier (*k* = 40). Another issue with the machine learning technique used is that the visualisations have different dimensions (longer combinators will generate longer images), and when the images are preprocessed and resized before being fed into the random forest model, downsampling/upsampling can lead to loss of data, decreasing the accuracy of the model.
From a machine learning perspective, attempts at analysis of the rasterised images with a neural network could well prove fruitful, as would an implementation of a vector representation of syntax trees to allow the structure of SK combinators (nesting combinators) to be accurately extracted by a machine learning model.
Future theoretical research could include a deeper exploration of Lathrop's probabilistic method of determining *k*, an investigation of the 'halting' features the machine learning model is looking for within the rasterised images, a more general analysis of SK combinators (proofs of halting / non-halting for certain expressions, for instance) to uncover deeper patterns, or even an extension of the analysis carried out in the microsite to lambda calculus expressions (which can be transformed to an 'equivalent' SK combinator expression).
## Acknowledgements
We thank the mentors at the 2018 Wolfram High School Summer Camp - Andrea Griffin, Chip Hurst, Rick Hennigan, Michael Kaminsky, Robert Morris, Katie Orenstein, Christian Pasquel, Dariia Porechna and Douglas Smith - for their help and support writing this paper.
## References
1. A. Church and J. B. Rosser: Some properties of conversion. Transactions of the American Mathematical Society 39(3): 472-482, 1936.
2. R. H. Lathrop: On the learnability of the uncomputable. ICML 1996: 302-309.
3. C. S. Calude and M. Dumitrescu: A probabilistic anytime algorithm for the halting problem. Computability 7(2-3): 259-271, 2018.
4. L. G. Valiant: A theory of the learnable. Communications of the Association for Computing Machinery 27(11): 1134-1142, 1984.
5. S. Wolfram: A New Kind of Science. 1121-1122, 2002.
6. E. Parfitt: Ways that combinators evaluate, from the Wolfram Community - A Wolfram Web Resource. (2017) http://community.wolfram.com/groups/-/m/t/965400
7. S-C Mu: Agda.Termination.Termination. http://www.iis.sinica.edu.tw/~scm/Agda/Agda-Termination-Termination.html

Attached is a Wolfram Notebook (.nb) version of this computational essay.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1.png&userId=1371970
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=2.png&userId=1371970
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=3.png&userId=1371970
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=4.png&userId=1371970
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=5.png&userId=1371970
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=6.png&userId=1371970
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=7.png&userId=1371970
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=8.png&userId=1371970
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=9.png&userId=1371970
[10]: http://community.wolfram.com//c/portal/getImageAttachment?filename=10.png&userId=1371970
[11]: http://community.wolfram.com//c/portal/getImageAttachment?filename=11.png&userId=1371970
[12]: http://community.wolfram.com//c/portal/getImageAttachment?filename=12.png&userId=1371970
[13]: http://community.wolfram.com//c/portal/getImageAttachment?filename=13.png&userId=1371970
[14]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Classify1.png&userId=1371970
[15]: http://community.wolfram.com//c/portal/getImageAttachment?filename=15.png&userId=1371970
[16]: http://community.wolfram.com//c/portal/getImageAttachment?filename=16.png&userId=1371970
[17]: http://community.wolfram.com//c/portal/getImageAttachment?filename=17.png&userId=1371970
[18]: http://community.wolfram.com//c/portal/getImageAttachment?filename=18.png&userId=1371970
[19]: http://community.wolfram.com//c/portal/getImageAttachment?filename=19.png&userId=1371970
[20]: http://community.wolfram.com//c/portal/getImageAttachment?filename=20.png&userId=1371970
[21]: http://community.wolfram.com//c/portal/getImageAttachment?filename=21.png&userId=1371970
[22]: http://community.wolfram.com//c/portal/getImageAttachment?filename=22.png&userId=1371970
[23]: http://community.wolfram.com//c/portal/getImageAttachment?filename=23.png&userId=1371970
[24]: http://community.wolfram.com//c/portal/getImageAttachment?filename=24.png&userId=1371970

Euan Ong
2018-07-14T23:56:25Z
[WSC18] Streaming Live Phone Sensor Data to the Wolfram Language
http://community.wolfram.com/groups/-/m/t/1386358
# Streaming Live Phone Sensor Data to the Wolfram Language
(This forms Part 1 of a 2-part community post: "Using Machine Learning Models for Accelerometer-based Gesture Recognition" - part 2 is available at http://community.wolfram.com/groups/-/m/t/1386392)
![Live demo of sensor streaming][1]
Not only are smartphones wonderful ways to stay connected to the digital world, they also contain an astonishing array of sensors making them ideal for scientific and computational experimentation.
The Wolfram Language (WL) has extraordinary data processing and scientific computing abilities - the only sensors, however, from which it can read data are specialised and either somewhat expensive or require a significant amount of setup. On a high level, the WL has baked-in support for a variety of devices - specifically the Raspberry Pi, Vernier Go!Link compatible sensors, Arduino microcontrollers, webcams and devices using the RS-232 or RS-422 serial protocol (http://reference.wolfram.com/language/guide/UsingConnectedDevices.html); unfortunately, there is no easy way to access sensor data from Android or iOS mobile devices.
In this post, I will attempt to combine the two, demonstrating:
1. A UDP socket-based method for transmission of (general) sensor data from an Android phone to the Wolfram Language (based on this excellent community post: http://community.wolfram.com/groups/-/m/t/344278 which does the same for iPhones)
2. A web-based, platform-agnostic method for transmission of IMU (inertial measurement unit) data, i.e. accelerometer and gyroscope data, from a phone to the Wolfram Language.
# Socket Transmission
On the Google Play Store, there exist a number of Android apps which can transmit live sensor data to a computer over UDP sockets - for instance, "Sensorstream IMU+GPS" ([https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps][1]). Unfortunately, the WL does not support receipt and transmission of data over UDP sockets - while there exists a *Socket* library, as of 2018 this is only capable of dealing with TCP. Thus, to use UDP sockets in the WL, we must implement our own library using JLink to access Java socket packages from the WL. (Credit is due to http://community.wolfram.com/groups/-/m/t/344278 - the code here was slightly outdated so had to be modified.)
## Instructions
To send accelerometer (or other sensor) data from your phone to Wolfram over UDP sockets:
1. Install the "Sensorstream IMU+GPS" app
2. Ensure the sensors you want to stream to Wolfram are ticked on the 'Toggle Sensors' page. (If you want to stream other sensors besides 'Accelerometer', 'Gyroscope' and 'Magnetic Field', ensure the 'Include User-Checked Sensor Data in Stream' box is ticked. Beware, though - the more sensors are ticked, the more latency the sensor stream will have.)
3. On the "Preferences" tab:
a. Change the target IP address in the app to the IP address of your computer (ensure your computer and phone are connected to the same local network)
b. Set the target port to 5555
c. Set the sensor update frequency to 'Fastest'
d. Select the 'UDP stream' radio box
e. Tick 'Run in background'
4. Switch stream ON **before** executing code. (nb. ensure your phone does not fall asleep during streaming - perhaps use the 'Caffeinate' app (https://play.google.com/store/apps/details?id=xyz.omnicron.caffeinate&hl=en_US) to ensure this.)
5. Execute the following WL code (in part from http://community.wolfram.com/groups/-/m/t/344278):
Initialise JLink
QuitJava[];
Needs["JLink`"];
InstallJava[];
Initialise a socket connection - ensure *5555* is the target port set
udpSocket=JavaNew["java.net.DatagramSocket",5555];
Function that reads up to *size* bytes from a socket.
readSocket[sock_,size_]:=JavaBlock@Block[{datagramPacket=JavaNew["java.net.DatagramPacket",Table[0,size],size]},sock@receive[datagramPacket];
datagramPacket@getData[]]
Function that reads from the socket, processes data and 'sows' it to be collected later
listen[]:=record=DeleteCases[readSocket[udpSocket,1200],0]//FromCharacterCode//Sow;
Initialises the results list and repeatedly appends accelerometer data to it every 0.01 seconds - if the list is over 700 elements long, the 150 oldest elements (at start of list) are removed.
results={};RunScheduledTask[AppendTo[results,Quiet[Reap[listen[]]]];If[Length[results]>700,results=Drop[results,150]],0.01];
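One detail worth checking when trimming the buffer is that `Drop` returns a new list and leaves its argument unchanged, so the shortened list has to be assigned back. A small stand-alone sketch, using a made-up list in place of the sensor data:

```wolfram
(* Stand-in for the results list: 705 dummy readings *)
buffer = Range[705];

(* Drop on its own does not change buffer *)
Drop[buffer, 150];
Length[buffer]
(* 705 *)

(* Assigning the result back removes the 150 oldest entries *)
If[Length[buffer] > 700, buffer = Drop[buffer, 150]];
Length[buffer]
(* 555 *)
```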
Initialises the stream list to be refreshed every 0.01 seconds with the most recent 500 elements of results. Each element of results is a string of transmitted socket data (e.g. "225585.00455, 3, -1.591, 8.624, 5.106, 4, -0.193, -0.690, -0.072") - this is split into a list of strings {"225585.00455", "3", "-1.591"...} and each string is converted to a numerical expression.
stream:=Refresh[ToExpression[StringSplit[#[[1]],","]]& /@ Select[results[[-500;;]],Head[#]==List&],UpdateInterval-> 0.01]
*stream* now contains the 500 most recent accelerometer readings, stored in an array. The values of *stream* will be updated whenever the variable is used within a *Dynamic*. (Note that, with the default sensors enabled - the first three boxes ticked on the *Toggle Sensors* tab - the x, y and z coordinates of the accelerometer can be accessed at elements 3, 4 and 5 in each list in the array; e.g. to access the most recent accelerometer reading, run stream[[-1,3;;5]].)
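As an illustration of the parsing step, the sample packet quoted above can be split by hand. This sketch assumes the packet layout suggested by that sample: a leading timestamp followed by groups of four values, each a sensor ID and three readings. The variable names are made up for illustration.

```wolfram
(* Sample packet, copied from the example string above *)
packet = "225585.00455, 3, -1.591, 8.624, 5.106, 4, -0.193, -0.690, -0.072";

(* Split on commas and convert each field to a number *)
fields = ToExpression[StringSplit[packet, ","]];

(* First field: timestamp; the rest: {sensorID, x, y, z} groups *)
timestamp = First[fields];
readings = Association[(First[#] -> Rest[#]) & /@ Partition[Rest[fields], 4]];

(* Readings for sensor ID 3, the same values as fields[[3 ;; 5]] *)
readings[3]
(* {-1.591, 8.624, 5.106} *)
```

Keying the readings by sensor ID avoids relying on fixed positions, which shift if a different set of sensors is ticked in the app.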
The accelerometer data can then be visualised using a *ListLinePlot*:
While[Length[results]<500,Pause[2]];Dynamic[Refresh[ListLinePlot[{stream[[All,3]],stream[[All,4]],stream[[All,5]]},PlotRange->All],UpdateInterval->0.1]]
![A list line plot of accelerometer data][2]
The 'pulses' (i.e. shaking the phone) were carried out every second; from this it is evident that the frequency of data transmission is 50 Hz (i.e. data is sent every 0.02 seconds).
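The same rate can be estimated numerically from the packet timestamps instead of by eye. This is a sketch under the assumption that the first field of each packet is a timestamp in seconds; the timestamps below are made-up values spaced 0.02 s apart, and with live data one would use `stream[[All, 1]]` instead.

```wolfram
(* Hypothetical timestamps in seconds, 0.02 s apart *)
timestamps = {10.00, 10.02, 10.04, 10.06, 10.08};

(* Mean gap between consecutive packets, and the corresponding frequency in Hz *)
meanGap = Mean[Differences[timestamps]];
frequency = 1/meanGap
```

For these values the result is approximately 50, matching the 50 Hz read off the plot.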
To get the most recent accelerometer data, run
Dynamic[stream[[-1,3;;5]]]
To end socket transmission, turn off the stream on the app, then run
RemoveScheduledTask[ScheduledTasks[]];
udpSocket@close[];
QuitJava[];
and ensure the process 'JLink' is quit in Task Manager / Activity Monitor etc - if it is not closed properly, you will be unable to create another socket from that port.
# Channel Transmission
An alternative way to send data from a phone to the Wolfram Cloud is by using the Channel framework. Introduced in version 11 of the Wolfram Language, the Channel framework allows asynchronous communication between Wolfram sessions as well as external systems, with communication being brokered in the Wolfram Cloud. A key point to note about the Channel framework is that it is based on a publish-subscribe model, allowing messages to be sent and received through a 'channel' rather than pairing specific senders and receivers.
## Instructions
To transmit accelerometer data, run the following code: (for other sensors, see the bottom of the page)
ChannelDeploySensorPage[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&x=\"+n.toString()+\"&y=\"+t.toString()+\"&z=\"+o.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.dm.x,n.dm.y,n.dm.z)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
Then run
c = ChannelDeploySensorPage[Func]
![Output - a QR code][3]
(where the argument *func* is some function to be called whenever the channel receives a new point of data from the phone - the argument given to *func* is an association of key-value pairs, with the values received as strings, such as the one below:)
<|"x" -> "3", "y" -> "4", "z" -> "1"|>
Now, simply scan the QR code generated with your phone, and sensor data will be streamed from your phone to the computer.
The data transmitted can be viewed as a time series as follows:
c[[3]]["TimeSeries"]
The accelerometer data can also be plotted with the following Dynamic: (red --> x, green --> y, blue --> z):
Dynamic[ListLinePlot[ToExpression/@Reverse[Take[Reverse[#["Values"]],UpTo[100]]]&/@c[[3]]["TimeSeries"][[2;;4]],PlotRange->{All,{-50,50}},PlotStyle->{Red, Green, Blue}]]
When you're done, delete the channel by running
RemoveChannelListener[c[[3]]]
## Explanation
Setting up a channel is as easy as connecting to the Wolfram Cloud
CloudConnect[];
and typing
current="";
Func[x_]:=current=x;
listener=ChannelListen["NameOfChannel",Func[#Message]&, Permissions->"Public"]
![A channel listener][4]
Here, *Func* is a function that will be called each time the channel receives a message (the message is supplied as an argument to the function) - it simply sets the variable 'current' to the data last sent to the channel (in the form of key-value pairs - e.g. <|"x" -> "3", "y" -> "4", "z" -> "1"|>). To make the channel accessible to other users, ensure the channel has Permissions set to Public.
To delete the channel (useful when debugging), call
RemoveChannelListener[listener];
One particularly useful feature of the Channel is that it has built-in support for receiving and parsing HTTP requests: simply send a GET request to the channel URL (given by *listener["URL"]*) and the Wolfram Language will automatically parse the parameters and make the data available to the user.
For instance, we can send an HTTP GET request to *https://channelbroker.wolframcloud.com/users/<your Wolfram Cloud email address>/NameOfChannel* with the parameters "*operation=send*" (which indicates that data is being sent to the channel) and "*test=5*" appended:
BaseURL=listener["URL"]
Params = "?operation=send&test=5";
URLRead[HTTPRequest[BaseURL<>Params,<|Method->"Get"|>]]
The variable 'current' has now been updated and contains the key-value pair 'test->5' which we just sent to the channel.
current
<|"test" -> "5"|>
current[["test"]]
5
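Note that query parameters arrive as strings: the value stored under "test" above is the string "5", not the integer 5. Below is a minimal sketch of converting a message to numbers before doing arithmetic on it, assuming every numeric field is a plain decimal string. The helper name `parseReading` is hypothetical and not part of the Channel framework.

```wolfram
(* Hypothetical helper: convert the string values of a channel message to numbers. *)
(* Values that do not look like plain decimal numbers are marked as Missing. *)
parseReading[msg_Association] := Map[
  If[StringQ[#] && StringMatchQ[#, NumberString],
    ToExpression[#],
    Missing["NotNumeric", #]] &,
  msg]

(* Example message shaped like the ones the sensor page sends *)
parseReading[<|"x" -> "3", "y" -> "-4.25", "z" -> "abc"|>]
```

Checking the string with StringMatchQ before calling ToExpression avoids evaluating arbitrary text received over a public channel.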
An alternative way of viewing the data from the channel is to call
listener["TimeSeries"]
This allows the data sent to the channel to be stored as a time series, which can be useful in applications such as collecting time-based sensor data.
### Transmission of Sensor Data over Channels
As demonstrated earlier, a nice feature of Channels is that data can be sent to the Wolfram Language over HTTP - instead of fiddling with JLink and sockets (which tend to be laggy and break easily), one can simply create a web page that streams sensor data to a channel.
For Android devices (running Google Chrome), a range of built-in sensor APIs gives a web page access to raw accelerometer, gyroscope, light sensor and magnetometer data, as well as to processed readings: linear acceleration (i.e. the total acceleration experienced by a device, disregarding that produced by gravity), absolute orientation and relative orientation. Documentation for these sensors is available online at https://developers.google.com/web/updates/2017/09/sensors-for-the-web.
Unfortunately, for iOS devices there does not exist an easy way to access sensors from the web - although one can use DeviceMotion events (https://developers.google.com/web/fundamentals/native-hardware/device-orientation/), the data these give can vary significantly from browser to browser (e.g. different browsers might use different coordinate systems), so training a machine learning model on gesture data produced by this method would require either retraining a model for each browser or significant processing of data based on browser.
However, there is another solution - namely, the gyronorm.js API (https://github.com/dorukeker/gyronorm.js), which claims to return 'consistent [gyroscope and accelerometer] values across different devices'. Using this, we construct a simple web page to transmit accelerometer data to a Wolfram Language channel called 'Sensors': (While the following code focuses on extracting accelerometer data, it is a trivial task to change the sensor being polled to read, for instance, gyroscope data in the Wolfram Language instead.)
<!DOCTYPE html>
<html lang=en>
<meta charset=UTF-8>
<title>Sensors</title>
<script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script>
<script>
function makeXHR(x,y,z){
var t=Date.now();
r=new XMLHttpRequest;
r.withCredentials=true;
var i="https://channelbroker.wolframcloud.com/users/euan.l.y.ong@gmail.com/Sensors?operation=send&time="+t.toString()+"&x="+x.toString()+"&y="+y.toString()+"&z="+z.toString();
r.open("GET",i,!0);
r.send()
}
function init(){
//Explanations are from the GyroNorm GitHub page. (https://github.com/dorukeker/gyronorm.js/)
var n={
frequency:100, //send values every 100 milliseconds
gravityNormalized:!0, // Whether or not to normalise gravity-related values
orientationBase:GyroNorm.WORLD, // ( Can be Gyronorm.GAME or GyroNorm.WORLD. gn.GAME returns orientation values with respect to the head direction of the device. gn.WORLD returns the orientation values with respect to the actual north direction of the world. )
decimalCount:2, // How many digits after the decimal point to return for each value
logger:null,
screenAdjusted:!1
};
t=new GyroNorm;
t.init(n).then(function(){
t.start(function(data){
makeXHR(data.dm.x,data.dm.y,data.dm.z)
//Other possible values to substitute for data.dm.x, data.dm.y, data.dm.z are:
// data.do.alpha ( deviceorientation event alpha value )
// data.do.beta ( deviceorientation event beta value )
// data.do.gamma ( deviceorientation event gamma value )
// data.do.absolute ( deviceorientation event absolute value )
// data.dm.x ( devicemotion event acceleration x value )
// data.dm.y ( devicemotion event acceleration y value )
// data.dm.z ( devicemotion event acceleration z value )
// data.dm.gx ( devicemotion event accelerationIncludingGravity x value )
// data.dm.gy ( devicemotion event accelerationIncludingGravity y value )
// data.dm.gz ( devicemotion event accelerationIncludingGravity z value )
// data.dm.alpha ( devicemotion event rotationRate alpha value )
// data.dm.beta ( devicemotion event rotationRate beta value )
// data.dm.gamma ( devicemotion event rotationRate gamma value )
})
})
}
window.onload=init;
</script>
</html>
This webpage, when opened on an Android or iOS phone, will stream data to the 'Sensors' channel, sending a new HTTP request every 100 milliseconds. (Decreasing the 'frequency' value, which is the interval between readings in milliseconds, leads to more frequent results, but can cause atrocious levels of lag.)
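The request that the page issues for each reading can also be reproduced from the Wolfram Language, which may be handy for testing a listener without a phone. This is only a sketch: the channel URL is a placeholder of the same shape as the one above, and the timestamp and readings are made-up sample values.

```wolfram
(* Placeholder channel URL; substitute the value of listener["URL"] *)
base = "https://channelbroker.wolframcloud.com/users/<your Wolfram Cloud email address>/Sensors";

(* Build the same query string that makeXHR assembles in JavaScript *)
query = URLQueryEncode[{"operation" -> "send", "time" -> 1531857660000,
    "x" -> 0.12, "y" -> -9.81, "z" -> 0.05}];
url = base <> "?" <> query

(* Decode the query again to inspect the key-value pairs; the values come back as strings *)
URLQueryDecode[query]

(* Uncomment to actually send the reading to a live channel *)
(* URLRead[HTTPRequest[url]] *)
```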
### Producing the ChannelDeploySensorPage function
Although this webpage allows accelerometer data to be transmitted from a phone to a computer, for it to be used it must be deployed on a server. To change, for instance, the channel name, one would need to edit the file on the server itself, which can quickly become a tiresome process. Thus, we developed a function which autogenerates the required HTML code and stores it in the Wolfram Cloud as a CloudObject where it can easily be accessed. The function also outputs a QR code, to allow mobile users to quickly navigate to the web page. (The argument *func* is simply the function to be called whenever the channel receives a new point of data from the phone.)
## Alternative Sensors
ChannelDeploySensorPage functions for accessing sensors other than the accelerometer can be found below: (For more information about sensors and readings, check out https://developers.google.com/web/fundamentals/native-hardware/device-orientation/)
### Device Orientation (alpha, beta, gamma, absolute):
ChannelDeploySensorPageDeviceOrientation[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o,abs){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&alpha=\"+n.toString()+\"&beta=\"+t.toString()+\"&gamma=\"+o.toString()+\"&absolute=\"+abs.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.do.alpha,n.do.beta,n.do.gamma,n.do.absolute)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
### Device Motion - Acceleration Including Gravity (x, y, z):
ChannelDeploySensorPageAccelerationGravity[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&x=\"+n.toString()+\"&y=\"+t.toString()+\"&z=\"+o.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.dm.gx,n.dm.gy,n.dm.gz)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
### Device Motion - Rotation Rate (alpha, beta, gamma):
ChannelDeploySensorPageRotationRate[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&x=\"+n.toString()+\"&y=\"+t.toString()+\"&z=\"+o.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.dm.alpha,n.dm.beta,n.dm.gamma)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
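The three functions above differ only in which fields of the GyroNorm data object they read and in the names of the query parameters. One possible way to avoid the duplication is to generate the page from a list of fields. This is a sketch rather than the code used in this post: `sensorPageHTML` is a hypothetical helper, and the URL in the usage example is a placeholder.

```wolfram
(* Hypothetical helper: build the sensor page for an arbitrary list of readings. *)
(* fields is a list of rules parameterName -> path inside the GyroNorm data object, *)
(* e.g. "x" -> "dm.gx" sends data.dm.gx as the query parameter x. *)
sensorPageHTML[listenerURL_String, fields : {(_String -> _String) ..}] :=
 Module[{query},
  (* JavaScript fragment of the form +"&name="+d.path for every field *)
  query = StringJoin[("+\"&" <> First[#] <> "=\"+d." <> Last[#]) & /@ fields];
  StringJoin[
   "<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title>",
   "<script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script>",
   "<script>function send(d){var r=new XMLHttpRequest;r.withCredentials=!0;",
   "r.open(\"GET\",\"", listenerURL, "?operation=send&time=\"+Date.now()", query, ",!0);r.send()}",
   "function init(){var g=new GyroNorm;g.init({frequency:100,gravityNormalized:!0,",
   "orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1})",
   ".then(function(){g.start(send)})}window.onload=init</script></html>"]]

(* Usage: the page for acceleration including gravity *)
html = sensorPageHTML["https://example.invalid/Sensors",
   {"x" -> "dm.gx", "y" -> "dm.gy", "z" -> "dm.gz"}];
StringContainsQ[html, "+\"&z=\"+d.dm.gz"]
```

The resulting string could then be passed to CloudExport in the same way as SensorHTML in the functions above.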
# Applications
Real world applications of this sensor data abound - aside from the gesture recognition system described in post 2 (http://community.wolfram.com/groups/-/m/t/1386392), you could use this sort of data to make a pocket seismometer, a fall detector, electronic dice, investigate centrifugal motion, investigate friction... More examples available here: http://www.gcdataconcepts.com/examples.html. If you make a sensor-based project in Wolfram, or think of a new / innovative / interesting way to use this data, or if the code above is buggy / incomplete, please do share it in the comments below!
A Wolfram Notebook version of this post is attached.
-- Euan Ong
# References
"Using Connected Devices": http://reference.wolfram.com/language/guide/UsingConnectedDevices.html
"Using your smart phone as the ultimate sensor array for Mathematica": http://community.wolfram.com/groups/-/m/t/344278
"Capturing Data from an Android Phone using Wolfram Data Drop": http://community.wolfram.com/groups/-/m/t/461190
"Sensorstream IMU+GPS": https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps
"Sensors For The Web!": https://developers.google.com/web/updates/2017/09/sensors-for-the-web
"gyronorm.js": https://github.com/dorukeker/gyronorm.js
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ezgif-3-207848dda6.gif&userId=1371970
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=60691.png&userId=1371970
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=42762.png&userId=1371970
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=37343.png&userId=1371970
Euan Ong
2018-07-17T20:01:00Z
[WSC18] Implementation of Common Axiom Systems and Proof Generation
http://community.wolfram.com/groups/-/m/t/1382544
# Exploration of Fundamental Mathematics, via Implementation of Common Axiom Systems and Proof Generation
Utilizing the function FindEquationalProof introduced in version 11.3 of Mathematica, I implemented and proved common algebraic and logical conjectures as well as hypothetical, but plausible combinations of axioms and conjectures, as introduced in Chapter 12, Section 9 of Stephen Wolfram's book, *A New Kind of Science*. **The Full Computational Essay is attached at the bottom.**
This post is divided into the following 4 sections:
1. **Implementation of Modern Axiom Systems**
2. **Proof Generation**
3. **Alternative Axiom Systems**
4. **Implications**
## Implementation of Modern Axiom Systems
### Axioms, Conjectures, Theorems and Proofs
Most common math curriculums prioritize the introduction of theorems, but not many take the time to properly prove each of them. A mathematical proof is a logical argument for the validity of a particular statement. A statement without proof is called a conjecture. A statement that has been proved is a theorem.
Such proofs build upon each other. A proof might require another theorem, which might require yet another theorem for its proof. If we keep tracing this chain of proofs and theorems, we reach a point at which a statement is so obviously true that it needs no proof. This is the informal definition of an axiom.
An example of an axiom may be `a + b == b + a`, or `Not[True] == False`. Despite the simplicity of these statements, it is impossible to build up a viable system of mathematics without a sufficiently large, consistent set of axioms.
A more complete definition, as well as the history of the investigation of proofs, can be found in the following Wikipedia article:
[Wikipedia - Proof Theory](https://en.wikipedia.org/wiki/Proof_theory)
### The FindEquationalProof Function
This newly-implemented function, though rather lacking in practicality, is conceptually an interesting product, as it introduces a method of computationally generating human-readable (but possibly lengthy) proofs. It is also interesting in that a system of axioms can be specified, rather than implied, which allows for the possibility of limiting the range of axioms a proof can use, or providing a completely new system of axioms, separate from that of our mathematics.
The function, in essence, generates such proofs by simple substitution: it replaces part of an expression using the rules given by the axioms, or by lemmas (which are generated by the same method). The following diagram, excerpted from Wolfram's book, visualizes the algorithm:
![nks-extract][1]
[1]
Axioms listed in the bottom left are used in the proofs of theorems. A statement is therefore proved by finding a sequence of substitutions that transforms one expression into another.
This process is equivalent to the proofs of conjectures in mathematics. Every proof is merely a series of substitutions of parts of statements using axioms (or lemmas derived from axioms). Given enough computational power and, more importantly, a sufficient set of consistent axioms, a theorem can be proved.
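As a small illustration of proof by substitution, the sketch below first carries out two substitutions by hand and then hands the same statement to FindEquationalProof. The two rules are Boolean axioms of the kind used later in this post, oriented left to right. The statement being proved and the names `commute` and `identity` are chosen only for this example.

```wolfram
(* Two Boolean axioms, written as left-to-right rewrite rules *)
commute = and[x_, y_] :> and[y, x];
identity = and[x_, or[y_, not[y_]]] :> x;

(* Manual proof of and[or[q, not[q]], p] == p by two substitutions *)
step0 = and[or[q, not[q]], p];
step1 = Replace[step0, commute]    (* and[p, or[q, not[q]]] *)
step2 = Replace[step1, identity]   (* p *)

(* The same statement handed to FindEquationalProof *)
proof = FindEquationalProof[
   ForAll[{p, q}, and[or[q, not[q]], p] == p],
   {ForAll[{a, b}, and[a, b] == and[b, a]],
    ForAll[{a, b}, and[a, or[b, not[b]]] == a]}];
proof["ProofDataset"]
```

The automated proof need not follow the same two steps, since the prover is free to introduce its own lemmas.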
### Providing definitions and axioms
The only statements the FindEquationalProof function can understand, according to the documentation, are:
- lhs==rhs — equational logic
- ForAll[vars,lhs==rhs] — universal quantifiers over equational logic identities
Implementing even the simplest axioms was therefore a time-consuming task.
```
definitions = {
ForAll[{a, b}, ex[a, b] == not[ForAll[{a}, not[b]]]],
ForAll[{a, b}, im[a, b] == or[not[a], b]],
ForAll[{a}, ueq[a, a] == not[eq[a, a]]],
ForAll[{a, b}, eqv[a, b] == and[im[a, b], im[b, a]]],
ForAll[{a, b}, nand[a, b] == not[and[a, b]]],
ForAll[{a, b}, xor[a, b] == and[or[a, not[b]], or[not[a], b]]],
ForAll[{a, b}, or[or[a, b], not[b]] == "T"],
not["T"] == "F",
ForAll[{a}, eq[a, a] == "T"],
ForAll[{a, b}, im[a, b] == or[not[a], b]]
};
```
```
booleanLogic = {
ForAll[{a, b}, and[a, b] == and[b, a]],
ForAll[{a, b}, or[a, b] == or[b, a]],
ForAll[{a, b}, and[a, or[b, not[b]]] == a],
ForAll[{a, b}, or[a, and[b, not[b]]] == a],
ForAll[{a, b, c}, and[a, or[b, c]] == or[and[a, b], and[a, c]]],
ForAll[{a, b, c}, or[a, and[b, c]] == and[or[a, b], or[a, c]]]
};
```
Simple Boolean algebra axioms are specified, allowing the system to compute with the `And`, `Or`, and `Not` operations. Operations not provided, like `Exists`, `Implies`, `!=`, or `<->`, must be defined using only these logical operators. Notable implementations include:
```
ForAll[{a, b}, ex[a, b] == not[ForAll[{a}, not[b]]]] (*Definition of Existence*)
ForAll[{a, b}, im[a, b] == or[not[a], b]] (*Definition of Implication*)
ForAll[{a, b}, or[or[a, b], not[b]] == "T"] (*Definition of Truth*)
ForAll[{a}, eq[a, a] == "T"] (*Functional Definition of Equality*)
```
More complicated operations such as NAND or XOR are defined as combinations of these operations.
Using these definitions and axioms, we can prove simple logical theorems, such as De Morgan's law or modus ponens. The output proof graphs show the axioms in green and the lemmas in red and orange, revealing the order and steps of the proof:
```
modusP = ForAll[{p, q}, im[and[im[p, q], p], q] == "T"];
deMorgan = ForAll[{a, b}, not[and[a, b]] == or[not[a], not[b]]];
axioms = Join[definitions, booleanLogic];
FindEquationalProof[modusP, axioms]["ProofGraph"]
FindEquationalProof[deMorgan, axioms]["ProofGraph"]
```
![modusP][2]
![demorgan][3]
Note that even such simple proofs can take over 200 steps to derive from the system of axioms. More complicated expressions, such as the Wolfram Axiom, can also be shown to follow from these axioms:
```
wolframLogic =
ForAll[{a, b, c},
nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c];
FindEquationalProof[wolframLogic, axioms]["ProofGraph"]
```
![wolframLogic][4]
Another notable fact is that despite the length of the Wolfram Axiom compared to De Morgan's law, it takes a similar number of steps to reach the proof. A possible justification may be related to the Principle of Computational Equivalence.
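Before asking for an equational proof, it can be useful to confirm that a candidate theorem really is a tautology. The sketch below cross-checks the three statements proved above with the built-in Boolean functions instead of the lowercase operators of the axiom system. This is only a truth-table check and says nothing about the length of a proof.

```wolfram
(* De Morgan's law *)
TautologyQ[Equivalent[Not[And[a, b]], Or[Not[a], Not[b]]], {a, b}]

(* Modus ponens *)
TautologyQ[Implies[And[Implies[p, q], p], q], {p, q}]

(* Wolfram Axiom, written with the built-in Nand *)
TautologyQ[
 Equivalent[Nand[Nand[Nand[a, b], c], Nand[a, Nand[Nand[a, c], a]]], c],
 {a, b, c}]
```

All three calls return True.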
### Implementing Arithmetic
The first attempted implementation of arithmetic used a modified version of Peano's axioms. This system is simple to implement, but slightly harder to express. 0 is defined, and all natural numbers are defined using a successor function `s[x]`, which is equivalent to adding 1 to x, e.g. `1 == s[0]`, `2 == s[s[0]]`, and so on. Addition is defined as a recursive function of s, and multiplication is defined as a recursive function of addition. Note that the distributive and commutative properties of addition and multiplication are not part of Peano's original axioms; they are added here as extra axioms.
```
arithmetic = {
(*Addition*)
ForAll[{x, y}, add[x, s[y]] == s[add[x, y]]],
ForAll[{x}, add[0, x] == x],
ForAll[{x, y}, add[x, y] == add[y, x]],
ForAll[{x, y, z}, add[x, add[y, z]] == add[add[x, y], z]],
(*Multiplication*)
ForAll[{x, y}, times[x, s[y]] == add[times[x, y], x]],
ForAll[{x}, times[0, x] == 0],
ForAll[{x}, times[x, s[0]] == x],
ForAll[{x, y}, times[x, y] == times[y, x]],
ForAll[{x, y, z}, times[x, times[y, z]] == times[times[x, y], z]],
ForAll[{x, y, z},
add[times[x, z], times[y, z]] == times[add[x, y], z]]
};
```
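To get a feel for the successor notation, the recursive axioms can be read as rewrite rules and applied directly. This is a sketch for illustration only. The names `numeral`, `peanoRules` and `toInteger` are hypothetical, and the two rules with 0 as the second argument are the commuted forms of the corresponding axioms above.

```wolfram
(* Numerals: n applications of the successor s to 0 *)
numeral[n_Integer?NonNegative] := Nest[s, 0, n]

(* Rewrite rules read off the axioms, oriented left to right. *)
(* add[x, 0] -> x and times[x, 0] -> 0 combine the zero axioms with commutativity. *)
peanoRules = {
   add[x_, s[y_]] :> s[add[x, y]],
   add[x_, 0] :> x,
   times[x_, s[y_]] :> add[times[x, y], x],
   times[x_, 0] :> 0};

(* Convert a numeral back to an ordinary integer *)
toInteger[0] = 0;
toInteger[s[n_]] := 1 + toInteger[n];

toInteger[add[numeral[2], numeral[3]] //. peanoRules]     (* 5 *)
toInteger[times[numeral[2], numeral[3]] //. peanoRules]   (* 6 *)
```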
With these in hand, many simple algebraic properties can be proved, for example `1 + 2x + x*x == (1 + x)(1 + x)`:
```
distribution =
ForAll[{x},
add[s[0], add[times[s[s[0]], x], times[x, x]]] ==
times[add[s[0], x], add[s[0], x]]];
FindEquationalProof[distribution, arithmetic]["ProofGraph"]
```
![distribution][5]
We can further extend this system by defining powers.
```
powers = {
ForAll[{x}, pow[x, s[0]] == x],
ForAll[{x}, pow[x, 0] == 1],(*Note: 0^0 is undef*)
ForAll[{x, y}, pow[x, s[y]] == times[pow[x, y], x]],
ForAll[{x, y, z}, pow[pow[x, y], z] == pow[x, times[y, z]]
]};
arithmetic = Join[arithmetic, powers];
```
However, if we introduce negative numbers, the system generates an impossible proof:
```
negatives = {
ForAll[{x}, add[x, neg[x]] == 0],
ForAll[{x, y}, sub[x, y] == add[x, neg[y]]]
};
arithmetic = Join[arithmetic, negatives];
zeroEqualsOne = FindEquationalProof[0 == s[0], arithmetic]
zeroEqualsOne["ProofNotebook"];
(*Remove semicolon and evaluate to view proof notebook*)
```
Inspection of the Proof Notebook reveals that the problem is due to the undefined power 0^0. As x^0 is defined as 1, while 0^x reduces to 0 through the multiplication axioms, the inevitable conclusion is that 0 is equal to 1.
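One possible chain of substitutions leading to the contradiction is written out below. It is reconstructed by hand from the axioms listed above and is not copied from the Proof Notebook, so the automatically generated proof may take a different route. Here n abbreviates neg(s(0)), and 1 is the literal constant used in the axiom pow(x, 0) = 1.

$$\begin{aligned}
\text{s}(n)&=\text{s}(\text{add}(n,0))=\text{add}(n,\text{s}(0))=\text{add}(\text{s}(0),n)=0\\
\text{pow}(0,0)&=\text{pow}(0,\text{s}(n))=\text{times}(\text{pow}(0,n),0)=\text{times}(0,\text{pow}(0,n))=0\\
\text{pow}(0,0)&=1\quad\Rightarrow\quad 0=1\\
x&=\text{pow}(x,\text{s}(0))=\text{times}(\text{pow}(x,0),x)=\text{times}(1,x)=\text{times}(0,x)=0
\end{aligned}$$

The last line holds for every x, so in particular s(0) = 0, which is the statement passed to FindEquationalProof.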
This problem showcases the importance of defining a sturdy set of axioms. Although this problem can be eliminated by the use of the "\[Implies]" operator defined in Boolean logic, it would be wiser to follow in the steps of previous mathematicians than to continue building a new set of axioms, as unpredictable inconsistencies and contradictions may occur.
### Real Algebra and Tarski Axioms
To implement real algebra, a more robust set of axioms needs to be defined. Tarski's axioms of real arithmetic are what many consider to be the basis of current algebra:
![tarski][6]
[1]
Before we do so, however, axioms from predicate logic, such as implication or equivalence, are also required.
```
predicateLogic = {
im[ForAll[{a}, im[b, c]], im[ForAll[{a}, b], ForAll[{a}, c]]] ==
"T",
im[fq[a, b], im[a, ForAll[{b}, a]]] == "T",
im[fq[b, a], ex[a, eq[a, b]]] == "T"
};
tarskiAxioms = {
ForAll[{x, y, z}, add[x, add[y, z]] == add[add[x, y], z]],
ForAll[{a}, add[a, 0] == a],
ForAll[{a}, add[a, neg[a]] == 0],
ForAll[{a, b}, add[a, b] == add[b, a]],
ForAll[{x, y, z}, times[x, times[y, z]] == times[times[x, y], z]],
ForAll[{x, y, z},
add[times[x, z], times[y, z]] == times[add[x, y], z]],
ForAll[{x, y}, times[x, y] == times[y, x]],
ForAll[{a}, times[a, 1] == a],
ForAll[{a}, im[ueq[a, 0], eq[times[a, rec[a]], 1]] == "T"],
ForAll[{a, b, c}, im[and[gr[a, b], gr[b, c]], gr[a, c]] == "T"],
ForAll[{a, b}, im[gr[a, b], ueq[a, b]] == "T"],
ForAll[{a, b}, or[gr[a, b], a] == or[b, gr[b, a]]],
ForAll[{a, b, c}, im[gr[a, b], gr[add[a, c], add[b, c]]] == "T"],
ForAll[{a, b, c},
im[and[gr[a, b], gr[c, 0]], gr[times[a, c], times[b, c]]] ==
"T"],
gr[1, 0] == "T"
};
axioms = Join[definitions, booleanLogic, predicateLogic,
tarskiAxioms];
```
With these, we can compute with the set of all real numbers. Proving again the distribution law, and investigating the Proof Notebook, reveals the use of reciprocals, negative numbers, and even fractions in the proof.
```
distribution =
ForAll[{a, b, x, y},
times[add[x, y], add[x, b]] ==
add[add[times[x, x], times[b, x]], add[times[x, y], times[b, y]]]];
FindEquationalProof[distribution, axioms]["ProofNotebook"];
(*remove semicolon and evaluate to inspect ProofNotebook*)
```
It is surreal to see that the whole of algebra can be defined in these few axioms. Stephen Wolfram, in his book, remarks on this fact as well, stating after the two-page list of mathematical axioms that "It is from these axiom systems [...] that most of the millions of theorems in the literature of mathematics have ultimately been derived."
## Proof Generation
### Elimination of Axioms
A natural extension of such an investigation would be to computationally determine how many axioms could be eliminated from a particular proof. A grid would be ideal to view such data, therefore a wrapper function was written in order to automate this process.
generateProof accepts a list of theorems, a list of axiom sets and a time limit per proof, and returns a 2-D array of proof objects. generateGrid accepts the same lists, as well as a few display options, and outputs a grid showing the number of steps of each proof.
```
generateProof[thms_?ListQ, axms_?ListQ, perTimeConstraint_?IntegerQ] :=
Module[{},
Table[Table[
FindEquationalProof[ForAll[{a, b, c}, thm], axm,
TimeConstraint -> perTimeConstraint], {thm, thms}], {axm, axms}]]
```
```
generateGrid[thms_?ListQ, axms_?ListQ, AxiomLength_?IntegerQ,
TimeConstraint_?IntegerQ, scheme_?StringQ, SimpleLabels_?BooleanQ] :=
Module[{proofs, steps, labels, viewThems, viewSteps, colorRules},
proofs = generateProof[thms, axms, TimeConstraint];
steps =
Table[StringCount[ToString@#["ProofFunction"], ";"] & /@
proofs[[n]], {n, Length[proofs]}];
labels = (Rest@Subsets@Range[AxiomLength]);
viewThems = {"Theorems"}~Join~thms;
viewSteps =
Table[{If[SimpleLabels, labels[[n]], axms[[n]]]}~Join~
steps[[n]], {n, Length[proofs]}];
{Rotate[#, \[Pi]/2, Baseline, {1, 1}] & /@ viewThems}~Join~steps;
colorRules = {None, None}~
Join~{Flatten@
Table[Table[{y, x} ->
ColorData[scheme,
Rescale[steps[[y - 1]][[x - 1]], {0,
Max@Flatten@steps}]], {x, 2, Length[steps[[1]]] + 1}], {y,
2, Length[steps] + 1}]};
Grid[{Rotate[#, \[Pi]/2] & /@ viewThems}~Join~viewSteps,
ItemSize -> {{Automatic, Table[1, Length[steps] - 1]}, {Automatic,
Table[1, Length[axms] - 1]}}, Spacings -> Automatic,
Frame -> All, Alignment -> Left, Background -> colorRules]]
```
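The step count used in generateGrid is the number of ";" separators in the string form of the "ProofFunction" property. When a proof is not found within the time limit, FindEquationalProof does not return a ProofObject, so it can help to treat that case explicitly. A sketch of such a counter follows. The name `countSteps` is hypothetical, and the small example proof is unrelated to the axiom systems above.

```wolfram
(* Same metric as in generateGrid: count ";" in the proof function *)
countSteps[p_ProofObject] := StringCount[ToString[p["ProofFunction"]], ";"]

(* Anything that is not a ProofObject (e.g. the result of a timeout) has no count *)
countSteps[_] := Missing["NoProof"]

proof = FindEquationalProof[a == c, {a == b, b == c}, TimeConstraint -> 10];
countSteps[proof]
countSteps[$Failed]
```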
For example, a set of randomly generated axioms proving a set of equally randomly generated theorems might look like the following grid. The top row lists the theorems, and the leftmost column lists the axioms. Each square is numbered and colored according to the length of its proof:
![exampleGrid][7]
In our case, the intent is to investigate how many axioms of Boolean logic we can remove and still prove particular theorems. Therefore, we can pass the generateGrid function a list of subsets of the Boolean logic axioms. Each subset still requires definitions for things such as implication or equality, so the definitions are prepended to each list.
```
subsets = Rest@Subsets@booleanLogic;
Short[subsets, 5]
```
`Out:`
$$\left\{\left\{\forall _{\{a,b\}}\text{and}(a,b)=\text{and}(b,a)\right\},\left\{\forall _{\{a,b\}}\text{or}(a,b)=\text{or}(b,a)\right\},\langle\langle 60\rangle\rangle ,\left\{\forall _{\{a,b\}}\text{and}(a,b)=\text{and}(b,a),\forall _{\{a,b\}}\text{or}(a,b)=\text{or}(b,a),\forall _{\{a,b\}}\text{and}(a,\text{or}(b,\text{not}(b)))=a,\forall _{\{a,b\}}\text{or}(a,\text{and}(b,\text{not}(b)))=a,\forall _{\{a,b,c\}}\text{and}(a,\text{or}(b,c))=\text{or}(\text{and}(a,b),\text{and}(a,c)),\forall _{\{a,b,c\}}\text{or}(a,\text{and}(b,c))=\text{and}(\text{or}(a,b),\text{or}(a,c))\right\}\right\}$$
```
subsetAxioms = Join[definitions, #] & /@ (Rest@Subsets@booleanLogic);
Short[subsetAxioms, 6]
```
$$\left\{\left\{\forall _{\{a,b\}}\text{ex}(a,b)=\text{not}(\text{not}(b)),\forall _{\{a,b\}}\text{im}(a,b)=\text{or}(\text{not}(a),b),\forall _a\text{ueq}(a,a)=\text{not}(\text{eq}(a,a)),\forall _{\{a,b\}}\text{eqv}(a,b)=\text{and}(\text{im}(a,b),\text{im}(b,a)),\forall _{\{a,b\}}\text{nand}(a,b)=\text{not}(\text{and}(a,b)),\forall _{\{a,b\}}\text{xor}(a,b)=\text{and}(\text{or}(a,\text{not}(b)),\text{or}(\text{not}(a),b)),\forall _{\{a,b\}}\text{or}(\text{or}(a,b),\text{not}(b))=\text{T},\text{not}(\text{T})=\text{F},\forall _a\text{eq}(a,a)=\text{T},\forall _{\{a,b\}}\text{im}(a,b)=\text{or}(\text{not}(a),b),\forall _{\{a,b\}}\text{and}(a,b)=\text{and}(b,a)\right\},\langle\langle 61\rangle\rangle ,\left\{\forall _{\{a,b\}}\text{ex}(a,b)=\text{not}(\text{not}(b)),\langle\langle 14\rangle\rangle ,\forall _{\{a,b,c\}}\text{or}(a,\text{and}(b,c))=\text{and}(\text{or}(a,b),\text{or}(a,c))\right\}\right\}$$
After such formatting, the list subsetAxioms contains every combination of axioms, each with the list of definitions. As displaying such long expressions on the grid is infeasible, we can number each axiom of Boolean logic, and only show its index on the grid. In the following example we investigate the provability and the number of steps required for proving De Morgan's theorem, using subsets of Boolean logic. The output grid is edited for easy viewing.
```
booleanLogic = {
(*1*) ForAll[{a, b}, and[a, b] == and[b, a]],
(*2*) ForAll[{a, b}, or[a, b] == or[b, a]],
(*3*) ForAll[{a, b}, and[a, or[b, not[b]]] == a],
(*4*) ForAll[{a, b}, or[a, and[b, not[b]]] == a],
(*5*) ForAll[{a, b, c}, and[a, or[b, c]] == or[and[a, b], and[a, c]]],
(*6*) ForAll[{a, b, c}, or[a, and[b, c]] == and[or[a, b], or[a, c]]]
};
generateGrid[{deMorgan}, subsetAxioms, 6, 10, "Rainbow", True]
```
![gridA][8]
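The index labels in the first column of the grid line up with the axiom subsets because Subsets enumerates the subsets of any list in the same order. A short check of this correspondence, using a three-element stand-in list rather than the actual axioms:

```wolfram
(* Index labels for six axioms: every non-empty subset of {1, ..., 6} *)
labels = Rest@Subsets[Range[6]];
Length[labels]          (* 63 *)
labels[[{1, 7, 63}]]    (* {{1}, {1, 2}, {1, 2, 3, 4, 5, 6}} *)

(* The n-th label picks out exactly the elements of the n-th subset *)
items = {p, q, r};
Rest@Subsets[items] === (items[[#]] & /@ Rest@Subsets[Range[3]])   (* True *)
```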
### Generation of Random Boolean Expressions and Proofs
In order to see what proofs are required for particular theorems, we need to be able to generate these theorems repeatedly. The first function, replacer, translates between the Wolfram Language's native heads (e.g. Or, And, Nand, True) and the ones defined within the axiom set (e.g. or, and, nand, "T"). The second function, randomBoolExp, generates a single logical statement in the axiom set's language.
```
replacer[expr_, reverse_?BooleanQ] :=
Module[{dictionary, reverseDictionary},
dictionary = {"and" -> "And", "or" -> "Or", "not" -> "Not",
"ex" -> "Exists", "im" -> "Implies", "ueq" -> "Unequal",
"eq" -> "Equal", "eqv" -> "Equivalent", "nand" -> "Nand",
"T" -> "True", "F" -> "False", "x" -> "X", "add" -> "Plus",
"times" -> "Times"};
Needs["GeneralUtilities`"];
reverseDictionary =
Association[dictionary] // AssociationInvert // Normal;
ToExpression@
StringReplace[ToString@expr,
If[reverse, reverseDictionary, dictionary]]]
```
```
randomBoolExp[numb_?IntegerQ, form_?StringQ] :=
Module[{tripleDictionary, replBoolExp, shortBoolExp},
tripleDictionary = {or[x_, y_, z_] -> or[x, or[y, z]],
and[x_, y_, z_] -> and[x, and[y, z]],
nand[x_, y_, z_] -> not[and[x, and[y, z]]],
xor[x_, y_, z_] -> and[and[a, b, c], not[and[a, b, c]]]};
replBoolExp =
replacer[
FullForm@(randBoolExp =
BooleanFunction[RandomInteger[1, 2^numb],
ToExpression /@ Alphabet[][[;; numb]]]), True] //.
tripleDictionary;
shortBoolExp =
replacer[FullForm@BooleanMinimize[randBoolExp, form], True] //.
tripleDictionary;
replBoolExp == shortBoolExp]
```
```
randomBoolExp[3, "NAND"](*Repeatedly evaluate for different output*)
```
`Out:`
```
or[and[a,and[not[b],c]],and[not[a],and[not[b],not[c]]]]==nand[not[and[a,and[not[b],c]]],not[and[not[a],and[not[b],not[c]]]]]
```
The function generates the expression from a list of random bits, which is used as a truth table for the built-in BooleanFunction function. Each variable in the expression is a letter of the alphabet, and the number of variables is set by the argument numb (numb = 3 gives {a,b,c}). The expression is then minimized into a specific form, e.g. "NAND", "ANF", etc. Finally, both the original and the minimized expression are translated into the language of the axiom set and equated.
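The core of this procedure can be tried with the built-in functions alone, before any translation into the axiom set's language. The sketch below follows the same steps as randomBoolExp and then confirms that the two sides are logically equivalent. The seed is arbitrary and only makes the example reproducible.

```wolfram
SeedRandom[2018];                                  (* arbitrary seed *)
vars = {a, b, c};
truthTable = RandomInteger[1, 2^Length[vars]];     (* 8 random output bits *)
full = BooleanFunction[truthTable, vars];          (* expression with that truth table *)
short = BooleanMinimize[full, "NAND"];             (* equivalent form built from Nand *)

(* Both sides have the same truth table, so the equation is a valid theorem *)
TautologyQ[Equivalent[full, short], vars]
```

Every equation produced this way is true by construction, so the open question for FindEquationalProof is only whether a given subset of the axioms suffices to prove it.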
```
expressions = {ForAll[{a, b, c},
or[and[not[a], not[b]], or[and[not[a], c], and[not[b], not[c]]]] ==
or[and[not[a], c], and[not[b], not[c]]]],
ForAll[{a, b, c},
or[and[a, not[b]], or[and[not[a], b], and[b, not[c]]]] ==
or[and[a, not[b]], or[and[a, not[c]], and[not[a], b]]]],
ForAll[{a, b, c},
or[and[a, b], or[and[not[a], not[b]], and[b, c]]] ==
or[and[a, b], or[and[not[a], not[b]], and[not[a], c]]]],
ForAll[{a, b, c},
or[and[not[a], not[b]], or[and[b, c], and[not[b], not[c]]]] ==
or[and[not[a], c], or[and[b, c], and[not[b], not[c]]]]]
};
defAxioms = Join[definitions, #] & /@ (Rest@Subsets@booleanLogic);
grid = generateGrid[expressions, defAxioms, 6, 20, "Rainbow", True]
```
![gridB][9]
The empty column for the 7th theorem is believed to be a computational issue: limited computing power necessitated a time constraint of 20 seconds per proof.
An interesting fact to note is that most of the generated expressions were provable without the full set of axioms. One would normally expect only a complete axiom system to be able to prove or falsify a statement. Although these theorems are simple compared to the complex mathematical problems researched in the real world, we can see that it is quite plausible to generate proofs without a complete axiom system. In fact, there can be no complete set of axioms, as an arbitrary number of axioms with arbitrary operators can be generated (as explored in the following section). One can also draw a connection with Gödel's incompleteness theorem, which suggests that no consistent, complete set of axioms exists that can prove all truths expressible within the system. A civilization that only considers axioms {1,2,3,5,6}, without the 4th axiom, might be able to express the 3rd theorem, but would not be able to prove it without expanding its axiom system.
## Alternative Axiom Systems
### A new operator
Our common sense tells us that our system of axioms, i.e., logic, is the only possible system of axioms that exists. However, this is not necessarily the case. An expression can be created with an arbitrary operator, and those expressions can be tried as axioms or theorems, despite not being representable with our system.
Let us imagine a new operator, "$\cdot$", the CenterDot. We can then introduce variables, a and b, to build up expressions like the following:
$$((b\cdot (a\cdot a))\cdot a)\cdot b=a$$
This is impossible to make sense of within our system of logic, but it is still a valid expression that describes an equation. Whether it is understandable or not is beside the point; a more interesting question is whether such systems are viable, which we will explore by using them to prove other theorems.
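One way to see why such an equation need not hold in ordinary logic is to pick an interpretation and test it. The sketch below assumes, purely for illustration, that "$\cdot$" is read as Nand, and tabulates whether the equation holds for each truth assignment; the names `lhs` and `table` are hypothetical.

```wolfram
(* Left-hand side of ((b.(a.a)).a).b = a, written with the CenterDot head *)
lhs = CenterDot[CenterDot[CenterDot[b, CenterDot[a, a]], a], b];

(* Assumption: read CenterDot as Nand, then compare lhs with a for each assignment *)
table = Flatten@Table[
   {va, vb} -> ((lhs /. CenterDot -> Nand /. {a -> va, b -> vb}) === va),
   {va, {True, False}}, {vb, {True, False}}]
(* {{True, True} -> True, {True, False} -> True,
    {False, True} -> True, {False, False} -> False} *)
```

Under this particular reading the equation fails when both variables are False, so it is not an identity of Nand; as an abstract axiom, however, it is still a perfectly well-formed statement.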
The following list defines some of these arbitrary equalities that constitute valid expressions:
$$\text{imaginaryAxioms}=\{\\a=a,b=a,a\cdot a=a,a\cdot b=a,\\b\cdot a=a,b\cdot b=a,a\cdot b=b\cdot a,a\cdot (a\cdot a)=a,\\(a\cdot a)\cdot a=a,a\cdot (a\cdot b)=a,a\cdot (b\cdot b)=a,b\cdot (a\cdot a)=a,\\b\cdot (a\cdot b)=a,b\cdot (b\cdot a)=a,(a\cdot a)\cdot b=a,(a\cdot b)\cdot a=a,\\(a\cdot b)\cdot b=a,(b\cdot a)\cdot a=a,(b\cdot a)\cdot b=a,(b\cdot b)\cdot a=a,\\b\cdot (b\cdot b)=a,(b\cdot b)\cdot b=a,(a\cdot a)\cdot (a\cdot a)=a,a\cdot ((a\cdot a)\cdot b)=a,\\(a\cdot a)\cdot (a\cdot b)=a,(a\cdot b)\cdot (b\cdot c)=a,(a\cdot b)\cdot c=a\cdot (b\cdot c),((b\cdot b)\cdot a)\cdot (a\cdot b)=a,\\((b\cdot (a\cdot a))\cdot a)\cdot b=a,b\cdot (c\cdot (a\cdot (b\cdot c)))=a,(a\cdot b)\cdot (a\cdot (b\cdot c))=a,(((b\cdot a)\cdot c)\cdot a)\cdot (a\cdot c)=a,(((b\cdot c)\cdot d)\cdot a)\cdot (a\cdot d)=a,\\\{(a\cdot b)\cdot a=a,a\cdot a=b\cdot b\},\{(a\cdot a)\cdot (a\cdot a)=a,a\cdot b=b\cdot a\},\\(b\cdot (b\cdot (a\cdot a)))\cdot (a\cdot (b\cdot c))=a\};$$
As we did in the previous section with Boolean algebra, we can generate theorems to evaluate what these axioms can prove. The task is done with the following function randThms and its related functions [2]:
```
canonicalize[list_] :=
DeleteDuplicates[
DeleteDuplicates[Sort /@ list, #1 === (#2 /. {1 -> 0, 0 -> 1}) &]]
canonicalize[list_, vars_] :=
DeleteDuplicates[
DeleteDuplicates[Sort /@ list,
Function[{x, y},
MatchQ[x,
Alternatives @@ ((y /. (Rule @@@
Partition[Append[#, First[#]], 2, 1])) & /@
Permutations[vars])]]]]
revariable[expr_] := (expr /. {0 -> a, 1 -> b, 2 -> c})
```
```
randThms[length_?IntegerQ] :=
Module[{thms, axms, newthms, newaxms, ns},
thms = Cases[
Apply[Equal, #] & /@
Flatten[Groupings[#, CenterDot -> 2] & /@
Table[Rest[IntegerDigits[ns, 2]], {ns, length}]], _Equal];
newthms =
Select[revariable[canonicalize[thms]],
TautologyQ[Equivalent @@ (# /. CenterDot -> Nand)] &]
]
```
For example,
```
randThms[50] // Column
```
`Out:`
$$\begin{array}{l}
a\cdot b=b\cdot a \\
a=(a\cdot a)\cdot (a\cdot a) \\
a=(a\cdot a)\cdot (a\cdot b) \\
a=(a\cdot a)\cdot (b\cdot a) \\
a=(a\cdot b)\cdot (a\cdot a) \\
a=(b\cdot a)\cdot (a\cdot a) \\
\end{array}$$
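As a quick sanity check, the six equations in this output can be confirmed to be identities under the same Nand reading that randThms uses for filtering. This is a minimal sketch; `thms` and `nandIdentityQ` are hypothetical names introduced here.

```wolfram
thms = {
   CenterDot[a, b] == CenterDot[b, a],
   a == CenterDot[CenterDot[a, a], CenterDot[a, a]],
   a == CenterDot[CenterDot[a, a], CenterDot[a, b]],
   a == CenterDot[CenterDot[a, a], CenterDot[b, a]],
   a == CenterDot[CenterDot[a, b], CenterDot[a, a]],
   a == CenterDot[CenterDot[b, a], CenterDot[a, a]]};

(* Same test as in randThms: replace CenterDot by Nand and ask for a tautology *)
nandIdentityQ[eq_Equal] := TautologyQ[Equivalent @@ (eq /. CenterDot -> Nand)]

nandIdentityQ /@ thms
(* {True, True, True, True, True, True} *)
```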
### Arbitrary Proofs
Using the above-mentioned proofGrid function, we can visualize which axioms yield which theorems. As in the previous run, the axioms and theorems are numbered for visual clarity. Again, although it is not always possible to determine whether a certain theorem holds, the simplicity of the ones used allows for an acceptable level of confidence.
$$\text{imaginaryTheorems}=\{a\cdot b=b\cdot a,a=(a\cdot a)\cdot (a\cdot a),a=(a\cdot a)\cdot (a\cdot b),a=(a\cdot a)\cdot (b\cdot a),a=(a\cdot b)\cdot (a\cdot a),a=(b\cdot a)\cdot (a\cdot a),a=((a\cdot a)\cdot a)\cdot (a\cdot a),a=(a\cdot a)\cdot ((a\cdot a)\cdot a),a=(a\cdot (a\cdot a))\cdot (a\cdot a),a=(a\cdot a)\cdot (a\cdot (a\cdot a)),a\cdot a=((a\cdot a)\cdot a)\cdot a,a\cdot a=a\cdot ((a\cdot a)\cdot a),a\cdot a=(a\cdot (a\cdot a))\cdot a,a\cdot a=a\cdot (a\cdot (a\cdot a)),a\cdot (a\cdot a)=(a\cdot a)\cdot a,a=(a\cdot a)\cdot (a\cdot (a\cdot b)),a\cdot a=a\cdot ((a\cdot a)\cdot b),a=(a\cdot a)\cdot ((a\cdot b)\cdot a),a=(a\cdot a)\cdot (a\cdot (b\cdot a)),a\cdot a=((a\cdot a)\cdot b)\cdot a,a=(a\cdot a)\cdot (a\cdot (b\cdot b)),a\cdot a=a\cdot ((a\cdot b)\cdot b),a=(a\cdot a)\cdot ((b\cdot a)\cdot a),a=(a\cdot (a\cdot b))\cdot (a\cdot a),a\cdot a=a\cdot (b\cdot (a\cdot a)),a=(a\cdot (a\cdot b))\cdot (a\cdot b),a\cdot a=a\cdot ((b\cdot a)\cdot b),a\cdot b=a\cdot (a\cdot (a\cdot b)),a\cdot a=a\cdot (b\cdot (a\cdot b)),a=(a\cdot a)\cdot ((b\cdot b)\cdot a),a=(a\cdot (a\cdot b))\cdot (b\cdot a),a\cdot a=((a\cdot b)\cdot b)\cdot a,a\cdot (a\cdot (a\cdot b))=b\cdot a,a\cdot a=a\cdot (b\cdot (b\cdot a)),b=((a\cdot a)\cdot a)\cdot (b\cdot b),a=(a\cdot a)\cdot ((b\cdot b)\cdot b),b=(a\cdot (a\cdot a))\cdot (b\cdot b),a=(a\cdot a)\cdot (b\cdot (b\cdot b)),b\cdot b=((a\cdot a)\cdot a)\cdot b,a\cdot a=a\cdot ((b\cdot b)\cdot b),b\cdot b=(a\cdot (a\cdot a))\cdot b,a\cdot a=a\cdot (b\cdot (b\cdot b)),(a\cdot a)\cdot a=(b\cdot b)\cdot b,b\cdot (b\cdot b)=(a\cdot a)\cdot a,a\cdot (a\cdot a)=b\cdot (b\cdot b),a=((a\cdot b)\cdot a)\cdot (a\cdot a),a=(a\cdot (b\cdot a))\cdot (a\cdot a),a\cdot a=(b\cdot (a\cdot a))\cdot a,a=((a\cdot b)\cdot a)\cdot (a\cdot b),a=(a\cdot (b\cdot a))\cdot (a\cdot b),a=(a\cdot b)\cdot (a\cdot (a\cdot b)),a\cdot b=a\cdot ((a\cdot b)\cdot a),a\cdot b=(a\cdot (a\cdot b))\cdot a,a\cdot b=a\cdot (a\cdot (b\cdot a)),a=((a\cdot b)\cdot a)\cdot (b\cdot 
a),a=(a\cdot b)\cdot ((a\cdot b)\cdot a),a=(a\cdot (b\cdot a))\cdot (b\cdot a),a=(a\cdot b)\cdot (a\cdot (b\cdot a)),a\cdot a=((b\cdot a)\cdot b)\cdot a,a\cdot ((a\cdot b)\cdot a)=b\cdot a,b\cdot a=(a\cdot (a\cdot b))\cdot a,a\cdot a=(b\cdot (a\cdot b))\cdot a,a\cdot (a\cdot (b\cdot a))=b\cdot a,a\cdot (a\cdot b)=(a\cdot b)\cdot a,a\cdot (a\cdot b)=a\cdot (b\cdot a),b=((a\cdot a)\cdot b)\cdot (a\cdot b),a=(a\cdot b)\cdot (a\cdot (b\cdot b)),(a\cdot a)\cdot b=(a\cdot b)\cdot b,a\cdot (a\cdot b)=a\cdot (b\cdot b),a=(a\cdot b)\cdot ((b\cdot a)\cdot a),a=(a\cdot (b\cdot b))\cdot (a\cdot a),a\cdot a=(b\cdot (b\cdot a))\cdot a,b\cdot (a\cdot a)=(a\cdot a)\cdot b,a\cdot (a\cdot b)=(b\cdot a)\cdot a,b=((a\cdot a)\cdot b)\cdot (b\cdot a),a=(a\cdot (b\cdot b))\cdot (a\cdot b),a\cdot b=((a\cdot a)\cdot b)\cdot b,a\cdot b=a\cdot (a\cdot (b\cdot b)),(a\cdot a)\cdot b=(b\cdot a)\cdot b,b\cdot (a\cdot b)=(a\cdot a)\cdot b,a=(a\cdot b)\cdot ((b\cdot b)\cdot a),a=(a\cdot (b\cdot b))\cdot (b\cdot a),b\cdot a=((a\cdot a)\cdot b)\cdot b,a\cdot (a\cdot (b\cdot b))=b\cdot a,b\cdot (b\cdot a)=(a\cdot a)\cdot b,b=((a\cdot a)\cdot b)\cdot (b\cdot b),a=((b\cdot a)\cdot a)\cdot (a\cdot a),a=((b\cdot a)\cdot a)\cdot (a\cdot b),a=(b\cdot a)\cdot (a\cdot (a\cdot b)),a\cdot b=((a\cdot b)\cdot a)\cdot a,a\cdot b=a\cdot ((b\cdot a)\cdot a),a\cdot b=(a\cdot (b\cdot a))\cdot a,a=((b\cdot a)\cdot a)\cdot (b\cdot a),a=(b\cdot a)\cdot ((a\cdot b)\cdot a),a=(b\cdot a)\cdot (a\cdot (b\cdot a)),b\cdot a=((a\cdot b)\cdot a)\cdot a,a\cdot ((b\cdot a)\cdot a)=b\cdot a,b\cdot a=(a\cdot (b\cdot a))\cdot a,a\cdot (b\cdot a)=(a\cdot b)\cdot a,b=(a\cdot b)\cdot ((a\cdot a)\cdot b),a=(b\cdot a)\cdot (a\cdot (b\cdot b)),a\cdot (b\cdot b)=(a\cdot b)\cdot a,a\cdot (b\cdot a)=a\cdot (b\cdot b),a=(b\cdot a)\cdot ((b\cdot a)\cdot a),(a\cdot b)\cdot a=(b\cdot a)\cdot a,a\cdot (b\cdot a)=(b\cdot a)\cdot a,a\cdot b=a\cdot ((b\cdot b)\cdot a),a\cdot b=(a\cdot (b\cdot b))\cdot a,(a\cdot b)\cdot a=(b\cdot b)\cdot a,a\cdot 
b=((a\cdot b)\cdot b)\cdot b,a\cdot b=((b\cdot a)\cdot a)\cdot a,a\cdot b=b\cdot ((a\cdot a)\cdot b),a\cdot b=(b\cdot (a\cdot a))\cdot b,a\cdot ((b\cdot b)\cdot a)=b\cdot a,a\cdot b=b\cdot ((a\cdot b)\cdot b),a\cdot b=b\cdot (b\cdot (a\cdot a)),b\cdot (a\cdot a)=(a\cdot b)\cdot b,a\cdot b=b\cdot ((b\cdot a)\cdot b),a\cdot b=b\cdot (b\cdot (a\cdot b)),(a\cdot b)\cdot b=(b\cdot a)\cdot b,a\cdot b=b\cdot (b\cdot (b\cdot a))\};$$
```
generateGrid[imaginaryTheorems, imaginaryAxioms, 6, 20, "Rainbow", \
True]
```
![gridD][10]
Note that axiom number 27, $(a\cdot b)\cdot c=a\cdot (b\cdot c)$, is a part of our axiom system (group theory). The significant factor here is that this axiom does not particularly stand out; the implication being that there is nothing special about our choice of the axiom system. Theoretically, if another axiom was chosen, and such a system was universal, it would not particularly be interesting or different. Rather, it would be plausible. The concept of computational universality implies these alternative systems are translatable to our system, but may possibly hold keys to solving problems that can be expressed, but cannot be solved with our current set of axioms.
## Implications
This project was neither a mathematically rigorous nor a practical exploration. Rather, the results point to some simple but fundamental implications about how our mathematics is conducted. Axioms are the foundations of math and, to an arguable extent, of our universe. Therefore, exploring the significance, or rather the insignificance, of our choice of axioms is an important step in augmenting our view of mathematics: not just a manipulation based on a certain system of logic, but a system of implication with a specific set of simple rules that may differ from what we would expect.
### Footnote
[1] Wolfram, Stephen. A New Kind of Science. 2018.
[2] Partially developed with code written by Jonathan Gorard, Wolfram Research
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1-2-1.nks_multiway_diagram.png&userId=1371718
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1-3-1.modusP.png&userId=1371718
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1-3-2.deMorgan.png&userId=1371718
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1-3-3.wolframLogic.png&userId=1371718
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1-4-4.distribution.png&userId=1371718
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1-5-1.tarskiAxioms.png&userId=1371718
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=2-1-1.exampleGrid.png&userId=1371718
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=grid4.png&userId=1371718
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=grid1.png&userId=1371718
[10]: http://community.wolfram.com//c/portal/getImageAttachment?filename=grid3.png&userId=1371718
Pyokyeong Son, 2018-07-13T15:06:33Z
[WSC18] Phone Gesture Recognition with Machine Learning
http://community.wolfram.com/groups/-/m/t/1386392
# Accelerometer-based Gesture Recognition with Machine Learning
![A GIF of gesture recognition in progress][1]
## Introduction
This is Part 2 of a 2-part community post - Part 1 (Streaming Live Phone Sensor Data to the Wolfram Language) is available here: http://community.wolfram.com/groups/-/m/t/1386358
As technology advances, we are constantly seeking new, more intuitive methods of interfacing with our devices and the digital world; one such method is gesture recognition. Although touchless human interface devices (kinetic user interfaces) exist and are in development, the cost and configuration required for these sometimes makes them impractical, particularly for mobile applications. A simpler method would be to use devices the user already has on their person - such as a phone or a smartwatch - to detect basic gestures, taking advantage of the wide array of sensors included in such devices. In an attempt to assess the feasibility of such a method, methods of asynchronous communication between a mobile device and the Wolfram Language are investigated, and a gesture recognition system based around an accelerometer sensor is implemented, using a machine learning model to classify a few simple gestures from mobile accelerometer data.
## Investigating Methods of Asynchronous Communication
To implement an accelerometer-based gesture recognition system, we must devise a suitable means for a mobile device to transmit accelerometer data to a computer running the Wolfram Language (WL). On a high level, the WL has baked-in support for a variety of devices - specifically the Raspberry Pi, Vernier Go!Link compatible sensors, Arduino microcontrollers, webcams and devices using the RS-232 or RS-422 serial protocol (http://reference.wolfram.com/language/guide/UsingConnectedDevices.html); unfortunately, there is no easy way to access sensor data from Android or iOS mobile devices.
On a low level, the WL natively supports TCP and ZMQ socket functionality, as well as receipt and transmission of HTTP requests and Pub-Sub channel communication. We investigate the feasibility of both methods for transmission of accelerometer data in Part 1 of this community post (http://community.wolfram.com/groups/-/m/t/1386358).
## Gesture Classification using Neural Networks
Now that we are able to stream accelerometer data to the WL, we may proceed to implement gesture recognition / classification. Due to limited time at camp, we used the UDP socket method to do this - in the future, we hope to move the system over to the (more user-friendly) channel interface.
We first configure the sensor stream, allowing live accelerometer data to be sent to the Wolfram Language:
### Configure the Sensor Stream
1. Install the "Sensorstream IMU+GPS" app ([https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps][2])
2. Ensure the sensors you want to stream to Wolfram are ticked on the 'Toggle Sensors' page. (If you want to stream other sensors besides 'Accelerometer', 'Gyroscope' and 'Magnetic Field', ensure the 'Include User-Checked Sensor Data in Stream' box is ticked. Beware, though - the more sensors are ticked, the more latency the sensor stream will have.)
3. On the "Preferences" tab:
a. Change the target IP address in the app to the IP address of your computer (ensure your computer and phone are connected to the same local network)
b. Set the target port to 5555
c. Set the sensor update frequency to 'Fastest'
d. Select the 'UDP stream' radio box
e. Tick 'Run in background'
4. Switch stream ON **before** executing code. (nb. ensure your phone does not fall asleep during streaming - perhaps use the 'Caffeinate' app ([https://play.google.com/store/apps/details?id=xyz.omnicron.caffeinate&hl=en_US][3]) to ensure this.)
5. Execute the following WL code:
(in part from http://community.wolfram.com/groups/-/m/t/344278)
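QuitJava[];
Needs["JLink`"];
InstallJava[];
(* Listen for UDP datagrams on port 5555, the target port set in the app *)
udpSocket=JavaNew["java.net.DatagramSocket",5555];
(* Wait for one datagram to arrive, then return its bytes *)
readSocket[sock_,size_]:=JavaBlock@Block[{datagramPacket=JavaNew["java.net.DatagramPacket",Table[0,size],size]},sock@receive[datagramPacket];
datagramPacket@getData[]];
(* Strip the zero padding and convert the bytes to a string *)
listen[]:=record=DeleteCases[readSocket[udpSocket,1200],0]//FromCharacterCode//Sow;
results={};
(* Poll the socket every 0.01 s; once the buffer exceeds 700 entries, discard the oldest 150 *)
RunScheduledTask[AppendTo[results,Quiet[Reap[listen[]]]];If[Length[results]>700,results=Drop[results,150]],0.01];
(* The 500 most recent readings, each split into a list of numbers *)
stream:=Refresh[ToExpression[StringSplit[#[[1]],","]]& /@ Select[results[[-500;;]],Head[#]==List&],UpdateInterval-> 0.01]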
### Detecting Gestures
On a technical level, the problem of gesture classification is as follows: given a continuous stream of accelerometer data (or similar),
1. distinguish periods during which the user is performing a given gesture from other activities / noise and
2. identify / classify a particular gesture based on accelerometer data during that period. This essentially boils down to classification of a time series dataset, in which we can observe a series of emissions (accelerometer data) but not the states generating the emissions (gestures).
A relatively straightforward solution to (1) is to approximate the gradients of the moving averages of the x, y and z values of the data and take the Euclidean norm of these - whenever these increase above a threshold, a gesture has been made.
movingAvg[start_,end_,index_]:=Total[stream[[start;;end,index]]]/(end-start+1);
^ Takes the average of the x, y or z values of data (specified by *index* - x-->3, y-->4, z-->5) from the index *start* to the index *end*.
normAvg[start_,middle_,end_]:=((movingAvg[middle,end,3]-movingAvg[start,middle,3])/(middle-start))^2+((movingAvg[middle,end,4]-movingAvg[start,middle,4])/(middle-start))^2+((movingAvg[middle,end,5]-movingAvg[start,middle,5])/(middle-start))^2;
^ (Assuming difference from start to middle is equal to difference from middle to end:) Approximates the gradient at index *middle* using the average from *start* to *middle* and from *middle* to *end* for x, y and z values, and then takes the sum of the squares of these values. Note that we do not need to take the square root of the final answer (to find the Euclidean norm), as doing so and comparing it to some threshold *x* would be equivalent to not doing so and comparing it to the threshold *x^2* (square root is a computationally expensive operation).
Thus
Dynamic[normAvg[-155,-150,-146]]
will yield the square of the Euclidean norm of the gradients of the x, y and z values of the data (approximated by calculating the averages from the 155th most recent to 150th most recent and 150th most recent to 146th most recent values). As accelerometer data is sent to the Wolfram Language, this value will update.
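The same calculation can be tried offline on a fixed list, without a phone attached. The sketch below is only an illustration under stated assumptions: the data is synthetic, each row is a plain {x, y, z} triple (whereas the live *stream* stores these values in columns 3 to 5), and `avgOver` and `gradNormSq` are hypothetical names mirroring movingAvg and normAvg.

```wolfram
(* Synthetic recording: 20 motionless readings followed by a ramp in x *)
quiet = ConstantArray[{0., 0., 9.8}, 20];
spike = Table[{0.5 k, 0., 9.8}, {k, 1, 10}];
data = Join[quiet, spike];

(* Component-wise mean of rows start..end *)
avgOver[d_, start_, end_] := Mean[d[[start ;; end]]]

(* Squared norm of the approximate gradient of the moving average at mid *)
gradNormSq[d_, start_, mid_, end_] :=
 Total[((avgOver[d, mid, end] - avgOver[d, start, mid])/(mid - start))^2]

gradNormSq[data, 5, 10, 15]    (* 0. for the motionless stretch *)
gradNormSq[data, 20, 25, 30]   (* 0.25 during the ramp *)
```

A threshold placed between these two values separates the motionless stretch from the ramp, which is the role the threshold plays for the live data.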
### Data Collection
To train the network, we must collect gesture data. To do this, we have a variety of options - we can either represent the gesture as a sequence of 3-dimensional vectors (x,y,z accelerometer data points) and perform time series classification on these sequences of vectors using hidden Markov models or recurrent neural networks, or we can represent the gesture as a rasterised image of a graph much like the one below:
![Rasterised image of a gesture][4]
and perform image classification on the image of the graph.
Since the latter has had some degree of success (e.g. in http://community.wolfram.com/groups/-/m/t/1142260), we attempt a similar method:
PrepareDataIMU[dat_]:=Rasterize@ListLinePlot[{dat[[All,1]],dat[[All,2]],dat[[All,3]]},PlotRange->All,Axes->None,AxesLabel->None,PlotStyle->{Red, Green, Blue}];
^ Plots the data points in *dat* with no axes or axis labels, and with x coordinates in red, y coordinates in green, z coordinates in blue (this makes processing easier as Wolfram operates in RGB colours).
threshold = 0.8;
trainlist={};
appendToSample[n_,step_,x_]:=AppendTo [trainlist,PrepareDataIMU[Part[x,n;;step]]];
Dynamic[If[normAvg[-155,-150,-146]>threshold,appendToSample[-210,-70,stream],False],UpdateInterval->0.1]
^ Every 0.1 seconds, checks whether or not the normed average of the gradient of accelerometer data at the 150th most recent data point (using the *normAvg* function) is greater than the threshold - if it is, it will create a rasterised image of a graph of accelerometer data from the 210th most recent data point to the 70th most recent data point and append it to *trainlist* - a list of graphs of gestures. Patience is recommended here - there can be up to ~5 seconds' lag before a gesture appears. Ensure gestures are made reasonably vigorously.
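The negative indices used above count from the most recent reading backwards. A small sketch with stand-in data (the list `samples` is hypothetical, with integers in place of accelerometer readings) shows which part of a 500-element buffer the window `-210 ;; -70` selects:

```wolfram
samples = Range[500];              (* stand-in for 500 buffered readings, oldest first *)
window = samples[[-210 ;; -70]];   (* 210th most recent up to 70th most recent *)
{First[window], Last[window], Length[window]}
(* {291, 431, 141} *)
```

So each stored gesture image covers 141 consecutive readings, and the 69 most recent readings are left out of the window.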
As a first test, we attempted to generate 30 samples of each of the digits 1 to 5 drawn in the air with a phone - the images of these graphs were stored in *trainlist*. Then, we labelled each one as 1, 2, 3, 4 or 5, converting *trainlist* into a list of key-value pairs (rules) with key <image of graph> and value <number>.
We split the data into training data (25 samples from each category) and test data (all remaining data):
TrainingTest[data_,number_]:=Module[{maxindex,sets,trainingdata,testdata},
maxindex=Max[Values[data]];
sets = Table[Select[data,#[[2]]==x&],{x,1,maxindex}];
sets=Map[RandomSample,sets];
trainingdata =Flatten[Map[#[[1;;number]]&,sets]];
testdata =Flatten[Map[#[[number+1;;-1]]&,sets]];
Return[{trainingdata,testdata}]
]
^ Randomly selects *number* training elements for each label in the list *data* and uses the remaining elements with that label as test elements.
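For comparison, the same per-class split can be written with `GroupBy` and `TakeDrop`. This is a sketch rather than the code used in the project: `splitPerClass` is a hypothetical name, and the toy data uses strings in place of gesture images.

```wolfram
(* Toy labelled data: 3 classes with 6 samples each, strings standing in for images *)
toy = Flatten@Table[
    "sample" <> ToString[c] <> "-" <> ToString[i] -> c, {c, 1, 3}, {i, 1, 6}];

splitPerClass[data_, n_] := Module[{groups, pairs},
  groups = Values@GroupBy[data, Last];                (* one list of rules per label *)
  pairs = TakeDrop[RandomSample[#], n] & /@ groups;   (* {training, test} per label *)
  {Flatten[pairs[[All, 1]]], Flatten[pairs[[All, 2]]]}]

{train, test} = splitPerClass[toy, 4];
{Length[train], Length[test]}   (* {12, 6} *)
Counts[Last /@ train]           (* <|1 -> 4, 2 -> 4, 3 -> 4|> *)
```

Splitting each class separately keeps the training set balanced, which matters when there are only a few dozen samples per gesture.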
gestTrainingTest=TrainingTest[gesture1to5w30,25];
gestTraining=gestTrainingTest[[1]];
gestTest=gestTrainingTest[[2]];
*gestTraining* and *gestTest* now contain key-value pairs like those below:
![Key-value pairs of accelerometer graphs and labels][5]
### Machine Learning
To train a model on these images, we first attempt a basic *Classify*:
TrainWithClassify = Classify[gestTraining]
ClassifierInformation[TrainWithClassify]
![A poor result in ClassifierInformation][6]
Evidently, this gives a very poor training accuracy of 24.4% - given that there are 5 classes, this is only marginally better than random chance.
As the input data consists of images, we try transfer learning on an image identification neural network (specifically the VGG-16 network):
net = NetModel["VGG-16 Trained on ImageNet Competition Data"]
We remove the last few layers from the network (which classify images into the classes the network was trained on), leaving the earlier layers which perform more general image feature extraction:
featureFunction = Take[net,{1,"fc6"}]
We train a classifier using this neural net as a feature extractor:
NetGestClassifier = Classify[gestTraining,FeatureExtractor->featureFunction]
We now test the classifier using the data in *gestTest:*
NetGestTest = ClassifierMeasurements[NetGestClassifier,gestTest]
We check the training accuracy:
ClassifierInformation[NetGestClassifier]
![A better classifier information result][7]
NetGestTest["Accuracy"]
1.
NetGestTest["ConfusionMatrixPlot"]
![A good confusion matrix plot][8]
This method appears to be promising, as a training accuracy of 93.1% and a test accuracy of 100% were achieved (although the test set is small: 5 samples per class, if each class has 30 samples).
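For reference, the accuracy figures quoted here are simply the fraction of correctly labelled samples, and the confusion matrix counts each (actual, predicted) pair. The sketch below computes both by hand on made-up labels; `actual`, `predicted` and the other names are hypothetical, and the numbers are not results from the project.

```wolfram
actual    = {1, 1, 2, 2, 3, 3};   (* made-up true labels *)
predicted = {1, 1, 2, 3, 3, 3};   (* made-up classifier output *)

(* Fraction of positions where the prediction matches the true label *)
accuracy = N[Count[MapThread[SameQ, {actual, predicted}], True]/Length[actual]]
(* 0.833333 *)

(* How often each (actual, predicted) pair occurs *)
confusion = Counts[Transpose[{actual, predicted}]]
(* <|{1, 1} -> 2, {2, 2} -> 1, {2, 3} -> 1, {3, 3} -> 2|> *)

(* Accuracy expected from guessing uniformly among 5 classes *)
chanceLevel = N[1/5]   (* 0.2 *)
```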
## Implementation of Machine Learning Model
To use the model with live data, we use the same method of identifying gestures as before in 'Detecting Gestures' (detecting 'spikes' in the data using moving averages), but when a gesture is identified, instead of being appended to a list it is sent through the classifier:
(* Classified gestures are collected in column, which is displayed below *)
column = {""};
ClassGestIMU[n_,step_,x_]:=Module[{aa,xa,ya},
aa = Part[x,n;;step];
xa=PrepareDataIMU[aa];
ya=NetGestClassifier[xa];
AppendTo[column,{Length@aa,xa,ya}]
];
Dynamic[If[normAvg[-155,-150,-146]>threshold,ClassGestIMU[-210,-70,stream],False],UpdateInterval->0.1]
Real time results (albeit with significant lag) can be seen by running
Dynamic@column[[-1]]
## Conclusions and Further Work
On the whole, this project was successful, with gesture detection and classification using rasterised graph images proving a viable method. However, the system as-is is impractical and unreliable, with a significant lag, training bias (trained to recognise the digits 1 to 5 the way they are drawn by one person only) and small sample size: these are problems that can be solved, given more time.
Further extensions to this project include:
- Serious code optimisation to reduce / eliminate lag
- An improved training interface to allow users to create their own gestures
- Integration of the gesture classification system with the Channel interface (as described earlier) and deployment of this to the cloud.
- Investigation of the feasibility of using an RNN or an LSTM for gesture classification - using a time series of raw gesture data rather than relying on rasterised images (which, although accurate, can be quite laggy). Alternatively, hidden Markov models could be used in an attempt to recover the states (gestures) that generate the observed data (accelerometer readings).
- Adding an API to trigger actions based on gestures and deployment of gesture recognition technology as a native app on smartphones / smartwatches.
- Improvement of gesture detection. At the moment, the code takes a predefined 1-2 second 'window' of data after a spike is detected - an improvement would be to detect when the gesture has ended and 'crop' the data values to include only the gesture made.
- Exploration of other applications of gesture recognition (e.g. walking style, safety, sign language recognition). Beyond limited UI navigation, a similar concept to what is currently implemented could be used with, say, a phone kept in the pocket, to analyse walking styles / gaits and, for instance, to predict or detect an elderly user falling and notify emergency services. Alternatively, with suitable methods for detecting finger motions, like flex sensors, such a system could be trained to recognise / transcribe sign language.
- Just for fun - training the model on Harry Potter-esque spell gestures, to use your phone as a wand...
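The gesture-cropping idea in the list above could be prototyped roughly as follows. This is only a sketch of one possible approach, not part of the implemented system: `cropToActivity` and its threshold argument are hypothetical, and the per-sample activity values are assumed to have been computed beforehand (for instance with a measure like normAvg).

```wolfram
(* Keep everything between the first and last sample whose activity exceeds thr *)
cropToActivity[values_List, thr_] := Module[{idx},
  idx = Flatten@Position[values, _?(# > thr &), {1}];
  If[idx === {}, {}, values[[First[idx] ;; Last[idx]]]]]

cropToActivity[{0.1, 0.2, 1.5, 0.3, 2.0, 0.1}, 1.0]   (* {1.5, 0.3, 2.} *)
cropToActivity[{0.1, 0.2}, 1.0]                       (* {} *)
```

Short dips below the threshold inside a gesture are kept, since only the first and last active samples determine the crop.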
A notebook version of this post is attached, along with a full version of the computational essay.
## Acknowledgements
We thank the mentors at the 2018 Wolfram High School Summer Camp - Andrea Griffin, Chip Hurst, Rick Hennigan, Michael Kaminsky, Robert Morris, Katie Orenstein, Christian Pasquel, Dariia Porechna and Douglas Smith - for their help and support during this project.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=SensorDemo2.gif&userId=1371970
[2]: https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps
[3]: https://play.google.com/store/apps/details?id=xyz.omnicron.caffeinate&hl=en_US
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=109724.png&userId=1371970
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=12475.png&userId=1371970
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=19276.png&userId=1371970
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=77697.png&userId=1371970
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=96898.png&userId=1371970
Euan Ong, 2018-07-17T21:16:07Z
[WSC18] Analyzing and visualizing chord sequences in music
http://community.wolfram.com/groups/-/m/t/1383630
During this year's Wolfram Summer Camp, being mentored by Christian Pasquel, I developed a tool that identifies chord sequences in music (from MIDI files) and generates a corresponding graph. The graph represents all [unique] chords as vertices, and connects every pair of chronologically subsequent chords with a directed edge. Here is an example of a graph I generated:
![Graph generated from Bach's prelude no.1 of the Well Tempered Klavier (Book I)][1]
Below is a detailed account of the development and current state of the project, plus some background on the corresponding music theory notions.
#Introduction
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**GOAL** | The aim of this project is to develop a utility that identifies chords (e.g. C Major, A minor, G7, etc.) from MIDI files, in chronological order, and then generates a graph for visualizing that chord sequence. In the graph, each vertex would represent a unique chord, and each pair of chronologically adjacent chords would be connected by a directed edge (i.e. an arrow). So, for example, if at some point in the music that is being analyzed there is a transition from a G Major chord to a C Major chord, there would be an arrow that goes from the G Major chord to the C Major chord. Therefore, the graph would describe a [Markov chain][2] for the chords. The purpose of the graph is to visualize frequent chord sequences and progressions within a certain piece of music.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**MOTIVATION** | While brainstorming for project ideas, I don't know why, I had a desire to do something with graphs. Then I asked myself, "What are graphs good at modelling?". I mentally browsed through my areas of interest, searching for any that matched that requirement. One of my main interests is music; I am somewhat of a musician myself. And, in fact, [musical] harmony *is* a good subject to be modelled by graphs. Harmony, one of the fundamental pillars of music (and perhaps the most important), not only involves the chords themselves, but, more significantly, the *transitions* between those, which is what gives character to music. And directed graphs, and, specifically, Markov models, are a perfect match for transitions between states.
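To make the goal concrete, here is a minimal sketch of how a chord sequence can be turned into such a graph. The chord list is made up for illustration, and the names `chords`, `edges` and `g` are hypothetical; this is not the code of the project itself.

```wolfram
chords = {"C", "G", "Am", "F", "C", "G", "C"};   (* made-up chord sequence *)

(* One directed edge for each pair of chronologically adjacent chords *)
edges = DirectedEdge @@@ Partition[chords, 2, 1];

(* How often each transition occurs; "C" -> "G" appears twice here *)
Counts[edges]

(* Unique chords become vertices, unique transitions become arrows *)
g = Graph[DeleteDuplicates[edges], VertexLabels -> "Name"];
{VertexCount[g], EdgeCount[g]}   (* {4, 5} *)
```

Dividing each transition count by the total number of transitions leaving the same chord would give the transition probabilities of the corresponding Markov chain.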
----------
#Some background
*Skip this if you aren't interested in the musical theory part or if you already have a background in music theory!*
##What is a chord?
A chord is basically a group of notes played together (simultaneously). Chords are the quanta of musical "feeling"; the typical (but somewhat naïve) example is the sensation of major chords sounding "happy" and minor chords sounding "sad" or melancholic (more on types of chords later).
Types of chords are defined by the [intervals][3] (distance in pitch) between the notes. The *root* of a chord is the "most important" or fundamental note of the chord, in the sense that it is the "base" from which the aforementioned intervals are measured. In other words, the archetype of a chord defines the "feel" and the general harmonic properties of the chord, while the root defines the pitch of the chord. So a "C Major" chord is a chord with archetype "major triad" (more on that later) built on the note C; i.e., its root is C.
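The split between archetype and root can be sketched in a few lines: the archetype is a list of intervals in semitones above the root, and the root shifts that pattern to a pitch. The names `archetypes`, `pitchClasses` and `buildChord` are hypothetical, and only two archetypes are listed; the interval patterns themselves (4 and 7 semitones for a major triad, 3 and 7 for a minor triad) are standard music theory.

```wolfram
(* Intervals in semitones above the root *)
archetypes = <|"major triad" -> {0, 4, 7}, "minor triad" -> {0, 3, 7}|>;

(* The twelve pitch classes, written with sharps only *)
pitchClasses = {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};

buildChord[root_String, type_String] := Module[{r},
  r = First@FirstPosition[pitchClasses, root] - 1;    (* root as a number 0..11 *)
  pitchClasses[[Mod[r + archetypes[type], 12] + 1]]]

buildChord["C", "major triad"]   (* {"C", "E", "G"} *)
buildChord["A", "minor triad"]   (* {"A", "C", "E"} *)
```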
The *sequence* of chords in a piece constitutes its **harmony**, and it can convey much more complex musical messages or feelings than a single chord, just as in language: a single word does have meaning, but a sentence can have a much more complex meaning than any single word.
##Patterns in chord sequences
The main difference between language and music is that language, in general, has a much stricter structure (i.e. the order of words, a.k.a. syntax) than music: the latter is an art, and there are no predetermined rules to follow. But humans \[have a tendency to\] like patterns, and music wouldn't be so universally beloved if it didn't contain any patterns. This also explains the unpopularity of [atonal music][4] (example [here][5]). But even atonal music has patterns: it may do its best to avoid harmonic patterns, but it still contains some level of rhythmic, textural or other kinds of patterns.
This is why using graphs to visualize chord sequences is interesting: it is a semidirect way of identifying the harmonic patterns that distinguish different genres, styles, forms, pieces or even fragments of music. In my project, I have mainly focused on the "western" conception of tonal music, and particularly on its "classical" version (what I mean by "classical" is, for lack of a better definition, a classification that encompasses all music where the composer is, culturally, the most important artist). That doesn't mean this tool isn't apt for other types of music; it just means it will analyze it from this specific standpoint.
In tonal music, the harmonic patterns are all related to a certain notion of "center of gravity": the [*tonic*][6], which is, in some way, the music's harmonic "home". Classical (as in pre-20th-century) tonal music usually ends (and often starts) with the tonic chord. In fact, we can further extend the analogy with gravity by saying that music consists of a game of tension, in which the closer you are to the center of gravity (the tonic), the greater the "pull". In an oversimplified manner, the musical equivalent of the [Schwarzschild radius][7] is the [dominant chord][8]: it tends towards the tonic. Well, not really, because you *can* turn back from it, and in fact a lot of interesting harmonic sequences consist of doing just that.
##Some types of chords
In "classical" music (see definition above), there are mainly these kinds of chords (based on the amount of unique notes they contain): triad chords (i.e. three-note chords), seventh chords (i.e. four-note chords; we'll see why they're called *seventh* in a bit), and ninth chords (five-note chords). There is another main distinction: major and minor chords (i.e. the cliché "happy" vs "sad" distinction).
###Triad chords
Probably the most simple and frequent chord is the triad chord (either major or minor). Here is a picture of a major and a minor triad C chord (left to right):
![Major and minor triad C chords (ltr)][9]
###Seventh chords
[Seventh chords][10] are called so because they contain a seventh [interval][11]. Their main significance is in dominant chords, where they usually appear in the major-triad-minor-seventh (a.k.a ["dominant"][12]) form. Another important seventh chord form is the fully diminished seventh chord (these will be relevant for the code later), which also tends to resolve ("resolve" is music jargon for "transition to a chord with less tension") to tonic.
![Seventh chords][13]
###Ninth chords
Although not extremely frequent, ninth chords do appear in classical music. The most "popular" is the dominant ninth chord (an extension of the dominant 7th). An alternative for this chord is the minor ninth dominant chord (built from the same dominant 7th chord, but with a minor ninth instead).
<br>
----------
#Algorithms and Code
In this section I'm going to walk through my code in order of execution. Four main parts can be distinguished in my project: importing and preprocessing, splitting the note sequence into "chunks" to be analyzed as chords, identifying the chord in each of those chunks, and visualizing the whole sequence as a graph.
##First phase: importing and preprocessing the MIDI file
The first operation that needs to be done is importing the MIDI file and preprocessing it. This includes selecting which elements to import from the file, converting them to a given simplified form, and performing any sorting, deletion of superfluous elements, or other modification that needs to be done.
For this purpose I defined the function `importMIDI`:
importMIDI[filename_String] := MapAt[Flatten[#, 1] &, MapAt[flattenAndSortSoundNotes,
Import[(dir <> filename <> ".mid"), {{"SoundNotes", "Metadata"}}],
1], 2]
Here `dir` stands for the directory where I saved all my MIDIs (to avoid having to type in the whole directory every time). Notice that we're importing the music as SoundNotes *and* the file's metadata; we will need the latter for determining the boundaries of measures (see below). The function `flattenAndSortSoundNotes` does what it sounds like: it converts the list of `SoundNote`s that `Import` returned into a flattened list of notes (i.e. a single track), sorted by their starting time. It also gets rid of anything that isn't necessary for chord identification (i.e. rhythmic sounds or effects). Consult the attached notebook for the explicit definition.
Here is the format the sequence of notes is returned in (i.e. `importMIDI[...][[1]]`):
{{"C4", {0., 1.4625}}, {"E4", {0.18125, 1.4625}}, {"G4", {0.36875, 0.525}}, <<562>>, {"G2", {105., 107.963}}, {"G4", {105., 107.963}}}
Each sub-list represents a note. Its first element is the actual pitch; the second is a list that represents the timespan (i.e. start and end time in seconds).
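As a small illustration of this format, here is a sketch that uses a hand-written three-note list (not a real import) to show how pitches and timespans can be pulled apart with `Part`. The names `notes`, `pitches` and `startTimes` exist only for this example.

```wolfram
(* Hand-written sample in the same {pitch, {start, end}} format *)
notes = {{"C4", {0., 1.4625}}, {"E4", {0.18125, 1.4625}}, {"G4", {0.36875, 0.525}}};

(* First element of every note: the pitch name *)
pitches = notes[[All, 1]]          (* {"C4", "E4", "G4"} *)

(* Start time of every note, in seconds *)
startTimes = notes[[All, 2, 1]]    (* {0., 0.18125, 0.36875} *)

(* Pitches of the notes that start before 0.3 s *)
Select[notes, #[[2, 1]] < 0.3 &][[All, 1]]   (* {"C4", "E4"} *)
```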
<br>
##Second phase: splitting the note sequence into chunks
The challenge in this part of the project is knowing how to determine which notes form a single chord; i.e., where to put the boundary between one chord and the next.
The solution I came up with is not optimal, but so far nothing better has occurred to me (suggestions are welcome!). It involves determining from the metadata where each measure starts and ends in time, and splitting each measure into a certain number of sub-parts; the notes are then grouped by the specific sub-part of the specific measure they belong to. The rationale behind this is that chords in classical music tend to be well contained within measures or rational fractions of them.
This procedure is contained in the function `chordSequenceUsingMeasures`. I'm going to go over it quickly:
chordSequenceUsingMeasures[midiData_List /; Length@midiData == 2,
measureSplit_: 2, analyzer_String: "Heuristic"] :=
Block[{noteSequence, metadata, chunkKeyframes, chunkedSequence,
result},
(*Separate notes from metadata*)
noteSequence = midiData[[1]];
metadata = midiData[[2]];
Up to here it's pretty self-evident.
(*Get measure keyframes*)
chunkKeyframes =
divideByN[
measureKeyframesFromMetadata[
metadata, (Last@noteSequence)[[2, 2]]], measureSplit];
Here the function `measureKeyframesFromMetadata` is called. It fetches all of the `TimeSignature` and `SetTempo` tags in the metadata and identifies the position of each measure from them. `divideByN` subdivides each measure by `measureSplit` (an optional argument with default value `2`).
(*Chunk sequence*)
chunkedSequence = {};
Module[{i = 1},
Do[
With[{k0 = chunkKeyframes[[j]], k1 = chunkKeyframes[[j + 1]]},
Module[{chunk = {}},
While[
i <= Length@noteSequence && (
k0 <= noteSequence[[i, 2, 1]] < k1 ||
k0 < noteSequence[[i, 2, 2]] <= k1 ),
AppendTo[chunk, noteSequence[[i]]]; i++];
AppendTo[chunkedSequence, chunk]
]
],
{j, Length@chunkKeyframes - 1}]];
chunkedSequence =
DeleteCases[chunkedSequence, l_List /; Length@l == 0];
Once the measures' timespan has been determined, a list of "chunks" (lists of notes grouped by measure part) is generated.
(*Call analyzer*)
Switch[analyzer,
"Deterministic", result = chordChunkAnalyze /@ chunkedSequence,
"Heuristic",
result = heuristicChordAnalyze /@ justPitch /@ chunkedSequence
];
result = resolveDiminished7th[Split[result][[All, 1]]]
]
Finally, each chunk is sent to the chord analyzer function `heuristicChordAnalyze`, which I'll talk about in the next section, along with the currently mysterious `resolveDiminished7th`.
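The real `divideByN` lives in the attached notebook. The following is only a guess at its behaviour, assuming the keyframes are a sorted list of measure boundaries in seconds. The name `divideByNSketch` is hypothetical.

```wolfram
(* Hypothetical sketch: split every interval between consecutive keyframes
   into n equal parts and return the refined list of boundaries *)
divideByNSketch[keyframes_List, n_Integer?Positive] :=
  Append[
   Flatten[(Most@Subdivide[#1, #2, n] &) @@@ Partition[keyframes, 2, 1]],
   Last[keyframes]];

(* Two measures of 2 seconds each, split in halves *)
divideByNSketch[{0, 2, 4}, 2]   (* {0, 1, 2, 3, 4} *)
```

With `n = 1` the keyframes come back unchanged, which would correspond to analyzing one chord per measure.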
Since this algorithm for "chunking" a note sequence doesn't work for everything, I also developed an alternative, more naïve approach:
chordSequenceNaïve[midiData_List /; Length@midiData == 2,
analyzer_String: "Heuristic", n1_Integer: 6, n2_Integer: 1] :=
Module[{noteSequence, chunkedSequence, result},
(*Separate notes from metadata*)
noteSequence = midiData[[1]];
(*Chunk sequence*)
chunkedSequence = Partition[noteSequence, n1, n2];
(*Call analyzer*)
result = heuristicChordAnalyze /@ justPitch /@ chunkedSequence;
result = resolveDiminished7th[Split[result][[All, 1]]]
]
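To see what the naïve chunking does, here is `Partition` with the default values `n1 = 6` and `n2 = 1` applied to a stand-in list of eight "notes" (plain integers here, purely for illustration). Each chunk is a window of six consecutive notes, and the window advances by one note at a time.

```wolfram
(* Sliding windows of length 6 with offset 1 over a stand-in note list *)
Partition[Range[8], 6, 1]
(* {{1, 2, 3, 4, 5, 6}, {2, 3, 4, 5, 6, 7}, {3, 4, 5, 6, 7, 8}} *)
```

Because consecutive windows overlap heavily, many of them tend to produce the same chord, which is why the result is passed through `Split` to collapse repeated chords.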
<br>
##Phase 3: identifying the chord from a group of notes
This has been the main conceptual challenge in the whole project. After some unsuccessful ideas, and with some suggestions from Rob Morris (one of the mentors), whom I thank, I ended up developing the following algorithm. It iterates through each note and assigns it a score that represents the likelihood of that note being the root of the chord, based on the presence of certain indicators (i.e. notes whose presence defines a chord, to some degree), each with a different weight: having a fifth, having a third, a minor seventh... Notes that fit none of the indicator intervals subtract from the score. Then the note with the highest score is assumed to be the root of the chord.
In code:
heuristicChordAnalyze[notes_List] :=
Block[{chordNotes, scores, root},
(*Calls to helper functions*)
chordNotes = octaveReduce /@ convertToSemitones /@ notes // DeleteDuplicates;
(*Scoring*)
scores = Table[Total@
Pick[
(*Score points*)
{24, 16, 16, 8, 2, 3, 1, 1,
10, 15, 15, 18},
(*Conditions*)
SubsetQ[chordNotes, #] & /@octaveReduce /@
{{nt + 7}, {nt + 4}, {nt + 3}, {nt + 10}, {nt + 11}, {nt + 2}, {nt + 5}, {nt + 9},
{nt + 4, nt + 10}, {nt + 3, nt + 6, nt + 10}, {nt + 3, nt + 6, nt + 9}, {nt + 1, nt + 4, nt + 10}}
]
(*Subtract outliers*)
- 18*Length@Complement[chordNotes, octaveReduce /@ {nt, 7 + nt, 4 + nt, 3 + nt, 10 + nt, 11 + nt,
2 + nt, 5 + nt, 9 + nt, 6 + nt}],
{nt, chordNotes}];
(*Return*)
root = Part[chordNotes, Position[scores, Max @@ scores][[1, 1]]];
{root, Which[
SubsetQ[chordNotes, octaveReduce /@ {root + 10 , root + 2, root + 5, root + 9}], "13",
SubsetQ[chordNotes, octaveReduce /@ {root + 10, root + 2, root + 5}], "11",
SubsetQ[chordNotes, octaveReduce /@ {root + 4, root + 10, root + 2}], "Dom9",
SubsetQ[chordNotes, octaveReduce /@ {root + 4, root + 10, root + 1}], "Dom9m",
SubsetQ[chordNotes, octaveReduce /@ {root + 11, root + 7, root + 3}], "m7M",
SubsetQ[chordNotes, {octaveReduce[root + 11], octaveReduce[root + 4]}], "7M",
SubsetQ[chordNotes, {octaveReduce[root + 10], octaveReduce[root + 4]}], "Dom7",
SubsetQ[chordNotes, {octaveReduce[root + 10], octaveReduce[root + 7]}], "Dom7",
SubsetQ[chordNotes, {octaveReduce[root + 10], octaveReduce[root + 6]}], "d7",
SubsetQ[ chordNotes, {octaveReduce[root + 9], octaveReduce[root + 6]}], "d7d",
SubsetQ[chordNotes, {octaveReduce[root + 10], octaveReduce[root + 3]}], "m7",
MemberQ[chordNotes, octaveReduce[root + 4]], "M",
MemberQ[chordNotes, octaveReduce[root + 3]], "m",
MemberQ[chordNotes, octaveReduce[root + 7]], "5",
True, "undef"]}
]
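The helpers `convertToSemitones` and `octaveReduce` are defined in the attached notebook. As a rough sketch of the idea, the block below assumes pitch names of the form letter, optional `#` or `b`, then octave (e.g. `"C4"`, `"Bb3"`), and reduces them straight to a pitch class from 0 to 11. It then scores each candidate root using only the eight single-interval weights from the code above; the multi-note bonuses and the outlier penalty are left out. `naturalSemitones`, `pitchClassSketch`, `intervalWeights` and `rootScoreSketch` are hypothetical names.

```wolfram
(* Semitone offsets of the natural notes above C *)
naturalSemitones = <|"C" -> 0, "D" -> 2, "E" -> 4, "F" -> 5, "G" -> 7, "A" -> 9, "B" -> 11|>;

(* Hypothetical helper: pitch name -> pitch class 0..11 (the octave is ignored) *)
pitchClassSketch[pitch_String] :=
  Mod[naturalSemitones[StringTake[pitch, 1]]
    + StringCount[pitch, "#"]
    - StringCount[StringDrop[pitch, 1], "b"], 12];

(* Single-interval indicators and their weights, copied from the post *)
intervalWeights = <|7 -> 24, 4 -> 16, 3 -> 16, 10 -> 8, 11 -> 2, 2 -> 3, 5 -> 1, 9 -> 1|>;

(* Simplified score of candidate root nt: sum of the weights of the intervals present *)
rootScoreSketch[chord_List, nt_Integer] :=
  Total[KeyValueMap[If[MemberQ[chord, Mod[nt + #1, 12]], #2, 0] &, intervalWeights]];

chord = DeleteDuplicates[pitchClassSketch /@ {"C4", "E4", "G4", "C5"}]   (* {0, 4, 7} *)
AssociationMap[rootScoreSketch[chord, #] &, chord]   (* <|0 -> 40, 4 -> 16, 7 -> 2|> *)
```

For a C major triad, C scores highest because both its fifth (24) and its major third (16) are present, so it is picked as the root.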
###A note on notation
In this project I use the following abbreviations for chord notation (they're not in the standard format). "X" represents the root of the chord.
- *X-**5*** = undefined triad chord (just the root and the fifth)
- *X-**M*** = Major
- *X-**m*** = minor
- *X-**m7*** = minor triad with minor (a.k.a dominant) seventh
- *X-**d7d*** = fully diminished 7th chord
- *X-**d7*** = half diminished 7th chord
- *X-**Dom7*** = Dominant 7th chord
- *X-**7M*** = Major triad with Major 7th
- *X-**m7M*** = minor triad with Major 7th
- *X-**Dom9*** = Dominant 9th chord
- *X-**Dom9m*** = Dominant 7th chord with a minor 9th
- *X-**11*** = 11th chord
- *X-**13*** = 13th chord
###Dealing with diminished 7th chords
Now, on to `resolveDiminished7th`. What is this function on about?
Well, recall the fully diminished seventh chords I mentioned in the Background section. Here's the problem: they're completely symmetrical! What I mean by that is that the intervals between subsequent notes are identical, even if you [invert][14] the chord. In other words, the distance in semitones between notes is constant (it's 3) and is a factor of 12 (distance of 12 semitones = octave). So, given one of these chords, there is no way to determine which note is the root just by analyzing the chord itself. In the context of our algorithm, every note would have the same score!
At this point I thought: "How do humans deal with this?". And I concluded that the only way to resolve this issue is to have some contextual vision (looking at the next chord, particularly), which is how humans do it. So what `resolveDiminished7th` does is it brushes through the chord sequence stored in `result`, looking for fully diminished chords (marked with the string "d7d"), and re-assigns each of those a root by looking at the next chord:
resolveDiminished7th[chordSequence_List] :=
Module[{result},
result = Partition[chordSequence, 2, 1] /. {{nt_, "d7d"}, c2_List} :> Which[
MemberQ[octaveReduce /@ {nt, nt + 3, nt + 6, nt + 9}, octaveReduce[c2[[1]] - 1]], {{c2[[1]] - 1, "d7d"}, c2},
MemberQ[octaveReduce /@ {nt, nt + 3, nt + 6, nt + 9}, octaveReduce[c2[[1]] + 4]], {{c2[[1]] + 4, "d7d"}, c2},
MemberQ[octaveReduce /@ {nt, nt + 3, nt + 6, nt + 9}, octaveReduce[c2[[1]] + 6]], {{c2[[1]] + 6, "d7d"}, c2},
True, {{nt, "d7d"}, c2}];
result = Append[result[[All, 1]], Last[result][[2]]]
]
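Both ideas can be checked on pitch classes (0 = C, 1 = C#, ..., 11 = B). The first part of this sketch verifies the symmetry of the fully diminished seventh chord; the second mimics the selection rule of `resolveDiminished7th` in isolation. `pickRootSketch` is a hypothetical helper, and unlike the original it reduces the chosen root modulo 12.

```wolfram
(* Pitch classes of a fully diminished 7th chord built on C *)
dim7 = {0, 3, 6, 9};

(* Transposing by 3, 6 or 9 semitones yields the same pitch-class set *)
Table[Sort[Mod[dim7 + k, 12]] === dim7, {k, {3, 6, 9}}]   (* {True, True, True} *)

(* Hypothetical helper: pick the chord note that lies 1 semitone below,
   4 above or 6 above the root of the next chord, tried in that order *)
pickRootSketch[chordPitchClasses_List, nextRoot_Integer] :=
  SelectFirst[Mod[nextRoot + {-1, 4, 6}, 12],
   MemberQ[chordPitchClasses, #] &, Missing["NoCandidate"]];

(* B-D-F-Ab followed by a chord on C: B (11) is chosen as the root *)
pickRootSketch[{2, 5, 8, 11}, 0]   (* 11 *)
```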
##Phase 4: Visualization
My visualization function (`visualizeChords`) is essentially a highly customized call of the `Graph` function, so I'll just paste the code below and then explain what some parameters do:
visualizeChords[chordSequence_List, layoutSpec_String: "Unspecified", version_String: "Full", mVSize_: "Auto", simplicitySpec_Integer: 0, normalizationSpec_String: "Softmax"] :=
Module[{purgedChordSequence, chordList, transitionRules, weights, graphicalWeights, nOfCases, edgeStyle, vertexLabels, vertexSize, vertexStyle, vertexShapeFunction, clip},
(*Preprocess*)
Switch[version,
"Full",
purgedChordSequence =
StringJoin[toNoteName[#1], "-", #2] & @@@ chordSequence,
"Basic",
purgedChordSequence =
Split[toNoteName /@ chordSequence[[All, 1]]][[All, 1]]];
(*Amount of each chord*)
chordList = DeleteDuplicates[purgedChordSequence];
nOfCases = Table[{c, Count[purgedChordSequence, c]}, {c, chordList}];
(*Transition rules between chords*)
Switch[version,
"Full",
transitionRules =
Gather[Rule @@@ Partition[purgedChordSequence, 2, 1]],
"Basic",
transitionRules =(*DeleteCases[*)
Gather[Rule @@@ Partition[purgedChordSequence, 2, 1]](*, t_/;
Length@t\[LessEqual]2]*) ];
(*Get processed weight for each transition*)
weights = Length /@ transitionRules;
graphicalWeights = If[normalizationSpec == "Softmax", SoftmaxLayer[][weights], weights];
graphicalWeights =
If[Min@graphicalWeights != Max@graphicalWeights,
Rescale[graphicalWeights,
MinMax@graphicalWeights, {0.003, 0.04}],
graphicalWeights /. _?NumericQ :> 0.03 ];
(*Final transition list*)
transitionRules = transitionRules[[All, 1]];
(*Graph display specs*)
clip = RankedMax[weights, 4];
edgeStyle =
Table[(transitionRules[[i]]) ->
Directive[Thickness[graphicalWeights[[i]]],
Arrowheads[2.5 graphicalWeights[[i]] + 0.015],
Opacity[Which[
weights[[i]] <= Clip[simplicitySpec - 2, {0, clip - 2}], 0,
weights[[i]] <= Clip[simplicitySpec, {0, clip}], 0.2,
True, 0.6]],
RandomColor[Hue[_, 0.75, 0.7]],
Sequence @@ If[weights[[i]] <= Clip[simplicitySpec - 1, {0, clip - 1}], {
Dotted}, {}] ], {i, Length@transitionRules}];
vertexLabels =
Thread[nOfCases[[All,
1]] -> (Placed[#,
Center] & /@ (Style[#[[1]], Bold,
Rescale[#[[2]], MinMax[nOfCases[[All, 2]]],
Switch[mVSize, "Auto", {6, 20}, _List,
10*mVSize[[1]]/0.3*{1, mVSize[[2]]/mVSize[[1]]}]]] & /@
nOfCases))];
vertexSize =
Thread[nOfCases[[All, 1]] ->
Rescale[nOfCases[[All, 2]], MinMax[nOfCases[[All, 2]]],
Switch[mVSize,
"Auto", (Floor[Length@chordList/10] + 1)*{0.1, 0.3}, _List,
mVSize]]];
vertexStyle =
Thread[nOfCases[[All, 1]] ->
Directive[Hue[0.53, 0.27, 1, 0.6], EdgeForm[Blue]]];
vertexShapeFunction =
Switch[version, "Full", Ellipsoid[#1, {3.5, 1} #3] &, "Basic",
Ellipsoid[#1, {2, 1} #3] &];
Graph[transitionRules,
GraphLayout ->
Switch[layoutSpec, "Unspecified", Automatic, _, layoutSpec],
EdgeStyle -> edgeStyle,
EdgeWeight -> weights,
VertexLabels -> vertexLabels,
VertexSize -> vertexSize,
VertexStyle -> vertexStyle,
VertexShapeFunction -> vertexShapeFunction,
PerformanceGoal -> "Quality"]
]
There are five main things to focus on in the above definition: the graph layout (passed as the argument `layoutSpec`), the edge thickness (defined in `edgeStyle`), the vertex size (defined in `vertexSize`), the version (passed as argument `version`) and the simplicity specification (`simplicitySpec`).
The graph layout is a `Graph` option that can be specified in the argument `layoutSpec`. If `"Unspecified"` is passed, an automatic layout will be used. I find that the best layouts tend to be, in order of preference, "BalloonEmbedding" and "RadialEmbedding"; nevertheless, neither is a perfect fit for every piece. In the future I would like to implement custom (i.e. pre-defined) positioning, so that I can design it in a way that best fits this project.
The edge thickness is a function of the number of times a certain transition between two chords has occurred in the chord sequence. There is an option (namely the `normalizationSpec` argument) to enable or disable using a Softmax function for assigning thicknesses to edges. This is due to the fact that for simple/short chord sequences, Softmax is actually counterproductive because it suppresses secondary but still top-ranked transitions; i.e., it assigns a very high thickness to the most frequent transition and a low thickness to all other transitions (even those that come in second or third in frequency ranking). But for large or complex sequences it is actually useful, because it "gets rid of" a lot of the \[relatively\] insignificant instances, thus making the output actually understandable (and not just a [jumbled mess of thick lines][15]).
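The effect is easy to see on a toy list of transition counts. This sketch writes the softmax out by hand (as `softmaxSketch`, a hypothetical name) instead of using `SoftmaxLayer`, and compares it with a plain proportional normalization.

```wolfram
(* Plain softmax: exponentiate, then normalize so the values sum to 1 *)
softmaxSketch[w_List] := N[Exp[w]/Total[Exp[w]]];

counts = {1, 2, 5};   (* how often three transitions occurred *)

softmaxSketch[counts]    (* approximately {0.017, 0.047, 0.936} *)
N[counts/Total[counts]]  (* {0.125, 0.25, 0.625} *)

(* Mapped onto the thickness range used in visualizeChords *)
Rescale[softmaxSketch[counts], MinMax[softmaxSketch[counts]], {0.003, 0.04}]
```

The second-ranked transition keeps a quarter of the total under proportional weighting but under 5% with softmax, which is the suppression described above.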
The vertex size is proportional to the number of occurrences of each particular chord (that is, without taking into account the transitions). It can also be specified manually by passing `mVSize` as a list `{a,b}` such that `a` is the minimum size and `b` is the maximum.
The `version` can be either `"Full"` or `"Basic"`; the default is `"Full"`. The `"Basic"` version consists of a simplified chord set in which only the root note of the chord is taken into account, and not the archetype. For example, all C chords (M, Dom7, m...) would be represented by a single `"C"` vertex.
Finally, the simplicity specification (`simplicitySpec`) is a number that can be thought of, in some way, as a "noise" threshold: as it gets larger, fewer edges "stand out"—that is, more of the lower-significance ones are rendered with reduced opacity or are shown as dotted lines. This is useful for large or complex sequences.
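Stripped of all styling options, the core of the visualization can be sketched in a few lines: count how often each transition occurs and draw the edges with a thickness that reflects the count. This is a simplified stand-in for `visualizeChords`, not a replacement; the chord sequence and the names `chords`, `transitions`, `edges` and `thicknesses` are made up for the example.

```wolfram
(* A made-up chord sequence in the "Full" label format *)
chords = {"C-M", "F-M", "G-Dom7", "C-M", "F-M", "C-M"};

(* Each pair of consecutive chords is one transition; Tally counts repeats *)
transitions = Tally[DirectedEdge @@@ Partition[chords, 2, 1]];
edges = transitions[[All, 1]];
counts = transitions[[All, 2]]   (* {2, 1, 1, 1} *)

(* Map counts linearly onto the thickness range used in the post *)
thicknesses = Rescale[counts, MinMax[counts], {0.003, 0.04}];

Graph[edges,
 EdgeWeight -> counts,
 EdgeStyle -> MapThread[#1 -> Thickness[#2] &, {edges, thicknesses}],
 VertexLabels -> "Name"]
```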
<br>
----------
#Some examples
Here I will show some specific examples generated with this tool. I tried to use different styles of music for comparison.
- **Bach**'s [prelude no.1][16] from the Well Tempered Clavier:
![Visualization of Bach's prelude no.1 ][17]
- **Debussy**'s [*Passepied*][18] from the *Suite Bergamasque*:
![Visualization of Debussy's *Passepied*][19]
- A "template" blues progression:
![Blues template][20]
- **Beethoven**'s second movement from the *Pathétique* sonata (no.8):
![Beethoven][21]
- Any "reggaeton" song (e.g. Despacito):
![Reggaeton][22]
#Microsite
Check out the form page (a.k.a. microsite) of this project [here][23]:
https://www.wolframcloud.com/objects/lammenspaolo/Chord%20sequence%20visualization
[![enter image description here][24]][23]
Briefly, here is what each option does (see the section **Algorithms and code** for a more detailed explanation):
- **Chunkifier function**: choose between splitting notes by measures and the naïve fixed-size grouping
- **Measure split factor**: choose into how many pieces you want to divide measures (each piece will be analyzed as a separate chord)
- **Graph layout**: choose the layout option for the `Graph` call
- **Normalization function**: choose whether to apply a Softmax function to the weights of edges (to make results clearer in case of complex sequences).
- **Version**: choose "Full" for complete chord info (e.g. "C-M", "D-Dom7", "C-7M"...) or "Basic" for just the root of the chord (e.g. "C", "D"...)
- **Vertex size**: specify vertex size as a list `{a,b}` where `a` is the minimum and `b` is the maximum size
- **Simplicity parameter**: visual simplification of the graph (a value of 0 means no simplification is applied)
<br>
#Conclusions
I have developed a functional tool to visualize chord sequences as graphs. It is far from perfect, though. In the future, I would like to improve the positioning of vertices, make it possible to eliminate insignificant transitions from the graph altogether, and make other visual adjustments. Furthermore, I plan to refine and optimize the chord analyzer, as right now it is just an experimental version that isn't too accurate. A better "chunkifier" function could be developed too.
Finally, I'd like to thank my mentor Christian Pasquel and all of the other WSC staff for this amazing opportunity. I'd also like to thank my music theory teacher, Raimon Romaní, for making me, over the years, sufficiently less terrible at musical analysis to be able to undertake this project.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Prelude.png&userId=1372342
[2]: https://en.wikipedia.org/wiki/Markov_chain "Wikipedia: Markov chain"
[3]: https://en.wikipedia.org/wiki/Interval_(music) "Wikipedia: Interval"
[4]: https://en.wikipedia.org/wiki/Atonality "Wikipedia: Atonality"
[5]: https://youtu.be/L85XTLr5eBE "Schönberg's 4th string quartet on YouTube"
[6]: https://en.wikipedia.org/wiki/Tonic_%28music%29 "Wikipedia: Tonic"
[7]: http://astronomy.swin.edu.au/cosmos/S/Schwarzschild+Radius "Basic info on Schwarzschild radius"
[8]: https://en.wikipedia.org/wiki/Dominant_(music) "Dominant chord"
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=2548Macro_analysis_chords_on_C.jpg&userId=1372342
[10]: https://en.wikipedia.org/wiki/Seventh_chord "Wikipedia: Seventh chord"
[11]: https://en.wikipedia.org/wiki/Interval_(music) "Wikipedia: Interval"
[12]: https://en.wikipedia.org/wiki/Dominant_seventh_chord "Wikipedia: Dominant seventh"
[13]: http://community.wolfram.com//c/portal/getImageAttachment?filename=images.png&userId=1372342
[14]: https://en.wikipedia.org/wiki/Inversion_(music)#Chords "Wikipedia: Inversion#Chords"
[15]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Passepied.png&userId=1372342 "Jumbled mess!"
[16]: https://www.youtube.com/watch?v=aengbLEFnM8
[17]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Prelude.png&userId=1372342
[18]: https://www.youtube.com/watch?v=hDWbVP-5DSA "Passepied"
[19]: http://community.wolfram.com//c/portal/getImageAttachment?filename=deb_pass2.png&userId=1372342
[20]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Blues.png&userId=1372342
[21]: http://community.wolfram.com//c/portal/getImageAttachment?filename=pathetique.png&userId=1372342
[22]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Reggaeton.png&userId=1372342
[23]: https://www.wolframcloud.com/objects/lammenspaolo/Chord%20sequence%20visualization "Microsite"
[24]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-19at1.53.02PM.png&userId=11733

Paolo Lammens
2018-07-14T05:10:03Z
[WSS18] Punctuation Restoration With Recurrent Neural Networks
http://community.wolfram.com/groups/-/m/t/1379001
# Punctuation Restoration With Recurrent Neural Networks
Mengyi Shan, Harvey Mudd College, mshan@hmc.edu
![flow][1]
All code is posted on GitHub: [https://github.com/Shanmy/Summer2018Starter/tree/master/Project][2].
Raw results in the attached notebook.
----------
## Introduction
In natural language processing problems such as automatic speech recognition (ASR), the generated text is normally unpunctuated, which makes further recognition or analysis harder. Punctuation restoration is therefore a small but crucial problem that deserves our attention. This project aims to build an automatic "punctuation adding" tool for plain English text with no punctuation.
Since the input text can be considered a sequence in which context matters for the properties of every single word, recurrent neural networks are a natural fit. Traditional approaches to this problem use various recurrent neural networks (RNN), especially long short-term memory (LSTM) layers. This project examines several models built from different layers and introduces bidirectional operators, which significantly improve the result compared with the older methods.
## Methods
![Basic steps of the method][3]
There are four basic steps in the whole process. First, we get a corpus of articles (with punctuation). Then, we keep the periods and commas in the corpus, change question marks and exclamation marks to periods and colons and semicolons to commas, and remove all other punctuation. With this pure text, we tag each word as one of {NONE, COMMA, PERIOD} according to whether it is followed by a punctuation mark. This set of tagging rules is sent to a neural network model for training. Finally, we test the result on a separate set of articles, the test set.
### Data
We have two sources of data. The first is the Wikipedia text for 4000 nouns (with missing entries deleted), and the second is 50 novels from the Wolfram Data Repository.
(*Get wikipedia text of 4000 nouns*)
nounlist = Select[WordList[], WordData[#, "PartsOfSpeech"][[1]] == "Noun" &];
rawData = StringJoin @@ DeleteCases[Flatten[WikipediaData[#] & /@ Take[nounlist, {1, 4000}], 2], _Missing]
(*Get text of 50 novels*)
books = StringJoin @@ Get /@ ResourceSearch["novels", 50];
### Pre-processing
The first goal of the preprocessing step is to purify the text. That is, since we only consider commas and periods, we should either delete or replace other characters and punctuations. Also, for convenience, all numbers are replaced with 1 first. All other characters are removed from the text.
(*Show sets of characters replaced with comma, period, whitespace, one and null respectively*)
toComma = Characters[":;"];
toPeriod = Characters["!?"];
toWhiteSpace = {"-", "\n"};
toOne = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};
toNull[x_String] := Complement[Union[Characters[x]], ToUpperCase@Alphabet[], Alphabet[], toOne, toComma, toPeriod, toWhiteSpace, {".", ",", " "}];
Then we carry out the replacements and normalize the text into its "pure" form. A validation test is included to check its purity.
(*Replacement and modification. End with lowercase text with only periods, alphabets and commas.*)
toPureText[x_String] :=
StringReplace[#, ".," .. -> ". "] &@
StringReplace[#, ". " .. -> ". "] &@
StringReplace[#, {" ," -> ",", " ." -> "."}] &@
StringReplace[#, {"1" .. -> "one", " " .. -> " "}] &@
StringReplace[#, {"1. 1" -> "", "1, 1" -> "1"}] &@
StringReplace[#, {" " .. -> " "}] &@
StringReplace[#, {"," -> ", ", "." -> ". "}] &@
ToLowerCase@
StringReplace[{toComma -> ",", toPeriod -> ".", toNull[x] -> "",
toWhiteSpace -> " ", toOne -> "1"}][x];
(*validation test*)
VerificationTest[Length@StringSplit@x == Length@TextWords@x]
Then we define a function fPuncTag that can generate the corresponding tagging given a piece of text with punctuation.
(*Define the tagging function, and maps it to original text. Original text is partitioned into pieces of 200 words*)
fPuncTag := Switch[StringTake[#, -1], ".", "a", ",", "b", _, "c"] &;
fWordTag[x_String] := Map[fPuncTag, Partition[StringSplit[x], 200], {2}];
And we can thus remove the punctuation, and build a set of rules between the unpunctuated text and the generated tagging.
fWordText[x_String] := StringReplace[#, {"," -> "", "." -> ""}] & /@ StringRiffle /@ Partition[StringSplit[x], 200];
fWordTrain[x_String] := Normal@AssociationThread[fWordText[x], fWordTag[x]];
totalData = fWordTrain@toPureText@rawText
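To make the tagging concrete, here is the same idea applied to a single short sentence instead of 200-word partitions. `tagSketch` repeats the body of `fPuncTag`; the sentence and the names `words`, `tags` and `plain` are only for illustration.

```wolfram
(* Same rule as fPuncTag: "a" = period, "b" = comma, "c" = no punctuation *)
tagSketch = Switch[StringTake[#, -1], ".", "a", ",", "b", _, "c"] &;

words = StringSplit["i like apples, but i dislike bananas."];

(* One tag per word, decided by the word's last character *)
tags = tagSketch /@ words   (* {"c", "c", "b", "c", "c", "c", "a"} *)

(* The unpunctuated input text that the network sees *)
plain = StringRiffle[StringReplace[words, {"," -> "", "." -> ""}]]
(* "i like apples but i dislike bananas" *)

(* One training example: text -> tag sequence *)
plain -> tags
```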
With the total data, we want to divide it into three groups: the training set, the validation set, and the test set.
(* First we know that the length is 63252, then we divide it by 15:3:1*)
order = RandomSample[Range[63252]];
trainingSet = totalData[[Take[order, 50000]]];
validationSet = totalData[[Take[order, {50001, 60000}]]];
testSet = totalData[[Take[order, {60001, -1}]]];
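The split above hard-codes the dataset length. A possible variant, sketched here on a stand-in list of 19 items, derives the sizes from `Length` so the same 15:3:1 ratio carries over to a dataset of any size. The names `data`, `train`, `validation` and `test` are hypothetical.

```wolfram
data = Range[19];   (* stand-in for totalData *)
n = Length[data];
shuffled = RandomSample[data];

(* Take 15/19 for training, then 3/19 for validation; the rest is the test set *)
{train, rest} = TakeDrop[shuffled, Round[15 n/19]];
{validation, test} = TakeDrop[rest, Round[3 n/19]];

Length /@ {train, validation, test}   (* {15, 3, 1} *)
```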
### Train
During neural network training, I used 8 different combinations of layers, of which 4 are worth considering. They are listed below. The LSTM layer, the gated recurrent layer, and the basic recurrent layer are three types of recurrent layers, each representing a net that takes a sequence of vectors and outputs a sequence of the same length. LSTM is commonly used in natural language processing problems, so we take it as the starting point.
(*Pure LSTM*)
net1 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
LongShortTermMemoryLayer[60],
LongShortTermMemoryLayer[30],
LongShortTermMemoryLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
(*Gated Recurrent*)
net2 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
GatedRecurrentLayer[60],
LongShortTermMemoryLayer[30],
GatedRecurrentLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
(*Basic Recurrent*)
net3 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
BasicRecurrentLayer[60],
LongShortTermMemoryLayer[30],
BasicRecurrentLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
(*Bidirectional*)
net4 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
NetBidirectionalOperator[{LongShortTermMemoryLayer[40],
GatedRecurrentLayer[40]}],
NetBidirectionalOperator[{LongShortTermMemoryLayer[20],
GatedRecurrentLayer[20]}],
LongShortTermMemoryLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
The embedding layer is used to change words into vectors that represent their semantic characteristics.
(*The embedding layer here*)
embeddingLayer = NetModel["GloVe 100-Dimensional Word Vectors Trained on Wikipedia and Gigaword 5 Data"]
With all those neural network models set up, we can train each neural network. To save time, I first trained all models with a small data set of only 3 million words to compare their behaviors.
(*train the neural network while saving the training object*)
NetTrain[net, trainingSet, All, ValidationSet -> validationSet]
### Test
Since this classification problem has a skewed dataset, that is, most of the words should have the tag "None", it doesn't make sense to use accuracy to measure the models' performance. Even a model that does nothing and always returns "None" would have a high accuracy, equal to the percentage of "None" in the whole tagging set. Instead, to evaluate the models, we introduce precision, recall, and f1-score.
(*Precision and recall*)
precision = truePrediction/allPrediction
recall = truePrediction/allTrue
F1 = HarmonicMean[{precision, recall}]
![PR][4]
For a given test set, we first remove its punctuation and run the trained model on it.
(*remove punctuation and run the model*)
noPuncTest = Keys /@ testSet
result = net["TrainedNet"] /@ noPuncTest;
Then we change the tags to numbers: "a" (period) becomes 1, "b" (comma) becomes 2, and "c" (none) becomes 0. Next we calculate the elementwise product of realTag and resultTag. If an element is 4, both the realTag and the resultTag are 2, which counts as a successful prediction of a comma. An element of 1 represents a successful prediction of a period.
![Tag][5]
(*Change the tags to numerical values and count 1s and 4s*)
realTag = Replace[Flatten[Values /@ Take[testSet, 3252]], {"a" -> 1, "b" -> 2, "c" -> 0}, {1}];
resultTag = Replace[Flatten[result], {"a" -> 1, "b" -> 2, "c" -> 0}, {1}];
totalTag = realTag*resultTag;
Now we can use totalTag, resultTag, and realTag to calculate precision, recall, and f1-score.
(*Precision*)
PrecPeriod = N@Count[totalTag, 1]/Count[resultTag, 1]
PrecComma = N@Count[totalTag, 4]/Count[resultTag, 2]
(*Recall*)
RecPeriod = N@Count[totalTag, 1]/Count[realTag, 1]
RecComma = N@Count[totalTag, 4]/Count[realTag, 2]
(*F1*)
F1Period = (2*RecPeriod*PrecPeriod)/(RecPeriod + PrecPeriod)
F1Comma = (2*RecComma*PrecComma)/(RecComma + PrecComma)
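A toy example with six words shows how these counts work out. The tag lists below are invented for illustration and are not results from the project.

```wolfram
encode = {"a" -> 1, "b" -> 2, "c" -> 0};

(* Invented ground truth and prediction for six words *)
realTag = {"c", "b", "c", "a", "c", "a"} /. encode;     (* {0, 2, 0, 1, 0, 1} *)
resultTag = {"c", "b", "b", "a", "c", "c"} /. encode;   (* {0, 2, 2, 1, 0, 0} *)

(* 1 = period predicted correctly, 4 = comma predicted correctly *)
totalTag = realTag*resultTag   (* {0, 4, 0, 1, 0, 0} *)

precPeriod = Count[totalTag, 1]/Count[resultTag, 1]   (* 1 *)
recPeriod = Count[totalTag, 1]/Count[realTag, 1]      (* 1/2 *)
precComma = Count[totalTag, 4]/Count[resultTag, 2]    (* 1/2 *)
recComma = Count[totalTag, 4]/Count[realTag, 2]       (* 1 *)

f1Period = HarmonicMean[{precPeriod, recPeriod}]      (* 2/3 *)

(* Accuracy of a model that always answers "c" (no punctuation) *)
baselineAccuracy = Count[realTag, 0]/Length[realTag]  (* 1/2 *)
```

Here one of the two real periods is missed (recall 1/2), while one of the two predicted commas is spurious (precision 1/2).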
## Result
Ten neural networks with different layers are trained on a small dataset. Using only Long Short-Term Memory layers gives f1 scores of 13% and 11% for periods and commas respectively. Introducing dropout parameters, pooling layers, elementwise layers, basic recurrent layers and gated recurrent layers all produce f1 scores between 10% and 30%, showing no significant improvement. Introducing the bidirectional operator (combining two recurrent layers) improves the scores to 53% and 47%, and to 72% and 60% respectively when training on a larger dataset of 10M words.
Here are the results for the three different neural networks trained with a 3M small dataset, and bidirectional neural network (which has the best performance in the small dataset) trained with a larger dataset of 10M words. The first figure is of the period and the second is of the comma.
![Period][6]
![Comma][7]
We can easily observe the advantage of the bidirectional operator for both periods and commas, in both precision and recall. Compared with sequence-to-sequence learning, "tagging" is a significantly more efficient and accurate way to restore punctuation in plain text. Since every word's tag ("None", "Comma", "Period") is influenced by its context, it makes sense that recurrent neural networks and bidirectional operators show great potential in this research.
Generally, the recall score is significantly lower than the precision score, suggesting that the model generates more punctuation than it should. This could be due to the Wikipedia dataset not being clean enough. In the Wikipedia text, there are sometimes equations, translations, or other strange characters that we simply delete. This changes the ratio of punctuation to words and produces some segments of text that are "full of" punctuation, since the unrecognized words are simply deleted. One example of such a "not clean" segment is shown below.
![wiki][8]
Also, the overall performance on commas is slightly worse than on periods. This also makes sense from a linguistic point of view. There seems to be a concrete set of linguistic rules for the period, but the usage of the comma depends greatly on personal writing style. For example, you could say either *"I like apples but I don't like bananas."* or *"I like apples, but I don't like bananas."* It is therefore really hard to build a model that predicts commas with high accuracy. Fortunately, adding a comma or not often doesn't change the overall meaning of the sentence, so a slightly worse performance on commas is tolerable.
## Future Works
A 70% f1-score is still not enough for practical application. Planned future work focuses on improving accuracy to a level suitable for use in industry. The most urgent and important step is using a larger dataset. We can observe a great improvement when moving from the 3M to the 10M dataset, but it is still far from enough.
![plot][9]
If we take a closer look at the evolution plots during training, we can see that the error rate and loss of the training set decrease continuously, while the error rate and loss of the validation set soon reach a stable state and change very little afterwards. The gap between those two curves suggests overfitting, and introducing more and better data should help considerably.
Also, punctuation restoration should not be limited to periods and commas. A more rigorous study of the question mark, exclamation mark, colon, and quotation mark is expected. However, we should note that the choice of most punctuation marks is not restricted to one possibility. In cases like distinguishing a period from an exclamation mark, we cannot expect a high f1-score. It is still an interesting topic that may be useful for tasks like sentiment analysis.
## Acknowledgement
I would like to thank the summer school for providing the environment and background skills for me to finish this project. Especially, I want to thank my mentor for helping me with neural network problems and debugging.
## Data and Reference
- [Wolfram Data Repository][10]
- [Wikipedia][11]
- Tilk O, et al. "LSTM for Punctuation Restoration in Speech Transcripts." Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, 2015-January, 2015, pp. 683-687.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at7.10.22PM.png&userId=1362824
[2]: https://github.com/Shanmy/Summer2018Starter/tree/master/Project
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-11at10.22.30AM.png&userId=1362824
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=PR1.png&userId=1362824
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-11at11.45.44AM.png&userId=1362824
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at8.38.40PM.png&userId=1362824
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at8.38.49PM.png&userId=1362824
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-11at12.13.54PM.png&userId=1362824
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at8.59.14PM.png&userId=1362824
[10]: https://datarepository.wolframcloud.com/category/Text-Literature/
[11]: https://www.wikipedia.org
Mengyi Shan
2018-07-11T17:20:45Z
[WSS18] Analysis of Axon Expression Intensity from Images
http://community.wolfram.com/groups/-/m/t/1386698
An axon, also known as a nerve fiber, is a long, slender projection of a nerve cell, or neuron, that conducts electrical impulses known as action potentials away from the nerve cell body. Although the connectivity of axons is crucial for signal transfer between neurons, their structural assembly and trend across the cortex have not yet been widely investigated. By utilizing image processing techniques, we look at the structural trend and quantify axon expression, providing valuable data for further investigation of neuronal activity.
Background: What are Axons?
---------------------------
The neocortex, also called the neopallium and isocortex, is the part of the mammalian brain involved in higher-order brain functions such as sensory perception, cognition, generation of motor commands, spatial reasoning and language. The neocortex is the largest part of the cerebral cortex - the outer layer of the cerebrum - in the human brain.
![enter image description here][1]
The neocortex is made up of six layers, labelled from the outermost inwards, I to VI. Since different layers specialize in different activities, analysis of changing trend in axon density across the layers is crucial to understanding brain activity. Plasticity over longer distances means that a larger number of neural circuits can be achieved and implies a larger memory capacity per synapse (Fawcett and Geller, 1998, Chen et al., 2002, Papadopoulos, 2002).
Axons span many millimeters of cortical territory, and individual axons target diverse areas (Zhang and Deschenes, 1998). Thus, understanding the repertoire of axonal structural changes is fundamental to evaluating mechanisms of functional rewiring in the brain.
Images acquired from distinct axon arbors in the adult barrel cortex of GFP transgenic mice were used in this project. Two-photon microscopy techniques were used along with SBEM (Serial Block-face scanning Electron Microscopy) techniques. Images were obtained through a series of surface scans across the entire sample, then stacked for segmentation and quantification. Due to the size of the data, only the first section of layer 1 was analyzed in this project.
## Import Images ##
In order to carry out an image analysis with electron-microscope images, we need the physical dimensions that correspond to the pixels of each image.
First we assign the pixel sizes corresponding to the real image sizes:
xpixelsize = 512Quantity[1,"Micrometers"];
ypixelsize = 512 Quantity[1,"Micrometers"];
zstepsize = 293Quantity[1,"Micrometers"];
Then we import the TIF dataset as a list of images (stored in the variable image, which the rest of the code uses):
image=Import@URLDownload["https://github.com/JihyeonJe/JJ-WSS18/raw/master/axon.tif"];
## Image Processing ##
After images are imported, 3D mesh is created for visualization of general structural trend throughout the entire stack. Maximum intensity projection is also generated from the stacks to aid the understanding of overall axon distribution.
To carry out density and volume calculations, images were binarized with given thresholds.
To get a general idea of the structure, we create a 3D mesh with the dataset:
image3D=Image3D[image,ColorFunction->"GrayLevelOpacity",BoxRatios->{1,1,1/3}];
resize = ImageResize[image3D, 170]
![enter image description here][2]
Then we binarize all the images with a set threshold:
binarized = Map[MorphologicalBinarize[#, {0.10, 0.4}]&,ImageAdjust/@image];
Now let's create maximum intensity projection from the previously created binarized images:
MIP = Image3DProjection[Image3D[binarized]]
![enter image description here][3]
Display the mesh and the maximum intensity projection side by side for convenient analysis:
{Labeled[image3D,Text@"3D mesh"],Labeled[MIP,Text@"Maximum Intensity Projection"]}
![enter image description here][4]
## Calculate volume and density ##
With processed images, we are now able to calculate volume and density of the axons expressed in the images. This step is crucial in analyzing the volumetric intensity of axon expression in the sample.
The total volume of expressed axons was calculated by counting all non-zero elements in the binarized images and multiplying the count by the pixel sizes.
expressedvol = Count[Flatten[ImageData /@ binarized],1]*xpixelsize*ypixelsize*zstepsize
Then we get the value of 12353445560320 um^3.
Now calculate the volume of the entire image stack by counting image dimensions and multiplying it with pixel sizes:
totalvol = First[ImageDimensions[First[image]]]*xpixelsize*Last[ImageDimensions[First[image]]]*ypixelsize*Length[image]*zstepsize
This results in another volumetric quantity of 1208088401018880 um^3.
Using these two values, calculate the average axon density, i.e. the fraction of the imaged volume occupied by axons:
density = N[expressedvol/totalvol]
The units cancel, so the result is a dimensionless volume fraction of 0.0102256, or about 1% of the stack.
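Since both volumes are voxel counts multiplied by the same voxel size, the ratio reduces to a ratio of counts. The counts below are not stated in the original text; they are derived here by dividing the two reported volumes by $512 \cdot 512 \cdot 293$, so treat them as a consistency check rather than reported data ($N$ and $s$ are symbols introduced for this illustration):

$$
\frac{V_{\text{axon}}}{V_{\text{total}}}
= \frac{N_{\text{axon}}\, s_x s_y s_z}{N_{\text{total}}\, s_x s_y s_z}
= \frac{160835}{15728640} \approx 0.0102256
$$

Here $15728640 = 512 \cdot 512 \cdot 60$, which would correspond to a stack of 60 slices of $512 \times 512$ pixels.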
By moving a sliding window across the entire image stack from top to bottom, we can observe the general trend of axon density. Since connectivity between axons is crucial to understanding brain activity, the sliding window allows an analysis of the amount of shared data between axons across the brain.
slidingwindow[x_] := Count[Flatten[ImageData[x]],1]
SetAttributes[slidingwindow,Listable];
window = Total /@ MovingMap[slidingwindow, binarized, 2];
Now plot the graph:
ListPlot[window/(First[ImageDimensions[First[image]]]*Last[ImageDimensions[First[image]]]*3), Joined -> True, PlotStyle -> Thick, PlotLabel -> "Axon Density Across Z", LabelStyle -> Directive[Bold], AxesLabel -> "Axon Density"]
![enter image description here][5]
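The sliding-window count can be tried on a small synthetic stack before running it on the real data. This is only a sketch: the random 8x8 slices and all names ending in Ex are hypothetical, and it assumes that a MovingMap window of 2 covers three consecutive slices, which matches the division by 3 in the plot above.

```wolfram
(* Hypothetical stand-in for the binarized stack: 6 random 8x8 binary slices *)
SeedRandom[42];
stackEx = Table[Image[RandomInteger[1, {8, 8}], "Bit"], {6}];

(* Number of foreground (value 1) pixels in one slice *)
fgEx[img_] := Total[ImageData[img], 2];

(* Overlapping windows of three consecutive slices, stepping by one slice *)
windowsEx = Partition[fgEx /@ stackEx, 3, 1];

(* Fraction of foreground voxels in each window: counts / (3 slices * 8 * 8 pixels) *)
densityEx = N[(Total /@ windowsEx)/(3*8*8)]
```

Each entry of densityEx is a voxel fraction between 0 and 1, and a stack of 6 slices yields 4 windows. Partition is used here instead of MovingMap only to make the window contents explicit.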
From the plotted graph we can see the general trend of axon density across the brain. For example, a local maximum (peak) at the region corresponding to the line of Gennari would indicate that the specific area is responsible for active neuronal signal transfer. When analyzed across the entire brain, such data can provide a novel understanding of axon expression and structures.
## References ##
Image data acquired from Diadem Challenge - Neocortical Layer 1 Axons: http://www.diademchallenge.org/neocortical_layer_1_axons_readme.html
De Paola V et al. (2006) Cell type-specific structural plasticity of axonal branches and boutons in the adult neocortex. Neuron 49, 861-875. DOI: 10.1016/j.neuron.2006.02.017
Author Information:
Jihyeon Je (Western Reserve Academy, jej19@wra.net)
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-18at4.41.29PM.png&userId=1352003
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-18at4.44.56PM.png&userId=1352003
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-18at4.47.25PM.png&userId=1352003
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-18at4.48.15PM.png&userId=1352003
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-18at4.52.59PM.png&userId=1352003
Jihyeon Je
2018-07-18T07:55:40Z
Fast curve building with linear solving method
http://community.wolfram.com/groups/-/m/t/1386853
Introduction
------------
We discuss the mechanics of yield curve bootstrapping and demonstrate how the use of a linear solver speeds up the process. Once the market data for curve construction is available, the bootstrapping process is quick and the entire yield curve is obtained in a single step. This optimises the bootstrapping procedure and leads to an elegant and transparent solution. The auxiliary output - the so-called yield curve derivatives such as zero and forward rates - can be generated on the fly.
![enter image description here][1]
The yield curve methodology
---------------------------
Yield curves are one of the most fundamental concepts in finance. Curves are basic building blocks for many financial instruments and they are key determinants of value creation due to the discounting effect. Yield curves - when built with derivative instruments - are essential for forward rate calculation. As such, they are a critical component of the entire market for interest rate derivatives.
Yield curves are created through bootstrapping - a technique that converts market observable rates into 'zero-coupon' instruments. To be consistent and correct, the yield curve has to satisfy the following condition:
Sum[c[[i]] g[[i]] DF[[i]], {i, 1, n-1}] + (1 + c[[n]] g[[n]]) DF[[n]] == 1
where **c** is the fixed market rate, **g** is the year-fraction and **DF** stands for the discount factor that we are trying to bootstrap.
When we choose the 'fixed leg' method, the above expression guarantees that all instruments on the yield curve will price to par. This ensures that discount factors are correctly calculated.
The curve equation above ensures consistency for a single curve pillar, i.e. one maturity point on the curve. When $m$ such instruments are lined up together, we obtain a matrix of curve equations that we can solve with a linear solver.
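Written out with the same symbols as the condition above, the $m$ stacked equations form a lower-triangular system. The matrix layout below is a reconstruction from that condition, not taken from the original post:

$$
\begin{pmatrix}
1 + c_1 g_1 & 0 & \cdots & 0 \\
c_1 g_1 & 1 + c_2 g_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
c_1 g_1 & c_2 g_2 & \cdots & 1 + c_m g_m
\end{pmatrix}
\begin{pmatrix} DF_1 \\ DF_2 \\ \vdots \\ DF_m \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
$$

Because the matrix is lower triangular, the system has a unique solution whenever every diagonal entry $1 + c_i g_i$ is non-zero.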
Curve bootstrapping - example
-----------------------------
We demonstrate the curve building principles with the following simple example:
- Randomize market data for the curve construction
- Visualise the data as the curve input
dim=30;
t=Table[0.5,{dim}];
act=Accumulate[t];
r=Table[1,{dim}];
c=0.0055+RandomReal[{0.005,0.009}] Sqrt[act];
td=TemporalData[c,{act}];
ListLinePlot[td,PlotStyle->{Blue, Thickness[0.008]},GridLines->{Automatic,Automatic},PlotLabel->Style["Yield curve market quotes",Purple,15]]
![enter image description here][2]
Using the market data above, we construct the matrix of cash flows and use the **LinearSolve** function to obtain the required discount factors.
z=Table[If[i<j,c[[i]]*t[[i]],If[i==j,r[[i]]+c[[i]]*t[[i]],0]],{j,1,dim},{i,1,dim}];
df=LinearSolve[z,r];
td2=TemporalData[df,{act}];
ListLinePlot[td2,PlotLabel->Style["Discount factors",Blue,15],GridLines->{Automatic, Automatic},PlotStyle->{Red,Thickness[0.009]}]
![enter image description here][3]
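The same construction can be checked by hand on a tiny curve. The sketch below assumes three annual pillars with made-up rates (the values and the names cEx, gEx, zEx and dfEx are illustrative only) and verifies that each row of the system reprices to par.

```wolfram
(* Hypothetical 3-pillar curve: illustrative rates and annual year fractions *)
cEx = {0.02, 0.025, 0.03};
gEx = {1, 1, 1};
nEx = Length[cEx];

(* Row j, column i: coupon before maturity, 1 + coupon at maturity, 0 afterwards *)
zEx = Table[
   Which[i < j, cEx[[i]] gEx[[i]], i == j, 1 + cEx[[i]] gEx[[i]], True, 0],
   {j, nEx}, {i, nEx}];

(* Solve for the discount factors; the right-hand side is par (1) for every row *)
dfEx = LinearSolve[zEx, ConstantArray[1, nEx]];

(* Residuals of the pricing equations; Chop removes rounding noise *)
Chop[zEx . dfEx - 1]
```

All residuals should vanish up to rounding, and with positive rates the resulting discount factors decrease with maturity, which mirrors the plot above.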
The obtained discount factors are unique, monotonically decreasing and in the right order. Having the discount factors computed allows us to obtain all kinds of measures that depend on them:
1. **Zero-coupon rates**
These are the unique rates that pay out a single cash flow at maturity and are essentially 'inverted' discount factors:
$\text{zero rate} = \ln(1/DF)/T$
zc=Log[1/df]/act;
td3=TemporalData[zc,{act}];
ListLinePlot[td3,PlotLabel->Style["Zero rates",Blue,15],GridLines->{Automatic, Automatic},PlotStyle->{Magenta,Thickness[0.009]}]
![enter image description here][4]
2. **Forward rates**
Forward rates are 'special' as they are rates observed today but starting in the future. They are market expectations of future short-term rates and therefore important indicators of the future level of interest rates in the economy.
We calculate a series of 3-monthly forward rates from the discount factors as follows:
forward_rate[t1,t2] =( DF[t1]/DF[t2]-1)/g[t1,t2]
where $t1$ and $t2$ are times in the future, $DF$ are the discount factors at time points $t1$ and $t2$, and $g[t1,t2]$ is the year fraction between time points $t1$ and $t2$, with $t2 > t1$.
In order to compute forward rates, we need to introduce a simple function that computes the business day. That can be easily done using the built-in functional components:
OpenDay[start_,incr_,itype_]:=NestWhile[DatePlus[#,1]&,DatePlus[start,{incr,itype} ],Not[BusinessDayQ[#]]&]
and we can test it
OpenDay[Today,3,"Day"]
and get the correct day - Monday, 23rd July 18
To compute forward rates, we first generate forward dates, calculate the time interval, interpolate DF from the DF object and then use the forward rate formula to get the rate:
intdf=Interpolation[td2,Method->"Spline"];
fwdates=Table[OpenDay[Today,i,"Month"],{i,3,60,3}];
dd=DateDifference[%,"Year",DayCountConvention->"Actual360"];
Differences[dd];
kd=Mean[%][[1]];
fwrates=Table[(intdf[i]/intdf[i+kd]-1)/kd,{i,kd,60 kd, kd}]//Quiet;
td4=TemporalData[fwrates,{Range[kd,60 kd, kd]}];
ListLinePlot[td4,PlotTheme->"Web",PlotLabel->Style["Smooth Forward rates",Blue,15]]
![enter image description here][5]
We have obtained smooth forward rates from the generated discount factors - this is evidence that the discount factors were properly calculated and the choice of interpolation was appropriate.
When the yield curve is upward sloping, the mathematics of forward rates implies that forward rates are higher than the zero rates. We can see this is the case:
ListLinePlot[{td3,td4},PlotTheme->"Web",PlotLabel->Style["Zero and Forward rates",Blue,15], PlotLegends->{"Zero", "Forward"}]
![enter image description here][6]
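One way to see why forwards sit above zero rates on a rising curve is to switch to continuous compounding. This is a simplifying assumption made for illustration (the calculation above uses simple forward rates), with $z(T)$ denoting the zero rate and $f(T)$ the instantaneous forward rate:

$$
DF(T) = e^{-z(T)\,T}, \qquad
f(T) = -\frac{d}{dT}\ln DF(T) = z(T) + T\,z'(T)
$$

so $f(T) > z(T)$ whenever the zero curve is rising, i.e. $z'(T) > 0$ for $T > 0$.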
Real-case example
-----------------
Having explained the yield curve bootstrapping methodology, we can now move to a real-case scenario where we will use actual market data to construct the Australian Dollar (AUD) curve. We use deposit rates in the short end and swap rates in the mid-to-long end of the curve, with maturities up to 30 years. The swaps pay out semi-annually.
pmfreq=2;
pildates={OpenDay[Today,1, "Month"],OpenDay[Today,2, "Month"], Table[OpenDay[Today, 3 i, "Month"],{i,4}],OpenDay[Today,18, "Month"], Table[OpenDay[Today, i, "Year"],{i,2,10 }], OpenDay[Today,12, "Year"],OpenDay[Today,15, "Year"],OpenDay[Today,20, "Year"],OpenDay[Today,25, "Year"],OpenDay[Today,30, "Year"]}//Flatten;
pilyrs=DateDifference[pildates ,"Year",DayCountConvention->"Actual365"][[All,1]] ;
crvQ={0.0195,0.0197,0.0201,0.02034,0.02045,0.02056,0.0206,0.0208,0.0212,0.0224,0.0245,0.0259,0.0267,0.0274,0.0279,0.0284,0.0292,0.0305,0.03097,0.0312,0.0323};
tdQ=TemporalData[crvQ,{pilyrs}];
intQ=Interpolation[tdQ ,Method->"Spline"];
crvD=Length[crvQ];
cfdates={Today,OpenDay[Today,1, "Month"],OpenDay[Today,2, "Month"], Table[OpenDay[Today, 3 i, "Month"],{i,4}],Table[OpenDay[Today, (12/pmfreq ) i, "Month"],{i,3,30 pmfreq }]}//Flatten;
cfyrs=Drop[DateDifference[cfdates ,"Year",DayCountConvention->"Actual365"][[All,1]],1];
cfyD=Length[cfyrs ];
intrates=intQ [cfyrs ];
ListLinePlot[intrates,PlotLabel->Style["AUD yield curve term structure",Blue,{15,Bold}],PlotStyle->{Magenta, Thick}]
![enter image description here][7]
We first split the market data into (i) deposits and (ii) swap segments:
depoyrs={Take[ cfyrs ,3],Differences[Take[cfyrs,{3,6}]]}//Flatten;
fraAv=Take[depoyrs,-4]//Mean;
swapData=Inner[{#1,#2}&,Drop[cfyrs,6] ,Drop[intrates ,6],List];
swapyrs={0,cfyrs[[4]],cfyrs[[6]] ,Take[cfyrs,{7,cfyD}]}//Flatten;
swyfrac=Differences[swapyrs ];
swyAv=Mean[swyfrac];
swapD=Length[swapData];
and build the matrix of linear equations that we solve for the discount factors:
zTab1=Table[If[i==j,1+intrates[[j]] If[i<=2,depoyrs [[i]],fraAv ],0],{j,6},{i,cfyD}];
zTab2=Table[If[MemberQ[swapyrs,cfyrs[[i]]],If[cfyrs[[i]]<swapData[[j,1]],swapData[[j,2]] swyAv ,If[cfyrs[[i]]==swapData[[j,1]],1+swapData[[j,2]] swyAv ,0]],0],{j,swapD},{i,1,cfyD}];
zTab0=Join[zTab1,zTab2];
b1Vec=ConstantArray[1,cfyD];
dfact=LinearSolve[zTab0 ,b1Vec ];
tdDf=TemporalData[dfact,{cfyrs}]//Quiet;
ListLinePlot[tdDf ,PlotLabel->Style["Generated Discount factors",Blue,15],PlotStyle->{Thickness[0.01],Purple},PlotRange->{Full,{1,0.3}},InterpolationOrder->2]
![enter image description here][8]
This is the actual AUD yield curve built from derivative instruments that can be used to value various AUD instruments.
Curve smoothing
---------------
Since we use actual market data, the generated discount factors are not as smooth as in our first example. This is particularly visible in the short end of the curve. The kinks are due to several factors - different segments of the curve are traded by different desks, and equilibrium market rates are obtained through different price discovery. If curve kinks and peaks are undesirable, there exist several techniques to smooth them away, and we look at *linear filters* that can help to accomplish this task. The built-in Mathematica **GaussianFilter** is particularly useful as it provides the required level of control to manipulate 'noisy' data.
For example, applying GaussianFilter with radius $r$ = 3 and standard deviation $\sigma$ = 2 provides a nice and smooth output for the 'kinky' segment of the curve:
iDF=Interpolation[tdDf,Method->"Spline"];
tdDF2=TemporalData[Map[iDF,Range[0.25,30,0.25]] ,{Range[0.25,30,0.25]}];
orD=iDF/@Range[0.25,30,0.25];
flD=GaussianFilter[orD,{3,2}];
ListLinePlot[{Take[orD ,10],Take[flD,10]},PlotLabel->Style["Curve correction with Gaussian filter",Blue,15],PlotLegends->{"Original", "Filtered"},PlotStyle->{Thickness[0.008],Thickness[0.008]},InterpolationOrder->3]
![enter image description here][9]
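The effect of the filter can be quantified on synthetic data. The sketch below uses a made-up exponential discount curve with artificial noise (the curve, the noise level, and the names ending in Ex are all assumptions for illustration) and compares a simple roughness measure before and after filtering with the same radius and standard deviation as above.

```wolfram
(* Hypothetical noisy discount-factor series; curve shape and noise level are assumed *)
SeedRandom[1];
tsEx = Range[0.25, 10, 0.25];
noisyEx = Exp[-0.03 tsEx] + RandomReal[{-0.002, 0.002}, Length[tsEx]];

(* Same filter settings as in the text: radius 3, standard deviation 2 *)
smoothEx = GaussianFilter[noisyEx, {3, 2}];

(* Roughness measured as the total absolute second difference of the series *)
roughEx[x_] := Total[Abs[Differences[x, 2]]];

{roughEx[noisyEx], roughEx[smoothEx]}
```

The second value should be clearly smaller than the first, which is one way to confirm that the kinks have been damped. The filter keeps the length of the series unchanged, so the result can be plotted against the same time grid.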
The filtered output produces a decent curve with a smoother short end that, if needed, can be used to construct all types of derivatives with appropriate rates. Note that the filter can be applied to (i) discount factors, (ii) forward rates, or both, depending on the required objective. Generally, the higher the radius, the smoother the output.
Conclusion
----------
The use of a linear solver in curve bootstrapping leads to fast and accurate curve construction with unique discount factors. The function can be easily applied to any environment where the term structure of market data is duly defined. We have also demonstrated the use of a filtering technique for curve smoothing with built-in Mathematica filters, which provide an excellent tool for output manipulation if curve kinks become an issue.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G10.png&userId=387433
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G1.png&userId=387433
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G2.png&userId=387433
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G3.png&userId=387433
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G4.png&userId=387433
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G5.png&userId=387433
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G6.png&userId=387433
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G7.png&userId=387433
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=G9.png&userId=387433
Igor Hlivka
2018-07-18T14:58:28Z
Live code templates
http://community.wolfram.com/groups/-/m/t/1273720
## Background
I enjoy coding in the FrontEnd (except it crashes and lookup across files does not exist), but I often miss 'hands on keyboard', customizable code templates.
E.g. I often forget to wrap an option name with quotes "_" or I'm starting a new function and would like to avoid retyping `Attributes/Options` `Catch/Check` etc.
I don't like palettes for something that I need to do quickly and frequently.
So I created a little package/stylesheet; it should work on Windows/macOS with Mathematica 10.4+.
https://github.com/kubaPod/DevTools
In case you are interested and/or have any ideas about this / similar features, let me know here or create an Issue in GitHub.
Topic cross posted on Mathematica.stackexchange: https://mathematica.stackexchange.com/q/164653/5478
## Setup
(*additional package I use to install github assets' paclets,
you can download .paclet manually if you want
*)
Import["https://raw.githubusercontent.com/kubapod/mpm/master/install.m"]
Needs["MPM`"]
(*installing the package*)
MPMInstall["kubapod", "devtools"]
(*changing default .m stylesheet to a dev's stylesheet*)
CurrentValue[$FrontEnd, "DefaultPackageStyleDefinitions"] =
FrontEnd`FileName[{"DevTools", "DevPackage.nb"}]
(*test*)
FrontEndTokenExecute["NewPackage"]
## How to:
- <kbd>Ctrl</kbd>+<kbd>1</kbd> to open a menu
- navigate with arrows and hit enter/return, or hit a shortcut key like <kbd>n</kbd> / <kbd>{</kbd> / <kbd>[</kbd>
## Customization
Once you set up the new stylesheet, a package notebook should have an additional toolbar with an 'Edit code templates' button on the top right. Click on it and the user's templates file should open.
It is just a .m file with a header that should explain everything. It will be improved in the future.
## Example
[![enter image description here][1]][1]
There is also a dark one based on the built-in ReversedColors.nb stylesheet:
CurrentValue[$FrontEnd, "DefaultPackageStyleDefinitions"
] = FrontEnd`FileName[{"DevTools", "DevPackageDark.nb"}]
[![enter image description here][2]][2]
[1]: https://i.stack.imgur.com/v81cV.gif
[2]: https://i.stack.imgur.com/g96TY.gif
Kuba Podkalicki
2018-01-28T18:00:23Z