Community RSS Feed
http://community.wolfram.com
RSS Feed for Wolfram Community showing any discussions in tag Machine Learning sorted by activeDoes inter-pixel distance change after downsampling?
http://community.wolfram.com/groups/-/m/t/1390548
Hi guys!
I have a question. So I am trying to implement SIFT.
My question is regarding inter-pixel distance. I think after downsampling the number of pixels change but the inter-pixel distance remains the same. But I am reading the paper 'Anatomy of SIFT' which states that after downsampling(subsampling) by 2 the inter-pixel distance doubles.
Am I missing something?
Thanks!Subhankar Ghosh2018-07-23T06:16:39Z[WSC18] Classifying & Converting the Major Currencies By Using Neural Nets
http://community.wolfram.com/groups/-/m/t/1382598
![enter image description here][1]
#Introduction
Throughout the years, I've traveled around the world experiencing new places, eating great food, and learning about other cultures. In order to do some of these things, I needed to use the national currency of the country I was in. Whether it was using Euros in France to buy some macarons, or using the Pound in England to buy tea, I noticed that each currency is unique and its design is a part of the country's identity. My project allows a user to input as many images of different coins, and select a currency to convert them all to. The Microsite identifies what country the coin is from, the name of the currency, the value in native currency, and the converted value.
#Gathering Data
Since the Wolfram database didn't have a large dataset of foreign coin images, we had to gather some data manually. We started out by using WebImageSearch and keywords to pull images off of Bing. Since coin designs change every couple years we used en.ucoin.com as a reference as to what the current design was. Then, we edited the set of images that the WebImageSearch gave me. We then exported the relevant images to a folder named after the coin's country of origin, and named the file after its name and value. The files were saved as .wxf. We did this process for all 42 coins and additionally pulled more data from Google Images.
![enter image description here][2]
#Different Approaches to Feeding the Neural Network
Creating the right data set and matching it with a network took 4 attempts
##Attempt 1
For the first try, we created training data by joining all of the data saved in the folders (at the time this was just Euros and Canadian Dollars) and randomized it. We threaded each folder and gave the images labels of the country. We tried to use the Classify function but found that it was not complex enough.
fullEUData =
Thread[Flatten[
Map[ImageResize[#, {50, 50}] &, Import[#]] & /@
FileNames["*", "coin_data/euro"]] -> "EU"];
fullCanadianData =
Thread[Flatten[
Map[ImageResize[#, {50, 50}] &, Import[#]] & /@
FileNames["*", "coin_data/canadian"]] -> "Canadian"];
trainingData = RandomSample@Join[fullEUData, fullCanadianData];
cf = Classify[trainingData]
![enter image description here][3]
##Attempt 2
Since I didn't have the skills to build my own neural network, my mentor suggested we use the ImageIdentify net. We were able to modify the net to fit the goal of the project and had great results with the net. However, during this attempt we found that the net was depending on the transparent or white backgrounds of the pure images.
![enter image description here][4]
##Attempt 3
After analyzing the results of the 2nd attempt, we decided to process and format the images. We removed the backgrounds from each image, randomized the brightness, contrast, and angles, and layered the coins onto a background of blended color and noise.
preprocess[image_ (*Removes background*)
] :=
ImagePad[RemoveBackground@image, 5, Padding -> None]
background // Clear;
background := (*Creates a random background*)
Blend[{ConstantImage[RandomColor[], {224, 224}],
RandomImage[1, {224, 224}, ColorSpace -> "RGB"]}, RandomReal[]];
randomLighting[
image_] := (*randomizes brightness and contrast of image*)
ImageAdjust[image, RandomReal[{-.25, .25}, 2]];
overlay[coin_Image, background_Image] := (*Layers everything together*)
ImageCompose[
ImageCrop[background, {224, 224}]
,
ImagePerspectiveTransformation[
randomLighting@ImageResize[coin, RandomReal[{60, 260}]],
IdentityMatrix[2] + RandomReal[{-.75, .75}, {2, 2}],
DataRange -> Full,
Background -> White
]
,
{RandomReal[{60, 190}], RandomReal[{70, 180}]}
]
For this attempt we also created data using a generator rather than having a pre made dataset. This helped the net train much quicker and was great at identifying 3 currencies. However, when we increased the number of currencies to 7, it struggled.
![enter image description here][5]
##Attempt 4 / Final Attempt
In the last 3 attempts we had assigned the images from each country the same label, ex: "USA" or "UK". We realized that the net could get confused due to thinking up to 7 different coins were all the same thing. So, we changed the labels from country, to country & value. Ex: "USA_0.25". Then, we joined all of the countries data and that was the fullOriginalData set. This was the final attempt and worked the best because each image of a specific value and country had its own label, therefore avoiding confusion for the neural net.
fullOriginalData =
RandomSample[
Join[fullCanadianData, fullEUData, fullUKData, fullChinaData,
fullJapanData, fullSwissData, fullUSData, fullUSData]];
We used the same process to format the images with backgrounds, but instead of using a generator to create data, we used a pre-made set. In order to create test data we split the fullData 80%, 20%.
createRandomData // ClearAll;
createRandomData[coin_ -> label_, background_] :=
Thread[overlay[preprocess@coin, background] -> label];
data1 = Table[createRandomData[RandomChoice[fullOriginalData], background],
Length[fullOriginalData]];
data2 = Table[createRandomData[RandomChoice[fullOriginalData], background],
Length[fullOriginalData]];
(*Random group of data*)
fullData = RandomSample@Flatten@Join[data1, data2];
Later on, we increased the data set from 900 to 5,000.
trained =
NetTrain[new, trainingData, ValidationSet -> testdata,
BatchSize -> 20, MaxTrainingRounds -> 30]
##Accuracy
The results of the neural net were extremely accurate. With a 99.58% accuracy rate on training data, and a 99.52% accuracy rate on test data, the program is able to make an accurate guess every time.
![enter image description here][6]
##Assigning the Output with a List of Characteristics
The net outputs the label which gives the country and value, ex: "USA_0.25". In order to have the microsite's output look better, we created a list of lists that included written out characteristics.
assignments =
Association[ {"CAN_1" -> {"Canada", "CanadianDollars", 1.00},
"CAN_2" -> {"Canada", "CanadianDollars", 2.00},
"CAN_0.50" -> {"Canada", "CanadianDollars", 0.50},
"CAN_0.25" -> {"Canada", "CanadianDollars", 0.25},
"CAN_0.10" -> {"Canada", "CanadianDollars", 0.10},
"CAN_0.05" -> {"Canada", "CanadianDollars", 0.05},
"CAN_0.01" -> {"Canada", "CanadianDollars", 0.01},
"EU_1" -> {"European Union" , "Euros", 1.00},
"EU_0.50" -> {"European Union", "Euros", 0.50},
"EU_0.20" -> {"European Union", "Euros", 0.20},
"EU_2" -> {"European Union", "Euros", 2.00},
"EU_0.10" -> {"European Union", "Euros", 0.10},
"EU_0.05" -> {"European Union", "Euros", 0.05},
"EU_0.02" -> {"European Union", "Euros", 0.02},
"EU_0.01" -> {"European Union", "Euros", 0.01},
"UK_1" -> {"United Kingdom", "BritishPounds", 1.00},
"UK_2" -> {"United Kingdom", "BritishPounds", 2.00},
"UK_0.50" -> {"United Kingdom", "BritishPounds", 0.50},
"UK_0.20" -> {"United Kingdom", "BritishPounds", 0.20},
"UK_0.10" -> {"United Kingdom", "BritishPounds", 0.10},
"UK_0.05" -> {"United Kingdom", "BritishPounds", 0.05},
"UK_0.02" -> {"United Kingdom", "BritishPounds", 0.02},
"UK_0.01" -> {"United Kingdom", "BritishPounds", 0.01},
"CHN_1" -> {"China", "ChineseYuan", 1.00},
"CHN_0.50" -> {"China", "ChineseYuan", 0.50},
"CHN_0.10" -> {"China", "ChineseYuan", 0.10},
"JPN_1" -> {"Japan", "Yen", 1}, "JPN_5" -> {"Japan", "Yen", 5},
"JPN_10" -> {"Japan", "Yen", 10},
"JPN_50" -> {"Japan", "Yen", 50},
"JPN_100" -> {"Japan", "Yen", 100},
"JPN_500" -> {"Japan", "Yen", 500},
"SUI_5" -> {"Switzerland", "SwissFrancs", 5.00},
"SUI_2" -> {"Switzerland", "SwissFrancs", 2.00},
"SUI_1" -> {"Switzerland", "SwissFrancs", 1.00},
"SUI_0.50" -> {"Switzerland", "SwissFrancs", 0.50},
"SUI_0.20" -> {"Switzerland", "SwissFrancs", 0.20},
"SUI_0.05" -> {"Switzerland", "SwissFrancs", 0.05},
"USA_0.25" -> {"United States", "USDollars", 0.25},
"USA_0.10" -> {"United States", "USDollars", 0.10},
"USA_0.05" -> {"United States", "USDollars", 0.05},
"USA_0.01" -> {"United States", "USDollars", 0.01}
}];
The second element in the sublist is the name of the currency saved in Mathematica. This string is an input for the function CurrencyConvert. Not only does the Microsite list this value as the name of the currency, but it also fetches it to use in the conversion feature.
##Deploying the Microsite
Using CloudDeploy and FormFunction, we made a microsite that allows the user to input as many images as they'd like from the 7 currencies into the drop box. The site also allows the user to select one of the 7 currencies in the dropdown menu to covert all of the coins to. The output gives the image, country, name of currency, value in native currency, and value in converted currency all in table form. At the bottom of the page the output also neatly displays the total in original currencies, and the total in the converted currency.
![enter image description here][7]
![enter image description here][9]
##Future Work
In the future, this project can be expanded by:
Adding all of the currencies in the world
, Since the design changes often, date the currencies by years
, Use this to create a mobile app that allows the user to easily take a photo of the coins they have.
##Acknowledgements
This project could not have been completed without the help and insight from my mentor Rick Hennigan
[Click Here to view the Microsite][10]
##Computational Essay Is Down Below
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.11.38PM.png&userId=1371841
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.22.37PM.png&userId=1371841
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.35.35PM.png&userId=1371841
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.38.33PM.png&userId=1371841
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.45.54PM.png&userId=1371841
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at1.59.43PM.png&userId=1371841
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-12at7.01.17PM.png&userId=1371841
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-12at7.01.35PM.png&userId=1371841
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-13at2.09.09PM.png&userId=1371841
[10]: https://www.wolframcloud.com/objects/kennethleeny/CoinIdentifierMorgan Lee2018-07-13T18:12:01Z[WSC18] Music Sentiment Analysis through Machine Learning
http://community.wolfram.com/groups/-/m/t/1383518
![A Representation of the emotion categorization system][1]
----------
#Abstract
This project aims to develop a machine learning application to identify the sentiments in a music clip. The data set I used consists of one hundred 45-second clips from the Database for Emotional Analysis of Music and an additional 103 gathered by myself. I manually labeled all 203 clips and used them as training data for my program. This program works best with classical-style music, which is the main component of my data set, but also works with other genres to an reasonable extent.
#Introduction
One of the most important functions of music is to affect emotion, but the experience of emotion is ambiguous and subjective to individual. The same music may induce a diverse range of feelings in people as a result of different context, personality, or culture. Some underlying features and elements of music, however, usually lead to the same effect on the human brain. For example, louder music often leads to more intense emotional responses from people. This consistency provides a foundation to train a supervised machine learning program based on music feature extraction.
#Background
This project is based on James Russell's circumplex model, in which a two-dimensional emotion space is constructed from the x-axis of valence level and y-axis of arousal level, as shown above in the picture. Specifically, valence is a measurement of an emotion's pleasantness, whereas arousal is a measurement of an emotion's intensity. Russell's model provides a metric on which different sentiments can be compared and contrasted, creating four main categories of emotion: Happy (high valence, high arousal), Stressed (low valence, high arousal), Sad (low valence, low arousal), and Calm (high valence, low arousal). Within these main categories there are various sub-categories, labeled on the graph above. Notably, "passionate" is a sub-category that does not belong to any main category due to its ambiguous valence value.
----------
#Program Structure
The program contains a three-layer structure. The first layer is responsible for extracting musical features, the second for generating a list of numerical predictions based on different features, and the third for predicting and displaying the most probable emotion descriptors based on the second layer's output.
![enter image description here][2]
##First Layer
The first layer consists of 23 feature extractors that generate numerical sequence based on different features:
(*A list of feature extractors*)
feMin[audio_] := Normal[AudioLocalMeasurements[audio, "Min", List]]
feMax[audio_] := Normal[AudioLocalMeasurements[audio, "Max", List]]
feMean[audio_] := Normal[AudioLocalMeasurements[audio, "Mean", List]]
feMedian[audio_] := Normal[AudioLocalMeasurements[audio, "Median", List]]
fePower[audio_] := Normal[AudioLocalMeasurements[audio, "Power", List]]
feRMSA[audio_] := Normal[AudioLocalMeasurements[audio, "RMSAmplitude", List]]
feLoud[audio_] := Normal[AudioLocalMeasurements[audio, "Loudness", List]]
feCrest[audio_] := Normal[AudioLocalMeasurements[audio, "CrestFactor", List]]
feEntropy[audio_] := Normal[AudioLocalMeasurements[audio, "Entropy", List]]
fePeak[audio_] := Normal[AudioLocalMeasurements[audio, "PeakToAveragePowerRatio", List]]
feTCent[audio_] := Normal[AudioLocalMeasurements[audio, "TemporalCentroid", List]]
feZeroR[audio_] := Normal[AudioLocalMeasurements[audio, "ZeroCrossingRate", List]]
feForm[audio_] := Normal[AudioLocalMeasurements[audio, "Formants", List]]
feHighFC[audio_] := Normal[AudioLocalMeasurements[audio, "HighFrequencyContent", List]]
feMFCC[audio_] := Normal[AudioLocalMeasurements[audio, "MFCC", List]]
feSCent[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralCentroid", List]]
feSCrest[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralCrest", List]]
feSFlat[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralFlatness", List]]
feSKurt[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralKurtosis", List]]
feSRoll[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralRollOff", List]]
feSSkew[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralSkewness", List]]
feSSlope[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralSlope", List]]
feSSpread[audio_] := Normal[AudioLocalMeasurements[audio, "SpectralSpread", List]]
feNovelty[audio_] := Normal[AudioLocalMeasurements[audio, "Novelty", List]]
<br/>
##Second Layer
Using data generated from the first layer, the valence and arousal predictors of the second layer provide 46 predictions for the audio input, based on its different features.
(*RMSAmplitude*)
(*Feature extractor*) feRMSA[audio_] := Normal[AudioLocalMeasurements[audio, "RMSAmplitude", List]]
dataRMSA = Table[First[takeLast[feRMSA[First[Take[musicFiles, {n}]]]]], {n, Length[musicFiles]}];
(*Generating predictor*) pArousalRMSA = Predict[dataRMSA -> arousalValueC]
![Sample predictor function][3]
<br/>
##Third Layer
The two parts of the third layer, main category classifier and sub-category classifier, each utilize the tensors generated in the second layer to make a prediction within their realm of emotion. The output consists of two parts, a main category emotion and a sub-category emotion.
(*Main*) emotionClassify1 = Classify[classifyMaterial -> emotionList1, PerformanceGoal -> "Quality"]
(*Sub*) emotionClassify2 = Classify[classifyMaterial -> emotionList2, PerformanceGoal -> "Quality"]
![enter image description here][4]
<br/>
##Output
If the program receives an input that is longer than 45 second, it will automatically clip the audio file into 45 second segments and return the result for each. If the last segment is less than 45 seconds, the program would still work fine on it, though with reduced accuracy. The display for each clip includes a main-category and a sub-category descriptor, with each of their associated probability also printed.
###Sample testing: Debussy's Clair de Lune
![enter image description here][5]
<br/>
----------
#Conclusion
The program gives very reasonable result for most music in the classical style. However, the program have three shortcomings that I plan to fix in later versions of the this program. Firstly, the program may give contradictory result (ex. happy and depressed) if the sentiment dramatically changes in the middle of a 45 second segment, perhaps reflecting the music's changing emotional composition. The current 45 second clipping window is rather long and thus prone to capture contradicting emotions. In the next version of this program, the window will probably be shortened to 30 or 20 seconds to reduce prediction uncertainty. Secondly, the program's processing speed has a lot of room of improvement. It currently takes about one and half minutes to compute an one minute audio file. In future versions I will remove relative ineffective feature extractors to speed things up. Lastly, the data used in creating this application is solely from myself, and therefore it is prone to my human biases. I plan to expand the data set with more people's input and more genres of music.
I have attached the application to this post so that everyone can try out the program.
#Acknowledgement
I sincerely thank my mentor, Professor Rob Morris, for providing invaluable guidance to help me carry out the project. I also want to thank Rick Hennigan for giving me crucial support with my code.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=8714Emotion2DSpace.PNG&userId=1371765
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=2406DataStructure.PNG&userId=1371765
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=8990capture1.PNG&userId=1371765
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=5765capture2.PNG&userId=1371765
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=9894capture3.PNG&userId=1371765William Yicheng Zhu2018-07-14T02:40:20Z[WSC18] Predicting the Halting Problem with Machine Learning
http://community.wolfram.com/groups/-/m/t/1384403
# A Machine Learning Analysis of the Halting Problem over the SKI Combinator Calculus
![A rasterised SK combinator with length 50, evaluated to 5 steps][16]
## Abstract
Much of machine learning is driven by the question: can we learn what we cannot compute? The learnability of the halting problem, the canonical undecidable problem, to an arbitrarily high accuracy for Turing machines was proven by Lathrop. The SKI combinator calculus can be seen as a reduced form of the untyped lambda calculus, which is Turing-complete; hence, the SKI combinator calculus forms a universal model of computation. In this vein, the growth and halting times of SKI combinator expressions is analysed and the feasibility of a machine learning approach to predicting whether a given SKI combinator expression is likely to halt is investigated.
## 1. SK Combinators
What we will refer to as 'SK Combinators' are expressions in the SKI combinator calculus, a simple Turing-complete language introduced by Schönfinkel (1924) and Curry (1930). In the same way that NAND gates can be used to construct any expression in Boolean logic, SK combinators were posed as a way to construct any expression in predicate logic, and being a reduced form of the untyped lambda calculus, any functional programming language can be implemented by a machine that implements SK combinators. While implementations of this language exist, these serve little functional purpose - instead, this language, a simple idealisation of transformations on symbolic expressions, provides a useful tool for studying complex computational systems.
### 1.1 Rules and Expressions
Formally, SK combinator expressions are binary trees whose leaves are labelled either '*S*', '*K*' or '*I*': each tree *(x y)* represents a function *x* applied to an argument *y*. When the expression is evaluated (i.e. when the function is applied to the argument), the tree is transformed into another tree, the 'value'. The basic 'rules' for evaluating combinator expressions are given below:
*k[x][y] := x*
The K combinator or 'constant function': when applied to *x*, returns the function *k[x]*, which when applied to some *y* will return *x*.
*s[x][y][z] := x[z][y[z]]*
The S combinator or 'fusion function': when applied to *x, y, z*, returns *x* applied to *z*, which is in turn applied to the result of *y* applied to *z*.
*i[x] := x*
The I combinator or 'identity function': when applied to *x*, returns *x*.
Note that the I combinator *I[x]* is equivalent to the function *S[K][a][x]*, as the latter will evaluate to the former in two steps:
*S[K][a][x]*
*= K[x][a[x]]*
*= x*
Thus the I combinator is redundant as it is simply 'syntactic sugar' - for the purposes of this exploration it will be ignored.
These rules can be expressed in the Wolfram Language as follows:
SKRules={k[x_][y_]:> x,s[x_][y_][z_]:> x[z][y[z]]}
### 1.2 Evaluation
The result of applying these rules to a given expression is given by the following functions:
SKNext[expr_]:=expr/.SKRules;
Returns the next 'step' of evaluation of the expression *expr* - evaluating all functions in *expr* according to the rules above without evaluating any 'new'/transformed functions.
SKEvaluate[expr_,n_]:=NestList[#1/.SKRules&,expr,n];
Returns the next *n* steps of evaluation of the expression *expr*
SKEvaluateUntilHalt[expr_,n_] := FixedPointList[SKNext,expr,n+1];
Returns the steps of evaluation of *expr* until either it reaches a fixed point or it has been evaluated for n steps, whichever comes first.
Note that, due to the Church-Rosser theorem (Church and Rosser, 2018), the order in which the rules are applied does not affect the final result, as long as the combinator evaluates to a fixed point / 'halts'. For combinators with no fixed point, which do not halt, the behaviour demonstrated as they evaluate could change based on the order of application of the rules - this is not explored here and is a topic for potential future investigation.
### 1.3 Examples
The functions above can be used to evaluate a number of interesting SK combinator expressions:
Column[SKEvaluateUntilHalt[s[k][a][x],10][[1;;-2]]]
[//]: # (No rules defined for Output)
The *I* combinator
Column[SKEvaluateUntilHalt[s[k[s[i]]][k][a][b],10][[1;;-2]]]
[//]: # (No rules defined for Output)
The reversal expression - *s[k][s[i]][k][a][b]* takes two terms, *a* and *b*, and returns *b[a]*.
## 2. Growth and Halting
### 2.1 Halting and Related Works
We will define a combinator expression to have halted if it has reached a fixed point - i.e. if no combinators in the expression can be evaluated, or if evaluating any of the combinators in the expression returns the original expression. As SK combinators are Turing-complete and so computationally universal, it is evident that the halting problem - determining whether or not a given SK combinator expression will halt - is undecidable for SK combinators. There are, however, patterns and trends in the growth of SK combinators, and it is arguably possible to speak of the probability of a given SK combinator expression halting.
Some investigations (Lathrop 1996) and (Calude and M. Dumitrescu 2018) have been made into probabilistically determining the halting time of Turing machines, with [2] proving that it is possible to compute some value K where for some arbitrary predetermined confidence *(1-\[Delta])* and accuracy *(1-\[Epsilon]),* a program that does
A. Input a Turing machine M and program I.
B. Simulate M on I for K steps.
C. If M has halted then print 1, else print 0.
D. Halt.
has a probability greater than *(1-δ)* of having an accuracy (when predicting whether or not a program will halt) greater than *(1-ε).* The key result of this is that, in some cases 'we can learn what we cannot compute' - 'learning' referring to Valiant's formal analysis as 'the phenomenon of knowledge acquisition in the absence of specific programming' (Valiant 1984).
### 2.2 Definitions and Functions
The size of a combinator expression can either be measured by its length (total number of characters including brackets) or by its leaf size (number of 's' and 'k' characters). We use the former in most cases, and the latter when randomly generating combinator expressions.
The number of possible combinator expressions with leaf size *n* is given by
SKPossibleExpressions[n_]:=(2^n)*Binomial[2*(n-2),n-1]/n
(Wolfram, 2002), which grows exponentially.
#### 2.2.1 Visualisation
We define a function to visualise the growth of a combinator, *SKRasterize*:
SKArray[expr_,n_]:=Characters/@ToString/@SKEvaluate[expr,n];
SKArray[expr_]:=SKArray[expr,10];
Generates a list of the steps in the growth of a combinator, where each expression is itself a list of characters ('s', 'k', '[', ']')
SKGrid[exp_,n_]:=ArrayPlot[SKArray[exp,n],{ColorRules->{"s"->RGBColor[1,0,0],"k"->RGBColor[0,1,0],"["->RGBColor[0,0,1],"]"->RGBColor[0,0,0]},PixelConstrained->True,Frame->False,ImageSize->1000}];
SKGrid[exp_]:=SKGrid[exp,10];
Generates an ArrayPlot of a list given by SKArray, representing the growth of a combinator in a similar manner to that of cellular automata up to step n. The y axis represents time - each row is the next expression in the evaluation of an SK combinator. Red squares indicate 'S', green squares indicate 'K', blue squares indicate '[' and black squares indicate ']'.
SKRasterize[func_,n_]:=Image[SKGrid[func,n][[1]]];
SKRasterize[func_]:=SKRasterize[func,10];
Generates a rasterized version of the ArrayPlot.
A visualisation of a given combinator can easily be produced, as follows:
SKRasterize[s[s[s]][s][s][s][k],15]
[//]: # (No rules defined for Output)
![The longest running halting expression with leaf size 7, halting in 12 steps (Wolfram, 2002)][1]
The longest running halting expression with leaf size 7, halting in 12 steps (Wolfram, 2002)
#### 2.2.2 Halting graphs
We can create a table of the length (string length) of successive combinator expressions as they evaluate as follows:
SKLengths[exp_,n_]:=StringLength/@ToString/@SKEvaluate[exp,n];
Returns a list of the lengths of successive expressions until step *n*
These can be plotted as a graph (x axis number of steps, y axis length of expression):
SKPlot[expr_,limit_]:=ListLinePlot[SKLengths[expr,limit],AxesOrigin->{1,0},AxesLabel->{"Number of steps","Length of expression"}];
Thus, a graph of the above combinator can be produced:
SKPlot[s[s[s]][s][s][s][k],15]
[//]: # (No rules defined for Output)
![A graph of the above combinator][2]
It is evident from the graph that this combinator halts at 12 steps.
#### 2.2.3 Random SK combinators
To empirically study SK combinators, we need a function to randomly generate them. Two methods to do this were found:
RecursiveRandomSKExpr[0,current_]:=current;
RecursiveRandomSKExpr[depth_,current_]:=
RecursiveRandomSKExpr[depth-1,
RandomChoice[{
RandomChoice[{s,k}][current],
current[RecursiveRandomSKExpr[depth-1,RandomChoice[{s,k}]]]
}]
];
RecursiveRandomSKExpr[depth_Integer]:=RecursiveRandomSKExpr[depth,RandomChoice[{s,k}]];
A recursive method, repeatedly appending either a combinator to the 'head' of the expression or a randomly generated combinator expression to the 'tail' of the expression. (Hennigan)
replaceWithList[expr_,pattern_,replaceWith_]:=ReplacePart[expr,Thread[Position[expr,pattern]->replaceWith]];
treeToFunctions[tree_]:=ReplaceRepeated[tree,{x_,y_}:>x[y]];
randomTree[leafCount_]:=Nest[ReplacePart[#,RandomChoice[Position[#,x]]->{x,x}]&,{x,x},leafCount-2];
RandomSKExpr[leafCount_]:=treeToFunctions[replaceWithList[randomTree[leafCount],x,RandomChoice[{s,k},leafCount]]];
Random combinator generation based on generation of binary trees - each combinator can be expressed as a binary tree with leaves 'S' or 'K'. (Parfitt, 2017)
While the first method gives a large spread of combinators with a variety of lengths, and is potentially more efficient, for the purposes of this exploration the second is more useful, as it limits the combinators generated to a smaller, more controllable sample space (for a given leaf size).
### 2.3 Halting Graphs
All combinators of leaf sizes up to size 6 evolve to fixed points (NKS):
exprs = Table[RandomSKExpr[6],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]
[//]: # (No rules defined for Output)
![10 randomly generated combinators of size 6, with their lengths plotted until n=40.][3]
10 randomly generated combinators of size 6, with their lengths plotted until n=40.
As the leaf size increases, combinators take longer to halt, and some show exponential growth:
exprs = Table[RandomSKExpr[10],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]
[//]: # (No rules defined for Output)
![10 randomly generated combinators of size 10, with their lengths plotted until n=20.][4]
10 randomly generated combinators of size 10, with their lengths plotted until n=20.
exprs = Table[RandomSKExpr[30],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]
[//]: # (No rules defined for Output)
![10 randomly generated combinators of size 30, with their lengths plotted until n=40.][5]
10 randomly generated combinators of size 30, with their lengths plotted until n=40.
CloudEvaluate[exprs = Table[RandomSKExpr[50],10];
ImageCollage[Table[ListLinePlot[SKLengths[exprs[[n]],40]],{n,10}],Background->White]]
[//]: # (No rules defined for Output)
![10 randomly generated combinators of size 50, with their lengths plotted until n=40.][6]
10 randomly generated combinators of size 50, with their lengths plotted until n=40.
After evaluating a number of these combinators, it appears that they tend to either halt or grow exponentially - some sources (Parfitt, 2017) reference linear growth combinators, however none of these have been encountered as yet.
### 2.4 Halting Times
With a random sample of combinators, we can plot a cumulative frequency graph of the number of combinators that have halted at a given number of steps:
SKHaltLength[expr_,n_]:=Module[{x},
x=Length[SKEvaluateUntilHalt[expr,n+1]];
If[x>n,False,x]
]
Returns the number of steps it takes the combinator *expr* to halt; if *expr* does not halt within n steps, returns *False*.
GenerateHaltByTable[depth_,iterations_,number_]:=Module[{exprs,lengths},
exprs = Monitor[Table[RandomSKExpr[depth],{n,number}],n];
lengths = Monitor[Table[SKHaltLength[exprs[[n]],iterations],{n,number}],n];
Return[lengths]
]
Generates a table of the halt lengths of *number* random combinator expressions (*False* if they do not halt within *iterations* steps) with leaf size *depth*.
GenerateHaltData[depth_,iterations_,number_]:=Module[{haltbytable,vals},
haltbytable = GenerateHaltByTable[depth,iterations,number];
vals = BinCounts[Sort[haltbytable],{1,iterations+1,1}];
Table[Total[vals[[1;;n]]],{n,1,Length[vals]}]
]
Generates a table of the number of *number* random combinator expressions (*False* if they do not halt within *iterations* steps) with leaf size *depth* that have halted after a given number of steps
GenerateHaltGraph[depth_,iterations_,number_]:=Module[{cumulative,f},
cumulative=GenerateHaltData[depth,iterations,number];
f=Interpolation[cumulative];
{ListLinePlot[cumulative,PlotRange->{Automatic,{0,number}},GridLines->{{},{number}},Epilog-> {Red,Dashed,Line[{{0,cumulative[[-1]]},{number,cumulative[[-1]]}}]},AxesOrigin->{1,0},AxesLabel->{"Number of steps","Number of combinators halted"}],cumulative[[-1]]}
]
Plots a graph of the above data.
#### 2.4.1 Halting Graphs
We analyse halt graphs of random samples of 1000 combinators (to leaf size 30):
CloudEvaluate[GenerateHaltGraph[10,30,1000]]
[//]: # (No rules defined for Output)
![Leaf size 10: almost all combinators in the sample (997) have halted (99.7%).][7]
Leaf size 10: almost all combinators in the sample (997) have halted (99.7%).
CloudEvaluate[GenerateHaltGraph[20,30,1000]]
[//]: # (No rules defined for Output)
![Leaf size 20: 979 combinators in the sample have halted (97.9%).][8]
Leaf size 20: 979 combinators in the sample have halted (97.9%).
CloudEvaluate[GenerateHaltGraph[30,30,1000]]
[//]: # (No rules defined for Output)
![Leaf size 30: 962 combinators in the sample have halted (96.2%).][9]
Leaf size 30: 962 combinators in the sample have halted (96.2%).
CloudEvaluate[GenerateHaltGraph[40,30,1000]]
[//]: # (No rules defined for Output)
![Leaf size 40: 944 combinators in the sample have halted (94.4%).][10]
Leaf size 40: 944 combinators in the sample have halted (94.4%).
CloudEvaluate[GenerateHaltGraph[50,30,1000]]
[//]: # (No rules defined for Output)
![Leaf size 50: 889 combinators in the sample have halted (88.9%).][11]
Leaf size 50: 889 combinators in the sample have halted (88.9%).
Evidently, the rate of halting of combinators in the sample decreases as number of steps increases - the gradient of the graph is decreasing. As the graph levels out at around 30 steps, we will assume that the number of halting combinators will not increase significantly beyond this point.
As the leaf size increases, fewer combinators in the sample have halted by 30 steps - however, the graph still levels out, suggesting most of the combinators which have not halted by this point will never halt.
#### 2.4.2 Halting Times and Leaf Size
We can plot a graph of the number of halted combinators against leaf size:
CloudEvaluate[ListLinePlot[Table[{n,GenerateHaltGraph[n,30,1000][[2]]},{n,5,50,1}]]]
![A graph to show the number of combinators which halt within 30 steps in each of 45 random samples of 1000 combinators, with leaf size varying from 5 to 50.][12]
A graph to show the number of combinators which halt within 30 steps in each of 45 random samples of 1000 combinators, with leaf size varying from 5 to 50.
This graph shows that, despite random variation, the number of halted combinators decreases as the leaf size increases: curve fitting suggests that this follows a negative quadratic function.
FitData[data_,func_]:=Module[{fitd},fitd={Fit[data[[1,2,3,4,1]],func,x]};{fitd,Show[ListPlot[data[[1,2,3,4,1]],PlotStyle->Red],Plot[fitd,{x,5,50}]]}]
A curve-fitting function: plots the curve of best fit for *data* with some combination of functions *func*.
FitData[%,{1,x,x^2}]
[//]: # (No rules defined for Output)
![Curve-fitting on the data with a quadratic function][13]
{1012.07 - 1.18915 x - 0.0209805 x^2}
Curve-fitting on the data with a quadratic function yields a reasonably accurate curve of best fit.
## 3. Machine Learning Analysis of SK Combinators
The graphs above suggest that the majority of halting SK combinators with leaf size <=50 will halt before ~30 steps. Thus we can state that, for a randomly chosen combinator, it is likely that if it does not halt before 40 steps, it will never halt. Unfortunately a lack of time prohibited a formal analysis of this, in the vein of Lathrop's work - this is an area for future research.
We attempt to use modern machine learning methods to predict the likelihood of a given SK combinator expression halting before 40 steps:
### 3.1 Dataset Generation
We implement a function *GenerateTable* to produce tables of random SK expressions:
SKHaltLength[expr_,n_]:=Module[{x},
x=Length[SKEvaluateUntilHalt[expr,n+1]];
If[x>n,False,x]
]
Returns the number of steps *expr* takes to halt if the given expression *expr* halts within the limit given (*limit*), otherwise returns *False*
GenerateTable[depth_,iterations_,number_]:=Module[{exprs,lengths},
exprs = Monitor[Table[RandomSKExpr[depth],{n,number}],n];
lengths = Monitor[Table[exprs[[n]]-> SKHaltLength[exprs[[n]],iterations],{n,number}],n];
lengths = DeleteDuplicates[lengths];
Return[lengths]
]
Returns a list of *number* expressions with leaf size *depth* whose elements are associations with key *expression* and value *number of steps taken to halt* if the expression halts within *iterations* steps, otherwise *False*.
*GenerateTable* simply returns tables random SK expressions - as seen earlier, these tend to be heavily skewed datasets as around 90% of random expressions generated will halt. Thus we must process this dataset to create a balanced training dataset: this is done with the function *CreateTrainingData*:
CreateTrainingData[var_]:=Module[{NoHalt,Halt,HaltTrain,Train},
NoHalt = Select[var,#[[2]]==False&];
Halt = Select[var,#[[2]]==True&];
HaltTrain = RandomSample[Halt,Length[NoHalt]];
Train = Join[HaltTrain,NoHalt];
Return[Train]
];
Counts the number of non-halting combinators in *var* (assumption is this is less than number of halting combinators), selects a random sample of halting combinators of this length and concatenates the lists.
ConvertSKTableToString[sktable_]:=Table[ToString[sktable[[n,1]]]-> sktable[[n,2]],{n,1,Length[sktable]}];
Converts SK expressions in a table generated with *GenerateTable* to strings
We also implement a function to create rasterised training data (where instead of an individual SK combinator associated with either True or False, an image of the first 5 steps of evaluation of the combinator is associated with either True or False):
CreateRasterizedTrainingData[var_]:=Module[{NoHalt,Halt,HaltTrain,HaltTrainRaster,NoHaltTrainRaster,RasterTrain},
NoHalt = Select[var,#[[2]]==False&];
Halt = Select[var,#[[2]]==True&];
HaltTrain = RandomSample[Halt,Length[NoHalt]];
HaltTrainRaster=Monitor[Table[SKRasterize[HaltTrain[[x,1]],5]-> HaltTrain[[x,2]],{x,1,Length[HaltTrain]}],x];
NoHaltTrainRaster=Monitor[Table[SKRasterize[NoHalt[[x,1]],5]-> NoHalt[[x,2]],{x,1,Length[NoHalt]}],x];
RasterTrain = Join[HaltTrainRaster,NoHaltTrainRaster];
Return[RasterTrain]
];
Counts the number of non-halting combinators in *var* (assumption is this is less than number of halting combinators), selects a random sample of halting combinators of this length, evaluates and generates images of both halting and non-halting combinators and processes them into training data (image->True/False).
### 3.2 Markov Classification
#### 3.2.1 Training
As a first attempt, we generate 2000 random SK expressions with leaf size 5, 2000 expressions with leaf size 10 ... 2000 expressions with leaf size 50, evaluated up to 40 steps:
lengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]]
We convert all non-False halt lengths to 'True':
lengths = lengths/.(a_->b_)/;!(b===False):> (a->True);
We process the data and train a classifier using the Markov method:
TrainingData = CreateTrainingData[lengths];
TrainingData2 = ConvertSKTableToString[TrainingData];
HaltClassifier1 = Classify[TrainingData2,Method->"Markov"]
[//]: # (No rules defined for Output)
#### 3.2.2 Testing
We must now generate test data, using the same parameters for generating random combinators:
testlengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]]
testlengths = testlengths/.(a_->b_)/;!(b===False):> (a->True);
TestData = CreateTrainingData[testlengths];
TestData2 = ConvertSKTableToString[TestData];
The classifier can now be assessed for accuracy using this data:
TestClassifier1 = ClassifierMeasurements[HaltClassifier1,TestData2]
[//]: # (No rules defined for Output)
#### 3.2.3 Evaluation
A machine learning solution to this problem is only useful if the accuracy is greater than 0.5 (i.e. more accurate than a random coin flip). We test the accuracy of the classifier:
TestClassifier1["Accuracy"]
0.755158
This, while not outstanding, is passable for a first attempt. We find the training accuracy:
ClassifierInformation[HaltClassifier1]
![Classifier Information][14]
The training accuracy (71.3%) is slightly lower than the testing accuracy (75.5%) - this is surprising, and is probably due to a 'lucky' testing dataset chosen.
We calculate some statistics from a confusion matrix plot:
TestClassifier1["ConfusionMatrixPlot"]
![Confusion Matrix Plot][15]
Accuracy: 0.76
Misclassification rate: 0.24
Precision (halt): 0.722 (when 'halt' is predicted, how often is it correct?)
True Positive Rate: 0.83 (when the combinator halts, how often is it classified as halting?)
False Positive Rate: 0.32 (when the combinator doesn't halt, how often is it classified as halting?)
Precision (non-halt): 0.799 (when 'non halt' is predicted, how often is it correct?)
True Negative Rate: 0.68 (when the combinator doesn't halt, how often is it classified as not halting?)
False Negative Rate: 0.17 (when the combinator halts, how often is it classified as not halting?)
A confusion matrix plot shows that the true positive rate is larger than the true negative rate - this would suggest that it is easier for the model to tell when an expression halts than when an expression does not halt. This could be due to the model detecting features suggesting very short run time in the initial string - for instance, a combinator k[k][<expression>] would evaluate immediately to k and halt - however, these 'obvious' features are very rare.
### 3.3 Random Forest Classification on Rasterised Expression Images
Analysing strings alone, without any information about how they are actually structured or how they might evaluate, could well be a flawed method - one might argue that, in order to predict halting, one would need more information about how the program runs. Hence, another possible method is to generate a dataset of visualisations of the first 5 steps of a combinator's evaluation as follows:
SKRasterize[RandomSKExpr[50],5]
![A rasterised SK combinator with length 50, evaluated to 5 steps][16]
and feed these into a machine learning model. Although it might seem that this method is pointless - we are already evaluating the combinators to 5 steps, and we are training a model on a database of combinators evaluated to 40 steps to predict if a combinator will halt in <=40 steps, the point of the exercise is less to create a useful resource than to investigate the feasibility of applying machine learning to this type of problem. If more computational power was available, a dataset of combinators evaluated to 100 steps (when even more combinators will have halted) could be created: in such a case a machine learning model to predict whether or not a combinator will halt in <=100 steps would be a practical approach as the time taken to evaluate a combinator to 100 steps is exponentially longer than that taken to evaluate a combinator to 5 steps.
3.3.1 Training
We generate a dataset of 2000 random SK expressions with leaf size 5, 2000 expressions with leaf size 10 ... 2000 expressions with leaf size 50, evaluated up to 40 steps:
rasterizedlengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]];
In order to train a model on rasterised images, we must evaluate all SK expressions in the dataset to 5 steps and generate rasterised images of these:
RasterizedTrainingData = CreateRasterizedTrainingData[rasterizedlengths];
We then train a classifier on this data:
RasterizeClassifier=Classify[RasterizedTrainingData,Method->"RandomForest"]
#### 3.3.2 Testing
We must now generate test data, using the same parameters for generating random training data:
testrasterizedlengths = Flatten[Table[GenerateTable[n,40,2000],{n,5,50,5}]];
testrasterizedlengths = testrasterizedlengths/.(a_->b_)/;!(b===False):> (a->True);
TestRasterizedData = CreateRasterizedTrainingData[testrasterizedlengths];
The classifier can now be assessed for accuracy using this data:
TestRasterizeClassifier=ClassifierMeasurements[RasterizeClassifier,TestRasterizedData]
[//]: # (No rules defined for Output)
#### 3.3.3 Evaluation
A machine learning solution to this problem is only useful if the accuracy is greater than 0.5 (i.e. more accurate than a random coin flip). We test the accuracy of the classifier:
TestRasterizeClassifier["Accuracy"]
0.876891
This is significantly better than the Markov approach (75.5%). We find the training accuracy:
ClassifierInformation[RasterizeClassifier]
![enter image description here][17]
Again, the training accuracy (85.5%) is slightly lower than the testing accuracy (87.7%).
We calculate some statistics from a confusion matrix plot:
TestRasterizeClassifier["ConfusionMatrixPlot"]
![Confusion Matrix Plot][18]
Accuracy: 0.88
Misclassification rate: 0.12
Precision (halt): 0.911 (when 'halt' is predicted, how often is it correct?)
True Positive Rate: 0.83 (when the combinator halts, how often is it classified as halting?)
False Positive Rate: 0.08 (when the combinator doesn't halt, how often is it classified as halting?)
Precision (non-halt): 0.848 (when 'non halt' is predicted, how often is it correct?)
True Negative Rate: 0.92 (when the combinator doesn't halt, how often is it classified as not halting?)
False Negative Rate: 0.17 (when the combinator halts, how often is it classified as not halting?)
A confusion matrix plot shows that the false negative rate is larger than the false positive rate - this would suggest that it is easier for the model to tell when an expression halts than when an expression does not halt. The precision for halting is much higher than the precision for non-halting, indicating that if the model suggests a program will halt, this is much more likely to be correct than if it suggested that the program would not halt. An (oversimplified) way to look at this intuitively is to examine some graphs of lengths of random combinators:
![Random combinator length graphs][19]
Looking at combinators that halt (combinators for which the graph flattens out), some combinators 'definitely halt' - their length decreases until the graph flattens out:
!['definitely halts'][20]
'definitely halts' (1)
Some combinators have length that increases exponentially :
![exponentially increasing combinator length graph][21]
'possibly non-halting' (2)
And some combinators appear to have increasing length but suddenly decrease:
![increasing then decreasing combinator length graph][22]
'possibly non-halting' (3)
We do not know which features of the rasterised graphic the machine learning model extracts to make its prediction, but if, say, it was classifying based purely on length of the graphic, it would identify combinators like (1) as ' definitely halting', but would not necessarily be able to distinguish between combinators like (2) and combinators like (3), which both appear to be non - halting initially.
On a similar note, some functional programming languages (e.g. Agda - [7]) have the ability to classify a function as 'definitely halting' or 'possibly non-halting', just like our classifier, whose dataset is trained on functions that either 'definitely halt' (halt in <= 40 steps) or are 'possibly non-halting' (do not halt in <= 40 steps - might halt later).
### 3.4 Table of Comparison
![A table comparing statistics for Markov and Random Forest models][23]
### 4. Conclusions and Further Work
#### 4.1 Conclusions
The results of this exploration were somewhat surprising, in that a machine learning approach to determining whether or not a program will terminate appears to some extent viable - out of all the methods attempted, the random forest classifier applied to a rasterised image of the first five steps of the evaluation of a combinator achieved the highest accuracy of 0.88 on a test dataset of 1454 random SK combinator expressions. Note, though, that what is actually being determined here is whether or not a combinator will halt before some n steps (here, n=40) - we are classifying between combinators that 'definitely halt' and combinators which are 'possibly non-halting'.
### 4.2 Microsite
As an extension to this project, a Wolfram microsite was created and is accessible at [https://www.wolframcloud.com/objects/euan.l.y.ong/SKCombinators](https://www.wolframcloud.com/objects/euan.l.y.ong/SKCombinators) - within this microsite, a user can view a rasterised image of a combinator, a graph of the length of the combinator as it is evaluated, a statistical analysis of halting time relative to other combinators with the same leaf size and a machine learning prediction of whether or not the combinator will halt within 40 steps.
![Microsite Screenshot][24]
A screenshot of the microsite evaluating a random SK combinator expression
### 4.3 Implications, Limitations and Further Work
Although the halting problem is undecidable, the field of termination analysis - attempting to determine whether or not a given program will eventually terminate - has a variety of applications, for instance in program verification. Machine learning approaches to this problem would not only help explore this field in new ways but could also be implemented in, for instance, software debuggers.
The principal limitations of this method are that we are only predicting whether or not a combinator will halt in a finite number *k* of steps - while this could be a sensible idea if k is large, at present this system is very impractical due to small datasets and a small value of *k* used to train the classifier (*k *= 40). Another issue with the machine learning technique used is that the visualisations have different dimensions (longer combinators will generate longer images), and when the images are preprocessed and resized before being fed into the random forest model, downsampling/upsampling can lead to loss of data decreasing the accuracy of the model.
From a machine learning perspective, attempts at analysis of the rasterised images with a neural network could well prove fruitful, as would an implementation of a vector representation of syntax trees to allow the structure of SK combinators (nesting combinators) to be accurately extracted by a machine learning model.
Future theoretical research could include a deeper exploration of Lathrop's probabilistic method of determining *k*, an investigation of the 'halting' features the machine learning model is looking for within the rasterised images, a more general analysis of SK combinators (proofs of halting / non-halting for certain expressions, for instance) to uncover deeper patterns, or even an extension of the analysis carried out in the microsite to lambda calculus expressions (which can be transformed to an 'equivalent' SK combinator expression).
## Acknowledgements
We thank the mentors at the 2018 Wolfram High School Summer Camp - Andrea Griffin, Chip Hurst, Rick Hennigan, Michael Kaminsky, Robert Morris, Katie Orenstein, Christian Pasquel, Dariia Porechna and Douglas Smith - for their help and support writing this paper.
## References
A. Church and J. B. Rosser: Some properties of conversion. Transactions of the American Mathematical Society, 39 (3): 472\[Dash]482, 2018
R. H. Lathrop: On the learnability of the uncomputable. ICML 1996: 302-309.
C. S. Calude and M. Dumitrescu: A probabilistic anytime algorithm for the halting problem. Computability 7(2-3): 259-271, 2018.
L. G. Valiant: A theory of the learnable. Communications of the Association for Computing Machinery 27, (11): 1134-1142, 1984.
S. Wolfram: A New Kind of Science. 1121-1122, 2002.
E. Parfitt: Ways that combinators evaluate from the Wolfram Community\[LongDash]A Wolfram Web Resource. (2017)
http://community.wolfram.com/groups/-/m/t/965400
S-C Mu: Agda.Termination.Termination http://www.iis.sinica.edu.tw/~scm/Agda/Agda-Termination-Termination.html
Attached is a Wolfram Notebook (.nb) version of this computational essay.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=1.png&userId=1371970
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=2.png&userId=1371970
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=3.png&userId=1371970
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=4.png&userId=1371970
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=5.png&userId=1371970
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=6.png&userId=1371970
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=7.png&userId=1371970
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=8.png&userId=1371970
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=9.png&userId=1371970
[10]: http://community.wolfram.com//c/portal/getImageAttachment?filename=10.png&userId=1371970
[11]: http://community.wolfram.com//c/portal/getImageAttachment?filename=11.png&userId=1371970
[12]: http://community.wolfram.com//c/portal/getImageAttachment?filename=12.png&userId=1371970
[13]: http://community.wolfram.com//c/portal/getImageAttachment?filename=13.png&userId=1371970
[14]: http://community.wolfram.com//c/portal/getImageAttachment?filename=Classify1.png&userId=1371970
[15]: http://community.wolfram.com//c/portal/getImageAttachment?filename=15.png&userId=1371970
[16]: http://community.wolfram.com//c/portal/getImageAttachment?filename=16.png&userId=1371970
[17]: http://community.wolfram.com//c/portal/getImageAttachment?filename=17.png&userId=1371970
[18]: http://community.wolfram.com//c/portal/getImageAttachment?filename=18.png&userId=1371970
[19]: http://community.wolfram.com//c/portal/getImageAttachment?filename=19.png&userId=1371970
[20]: http://community.wolfram.com//c/portal/getImageAttachment?filename=20.png&userId=1371970
[21]: http://community.wolfram.com//c/portal/getImageAttachment?filename=21.png&userId=1371970
[22]: http://community.wolfram.com//c/portal/getImageAttachment?filename=22.png&userId=1371970
[23]: http://community.wolfram.com//c/portal/getImageAttachment?filename=23.png&userId=1371970
[24]: http://community.wolfram.com//c/portal/getImageAttachment?filename=24.png&userId=1371970Euan Ong2018-07-14T23:56:25Z[WSC18] Streaming Live Phone Sensor Data to the Wolfram Language
http://community.wolfram.com/groups/-/m/t/1386358
# Streaming Live Phone Sensor Data to the Wolfram Language
(This forms Part 1 of a 2-part community post: "Using Machine Learning Models for Accelerometer-based Gesture Recognition" - part 2 is available at http://community.wolfram.com/groups/-/m/t/1386392)
![Live demo of sensor streaming][1]
Not only are smartphones wonderful ways to stay connected to the digital world, they also contain an astonishing array of sensors making them ideal for scientific and computational experimentation.
The Wolfram Language (WL) has extraordinary data processing and scientific computing abilities - the only sensors, however, from which they can read data are specialised and either somewhat expensive or require a significant amount of setup. On a high level, the WL has baked-in support for a variety of devices - specifically the Raspberry Pi, Vernier Go!Link compatible sensors, Arduino microcontrollers, webcams and devices using the RS-232 or RS-422 serial protocol (http://reference.wolfram.com/language/guide/UsingConnectedDevices.html); unfortunately, there is no easy way to access sensor data from Android or iOS mobile devices.
In this post, I will attempt to combine the two, demonstrating
1. A UDP socket-based method for transmission of (general) sensor data from an Android phone to the Wolfram Language (based on this excellent community post: http://community.wolfram.com/groups/-/m/t/344278 which does the same for iPhones)
2. A web-based, platform-agnostic method for transmission of IMU / inertial motion unit data (i.e. accelerometer and gyroscope data) from a phone to the Wolfram Language.
# Socket Transmission
On the Google Play Store, there exist a number of Android apps which can transmit live sensor data to a computer over UDP sockets - for instance, "Sensorstream IMU+GPS" ([https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps][1]). Unfortunately, the WL does not support receipt and transmission of data over UDP sockets - while there exists a *Socket* library, as of 2018 this is only capable of dealing with TCP. Thus, to use UDP sockets in the WL, we must implement our own library using JLink to access Java socket packages from the WL. (Credit is due to http://community.wolfram.com/groups/-/m/t/344278 - the code here was slightly outdated so had to be modified.)
## Instructions
To send accelerometer (or other sensor) data from your phone to Wolfram over UDP sockets:
1. Install the "Sensorstream IMU+GPS" app
2. Ensure the sensors you want to stream to Wolfram are ticked on the 'Toggle Sensors' page. (If you want to stream other sensors besides 'Accelerometer', 'Gyroscope' and 'Magnetic Field', ensure the 'Include User-Checked Sensor Data in Stream' box is ticked. Beware, though - the more sensors are ticked, the more latency the sensor stream will have.)
3. On the "Preferences" tab:
a. Change the target IP address in the app to the IP address of your computer (ensure your computer and phone are connected to the same local network)
b. Set the target port to 5555
c. Set the sensor update frequency to 'Fastest'
d. Select the 'UDP stream' radio box
e. Tick 'Run in background'
4. Switch stream ON **before** executing code. (nb. ensure your phone does not fall asleep during streaming - perhaps use the 'Caffeinate' app (https://play.google.com/store/apps/details?id=xyz.omnicron.caffeinate&hl=en_US) to ensure this.)
5. Execute the following WL code (in part from http://community.wolfram.com/groups/-/m/t/344278):
Initialise JLink
QuitJava[];
Needs["JLink`"];
InstallJava[];
Initialise a socket connection - ensure *5555* is the target port set
udpSocket=JavaNew["java.net.DatagramSocket",5555];
Function that reads *size* bytes of a function.
readSocket[sock_,size_]:=JavaBlock@Block[{datagramPacket=JavaNew["java.net.DatagramPacket",Table[0,size],size]},sock@receive[datagramPacket];
datagramPacket@getData[]]
Function that reads from the socket, processes data and 'sows' it to be collected later
listen[]:=record=DeleteCases[readSocket[udpSocket,1200],0]//FromCharacterCode//Sow;
Initialises the results list and repeatedly appends accelerometer data to it every 0.01 seconds - if the list is over 700 elements long, the 150 oldest elements (at start of list) are removed.
results={};RunScheduledTask[AppendTo[results,Quiet[Reap[listen[]]]];If[Length[results]>700,Drop[results,150]],0.01];
Initialises the stream list to be refreshed every 0.01 seconds with the most recent 500 elements of results. Each element of results is a string of transmitted socket data (e.g. "225585.00455, 3, -1.591, 8.624, 5.106, 4, -0.193, -0.690, -0.072") - this is split into a list of strings {"225585.00455", "3", "-1.591"...} and each string is converted to a numerical expression.
stream:=Refresh[ToExpression[StringSplit[#[[1]],","]]& /@ Select[results[[-500;;]],Head[#]==List&],UpdateInterval-> 0.01]
*Stream* now contains the 500 most recent accelerometer readings, stored in an array. The values of Stream will be updated whenever the variable is used within a *Dynamic*. (Note that, with the default sensors enabled - the first three boxes ticked on the *Toggle Sensors* tab - the x, y and z coordinates of the accelerometer can be accessed at elements 3, 4 and 5 in each list in the array. (e.g. to access the most recent accelerometer reading, run stream[[-1,3;;5]])
The accelerometer data can then be visualised using a *ListLinePlot*:
While[Length[results]<500,Pause[2]];Dynamic[Refresh[ListLinePlot[{stream[[All,3]],stream[[All,4]],stream[[All,5]]},PlotRange->All],UpdateInterval->0.1]]
![A list line plot of accelerometer data][2]
The 'pulses' (i.e. shaking the phone) were carried out every second; from this it is evident that the frequency of data transmission is 50 Hz (i.e. data is sent every 0.02 seconds).
To get the most recent accelerometer data, run
Dynamic[stream[[-1,3;;5]]]
To end socket transmission, turn off the stream on the app, run
RemoveScheduledTask[ScheduledTasks[]];
udpSocket@close[];
QuitJava[];
and ensure the process 'JLink' is quit in Task Manager / Activity Monitor etc - if it is not closed properly, you will be unable to create another socket from that port.
# Channel Transmission
An alternative way to send data from a phone to the Wolfram Cloud is by using the Channel framework. Introduced in version 11 of the Wolfram Language. the Channel framework allows asynchronous communication between Wolfram sessions as well as external systems, with communication being brokered in the Wolfram Cloud. A key point to note about the Channel framework is that it is based on a publish-subscribe model, allowing messages to be sent and received through a 'channel' rather than pairing specific senders and receivers.
##Instructions
To transmit accelerometer data, run the following code: (for other sensors, see the bottom of the page)
ChannelDeploySensorPage[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&x=\"+n.toString()+\"&y=\"+t.toString()+\"&z=\"+o.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.dm.x,n.dm.y,n.dm.z)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
Then run
c = ChannelDeploySensorPage[Func]
![Output - a QR code][3]
(where the argument *func* is some function to be called whenever the channel receives a new point of data from the phone - the argument given to *func* is an association such as the one below:)
<|x=3, y=4, z=1|>
Now, simply scan the QR code generated with your phone, and sensor data will be streamed from your phone to the computer.
The data transmitted can be viewed as a time series as follows:
c[[3]]["TimeSeries"]
[//]: # (No rules defined for Output)
The accelerometer data can also be plotted with the following Dynamic: (red --> x, green --> y, blue --> z):
Dynamic[ListLinePlot[ToExpression/@Reverse[Take[Reverse[#["Values"]],UpTo[100]]]&/@c[[3]]["TimeSeries"][[2;;4]],PlotRange->{All,{-50,50}},PlotStyle->{Red, Green, Blue}]]
When you're done, delete the channel by running
RemoveChannelListener[c[[3]]]
##Explanation
Setting up a channel is as easy as connecting to the Wolfram Cloud
CloudConnect[];
and typing
current="";
Func[x_]:=current=x;
listener=ChannelListen["NameOfChannel",Func[#Message]&, Permissions->"Public"]
![A channel listener][4]
Here, *Func* is a function that will be called each time the channel receives a message (the message is supplied as an argument to the function) - it simply sets the variable 'current' to the data last sent to the channel (in the form of key-value pairs - e.g. <|x=3, y=4, z=1|>. To make the channel accessible to other users, ensure the channel has Permissions set to Public.
To delete the channel (useful when debugging), call
RemoveChannelListener[listener];
One particularly useful feature of the Channel is that it has built-in support for receiving and parsing HTTP requests - simply send a GET request to the channel URL (given by *listener["URL"]*) and the WL will automatically parse the parameters and make the data available to the user:
For instance, if we send an HTTP GET request to *https://channelbroker.wolframcloud.com/users/<your Wolfram Cloud email address>/NameOfChannel* and append the parameters "*operation=send*" (indicates data is being sent to the channel) and "*test=5*":
BaseURL=listener["URL"]
Params = "?operation=send&test=5";
URLRead[HTTPRequest[BaseURL<>Params,<|Method->"Get"|>]]
[//]: # (No rules defined for Output)
The variable 'current' has now been updated and contains the key-value pair 'test->5' which we just sent to the channel.
current
<|"test" -> "5"|>
[//]: # (No rules defined for Output)
current[["test"]]
5
[//]: # (No rules defined for Output)
An alternative way of viewing the data from the channel is to call
listener["TimeSeries"]
[//]: # (No rules defined for Output)
This allows the data sent to the channel to be stored as a time series, which can be useful in applications such as collecting time-based sensor data.
### Transmission of Sensor Data over Channels
As demonstrated earlier, a nice feature of Channels is that data can be sent to the Wolfram Language over HTTP - instead of fiddling with JLink and sockets (which tend to be laggy and break easily), one can simply create a web page that streams sensor data to a channel.
For Android devices (running Google Chrome), there exist a range of built-in sensor APIs giving a web page access to raw accelerometer, gyroscope, light sensor and magnetometer data, and processed linear acceleration (i.e. total acceleration experienced by a device disregarding that produced by gravity), absolute orientation and relative orientation sensors. Documentation for these sensors exists online at https://developers.google.com/web/updates/2017/09/sensors-for-the-web.
Unfortunately, for iOS devices there does not exist an easy way to access sensors from the web - although one can use DeviceMotion events (https://developers.google.com/web/fundamentals/native-hardware/device-orientation/), the data these give can vary significantly from browser to browser (e.g. different browsers might use different coordinate systems), so training a machine learning model on gesture data produced by this method would require either retraining a model for each browser or significant processing of data based on browser.
However, there is another solution - namely, the gyronorm.js API (https://github.com/dorukeker/gyronorm.js), which claims to return 'consistent [gyroscope and accelerometer] values across different devices'. Using this, we construct a simple web page to transmit accelerometer data to a Wolfram Language channel called 'Sensors': (While the following code focuses on extracting accelerometer data, it is a trivial task to change the sensor being polled to read, for instance, gyroscope data in the Wolfram Language instead.)
<!DOCTYPE html>
<html lang=en>
<meta charset=UTF-8>
<title>Sensors</title>
<script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script>
<script>
function makeXHR(x,y,z){
var t=Date.now();
r=new XMLHttpRequest;
r.withCredentials=true;
var i="https://channelbroker.wolframcloud.com/users/euan.l.y.ong@gmail.com/Sensors?operation=send&time="+t.toString()+"&x="+x.toString()+"&y="+y.toString()+"&z="+z.toString();
r.open("GET",i,!0);
r.send()
}
function init(){
//Explanations are from the GyroNorm GitHub page. (https://github.com/dorukeker/gyronorm.js/)
var n={
frequency:100, //send values every 100 milliseconds
gravityNormalized:!0, // Whether or not to normalise gravity-related values
orientationBase:GyroNorm.WORLD, // ( Can be Gyronorm.GAME or GyroNorm.WORLD. gn.GAME returns orientation values with respect to the head direction of the device. gn.WORLD returns the orientation values with respect to the actual north direction of the world. )
decimalCount:2, // How many digits after the decimal point to return for each value
logger:null,
screenAdjusted:!1
};
t=new GyroNorm;
t.init(n).then(function(){
t.start(function(data){
makeXHR(data.dm.x,data.dm.y,data.dm.z)
//Other possible values to substitute for data.dm.x, data.dm.y, data.dm.z are:
// data.do.alpha ( deviceorientation event alpha value )
// data.do.beta ( deviceorientation event beta value )
// data.do.gamma ( deviceorientation event gamma value )
// data.do.absolute ( deviceorientation event absolute value )
// data.dm.x ( devicemotion event acceleration x value )
// data.dm.y ( devicemotion event acceleration y value )
// data.dm.z ( devicemotion event acceleration z value )
// data.dm.gx ( devicemotion event accelerationIncludingGravity x value )
// data.dm.gy ( devicemotion event accelerationIncludingGravity y value )
// data.dm.gz ( devicemotion event accelerationIncludingGravity z value )
// data.dm.alpha ( devicemotion event rotationRate alpha value )
// data.dm.beta ( devicemotion event rotationRate beta value )
// data.dm.gamma ( devicemotion event rotationRate gamma value )
})
})
}
window.onload=init;
</script>
</html>
This webpage, when opened on an Android or iOS phone, will stream data to the 'Sensors' channel, sending a new HTTP request every 100 milliseconds. (Decreasing the 'frequency' leads to more frequent results, but can cause atrocious levels of lag.)
### Producing the ChannelDeploySensorPage function
Although this webpage allows accelerometer data to be transmitted from a phone to a computer, for it to be used it must be deployed on a server. To change, for instance, the channel name, one would need to edit the file on the server itself, which can quickly become a tiresome process. Thus, we developed a function which autogenerates the required HTML code and stores it in the Wolfram Cloud as a CloudObject where it can easily be accessed. The function also outputs a QR code, to allow mobile users to quickly navigate to the web page. (The argument *func* is simply the function to be called whenever the channel receives a new point of data from the phone.)
## Alternative Sensors
ChannelDeploySensorPage functions for accessing sensors other than the accelerometer can be found below: (For more information about sensors and readings, check out https://developers.google.com/web/fundamentals/native-hardware/device-orientation/)
### Device Orientation (alpha, beta, gamma, absolute):
ChannelDeploySensorPageDeviceOrientation[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o,abs){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&alpha=\"+n.toString()+\"&beta=\"+t.toString()+\"&gamma=\"+o.toString()+\"&absolute=\"+abs.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.do.alpha,n.do.beta,n.do.gamma,n.do.absolute)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
### Device Motion - Acceleration Including Gravity (x, y, z):
ChannelDeploySensorPageAccelerationGravity[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&x=\"+n.toString()+\"&y=\"+t.toString()+\"&z=\"+o.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.dm.gx,n.dm.gy,n.dm.gz)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
### Device Motion - Rotation Rate (alpha, beta, gamma):
ChannelDeploySensorPageRotationRate[func_]:=Module[{listener,listenerurl,SensorHTML,c,url,u},
CloudConnect[];
listener=ChannelListen["Sensors",func[#Message]&,Permissions->"Public"];
listenerurl = listener["URL"];
SensorHTML="<!DOCTYPE html><html lang=en><meta charset=UTF-8><title>Sensors</title><script src=https://cdn.jsdelivr.net/npm/gyronorm@2.0.6/dist/gyronorm.complete.min.js></script><script>function makeXHR(n,t,o){var e=Date.now(),r=(Math.random(),new XMLHttpRequest);r.withCredentials=!0;var i=\""<>listenerurl<>"?operation=send&time=\"+e.toString()+\"&x=\"+n.toString()+\"&y=\"+t.toString()+\"&z=\"+o.toString();r.open(\"GET\",i,!0),r.send()}function init(){var n={frequency:100,gravityNormalized:!0,orientationBase:GyroNorm.WORLD,decimalCount:2,logger:null,screenAdjusted:!1},t=new GyroNorm;t.init(n).then(function(){t.start(function(n){makeXHR(n.dm.alpha,n.dm.beta,n.dm.gamma)})})}window.onload=init</script>";
c = CloudExport[SensorHTML,"HTML",Permissions->"Public"];
u=URLShorten[c[[1]]];
Return[{u,BarcodeImage[u,"QR"],listener}]
]
# Applications
Real world applications of this sensor data abound - aside from the gesture recognition system described in post 2 (http://community.wolfram.com/groups/-/m/t/1386392), you could use this sort of data to make a pocket seismometer, a fall detector, electronic dice, investigate centrifugal motion, investigate friction... More examples available here: http://www.gcdataconcepts.com/examples.html. If you make a sensor-based project in Wolfram, or think of a new / innovative / interesting way to use this data, or if the code above is buggy / incomplete, please do share it in the comments below!
A Wolfram Notebook version of this post is attached.
-- Euan Ong
# References
"Using Connected Devices": http://reference.wolfram.com/language/guide/UsingConnectedDevices.html
"Using your smart phone as the ultimate sensor array for Mathematica": http://community.wolfram.com/groups/-/m/t/344278
"Capturing Data from an Android Phone using Wolfram Data Drop": http://community.wolfram.com/groups/-/m/t/461190
"Sensorstream IMU+GPS": https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps
"Sensors For The Web!": https://developers.google.com/web/updates/2017/09/sensors-for-the-web
"gyronorm.js": https://github.com/dorukeker/gyronorm.js
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ezgif-3-207848dda6.gif&userId=1371970
[2]: http://community.wolfram.com//c/portal/getImageAttachment?filename=60691.png&userId=1371970
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=42762.png&userId=1371970
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=37343.png&userId=1371970Euan Ong2018-07-17T20:01:00Z[WSC18] Phone Gesture Recognition with Machine Learning
http://community.wolfram.com/groups/-/m/t/1386392
# Accelerometer-based Gesture Recognition with Machine Learning
![A GIF of gesture recognition in progress][1]
## Introduction
This is Part 2 of a 2-part community post - Part 1 (Streaming Live Phone Sensor Data to the Wolfram Language) is available here: http://community.wolfram.com/groups/-/m/t/1386358
As technology advances, we are constantly seeking new, more intuitive methods of interfacing with our devices and the digital world; one such method is gesture recognition. Although touchless human interface devices (kinetic user interfaces) exist and are in development, the cost and configuration required for these sometimes makes them impractical, particularly for mobile applications. A simpler method would be to use devices the user already has on their person - such as a phone or a smartwatch - to detect basic gestures, taking advantage of the wide array of sensors included in such devices. In an attempt to assess the feasibility of such a method, methods of asynchronous communication between a mobile device and the Wolfram Language are investigated, and a gesture recognition system based around an accelerometer sensor is implemented, using a machine learning model to classify a few simple gestures from mobile accelerometer data.
## Investigating Methods of Asynchronous Communication
To implement an accelerometer-based gesture recognition system, we must devise a suitable means for a mobile device to transmit accelerometer data to a computer running the Wolfram Language (WL). On a high level, the WL has baked-in support for a variety of devices - specifically the Raspberry Pi, Vernier Go!Link compatible sensors, Arduino microcontrollers, webcams and devices using the RS-232 or RS-422 serial protocol (http://reference.wolfram.com/language/guide/UsingConnectedDevices.html); unfortunately, there is no easy way to access sensor data from Android or iOS mobile devices.
On a low level, the WL natively supports TCP and ZMQ socket functionality, as well as receipt and transmission of HTTP requests and Pub-Sub channel communication. We investigate the feasibility of both methods for transmission of accelerometer data in Part 1 of this community post (http://community.wolfram.com/groups/-/m/t/1386358).
## Gesture Classification using Neural Networks
Now that we are able to stream accelerometer data to the WL, we may proceed to implement gesture recognition / classification. Due to limited time at camp, we used the UDP socket method to do this - in the future, we hope to move the system over to the (more user-friendly) channel interface.
We first configure the sensor stream, allowing live accelerometer data to be sent to the Wolfram Language:
### Configure the Sensor Stream
1. Install the "Sensorstream IMU+GPS" app ([https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps][2])
2. Ensure the sensors you want to stream to Wolfram are ticked on the 'Toggle Sensors' page. (If you want to stream other sensors besides 'Accelerometer', 'Gyroscope' and 'Magnetic Field', ensure the 'Include User-Checked Sensor Data in Stream' box is ticked. Beware, though - the more sensors are ticked, the more latency the sensor stream will have.)
3. On the "Preferences" tab:
a. Change the target IP address in the app to the IP address of your computer (ensure your computer and phone are connected to the same local network)
b. Set the target port to 5555
c. Set the sensor update frequency to 'Fastest'
d. Select the 'UDP stream' radio box
e. Tick 'Run in background'
4. Switch stream ON **before** executing code. (nb. ensure your phone does not fall asleep during streaming - perhaps use the 'Caffeinate' app ([https://play.google.com/store/apps/details?id=xyz.omnicron.caffeinate&hl=en_US][3]) to ensure this.)
5. Execute the following WL code:
(in part from http://community.wolfram.com/groups/-/m/t/344278)
QuitJava[];
Needs["JLink`"];
InstallJava[];
udpSocket=JavaNew["java.net.DatagramSocket",5555];
readSocket[sock_,size_]:=JavaBlock@Block[{datagramPacket=JavaNew["java.net.DatagramPacket",Table[0,size],size]},sock@receive[datagramPacket];
datagramPacket@getData[];
listen[]:=record=DeleteCases[readSocket[udpSocket,1200],0]//FromCharacterCode//Sow;
results={};
RunScheduledTask[AppendTo[results,Quiet[Reap[listen[]]]];If[Length[results]>700,Drop[results,150]],0.01];
stream:=Refresh[ToExpression[StringSplit[#[[1]],","]]& /@ Select[results[[-500;;]],Head[#]==List&],UpdateInterval-> 0.01]
### Detecting Gestures
On a technical level, the problem of gesture classification is as follows: given a continuous stream of accelerometer data (or similar),
1. distinguish periods during which the user is performing a given gesture from other activities / noise and
2. identify / classify a particular gesture based on accelerometer data during that period. This essentially boils down to classification of a time series dataset, in which we can observe a series of emissions (accelerometer data) but not the states generating the emissions (gestures).
A relatively straightforward solution to (1) is to approximate the gradients of the moving averages of the x, y and z values of the data and take the Euclidean norm of these - whenever these increase above a threshold, a gesture has been made.
movingAvg[start_,end_,index_]:=Total[stream[[start;;end,index]]]/(end-start+1);
^ Takes the average of the x, y or z values of data (specified by *index* - x-->3, y-->4, z-->5) from the index *start* to the index *end*.
normAvg[start_,middle_,end_]:=((movingAvg[middle,end,3]-movingAvg[start,middle,3])/(middle-start))^2+((movingAvg[middle,end,4]-movingAvg[start,middle,4])/(middle-start))^2+((movingAvg[middle,end,5]-movingAvg[start,middle,5])/(middle-start))^2;
^ (Assuming difference from start to middle is equal to difference from middle to end:) Approximates the gradient at index *middle* using the average from *start* to *middle* and from *middle* to *end* for x, y and z values, and then takes the sum of the squares of these values. Note that we do not need to take the square root of the final answer (to find the Euclidean norm), as doing so and comparing it to some threshold *x* would be equivalent to not doing so and comparing it to the threshold *x^2* (square root is a computationally expensive operation).
Thus
Dynamic[normAvg[-155,-150,-146]]
will yield the square of the Euclidean norm of the gradients of the x, y and z values of the data (approximated by calculating the averages from the 155th most recent to 150th most recent and 150th most recent to 146th most recent values). As accelerometer data is sent to the Wolfram Language, this value will update.
### Data Collection
To train the network, we must collect gesture data. To do this, we have a variety of options - we can either represent the gesture as a tensor of 3 dimensional vectors (x,y,z accelerometer data points) and perform time series classification on these sequences of vectors using hidden Markov models or recurrent neural networks, or we can represent the gesture as a rasterised image of a graph much like the one below:
![Rasterised image of a gesture][4]
and perform image classification on the image of the graph.
Since the latter has had some degree of success (e.g. in http://community.wolfram.com/groups/-/m/t/1142260), we attempt a similar method:
PrepareDataIMU[dat_]:=Rasterize@ListLinePlot[{dat[[All,1]],dat[[All,2]],dat[[All,3]]},PlotRange->All,Axes->None,AxesLabel->None,PlotStyle->{Red, Green, Blue}];
^ Plots the data points in *dat* with no axes or axis labels, and with x coordinates in red, y coordinates in green, z coordinates in blue (this makes processing easier as Wolfram operates in RGB colours).
threshold = 0.8;
trainlist={};
appendToSample[n_,step_,x_]:=AppendTo [trainlist,PrepareDataIMU[Part[x,n;;step]]];
Dynamic[If[normAvg[-155,-150,-146]>threshold,appendToSample[-210,-70,stream],False],UpdateInterval->0.1]
^ Every 0.1 seconds, checks whether or not the normed average of the gradient of accelerometer data at the 150th most recent data point (using the *normAvg* function) is greater than the threshold - if it is, it will create a rasterised image of a graph of accelerometer data from the 210th most recent data point to the 70th most recent data point and append it to *trainlist* - a list of graphs of gestures. Patience is recommended here - there can be up to ~5 seconds' lag before a gesture appears. Ensure gestures are made reasonably vigorously.
As a first test, we attempted to generate 30 samples of each of the digits 1 to 5 drawn in the air with a phone - the images of these graphs were stored in *trainlist*. Then, we classified them as 1, 2, 3, 4 or 5, converting *trainlist* into an association with key <image of graph> and value <number>).
We split the data into training data (25 samples from each category) and test data (all remaining data):
TrainingTest[data_,number_]:=Module[{maxindex,sets,trainingdata,testdata},
maxindex=Max[Values[data]];
sets = Table[Select[data,#[[2]]==x&],{x,1,maxindex}];
sets=Map[RandomSample,sets];
trainingdata =Flatten[Map[#[[1;;number]]&,sets]];
testdata =Flatten[Map[#[[number+1;;-1]]&,sets]];
Return[{trainingdata,testdata}]
]
^ Randomly selects *number* training elements and *Length[data]-number* test elements for each value in the list *data*
gestTrainingTest=TrainingTest[gesture1to5w30,25];
gestTraining=gestTrainingTest[[1]];
gestTest=gestTrainingTest[[2]];
*gestTraining* and *gestTest* now contain key-value pairs like those below:
![Key-value pairs of accelerometer graphs and labels][5]
##### Machine Learning
To train a model on these images, we first attempt a basic *Classify*:
TrainWithClassify = Classify[gestTraining]
ClassifierInformation[TrainWithClassify]
![A poor result in ClassifierInformation][6]
Evidently, this gives a very poor training accuracy of 24.4% - given that there are 5 classes, this is only marginally better than random chance.
As the input data consists of images, we try transfer learning on an image identification neural network (specifically the VGG-16 network):
net = NetModel["VGG-16 Trained on ImageNet Competition Data"]
We remove the last few layers from the network (which classify images into the classes the network was trained on), leaving the earlier layers which perform more general image feature extraction:
featureFunction = Take[net,{1,"fc6"}]
[//]: # (No rules defined for Output)
We train a classifier using this neural net as a feature extractor:
NetGestClassifier = Classify[gestTraining,FeatureExtractor->featureFunction]
We now test the classifier using the data in *gestTest:*
NetGestTest = ClassifierMeasurements[NetGestClassifier,gestTest]
We check the training accuracy:
ClassifierInformation[NetGestClassifier]
![A better classifier information result][7]
NetGestTest["Accuracy"]
1.
NetGestTest["ConfusionMatrixPlot"]
![A good confusion matrix plot][8]
[//]: # (No rules defined for Output)
This method appears to be promising, as a training accuracy of 93.1% and a test accuracy of 100% was achieved.
## Implementation of Machine Learning Model
To use the model with live data, we use the same method of identifying gestures as before in 'Detecting Gestures' (detecting 'spikes' in the data using moving averages), but when a gesture is identified, instead of being appended to a list it is sent through the classifier:
results = {""};
ClassGestIMU[n_,step_,x_]:=Module[{aa,xa,ya},
aa = Part[x,n;;step];
xa=PrepareDataIMU[aa];
ya=gestClassifier[xa];
AppendTo[results,{Length@aa,xa,ya}]
];
Dynamic[If[normAvg[-155,-150,-146]>threshold,ClassGestIMU[-210,-70,stream],False],UpdateInterval->0.1]
Real time results (albeit with significant lag) can be seen by running
Dynamic@column[[-1]]
## Conclusions and Further Work
On the whole, this project was successful, with gesture detection and classification using rasterised graph images proving a viable method. However, the system as-is is impractical and unreliable, with a significant lag, training bias (trained to recognise the digits 1 to 5 the way they are drawn by one person only) and small sample size: these are problems that can be solved, given more time.
Further extensions to this project include:
- Serious code optimisation to reduce / eliminate lag
- An improved training interface to allow users to create their own gestures
- Integration of the gesture classification system with the Channel interface (as described earlier) and deployment of this to the cloud.
- Investigation of the feasibility of using an RNN or an LSTM for gesture classification - using a time series of raw gesture data rather than relying on rasterised images (which, although accurate, can be quite laggy). Alternatively, hidden Markov models could be used in an attempt to recover the states (gestures) that generate the observed data (accelerometer readings).
- Adding an API to trigger actions based on gestures and deployment of gesture recognition technology as a native app on smartphones / smartwatches.
- Improvement of gesture detection. At the moment, the code takes a predefined 1-2 second 'window' of data after a spike is detected - an improvement would be to detect when the gesture has ended and 'crop' the data values to include only the gesture made.
- Exploration of other applications of gesture recognition (e.g. walking style, safety, sign language recognition). Beyond limited UI navigation, a similar concept to what is currently implemented could be used with, say, a phone kept in the pocket, to analyse walking styles / gaits and, for instance, to predict or detect an elderly user falling and notify emergency services. Alternatively, with suitable methods for detecting finger motions, like flex sensors, such a system could be trained to recognise / transcribe sign language.
- Just for fun - training the model on Harry Potter-esque spell gestures, to use your phone as a wand...
A notebook version of this post is attached, along with a full version of the computational essay.
## Acknowledgements
We thank the mentors at the 2018 Wolfram High School Summer Camp - Andrea Griffin, Chip Hurst, Rick Hennigan, Michael Kaminsky, Robert Morris, Katie Orenstein, Christian Pasquel, Dariia Porechna and Douglas Smith - for their help and support during this project.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=SensorDemo2.gif&userId=1371970
[2]: https://play.google.com/store/apps/details?id=de.lorenz_fenster.sensorstreamgps
[3]: https://play.google.com/store/apps/details?id=xyz.omnicron.caffeinate&hl=en_US
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=109724.png&userId=1371970
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=12475.png&userId=1371970
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=19276.png&userId=1371970
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=77697.png&userId=1371970
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=96898.png&userId=1371970Euan Ong2018-07-17T21:16:07Z[WSS18] Punctuation Restoration With Recurrent Neural Networks
http://community.wolfram.com/groups/-/m/t/1379001
# Punctuation Restoration With Recurrent Neural Networks
Mengyi Shan, Harvey Mudd College, mshan@hmc.edu
![flow][1]
All codes posted on GitHub: [https://github.com/Shanmy/Summer2018Starter/tree/master/Project][2].
Raw results in the attached notebook.
----------
## Introduction
In natural language processing problems such as automatic speech recognition (ASR), the generated text is normally unpunctuated, which is hard for further recognition or analysis. Thus punctuation restoration is a small but crucial problem that deserves our attention. This project aims to build an automatic "punctuation adding" tool for plain English text with no punctuation.
Since the input text could be considered as a sequence in which context is important for every single word's properties, recurrent characteristics of neural networks are considered to be a good method. Traditional approaches to this problem include usage of various recurrent neural networks (RNN), especially long short-term memory layers (LSTM). This project examines several models built from different layers and introduces bidirectional operators which can significantly improve the result compared with old methods.
## Methods
![Basic steps of the method][3]
There're four basic steps in the whole process. First, we get the corpus of articles (with punctuations). Then, we keep the periods and commas in the corpus but change the question marks, exclamation marks, and colons to periods and commas, while removing all other punctuations. With this pure text, we tag each word as one of {NONE, COMMA, PERIOD} by judging if it is followed by a punctuation or not. And this set of tagging rules are sent to a neural network model for training. Finally, we test the result on another piece of articles, which is the test set.
### Data
Basically, we have two pieces of data. The first one is the Wikipedia text of 4000 nouns (deleting missing), and the second is 50 novels from Wolfram data repository.
(*Get wikipedia text of 4000 nouns*)
nounlist = Select[WordList[], WordData[#, "PartsOfSpeech"][[1]] == "Noun" &];
rawData = StringJoin @@ DeleteCases[Flatten[WikipediaData[#] & /@ Take[nounlist, {1, 4000}], 2], _Missing]
(*Get text of 50 novels*)
books = StringJoin @@ Get /@ ResourceSearch["novels", 50];
### Pre-processing
The first goal of the preprocessing step is to purify the text. That is, since we only consider commas and periods, we should either delete or replace other characters and punctuations. Also, for convenience, all numbers are replaced with 1 first. All other characters are removed from the text.
(*Show sets of characters replaced with comma, period, whitespace, one and null respectively*)
toComma = Characters[":;"];
toPeriod = Characters["!?"];
toWhiteSpace = {"-", "\n"};
toOne = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};
toNull[x_String] := Complement[Union[Characters[x]], ToUpperCase@Alphabet[], Alphabet[], toOne, toComma, toPeriod, toWhiteSpace, {".", ",", " "}];
Then complete the replacement and modify it to pure form. And we include a validation test to examine its purity.
(*Replacement and modification. End with lowercase text with only periods, alphabets and commas.*)
toPureText[x_String] :=
StringReplace[#, ".," .. -> ". "] &@
StringReplace[#, ". " .. -> ". "] &@
StringReplace[#, {" ," -> ",", " ." -> "."}] &@
StringReplace[#, {"1" .. -> "one", " " .. -> " "}] &@
StringReplace[#, {"1. 1" -> "", "1, 1" -> "1"}] &@
StringReplace[#, {" " .. -> " "}] &@
StringReplace[#, {"," -> ", ", "." -> ". "}] &@
ToLowerCase@
StringReplace[{toComma -> ",", toPeriod -> ".", toNull[x] -> "",
toWhiteSpace -> " ", toOne -> "1"}][x];
(*validation test*)
VerificationTest[Length@StringSplit@x == Length@TextWords@x]
Then we define a function fPuncTag that can generate the corresponding tagging given a piece of text with punctuation.
(*Define the tagging function, and maps it to original text. Original text is partitioned into pieces of 200 words*)
fPuncTag := Switch[StringTake[#, -1], ".", "a", ",", "b", _, "c"] &;
fWordTag[x_String] := Map[fPuncTag, Partition[StringSplit[x], 200], {2}];
And we can thus remove the punctuation, and build a set of rules between the unpunctuated text and the generated tagging.
fWordText[x_String] := StringReplace[#, {"," -> "", "." -> ""}] & /@ StringRiffle /@ Partition[StringSplit[x], 200];
fWordTrain[x_String] := Normal@AssociationThread[fWordText[x], fWordTag[x]];
totalData = fWordTrain@toPureText@rawText
With the total data, we want to divide it into three groups: the training set, the validation set, and the test set.
(* First we know that the length is 63252, then we divide it by 15:3:1*)
order = RandomSample[Range[63252]];
trainingSet = totalData[[Take[order, 50000]]];
validationSet = totalData[[Take[order, {50001, 60000}]]];
testSet = totalData[[Take[order, {60001, -1}]]];
### Train
During neural network training, I used 8 different combinations of layers, out of which 4 are worth considering. They are listed as followed. LSTM layer, gate recurrent layer, and basic recurrent layer are three types of recurrent layers, each representing a net that takes a sequence of vectors and outputs a sequence of the same length. LSTM is commonly used in natural language processing problems, so we start with it as a penetrating point.
(*Pure LSTM*)
net1 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
LongShortTermMemoryLayer[60],
LongShortTermMemoryLayer[30],
LongShortTermMemoryLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
(*Gate Recurrent*)
net2 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
GatedRecurrentLayer[60],
LongShortTermMemoryLayer[30],
GatedRecurrentLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
(Basic Recurrent)
net3 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
BasicRecurrentLayer[60],
LongShortTermMemoryLayer[30],
BasicRecurrentLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
(*Bidirectional*)
net4 = NetChain[{
embeddingLayer,
LongShortTermMemoryLayer[100],
NetBidirectionalOperator[{LongShortTermMemoryLayer[40],
GatedRecurrentLayer[40]}],
NetBidirectionalOperator[{LongShortTermMemoryLayer[20],
GatedRecurrentLayer[20]}],
LongShortTermMemoryLayer[10],
NetMapOperator[LinearLayer[3]],
SoftmaxLayer["Input" -> {"Varying", 3}]},
"Output" -> NetDecoder[{"Class", {"a", "b", "c"}}]
];
The embedding layer is used to change words into vectors that represent their semantic characteristics.
(*The embedding layer here*)
embeddingLayer = NetModel["GloVe 100-Dimensional Word Vectors Trained on Wikipedia and Gigaword 5 Data"]
With all those neural network models set up, we can train each neural network. To save time, I first trained all models with a small data set of only 3 million words to compare their behaviors.
(*train the neural network while saving the training object*)
NetTrain[net, trainingSet, All, ValidationSet -> validationSet]
### Test
Since this classification problem is a problem of a skewed dataset, that is, most of the words should have the tag "None", it doesn't make sense to use "accuracy" to measure the models' behavior. Even if it simply do nothing and always return "None", it will have a high accuracy that is the percentage of "None" in the whole tagging set. Instead, to evaluate the behavior of the models, we introduce the concept of precision, recall, and f1-score.
(*Precision and recall*)
precision = truePrediction/allTrue
recall = truePrediction/allPrediction
F1 = HarmonicMean[{precision, recall}]
![PR][4]
For a given test set, first, we want to remove its punctuations and run the trained model on it.
(*romve punctuation and run the model*)
noPuncTest = Keys /@ testSet
result = net["TrainedNet"] /@ noPuncTest;
Then we changed the tags to 1,2 and 0. And we calculate the elementwise product of realTag and resultTag. If an element is 4, it means that both the realTag and resultTag is 2, which counts as a successful prediction of a comma. An element of 2 represents a successful prediction of a period.
![Tag][5]
(*Change the tags to numerical values and count 1s and 4s*)
realTag = Replace[Flatten[Values /@ Take[testSet, 3252]], {"a" -> 1, "b" -> 2, "c" -> 0}, {1}];
resultTag = Replace[Flatten[result], {"a" -> 1, "b" -> 2, "c" -> 0}, {1}];
totalTag = realTag*resultTag;
Now we can use totalTag, resultTag, and realTag to calculate precision, recall, and f1-score.
(*Precision*)
PrecPeriod = N@Count[totalTag, 1]/Count[resultTag, 1]
PrecComma = N@Count[totalTag, 4]/Count[resultTag, 2]
(*Recall*)
RecPeriod = N@Count[totalTag, 1]/Count[realTag, 1]
RecComma = N@Count[totalTag, 4]/Count[realTag, 2]
(*F1*)
F1Period = (2*RecPeriod*PrecPeriod)/(RecPeriod + PrecPeriod)
F1Comma = (2*RecComma*PrecComma)/(RecComma + PrecComma)
## Result
Ten neural networks are trained based on a small dataset with different layers. Only using Long Short-Term Memory layers gives an f1 score of 13% and 11% for periods and commas. Introducing dropout parameters, pooling layers, elementwise layers, basic recurrent layers and gate recurrent layers all produce an f1 score between 10% and 30%, showing no significant improvement. Introduction of the bidirectional operator (combining two recurrent layers) improves the scores to 53% and 47%, and to 72% and 60% respectively when training on a larger dataset of 10M words.
Here are the results for the three different neural networks trained with a 3M small dataset, and bidirectional neural network (which has the best performance in the small dataset) trained with a larger dataset of 10M words. The first figure is of the period and the second is of the comma.
![Period][6]
![Comma][7]
We can easily observe the advantage of the bidirectional operator in terms of both periods and commas, precision and recall. Instead of the sequence to sequence learning, "tagging" is a significantly more efficient and accurate way to restore punctuation in plain text. Since every words' tags ("None", "Comma", "Period") is influenced by its context, it makes sense that recurrent neural networks and bidirectional operators show great potential in this research.
Generally, the recall score is significantly lower than the precision score, suggesting that the model generates too many punctuations than it should. This could be due to the dataset of Wikipedia which is not clean enough. In the Wikipedia text, sometimes there're equations, translations, or other strange characters that we simply delete. This changed the ratio of punctuations to words and produces some segments of text that is "full of" punctuations since all words are not recognized and simply deleted. One example of those "not clean segment" is shown below.
![wiki][8]
Also, the overall performance on commas is slightly worse than on periods. This also makes sense from a linguistics point of view. There seems to be a concrete linguistics set of rules for the period, but the usage of comma greatly depends on personal writing style. For example, you could say either *"I like apples but I don't like bananas."*, or *"I like apples, but I don't like bananas."* In this way, it's really hard to build a model for comma prediction with such high accuracy. But fortunately, sometimes adding commas or not doesn't really influence the overall meaning of the sentence. So it's okay to be tolerant to a slightly worse performance on commas.
## Future Works
70% f1-score is still not enough for the application. Planned future work focuses on improving accuracy to a level suitable for usage in industry. The most urgent and important future work is using a larger data size. We can observe great improvement when changing from 3M to 10M dataset, but it's still far less than enough.
![plot][9]
If we take a closer look at the evolution plots during training, we can see that the error rate and loss of training set are continuously decreasing, while the error rate and loss of the validation set soon reaches a stable state and doesn't change too much. The gap between those two curves suggests the possibility of overfitting, and it should greatly help if we introduce better and more data.
Also, punctuation restoration should not be limited to periods and comma. A more rigorous study of the question mark, exclamation mark, colon, and quotation mark is expected. However, we should note that the choice of most punctuations is not restricted to one possibility. In cases like distinguishing a period with an exclamation mark, we cannot expect a high f1-score. But it's still an interesting topic, may be useful for topics like sentimental analysis.
## Acknowledgement
I would like to thank the summer school for providing the environment and background skills for me to finish this project. Especially, I want to thank my mentor for helping me with neural network problems and debugging.
## Data and Reference
- [Wolfram Data Repository][10]
- [Wikipedia][11]
- Tilk O, et al. "Lstm for Punctuation Restoration in Speech Transcripts." Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, 2015-January, 2015, pp. 683\687.
[1]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at7.10.22PM.png&userId=1362824
[2]: https://github.com/Shanmy/Summer2018Starter/tree/master/Project
[3]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-11at10.22.30AM.png&userId=1362824
[4]: http://community.wolfram.com//c/portal/getImageAttachment?filename=PR1.png&userId=1362824
[5]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-11at11.45.44AM.png&userId=1362824
[6]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at8.38.40PM.png&userId=1362824
[7]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at8.38.49PM.png&userId=1362824
[8]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-11at12.13.54PM.png&userId=1362824
[9]: http://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot2018-07-10at8.59.14PM.png&userId=1362824
[10]: https://datarepository.wolframcloud.com/category/Text-Literature/
[11]: https://www.wikipedia.orgMengyi Shan2018-07-11T17:20:45ZCloud Deploy broken with neural net (new?)
http://community.wolfram.com/groups/-/m/t/1382601
Over the last few weeks I routinely deployed neural nets to the cloud for computation. Today, even the simplest example does not work anymore:
(*initialise random neural net that talkes 200x200 image as input*)
scrnet = NetInitialize@
NetChain[{ConvolutionLayer[1, 1], PartLayer[1]},
"Input" -> {1, 200, 200}]
(*Deploy to cloud*)
cnet = CloudExport[scrnet, "MX", Permissions -> "Public"];
(*test on example image *WORKS**)
img = Import["ExampleData/ocelot.jpg"];
With[{net = CloudImport@cnet}, Image@net@{ImageData@img}]
(*Deploy as cloud form page *)
CloudDeploy[
FormPage[{"image" -> "Image"},
With[{net = CloudImport@cnet}, Image@net@{ImageData@#image}] &],
Permissions -> "Public"]
The last piece of code generates a cloud object. If I try to upload the ocelot image I get Image[Failed] as the sole output. If I do not use a net the form works fine:
CloudDeploy[FormPage[{"image" -> "Image"}, Image@ImageData@#image &],
Permissions -> "Public"]
I contacted wolfram support already and they told me to post here. Am I doing something wrong? I have ~1000 cloud credits left on a free account so that should not be the issue. Frankly, I am having a lot of issues with the documentation for cloud computing in general and am starting to wonder if it is worth the hassle.
Best, MaxMaximilian Jakobs2018-07-13T14:19:54Z