WOLFRAM COMMUNITY



GROUPS: Data Science, External Programs and Systems, Wolfram Language, Modeling, Machine Learning, Neural Networks

Wolfram Language vs. TensorFlow vs. Numpy - Coding on Simple Neural Network

Lanston Hau Man Chu, University of Wisconsin-Madison

Posted 7 years ago

Source Code: Github Repositories

Coding simple cases on complicated frameworks often offers important insights into the prototyping abilities of our tools. In this post, I will code a simple neural network problem in three different programming languages/libraries, namely Wolfram Language, TensorFlow (Python)¹ and Numpy (Python)².

Let's take a simple hypothetical problem in the life insurance industry as an example. An actuary studies the historical claim patterns of insurance policies and does valuation work by predicting future claims. In this toy example, let's look at a pool of insured persons of the same age (e.g. age 35) holding a medical insurance product. We have four existing insured persons, A, B, C and D, who differ in gender, smoker status, country class and claim history in their first policy year. We would like to know whether a fifth insured person, E, will make a claim in his/her first policy year. Below is a summary table:

The problem

We construct a simple neural network with a linear layer and a sigmoid layer, and then train the network using the full-batch gradient descent method, with mean-square-error $E = \frac{1}{2} \sum_{k} (y_k - t_k)^{2}$ as the loss function and a learning rate of $\lambda = 0.5$. We train for $10{,}000$ epochs. After the network is trained, we use it to make a prediction for insured person E.

Image 1: A simple neural network of two layers: a linear layer and a sigmoid layer.

We focus on the coding style and prototyping abilities of the programming languages/libraries rather than on the theoretical background. For an explanation of the forward/backward propagation of the linear layer and the sigmoid layer, please refer to this article³.

Let's see what the code looks like in the three programming languages/libraries:

Summary

As we can see, Numpy has the shortest run-time.
This is within expectation, as Numpy is a lower-level, "to-the-metal" library, while TensorFlow and Wolfram Language are much more "to-the-human". Although the run-time differences seem huge, we should not forget that this toy example is simple (one linear layer and one sigmoid layer only), which means that many of the high-level neural network objects in TensorFlow and Wolfram Language have not been fully utilized.

Numpy also uses fewer lines of code than TensorFlow. But again, we need to note how simple the layering is in this case: it involves only basic forward/backward propagation, which allows Numpy to be less wordy than TensorFlow. If the structure of the neural network becomes more complicated, the difficulty and complexity of coding in Numpy would increase much more significantly than in TensorFlow.

It is worth pointing out that Wolfram Language uses only seven lines, which indicates that it has robust prototyping abilities for neural networks. Wolfram Language is designed to perform abstract computation, so many concepts in a neural network can each be captured as a single object, which is also reflected in the above comparison table. In this particular example, we can even reduce the code further to four lines⁴, each corresponding to one concept: 1. importing training data, 2. constructing the network, 3. training the network and 4. making predictions.
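The remark that hand-written Numpy code grows quickly with network complexity can be illustrated with a rough sketch. The version below is not part of the original comparison: it assumes one extra hidden sigmoid layer (the width `hidden=4` and the function names `train_two_layer` and `predict` are hypothetical choices made for illustration), and it reuses the same training data, loss and learning rate as the post. Every added layer needs its own forward line and its own hand-derived chain-rule lines in the backward pass.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_two_layer(X, T, hidden=4, lr=0.5, epochs=10000, seed=1):
    """Full-batch gradient descent on linear -> sigmoid -> linear -> sigmoid.

    The hidden layer is an assumption added for illustration; the network
    in the post has a single linear layer followed by a sigmoid.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.random((X.shape[1], hidden))
    b1 = np.zeros((1, hidden))
    W2 = rng.random((hidden, 1))
    b2 = np.zeros((1, 1))
    losses = []
    for _ in range(epochs):
        # forward pass: one line per layer
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        losses.append(0.5 * np.sum((Y - T) ** 2))
        # backward pass: the chain rule has to be written out by hand
        dZ2 = (Y - T) * Y * (1 - Y)          # through loss and output sigmoid
        dW2 = H.T @ dZ2
        db2 = dZ2.sum(axis=0, keepdims=True)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)     # through second linear layer and hidden sigmoid
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0, keepdims=True)
        # gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, x):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)


# same training data as in the post
X = np.array([[0, 1, 2], [0, 0, 2], [1, 1, 1], [1, 0, 1]], dtype=float)
T = np.array([[1, 0, 1, 0]], dtype=float).T
params, losses = train_two_layer(X, T)
p_E = predict(params, np.array([[0.0, 1.0, 0.0]]))
print(losses[0], losses[-1], p_E)
```

In a framework with layer objects, the same change would typically amount to adding entries to the layer list, since the backward pass is derived automatically.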
Below is the code used:

Wolfram Language

    trainingSetInput = {{0, 1, 2}, {0, 0, 2}, {1, 1, 1}, {1, 0, 1}};
    trainingSetOutput = {1, 0, 1, 0};
    n = Length@trainingSetInput;
    asso = Thread[trainingSetInput -> trainingSetOutput];
    net = NetChain[{LinearLayer[], ElementwiseLayer["Sigmoid"]}];
    trained = NetTrain[net, asso, MaxTrainingRounds -> 10000, LossFunction -> MeanSquaredLossLayer[], Method -> {"SGD", "LearningRate" -> 0.5}, BatchSize -> n];
    trained[{0, 1, 0}]

TensorFlow (Python)

    # Note: this listing uses the TensorFlow 1.x graph/session API
    # (tf.placeholder, tf.Session).
    import tensorflow as tf
    import numpy as np

    # training data
    training_set_inputs = np.array([[0, 1, 2], [0, 0, 2], [1, 1, 1], [1, 0, 1]])
    training_set_outputs = np.array([[1], [0], [1], [0]])

    # containers and operations
    x = tf.placeholder(tf.float32, [None, 3])
    W = tf.Variable(tf.zeros([3, 1]))
    B = tf.Variable(tf.zeros([1]))
    yHat = tf.nn.sigmoid(tf.matmul(x, W) + B)
    yLb = tf.placeholder(tf.float32, [None, 1])
    learning_rate = 0.5
    mean_square_loss = tf.reduce_mean(tf.square(yLb - yHat))
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(mean_square_loss)

    # use session to execute graphs
    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)

    # start training
    for i in range(10000):
        sess.run(train_step, feed_dict={x: training_set_inputs, yLb: training_set_outputs})

    # do prediction
    x0 = np.float32(np.array([[0., 1., 0.]]))
    y0 = tf.nn.sigmoid(tf.matmul(x0, W) + B)
    print('%.15f' % sess.run(y0))

Numpy (Python)

    import numpy as np
    from numpy import exp, array, random, dot

    training_set_inputs = array([[0, 1, 2], [0, 0, 2], [1, 1, 1], [1, 0, 1]])
    training_set_outputs = array([[1, 0, 1, 0]]).T
    random.seed(1)

    # Initialization
    W = random.random((3, 1))
    B = random.random((1, 1))

    for iteration in range(10000):
        # Sigmoid function
        yHat = 1 / (1 + exp(-(dot(training_set_inputs, W) + B)))
        # gradient of mean square loss: grad0 = (yHat-training_set_outputs)
        # gradient of Sigmoid: grad = grad0 * yHat * (1 - yHat);
        # full batch gradient descent
        grad = (yHat - training_set_outputs) * yHat * (1 - yHat)
        # gradient of linear layer
        d_W = dot(training_set_inputs.T, grad)
        # just sum up grad to form d_B
        d_B = np.sum(grad, axis=0)
        LearnRate = 0.5
        # gradient descent method
        W -= LearnRate * d_W
        B -= LearnRate * d_B

    print(1 / (1 + exp(-(dot(array([0, 1, 0]), W) + B))))

Footnotes

¹ TensorFlow means coding in Python using the TensorFlow library as the main tool for constructing the neural network (but without Keras), while using Numpy for certain basic calculations.

² Numpy means coding in Python using the Numpy library to construct the neural network without involving TensorFlow.

³ There are some discrepancies between the network used in the reference article and the one in this post. For example, the affine layer in the referenced neural network does not have a bias term, for the sake of simplicity, while ours does. The referenced neural network also codes the learning step as adding the negative gradient instead of subtracting the gradient. But the basic ideas are the same, and in fact this article was inspired by that article.

⁴ By putting all numerical data in `asso` without defining `trainingSetInput` and `trainingSetOutput`, and putting `n` in `NetTrain`.
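As a sanity check on the hand-written backward pass in the Numpy listing, the analytic gradients can be compared with central finite differences. The sketch below is an addition, not part of the original post: it assumes the loss $E = \frac{1}{2} \sum_{k} (y_k - t_k)^{2}$ stated in the post, and the helper names (`loss`, `analytic_grads`, `numeric_grad`) are hypothetical. The last lines also show that subtracting the gradient and adding the negative gradient, the two conventions mentioned in footnote 3, give the same update.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(W, B, X, T):
    # E = 1/2 * sum_k (y_k - t_k)^2, as stated in the post
    Y = sigmoid(X @ W + B)
    return 0.5 * np.sum((Y - T) ** 2)


def analytic_grads(W, B, X, T):
    # same expressions as in the Numpy listing
    Y = sigmoid(X @ W + B)
    grad = (Y - T) * Y * (1 - Y)
    return X.T @ grad, np.sum(grad, axis=0, keepdims=True)


def numeric_grad(f, P, eps=1e-6):
    # central finite differences, perturbing one entry of P at a time
    G = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G


X = np.array([[0, 1, 2], [0, 0, 2], [1, 1, 1], [1, 0, 1]], dtype=float)
T = np.array([[1, 0, 1, 0]], dtype=float).T
rng = np.random.default_rng(1)
W = rng.random((3, 1))
B = rng.random((1, 1))

d_W, d_B = analytic_grads(W, B, X, T)
err_W = np.max(np.abs(d_W - numeric_grad(lambda: loss(W, B, X, T), W)))
err_B = np.max(np.abs(d_B - numeric_grad(lambda: loss(W, B, X, T), B)))
print(err_W, err_B)

# footnote 3: both update conventions give the same new weights
lr = 0.5
W_subtract = W - lr * d_W
W_add_negative = W + lr * (-d_W)
same_update = np.allclose(W_subtract, W_add_negative)
```

If either error were large, it would point to a mistake in the derivative of the loss, the sigmoid or the linear layer.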

POSTED BY: Lanston Hau Man Chu

2 Replies


Lanston Hau Man Chu, University of Wisconsin-Madison

Posted 7 years ago

Thanks for your comment.

1 - Yes, you are right about the timing; that is why I pointed out in my post that "we should not forget that this toy example is simple". The main focus of my post is prototyping ability rather than timing. When I wrote the post, I considered whether to exclude the run-time, or to include it while pointing out that it is not really a good indicator in this case. Timing is always what people are eager to see, and people may even test it themselves, especially in a simple case like this with just a few lines of code in the appendix. In the end, I decided to include the run-time and point out the above consideration so that all of us are aware of it.

2 and 3 - The purpose of this post is to look at the prototyping abilities of different approaches, from high level to low level. This is why I picked these three languages/libraries, including Numpy. I hope the code in the appendix helps people gain some quick insight into how the tools abstract different concepts, and serves as a good reference when comparing against the summary table. Perhaps in the future we can have a comparison with another set of approaches when we want to look from another angle.

POSTED BY: Lanston Hau Man Chu

Updating Name

Posted 7 years ago

I just want to clarify a few things about this post:

1. The timing results are completely misleading: for a real-world dataset, TensorFlow and the WL neural net framework should be faster than your NumPy code, and TensorFlow and the WL neural net framework should have very similar speeds on both GPU and CPU. Only in the special case of a super small dataset should you see these timing differences. There are many reasons: `NetTrain` starts plotting loss curves, and both `NetTrain` and TensorFlow run a compilation step that tries to find any runtime optimizations possible by rewriting the computation graph of the net. All of this is completely unnecessary for tiny examples, but can produce major speedups for large nets, which is the case that people care about optimizing.

2. Lines of code versus TensorFlow: TensorFlow is designed to be a super low-level framework that gives maximal flexibility (it can do things that the WL neural net framework can't, due to this flexibility), whilst the WL neural net framework is designed to be as high-level and as simple to use as possible (it itself uses a low-level framework, MXNet, as a backend). A much more interesting comparison is against Keras (a high-level framework built on top of TensorFlow), which you mention.

3. The comparison between NumPy and WL is strange: the analogue of using NumPy is writing a neural net from scratch using WL `PackedArrays`. Why not compare against that instead?

POSTED BY: Updating Name
