
Improving accuracy of neural network for determining qubit rotation angle

Posted 1 month ago

The physics example I am looking at (to illustrate the use of a basic neural network in Mathematica) is a qubit rotated about the y-axis, where the rotation angle is discretized as $\theta_j \in [0, \pi]$. The y-rotated qubit is measured in the z-basis (hence spin-up and spin-down projector measurements). The procedure is to first determine analytically the measurement outcome probabilities as a function of the rotation angle $\theta$, then generate measurement outcomes for training at specific fixed rotation angles $\theta_j$. After generating another set of test measurement data for some fixed rotation angle $\theta$, we use the neural network to infer the most probable rotation angle. My training data consists of m = 1000 total measurements for each discrete rotation angle $\theta_j \in [0, \pi]$, with the outcomes saved as pairs of spin-up and spin-down counts for each discrete angle. Each pair is associated with its discrete $\theta_j$ value, encoded as a one-hot vector (hence training data of the form {1000,0} -> {1,0,0,0,0...} if for the first rotation angle we get all spin-up outcomes).
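
A minimal sketch of how training data of this form could be generated, not the poster's actual code (which was attached as a notebook). It assumes the qubit starts in the spin-up state, so a rotation by $\theta$ about the y-axis gives $P(\uparrow) = \cos^2(\theta/2)$ and $P(\downarrow) = \sin^2(\theta/2)$. The names `pUp`, `nBins`, `thetas`, and `trainingData`, the choice of 50 bins, and the use of bin-center angles are all assumptions made for illustration.

```mathematica
(* Assumption: initial state is spin-up, rotated by theta about the y-axis *)
pUp[theta_] := Cos[theta/2]^2;

nBins = 50;   (* number of discrete angles; an illustrative choice *)
m = 1000;     (* measurements per discrete angle, as in the post *)

(* Assumption: use bin centers, which keeps every probability strictly inside (0, 1) *)
thetas = N[Table[(j - 1/2) Pi/nBins, {j, nBins}]];

(* One example per angle: {nUp, nDown} -> one-hot vector for bin j *)
trainingData = Table[
   With[{nUp = RandomVariate[BinomialDistribution[m, pUp[thetas[[j]]]]]},
    {nUp, m - nUp} -> UnitVector[nBins, j]],
   {j, nBins}];
```

Sampling the spin-up count from a binomial distribution is equivalent to simulating m independent z-measurements and counting the outcomes.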

The idea is that after training, if I set some true rotation angle $\theta$ and generate a new set of test measurement outcomes, the trained neural network should output a probability distribution showing that the most likely discrete rotation angle is the true angle. The code below works, but I am having difficulty improving the accuracy without simply increasing the number of layers and MaxTrainingRounds (this seems to have its limits in improving accuracy). Can anyone advise on how to improve the accuracy of the code in determining the correct discrete rotation angle (I would like to maintain the general framework of the code)? I am very new to using Mathematica for machine learning applications, hence the query. Thanks for any assistance; this is the code in question:

POSTED BY: Byron Alexander
8 Replies
Posted 1 month ago

Please see the attachment. I provided 3 examples.

You are predicting numerical values, e.g., {9997, 3} -> 1, but SoftmaxLayer[] is meant for classifying categorical values. In other words, your output data is numerical (e.g., 1), but your network uses the loss function for categorical output. A net ending in SoftmaxLayer is trained with CrossEntropyLossLayer by default, which provides measurements suited to classification (e.g., accuracy, confusion matrix) rather than prediction (e.g., R-squared or mean squared error).

"For nets that contain CrossEntropyLossLayer, the following built-in measurements are available"

What I learned:

  • Pay attention to the type of the output, whether it is for prediction (numerical values) or classification (categorical values). This determines the type of loss layer (MeanSquaredLossLayer or CrossEntropyLossLayer).
  • Pay attention to the type of the output value (real value, vector, or list). This determines the last and second-to-last layers (SoftmaxLayer[] alone, or LinearLayer[] followed by SoftmaxLayer[]) as well as the number of neurons (e.g., LinearLayer[1] or LinearLayer[50]).
  • Use the "TrainingNet" property to check the type of loss layer (not "TrainedNet").


    finalNet3 = trainedNet3["TrainedNet"]

  • Use All as the third argument, as in NetTrain[net3, trainingData3, All], to check all properties.
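
The distinction drawn in the list above can be sketched as two small nets. This is an illustration under stated assumptions, not the code from the attachment: the names `counts`, `regNet`, `clsNet`, and `labels`, the three toy examples, and the layer sizes are all hypothetical.

```mathematica
(* Toy inputs in the thread's format: {nUp, nDown} counts *)
counts = {{9987, 13}, {5062, 4938}, {0, 10000}};

(* Prediction (regression): numeric targets, scalar output, mean-squared loss *)
regData = Thread[counts -> {1., 25., 50.}];
regNet = NetChain[
   {LinearLayer[50], ElementwiseLayer["ReLU"], LinearLayer[]},
   "Input" -> 2, "Output" -> "Real"];
regTrained = NetTrain[regNet, regData,
   LossFunction -> MeanSquaredLossLayer[], MaxTrainingRounds -> 10];

(* Classification: string labels, softmax output; with a net ending in
   SoftmaxLayer, NetTrain uses a cross-entropy loss automatically *)
labels = {"1", "25", "50"};
clsData = Thread[counts -> labels];
clsNet = NetChain[
   {LinearLayer[50], ElementwiseLayer["ReLU"],
    LinearLayer[Length[labels]], SoftmaxLayer[]},
   "Input" -> 2, "Output" -> NetDecoder[{"Class", labels}]];
clsTrained = NetTrain[clsNet, clsData, MaxTrainingRounds -> 10];
```

The regression net returns a single real number for an input, while the classification net returns one of the labels.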

I noticed you applied a PINN, and it would be nice if you could share it (I am also learning about PINNs).

POSTED BY: Sangdon Lee

@SangdonLee Hi, thanks for the continued discussion. I see what you are saying about noting the output, to see whether it is of a prediction or classification type. The physics problem that I am looking at is a rotated qubit where the rotation angles (between 0 and pi) are discretized and the z-measurements of spin-up and spin-down are used to generate training data. For example, {{9988, 12} -> 1} implies that for the first discretized rotation angle, the simulated measurements give 9988 spin-up outcomes and 12 spin-down outcomes. This is the example that is outlined on page 2 of Nolan's paper (I think you will find it interesting). I'm using this as a study example.

With regard to your previous attached code and comments, I think it is clear that this physics problem is a classification problem. The only difference that I note between my code and your example 3 classification code is that the output in training has inverted commas " ", that is, {{9988, 12} -> "1"} instead of {{9988, 12} -> 1}. When I add these inverted commas and send the training data through to test the network, I get a list of labels as outcomes, which is correct for classification. But I am interested in obtaining the probabilities (from the softmax function) associated with each discrete rotation bin. With my original code (without the " " around the output) I obtain these probabilities.

As examples I attach the codes "Original code without inverted commas" and "Modified code with inverted commas". You can see the different output as lists of probabilities for the original code and lists of labels for the modified code. Do you maybe know how to produce probability lists, as in the original code, from the modified code?

POSTED BY: Byron Alexander
Posted 27 days ago

Regarding your interest in predicting probabilities: you can compute the class probabilities by using "net3" and "trainedNet3", which use the SoftmaxLayer and take the output as a string (e.g., "1", not 1.0).

{finalNet3[{9987, 13}, "Probabilities"],
 finalNet3[{5062, 4938}, "Probabilities"],
 finalNet3[{0, 10000}, "Probabilities"]}

Note that the following three samples are classified correctly. For example, using the 1st, 25th, and 50th samples, the 3 samples are correctly classified with the highest probabilities.

  • {9987,13} -> "1": {9987,13} shows the highest probability of being classified as "1"
  • {5062,4938} -> "25": {5062,4938} shows the highest probability of being classified as "25"
  • {0,10000} -> "50": {0,10000} shows the highest probability of being classified as "50"

I am not sure what you mean by: "the output in training has inverted commas " ", that is {{9988, 12} -> "1"} instead of {{9988, 12} -> 1}." I think you mean string not a number.

You can compute the probabilities for each class by using the SoftmaxLayer but the SoftmaxLayer requires the output to be a string, not a number, which I called classification.
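
A minimal sketch of two ways to get per-class probabilities from a net whose output is decoded to class labels. `demoNet` is a hypothetical, untrained stand-in for the `finalNet3` discussed here, so the numbers it returns are meaningless; only the calling pattern is the point.

```mathematica
(* Hypothetical stand-in for finalNet3: 2 inputs, 50 string-labelled classes *)
labels = ToString /@ Range[50];
demoNet = NetInitialize[
   NetChain[{LinearLayer[50], SoftmaxLayer[]},
    "Input" -> 2, "Output" -> NetDecoder[{"Class", labels}]]];

(* Default call: the class decoder returns only the most likely label *)
topLabel = demoNet[{5062, 4938}];

(* Decoder property: an association from label to probability *)
probs = demoNet[{5062, 4938}, "Probabilities"];
top3 = Keys[TakeLargest[probs, 3]];   (* the three most likely bins *)

(* Alternative: strip the decoder to get the raw softmax vector *)
rawNet = NetReplacePart[demoNet, "Output" -> None];
probVector = rawNet[{5062, 4938}];
```

Removing the decoder reproduces the list-of-probabilities output that a net trained on integer or one-hot targets gives directly.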

By the way, you can check and plot the sensitivities of each input by changing the 1st input while holding the 2nd input constant and vice versa.

Table[{x1, finalNet3[{x1, 5000}]}, {x1, 4000, 6000, 100}]

Hope this helps.

POSTED BY: Sangdon Lee

@SangdonLee Thanks, all the discussions are very helpful. The physics problem that I am considering, for learning purposes at this stage, is that of a rotating qubit. The qubit rotates about its y-axis by an angle $\theta \in [0, \pi]$. For training we consider the z-measurements of spin-up and spin-down at discretized angles in $[0, \pi]$. Hence the training result {9989, 11} -> "1" indicates that I obtain 9989 spin-up results and 11 spin-down results when measuring the qubit at the first discretized rotation angle. Since I discretize the range $[0, \pi]$ into 50 intervals, I obtain 50 such training data results. Have a look at my final code and let me know what you think of my attempt to model this example (not sure how familiar you are with quantum mechanics?). Please let me know if anything is unclear in the code or example.

POSTED BY: Byron Alexander
Posted 1 month ago

The syntax looks correct, although the data are usually split 70% for training and 30% for testing, that is, ValidationSet->Scaled[.3]. You can also split the input data into a training dataset and a validation dataset yourself, e.g., ValidationSet->myValidationSet. Doing so lets you apply the NetMeasurements function to compute various measurements for your validation set.
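
A minimal sketch of the explicit split described above, under stated assumptions: the data generator, the small number of classes (`nBins = 5`), the repeated sampling per angle (`reps`), and the layer sizes are all illustrative choices, not taken from the poster's notebook.

```mathematica
(* Assumed data model: spin-up probability after a y-rotation from spin-up *)
pUp[theta_] := Cos[theta/2]^2;
m = 1000; nBins = 5; reps = 40;
thetas = N[Table[(j - 1/2) Pi/nBins, {j, nBins}]];

(* Several noisy examples per class so both subsets contain every class *)
data = Flatten[Table[
    With[{nUp = RandomVariate[BinomialDistribution[m, pUp[thetas[[j]]]]]},
     {nUp, m - nUp} -> ToString[j]],
    {j, nBins}, {reps}], 1];

(* 70% / 30% split after shuffling *)
{train, valid} = TakeDrop[RandomSample[data], Round[0.7 Length[data]]];

net = NetChain[
   {LinearLayer[32], ElementwiseLayer["ReLU"], LinearLayer[nBins], SoftmaxLayer[]},
   "Input" -> 2, "Output" -> NetDecoder[{"Class", ToString /@ Range[nBins]}]];

trained = NetTrain[net, train, ValidationSet -> valid, MaxTrainingRounds -> 200];

(* Measurements on held-out data rather than on the training data *)
measurements = NetMeasurements[trained, valid, {"Accuracy", "Precision"}]
```

Replacing `ValidationSet -> valid` with `ValidationSet -> Scaled[0.3]` lets NetTrain hold out 30% of the supplied data itself, but then the held-out examples are not available as a separate list for NetMeasurements.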

I noticed that you are converting the one-hot coding to a number, not a string. Think about whether a string would make more sense. If a string is used, your problem becomes classification and your net has to be modified accordingly, especially the last layer.

  • Number? e.g., {1,0,0,0,0.......} to 1, {0,1,0,0,0.......} to 2,
  • String? e.g., {1,0,0,0,0.......} to "1", {0,1,0,0,0.......} to "2",

Adding more hidden layers does not necessarily increase prediction accuracy, as demonstrated in Stephen Wolfram's blog.

POSTED BY: Sangdon Lee
Posted 1 month ago

@SangdonLee Many thanks for your response, just one query, when I try to evaluate NetMeasurements using the following code:

validationData = trainingData2b;
accuracy = NetMeasurements[trainedNet, validationData, "Accuracy"]
precision = NetMeasurements[trainedNet, validationData, "Precision"]

I obtain the following strange results: 
<|1 -> 1., 2 -> 1., 3 -> 1., 4 -> 1., 5 -> 1., 6 -> 1., 7 -> 1., 
 8 -> 1., 9 -> 1., 10 -> 1., 11 -> 1., 12 -> 1., 13 -> 1., 14 -> 1., 
 15 -> 1., 16 -> 1., 17 -> 1., 18 -> 1., 19 -> 1., 20 -> 1., 21 -> 1.,
  22 -> 1., 23 -> 1., 24 -> 1., 25 -> 1., 26 -> 1., 27 -> 1., 
 28 -> 1., 29 -> 1., 30 -> 1., 31 -> 1., 32 -> 1., 33 -> 1., 34 -> 1.,
  35 -> 1., 36 -> 1., 37 -> 1., 38 -> 1., 39 -> 1., 40 -> 1., 
 41 -> 1., 42 -> 1., 43 -> 1., 44 -> 1., 45 -> 1., 46 -> 1., 47 -> 1.,
  48 -> 1., 49 -> 1., 50 -> 1.|>

Firstly, I don't think the accuracy could be 1. Secondly, I would have expected some real number between 0 and 1 for the precision (instead I get this strange output). Do you have any idea what is going on here?
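
For reference, "Precision" is reported per class, which is why the result is an association with one entry per label rather than a single number. A minimal sketch of collapsing such an association into one figure by unweighted (macro) averaging; the values below are hypothetical and are not the ones from the output above.

```mathematica
(* Hypothetical per-class precision values, keyed by class label *)
precisionByClass = <|1 -> 1., 2 -> 1., 3 -> 0.5|>;

(* Macro average: the unweighted mean over classes *)
macroPrecision = Mean[Values[precisionByClass]]
```

Note also that in the code above `validationData` is set to the training data itself, so the measurements describe how well the net reproduces examples it was trained on, not how well it generalizes.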

POSTED BY: Updating Name
Posted 1 month ago

The following are merely suggestions.

  • The inputs are a 100 x 2 matrix (i.e., only 2 input parameters), but the outputs are a 100 x 100 matrix (each output is a one-hot vector of 100 values), which is huge compared to the 2 predictors. It might be helpful to change the one-hot coding to a number or a string, e.g., {1,0,0,0,0.......} to 1, {0,1,0,0,0.......} to 2, etc.

    trainingData2 = Thread[trainingData[[All, 1]] -> Range[100]]
    net = NetChain[{LinearLayer[50], ElementwiseLayer["ReLU"], 
        LinearLayer[50], ElementwiseLayer["ReLU"], LinearLayer[1]}];  
  • The "net" has very few hidden layers, and it might be helpful to increase their number. Adding more layers does not necessarily increase the accuracy, however, so also try other layer types such as a batch normalization layer (deciding which layer to add may not be straightforward).

  • The last 4 samples of the "trainingData" have the same input values but the output values are different, which does not make sense.
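
A minimal sketch of how conflicting examples like these could be detected, assuming the training data is a list of rules `input -> output`. The four rules below are hypothetical stand-ins for the poster's `trainingData`.

```mathematica
(* Hypothetical training data: the last two examples share an input *)
trainingData = {{1000, 0} -> 1, {998, 2} -> 2, {0, 1000} -> 49, {0, 1000} -> 50};

(* Group outputs by input, then keep inputs mapped to more than one distinct output *)
conflicts = Select[GroupBy[trainingData, First -> Last], Length[Union[#]] > 1 &]
```

Any input that appears in `conflicts` is one the net cannot fit exactly, since the same measurement counts are labelled with different classes.
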
POSTED BY: Sangdon Lee

@SangdonLee Thanks for your response. Having implemented your suggestions, I do note an improvement. One query: do you know how to incorporate the ValidationSet option into this type of neural network to increase the accuracy and prevent overfitting? I have highlighted in purple my attempt at including ValidationSet. It does run, but I don't think it is set up in an optimal way. I attach the revised notebook.

POSTED BY: Byron Alexander