
How can I reproduce a historical neural network result?

Posted 2 years ago
POSTED BY: Veit Elser
4 Replies
Posted 2 years ago

I did not expect to get the same learned weights or values on the hidden nodes, because these things depend on how the network is seeded (using RandomSeeding -> Automatic I get different results). What intrigued me about Table 5 of RHW was the fact that the codes seemed to be ternary, with values 0, 0.5 and 1 (0.5 is a point of symmetry for the sigmoid function). After my historical reenactment of the experiment I believe RHW may have rounded the values in Table 5 to the nearest half. I also repeated the experiment with n=4 and again there was a broad distribution of intermediate values (not just 0.5).
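For anyone repeating this reenactment, here is a minimal sketch of how the hidden-unit codes might be inspected for near-ternary structure. It assumes an 8-3-8 autoencoder whose first two layers form the encoder; the names `net` and `inputs` are placeholders, not from the original post:

```mathematica
(* Placeholder sketch: "net" is a trained 8-3-8 autoencoder whose first two
   layers (linear + sigmoid) form the encoder; "inputs" are the 8 one-hot vectors *)
encoder = NetTake[net, 2];
codes = encoder /@ inputs;

(* Round each hidden value to the nearest half, as RHW may have done in Table 5,
   and measure the largest deviation from {0, 0.5, 1} *)
rounded = Round[codes, 0.5];
Max[Abs[codes - rounded]]
```

A large maximum deviation would indicate genuinely intermediate values rather than rounded ternary codes.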

POSTED BY: Veit Elser

You should not expect to get the same learned weights (and resulting hidden unit patterns) as they achieve, because the solution is not unique (you can always apply a matched linear transformation to the learned weights). RHW refer to this indirectly in the paper:

It is of some interest that the system employed its ability to use intermediate values in solving this problem. It could, of course, have found a solution in which the hidden units took on only the values of zero and one. Often it does just that, but in this instance, and many others, there are solutions that use the intermediate values, and the learning system finds them even though it has a bias toward extreme values.

POSTED BY: Joshua Schrier
Posted 2 years ago

Thanks Joshua! I was hasty when reading the guide and didn't see that sigmoids in the final layer changed the default loss. I should have guessed this when the progress monitor started plotting "error" in addition to the loss.

In the meantime I had also found that increasing MaxTrainingRounds reduced the loss substantially, with the result that the autoencoder accuracy became quite good. On the other hand, the codes generated by the encoder stage of the well-trained autoencoder still deviated from binary codes, exactly as reported by RHW. I repeated those experiments with the properly implemented mean-squared loss (following RHW). The resulting codes were non-binary to about the same extent.

POSTED BY: Veit Elser

A few salient points:

You state that you wish to use "mean-squared loss". However, your code does not do this. When an Elementwise[LogisticSigmoid] is the last layer, NetTrain will default to using the CrossEntropyLoss (see ref/LossFunction). If you wish to use a different loss function than this default, set the LossFunction -> MeanSquaredLossLayer[] option in NetTrain.
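A minimal sketch of the override (`net` and `data` are placeholder names for the autoencoder and training set from the original post):

```mathematica
(* Override the CrossEntropyLoss default that NetTrain selects when the
   network ends in Elementwise[LogisticSigmoid] *)
trained = NetTrain[net, data, LossFunction -> MeanSquaredLossLayer[]]
```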

You may need to increase the MaxTrainingRounds option in NetTrain. By default (if nothing else is specified) it will run a maximum of 10^4 batches, but this may not be enough. Try increasing this to converge the results. For example, using the default loss (CrossEntropyLoss), I found that the loss was still decreasing beyond this default; increasing MaxTrainingRounds -> 10^5 batches resulted in well-converged results.
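A sketch of raising the cap, again with `net` and `data` as placeholder names:

```mathematica
(* Allow up to 10^5 training rounds instead of the default cap *)
trained = NetTrain[net, data, MaxTrainingRounds -> 10^5]
```

The progress monitor's loss plot is a convenient way to judge whether the run has actually converged before the round limit is hit.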

(Image: example loss convergence plot)

POSTED BY: Joshua Schrier