WOLFRAM COMMUNITY

How to avoid training speed slowing with each new net trained?

Eric Mockensturm, Stable Focus Consulting

Posted 5 years ago

I have a problem in which I want to optimize the placement of sensors in addition to using a NN to predict what those sensors are "saying".

(Aside: It's kind of an interesting issue if you try to optimize both the NN and the sensor placement at the same time in a single NetGraph. The training data, or at least what is fed into the NN, changes at each iteration. I've developed a few strategies for doing this, but none work very well because there is a local minimum for each sensor combination. The best strategy seems to be decoupling the two optimization problems: finding some good sensor layouts with a GA-like strategy, and then doing brute-force hill climbing from those layouts. If anyone is interested in this or would like to discuss better strategies, let me know.)

To find good sensor layouts I run NetTrain many times. I have noticed that the input rate drops significantly with the number of training runs. At first I thought this might be because the GPUs (both a Quadro M4000 and a GTX 1050Ti) were getting hot and reducing their clock speeds. Some simple testing showed this was not the case (and the GPU loads are typically less than 20%). If I stop the training for a while (to, say, let the GPUs cool) and then start back up, the input rate does not return to what it was when the training iterations started. If, however, I restart the kernel and start training immediately, the input rate climbs back to what it was initially. Thus, it seems something is going on in the kernel or in the link between it and MXNet.

I have included a graph of the mean input rate for each network that is trained. At various points I stopped the training, but the rate continued to drop. When I restart the kernel, the rate resets and the cycle starts again (orange dots).

Thoughts?

Eric Mockensturm

Edit: Sorry, I didn't mention that this occurs with all NNs I've tried, even extremely simple ones. As an example, put a loop around this:

    NetTrain[
      NetChain[{16, Ramp, 12, Ramp, 4, SoftmaxLayer[]},
        "Output" -> NetDecoder[{"Class", Range[4]}]],
      trainingData, "ResultsObject",
      ValidationSet -> testingData,
      MaxTrainingRounds -> 12000,
      BatchSize -> 20000,
      TargetDevice -> {"GPU", 1},
      TrainingProgressReporting -> "Panel"]

and generate 20000 training points with any input size (I've tried down to 2x1) in each loop iteration. The data in the graph is simply the MeanInputsPerSecond for each iteration.
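For anyone who wants to try this, the loop described above might look roughly like the following. This is only a sketch under assumptions that are not in the post: `makeData` is a hypothetical helper that produces random 2-element inputs with random class labels from 1 to 4, the number of runs (50) and the validation set size are arbitrary, and `MaxTrainingRounds` is reduced from 12000 so the loop finishes sooner. It needs a machine with a supported GPU.

```wolfram
(* Hypothetical helper, not from the post: n random 2-element inputs,
   each paired with a random class label in 1..4 *)
makeData[n_] := Thread[RandomReal[{-1, 1}, {n, 2}] -> RandomInteger[{1, 4}, n]];

(* Same layer stack as in the post; the "Input" -> 2 size is an assumption *)
net = NetChain[{16, Ramp, 12, Ramp, 4, SoftmaxLayer[]},
   "Input" -> 2,
   "Output" -> NetDecoder[{"Class", Range[4]}]];

(* Train a fresh copy on fresh data in each run and record the mean input rate *)
rates = Table[
   Module[{trainingData = makeData[20000], testingData = makeData[2000], results},
    results = NetTrain[net, trainingData, "ResultsObject",
      ValidationSet -> testingData,
      MaxTrainingRounds -> 100, (* assumption: reduced from 12000 *)
      BatchSize -> 20000,
      TargetDevice -> {"GPU", 1},
      TrainingProgressReporting -> None];
    results["MeanInputsPerSecond"]],
   {50}];

(* A downward trend across runs would correspond to the slowdown described *)
ListPlot[rates, AxesLabel -> {"run", "inputs/s"}]
```

Running the same loop again after restarting the kernel and comparing the two `rates` lists would show whether the rate resets as described.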

POSTED BY: Eric Mockensturm

2 Replies

Eric Mockensturm, Stable Focus Consulting

Posted 5 years ago

FYI Wolfram has reproduced this and confirmed that it is a bug. Not sure if it's in Mathematica or MXNet but they hope to have it fixed "by the next release".

POSTED BY: Eric Mockensturm

Daniel Lichtblau, Wolfram Research

Posted 5 years ago

I think even people with the right expertise will be hard pressed to say much in the absence of a concrete example that elicits the behavior. That said, I doubt any of it is related to clock speeds or processors overheating.

POSTED BY: Daniel Lichtblau
