NetDecoder[ ] failed to decode NetEncoder["AudioSTFT"] output?

Posted 3 years ago
POSTED BY: John M.
6 Replies

Sorry, you should try TransposeLayer[{2, 3, 1}] in the discriminator instead of TransposeLayer[{3, 2, 1}] (which is the same as TransposeLayer[1 <-> 3]).

And TransposeLayer[{3, 1, 2}] in the generator.
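
A quick way to see what each permutation does is to apply plain Transpose to a dummy array of the same shape. This is only a sketch: it assumes TransposeLayer follows the same level convention as Transpose (level k of the input becomes level p[[k]] of the output) and that the encoder output is ordered {time, frequency, 2}. The array x is made up for illustration.

```wolfram
(* Dummy data shaped like the assumed encoder output {time, frequency, 2} *)
x = RandomReal[1, {256, 256, 2}];

(* Transpose[x, p] moves level k of x to level p[[k]] of the result *)
Dimensions[Transpose[x, {2, 3, 1}]]  (* {2, 256, 256}: channels first, time/frequency order kept *)
Dimensions[Transpose[x, {3, 2, 1}]]  (* {2, 256, 256}: same shape, but the two 256 axes are swapped *)
Dimensions[Transpose[x, {3, 1, 2}]]  (* {256, 2, 256}: the 2 lands in a spatial slot *)

(* {3, 1, 2} is the inverse of {2, 3, 1}, so the round trip restores x *)
Transpose[Transpose[x, {2, 3, 1}], {3, 1, 2}] === x
```

Because {3, 2, 1} and {2, 3, 1} give the same dimensions for a 256*256 input, a net built with either one evaluates; only the order of the two 256 axes differs.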

BTW, when I try to use your EXAMPLE2.nb, I don't understand how it can fit the dimensions. I have this error:

NetInitialize[discriminator][Audio[File["ExampleData/car.mp3"]]]
During evaluation of In[17]:= NetChain::invindata3: Data supplied to port "Input" could not be encoded; "Function" encoder did not produce an output that was a 256*256*2 array of real numbers.
Out[17]= $Failed

because indeed the NetEncoder is not producing arrays of size {256,256,2} (the first dimension varies depending on the length of the signal):

Dimensions[enc[Audio[File["ExampleData/car.mp3"]]]]
Out[18]= {2693,256,2}
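
One way to get a fixed {256, 256, 2} input from signals of different lengths is to truncate or zero-pad along the first (time) axis. The following is a minimal sketch: fixFrames is a hypothetical helper, and the random arrays only stand in for encoder output. Whether cropping or padding is appropriate for training is a separate question.

```wolfram
(* Hypothetical helper: keep at most n frames, then zero-pad up to n frames *)
fixFrames[s_?ArrayQ, n_Integer] :=
  PadRight[Take[s, UpTo[n]], Prepend[Rest[Dimensions[s]], n]]

(* Stand-ins for the encoder output of a long and a short signal *)
long = RandomReal[1, {2693, 256, 2}];
short = RandomReal[1, {100, 256, 2}];

Dimensions[fixFrames[long, 256]]   (* {256, 256, 2} *)
Dimensions[fixFrames[short, 256]]  (* {256, 256, 2} *)
```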

Do you use audio signals (FileNames["*.wav", NotebookDirectory[]]) that all have a particular length?

Also, do you get why your "c" seems to be 2 while it's 1 in the paper?

Happy to see GANs with audio in the Wolfram Language :)

Quick guess: Can you try TransposeLayer[{3, 1, 2}] instead of TransposeLayer[{1 <-> 3}] and TransposeLayer[{3 <-> 1}]?

Posted 3 years ago

For sure! I wish there were more examples of how to use NetGANOperator[] online, & I was excited when it was implemented.

I tried changing the TransposeLayer[]s from {3 <-> 1} to {3, 1, 2} & it gave this error:

NetChain::valfail: Validation failed for ConvolutionLayer: kernel size 4*4 cannot exceed input size 1*128 plus padding size 2*2.
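
The "input size 1*128" in this message is what you would expect if the array entering the convolutions were {256, 2, 256} rather than {2, 256, 256}. A sketch of the arithmetic, assuming the standard convolution size formula applies to ConvolutionLayer (convOut is a hypothetical helper, not a built-in):

```wolfram
(* Output length along one axis: size n, kernel k, stride s, padding p per side *)
convOut[n_, k_, s_, p_] := Floor[(n + 2 p - k)/s] + 1

(* A 256-long axis through six stride-2 layers with kernel 4 and padding 1 *)
NestList[convOut[#, 4, 2, 1] &, 256, 6]  (* {256, 128, 64, 32, 16, 8, 4} *)

(* A 2-long axis collapses to 1 after the first layer ... *)
convOut[2, 4, 2, 1]  (* 1 *)

(* ... and the second layer then sees 1 + 2*1 = 3 < 4, so the kernel no longer fits *)
1 + 2*1 >= 4  (* False *)
```

So a 2*256 spatial size becomes 1*128 after the first layer, and the second 4*4 kernel fails validation.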

Then I changed them from {3 <-> 1} to {3, 2, 1} & I could evaluate the nets, but I still got bad results from the generator after training. I even tried adjusting my parameters:

kern = {4, 4};
chan = 128;
α = 0.2;

& restructuring the generator & discriminator to follow the example more closely:

discriminator =
 NetChain[
  {
   TransposeLayer[{3, 2, 1}, "Input" -> {256, 256, 2}],
   ConvolutionLayer[chan, kern, "Stride" -> 2, PaddingSize -> 1],
   ParametricRampLayer[{}, "Slope" -> \[Alpha]],
   ConvolutionLayer[chan*2, kern, "Stride" -> 2, PaddingSize -> 1],
   ParametricRampLayer[{}, "Slope" -> \[Alpha]],
   ConvolutionLayer[chan*4, kern, "Stride" -> 2, PaddingSize -> 1],
   ParametricRampLayer[{}, "Slope" -> \[Alpha]],
   ConvolutionLayer[chan*8, kern, "Stride" -> 2, PaddingSize -> 1],
   ParametricRampLayer[{}, "Slope" -> \[Alpha]],
   ConvolutionLayer[chan*16, kern, "Stride" -> 2, PaddingSize -> 1],
   ParametricRampLayer[{}, "Slope" -> \[Alpha]],
   ConvolutionLayer[chan*32, kern, "Stride" -> 2, PaddingSize -> 1],
   ParametricRampLayer[{}, "Slope" -> \[Alpha]],
   ReshapeLayer[{4*4*128*32, 1}],
   LinearLayer[{}]
   }, "Input" -> enc
  ]


generator =
 NetChain[
  {
   LinearLayer[{4096*4*4}],
   ReshapeLayer[{4096, 4, 4}],
   ElementwiseLayer["ReLU"],
   DeconvolutionLayer[chan*32, kern, "Stride" -> 2, PaddingSize -> 1],
   ElementwiseLayer["ReLU"],
   DeconvolutionLayer[chan*16, kern, "Stride" -> 2, PaddingSize -> 1],
   ElementwiseLayer["ReLU"],
   DeconvolutionLayer[chan*8, kern, "Stride" -> 2, PaddingSize -> 1],
   ElementwiseLayer["ReLU"],
   DeconvolutionLayer[chan*4, kern, "Stride" -> 2, PaddingSize -> 1],
   ElementwiseLayer["ReLU"],
   DeconvolutionLayer[chan*2, kern, "Stride" -> 2, PaddingSize -> 1],
   ElementwiseLayer["ReLU"],
   DeconvolutionLayer[2, kern, "Stride" -> 2, PaddingSize -> 1],
   ElementwiseLayer[Tanh],
   TransposeLayer[{3, 2, 1}]
   },
  "Input" -> 100,
  "Output" -> dec
  ]
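
The spatial sizes in the generator can be checked by hand. This sketch assumes the standard transposed-convolution size formula applies to DeconvolutionLayer; deconvOut is a hypothetical helper, not a built-in.

```wolfram
(* Output length along one axis: size n, kernel k, stride s, padding p per side *)
deconvOut[n_, k_, s_, p_] := (n - 1) s - 2 p + k

(* A 4-long axis through six stride-2 layers with kernel 4 and padding 1 *)
NestList[deconvOut[#, 4, 2, 1] &, 4, 6]  (* {4, 8, 16, 32, 64, 128, 256} *)

(* Element counts on both sides of the reshape agree *)
4096*4*4 == Times @@ {4096, 4, 4}  (* True *)
```

Under that assumption the last DeconvolutionLayer produces a {2, 256, 256} array, so the final TransposeLayer has to turn that into the {256, 256, 2} layout the decoder expects.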

After training, though, the generator only generated noise. I'm certain it has something to do with the dimensions {256,256,2} getting somehow switched around in the net, but I don't know where/how. In the MATLAB example, the TransposeLayer[] equivalents come at the opposite ends of the generator & discriminator (i.e., BEFORE the DeconvolutionLayer[]s in the generator & AFTER the ConvolutionLayer[]s in the discriminator). I tried building the nets that way, but I get errors & can't evaluate the cells with my NetChain[]s until I do it in reverse. The dimensions in the NetChain[] box are also the reverse of the way the dimensions are outlined in the paper, e.g.,

(two screenshots, images not preserved)

I'm sure it's just a simple transposition issue. Any tips would be greatly appreciated; I'd love to get this going in Mathematica, but there are obviously some details here I'm missing.

I've included an updated EXAMPLE notebook. Thanks.

Attachments:
POSTED BY: John M.