
Cellular Automaton Neural Network Classification

In this post I'll share a Neural Network (NN) architecture capable of classifying an input image of a Cellular Automaton (CA). The input image can have any dimensions (width and height).

First, we'll create a function called RandomRule that renders the image of rule n evolved from a random initial row.

RandomRule[n_Integer, W_Integer, H_Integer] := Image[
    1 - CellularAutomaton[n, RandomInteger[1, W], H-1], 
    ImageSize -> {UpTo@256, UpTo@256}
]

For example, RandomRule[90, 50, 30] gives:

(image: the 50x30 evolution of rule 90)

The NN architecture will consist of a convolutional layer with 16 feature maps and a 2x3 kernel (height 2, width 3). This kernel size is chosen because it fits the CA rules, where the three neighbors in one row determine the center cell of the row below. A pooling layer will collapse all spatial dimensions of the result into a single tensor of size 16x1x1, which is what makes the NN size-independent; this is then flattened, and a linear layer of size 256 is used for classification, followed by a softmax.
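
As a quick illustration of that local structure (a small check of mine, not part of the architecture itself): evolving rule 90 for one step shows that every cell of the new row depends only on the three cells directly above it.

(* rule 90: each new cell is the XOR of its left and right neighbors (cyclic boundaries) *)
CellularAutomaton[90, {0, 1, 1, 0, 1}, 1]
(* {{0, 1, 1, 0, 1}, {0, 1, 1, 0, 0}} *)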

net[W_Integer, H_Integer] := NetInitialize@NetChain[{
    ConvolutionLayer[16, {2,3}],
    Ramp,
    PoolingLayer[{H,W} - {1,2}],
    FlattenLayer[],
    LinearLayer[256],
    SoftmaxLayer[]
},
    "Input" -> NetEncoder[{"Image", {W, H}, "ColorSpace" -> "Grayscale"}],
    "Output" -> NetDecoder[{"Class", Range[0, 255]}]
]

The training data consists of random CA images, each mapped to the rule that generated it.

data[W_Integer, H_Integer, n_Integer] := Table[
    RandomRule[i, W, H] -> i
, {i, RandomChoice[Range[0,255], n]}]

Let's initialize a 60x20 model, which is big enough to learn the rules but still fast to train:

trained = net[60, 20];


We will train the net:

trained = NetTrain[trained,
    data[60, 20, 256*128],
    MaxTrainingRounds -> 1000,
    BatchSize -> 256*4,
    TargetDevice -> "GPU"
]

Once the error drops below 0.5% we can stop the training; the result is saved automatically. You can re-train the model on new input simply by re-evaluating the cell above. On my machine (GTX 1050) it takes about 5 minutes to train.

Let's measure how well this NN does on new data (remember that the function data creates new data each time it is run).

cm = ClassifierMeasurements[trained, data[60, 20, 256*128]];
Quantity[100*cm["Accuracy"], "Percent"]
Output: 99.7498%

This is a very good mark for such a simple NN. I won't plot the confusion matrix because, as a 256x256 matrix, it is very confusing...

Now for the fun part: how can we use this NN to classify an input image of any size? We will recreate a NN like the previous one, but copy in the weights and biases from the trained net (I tried NetReplacePart without success...).

netTrained[W_, H_] := NetChain[{
    ConvolutionLayer[16, {2, 3},
       "Weights" -> NetExtract[trained, {1, "Weights"}], 
       "Biases" -> NetExtract[trained, {1, "Biases"}]
    ], Ramp,
    PoolingLayer[{H,W} - {1,2}],
    FlattenLayer[],
    NetExtract[trained, 5],
    SoftmaxLayer[]
},
    "Input" -> NetEncoder[{"Image", {W,H}, "ColorSpace" -> "Grayscale"}],
    "Output" -> NetDecoder[{"Class", Range[0, 255]}]
]

With the code below we can test different input sizes, where all 256 rules are evaluated on 50 random input images each. Union is used to avoid duplicate points.

Block[{W=70,H=20, netTemp, pts},
    netTemp = netTrained[W,H];
    pts = Flatten[Table[
       {i, netTemp[RandomRule[i, W,H]]}
    , {i,0,255}, {50}], 1] // Union;
    ListPlot[pts,
       FrameLabel -> {"Correct Rule", "Predicted Rule"}, 
       PlotLabel -> StringTemplate["`1` x `2`"][W,H]]
]

(plots of predicted rule vs. correct rule for several input sizes)

We can see that width matters more than height: wider images give more variation, while many rules die out quickly as they evolve, so extra height adds little information.

The NN architecture could easily be scaled down to fewer convolutional feature maps, and the output could be compressed from a 256-class one-hot encoding to 8 bits, to use fewer degrees of freedom (the current net has 4464 elements).
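
As a rough sketch of what such a compact variant might look like (my own illustration, not the net trained above; the names netCompact and dataBits and the choice of 8 feature maps are assumptions), the rule number can be predicted as 8 independent bits:

netCompact[W_Integer, H_Integer] := NetInitialize@NetChain[{
    ConvolutionLayer[8, {2, 3}],       (* fewer feature maps than the 16 used above *)
    Ramp,
    PoolingLayer[{H, W} - {1, 2}],     (* collapse all spatial dimensions, as before *)
    FlattenLayer[],
    LinearLayer[8],                    (* one output per bit of the rule number *)
    ElementwiseLayer[LogisticSigmoid]  (* probability of each bit being 1 *)
},
    "Input" -> NetEncoder[{"Image", {W, H}, "ColorSpace" -> "Grayscale"}]
]

(* the training pairs then map an image to the 8-bit representation of its rule *)
dataBits[W_Integer, H_Integer, n_Integer] := Table[
    RandomRule[i, W, H] -> IntegerDigits[i, 2, 8]
, {i, RandomChoice[Range[0, 255], n]}]

Training would use something like NetTrain[netCompact[60, 20], dataBits[60, 20, 256*128], LossFunction -> MeanSquaredLossLayer[]], and the predicted rule could be recovered with FromDigits[Round[...], 2]; whether this trains as well as the 256-class version would have to be tested.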

POSTED BY: Thales Fernandes
5 Replies

Thales, do you think your approach might give some results for the partitions example with low modulus at Successive Differences of Sequences:

http://demonstrations.wolfram.com/SuccessiveDifferencesOfSequences

Those certainly look like CAs. - George

POSTED BY: George Beck

One of the main reasons this approach works well with CA (the collapsing of all spatial information) is that there is a deep underlying structure in a CA: each cell depends locally on the cells above it. So, in a sense, the convolution is being forced to learn this underlying structure. My idea of using a pooling layer to collapse the spatial information is to get an operation similar to DeleteDuplicates.

The patterns shown in the link you provided seem to fall into this locally-dependent category, at least for low modulus as you suggest. Perhaps a bigger convolution, or a cascade of them, followed by all-spatial pooling could learn the underlying structure of the rule used to generate them. It might be worth investigating this approach in other systems.
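
Something along these lines might be a starting point (untested on those patterns; deeperNet and the class count nClasses are placeholders of mine):

deeperNet[W_Integer, H_Integer, nClasses_Integer] := NetInitialize@NetChain[{
    ConvolutionLayer[16, {3, 3}, "PaddingSize" -> 1], Ramp,  (* padding keeps the spatial size unchanged *)
    ConvolutionLayer[32, {3, 3}, "PaddingSize" -> 1], Ramp,  (* a second convolution widens the receptive field *)
    PoolingLayer[{H, W}],                                    (* collapse all spatial information, as in the CA net *)
    FlattenLayer[],
    LinearLayer[nClasses],
    SoftmaxLayer[]
},
    "Input" -> NetEncoder[{"Image", {W, H}, "ColorSpace" -> "Grayscale"}],
    "Output" -> NetDecoder[{"Class", Range[0, nClasses - 1]}]
]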

POSTED BY: Thales Fernandes

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: Moderation Team

Thales, what a neat project, thanks for sharing! Some observations:

  • There is a little typo in the function definition, I think. net[W_Integer, H_W_Integer] should probably be simply H_Integer.

  • Did you try option ValidationSet -> Scaled[0.1] for NetTrain? I wonder if that improves anything.

I don't have a GPU, so it would take too long to check. BTW, what kind of GPU did you use and how long did it take to train?

POSTED BY: Vitaliy Kaurov

Thanks Vitaliy for the typo correction.

Did you try option ValidationSet -> Scaled[0.1] for NetTrain? I wonder if that improves anything.

Since the data is generated anew each time the code is run, there is no real need for a validation set, but one could be added just to verify convergence.
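
For reference, adding it would just be the same training call as above with the extra option (my illustration):

trained = NetTrain[trained,
    data[60, 20, 256*128],
    ValidationSet -> Scaled[0.1],  (* hold out 10% of the generated data to monitor convergence *)
    MaxTrainingRounds -> 1000,
    BatchSize -> 256*4,
    TargetDevice -> "GPU"
]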

I don't have GPU, too long to check. BTW what kind of GPU did you use and how long did it take to train?

I used a GTX 1050 and it took around 5 minutes to train. The fast training time is due to the compact size of the NN, which could be made even more compact for efficiency.

POSTED BY: Thales Fernandes