# How does a neural network that only knows beauty interpret the world?

Posted 3 years ago | 8481 Views | 10 Replies | 35 Total Likes
I recently came across a video that intends to show how neural networks interpret images of (not so beautiful) things if they have only been trained on beautiful things. It is quite a nice question, I think. Here is a website describing the technique, and here is a video that illustrates the idea. In this post I will show you how to generate similar effects easily with the Wolfram Language:

and in video format:

On the right you see the "interpretation" of a neural network that has been shown lots of photos of flowers, when it actually looks at a rubbish dump with a couple of birds sitting on the rubbish.

## Devising a plan

We will need a training dataset and should hope to find a network in the Wolfram Neural Net Repository that more or less does what we want. If you have watched some of the excellent training videos on neural nets offered by Wolfram, you will have noticed that the general suggestion is not to develop your own neural networks from scratch, but rather to use what is already there and perhaps combine or adapt it so that you can achieve what you want. This is also very well described in this recent blog post by experts on the topic. I am usually happy if I can use the work of others and do not have to re-invent the wheel.

If you read the posts describing how to build a network that has only seen beautiful things, you will find that they used a variation of the pix2pix network and an implementation in TensorFlow (a "conditional adversarial network"). If you go through the extensive list of networks that are offered in the Wolfram Neural Net Repository you will see that there are Pix2pix resources, e.g.

ResourceObject["Pix2pix Photo-To-Street-Map Translation"]


or

net = ResourceObject["Pix2pix Street-Map-To-Photo Translation"]


I will use the latter resource object, but that does not actually matter. Next, we will need to build a training set.
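As a sketch (assuming the name above matches the repository listing exactly), the network object itself, which is what NetTrain below expects, can be fetched with NetModel:

(* fetch the trained net from the Wolfram Neural Net Repository by name *)
net = NetModel["Pix2pix Street-Map-To-Photo Translation"]

The ResourceObject only describes the resource; NetModel gives you the actual net that can be retrained.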

## Scraping data for the training set

The next thing we need is a solid training set. My first attempt was to use ServiceConnect with Google Custom Search to obtain lots of images of flowers:

googleCS = ServiceConnect["GoogleCustomSearch"];
imgs = ServiceExecute[googleCS,
"Search", {"Query" -> "Flowers", MaxItems -> 1000,
"SearchType" -> "Image"}];


It turns out that the maximum number of results returned is only 100, which is not enough for our purpose. I tried to work around this by using

imgs2 = ServiceExecute["GoogleCustomSearch",
"Search", {"Query" -> "Flowers", MaxItems -> 1000,
"StartIndex" -> 101, "SearchType" -> "Image"}];


but that did not work either. So WebImageSearch is the way to go. It does cost ServiceCredits, but the costs are relatively limited. Let's download information on 1000 images of flowers:

imgswebsearch = WebImageSearch["Flowers", MaxItems -> 1000];


A WebImageSearch of up to 10 results costs 3 ServiceCredits, so 1000 images means 100 calls, i.e. 300 credits. 500 credits can be bought for $3, and 5000 for $25 (+VAT). This means the generation of our training set comes in at $1.80 at most, which is manageable, particularly if we consider the price of the eGPU that we will use later on... Just in case, we export the result, because we paid for it and might have to recover it later if we suffer a kernel crash or something.
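The export mentioned above might look like this (the path is illustrative, the .mx format keeps the Wolfram expressions intact):

Export["~/Desktop/imgswebsearch.mx", imgswebsearch];

(* after a kernel crash the paid-for result can be recovered with *)
imgswebsearch = Import["~/Desktop/imgswebsearch.mx"];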

Alright. Now we have a dataset that looks more or less like this:

Great. That contains the "ImageHyperlink" which we will now use to download all the images:

rawimgs = Import /@ ("ImageHyperlink" /. Normal[imgswebsearch]);
Export["~/Desktop/rawimgs.mx", rawimgs]


Again, we export the result (better safe than sorry!). Let's make the images conform:

imagesconform = ConformImages[Select[rawimgs, ImageQ]];
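Since the pix2pix networks in the repository work on 256x256 images, it may be safer to conform to that size explicitly; a sketch using the optional size and fitting arguments of ConformImages:

(* conform all downloaded images to 256x256, fitting (and padding) rather than stretching *)
imagesconform = ConformImages[Select[rawimgs, ImageQ], {256, 256}, "Fit"];

Without the explicit size, ConformImages picks a common specification from the images themselves, which may not match what the network expects.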


By using Select[..., ImageQ] we make sure that we keep only actual images, and not the error messages from cases where the download failed.

## Generating a training set

In the original posts they suggest that they used edges, i.e. EdgeDetect to generate partial information of the images, and then linked that to the full image like so:

rules = ImageAdjust[EdgeDetect[ImageAdjust[#]]] -> # & /@ imagesconform;


It turns out that my results with that were less than impressive so I went for a more time consuming approach that gave better results. I used

Monitor[rulesnew = Table[Colorize[ClusteringComponents[rules[[i, 2]], 7]] -> rules[[i, 2]], {i, 1, Length[rules]}], i]


i.e. ClusteringComponents to generate a trainingset. Partial information on the images now looked like this:

rather than

when we use EdgeDetect. Our training data set now links the partial (ClusteringComponents) information of an image via a rule to the original image. Basically, we give partial information of the world and train the network to see flowers. Just in case, we export the data set like so:

Export["~/Desktop/rulesnew.mx", rulesnew]


## Training the network

If you want to train on the EdgeDetect version you can use:

retrainednet = NetTrain[net, rules, TargetDevice -> "GPU", TrainingProgressReporting -> "Panel", TimeGoal -> Quantity[120, "Minutes"]]


otherwise you can use

retrainednet2 = NetTrain[net, rulesnew, TargetDevice -> "GPU", TrainingProgressReporting -> "Panel", TimeGoal -> Quantity[120, "Minutes"]]
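As a variant (same options as above, plus a held-out validation set), NetTrain can reserve part of the data to monitor overfitting during the two hours of training; a sketch:

retrainednet2 = NetTrain[net, rulesnew, ValidationSet -> Scaled[0.1], TargetDevice -> "GPU", TrainingProgressReporting -> "Panel", TimeGoal -> Quantity[120, "Minutes"]]

ValidationSet -> Scaled[0.1] holds out 10% of the rules; the progress panel then shows the validation loss alongside the training loss.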


Note that I use a GPU and considerable training time (2h). On a CPU this would take quite a while. Here are typical results of the EdgeDetect network:

retrainednet[EdgeDetect[CurrentImage[], 0.7]]


and the ClusteringComponents one:
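For the ClusteringComponents network the preprocessing at evaluation time has to match the way the training set was built, i.e. something like:

retrainednet2[Colorize[ClusteringComponents[CurrentImage[], 7]]]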

We should not forget to export the network:

Export["~/Desktop/teachnwnicethings2.wlnet", retrainednet2]


## More examples

Let's look at the ClusteringComponents network a bit closer. We apply

GraphicsRow[{ImageResize[#, {256, 256}], beautifulnet[Colorize[ClusteringComponents[#, 7]]]}] &


to different images to obtain:

## Application to videos

Suppose that I have the frames of a recorded movie stored in the variable movie1. First we load our network into the variable beautifulnet:

beautifulnet = Import["/Users/thiel/Desktop/teachnwnicethings2.wlnet"]


Then the following will generate frames for an animation:

animation1 = GraphicsRow[{ImageResize[#, {256, 256}], beautifulnet[Colorize[ClusteringComponents[#, 7]]]}] & /@ movie1;


We can animate this like so:

ListAnimate[animation1]
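If you want to keep the result as a video file rather than an in-notebook animation, the list of frames can also be exported directly (path and format are illustrative):

(* write the generated frames out as a movie file *)
Export["~/Desktop/beautifulworld.mov", animation1]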


## Conclusion

These are only very preliminary results, but they show the workflow from scraping data, via generating a training set, to choosing a network and training it. I think that more images, more training, and perhaps a small change to the net might give us much better results. The video is quite bad, because we should have used a better object than "four cables in a hand". It is also somewhat debatable whether it is fair to say that this is how a network that has only seen beautiful things interprets the world, but I couldn't resist the hype. Sorry for that!

This is certainly in the realm of "recreational use of the Wolfram Language", but the network does appear to make the world more colourful and provides a very special interpretation of the world. I hope that people in this forum who are better than me at this (@Sebastian Bodenstein , @Matteo Salvarezza , @Meghan Rieu-Werden , @Vitaliy Kaurov ?) can improve on the results.

Cheers,

Marco

10 Replies
Posted 3 years ago
 Dear Marco, very nice, thanks for sharing! I am always amazed by your ideas and creativity! And as one can see this approach has practical implications, e.g. undisturbing traffic lights, soldiers in perfect camouflage, atomic flower power explosions, etc... Best regards -- Henrik
Posted 3 years ago
 Dear Henrik, thank you very much for your kind words. I have, however, not done anything other than copying what was described in the original article, using the creativity that is in the Wolfram Language. This post was more for me to explore machine learning. I only use a few applications of ML and am trying to explore more. I did find the original article quite interesting though. This was also my very first attempt at this. I will try to do the same with other sets of images etc. Thank you for your comments, Marco
Posted 3 years ago
 - Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!
Posted 3 years ago
 Fascinating, on a non-coding note. I used to paint cityscapes, and I found that merely the act of copying something down, in the imperfect realism of which I was capable, still made a beautiful image. Windows that were square, but that I could not make entirely so with my brush. Shadows that in my photo were indiscernible blobs simply became splashes of color. A neural net that could identify such ambiguities and make choices on how to represent them would produce some interesting results. Imperfections in my style, in the limits of my skill, forced me to make choices in how I represented certain things. If I were to suggest a revision to your net, it would possibly include some Gaussian filtering, though I'm not sure how a neural net would fit in.
Posted 3 years ago
 Dear Jeremy, yes, I see what you mean with the Gaussian filtering and have an idea of how to achieve that. Regarding the cityscape that you painted: it would be great to see a sample. I wonder to what extent neural networks could learn to produce aesthetically pleasing representations. Another question is whether a net could decide whether something is pleasing for humans. That, of course, has quite some applications. There is an entire industry trying to predict whether film scripts, manuscripts of books or songs will be bestsellers or not. There was a BBC programme where they tried to predict whether a song would sell well; here's another discussion of that experiment. They sort of failed to achieve what they wanted, but it is interesting anyway. I'd love to try that on my entire iTunes library, but that does not work because of copyright issues, I guess. Best wishes and thank you, Marco
Posted 3 years ago
 Wonderful work, @Marco! I will have to think about what other net architectures could do this trick. Strangely, the art and even the idea reminded me of the 2018 movie Annihilation. The images below do not really convey it; one should see the film for the photography :-)
Posted 3 years ago
 Dear Vitaliy, unfortunately I don't know that movie. I'll have a look as soon as it is available here in the UK. I will try different network architectures and also different training sets. As many of the blog entries and videos say, it is quite difficult to come up with a network from scratch, so this is about modifying existing ones; I guess that the folks at Wolfram have much more experience with this than me, though... Thank you, Marco
Posted 3 years ago
 Great post. It dramatically illustrates that neural networks work with the data they are given, and are not necessarily neutral or benign. This point was made by Cathy O'Neil in her book Weapons of Math Destruction. If the data set is biased (e.g., parole and recidivism data), then the resulting functionality will be biased. Just because it is all "mathy" doesn't make it real. The good news is that with Mathematica and the Wolfram Language, people have access to these tools, and they can learn the benefits and perils of this technology.