
How does a neural network that only knows beauty interpret the world?

I recently came across a video that intended to show how neural networks interpret images of (not so beautiful) things if they have only been trained on beautiful things. It is quite a nice question, I think. Here is a website describing the technique, and here is a video that illustrates the idea. In this post I will show you how to generate similar effects easily with the Wolfram Language:

[image: example output]

and in video format:

[image: animated example]

On the right you see the "interpretation" by a neural network that has been shown lots of photos of flowers, while it is actually looking at a rubbish dump with a couple of birds sitting on it.

Devising a plan

We will need a training dataset and should hope to find a network in the Wolfram Neural Net Repository that more or less does what we want. If you have watched some of the excellent training videos on neural nets offered by Wolfram, you will have noticed that the general suggestion is not to develop your own neural networks from scratch, but rather to use what is already there and perhaps combine or adapt it so that you can achieve what you want. This is also very well described in this recent blog post by experts on the topic. I am usually happy if I can use the work of others and do not have to re-invent the wheel.

If you read the posts describing how to build a network that has only seen beautiful things, you will find that they used a variation of the pix2pix network and an implementation in TensorFlow (a "conditional adversarial network"). If you go through the extensive list of networks offered in the Wolfram Neural Net Repository, you will see that there are Pix2pix resources, e.g.

ResourceObject["Pix2pix Photo-To-Street-Map Translation"]

or

net=ResourceObject["Pix2pix Street-Map-To-Photo Translation"]

I will use the latter resource object, but that does not actually matter. Next, we will need to build a training set.
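NetTrain later needs the actual network rather than the resource object itself. Here is a minimal sketch for fetching and inspecting it (assuming NetModel accepts the repository name used above):

```mathematica
(* fetch the pre-trained pix2pix network from the Wolfram Neural Net Repository *)
net = NetModel["Pix2pix Street-Map-To-Photo Translation"];

(* show a summary of the layers and the expected input/output image sizes *)
NetInformation[net]
```

Inspecting the input port is useful, because the training pairs we build below have to match the 256x256 image size that pix2pix expects.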

Scraping data for the training set

The next thing we need is a solid training set. My first attempt was to use ServiceConnect with the Google Custom Search service to obtain lots of images of flowers.

googleCS = ServiceConnect["GoogleCustomSearch"]
imgs = ServiceExecute["GoogleCustomSearch", 
   "Search", {"Query" -> "Flowers", MaxItems -> 1000, 
    "SearchType" -> "Image"}];

It turns out that the maximum number of results returned is only 100, which is not enough for our purposes. I tried to work around this by using

imgs2 = ServiceExecute["GoogleCustomSearch", 
   "Search", {"Query" -> "Flowers", MaxItems -> 1000, 
    "StartIndex" -> 101, "SearchType" -> "Image"}];

but that did not work. So WebImageSearch is the way to go. It does cost ServiceCredits, but the costs are relatively limited. Let's download information on 1000 images of flowers:

imgswebsearch = WebImageSearch["Flowers", MaxItems -> 1000];
Export["~/Desktop/imglinks.mx", imgswebsearch]

A WebImageSearch of up to 10 results costs 3 ServiceCredits, so 1000 images come to 300 credits. 500 credits can be bought for $3, and 5000 for $25 (+VAT). This means that generating our training set costs at most $1.80, which is manageable - particularly if we consider the price of the eGPU that we will use later on.... Just in case, we export the result, because we paid for it and might have to recover it later if we suffer a kernel crash or something.
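The back-of-the-envelope cost estimate can be checked directly in the Wolfram Language:

```mathematica
(* 3 ServiceCredits per query of up to 10 results *)
credits = Ceiling[1000/10]*3  (* 300 credits *)

(* price per credit at the 500-credits-for-$3 rate *)
costUSD = N[credits*3/500]    (* 1.8 dollars *)
```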

Alright. Now we have a dataset that looks more or less like this:

[image: preview of the search-result dataset]

Great. That contains the "ImageHyperlink" field, which we will now use to download all the images:

rawimgs = Import /@ ("ImageHyperlink" /. Normal[imgswebsearch]);
Export["~/Desktop/rawimgs.mx", rawimgs]

Again, we export the result (better safe than sorry!). Let's make the images conform:

imagesconform = ConformImages[Select[rawimgs, ImageQ]];

By using Select[..., ImageQ] we make sure that we keep only actual images, and not the error messages from cases where the download failed.
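Failed downloads can also be caught at import time rather than filtered afterwards; here is a sketch using the standard Quiet/Check idiom (safeImport is a hypothetical helper name, and the {256, 256} target size is my assumption, chosen to match the pix2pix input):

```mathematica
(* return $Failed instead of an error expression when a download fails *)
safeImport[url_] := Quiet[Check[Import[url], $Failed]];

rawimgs = safeImport /@ ("ImageHyperlink" /. Normal[imgswebsearch]);

(* keep only genuine images and conform them to the network's input size *)
imagesconform = ConformImages[Select[rawimgs, ImageQ], {256, 256}];
```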

Generating a training set

In the original posts the authors used edges, i.e. EdgeDetect, to generate partial information about the images, and then linked that to the full image like so:

rules = ImageAdjust[EdgeDetect[ImageAdjust[#]]] -> # & /@ imagesconform;

It turns out that my results with that were less than impressive, so I went for a more time-consuming approach that gave better results. I used

Monitor[rulesnew = Table[Colorize[ClusteringComponents[rules[[i, 2]], 7]] -> rules[[i, 2]], {i, 1, Length[rules]}];, i]

i.e. ClusteringComponents, to generate a training set. The partial information on the images now looked like this:

[image: ClusteringComponents input]

rather than

[image: EdgeDetect input]

when we use EdgeDetect. Our training data set now links the partial (ClusteringComponents) information via a rule to the original image. Basically, we give the network partial information about the world and train it to see flowers. Just in case, we export the data set like so:

Export["~/Desktop/rulesnew.mx", rulesnew]
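For reference, the same training pairs can be generated more concisely by mapping over the conformed images directly; makePair is just a hypothetical helper name, and this version skips the Monitor progress display:

```mathematica
(* partial information (7 colour clusters) -> full image *)
makePair[img_] := Colorize[ClusteringComponents[img, 7]] -> img;
rulesnew = makePair /@ imagesconform;
```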

Training the network

If you want to train on the EdgeDetect version you can use:

retrainednet = NetTrain[net, rules, TargetDevice -> "GPU", TrainingProgressReporting -> "Panel", TimeGoal -> Quantity[120, "Minutes"]]

otherwise you can use

retrainednet2 = NetTrain[net, rulesnew, TargetDevice -> "GPU", TrainingProgressReporting -> "Panel", TimeGoal -> Quantity[120, "Minutes"]]

Note that I use a GPU and considerable training time (2 hours). On a CPU this would take quite a while. Here are typical results of the EdgeDetect network:

retrainednet[EdgeDetect[CurrentImage[], 0.7]]

[image: EdgeDetect network result]

and the ClusteringComponents one:

[image: ClusteringComponents network result]

We should not forget to export the network:

Export["~/Desktop/teachnwnicethings2.wlnet", retrainednet2]

More examples

Let's look at the ClusteringComponents network a bit more closely. We apply

GraphicsRow[{ImageResize[#, {256, 256}], beautifulnet[Colorize[ClusteringComponents[#, 7]]]}] &

to different images to obtain:

[image: example outputs]

Application to videos


Suppose that I have the frames of a recorded movie stored in the variable movie1. First, we load our trained network into the variable beautifulnet:

beautifulnet = Import["/Users/thiel/Desktop/teachnwnicethings2.wlnet"]

Then the following will generate frames for an animation:

animation1 = GraphicsRow[{ImageResize[#, {256, 256}], beautifulnet[Colorize[ClusteringComponents[#, 7]]]}] & /@ movie1;

We can animate this like so:

ListAnimate[animation1]

[image: animated result]
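If you want to share the result outside a notebook, the list of frames can also be exported as an animated GIF; the file path and the frame duration below are assumptions on my part:

```mathematica
(* write the frames as an animated GIF at roughly 10 frames per second *)
Export["~/Desktop/beautiful.gif", animation1, "DisplayDurations" -> 0.1]
```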

Conclusion

These are only very preliminary results, but they show the workflow from scraping data, via generating a training set, to choosing a network and training it. I think that more images, more training, and perhaps a small change to the net might give us much better results. The video is quite bad, because we should have used a better object than "four cables in a hand". It is also a bit debatable whether it is OK to say that this is how a network that has only seen beautiful things interprets the world, but I couldn't resist the hype. Sorry for that!

This is certainly in the realm of "recreational use of the Wolfram Language", but the network does appear to make the world more colourful and provides a very special interpretation of it. I hope that people in this forum who are better at this than me (@Sebastian Bodenstein , @Matteo Salvarezza , @Meghan Rieu-Werden , @Vitaliy Kaurov ?) can improve on the results.

Cheers,

Marco

POSTED BY: Marco Thiel
10 Replies

Great post. It dramatically illustrates that neural networks work with the data they are given, and are not necessarily neutral or benign.

This point was made by Cathy O'Neil in her book Weapons of Math Destruction. If the data set is biased (e.g., parole and recidivism data), then the resulting functionality will be biased. Just because it is all "mathy" doesn't make it real.

The good news is that with Mathematica and the Wolfram Language, people have access to these tools, and they can learn the benefits and perils of this technology.

That is a very (!) interesting thought that I had not considered. As you say, it is clear and documented that different machine learning approaches make decisions that can be very biased when the training data are biased.

The good news is that with Mathematica and the Wolfram Language, people have access to these tools, and they can learn the benefits and perils of this technology.

Yes, that is undoubtedly true. But are "people" exploring this sufficiently? Not too long ago I saw this video, which uses what is known as Deep Fake. There is more of an explanation here and another video here. It appears that there is a new arms race developing: one faction produces fake videos and another tries to recognise that a video is fake. I guess that the Wolfram Language is very useful for exploring and perhaps applying these techniques, but "information" from a fake video will spread really fast on social media. A proof that it was fake cannot fix that.

Is the Wolfram Language the tool to figure these things out? Is it a tool to teach pupils/students these techniques so as to avoid the problems? Will the Wolfram Language, or Mathematica 18 or something, allow people without much technical knowledge to calculate these things, ideally with voice commands? (A little bit like Siri now solves sets of nonlinear equations on request, based on Wolfram|Alpha.) Or will it allow a larger part of society to achieve a level of mathematical expertise and computational thinking such that we have the technology to figure out what is fact and what is fiction?

I am sorry, but I digress.

Thanks a lot,

Marco

POSTED BY: Marco Thiel

You are probably right, but I am still optimistic. We have ample evidence that it is much easier to misuse statistical software (medical research and sociology, I'm looking at you) than it is to do careful science. The same can be said for any powerful tool.

The personal computer was invented mostly because there was a small group of enthusiasts who wanted their own computer, rather than having to jump through hoops to use some company's or college's iron. I know, because I was one of them, although I was an early adopter of the technology rather than an inventor. As a percentage, there were not that many of us, but without this seed there would not have been the personal computer as we know it. It took a 'killer app' (VisiCalc) for business to take notice, and then computers became more mainstream.

I think that Mathematica could be the "killer app" of the twenty-first century. The percentage of people using it is very small at the moment, but if we get the breaks, the ideas generated by Mathematica users could have a profound effect on society in general.

There are some minor tweaks to the program that would facilitate things. Wolfram|Alpha is close to the interface needed for people without technical knowledge. I think that there is the beginning of the transition from W|A to Wolfram Language already in place.

Mathematical expertise is needed at several levels. However, as I learned by experience, the most difficult step for non-techies is to realize that a problem, properly understood, can be modeled with mathematics.

Despite recent evidence, I still have faith....

Wonderful work, @Marco ! I will have to think about what other net architectures could do this trick. Strangely, the art and even the idea reminded me of the 2018 movie Annihilation. The images below do not really convey it; one should see the film for the photography :-)

[images: stills from Annihilation]

POSTED BY: Vitaliy Kaurov

Dear Vitaliy,

unfortunately I don't know that movie. I'll have a look as soon as it is available here in the UK.

I will try different network architectures and also different training sets. As many of the blog entries and videos say, it is quite difficult to come up with a network from scratch, so this is about modifying existing ones - and I guess that the folks at Wolfram have much more experience with this than I do...

Thank you,

Marco

POSTED BY: Marco Thiel

Fascinating, on a non-coding note. I used to paint cityscapes, and I found that merely the act of copying something down, in the imperfect realism of which I was capable, still made a beautiful image. Windows that were square, but that I could not make entirely so with my brush. Shadows that in my photo were indiscernible blobs simply became splashes of color. A neural net that could identify such ambiguities and make choices about how to represent them would produce some interesting results. Imperfections in my style, in the limits of my skill, forced me to make choices in how I represented certain things.

If I were to suggest a revision to your net, it would possibly include some Gaussian filtering, though I'm not sure how it would fit into a neural net.

POSTED BY: Jeremy Sykes

Dear Jeremy,

yes, I see what you mean with the Gaussian filtering and have an idea of how to achieve that. Regarding the cityscape that you painted: it would be great to see a sample. I wonder to what extent neural networks could learn to produce aesthetically pleasing representations. Another question is whether a net could decide whether something is pleasing to humans. That, of course, has quite some applications: there is an entire industry trying to predict whether film scripts, book manuscripts or songs will become bestsellers.

There was a BBC program where they tried to predict whether a song would sell well; here's another discussion of that experiment. They sort of failed to achieve what they wanted, but it is interesting anyway.

I'd love to try that on my entire iTunes library, but that does not work because of copyright issues, I guess.

Best wishes and thank you,

Marco

POSTED BY: Marco Thiel

Congratulations! This post is now a Staff Pick, as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD

Dear Marco,

very nice, thanks for sharing! I am always amazed by your ideas and creativity! And as one can see, this approach has practical implications, e.g. non-disturbing traffic lights, soldiers in perfect camouflage, atomic flower-power explosions, etc...

Best regards -- Henrik

POSTED BY: Henrik Schachner

Dear Henrik,

thank you very much for your kind words. I have, however, not done anything other than copy what was described in the original article, using the creativity that is in the Wolfram Language.

This post was more for me to explore machine learning. I only use a few applications of ML and am trying to explore more. I did find the original article quite interesting, though. This was also my very first attempt at this; I will try to do the same with other sets of images, etc.

Thank you for your comments,

Marco

POSTED BY: Marco Thiel