Message Boards Message Boards

What Game of Thrones character do you resemble?

GROUPS:

Hi all,

First time community poster here! I'll start off by saying, I am by no means an expert in machine learning. I dabble with the basic tutorials and training options offered by Wolfram, but when it comes to the guts of this stuff, I'm brand new. I say this as I think my project here is a good proof of concept on how easy it is to pick up machine learning with the Wolfram language. (Feel free to skip to the bottom of the post for a link to my notebook.)

Initial testing

To start out, I simply looked into the basic machine learning examples provided around our Mathematica 11 release, linked here. The Create a Legendary Creature Recognizer in particular is what I based my initial testing off of.

To start, I narrowed it down to two relatively different but very popular characters, Daenerys Targaryen and Jon Snow. Similar to the format in the creature recognizer, I simply imported images, associated them to the character name, and plugged this into my Classify function. To test it's capabilities, I used the "Top Probabilities" call to see with what confidence it was assigning characters.

Like I said, I'm new to machine learning, so this was a learning process for me. To my surprise, this solution was rather messy and just not at the level of accuracy I was looking for. It struggled with actors/actresses out of costume, was polluted by the background, and just didn't have enough images to be accurate.

This is what lead me to ultimately focus on 3 things when training my classifier function and eventually my neural networks. 1) Images of characters should include the actors or actresses both in and out of costume. 2) The FindFaces functions needed to be used to remove any background or outfit pollution. 3) The amount of images for each character needed to be as large as possible.

Collecting images for my dataset

Once I was able to mast extracting images using FindFaces, it was time to do some heavy internet image searching to get the relevant images. I decided to limit my identifier to just 7 characters: Dany, Jon, Tyrion, Cersei, Jaime, Arya, and Sansa (apologies to all of the Davos fans!) As mentioned before, I did specific searches for characters throughout the seasons as well as actors/actresses outside of costume (as well as additional searches for different hairstyles, facial hair, glasses, age, etc.)

To speed up this process, I first created a quick image tester to be sure that each of the images I pulled from my searches was being pulled correctly without having to test the entire dataset all at once. For example, I would test all of the season 1 images I found of Dany at one time. You can see what this looks like below.

enter image description here

testImgs = {}
i = 1;
testFaces = {};
While[i < Length[testImgs] + 1,
 testFace = ImageTrim[testImgs[[i]], #] & /@ FindFaces[testImgs[[i]]];
 AppendTo[testFaces, ImageResize[testFace[[1]], Tiny]];
 i++;
 ]
testFaces

For proof of concept, the image includes a couple common errors that I found. In some cases, The image would not be able to find any faces per low-quality or just odd lighting/angles, in which would result in the error. In other cases, images would pick up background noise that resembled a face (not provided in this example). You can see in the last picture, another character's face was found first. These images would need to be removed from the set before being added to the full dataset. As I went along, it became easier to weed out images that would likely be unusable. You can see an example of what these image sets would look like below. Jaime's in particular was my smallest.

enter image description here

The final step in this process was to make the appropriate associations for the images and provide a list that could be used by the Classify function. You can see the code for this below. To summarize, I made separate lists of images for each character, then iterated through those list to associate them with the appropriate character name then add that associated object to my master list.

faceRec = {};
i = 1;
While[i < Length[dt] + 1,
 dtface = ImageTrim[dt[[i]], #] & /@ FindFaces[dt[[i]]];
 AppendTo[faceRec, ImageResize[dtface[[1]], {100, 100}] -> "Dany"];
 i++;
 ]
i = 1;
While[i < Length[js] + 1,
 jsface = ImageTrim[js[[i]], #] & /@ FindFaces[js[[i]]];
 AppendTo[faceRec, ImageResize[jsface[[1]], {100, 100}] -> "Jon"];
 i++;
 ]
i = 1;
While[i < Length[tl] + 1,
 tlface = ImageTrim[tl[[i]], #] & /@ FindFaces[tl[[i]]];
 AppendTo[faceRec, ImageResize[tlface[[1]], {100, 100}] -> "Tyrion"];
 i++;
 ]
i = 1;
While[i < Length[cl] + 1,
 clface = ImageTrim[cl[[i]], #] & /@ FindFaces[cl[[i]]];
 AppendTo[faceRec, ImageResize[clface[[1]], {100, 100}] -> "Cersei"];
 i++;
 ]
i = 1;
While[i < Length[jl] + 1,
 jlface = ImageTrim[jl[[i]], #] & /@ FindFaces[jl[[i]]];
 AppendTo[faceRec, ImageResize[jlface[[1]], {100, 100}] -> "Jaime"];
 i++;
 ]
i = 1;
While[i < Length[as] + 1,
 asface = ImageTrim[as[[i]], #] & /@ FindFaces[as[[i]]];
 AppendTo[faceRec, ImageResize[asface[[1]], {100, 100}] -> "Arya"];
 i++;
 ]
i = 1;
While[i < Length[ss] + 1,
 ssface = ImageTrim[ss[[i]], #] & /@ FindFaces[ss[[i]]];
 AppendTo[faceRec, ImageResize[ssface[[1]], {100, 100}] -> "Sansa"];
 i++;
 ]
faceRec = RandomSample[faceRec, Length[faceRec]];

Per a note from one of our developers, I also chose to randomize the list of associations, as our neural network framework down assume a random order when processing. This didn't provide any noticeable difference in my dataset, but I thought it was an important thing to at least note.

Using the Classify function

With my images associated, I was ready to jump into creating my ClassifierFunction. With a single line of code, I was able to do so, as seen below.

gotFaces = Classify[faceRec]

To test my model, I pulled in some new images and used the same logic as before to create a testing set. I did try to find images in costume as well as in real life, with different hairstyles, and at different ages to provide some more variety in my testing.

enter image description here

test = {}
i = 1;
testFaces = {};
While[i < Length[test] + 1,
 testFace = ImageTrim[test[[i]], #] & /@ FindFaces[test[[i]]];
 AppendTo[testFaces, ImageResize[testFace[[1]], {100, 100}]];
 i++;
 ]
testFaces

I will note, that doing this same process to this point with just Jon and Dany worked REALLY well. To my surprise, though, it just wasn't quite as promising with a larger cast of characters. Some characters were correct with relatively high confidence, some were right but with a split between a few characters, and some were wrong but at least had the correct character in close proximity to the top character.

Manipulate[gotFaces[character, "TopProbabilities"], {character, testFaces}]

enter image description here enter image description here enter image description here

This had me troubleshooting like crazy. I will say that simply duplicating the associations already in the list several times actually improved the accuracy quite a bit, which was expected. With this in mind, I decided to try my hand at our neural network functionality.

Using the NetChain and NetTrain functions

I'll say it once, and I'll say it again. I am by no means an expert in neural networks. It's a big topic in tech now, so I've developed a basic conversational understanding of the concept. However, this was my first time actually creating my own neural network in any language.

The process was relatively simple, much to my surprise, thanks to Wolfram's vast amount of documentation. Per some research, I learned about "convolutional" neural nets and found a similar image example of this in our documentation pages for NetTrain. Using this example, I made some minor adjustments to the NetChain to fit this to my own data and was well on my way to training my first neural net! You can see what this ended up looking like below. The variable "classes" simply being all of the potential characters.

lenet = NetChain[
  {ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2], 
   ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2], FlattenLayer[], 
   500, Ramp, 7, SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", classes}],
  "Input" -> NetEncoder[{"Image", {100, 100}}]
  ]

From here, the NetTrain function is super simple. You can see an image of what this looks like in motion below with real-time training progress as the program runs. For larger sets of data, there is also the option to add a time goal. For mine, I simply allowed it to run for the full 10-15 minutes. It's certainly interesting to be able to actually see the loss function change as the data set runs.

trained = NetTrain[lenet, faceRec]

enter image description here

This method did still have its errors but the accuracy as well as the confidence in the selections did seem to be significantly higher. I saw much more examples of split "TopProbabilities" for the wrongly classified images, as with the Dany and Arya pictures below, but also a much larger level of correctly classified images with 90%+ as their "TopProbabilities" return, as with the Jon and Sansa pictures below.

Manipulate[trained[character, "TopProbabilities"], {character, testFaces}]

enter image description here enter image description here enter image description here enter image description here

As with the Classify function, adding more images (even duplicates) to the dataset did provide even more accurate results and confidence to the neural net model. Ideally, if using this for an actual application, I would have a more elegant way of importing the images in for the NetTrain. A suggestion from a meeting I sat in recently was to explore NetEncoder. I also got the vibe that there are some other options in the pipeline to support this, specifically with large numerical datasets.

Real-time character matching

Moving on to the fun part that you've all been waiting for! I started out by using the same logic as with my character image import to pull in images from LinkedIn of my coworkers to see how the ClassifierFunction and NetChain handled these outside inputs.

enter image description here

wolfram = {}
i = 1;
wolframFaces = {};
While[i < Length[wolfram] + 1,
 wolframFace = ImageTrim[wolfram[[i]], #] & /@ FindFaces[wolfram[[i]]];
 AppendTo[wolframFaces, ImageResize[wolframFace[[1]], {100, 100}]];
 i++;
 ]
wolframFaces

I understand that this is not exactly what deep learning is intended for, but it's a fun little application to try to trick the trained sets to force us into a character. The models definitely vary on what character they give to each coworker, so it is noticeable that their methods are likely quite different. However, it did give a highly odd confidence for some of my coworkers in both cases.

Manipulate[
 gotFaces[employee, "TopProbabilities"], {employee, wolframFaces}]

enter image description here

And finally, the quiz you've all been waiting for. THE LIVE IDENTIFIER! You can see a quick image of what this looks like below. It uses the camera on your computer to allow you to grab a live image of yourself and run this through the ClassifierFunction and NetChain.

Panel[
 Column[{
   Row[{ImageCapture[ImagingDevice -> $ImagingDevice]}],
   Row[{
     Button[TextCell["Grab Image", FontSize -> 24], 
      img = CurrentImage[ImageSize -> 900], ImageSize -> {337, 50}, 
      Alignment -> Center]
     }],
   Row[{TextCell["Captured Faces", Bold, FontSize -> 24]}, 
    Alignment -> {Top, Center}],
   Row[{Dynamic[currentFace = ImageTrim[img, #] & /@ FindFaces[img]]}],
   Row[{TextCell["Classify function ID", Bold, FontSize -> 24]}],
   Row[{Dynamic[
      Style[gotFaces[currentFace, "TopProbabilities"], 
       FontSize -> 16]]}],
   Row[{TextCell["Neural net ID", Bold, FontSize -> 24]}],
   Row[{Dynamic[
      Style[trained[currentFace, "TopProbabilities"], 
       FontSize -> 16]]}]
   }
  ], ImageSize -> 355]

enter image description here

It's interesting to use this "live" model to start exploring how the input images are affecting the final trained sets. I noticed that although I tended to lean towards a small group of characters (especially in the Neural net ID), some particular factors could vary the outputs. Image quality (per camera or amount of cropping from not being close enough), head tilt, hair in face, facial expressions, etc. did provide some level of variation in my quick analysis.

Closing thoughts

Although not a perfect model, it's been really reassuring to know the level of accuracy obtained on such a small dataset and with very little knowledge of the appropriate neural networks to be used with this type of analysis. Being able to jump right in per the support of the Wolfram documentation as well as the ease of built-in functions made it SUPER easy to build such a model from scratch.

If I were looking to develop further, I would like to explore more about neural networks geared towards facial recognition as well as better way to collect as well as import a vast dataset of even more characters. It would be my hope that this would 1) better identify certain defining features and 2) eliminate the variations that we see more obviously in the live camera model.

It's not perfect, but what a fun proof of concept and a great learning experience! You can download all of the files at the following links: full notebook, trained NN, and webcam app. Specifically, the GOTFacesApplication is a more simplistic model that imports the .wlnet model vs. having you have to sit and wait for the full project notebook to evaluate. Since some of the Import/Export on the WLNet functionality is still "experimental", I hit some snags on creating a CDF to be used with our free CDF player (my apologies to those who haven't made the jump to Mathematica yet!)

Hope you all enjoy!

POSTED BY: Sam Tone
Answer
1 month ago

That's really neat! Thanks for sharing. How many faces did you use for learning?

POSTED BY: Sander Huisman
Answer
1 month ago

Hi Sander -- Thanks for the positive feedback! The list had 1429 images total. I think the breakdown was about 250 for Dany, 250 for Jon, 200 for Tyrion, 175 for Cersei, 150 for Jaime, 200 for Arya, and 225 for Sansa (obviously including in/out of character photos).

POSTED BY: Sam Tone
Answer
1 month ago

Thanks for sharing!How did you get the images? WebImageSearch? or manually? or… ?

POSTED BY: Sander Huisman
Answer
1 month ago

I did them manually through drag and drop as to not use the WebImageSearch credits, which can be limited. It also made it a bit easier to pull images that I knew would probably pass my FindFaces iterator.

POSTED BY: Sam Tone
Answer
1 month ago

Sam, Thank you for an interesting post. If you don't have enough images, it might be a good idea to use the pre-trained model to build a classifier(Transfer Learning). There is an example in Wolfram Neural Net Repository. There are many examples besides ResNet-50 in Wolfram Neural Net Repository. I'm sorry I do not know which is better.

POSTED BY: Kotaro Okazaki
Answer
1 month ago

What are the steps for Transfer Learning in this case? Sam's net was trained just on a few classes of images which are exactly the GoT characters. ResNet-50 was trained on dataset "consisting of 1.2 million training images, with 1000 classes of objects." How do we squeeze 1000 classes of objects into just a few of GoT?

POSTED BY: Sam Carrettie
Answer
1 month ago

I understand Transfer Learning as follows.

・ Adapt a model trained in an area where data can be widely applied to another area where data collection is difficult

・ Lower levels of CNN seem to learn general-purpose features

・ Apply to the new domain by learning only the part close to the output

So, the steps in this case are written in the section Transfer Learning of the example as follows.

・ Remove the linear layer from the pre-trained net

・ Create a new net composed of the pretrained net followed by a linear and softmax layer(LinearLayer[6] in this case)

・ Train on the dataset, freezing all the weights except for new layer

POSTED BY: Kotaro Okazaki
Answer
1 month ago

Yes, Kotaro, thank you. I have also read about this recently. I also think ResNet-101 will be more useful as it is trained on faces.

POSTED BY: Sam Carrettie
Answer
1 month ago

This has been a super helpful conversation. Thanks for the input, Sam and Kotaro!

POSTED BY: Sam Tone
Answer
1 month ago

Group Abstract Group Abstract