
Facial gesture detection with Classify

Posted 5 years ago

There has been a lot of interest lately in quickly and conveniently detecting users' facial gestures from low-quality image feeds. Here I am going to take a stab at doing this in Mathematica. Check out the YouTube video for a walk-through:

Facial Gesture Detection using Mathematica 10


I will take a series of images from our connected webcam and then train a machine-learning classifier to identify specific facial gestures from those images. The proper way to control how fast images are captured from the camera is the device property "FrameRate". My understanding is that this controls how often the underlying ScheduledTask runs. In my testing, with just one person in the camera's field of view, this little demo tops out at around 15 FPS.

camera = DeviceOpen["Camera"];
camera["FrameRate"] = 15;
camera["RasterSize"] = {480, 340};
camera["Timeout"] = 900;
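Since the rate you actually get depends on the machine and on how much work is done per frame, it can help to measure it directly instead of relying only on the "FrameRate" setting. The sketch below is not part of the original demo: estimateFPS is a hypothetical helper that simply times one call of a frame-grabbing function that returns n frames.

```mathematica
(* Hypothetical helper: estimate frames per second by timing one call that
   returns n frames. grab is any function of n that returns a list of frames,
   for example CurrentImage, which returns a list of n webcam images *)
estimateFPS[grab_, n_Integer?Positive] :=
 Module[{seconds, frames},
  {seconds, frames} = AbsoluteTiming[grab[n]];
  N[Length[frames]/seconds]]
```

With the camera configured as above, estimateFPS[CurrentImage, 30] gives a rough figure to compare against the 15 FPS setting.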

Take 30 frames of me smiling from my webcam and store them, then show the first 10 images.

smiles = CurrentImage[30];
smiles[[1 ;; 10]]


Do the same thing for a neutral facial expression.

 neutral = CurrentImage[30];
 neutral[[1 ;; 10]]


This function pairs every element of a list with a class label, producing the list of example -> label rules that Classify accepts as training data.

createAssociation[dataSet_, feature_] :=
 Table[
    dataSet[[i]] -> feature,
     {i, 1, Length[dataSet]}]
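The same list of rules can be built more idiomatically with Thread, and Classify also accepts its training data as an Association from each class label to a list of examples, which avoids building the rules at all. A minimal sketch; labelAll is a hypothetical name intended to behave like createAssociation:

```mathematica
(* Hypothetical equivalent of createAssociation: Thread turns
   {ex1, ex2, ...} -> label into {ex1 -> label, ex2 -> label, ...} *)
labelAll[dataSet_List, label_] := Thread[dataSet -> label]

labelAll[{"a", "b", "c"}, "Smile"]
(* {"a" -> "Smile", "b" -> "Smile", "c" -> "Smile"} *)

(* Association form of the training data, for comparison:
   Classify[<|"Smile" -> takeFaces[smiles], "Neutral" -> takeFaces[neutral]|>] *)
```

One caveat: if the label were itself a list, Thread would try to thread over it as well, so this assumes string labels like the ones used in this post.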

Here is a function that trims the face out of each image in a set, for analysis; images in which no face is detected are skipped.

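takeFaces[dataSet_List] :=
 Join @@ Map[
   Function[img,
    (* optionally restrict detection to faces 50-150 pixels across: FindFaces[img, {50, 150}] *)
    Module[{faces = FindFaces[img(*, {50, 150}*)]},
     (* keep the first detected face; returning {} drops frames with no face *)
     If[faces === {}, {}, {ImageTrim[img, First[faces]]}]]],
   dataSet]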
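FindFaces can return several boxes, for example when someone walks behind the user, and takeFaces keeps whichever box is listed first. If that is a problem, one option is to keep the largest box instead. This is a hypothetical variation rather than what the post does; boxArea and largestFace are illustrative names.

```mathematica
(* Area of a FindFaces box {{xmin, ymin}, {xmax, ymax}} *)
boxArea[{{x1_, y1_}, {x2_, y2_}}] := (x2 - x1) (y2 - y1)

(* Hypothetical helper: the box with the largest area; assumes a non-empty list *)
largestFace[boxes_List] := Last[SortBy[boxes, boxArea]]

largestFace[{{{0, 0}, {10, 10}}, {{5, 5}, {50, 60}}}]
(* {{5, 5}, {50, 60}} *)
```

For an image in which at least one face was found, ImageTrim[img, largestFace[FindFaces[img]]] would then trim the largest face.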

Now create two separate training sets, each a list of image -> label rules built from the gathered images after trimming the faces out of them. Show the first ten elements of both data sets.

smileData = createAssociation[takeFaces[smiles], "Smile"];
neutralData = createAssociation[takeFaces[neutral], "Neutral"];
smileData[[1 ;; 10]]
neutralData[[1 ;; 10]]
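Since takeFaces is meant to skip frames in which no face is detected, the two classes may not end up with the same number of examples. A quick check, using a hypothetical helper classCounts that tallies the labels in a list of rules:

```mathematica
(* Hypothetical helper: number of examples per label in a list of example -> label rules *)
classCounts[data_List] := Counts[Values[data]]

classCounts[{"x1" -> "Smile", "x2" -> "Smile", "x3" -> "Neutral"}]
(* <|"Smile" -> 2, "Neutral" -> 1|> *)
```

With the data above, classCounts[Join[smileData, neutralData]] shows how many frames survived for each gesture.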


Create a classifier from the gathered images, using the "NeuralNetwork" method and setting the performance goal to "Quality".

c1 = Classify[Join[smileData, neutralData], Method -> "NeuralNetwork",
   PerformanceGoal -> "Quality"]
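With only about 30 frames per class, it is worth checking the classifier on frames it was not trained on. One way, sketched here under the assumption that the rule lists above are available, is to shuffle each class, train on part of it, and measure accuracy on the rest with ClassifierMeasurements. splitData and holdoutAccuracy are hypothetical names and not part of the original workflow.

```mathematica
(* Hypothetical helper: shuffle a list of example -> label rules and split it into
   a training part (fraction frac of the rules) and a held-out test part *)
splitData[data_List, frac_?NumericQ] :=
 Module[{shuffled = RandomSample[data], n},
  n = Round[frac Length[data]];
  {Take[shuffled, n], Drop[shuffled, n]}]

(* Train on the training part of every class, report accuracy on the held-out parts *)
holdoutAccuracy[classData_List, frac_?NumericQ] :=
 Module[{splits, classifier},
  splits = splitData[#, frac] & /@ classData;
  classifier = Classify[Join @@ splits[[All, 1]],
    Method -> "NeuralNetwork", PerformanceGoal -> "Quality"];
  ClassifierMeasurements[classifier, Join @@ splits[[All, 2]], "Accuracy"]]
```

For example, holdoutAccuracy[{smileData, neutralData}, 0.7] trains on roughly 70% of each class and tests on the remaining 30%.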

This is the main function, which labels the faces in an image. We use Fold to apply ImageCompose to the original image repeatedly, nesting the calls like so: ImageCompose[ImageCompose[img, label1, pos1], label2, pos2]. This means it can handle as many faces as necessary in the future, not just one. You can edit the styling options here if you want the text styled differently; I didn't generalize those options out, but that part of the function is easy to modify.

imageLabel[image_, labelFunc_] :=
 Module[{img = image, faces},
  faces = FindFaces[img]; 
  If[Flatten[faces] === {}, img, 
   Fold[ImageCompose[#1, Sequence @@ #2] &, 
    HighlightImage[img, Join[fullRectangleInLines /@ faces], 
     "HighlightColor" -> 
      Blue], {Graphics[
        Text[Style[labelFunc[ImageTrim[img, #]], Black, Bold, Italic, 
          16, Background -> White]]], (First[#] + {(
          First[Last[#]] - First[First[#]])/2, -25})} & /@ faces]]]
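To see what the Fold in imageLabel builds, it helps to run the same pattern on symbolic stand-ins (compose, img, label1 and so on below are placeholders, not real images). The second part isolates the label-placement arithmetic as a hypothetical helper, labelPosition: FindFaces boxes are {{xmin, ymin}, {xmax, ymax}} in image coordinates, where y increases upward from the bottom edge, so the label is centered horizontally and placed 25 pixels below the bottom of the box.

```mathematica
(* Each {overlay, position} pair adds one more level of nesting *)
Fold[compose[#1, Sequence @@ #2] &, img, {{label1, pos1}, {label2, pos2}}]
(* compose[compose[img, label1, pos1], label2, pos2] *)

(* Same arithmetic as First[#] + {(First[Last[#]] - First[First[#]])/2, -25} *)
labelPosition[{{x1_, y1_}, {x2_, y2_}}, offset_] := {x1 + (x2 - x1)/2, y1 - offset}

labelPosition[{{100, 120}, {220, 260}}, 25]
(* {160, 95} *)
```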

This is a helper function that takes a bounding box produced by FindFaces, {{xmin, ymin}, {xmax, ymax}}, and returns a single closed Line through its four corners (four segments), which draws the box around the face.

fullRectangleInLines[positions_] :=
 Line[{
   {First[First[positions]], Last[First[positions]]},
   {First[First[positions]], Last[Last[positions]]},
   {First[Last[positions]], Last[Last[positions]]},
   {First[Last[positions]], Last[First[positions]]},
   {First[First[positions]], Last[First[positions]]}
   }]
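Destructuring the box with a pattern makes the corner order easier to read. A sketch of an equivalent definition under the hypothetical name boxOutline; it should return the same Line as fullRectangleInLines:

```mathematica
(* Pattern-based equivalent of fullRectangleInLines: name the corners once, then
   walk around the box and back to the start so the outline is closed *)
boxOutline[{{x1_, y1_}, {x2_, y2_}}] :=
 Line[{{x1, y1}, {x1, y2}, {x2, y2}, {x2, y1}, {x1, y1}}]

boxOutline[{{10, 20}, {50, 80}}]
(* Line[{{10, 20}, {10, 80}, {50, 80}, {50, 20}, {10, 20}}] *)
```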

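This is a simple labeling function: given the trimmed face image, it returns a label with the user's name and the gesture that the classifier c1 predicts for that face. A counter-style labeling function, one that returns a string of a number incremented on every call, doubles as a useful little frame-rate indicator in the dynamic version.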

labelingFunc[face_Image] :=
 Column[{
   "User: William Duhe",
   (* classify the face that imageLabel passes in, instead of grabbing a new frame *)
   "Gesture: " <> c1[face]
   }]
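For comparison, a counter-style labeling function of the kind mentioned above, which ignores the face and returns an incrementing number as a string, can be sketched like this. frameCount and counterLabel are hypothetical names; passing counterLabel to imageLabel in a dynamically updating display and watching how fast the number grows gives a rough frame rate.

```mathematica
(* Hypothetical counter-style label: the image argument is ignored; each call
   increments a global counter and returns its new value as a string *)
frameCount = 0;
counterLabel[_Image] := ToString[++frameCount]

counterLabel[Image[{{0}}]]
(* "1" *)
```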

Here I take a single frame from the feed; the output shows the user's facial gesture below their face. I did this for both smiling and neutral expressions as a test.

imageLabel[CurrentImage[], labelingFunc]


Thanks for taking the time to read through this post, and I hope you find it useful! I might make another post where I wrap all this up in a nice toolkit that lets you train different states using buttons and then shows a live webcam stream that labels the user's gestures in a dynamically updating feed. Let me know if anyone would find this useful.

4 Replies

You earned the "Featured Contributor" badge, congratulations!

This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and displayed on the "Featured Contributor" board.

That's cool! Yes, please do keep us updated. I'd be very interested to see how this approach fares with a larger pool of gestures and a larger pool of people. This might actually be a useful tool for people with visual impairments who want to use video chat without missing important cues (using audio annotations instead of text, but that part is trivial).

Also: Please let us know when you're confident enough to use a series of grimaces as your password! ;)

Posted 5 years ago

I will submit another post in the next week where I show this process working dynamically with 4 different facial states and 2 users! It will be able to tell different users apart and label them appropriately. I will also integrate voice analysis that works out what a user is saying and puts chat bubbles above their heads, colored according to the sentiment of what they said. Thanks for supporting the post, and if you have any suggestions on how to improve the process, let me know.

Posted 5 years ago

OK, so with a few changes I can get this to work 99% of the time, dynamically, with 4 facial states:

- Smile
- Frown
- Excited
- Scowl

Check out the live demo at:

https://www.youtube.com/watch?v=a39LL5wV_A4&feature=youtu.be

All I did was add validation sets of the same size as the training sets (50 examples per class for each) and set the FeatureExtractor to "FaceFeatures", which is new in Mathematica 11. I am really excited about how well this works.

(* rules 1-50 of each class are used for training, rules 51-100 for validation *)
c1 = Classify[
  Join[smileData[[1 ;; 50]], neutralData[[1 ;; 50]], 
   scowlData[[1 ;; 50]], excitedData[[1 ;; 50]]], 
  Method -> "NeuralNetwork", PerformanceGoal -> "Quality", 
  FeatureExtractor -> "FaceFeatures", 
  ValidationSet -> 
   Join[smileData[[51 ;; 100]], neutralData[[51 ;; 100]], 
    scowlData[[51 ;; 100]], excitedData[[51 ;; 100]]]]
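The pattern in this call, the first 50 rules of every class for training and the next 50 for validation, can be wrapped in a small hypothetical helper so that the two sets cannot overlap by accident. It assumes every class list contains at least 2 n rules; since takeFaces may drop frames without a face, capturing a few extra frames per class is a reasonable precaution.

```mathematica
(* Hypothetical helper: from each class's list of example -> label rules, take
   rules 1..n for training and rules n+1..2n for validation *)
trainValidationSplit[classData_List, n_Integer?Positive] :=
 {Join @@ (Take[#, n] & /@ classData),
  Join @@ (Take[#, {n + 1, 2 n}] & /@ classData)}
```

For example, {train, validation} = trainValidationSplit[{smileData, neutralData, scowlData, excitedData}, 50] gives the training list for Classify and the value for its ValidationSet option.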