Message Boards

How can I train a logo detector?

Posted 10 years ago

My original question, "How can I train a binary classifier to find a logo?", was posted on Stack Exchange. Vitaliy graciously answered it, but suggested that since I need a more complete answer, I should try here on the Wolfram Community instead.

The Challenge

Part #1: Construct a logo detector that will find the Apple Inc. logo in a photo.

Here are a few clarifying points:

  • The output is whether or not the image contains the Apple logo, together with a confidence level.
  • The program should use HOG features and a Support Vector Machine (SVM) classifier.
  • The classifier needs to be invariant to rotations, deformations, distortions, and translations.
  • Here are the training and test sets for the Apple logo; the zipped file contains four folders.
  • I've attached my notebook from the initial Stack Exchange post to get you started.

Note: The images are taken from Twitter, Tumblr, Instagram, and Google image search, so the algorithm, when running in the wild, will need good sensitivity, since only roughly 5% of these images contain logos.
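For reference, the HOG + linear SVM pipeline I have in mind looks roughly like this. This is only a Python sketch (assuming scikit-image and scikit-learn are available); the synthetic bright-patch images are just stand-ins for real positive/negative crops:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(img):
    # classic Dalal-Triggs setup: 9 orientation bins over 8x8-pixel cells
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

rng = np.random.default_rng(0)

def synthetic(positive):
    # stand-in data: a bright patch plays the role of the logo
    img = rng.random((64, 64)) * 0.2
    if positive:
        img[16:48, 16:48] = 1.0
    return img

X = np.array([hog_features(synthetic(i % 2 == 0)) for i in range(80)])
y = np.array([i % 2 == 0 for i in range(80)], dtype=int)

clf = LinearSVC(C=1.0).fit(X, y)
# decision_function's signed margin can serve as the requested confidence level
margin = clf.decision_function([hog_features(synthetic(True))])[0]
```

In practice the synthetic images would be replaced by cropped positives and random negative patches from the training folders.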

Part #2:
Added requirements:

  • Needs to find the bounding box of each logo.
  • Needs to handle images that contain multiple logos.
  • Needs to classify fast (within 10 to 100 milliseconds).

Part #3:
Extend this to more logo brands (e.g. Pepsi). I will assemble and post the training and test data for you, if anyone can get this far...

Questions & Notes

  1. I have read up on the literature on HOG training, but I can't figure out what the best practices are for cropping the positive examples. Do you crop tight, or with margins around the "object"? Do you crop all positive images to a standard aspect ratio, or to whatever rectangle fits the appearance of the object in the particular image?

  2. It took me a very long time to cull the datasets for training and testing; this Python script helped. What free tools are there for this, specifically for cropping the logo out?

  3. I have tried OpenCV cascades built from HOG features; how different will their performance be from a linear or nonlinear SVM classifier's?

  4. It's entirely unclear how Mathematica handles sliding windows across the images... this might have to be done with manual code, and if so, the literature seems to suggest that sizes in the range of 4x4 to 32x32 work best.

  5. The HOG features are not explicitly computed. Wouldn't it be nice if ImageKeypoints had other (free) methods like HOG, BRISK, ORB, GIST, ...?

  6. Mathematica notebooks quickly become unstable when they have too many images in them. Any suggestions for getting around this, besides DumpSave[]ing everything?

  7. This really could be the coolest real-world example of image processing + machine learning in Mathematica, and prove to the world that it is a professional toolbox for vision, which unfortunately many academics do not yet accept!
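To make point 4 concrete, here is how I imagine the manual sliding-window scan would look. A Python sketch, since Mathematica doesn't expose this step; the window size, stride, and pyramid scale are illustrative values, and the crude subsampling is only there to keep the example dependency-free:

```python
import numpy as np

def sliding_windows(image, window=(64, 64), stride=16):
    """Yield (top, left, patch) for every window position; the caller scores each patch."""
    h, w = image.shape[:2]
    for top in range(0, h - window[0] + 1, stride):
        for left in range(0, w - window[1] + 1, stride):
            yield top, left, image[top:top + window[0], left:left + window[1]]

def pyramid(image, scale=1.5, min_size=(64, 64)):
    """Yield the image at successively smaller scales, so one window size catches big logos too."""
    while image.shape[0] >= min_size[0] and image.shape[1] >= min_size[1]:
        yield image
        new_h = int(image.shape[0] / scale)
        new_w = int(image.shape[1] / scale)
        # crude nearest-neighbor subsampling; use a proper resize in practice
        rows = np.linspace(0, image.shape[0] - 1, new_h).astype(int)
        cols = np.linspace(0, image.shape[1] - 1, new_w).astype(int)
        image = image[np.ix_(rows, cols)]

img = np.zeros((128, 128))
boxes = [(top, left) for top, left, patch in sliding_windows(img)]
levels = list(pyramid(img))
```

Each window patch would be fed to the HOG + SVM classifier, and for Part #2, overlapping detections would be merged with non-maximum suppression to get one bounding box per logo.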

POSTED BY: Mike Reynolds
12 Replies
Posted 10 years ago

Can you please post the training and test set you are using? And FYI, I provided my original training and test set... it is hyperlinked in my question!

POSTED BY: Mike Reynolds
Posted 10 years ago

This is not research level by any means, and it is really trivial to do with many other vision systems. One could train a cascade on HOG feature points in 3 hours and have a classifier running on a GPU in OpenCV...

POSTED BY: Mike Reynolds



POSTED BY: Marco Thiel

I think they say "The tree doesn't fall far from the apple". Or something like that.

POSTED BY: Daniel Lichtblau

Hi Everyone,

I am not sure whether this helps, but I get rather good results. I am not sure how many images you used for training, but with my training sets the results are OK.

withlogo = Import["~/Desktop/Apple/withapple/" <> #] & /@ Import["~/Desktop/Apple/withapple/"];
nologo = Import["~/Desktop/Apple/noapple/" <> #] & /@ Import["~/Desktop/Apple/noapple/"];

I have 313 images with logos and 256 without. I think that these numbers are still too small. This is the classifier:

c = Classify[Flatten[{ImageResize[#, {200, 200}] -> "logo" & /@ withlogo, ImageResize[#, {200, 200}] -> "no logo" & /@ nologo}], 
PerformanceGoal -> "Quality", Method -> "NeuralNetwork"];

Here are some results:

[attached image: sample classification results]

I must admit, though, that there are also quite a number of false positives; it often mistakes trees for the logo. I think that this could be mended by using a larger image database. Also, if the logo is only a tiny part of the overall image, that can cause problems, but reasonably small logos seem to work.

Once the classifier is trained, the classification takes only 0.02 seconds on my machine.

Cheers, M.

PS: Occasionally I get quite abysmal results; for some datasets I get rather good ones. So this is by no means "ready to use".

POSTED BY: Marco Thiel

> you are asking about a research level problem

Here's the relevant XKCD:

[attached image: XKCD comic]

POSTED BY: Jan Poeschko

From what people are telling me, you are asking about a research-level problem. Some suggest some sort of partitioning of large images into smaller ones to find the logo, but this would slow down the total run time. Orientations, deformations, distortions, and translations of the images also matter. Did you search for any research papers on the subject? I am sure there are some general approaches. Have you seen THIS?

POSTED BY: Vitaliy Kaurov

A preprocessing possibility might be to use feature extraction and cropping to find candidate subimages that could perhaps match. Whether this would work well would depend on how likely the feature extraction is to get the right subimages, and how many such get produced.
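To make the idea concrete, here is a minimal sketch, in Python with scikit-image rather than Mathematica; the Harris corner detector and the fixed padding are placeholders for whatever feature extractor and crop geometry one actually prefers:

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks

def candidate_boxes(image, pad=16):
    """Propose a crop rectangle around each corner keypoint; each crop
    becomes a candidate subimage to hand to the classifier."""
    corners = corner_peaks(corner_harris(image), min_distance=5)
    boxes = []
    for r, c in corners:
        boxes.append((max(r - pad, 0), max(c - pad, 0),
                      min(r + pad, image.shape[0]), min(c + pad, image.shape[1])))
    return boxes

# a bright square on a dark background produces strong corner responses
img = np.zeros((100, 100))
img[40:60, 40:60] = 1.0
boxes = candidate_boxes(img)
```

How well this works would indeed hinge on the detector firing on the logo and not on too much clutter; merging overlapping candidates would keep the count manageable.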

POSTED BY: Daniel Lichtblau
Posted 10 years ago

Exactly, it's missing any sort of sliding window analysis.

POSTED BY: Mike Reynolds

But isn't your testing data (the apple small and buried in the background) quite different from your training set (the apple is image-sized, with no background influence)?

POSTED BY: Vitaliy Kaurov
Posted 10 years ago

The accuracy is almost zero for the test data I provided.

POSTED BY: Mike Reynolds

I am reposting here my answer for convenience

This seems logical to me (it works just as efficiently without ConformImages, but I wanted to feature it):

dir =(*path to dir containing unzipped folders*);

ndir = FileNameJoin[{dir, "negative"}];
pdir = FileNameJoin[{dir, "positive"}];

nfiles = Import[ndir <> "/*.png"];
pfiles = Import[pdir <> "/*.png"];

negative = ConformImages[nfiles, 200];
positive = ConformImages[pfiles, 200];

$train = 100;

trainingData = <|"Apple" -> positive[[;;$train]], "None" -> negative[[;;$train]]|>;
testingData = <|"Apple" -> positive[[$train+1;;]], "None" -> negative[[$train+1;;]]|>;

c = Classify[trainingData, 
   Method -> {"SupportVectorMachine", 
     "KernelType" -> "RadialBasisFunction", 
     "MulticlassMethod" -> "OneVersusAll"}, 
   PerformanceGoal -> "Quality"];

Magnify[{#, c[#]} & /@ 
 Flatten[{RandomSample[positive[[$train + 1 ;;]], 10], 
   RandomSample[negative[[$train + 1 ;;]], 10]}] // Transpose // Grid, 0.5]

[attached image: grid of sample classifications]

cm = ClassifierMeasurements[c, testingData];




[attached image: ClassifierMeasurements output]

Response to Answer

Thanks Vitaliy, great start. Yes, 79% is not terrible! Unfortunately, this is not working for any images that have real backgrounds. For example: [attached image: failure case on an image with a real background]

What do we need to do to make the detector more robust to the logo signal? This is the heart of the problem!

POSTED BY: Vitaliy Kaurov
