
Tuning YOLOv2 object detection neural networks on custom datasets

Posted 3 years ago

Hello community, please enjoy my latest project! I hope you can find it useful, and feel free to provide some feedback. I am also quite new to organizing libraries for the Wolfram Language, so any feedback there will be greatly appreciated!

Keywords: YOLO, YOLOv2, fine-tuning, pre-trained, object detection, neural networks, machine learning

POSTED BY: Alec Graves
4 Replies

Dear Alec, I am currently working on a wood-defect detector for my bachelor's graduation thesis, and I am trying to implement it using YOLOv2.
I went through your packages and your code; you did a great job writing them, and I would like to mention that they are the only helpful resources I found on the internet for training an object detector on custom data using Mathematica.
However, each time I try your packages and run the BuildYoloLoss package, I get the following error: FunctionLayer::compilerr: Cannot interpret ThreadingLayer[<>][#1, #2] & as a network. I am running Version 13.0.1, and since you wrote your code on earlier versions, I think there is an issue related to the syntax of the FunctionLayer specification in the GIoU loss package. I would appreciate your help with that, as well as any other resources or insights on how to build similar loss functions and perform transfer learning with recent YOLO versions in Mathematica, as I can't find any. Thanks in advance.

POSTED BY: Bassel Harby
Posted 1 year ago

Thanks for letting me know that it is not working. I have not tested this on 13, but I just downloaded 13.2 and will take a look at it today.

POSTED BY: Alec Graves
Posted 1 year ago

It looks like some things changed in 13 about what can be compiled in a FunctionLayer.

In 12.3, I had to write some ugly code to get the anchor-box-to-box-output conversion network to compile in a FunctionLayer, so it used to look like this:

FunctionLayer[Apply[Function[{anchorsIn, grid, conv}, Block[
(* This is a really bad function because in 12.2, FunctionLayer was not compiling nice functions. *)
(* ... *)
(* We assume the net has been reshaped to dimensions n x Anchors x dim x dim.  *)
{
  boxes = conv[[1 ;; 4]],
  classPredictions = conv[[6 ;;]],
  confidences = LogisticSigmoid[conv[[5]]]
},
(* We first need to construct our boxes to find the best fits. *)

Block[{
  boxesScaled =
      Join[((1.0/(inputSize/32.0)*Tanh[boxes[[1 ;; 2]]] // TransposeLayer[{2 <-> 3, 3 <-> 4}]) +
          (grid // TransposeLayer[{1 <-> 3, 3 <-> 2}]) // TransposeLayer[{4 <-> 2, 3 <-> 4}]),
        (Transpose @ (5*LogisticSigmoid[ boxes[[2 ;; 3]]]) * anchorsIn // Transpose)]},
  <|
    "boxes" -> boxesScaled,
    "confidences" -> confidences,
    "classes" -> classPredictions
  |>
]]]]]

In 13, it seems that compiling nested Blocks/Modules no longer works, but we can now simplify this expression to a single Module without getting compilation errors.

FunctionLayer[Apply[Function[{anchorsIn, grid, conv}, Module[
    (* We assume the net has been reshaped to dimensions n x Anchors x dim x dim.  *)
    {
      boxes = conv[[1 ;; 4]],
      classPredictions = conv[[6 ;;]],
      confidences = LogisticSigmoid[conv[[5]]]
    },
    (* We first need to construct our boxes to find the best fits. *)
      <|
        "boxes" -> Join[((1.0/(inputSize/32.0)*Tanh[boxes[[1 ;; 2]]] // TransposeLayer[{2 <-> 3, 3 <-> 4}]) +
              (grid // TransposeLayer[{1 <-> 3, 3 <-> 2}]) // TransposeLayer[{4 <-> 2, 3 <-> 4}]),
            (Transpose @ (5*LogisticSigmoid[ boxes[[2 ;; 3]]]) * anchorsIn // Transpose)],
        "confidences" -> confidences,
        "classes" -> classPredictions
      |>
    ]]]]
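Reading the decoding off the block above (this is just my summary of what the code computes, with $\sigma$ the logistic sigmoid and $\odot$ an elementwise product against the anchor dimensions):

$$b_{xy} = \frac{\tanh(t_{xy})}{\text{inputSize}/32} + \text{grid}, \qquad b_{wh} = 5\,\sigma(t_{wh}) \odot \text{anchors}, \qquad \text{confidence} = \sigma(t_{\text{conf}})$$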

Lastly, it seems the convention for calling ThreadingLayer and FunctionLayer with multiple arguments changed:

Before, you could write

ThreadingLayer[...][x, y]

but in 13 you need to do:

ThreadingLayer[...][{x, y}]
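To make that concrete, here is a tiny self-contained sketch in the 13 style (the Plus body and the layer itself are placeholders for illustration, not code from the packages):

addTwo = FunctionLayer[Apply[Function[{x, y},
    (* in 13, the two inputs are passed to ThreadingLayer as a single list *)
    ThreadingLayer[Plus][{x, y}]]]];

(* under 12.x, the body would instead have been ThreadingLayer[Plus][x, y] *)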

And now it works again:

[Image: training-progress indicator]

[Image: apples labelled with predicted bounding boxes]

I have pushed these changes to the Wolf Detector GitHub repo, so older versions are probably broken now, but it works again in 13.2. It is also worth noting that 13 added the built-in function TrainImageContentDetector, though I don't know what algorithm it uses. Maybe someone from WRI can chime in on that?

POSTED BY: Alec Graves

Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and your profile is now distinguished by a Featured Contributor Badge, displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD
