Below is an image extracted from drone footage of my son's soccer ("football" in these parts of the world) practice session. My goal is to track the positions of players during the practice session and produce some simple analytics, such as heat maps, to share with the coaches. I'm exploring different approaches in Mathematica. One approach is to characterise the background carefully and subtract it, leaving only the players and their shadows. I've done the first step using image subtraction between frames spaced 1 s apart, which nicely isolates the players but also includes their shadows.
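A minimal sketch of the frame-differencing step, assuming `frames` is a list of same-size frames such as `vidframes`. The helper name `foregroundMasks` and the default threshold of 0.1 are hypothetical choices, not tuned values. Each returned mask marks the pixels that changed between two consecutive frames, i.e. players and their shadows.

```mathematica
(* Hypothetical helper: one binary foreground mask per pair of consecutive frames.
   Pixels whose grayscale difference exceeds threshold become 1 (players and shadows);
   the default threshold of 0.1 is an assumption to be tuned on real footage. *)
foregroundMasks[frames_List, threshold_ : 0.1] :=
  Binarize[ColorConvert[ImageDifference @@ #, "Grayscale"], threshold] & /@
   Partition[frames, 2, 1]
```

For example, `foregroundMasks[vidframes]` would give `Length[vidframes] - 1` masks. Because the masks come from differences between frames, a player who stands still over the interval produces little or no difference and may drop out of the mask.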
I'm thinking classification might be a good approach to distinguish four classes: players in orange shirts, players in blue shirts, shadows, and other people (note that the top part of the image can be masked out to focus only on what is happening on the pitch). I've sampled the RGB training signatures by drawing masks over them and extracting the pixels. However, I'm at a loss as to how to use these training sets for classification. Most examples I've seen focus on detecting animals, faces and so on from a series of training images. In my case, I want to classify the content of a single image based on its similarity to training data taken from that same image.
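One way to use masks drawn on a single frame is a supervised classifier: `Classify` accepts an association from each class name to its list of example pixels, whereas `ClusterClassify` clusters unlabeled data and so ignores which mask a pixel came from. The sketch below assumes one white-on-black mask image per class, the same size as the frame. The names `maskedPixels` and `trainPixelClassifier`, the class names, and the choice of nearest neighbours in RGB space are all assumptions for illustration.

```mathematica
(* Hypothetical helper: RGB values of the pixels in img that lie under a mask.
   Assumes mask has the same dimensions as img and is white where pixels were sampled. *)
maskedPixels[img_Image, mask_Image] :=
  Pick[
   Flatten[ImageData[RemoveAlphaChannel[ColorConvert[img, "RGB"]]], 1], (* {r, g, b} list *)
   Flatten[ImageData[Binarize[mask, 0.5]]],                             (* matching 0/1 list *)
   1]

(* Supervised pixel classifier: masks is an association class -> mask image, e.g.
   <|"orange" -> orangeMask, "blue" -> blueMask, "shadow" -> shadowMask, "other" -> otherMask|>.
   Classify receives class -> {example pixels, ...}; "NearestNeighbors" is an assumed method choice. *)
trainPixelClassifier[img_Image, masks_Association] :=
  Classify[maskedPixels[img, #] & /@ masks, Method -> "NearestNeighbors"]
```

Nearest neighbours matches the idea of classifying each pixel by its similarity to the sampled pixels; other `Method` settings of `Classify` could be compared on the same training data.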
Here is an example of what I've tried:
imgdiff = ImageDifference[vidframes[[2]], vidframes[[1]]] (* difference between two consecutive frames *)
shadowpixels = PixelValue[imgdiff, imgmaskshadow]; (* extract the pixels under the user-defined shadow mask *)
cf = ClusterClassify[Join[shadowpixels, orangplayerpixels]] (* create a classifier from representative shadow and orange-shirt pixels *)
ClusteringComponents[imgdiff] (* this is the step that does not work *)
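`ClusteringComponents[imgdiff]` runs its own clustering on the pixel values and never calls `cf`. To label the difference image with a trained classifier, one option is to apply it to every pixel and reshape the labels back onto the image grid. A minimal sketch, assuming `cf` is a classifier over `{r, g, b}` vectors (for example one built with `Classify`); `classifyPixels` and `classMask` are hypothetical names.

```mathematica
(* Hypothetical helper: apply a pixel classifier to every pixel of img and return a
   matrix of class labels with one row per image row. Passing the whole pixel list
   in one call is typically faster than calling cf once per pixel. *)
classifyPixels[cf_, img_Image] :=
  Module[{data = ImageData[RemoveAlphaChannel[ColorConvert[img, "RGB"]]]},
   Partition[cf[Flatten[data, 1]], Dimensions[data][[2]]]]

(* Hypothetical helper: binary image that is white where the label equals class. *)
classMask[labels_List, class_] :=
  Image[Boole[Map[# === class &, labels, {2}]], "Bit"]
```

Usage might look like `labels = classifyPixels[cf, imgdiff]; orangeMask = classMask[labels, "orange"]`. A classifier assigns every pixel to one of its known classes, so the near-black background of the difference image probably needs its own training mask (or the foreground mask applied first). For full-resolution drone frames, `ImageResize` before classifying may keep run times manageable.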
I've also included a link to the first 10 images of the video sequence, sampled at 1 s intervals.
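For the heat maps, one possible next step is to reduce each frame's mask for one class to player positions and pool them across frames. A sketch, assuming a list of same-size binary masks that are white where one team's players are. `playerPositions`, `positionHeatMap` and the minimum blob area of 20 pixels are hypothetical; each connected blob is treated as one player, so players who overlap will be merged.

```mathematica
(* Hypothetical helper: centroids {x, y} of the connected white blobs in a binary mask,
   ignoring blobs smaller than minArea pixels (an assumed noise threshold). *)
playerPositions[mask_Image, minArea_ : 20] :=
  Values[ComponentMeasurements[mask, "Centroid", #Area >= minArea &]]

(* Pool the positions from all frames and plot a smoothed density as a heat map. *)
positionHeatMap[masks_List] :=
  SmoothDensityHistogram[Join @@ (playerPositions /@ masks),
   ColorFunction -> "TemperatureMap"]
```

Centroids are in image coordinates (pixels), so mapping them to pitch coordinates would need an extra perspective correction if the drone view is not straight down.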