Hello all,
I would like to be able to quantify when an image displays a blend of more than one object. Say I have trained a CNN on three classes of image: mouse, cat and dog; I have lots of good images and the classifier is very accurate. What I want is to be able to send it an image that is a blend of, say, a mouse and a dog, and have it return the relative weights of the two detections. For example, if my image were the pixel-wise mean of a mouse image and a dog image, it would return something close to {0.5->mouse, 0.5->dog}. I know that the final layer of a NN can be set to return class probabilities rather than a hard decision, but my concern is that the earlier layers of a network trained for classification have effectively learned strong non-linearities that will push the output towards one class or the other. Is this concern valid, and if so, is there a way to construct or constrain the network to return a more linear 'blend' measure?
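For concreteness, the kind of probing I have in mind looks like this (a minimal NumPy sketch, with a random linear-softmax map standing in for the trained CNN; the model, image sizes and class names are all placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy stand-in for a trained CNN: a random linear map from
# flattened pixels to three class logits (mouse, cat, dog).
W = rng.normal(size=(3, 64 * 64))

def classify(img):
    return softmax(W @ img.ravel())

# Two fake "images"; in practice these would be real mouse and dog photos.
mouse_img = rng.random((64, 64))
dog_img   = rng.random((64, 64))

blend = 0.5 * mouse_img + 0.5 * dog_img  # pixel-wise mean of the two images

probs = classify(blend)
print(dict(zip(["mouse", "cat", "dog"], probs.round(3))))
```

The softmax output is a valid probability vector, but nothing in this setup guarantees it lands anywhere near {0.5, 0, 0.5} for the blend, which is exactly my worry.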
One possibility I did think of is to create a large number of blends from my original images and train the network to predict the three-element 'blend weight' vector directly. But that will rapidly become intractable as the number of classes increases.
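To illustrate the blend-training idea, here is a sketch of how one blended training example and its soft target could be generated (essentially what the "mixup" style of data augmentation does; the function name, image shapes and mixing range are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes = 3                            # mouse, cat, dog

def one_hot(k, n=n_classes):
    v = np.zeros(n)
    v[k] = 1.0
    return v

def make_blend(img_a, label_a, img_b, label_b, lam=None):
    """Blend two images and build the matching soft 'blend weight' target."""
    if lam is None:
        lam = rng.uniform(0.2, 0.8)      # random mixing coefficient
    x = lam * img_a + (1 - lam) * img_b
    y = lam * one_hot(label_a) + (1 - lam) * one_hot(label_b)
    return x, y

mouse_img = rng.random((64, 64))
dog_img   = rng.random((64, 64))
x, y = make_blend(mouse_img, 0, dog_img, 2, lam=0.5)
print(y)                                 # y == [0.5, 0, 0.5] for {mouse, cat, dog}
```

If pairs are sampled randomly per mini-batch rather than enumerated exhaustively, the cost per epoch stays fixed, though whether that is enough coverage as the class count grows is part of my question.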
Gareth Russell