First of all, a disclaimer: I am not an expert in machine learning in any way whatsoever, just an interested layman who occasionally thinks about this stuff. Tonight I'm bored under lockdown and feel like asking a question on this subject that has been bothering me for quite a while. It's also late, and I've learned that writing during the night is usually a bad idea, but yolo, I guess.
That being said, here I go.
The gist of it is that I'm wondering how machine learning algorithms deal with the choice of loss function, because I suspect there is a lot of room for improvement in this regard.
I first had this thought while watching a report about a CycleGAN demo that turned a horse into a zebra. It's easy to find on YouTube. Here is a GIF:
What I immediately thought while watching this video was:
but it only changes the colors, it doesn't change the shape.
That kind of bothered me. It seemed to me that such an algorithm would only work to change an object into another one with the exact same shape. It would struggle to change, say, a horse into an elephant, or a dog into a cat. Now, I've seen transitions from dogs to cats before, so I suppose some algorithms are capable of changing shapes, but then why wasn't this CycleGAN doing it? I know the shape of a horse is very similar to that of a zebra, but it's still not the same, and it was clear in the video that the algorithm didn't change the shape at all, not even a little bit.
It then occurred to me that one of the most common loss functions used in machine learning is the good old Euclidean distance. Basically, if an image can be seen as a real-valued function f(x,y) defined on the plane, then the (squared) Euclidean distance between two images f and g is the integral of (f(x,y) - g(x,y))^2 dx dy.
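In discrete form this loss is just the sum of squared pixel differences. A minimal sketch in Python/NumPy (a stand-in for the continuous formula above):

```python
import numpy as np

# Discrete analogue of the integral of (f(x,y) - g(x,y))^2 dx dy:
# the sum of squared pixel differences between two grayscale images.
def euclidean_loss(f, g):
    return float(np.sum((f - g) ** 2))

f = np.zeros((4, 4))
g = np.zeros((4, 4))
g[1, 1] = 1.0  # two images differing in a single pixel
print(euclidean_loss(f, g))  # 1.0
```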
Such a loss function is terrible at measuring a displacement, like one performed by Mathematica's ImageTransformation function. If we consider a displacement function d[x,y], we can define its measure as the integral of d[x,y]^2; let's call this measure M[d]. Then, given an image f and its transformation g = ImageTransformation[f, d], I think M[d] is just as good a loss function as the Euclidean distance between f and g, if not better. This observation is all the more true for linear displacements, for instance, where the Euclidean distance is quite terrible: it doesn't reflect the amplitude of the displacement at all.
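To make this concrete, here is a rough NumPy/SciPy sketch, using scipy.ndimage.map_coordinates as a crude stand-in for ImageTransformation with a constant displacement field (a translation): shifting a square by 16 or by 24 pixels gives the same Euclidean loss once the squares no longer overlap, while M[d] keeps growing with the amplitude of the shift.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(f, dy, dx):
    # g[y, x] = f(y + dy, x + dx): a constant displacement (dy, dx)
    # applied to the whole image, sampled with bilinear interpolation.
    ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return map_coordinates(f, [ys + dy, xs + dx], order=1, mode='constant')

def euclidean_loss(f, g):
    return float(np.sum((f - g) ** 2))

def displacement_measure(dy, dx, shape):
    # M[d] for a constant field: |d|^2 integrated over the image domain.
    return (dy ** 2 + dx ** 2) * shape[0] * shape[1]

f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0      # a 16x16 white square

g1 = warp(f, 0, 16)        # square shifted by 16 pixels
g2 = warp(f, 0, 24)        # square shifted by 24 pixels

# Once the squares no longer overlap, the Euclidean loss saturates:
print(euclidean_loss(f, g1), euclidean_loss(f, g2))  # 512.0 512.0
# ...while M[d] keeps growing with the amplitude of the shift:
print(displacement_measure(0, 16, f.shape), displacement_measure(0, 24, f.shape))
```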
My question is: from an unknown displacement function d, could an algorithm infer M[d] from an image f and its transformation ImageTransformation[f, d]? Or, to put it differently, given two images f and g, could an algorithm find the smallest measure m such that there exists a continuous displacement d with g == ImageTransformation[f, d] and m == M[d]?
And if there is no analytical solution to this problem, could a neural network be trained to find an approximation? Basically, it would use supervised learning on synthetic data made from random images and their transformations under random displacement functions. The training objective would be to estimate the measure of the applied displacement. Such a network would provide a loss function that could be used to complement the Euclidean loss.
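One way the synthetic training data could be generated, sketched in Python/NumPy/SciPy (the smoothing sigma and amplitude are arbitrary choices of mine, and map_coordinates again stands in for ImageTransformation): draw a random smooth displacement field d, warp a random image with it, and use M[d] as the regression target.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

rng = np.random.default_rng(0)

def random_displacement(shape, sigma=8.0, amplitude=4.0):
    # Smooth random field: white noise blurred with a Gaussian,
    # one component per axis, rescaled to a maximum magnitude.
    d = np.stack([gaussian_filter(rng.standard_normal(shape), sigma)
                  for _ in range(2)])
    d *= amplitude / (np.abs(d).max() + 1e-12)
    return d

def make_sample(f):
    d = random_displacement(f.shape)
    ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    g = map_coordinates(f, [ys + d[0], xs + d[1]], order=1, mode='nearest')
    target = float(np.sum(d ** 2))   # M[d], the value the network should regress
    return f, g, target

f = rng.random((64, 64))
f_in, g_in, m = make_sample(f)       # one (input pair, target) training sample
```

A network trained on many such (f, g, M[d]) triples would, in effect, learn to estimate the displacement measure directly from the image pair.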