As a general concept, what you're describing sounds similar to what is called computational aesthetics. I haven't kept up with overall progress in the field, but I have worked on two specific problems that interest me (part of my job is being a professional photographer and digital imaging consultant). They are:
Auto-cropping. There are a variety of research papers on how to do this, but each one seems to scaffold yet more heavy-duty math on top of the last. I wanted to see if there was a way to do this more "simply" using a DNN. I played with it a bit, but it wasn't obvious to me how to encode the data (a crop is effectively a transform rather than a single sample), and the available training data also didn't fit very well with the way I wanted to model it. Part of my problem is that I don't know much about parallel DNNs or RNNs, and I think something like that is needed, so it's back to virtual school for me.
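To make the encoding question concrete, here is a minimal sketch of one possible representation, borrowed from how object-detection models usually describe bounding boxes: the crop becomes four numbers (center x, center y, width, height), each normalized to the image size, which a network could regress directly. This is only an assumption about what might work, not the approach tried above. The `Crop` type and the `encode_crop`, `decode_crop`, and `iou` helpers are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Crop:
    """A crop rectangle in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def encode_crop(crop: Crop, image_width: int, image_height: int) -> Tuple[float, float, float, float]:
    """Map a pixel crop to a resolution-independent target (cx, cy, w, h) in [0, 1]."""
    cx = (crop.left + crop.width / 2) / image_width
    cy = (crop.top + crop.height / 2) / image_height
    return (cx, cy, crop.width / image_width, crop.height / image_height)


def decode_crop(target: Tuple[float, float, float, float], image_width: int, image_height: int) -> Crop:
    """Invert encode_crop, rounding back to whole pixels."""
    cx, cy, w, h = target
    width = round(w * image_width)
    height = round(h * image_height)
    left = round(cx * image_width - width / 2)
    top = round(cy * image_height - height / 2)
    return Crop(left, top, width, height)


def iou(a: Crop, b: Crop) -> float:
    """Intersection over union of two crops: 1.0 for identical, 0.0 for disjoint."""
    overlap_w = min(a.left + a.width, b.left + b.width) - max(a.left, b.left)
    overlap_h = min(a.top + a.height, b.top + b.height) - max(a.top, b.top)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    union = a.width * a.height + b.width * b.height - intersection
    return intersection / union


# Example: a photographer's crop on an 800x600 image becomes a 4-number target.
target = encode_crop(Crop(100, 50, 400, 300), 800, 600)
restored = decode_crop(target, 800, 600)
```

With this encoding each training sample is an image plus four floats, and `iou` gives a simple way to score a predicted crop against a human one. It does not address the other problem mentioned above, namely that the available training data may not match this model.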
Choosing the best photo from a series. I started on this nearly 20 years ago using analysis functions in Mathematica, but didn't come up with anything that could be generalized. It seems like GPUs and DNNs would provide much more powerful possibilities for revisiting this.
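As a point of comparison for the hand-crafted route, here is a small sketch of one classic analysis function: scoring each frame by the variance of its Laplacian, a common focus measure, and keeping the highest-scoring frame. It is not the Mathematica analysis mentioned above, and it only captures sharpness, which is exactly why this kind of approach tends not to generalize to "best" in an aesthetic sense. The function names are hypothetical, and the code assumes each frame is a grayscale image stored as a list of rows of numbers, at least 3x3 pixels.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Higher values mean more fine detail (edges), so blurry frames score low.
    Assumes `gray` is a rectangular list of rows, at least 3x3.
    """
    rows, cols = len(gray), len(gray[0])
    responses = [
        gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1] - 4 * gray[r][c]
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    ]
    return pvariance(responses)


def pick_sharpest(frames):
    """Return the index of the frame with the highest sharpness score."""
    return max(range(len(frames)), key=lambda i: laplacian_variance(frames[i]))


# Example: a featureless frame versus a high-contrast checkerboard.
flat = [[10] * 4 for _ in range(4)]
checker = [[255 if (r + c) % 2 == 0 else 0 for c in range(4)] for r in range(4)]
best = pick_sharpest([flat, checker])
```

A learned model would replace `laplacian_variance` with a score predicted from the whole image, which is where GPUs and DNNs come in.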
Some of the teams that seem to be doing a lot in this area are the obvious suspects: Google (Photos), Adobe (Research), and some of the other cloud vendors with millions (or billions) of photos.
Definitely something I'm interested in.