This would be quite an endeavour, I think. Your method sounds like space-carving:
https://www.youtube.com/watch?v=cGs90KF4oTc
or am I mistaken?
You could also use two photos taken from slightly different angles (like human eyes); image analysis (e.g. ImageDisplacements) can then recover depth from the shift between the two views.
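
To make the two-photo idea concrete, here is a minimal sketch in plain Python. It is not ImageDisplacements; it is a simple block-matching stand-in that assumes the two images are already rectified (a point shifts only horizontally between them). The function names, `max_disp`, `half_win`, `focal_px` and `baseline` are all illustrative choices, not anything from the original post.

```python
def disparity_map(left, right, max_disp=4, half_win=1):
    """Per-pixel horizontal disparity by sum-of-absolute-differences matching.

    left, right: equal-sized 2D lists of grey values (assumed rectified pair).
    Returns a 2D list of integer disparities (0 where the window does not fit).
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half_win, h - half_win):
        for x in range(half_win, w - half_win):
            best_cost, best_d = None, 0
            # A point at column x in the left image is assumed to appear
            # at column x - d in the right image.
            for d in range(max_disp + 1):
                if x - d - half_win < 0:
                    break  # window would leave the right image
                cost = 0
                for dy in range(-half_win, half_win + 1):
                    for dx in range(-half_win, half_win + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x - d + dx])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp


def depth_from_disparity(d, focal_px, baseline):
    """Pinhole stereo relation: depth = focal length * baseline / disparity."""
    return float("inf") if d == 0 else focal_px * baseline / d
```

Larger disparity means the point is closer to the cameras. Real photos would need rectification and a more robust matching cost first, so treat this only as an illustration of the principle.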
That does seem to be pretty much the right idea, so I must at least be heading in the right direction. I am just struggling with the implementation.
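
For what it's worth, the core of the carving step might be sketched as below. This is a heavily simplified, silhouette-only variant (closer to a visual hull than to full space carving with photo-consistency checks), and it assumes three orthographic views along the coordinate axes rather than calibrated cameras. The `carve` function and the `"xy"`, `"xz"`, `"yz"` keys are hypothetical names chosen for the example.

```python
def carve(silhouettes, n):
    """Keep voxel (x, y, z) only if every view sees it inside its silhouette.

    silhouettes: dict with keys "xy", "xz", "yz", each an n-by-n grid of 0/1.
    These are assumed orthographic views along the z, y and x axes.
    Returns the set of surviving voxel coordinates.
    """
    kept = set()
    for x in range(n):
        for y in range(n):
            for z in range(n):
                # A voxel is carved away as soon as one view projects it
                # onto background (a 0 in that silhouette).
                if (silhouettes["xy"][x][y]
                        and silhouettes["xz"][x][z]
                        and silhouettes["yz"][y][z]):
                    kept.add((x, y, z))
    return kept
```

With real photographs, the lookup into each silhouette would be replaced by projecting the voxel through that camera's pose, which is probably where most of the implementation effort goes.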