Indeed Sam, your point is valid.
My approach was a probabilistic one. Get the probability of the previous input being 1.
For this I need to generate all possible outputs and get the mean, this would be the probability.
This way we could get a sense of what are the possible predecessors of a given configuration.
But I wasn't able to properly train it, not as fast as the previous approach. And, I believe, there must be an easy solution that can be trained pretty quickly.