Because I was doing feature extraction separately, I only had to construct the classification half of the classic CNN image processing architecture. Also, given the limited time I had to train the net (and the mid-tier GPU in my laptop), having fewer weights made training a little faster, not to mention making the net less likely to overfit my data (in a bad way :) ).
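
To make the "classification half only" idea concrete, here is a minimal sketch in plain Python of a small fully connected head that takes already-extracted features and outputs class probabilities. It is not my actual network: the layer sizes (128 features, 64 hidden units, 10 classes), the function names, and the random input are all hypothetical, picked only to show how few weights such a head needs. In practice you would build this in a deep learning framework rather than by hand.

```python
import math
import random

def init_dense(n_in, n_out, rng):
    """Create one fully connected layer: a weight matrix and a bias vector."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    """Apply a fully connected layer to the input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, layers):
    """Run the classification head: ReLU hidden layers, softmax output."""
    h = features
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, layers[-1]))

def param_count(layers):
    """Count every weight and bias in the head."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

# Hypothetical sizes: 128 pre-extracted features -> 64 hidden units -> 10 classes.
rng = random.Random(0)
sizes = [128, 64, 10]
layers = [init_dense(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Stand-in for the output of a separate feature extraction step.
features = [rng.random() for _ in range(sizes[0])]
probs = classify(features, layers)

print(len(probs), param_count(layers))
```

With these made-up sizes the whole head has 128*64 + 64 + 64*10 + 10 = 8906 trainable parameters. A net that small trains quickly even on modest hardware and has less capacity to memorize a small dataset.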