Ha! You are right: it doesn't change the output, and removing it makes it 10x faster again. So the lesson here, folks, is that improvements in the underlying function can make the biggest difference. In my defense, I was following a template made by others, and now I have to figure out why, for example, the 'divide by the max value' step was in there. Maybe it is used later. But even if it is, it can be left out of the fitting process!
Having said this, I still want to know how one would code this as a direct numerical optimization, ideally specifying the 12 parameters as a single vector (list). Is it possible?
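To make the question concrete, here is a rough sketch of the shape I have in mind, written in Python with NumPy and SciPy purely for illustration. Everything in it is a placeholder: the four-Gaussian `model`, the synthetic data, and the starting guess `p0` are assumptions and not my actual function. The only point is that the objective takes all 12 parameters as one flat vector `p`.

```python
import numpy as np
from scipy.optimize import minimize

def model(p, x):
    # Hypothetical model (an assumption): a sum of 4 Gaussians.
    # p is one flat vector of 12 values: [a1, mu1, s1, a2, mu2, s2, ...].
    a, mu, s = p.reshape(4, 3).T          # each of a, mu, s has shape (4,)
    return (a * np.exp(-0.5 * ((x[:, None] - mu) / s) ** 2)).sum(axis=1)

def objective(p, x, y):
    # Sum of squared residuals between the model and the data.
    r = model(p, x) - y
    return float(np.dot(r, r))

# Synthetic, noise-free data generated from known parameters (illustration only).
x = np.linspace(0.0, 10.0, 200)
p_true = np.array([1.0, 2.0, 0.5,
                   0.8, 4.0, 0.7,
                   1.2, 6.0, 0.4,
                   0.6, 8.0, 0.6])
y = model(p_true, x)

# Starting guess: all 12 parameters passed as a single vector.
p0 = p_true * 1.1

# General-purpose minimizer; gradients are estimated numerically here.
result = minimize(objective, p0, args=(x, y), method="BFGS")
print(result.x)    # fitted 12-parameter vector
print(result.fun)  # final sum of squared residuals
```

If something like this is possible, I would expect to swap my own function in for `model` and keep the rest unchanged. A general minimizer like this can stall in a local minimum, so the starting vector probably matters.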