I'm still puzzled why the CPU load was so low; it never went above
30%.
How much physical memory do you have? The usual reason for low CPU load is insufficient physical memory, which causes swapping. You can find the memory required to evaluate your code by evaluating MaxMemoryUsed[] after the code. Note that the value returned includes only memory used by the Kernel and does not include memory used by the FrontEnd.
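
As a rough sketch of that check: the workload line below is only a hypothetical stand-in for your own code, and $SystemMemory is assumed to be available in your version (it is absent from older releases). Both values are reported in bytes, so the last line converts them to gigabytes.

```mathematica
(* Hypothetical workload; replace with the code you are actually running *)
result = Total[RandomReal[1, 10^7]];

(* Peak memory used by the Kernel so far in this session, in bytes *)
peak = MaxMemoryUsed[];

(* Installed physical memory in bytes (assumes $SystemMemory exists in your version) *)
physical = $SystemMemory;

(* Convert both to gigabytes for an easier comparison *)
N[{peak, physical}/2^30]
```

If the first number is close to or larger than the second, swapping is a likely explanation for the low CPU load. Since MaxMemoryUsed[] reports the peak for the whole session, run the check in a fresh Kernel to avoid counting earlier evaluations.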