How can I use more than 8 CPU cores on a compute instance with the free Wolfram Engine?

My use case: I have a heavy job that combines matrix multiplication with numerical integration, and I need parallelism to speed it up.

I launched a 72-core CPU instance and installed Wolfram Engine, but found that it is not significantly faster than my local 6-core machine.

I then realized that only 8 CPU cores are being used: Wolfram Engine can launch only 8 kernels, and each kernel corresponds to a single CPU core.
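
The mismatch between available cores and running kernels can be checked from a Wolfram Language session with something like the following. This is only a minimal sketch that assumes the default local kernel configuration; no output is shown because the values depend on the machine and the license.

```wolfram
(* Number of CPU cores the machine reports *)
$ProcessorCount

(* Launch the default set of parallel subkernels *)
LaunchKernels[];

(* Number of subkernels actually running *)
$KernelCount

(* Maximum number of subkernels the license allows *)
$MaxLicenseSubprocesses
```

If `$KernelCount` stays well below `$ProcessorCount`, the limit comes from the kernel configuration or the license rather than from the hardware.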

Is there a way to launch more kernels and use the remaining CPU cores? (I'm not considering contacting sales; I don't have the time, patience, or budget for that.)

