How is the dot product (Dot) implemented in Mathematica?
Take matrix multiplication, for instance.
I have a C++ function that does this in a more or less traditional way, but it makes appropriate use of the cache and has parallelism and vectorization enabled.
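For reference, a cache-aware but otherwise traditional multiplication might look like the sketch below. It is not the asker's code: the name `blocked_matmul`, the row-major `std::vector<double>` storage and the default block size of 64 are assumptions chosen for illustration. The loops are tiled so that each block is reused while it is still in cache, and the i-k-j order keeps the innermost loop contiguous in memory so the compiler can vectorize it.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns C = A * B for n x n row-major matrices.
// The loops are tiled into BS x BS blocks so each tile of A, B and C is
// reused while it is still cached. The innermost j loop walks B and C
// contiguously, which makes it a candidate for auto-vectorization.
std::vector<double> blocked_matmul(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   std::size_t n, std::size_t BS = 64) {
    std::vector<double> C(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += BS)
        for (std::size_t kk = 0; kk < n; kk += BS)
            for (std::size_t jj = 0; jj < n; jj += BS) {
                // Clamp tile bounds so n need not be a multiple of BS.
                const std::size_t iMax = std::min(ii + BS, n);
                const std::size_t kMax = std::min(kk + BS, n);
                const std::size_t jMax = std::min(jj + BS, n);
                for (std::size_t i = ii; i < iMax; ++i)
                    for (std::size_t k = kk; k < kMax; ++k) {
                        const double a = A[i * n + k];  // reused across the j loop
                        for (std::size_t j = jj; j < jMax; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
    return C;
}
```

With optimization enabled (for example `-O3 -march=native` on GCC or Clang), the inner loop can be vectorized; the block size is worth tuning to the target's cache sizes.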
On my machine, multiplying two 1000x1000 matrices takes about 1.6 s in C++, which is much slower than the same operation in Mathematica (about 0.18 s).
I'm not asking for the source code; I'm just curious which well-known strategies are used (for example, multithreading).
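Multithreading is the easiest of these strategies to illustrate. The sketch below assumes the same row-major layout and uses a hypothetical helper named `threaded_matmul`: it splits the rows of the result into bands and hands each band to its own `std::thread`. Because every thread writes only its own rows of C, no locking is needed. This is a generic illustration, not a description of how Mathematica or MKL actually schedules its work.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: C = A * B for n x n row-major matrices, with the rows
// of C divided into contiguous bands, one band per std::thread.
// nThreads == 0 means "use std::thread::hardware_concurrency()".
std::vector<double> threaded_matmul(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    std::size_t n, unsigned nThreads = 0) {
    if (nThreads == 0)
        nThreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> C(n * n, 0.0);

    // Each worker computes rows [rowBegin, rowEnd) of C and touches no other rows.
    auto work = [&](std::size_t rowBegin, std::size_t rowEnd) {
        for (std::size_t i = rowBegin; i < rowEnd; ++i)
            for (std::size_t k = 0; k < n; ++k) {
                const double a = A[i * n + k];
                for (std::size_t j = 0; j < n; ++j)
                    C[i * n + j] += a * B[k * n + j];  // contiguous, vectorizable
            }
    };

    std::vector<std::thread> pool;
    const std::size_t band = (n + nThreads - 1) / nThreads;  // rows per thread, rounded up
    for (std::size_t r = 0; r < n; r += band)
        pool.emplace_back(work, r, std::min(r + band, n));
    for (auto& t : pool) t.join();  // wait for all bands before returning C
    return C;
}
```

In practice the same row partitioning is often written with an OpenMP `#pragma omp parallel for` on the outer loop; explicit threads are used here only to keep the sketch free of compiler flags.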
If the inputs are packed arrays, the multiplication is handed off to Intel MKL's BLAS routines.
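For double-precision matrices, the BLAS operation involved is presumably GEMM (dgemm), which computes C := alpha*A*B + beta*C; MKL exposes it through the CBLAS function `cblas_dgemm`. The naive sketch below, under the hypothetical name `reference_gemm`, only spells out that contract for the row-major, no-transpose case. Optimized BLAS libraries return the same result but generally get their speed from cache blocking, packing operands into contiguous buffers, hand-tuned SIMD kernels and multithreading.

```cpp
#include <cstddef>

// Hypothetical naive reference for the GEMM contract (row-major, no transpose):
//   C := alpha * A * B + beta * C
// A is M x K with leading dimension lda, B is K x N (ldb), C is M x N (ldc).
// This shows only WHAT is computed, not how an optimized BLAS computes it.
void reference_gemm(std::size_t M, std::size_t N, std::size_t K,
                    double alpha, const double* A, std::size_t lda,
                    const double* B, std::size_t ldb,
                    double beta, double* C, std::size_t ldc) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < K; ++k)
                sum += A[i * lda + k] * B[k * ldb + j];
            // As in BLAS, beta == 0 means the old contents of C are ignored.
            C[i * ldc + j] = (beta == 0.0) ? alpha * sum
                                           : alpha * sum + beta * C[i * ldc + j];
        }
}
```

Plain matrix multiplication corresponds to alpha = 1 and beta = 0; the leading-dimension parameters let the same routine operate on sub-blocks of a larger matrix.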