Based on a performance comparison, "Quality" is the default.
Just for fun, here's a naive OpenMP-based C++ implementation with LTemplate. I can specialize for 3D vectors and gain a speedup of 6x on a MacBook Pro with a 4-core CPU. Unfortunately the system compiler doesn't support OpenMP on OS X, so I'm using gcc 5 form MacPorts to compile.
In[1]:= dat = RandomReal[{-1, 1}, {10000, 3}];
In[2]:= dm = DistanceMatrix[dat, dat]; // AbsoluteTiming
Out[2]= {1.05789, Null}
In[3]:= << LTemplate`
In[4]:= $CCompiler = {"Compiler" ->
CCompilerDriver`GenericCCompiler`GenericCCompiler,
"CompilerInstallation" -> "/opt/local/bin",
"CompilerName" -> "g++-mp-5",
"SystemCompileOptions" ->
"-O3 -m64 -fPIC -framework Foundation -framework mathlink"};
In[5]:= SetDirectory[$TemporaryDirectory];
code = "
#include <cmath>
inline double sqr(double x) { return x*x; }
struct DistMatrix {
mma::RealTensorRef distMat(mma::RealMatrixRef a, mma::RealMatrixRef b) {
mma::RealMatrixRef mat = mma::makeMatrix<double>(a.rows(), b.rows());
#pragma omp parallel for
for (mint i=0; i < a.rows(); ++i)
for (mint j=0; j < b.rows(); ++j)
mat(i,j) = std::sqrt( sqr(a(i,0) - b(j,0)) + sqr(a(i,1) - b(j,1)) + sqr(a(i,2) - b(j,2)) );
return mat;
}
};
";
Export["DistMatrix.h", code, "String"];
In[8]:= tem =
LClass["DistMatrix", {LFun[
"distMat", {{Real, 2, "Constant"}, {Real, 2, "Constant"}}, {Real,
2}]}];
In[9]:= CompileTemplate[tem,
"CompileOptions" -> {"-std=c++14", "-fopenmp"}];
In[10]:= LoadTemplate[tem]
In[11]:= obj = Make["DistMatrix"];
This was the setup part, now we're ready to benchmark:
In[12]:= dm2 = obj@"distMat"[dat, dat]; // AbsoluteTiming
Out[12]= {0.166779, Null}
In[13]:= dm == dm2
Out[13]= True
I should really look into how to make it easier to use LTemplate with C++ templates, and automatically specialize the code for the vector dimensionality. 3D vectors are hard-coded here.