Am I wrong to expect FunctionCompile to be faster than Compile? What can be done to increase the speed of this function? With the arrival of version 12.0, I dug up an old function that I have been trying to speed up for years. The function maps the Real vectors {xi, zi} and {yv, zv} to a new Real vector {x, y}.
fun[{xi_, zi_}, {yv_, zv_}] :=
Module[{t},
t = Sqrt[-(-1 +
yv^2)^3 (1 + (-1 + xi^2) yv^2)]; {(xi (-2 yv^5 t zi +
2 yv t (zi - 2 zv) + zv + 2 yv^3 t (xi^2 (zi - 2 zv) + 2 zv) +
yv^2 (-4 zi + 2 xi^2 zv) +
yv^8 (-4 (-1 + xi^2) zi + (-3 + 2 xi^2) zv) +
yv^4 (-4 (-3 + xi^2) zi + (-6 - 2 xi^2 + xi^4) zv) +
2 yv^6 ((-6 + 4 xi^2) zi - (-4 + xi^2 + xi^4) zv))),
((-1 + 2 xi^2) yv^12 zi - 2 yv t (zi - zv) - zv -
2 yv^3 t ((-3 + xi^2) zi + 3 zv) -
2 yv^7 t ((-1 + 2 xi^2) zi + zv - xi^2 zv) +
yv^2 (zi + (5 - 2 xi^2) zv) +
5 yv^8 ((-2 + xi^4) zi + (-1 + 4 xi^2 - 2 xi^4) zv) -
yv^4 ((5 + 2 xi^2) zi + (10 - 12 xi^2 + xi^4) zv) -
2 yv^5 t (-3 (-1 + xi^2) zi + (-3 + xi^2 + xi^4) zv) +
yv^10 ((5 - 4 xi^2 - 2 xi^4) zi + (1 - 6 xi^2 + 4 xi^4) zv) +
yv^6 ((10 + 4 xi^2 - 3 xi^4) zi + (10 - 24 xi^2 +
7 xi^4) zv))/(yv (-1 + yv^2) )}/((1 + (-2 + xi^2) yv^2 +
yv^4)^2 (zi - zv))]
I used Table to time the function over about 100,000 evaluations. The uncompiled version takes about 6.5 seconds on my machine, a Mac with a 3.5 GHz Intel Core i7 and 16 GB of 1600 MHz DDR3 memory, running macOS Mojave.
tF = Timing[Table[fun[{.5, zi}, {5., 3.}], {zi, .5, 1.5, .00001}];] //
First
tF=6.54502 seconds
This is the same function compiled with the classic Compile, which gives a considerable speed gain (approximately 35 times faster):
funCF = Compile[{{iPt, _Real, 1}, {vPt, _Real, 1}},
Module[{xi, zi, yv, zv, t},
{xi, zi} = iPt; {yv, zv} = vPt;
t = Sqrt[-(-1 +
yv^2)^3 (1 + (-1 + xi^2) yv^2)]; {(xi (-2 yv^5 t zi +
2 yv t (zi - 2 zv) + zv +
2 yv^3 t (xi^2 (zi - 2 zv) + 2 zv) +
yv^2 (-4 zi + 2 xi^2 zv) +
yv^8 (-4 (-1 + xi^2) zi + (-3 + 2 xi^2) zv) +
yv^4 (-4 (-3 + xi^2) zi + (-6 - 2 xi^2 + xi^4) zv) +
2 yv^6 ((-6 + 4 xi^2) zi - (-4 + xi^2 + xi^4) zv))),
((-1 + 2 xi^2) yv^12 zi - 2 yv t (zi - zv) - zv -
2 yv^3 t ((-3 + xi^2) zi + 3 zv) -
2 yv^7 t ((-1 + 2 xi^2) zi + zv - xi^2 zv) +
yv^2 (zi + (5 - 2 xi^2) zv) +
5 yv^8 ((-2 + xi^4) zi + (-1 + 4 xi^2 - 2 xi^4) zv) -
yv^4 ((5 + 2 xi^2) zi + (10 - 12 xi^2 + xi^4) zv) -
2 yv^5 t (-3 (-1 + xi^2) zi + (-3 + xi^2 + xi^4) zv) +
yv^10 ((5 - 4 xi^2 - 2 xi^4) zi + (1 - 6 xi^2 + 4 xi^4) zv) +
yv^6 ((10 + 4 xi^2 - 3 xi^4) zi + (10 - 24 xi^2 +
7 xi^4) zv))/(yv (-1 + yv^2) )}/((1 + (-2 + xi^2) yv^2 +
yv^4)^2 (zi - zv))]]
tCF = Timing[
Table[funCF[{.5, zi}, {5., 3.}], {zi, .5, 1.5, .00001}];] // First
tCF=.182818 seconds
This is the function compiled with the new FunctionCompile (does it work only with Real64 numbers?). It gives a smaller speed gain than the classic Compile (approximately 23 times faster than the uncompiled version):
funCCF = FunctionCompile[
Function[{Typed[iPt, TypeSpecifier["PackedArray"]["Real64", 1]],
Typed[vPt, TypeSpecifier["PackedArray"]["Real64", 1]]},
Module[{xi, zi, yv, zv, t},
{xi, zi} = iPt; {yv, zv} = vPt;
t = Sqrt[-(-1 +
yv^2)^3 (1 + (-1 + xi^2) yv^2)]; {(xi (-2 yv^5 t zi +
2 yv t (zi - 2 zv) + zv +
2 yv^3 t (xi^2 (zi - 2 zv) + 2 zv) +
yv^2 (-4 zi + 2 xi^2 zv) +
yv^8 (-4 (-1 + xi^2) zi + (-3 + 2 xi^2) zv) +
yv^4 (-4 (-3 + xi^2) zi + (-6 - 2 xi^2 + xi^4) zv) +
2 yv^6 ((-6 + 4 xi^2) zi - (-4 + xi^2 + xi^4) zv))),
((-1 + 2 xi^2) yv^12 zi - 2 yv t (zi - zv) - zv -
2 yv^3 t ((-3 + xi^2) zi + 3 zv) -
2 yv^7 t ((-1 + 2 xi^2) zi + zv - xi^2 zv) +
yv^2 (zi + (5 - 2 xi^2) zv) +
5 yv^8 ((-2 + xi^4) zi + (-1 + 4 xi^2 - 2 xi^4) zv) -
yv^4 ((5 + 2 xi^2) zi + (10 - 12 xi^2 + xi^4) zv) -
2 yv^5 t (-3 (-1 + xi^2) zi + (-3 + xi^2 + xi^4) zv) +
yv^10 ((5 - 4 xi^2 - 2 xi^4) zi + (1 - 6 xi^2 + 4 xi^4) zv) +
yv^6 ((10 + 4 xi^2 - 3 xi^4) zi + (10 - 24 xi^2 +
7 xi^4) zv))/(yv (-1 +
yv^2) )}/((1 + (-2 + xi^2) yv^2 + yv^4)^2 (zi - zv))]]]
tCCF = Timing[
Table[funCCF[{.5, zi}, {5., 3.}], {zi, .5, 1.5, .00001}];] // First
tCCF=.288228 seconds
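As a quick check on the factors quoted above, the ratios can be computed directly from the three timings. This is only a sketch: the values are copied by hand from the outputs in this post, so they reflect a single run each on my machine.

```mathematica
(* timings in seconds, copied from the outputs above *)
tF = 6.54502; tCF = 0.182818; tCCF = 0.288228;

(* speedup of Compile and of FunctionCompile over the uncompiled function,
   followed by how much slower FunctionCompile is than Compile *)
{tF/tCF, tF/tCCF, tCCF/tCF}
```

The three ratios come out at roughly 35.8, 22.7 and 1.58, so in this test FunctionCompile is about 1.6 times slower than the classic Compile.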