(1) Neither the current B-PSW nor the new one have known exceptions when coupled with Miller-Rabin base 2 (MR2). Empirical evidence suggests the new one to be more powerful. If I recall correctly, it has no known pseudoprimes even without MR2. I'd have to check the article to verify that statement though.
(2) I believe the new method will be around 5-10% slower in terms of amortized cost. It will actually be faster for "most" composites. It will lose however on primes and MR2 pseudoprimes.
Here is a reference implementation of the Lucas UV test utilized in the recent article by Baillie et al. It uses modifications of a fairly efficient approach to the original B-PSW test, as described in Prime Numbers: A Computational Perspective by Richard Crandall and Carl Pomerance (Springer, 2005). It also has had the benefit of exgtensive testing and debugging by Robert Baillie. If any errors remain they are my own, and likely are the result of bad editing on my part.
LucasUVTest[n_Integer] /; n > 2 && OddQ[n] :=
Catch[Module[{pP = 1, qQ, dD = 5, jJ, aA, halfn1, zbitlen, bitlen,
bits, xX, yY, uU, vV, ustrong, vstrong, uu2, vv2, uu2p1, vv2p1},
While[True, jJ = JacobiSymbol[dD, n];
If[jJ == -1, qQ = BitShiftRight[1 - dD, 2];
If[qQ == -1, pP = 5; qQ = 5; dD = 5];
Break[];];
If[jJ == 0 && Abs[dD] != n, Throw[{False, False, False}, "LVT"]];
If[dD < 0, dD = -dD + 2, dD = -dD - 2];];
aA = Mod[pP^2, n];
aA = Mod[aA*ModularInverse[qQ, n] - 2, n];
xX = aA;
yY = Mod[xX^2 - 2, n];
halfn1 = BitShiftRight[n + 1, 1];
zbitlen = IntegerExponent[halfn1, 2];
bitlen = BitLength[halfn1];
bits = Reverse[IntegerDigits[halfn1, 2]];
Do[If[j == zbitlen + 1, uu2 = Mod[pP*Mod[2*yY - aA*xX, n], n];
vv2 = Mod[(aA*aA - 4)*xX*qQ, n];
uu2p1 = Mod[pP*uu2 + vv2, n];
vv2p1 = Mod[dD*uu2 + pP*vv2, n];
If[OddQ[uu2p1], uu2p1 += n];
If[OddQ[vv2p1], vv2p1 += n];
uu2p1 = BitShiftRight[uu2p1, 1];
vv2p1 = BitShiftRight[vv2p1, 1];
ustrong = uu2p1 == 0;
vstrong = vv2p1 == 0;];
If[bits[[j]] == 1, xX = Mod[xX*yY - aA, n];
yY = Mod[yY^2 - 2, n];,(*else*)yY = Mod[xX*yY - aA, n];
xX = Mod[xX^2 - 2, n];];, {j, bitlen - 1, zbitlen + 1, -1}];
Do[If[xX == 0, vstrong = True];
yY = Mod[xX*yY - aA, n];
xX = Mod[xX^2 - 2, n];, {j, zbitlen, 1, -1}];
uU = Mod[aA*xX - 2*yY, n] == 0;
vV = Mod[PowerMod[qQ, halfn1 - 1, n]*xX - 2, n] == 0;
{uU, ustrong || vstrong, vV}], "LVT"]
We test the odd numbers from 3 to 10^7.
In[46]:= Timing[
testcases =
Union[Table[
Apply[And, LucasUVTest[n]] == PrimeQ[n], {n, 3, 10^7, 2}]]]
Out[46]= {537.117, {True}}
The kernel version will be substantially faster. Also the version currently used for the Lucas test can be checked directly using an internal context function, as below. The new Lucas test will be slightly slower (by less than a second for the same test) and will simply replace the one currently used. As the run above indicates, also it will have 0 rather than 125 pseudoprimes less than 10 million.
In[55]:= Timing[
tt1 = Table[NumberTheory`LucasTest[n], {n, 3, 10^7, 2}];]
tt2 = Table[PrimeQ[n], {n, 3, 10^7, 2}];
Length[Position[Transpose[{tt1, tt2}], {a_, b_} /; a != b]]
Out[55]= {15.0394, Null}
Out[57]= 125