It does what you requested, that is, sues fewer iterations (factor of 10). Also each iteration does less work (fewer matrix-times-vector products).
Here is the full code I used.
n = 5;
amat = HilbertMatrix[n]; b = amat.ConstantArray[1, n];
x0 = ConstantArray[0, n];
pmat = LowerTriangularize[amat];
np = N[pmat]; nb = Flatten[N[b]]; nx0 = N[x0];
normb = Norm[nb];
iP = Inverse[np];
Timing[
gradient2 =
NestWhileList[(# + ((b - amat.#).(b - amat.#)/(b -
amat.#).amat.(b - amat.#))* iP.(b - amat.#)) &,
nx0, (Norm[b - amat.#]/normb > 10^-10) &];]
Length[gradient2]
Last[gradient2]
(* Out[779]= {0.456000, Null}
Out[780]= 3574
Out[781]= {0.99999955485, 1.00000817563, 0.999965167212, 1.00005215021, 0.999974653858} *)