The convincing example given above, while self-consistent, does not exactly meet the definition given by Leurent and Peyrin,
Two message prefixes P and P′ are first given as challenge to the adversary, and his goal is to compute two messages M and M′ such that H(P‖M)=H(P′‖M′), where ‖ denotes concatenation. With such ability, the attacker can obtain a collision with arbitrarily chosen prefixes, potentially containing meaningful information.
This is somewhat confusing language because first the prefix is given then it is chosen, and the so-called "message" is assumed to be a string of nonsense characters. To get better definitions, we can go to the MD5 paper at Technische Universiteit Eindhoven (one of the best places to study cryptography, and "the house de Bruijin helped to build", see also: https://oeis.org/A006480).
https://www.win.tue.nl/hashclash/rogue-ca/
https://www.win.tue.nl/hashclash/EC07v2.0.pdf
A chosen-prefix collision is a pair of messages M and M′ that consist of arbitrarily chosen prefixes P and P′ (not necessarily of the same length), together with constructed suffixes S and S′, such that M=P‖S, M′=P′‖S′, and HASH(M) = HASH(M′). (Editor note. MD5 changed to general HASH).
Due to permutation symmetry between P's & S's, a chosen-prefix attack is essentially the same as a chosen-suffix attack. Note that, as long as constructed P or S is non-unique, constructing involves a choice. Let us give a more general form HASH(P||F||S) = HASH(P'||F'||S'), or:
Hash[pre1<>fix1<>suf1,"type"]==Hash[pre2<>fix2<>suf2,"type"]
We shall call this as the form of a chosen-fix attack, where F and F' are chosen to fix the equality test so that it inappropriately returns True. When either the prefixes P,P' or the suffixes S,S' are chosen NULL, we recover the form of a typical chosen-prefix attack, modulo a disagreement about what is being chosen and what has been given.
The questions posed above are made more difficult by the black-box nature of Mathematica; however, the Adler32 standard, sometimes described as "ancient" (ha ha, almost!), is easy enough to reverse engineer, assuming some constraints on the lexicon. The following example involves two randomly given suffixes, S,S' null prefixes P,P' and the goal is to quickly choose valid F, F'.
For simplicity sake, we restrict the lexicon to letters only, uppercase and lowercase. A homebrew hash function Ad32 is then tested for meaning against the entire lexicon:
Ad32[str_] := With[{vals = FoldList[Plus, 1,
ToCharacterCode[str]][[2 ;; -1]]},
Mod[vals[[-1]], 65521] + Mod[Total[vals], 65521]*2^16]
vars = Join[Alphabet[], ToUpperCase[Alphabet[]]];
StringJoin[Flatten[Table[RandomSample[vars], {i, 10}]]]
Ad32[StringJoin[%]]
Hash[StringJoin[%%], "Adler32"]
% == %%
This is not a collision, but allows us to proceed by splitting apart the first and second summation, which the hash computes as intermediary:
Ad32Part[str_] := With[{vals = FoldList[Plus, 1,
ToCharacterCode[str]][[2 ;; -1]]},
Mod[{vals[[-1]], Total[vals]}, 65521]]
Ad32Part["Wikipedia"]
This function can be called time and again by whatever black magic manages to reverse engineer the summation process and force convergence to zero. We found the following:
update1[str1_, str2_] := Module[{
vals = Ad32Part[#][[1]] & /@ {str1, str2},
salt = SortBy[RandomSample[vars][[1 ;; 2]],
ToCharacterCode[#] &], vals2, strs},
vals2 = If[Subtract @@ vals < 0, Reverse@vals, vals];
strs = If[Subtract @@ vals < 0, {str2, str1}, {str1, str2}];
If[Subtract @@ vals2 < 26,
{FromCharacterCode[90 - Subtract @@ vals2] <> strs[[1]],
"Z" <> strs[[2]]},
{salt[[1]] <> strs[[1]], salt[[2]] <> strs[[2]]}]]
update2[str1_, str2_] := Module[{
vals = Ad32Part[#][[2]] & /@ {str1, str2},
salt = SortBy[RandomSample[vars][[1 ;; 2]],
ToCharacterCode[#] &], vals2, strs},
vals2 = If[Subtract @@ vals < 0, Reverse@vals, vals];
strs = If[Subtract @@ vals < 0, {str2, str1}, {str1, str2}];
If[Subtract @@ vals2 < 26,
{FromCharacterCode[90 - Subtract @@ vals2] <> "Z" <> strs[[1]],
"Z" <> FromCharacterCode[90 - Subtract @@ vals2] <> strs[[2]]},
{salt[[1]] <> salt[[2]] <> strs[[1]],
salt[[2]] <> salt[[1]] <> strs[[2]]}]]
Ad32Collide[init_] :=
NestWhile[update2 @@ # &, NestWhile[update1 @@ # &,
init, Part[Subtract @@ (Ad32Part /@ #), 1] != 0 &],
Part[Subtract @@ (Ad32Part /@ #), 2] != 0 &]
With nice similarities between subfunctions Update1 and Update2. As long as we input randomized strings drawn from the alphabetic lexicon, the function Ad32Collide should always halt and return inputs with identical Adler32 hash values, example:
str1 = StringJoin[RandomChoice[vars, 200]]
str2 = StringJoin[RandomChoice[vars, 200]]
AbsoluteTiming[Ad32Collide[{str1, str2}]]
(Hash[#, "Adler32"] & /@ %[[2]])
Subtract @@ %
The above is a relatively nice output, with a short prefix. Here's another example with probably too much textual bloating, but it prints zero nonetheless:
Questions for students. Prove halting of Ad32Collide and discuss possible performance issues. Design another, better Ad32Collide and explain its relative pros and cons.
The good news is that security researchers already understand the limitations of Adler32. The bad news is that "MD5" and "SHA" are both broken. For another relevant example, consider the given md5 checksum "259a0b9688fa1829924497ef355816de" (source). An adversary who has access to the Mathematica source could possibly serve you a surreptitious installation file:
local://Mathematica_${pkgver}_LINUX.sh
When pacman gets to "check integrity" (Cf. pkgbuild#Integrity) it can not detect a collision hidden by either md5 or sha1. Even though we assume the staff at W.R. trustworthy, be aware of the possibility of other rogue packages!
The other bad news is that the analogy between sha1 and sha2 implies that sha2 could see similar breaking someday in the future. What is going to happen when hashclash meets hashcash?