A Few Thoughts on Cryptographic Engineering
Now a word of warning: what I’m going to talk about today is fairly wonky and (worse) involves hash functions. If you’re not feeling up for this, this is your cue to bail and go read something nice about buffer overflows.
For those still with me, the subject of this post is the design of hash functions, and more specifically: the indifferentiability proofs that designers write to argue for their security. I was surprised to find that most people have never heard of these proofs, and thus have no idea why they’re useful. That’s too bad, since they’re extremely important to the way we analyze hash functions today.
Merkle- Damgård
This is not Ivan Damgård. (Seriously, Google?)
The best way to begin any discussion of hash function design is to take a quick peek inside the hash functions we actually use. Since the most popular hashes today are MD5 (ugh) and SHA, the right place to start is with the ‘Merkle-Damgård’ paradigm.
To understand Merkle-Damgård, you need to understand that cryptographers love to build complicated things out of simpler components. Under the hood of most block ciphers you’ll find S-boxes. Similarly, if you take the lid off a Merkle-Damgård hash function (surprise!) you find block ciphers. Or at least, something very much like them.
This approach dates to a 1979 proposal by a young cryptographer named Ralph Merkle. What Merkle showed is a way to build hash functions with a variable-length input, using any fixed one-way compression function (a one-way function that spits out fewer bits than it takes in). While Merkle wasn’t specific about the function, he suggested that DES might be a good candidate.
Expressed as a colorful diagram, the Merkle construction looks something like this :
Merkle-Damgård construction (source: Wikipedia, because I’m too lazy to draw my own diagram). IV is a fixed value. f is a one-way compression function.
The beauty of Merkle’s proposal is that it’s relatively simple to understand. You just chop your message into blocks, then feed each block into the function f along with the output of the previous function evaluation. Throw in a finalization stage and you’re done.
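The chaining described above can be sketched in a few lines of Python. This is a toy, not any standardized hash: the block size, the all-zero IV, the use of SHA-256 as a stand-in for the compression function f, and the names `compress`, `pad` and `md_hash` are all assumptions made for illustration. The ‘finalization stage’ here is nothing more than padding that encodes the message length.

```python
import hashlib
import struct

BLOCK_SIZE = 64          # bytes per message block (assumed)
STATE_SIZE = 32          # bytes of chaining state (assumed)
IV = bytes(STATE_SIZE)   # fixed initial value; all zeros for simplicity


def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in for f: takes 96 bytes (state + block), returns 32 bytes."""
    assert len(state) == STATE_SIZE and len(block) == BLOCK_SIZE
    return hashlib.sha256(state + block).digest()


def pad(message_len: int) -> bytes:
    """Padding for a message of this byte length: 0x80, zeros, 8-byte bit length."""
    zeros = (-(message_len + 1 + 8)) % BLOCK_SIZE
    return b"\x80" + bytes(zeros) + struct.pack(">Q", message_len * 8)


def md_hash(message: bytes) -> bytes:
    """Chop the padded message into blocks and chain each one through f."""
    data = message + pad(len(message))
    state = IV
    for i in range(0, len(data), BLOCK_SIZE):
        state = compress(state, data[i:i + BLOCK_SIZE])
    return state
```

Note that the loop only ever holds one block and one chaining value at a time, which is what makes this style of hash convenient for streaming input.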
Of course there’s a difference between proposing a technique and showing that it actually works. It would take ten more years, but at CRYPTO 1989, Merkle and another cryptographer named Ivan Damgård independently submitted formal analyses of Merkle’s proposal. What they showed is that as long as the function f has certain ideal properties, the resulting hash function is guaranteed to be collision-resistant. The rest, as they say, is history.
The popularity of Merkle-Damgård can be attributed in part to its security proof. But it also owes something to some major practical advantages:
- You can use any secure block cipher as the function f, with just a few tweaks.
- M-D hash functions can be pretty damn fast, again depending on f and how you use it.
- M-D hashes allow you to digest ‘live’ data streams, where you don’t know in advance how much data you’re going to be hashing.
Of course, Merkle-Damgård hashes also have serious weaknesses. The most famous is the ‘length extension attack’, in which an attacker, given only H(M) for some unknown message M, can ‘tack on’ additional blocks of her own choosing. This issue spells big trouble for people who think that H(key || message) is a good Message Authentication Code.
What’s interesting about the length-extension issue is not that it leads to broken MACs. I mean, that is interesting, and it’s why you should use HMAC. But what’s really interesting is that this flaw doesn’t represent a violation of the collision-resistance guarantee. The two issues are in fact completely orthogonal. And this tells us something fundamental. Namely: collision-resistance is not enough.
Today’s implementers do all kinds of crazy things with hash functions, and many of those applications require much more than collision-resistance. To achieve the necessary properties, we first need to figure out what they are. And that requires us to think hard about the following question:
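To make the length-extension problem concrete, here is a sketch of the attack against a toy Merkle-Damgård hash. Everything in it is an assumption for illustration: SHA-256 stands in for the compression function, the IV is all zeros, and the key, message and names (`md_hash`, `extend`) are hypothetical. It also assumes the attacker knows (or guesses) the length of key || message, which real attacks typically handle by trying a range of key lengths.

```python
import hashlib
import struct

BLOCK_SIZE = 64
IV = bytes(32)


def compress(state, block):
    # Stand-in compression function f (assumption: SHA-256 over state || block).
    return hashlib.sha256(state + block).digest()


def pad(message_len):
    # 0x80, zeros, then the 8-byte bit length, filling out the last block.
    zeros = (-(message_len + 1 + 8)) % BLOCK_SIZE
    return b"\x80" + bytes(zeros) + struct.pack(">Q", message_len * 8)


def md_hash(message, state=IV, prior_len=0):
    """Toy M-D hash; `state` and `prior_len` let a caller resume from a digest."""
    data = message + pad(prior_len + len(message))
    for i in range(0, len(data), BLOCK_SIZE):
        state = compress(state, data[i:i + BLOCK_SIZE])
    return state


def extend(digest, original_len, suffix):
    """Attacker's view: knows H(M) and len(M), but never sees M."""
    glue = pad(original_len)                # padding the victim's hash appended
    absorbed = original_len + len(glue)     # bytes already folded into `digest`
    forged = md_hash(suffix, state=digest, prior_len=absorbed)
    return glue + suffix, forged


key = b"hypothetical-secret-key"
message = b"amount=10"
tag = md_hash(key + message)                # the naive MAC: H(key || message)

# The attacker forges a valid tag for a longer message without the key.
tail, forged_tag = extend(tag, len(key + message), b"&amount=1000000")
assert forged_tag == md_hash(key + message + tail)
```

The forgery works because the published digest is exactly the internal chaining state after the last block, so anyone holding it can keep running the chain.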
What the heck is a secure hash function?
If you crack a typical security textbook (or visit the Wikipedia page on hash functions), you’ll see a long list of things a hash function ‘must’ achieve. The list usually starts with these:
- Collision resistance. It should be hard to find any pair of messages M1, M2 such that H(M1) == H(M2).
- Pre-image resistance. Given only h it should be hard to find a ‘pre-image’ M2 such that H(M2) == h.
Now leave aside the technical fact that none of the unkeyed hash functions we use today are ‘truly’ collision-resistant. Or that the above definition of pre-image resistance implies that I can hash my cat’s name (‘fluffy’) and nobody can invert the hash (note: not true. Go ask LinkedIn if you don’t believe me.) The real problem is that these definitions don’t cover the things that people actually do with hash functions.
For example, take the construction of PRNGs. A common PRNG design hashes together large pools of gathered entropy in the hope that the result will be sufficiently uniform for cryptographic work. This is so common that it’s probably happening somewhere on your computer right now.
And yet, absolutely nothing in the definitions above implies that this technique is safe!* Similar problems exist for key derivation functions, and even for signature schemes like ECDSA, which clearly require hash functions that are more than just collision-resistant.
The more you look into the way that people use hash functions, the more you realize that they really need something that produces ‘random-looking’ output. Unfortunately, this notion is surprisingly hard to formalize. Hash functions are unkeyed, so they’re not pseudo-random functions. What in the world are people asking for?
Random oracles and indifferentiability
The answer, if you dig hard enough, is that people want hash functions to be random oracles.
Random oracles are cryptographers’ conception of what an ‘ideal’ hash function should be. Put succinctly, a random oracle is a perfectly random function that you can evaluate quickly. Random functions are beautiful not just because the output is random-looking (of course), but also because they’re automatically collision-resistant and pre-image resistant. It’s the only requirement you ever need.
The problem with random functions is that you just can’t evaluate them quickly: you need exponential storage space to keep them, and exponential time to evaluate one. Moreover, we know of nothing in the ‘real’ world that can approximate them. When cryptographers try to analyze their schemes with random functions, they have to go off into an imaginary fantasy world that we call the ‘random oracle model’.
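Inside that fantasy world, a random oracle is usually modeled by ‘lazy sampling’: rather than writing down the whole function up front, you pick a fresh random output the first time each input is queried and remember it. A minimal sketch, assuming a 32-byte output and a hypothetical class name `LazyRandomOracle`:

```python
import os


class LazyRandomOracle:
    """A random function sampled on demand: one fresh random output per new input."""

    def __init__(self, out_len: int = 32):
        self.out_len = out_len
        self.table = {}   # grows by one entry per distinct query

    def query(self, message: bytes) -> bytes:
        # First time we see this input: sample its output uniformly at random.
        if message not in self.table:
            self.table[message] = os.urandom(self.out_len)
        # Repeat queries must return the same answer, as any function would.
        return self.table[message]
```

This also shows why it is not a usable hash function: the table is private state, so two parties running separate copies would disagree on every output. Everyone has to query the same oracle, which is exactly the thing that doesn’t exist in the real world.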
But ok, this post is not to judge. For the moment, let’s imagine that we are willing to visit this fantasy world. An obvious question is: what would it take to build a random oracle? If we had a compression function that was good enough (itself a random function) could we use a technique like Merkle-Damgård to get the rest of the way?
In 2004, Maurer, Renner and Holenstein gave us a powerful tool for answering this question. What they showed is that it’s always possible to replace functionality A (e.g., a random oracle) with another functionality B (e.g., an ideal compression function) provided that the following rules are satisfied:
- There exists a way to ‘construct’ something ‘like’ A out of B.
- There exists a way to ‘simulate’ something ‘like’ B using A.
- An attacker who interacts with {constructed A-like thing, B} cannot tell the difference (i.e., can’t differentiate it) from {A, simulated B-like thing}.
The definition of simulation gets a bit wonky, but expressed in simple language all this means is: if you can show that your hash function, instantiated with an ‘ideal’ compression function, looks indistinguishable from a random oracle, and you can show that a manufactured compression function, built using a random oracle as an ingredient, looks indistinguishable from an ideal compression function, then you can always replace one with the other. That is, your hash function is ‘good enough’ to be a random oracle.
The following year, Coron, Dodis, Malinaud and Puniya applied this framework to Merkle-Damgård hash functions. Their first result was immediate: such a proof does not work for Merkle-Damgård. Of course this shouldn’t actually surprise us. We already know that Merkle-Damgård doesn’t behave like a random oracle, since random oracles don’t exhibit length-extension attacks. Still, it’s one thing to know this, and another to see a known problem actually turn up and screw up a proof. So far, no problem.
What Coron et al. showed next is much more interesting:
- They proved formally that Merkle-Damgård can be made indifferentiable from a random oracle, as long as you apply a prefix-free encoding to the input before hashing it. Prefix-free encodings prevent length-extensions by ensuring that no message can ever be a prefix of another.
- Next, they proved the security of HMAC applied to a Merkle-Damgård hash.
- Finally, and best of all, they showed that if you simply drop some bits from the last output block — something called a ‘chop’ construction — you can make Merkle-Damgård hashes secure with much less work.
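The first and last of these fixes are easy to picture in code. The sketch below is illustrative only: it uses Python’s `hashlib` SHA-256 and SHA-512 as stand-in Merkle-Damgård hashes, a length prefix as one simple example of a prefix-free encoding, and hypothetical function names. It shows the shape of the constructions and does not claim that the proofs carry over to these exact arrangements.

```python
import hashlib
import struct


def prefix_free_encode(message: bytes) -> bytes:
    """Prepend an 8-byte length, so no encoded message is a prefix of another."""
    return struct.pack(">Q", len(message)) + message


def prefix_free_hash(message: bytes) -> bytes:
    """Hash the encoded message instead of the raw one."""
    return hashlib.sha256(prefix_free_encode(message)).digest()


def chop_hash(message: bytes, keep_bytes: int = 32) -> bytes:
    """Chop: compute a 64-byte digest, but only ever publish part of it.

    Note this is plain truncated SHA-512, not SHA-512/256, which also
    uses a different initial value.
    """
    return hashlib.sha512(message).digest()[:keep_bytes]
```

With the chop variant, an attacker who sees the output is missing part of the final chaining state, so she can’t simply resume the chain to extend the message.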
The best part of Coron et al.’s findings is that the chop construction is already (inadvertently) in place on SHA384, which is constructed by dropping some output bits from its big-brother hash SHA512. The newer hash variants SHA512/224 and SHA512/256 also have this property.** So this theoretical work already has one big payoff: we know that (under certain assumptions) these hashes may be better than some of the others.
And these results have bigger implications. Now that we know how to do this, we can repeat the process for just about every candidate hash function anyone proposes. This lets us immediately weed out obvious bugs, and avoid standardizing another hash with problems like the length extension attack. This process has become so common that all of the SHA3 candidates now sport exactly such an indifferentiability proof.
Of course, in the real world, indifferentiability only takes you so far. It does tell us something, but it doesn’t tell us everything. Sure, if the compression function is perfect, you obtain a strong result about the hash function. But compression functions are never perfect. Real compression functions have glitches and oddities that can make these theoretical results irrelevant. This is why we’ll always need smart people to arm wrestle over which hash we get to use next.
In conclusion
If I had it in me, I’d go on to talk about the SHA3 candidates, and the techniques that each uses to achieve security in this model. But this has already been a long post, so that will have to wait for another time.
I want to say only one final thing.
This is a practical blog, and I admit that I try to avoid theory. What fascinates me about this area is that it’s a great example of a place where theory has directly come to the aid of practice. You may think of hash functions as whizzing little black boxes of ad-hoc machinery, and to some extent they are. But without theoretical analysis like this, they’d be a whole lot more ad-hoc. They might not even work.
Remember this when NIST finally gets around to picking Keccak BLAKE.
Notes:
* For a ridiculous example, imagine that you have a secure (collision-resistant, pre-image resistant) hash function H. Now construct a new hash function H’ such that H’(M) = {“long string of 0s” || H(M)}. This function is as collision-resistant as the original, but won’t be very useful if you’re generating keys with it.
** Thanks to Paulo Barreto for fixing numerous typos and pointing out that SHA512/256 and /224 make excellent candidates for chop hashes!
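The H’ example from the first note is easy to try out. In this sketch SHA-256 stands in for H, the 16-byte zero prefix is an arbitrary choice for the ‘long string of 0s’, and the key derivation shown (take the first 16 bytes of the digest) is a deliberately naive, hypothetical usage:

```python
import hashlib

ZERO_PREFIX = bytes(16)   # the "long string of 0s"; the length is arbitrary


def h_prime(message: bytes) -> bytes:
    """H'(M) = 0...0 || H(M), with SHA-256 standing in for H."""
    return ZERO_PREFIX + hashlib.sha256(message).digest()


# Naively take the first 16 bytes of the digest as a 128-bit key.
key = h_prime(b"some high-entropy seed")[:16]
assert key == bytes(16)   # every 'derived' key is all zeros
```

Any collision in H’ is still a collision in H, so the textbook property survives intact while the derived keys are worthless.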