# Merkle–Damgård construction – Wikipedia

In cryptography, the **Merkle–Damgård construction** or **Merkle–Damgård hash function** is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. [1]: 145 This construction was used in the design of many popular hash algorithms such as MD5, SHA-1 and SHA-2.

The Merkle–Damgård construction was described in Ralph Merkle's Ph.D. thesis in 1979. [2] Ralph Merkle and Ivan Damgård independently proved that the structure is sound: that is, if an appropriate padding scheme is used and the compression function is collision-resistant, then the hash function will also be collision-resistant. [3] [4]

The Merkle–Damgård hash function first applies an MD-compliant padding function to create an input whose size is a multiple of a fixed number (e.g. 512 or 1024 bits), because compression functions cannot handle inputs of arbitrary size. The hash function then breaks the result into blocks of fixed size and processes them one at a time with the compression function, each time combining a block of the input with the output of the previous round. [1]: 146 To make the construction secure, Merkle and Damgård proposed that messages be padded with a padding that encodes the length of the original message. This is called *length padding* or **Merkle–Damgård strengthening**.

*Figure: Merkle–Damgård hash construction.*

In the diagram, the one-way compression function is denoted by *f*; it transforms two fixed-length inputs into an output of the same size as one of the inputs. The algorithm starts with an initial value, the initialization vector (IV). The IV is a fixed value (algorithm- or implementation-specific). For each message block, the compression (or compacting) function *f* takes the result so far, combines it with the message block, and produces an intermediate result. The last block is padded with zeros as needed, and bits representing the length of the entire message are appended. (See below for a detailed length padding example.)

To harden the hash further, the last result is then sometimes fed through a *finalisation function*. The finalisation function can have several purposes, such as compressing a bigger internal state (the last result) into a smaller output hash size, or guaranteeing better mixing and a better avalanche effect on the bits in the hash sum. The finalisation function is often built by using the compression function. [*citation needed*] (Note that in some documents a different terminology is used: the act of length padding is called "finalisation". [*citation needed*])
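
The iteration described above can be sketched in a few lines of Python. This is a toy illustration, not a real hash function: the 8-byte block and state sizes, the all-zero IV, the names `compress`, `pad` and `md_hash`, and the use of a truncated SHA-256 call (from the standard `hashlib` module) as a stand-in compression function are all assumptions made for readability. The length is encoded in bytes rather than bits so that the output matches the padding example later in this article.

```python
import hashlib

BLOCK_SIZE = 8            # bytes per message block (toy value, assumption)
STATE_SIZE = 8            # bytes of chaining value (toy value, assumption)
IV = bytes(STATE_SIZE)    # all-zero initialization vector (assumption)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function f: (state, block) -> new state."""
    assert len(state) == STATE_SIZE and len(block) == BLOCK_SIZE
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes up to a block boundary, then an 8-byte length block."""
    padded = message + b"\x80"
    padded += bytes(-len(padded) % BLOCK_SIZE)
    return padded + len(message).to_bytes(BLOCK_SIZE, "big")

def md_hash(message: bytes) -> bytes:
    """Iterate f over the padded message, starting from the IV."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state

digest = md_hash(b"HashInput")
print(digest.hex())
```

No finalisation function is applied here; the last chaining value is returned directly as the digest.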

## Security characteristics

The popularity of this construction is due to the fact, proven by Merkle and Damgård, that if the one-way compression function *f* is collision resistant, then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties:

- Second preimage attacks against long messages are always much more efficient than brute force.[5]
- Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.[6]
- "Herding attacks", which combine the cascaded construction for multicollision finding (similar to the above) with collisions found for a given prefix (chosen-prefix collisions). This allows constructing highly specific colliding documents; it takes more work than finding a collision, but much less than would be expected for a random oracle.[7][8]
- Length extension: given the hash $H(X)$ of an unknown input *X*, it is easy to find the value of $H(\mathsf{Pad}(X) \| Y)$, where $\mathsf{Pad}$ is the padding function of the hash. That is, it is possible to find hashes of inputs related to *X* even though *X* remains unknown.[9] Length extension attacks were actually used to attack a number of commercial web message authentication schemes, such as one used by Flickr.[10]
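
The length extension property can be demonstrated on a toy Merkle–Damgård hash. The sketch below is illustrative only: the block size, the all-zero IV, the truncated-SHA-256 stand-in for the compression function, and the helper names `padding` and `md_hash` are assumptions, not part of any real library. The "attacker" part uses only the digest and the length of *X*.

```python
import hashlib

BLOCK = 8            # toy block size in bytes (assumption)
IV = bytes(8)        # toy all-zero initialization vector (assumption)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:8]

def padding(total_len: int) -> bytes:
    """Padding for a message of total_len bytes: 0x80, zeros, 8-byte length."""
    zeros = -(total_len + 1) % BLOCK
    return b"\x80" + bytes(zeros) + total_len.to_bytes(8, "big")

def md_hash(message: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    """Hash message; optionally resume from state after prior_len bytes."""
    data = message + padding(prior_len + len(message))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

# The attacker knows only H(X) and len(X), not X itself.
secret_x = b"secret-key|data"
known_digest, known_len = md_hash(secret_x), len(secret_x)

glue = padding(known_len)      # the padding the hash appended to X
suffix = b";admin=true"        # attacker-chosen Y
forged = md_hash(suffix, state=known_digest, prior_len=known_len + len(glue))

# Equals the hash of X || padding || Y, computed here with knowledge of X.
print(forged == md_hash(secret_x + glue + suffix))
```

The attack works because the published digest is exactly the internal chaining value after the last block, so the iteration can be resumed from it.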

## Wide pipe construction

*Figure: the wide pipe hash construction. The intermediate chaining values have been doubled.*

Due to several structural weaknesses of the Merkle–Damgård construction, especially the length extension problem and multicollision attacks, Stefan Lucks proposed the use of the wide-pipe hash [11] instead of the Merkle–Damgård construction. The wide-pipe hash is very similar to the Merkle–Damgård construction but has a larger internal state size, meaning that the bit-length used internally is larger than the output bit-length. If a hash of *n* bits is desired, the compression function *f* takes *2n* bits of chaining value and *m* bits of the message and compresses this to an output of *2n* bits.

Therefore, in a final step, a second compression function compresses the last internal hash value (*2n* bits) to the final hash value (*n* bits). This can be done as simply as discarding half of the last *2n*-bit output. SHA-512/224 and SHA-512/256 take this form since they are derived from a variant of SHA-512. SHA-384 and SHA-224 are similarly derived from SHA-512 and SHA-256, respectively, but the width of their pipe is much less than *2n*.
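
A minimal sketch of the wide-pipe idea, under the same kind of toy assumptions as before: `N`, `BLOCK`, the all-zero IV, and the truncated-SHA-256 stand-in `compress_wide` are illustrative choices, not Lucks's concrete proposal. The final compression is implemented as plain truncation, which the text above names as the simplest option.

```python
import hashlib

N = 8                 # output size in bytes (n bits); toy value
BLOCK = 8             # message bytes per block (m bits); toy value
IV = bytes(2 * N)     # 2n-bit all-zero initial state (assumption)

def compress_wide(state: bytes, block: bytes) -> bytes:
    """Stand-in for f: 2n-bit chaining value + m-bit block -> 2n bits."""
    assert len(state) == 2 * N and len(block) == BLOCK
    return hashlib.sha256(state + block).digest()[:2 * N]

def pad(message: bytes) -> bytes:
    """0x80, zeros up to a block boundary, then a length block."""
    padded = message + b"\x80"
    padded += bytes(-len(padded) % BLOCK)
    return padded + len(message).to_bytes(BLOCK, "big")

def wide_pipe_hash(message: bytes) -> bytes:
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK):
        state = compress_wide(state, padded[i:i + BLOCK])
    return state[:N]   # final step: discard half of the 2n-bit state

print(wide_pipe_hash(b"HashInput").hex())
```

Because only half of the final state is published, someone who sees the digest does not hold the full chaining value and cannot simply resume the iteration from it.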

## Fast wide pipe construction

*Figure: the fast wide pipe hash construction. Half of the chaining value is used in the compression function.*

It has been demonstrated by Mridul Nandi and Souradyuti Paul that the wide-pipe hash function can be made approximately twice as fast if the wide-pipe state can be divided in half in the following manner: one half is input to the succeeding compression function while the other half is combined with the output of that compression function. [12]

The main idea of the hash construction is to forward half of the previous chaining value and XOR it into the output of the compression function. In so doing, the construction takes in longer message blocks in every iteration than the original wide pipe. Using the same function *f* as before, it takes *n*-bit chaining values and *n+m* bits of the message. However, the price to pay is the extra memory used in the construction for the feed-forward.
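
The feed-forward step can be sketched as follows. This is a rough illustration, not the construction as specified by Nandi and Paul: the sizes, the all-zero IV, the truncated-SHA-256 stand-in `f`, the padding, and in particular the choice of which half of the output receives the XOR are assumptions of this sketch.

```python
import hashlib

N = 8                 # half-state size in bytes (n bits); toy value
M = 8                 # message bytes per block in the plain wide pipe (m bits)
BLOCK = N + M         # the fast variant absorbs n + m bits per call
IV = bytes(2 * N)     # 2n-bit all-zero initial state (assumption)

def f(data: bytes) -> bytes:
    """Stand-in compression function: (2n + m) bits in, 2n bits out."""
    assert len(data) == 2 * N + M
    return hashlib.sha256(data).digest()[:2 * N]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pad(message: bytes) -> bytes:
    """0x80, zeros, then an 8-byte length, ending on a BLOCK boundary."""
    padded = message + b"\x80"
    padded += bytes(-(len(padded) + 8) % BLOCK)
    return padded + len(message).to_bytes(8, "big")

def fast_wide_pipe_hash(message: bytes) -> bytes:
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK):
        fed, forwarded = state[:N], state[N:]
        # Only half of the state enters f; the freed n bits carry message data.
        out = f(fed + padded[i:i + BLOCK])
        # The other half is fed forward and XORed into the output (assumed placement).
        state = xor(out[:N], forwarded) + out[N:]
    return state[:N]

print(fast_wide_pipe_hash(b"HashInput").hex())
```

With these toy sizes each call to `f` absorbs 16 message bytes instead of 8, which is where the speed-up over the plain wide pipe comes from.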

## MD-compliant padding

As mentioned in the introduction, the padding scheme used in the Merkle–Damgård construction must be chosen carefully to ensure the security of the scheme. Mihir Bellare gives sufficient conditions for a padding scheme to ensure that the MD construction is secure: it suffices that the scheme be "MD-compliant" (the original length-padding scheme used by Merkle is an example of MD-compliant padding). [1]: 145 Conditions:

- $M$ is a prefix of $\mathsf{Pad}(M)$.
- If $|M_1| = |M_2|$, then $|\mathsf{Pad}(M_1)| = |\mathsf{Pad}(M_2)|$.
- If $|M_1| \neq |M_2|$, then the last block of $\mathsf{Pad}(M_1)$ is different from the last block of $\mathsf{Pad}(M_2)$.

Here $|X|$ denotes the length of $X$. With these conditions in place, we find a collision in the MD hash function *exactly when* we find a collision in the underlying compression function. Therefore, the Merkle–Damgård construction is provably secure when the underlying compression function is secure. [1]: 147
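
The three conditions can be spot-checked on concrete message pairs. The sketch below is only a test on given inputs, not a proof of compliance; the 8-byte block size and the names `length_pad`, `bit_only_pad` and `satisfies_conditions` are hypothetical. It contrasts length padding with a scheme that appends only a `0x80` marker and zeros, which fails the third condition for the chosen pair.

```python
BLOCK = 8  # toy block size in bytes (assumption)

def length_pad(message: bytes) -> bytes:
    """0x80, zeros up to a block boundary, then a block holding the length."""
    padded = message + b"\x80"
    padded += bytes(-len(padded) % BLOCK)
    return padded + len(message).to_bytes(BLOCK, "big")

def bit_only_pad(message: bytes) -> bytes:
    """0x80 and zeros up to a block boundary, with no length field."""
    padded = message + b"\x80"
    return padded + bytes(-len(padded) % BLOCK)

def satisfies_conditions(pad, m1: bytes, m2: bytes) -> bool:
    """Check the three MD-compliance conditions for one pair of messages."""
    p1, p2 = pad(m1), pad(m2)
    if not (p1.startswith(m1) and p2.startswith(m2)):   # condition 1
        return False
    if len(m1) == len(m2):                              # condition 2
        return len(p1) == len(p2)
    return p1[-BLOCK:] != p2[-BLOCK:]                   # condition 3

m1, m2 = b"A" * 8 + b"Z", b"Z"
print(satisfies_conditions(length_pad, m1, m2))    # True
print(satisfies_conditions(bit_only_pad, m1, m2))  # False: identical last blocks
```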

## Length padding example

To be able to feed the message to the compression function, the last block needs to be padded with constant data (generally with zeroes) to a full block. For example, suppose the message to be hashed is "HashInput" (a 9-octet string, 0x48617368496e707574 in ASCII) and the block size of the compression function is 8 bytes (64 bits). We get two blocks (the padding octets are the trailing zero bytes of the second block):

- 48 61 73 68 49 6e 70 75, 74 00 00 00 00 00 00 00

This implies that other messages having the same content but ending with additional zeros will result in the same hash value. In the above example, another almost identical message (0x48617368496e70757400) will generate the same hash value as the original message "HashInput" above. In other words, any message with extra zeros at the end becomes indistinguishable from the one without them. To prevent this situation, the first bit of the first padding octet is changed to "1" (0x80), yielding:

- 48 61 73 68 49 6e 70 75, 74 80 00 00 00 00 00 00
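
The ambiguity of plain zero padding, and how the `0x80` marker removes it, can be reproduced directly on the two example messages. The helper names `zero_pad` and `bit_pad` are hypothetical, and the 8-byte block size follows the example above.

```python
BLOCK = 8  # block size in bytes, as in the example

def zero_pad(message: bytes) -> bytes:
    """Pad with zero bytes only, up to a block boundary."""
    return message + bytes(-len(message) % BLOCK)

def bit_pad(message: bytes) -> bytes:
    """Append 0x80 (a single 1 bit), then zero bytes up to a block boundary."""
    padded = message + b"\x80"
    return padded + bytes(-len(padded) % BLOCK)

m1 = bytes.fromhex("48617368496e707574")      # "HashInput"
m2 = bytes.fromhex("48617368496e70757400")    # "HashInput" plus one zero byte

print(zero_pad(m1) == zero_pad(m2))   # True: the two inputs become identical
print(bit_pad(m1).hex())              # 48617368496e70757480000000000000
print(bit_pad(m2).hex())              # 48617368496e70757400800000000000
```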

To apply Merkle–Damgård strengthening, the message length is also appended, here in an extra block at the end (the final block below):

- 48 61 73 68 49 6e 70 75, 74 80 00 00 00 00 00 00, 00 00 00 00 00 00 00 09

However, most common implementations use a fixed bit-size (generally 64 or 128 bits in modern algorithms) at a fixed position at the end of the last block for inserting the message length value (see *SHA-1 pseudocode*). A further improvement can be made by inserting the length value in the last block if there is enough space. Doing so avoids having an extra block for the message length. If we assume the length value is encoded on 5 bytes (40 bits), the message becomes:

- 48 61 73 68 49 6e 70 75, 74 80 00 00 00 00 00 09
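
The compact variant, where a fixed-size length field occupies the end of the last block, can be sketched as below. The function name `compact_pad` is hypothetical; the 8-byte block and 5-byte length field follow the example above, and the length is counted in bytes as in that example.

```python
BLOCK = 8        # block size in bytes, as in the example
LEN_BYTES = 5    # size of the length field (40 bits), as assumed in the example

def compact_pad(message: bytes) -> bytes:
    """Append 0x80, then zeros so the length field ends exactly on a block boundary."""
    padded = message + b"\x80"
    padded += bytes(-(len(padded) + LEN_BYTES) % BLOCK)
    return padded + len(message).to_bytes(LEN_BYTES, "big")

padded = compact_pad(b"HashInput")
blocks = [padded[i:i + BLOCK].hex(" ") for i in range(0, len(padded), BLOCK)]
print(blocks)
```

When the marker byte and the length field do not both fit after the message, the zero fill spills into an additional block, so the length always sits at the same offset in the final block.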


Note that storing the message length out-of-band in metadata, or otherwise embedded at the start of the message, is an effective mitigation of the length extension attack [*citation needed*], as long as invalidation of either the message length or the checksum is considered a failure of the integrity check.

## References

- *Handbook of Applied Cryptography* by Menezes, van Oorschot and Vanstone (2001), chapter 9.
- *Introduction to Modern Cryptography* by Jonathan Katz and Yehuda Lindell. Chapman and Hall/CRC Press, August 2007, page 134 (construction 4.13).
- *Cryptography Made Simple* by Nigel Smart (2015), chapter 14.