# Cryptographic hash function – Wikipedia

Hash function that is suitable for use in cryptography
A cryptographic hash function (specifically SHA-1) at work. A small change in the input (in the word "over") drastically changes the output (digest). This is the so-called avalanche effect.
A cryptographic hash function (CHF) is a mathematical algorithm that maps data of an arbitrary size (often called the "message") to a bit array of a fixed size (the "hash value", "hash", or "message digest"). It is a one-way function, that is, a function for which it is practically infeasible to invert or reverse the computation. [ 1 ] Ideally, the only way to find a message that produces a given hash is to attempt a brute-force search of possible inputs to see if they produce a match, or use a rainbow table of matched hashes. Cryptographic hash functions are a basic tool of modern cryptography. A cryptographic hash function must be deterministic, meaning that the same message always results in the same hash. Ideally it should also have the following properties:

• it is quick to compute the hash value for any given message
• it is infeasible to generate a message that yields a given hash value (i.e. to reverse the process that generated the given hash value)
• it is infeasible to find two different messages with the same hash value
• a small change to a message should change the hash value so extensively that a new hash value appears uncorrelated with the old hash value (avalanche effect)[2]
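The avalanche effect in the last property can be observed directly with Python's standard `hashlib` module. The sketch below hashes two messages that differ by a single letter and counts how many hex digits of the digests differ:

```python
import hashlib

# Hash two messages that differ in one character ("over" vs "ower").
d1 = hashlib.sha256(b"The quick brown fox jumps over the lazy dog").hexdigest()
d2 = hashlib.sha256(b"The quick brown fox jumps ower the lazy dog").hexdigest()

# Count how many of the 64 hex digits differ between the two digests.
differing = sum(a != b for a, b in zip(d1, d2))
print(d1)
print(d2)
print(f"{differing} of {len(d1)} hex digits differ")
```

Roughly 15 out of every 16 hex digits are expected to differ, as if the two digests were independent random values.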

Cryptographic hash functions have many information-security applications, notably in digital signatures, message authentication codes (MACs), and other forms of authentication. They can also be used as ordinary hash functions, to index data in hash tables, for fingerprinting, to detect duplicate data or uniquely identify files, and as checksums to detect accidental data corruption. Indeed, in information-security contexts, cryptographic hash values are sometimes called (digital) fingerprints, checksums, or just hash values, even though all these terms stand for more general functions with rather different properties and purposes. [ 3 ]

## Properties

Most cryptographic hash functions are designed to take a string of any length as input and produce a fixed-length hash value. A cryptographic hash function must be able to withstand all known types of cryptanalytic attack. In theoretical cryptography, the security level of a cryptographic hash function has been defined using the following properties:

Pre-image resistance
Given a hash value h, it should be difficult to find any message m such that h = hash(m). This concept is related to that of a one-way function. Functions that lack this property are vulnerable to preimage attacks.
Second pre-image resistance
Given an input m1, it should be difficult to find a different input m2 such that hash(m1) = hash(m2). This property is sometimes referred to as weak collision resistance. Functions that lack this property are vulnerable to second-preimage attacks.
Collision resistance
It should be difficult to find two different messages m1 and m2 such that hash(m1) = hash(m2). Such a pair is called a cryptographic hash collision. This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as that required for pre-image resistance; otherwise collisions may be found by a birthday attack.

Collision resistance implies second pre-image resistance but does not imply pre-image resistance. The weaker assumption is always preferred in theoretical cryptography, but in practice, a hash function which is only second pre-image resistant is considered insecure and is consequently not recommended for real applications. Informally, these properties mean that a malicious adversary cannot replace or modify the input data without changing its digest. Thus, if two strings have the same digest, one can be very confident that they are identical. Second pre-image resistance prevents an attacker from crafting a document with the same hash as a document the attacker cannot control. Collision resistance prevents an attacker from creating two distinct documents with the same hash.

A function meeting these criteria may still have undesirable properties. Currently, popular cryptographic hash functions are vulnerable to length-extension attacks: given hash(m) and len(m) but not m, by choosing a suitable m′ an attacker can calculate hash(m ∥ m′), where ∥ denotes concatenation. [ 6 ] This property can be used to break naive authentication schemes based on hash functions. The HMAC construction works around these problems.

In practice, collision resistance is insufficient for many practical uses. In addition to collision resistance, it should be impossible for an adversary to find two messages with substantially similar digests, or to infer any useful information about the data, given only its digest. In particular, a hash function should behave as much as possible like a random function (often called a random oracle in proofs of security) while still being deterministic and efficiently computable. This rules out functions like the SWIFFT function, which can be rigorously proven to be collision-resistant assuming that certain problems on ideal lattices are computationally difficult, but, as a linear function, does not satisfy these additional properties.
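The difference between a naive keyed hash and HMAC can be sketched with Python's standard `hashlib` and `hmac` modules. The key and message below are placeholders; the point is that `hash(key ∥ message)` over a Merkle–Damgård hash is extendable by an attacker, while the nested HMAC construction is not:

```python
import hashlib
import hmac

key = b"secret-key"
message = b"amount=100&to=alice"

# Naive MAC: sha256(key || message). On a Merkle-Damgard hash this is
# vulnerable to length extension, so it should not be used in practice.
naive_tag = hashlib.sha256(key + message).hexdigest()

# HMAC wraps the hash in a nested, keyed construction that is not
# vulnerable to length extension.
hmac_tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# Verify with a constant-time comparison to avoid timing side channels.
ok = hmac.compare_digest(hmac_tag, hmac.new(key, message, hashlib.sha256).hexdigest())
print(ok)
```

`hmac.compare_digest` is used for verification because a naive `==` on tag strings can leak information through timing.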
Checksum algorithms, such as CRC32 and other cyclic redundancy checks, are designed to meet much weaker requirements and are generally unsuitable as cryptographic hash functions. For example, a CRC was used for message integrity in the WEP encryption standard, but an attack was readily discovered, which exploited the linearity of the checksum.

### Degree of difficulty

In cryptographic practice, "difficult" generally means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important". The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to their expected gain. However, since the needed effort usually multiplies with the digest length, even a thousand-fold advantage in processing power can be neutralized by adding a few dozen bits to the latter. For messages selected from a limited set of messages, for example passwords or other short messages, it can be feasible to invert a hash by trying all possible messages in the set. Because cryptographic hash functions are typically designed to be computed quickly, special key derivation functions that require greater computing resources have been developed that make such brute-force attacks more difficult. In some theoretical analyses "difficult" has a specific mathematical meaning, such as "not solvable in asymptotic polynomial time". Such interpretations of difficulty are important in the study of provably secure cryptographic hash functions but do not usually have a strong connection to practical security. For example, an exponential-time algorithm can sometimes still be fast enough to make a feasible attack. Conversely, a polynomial-time algorithm (e.g., one that requires n^20 steps for n-digit keys) may be too slow for any practical use.
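The feasibility of inverting a hash of a short message can be demonstrated with a small sketch. The four-letter password below is a made-up target; the loop simply tries every lowercase candidate, and the final line shows (as a contrast) a key derivation function, PBKDF2 from the standard library, which makes each guess far more expensive:

```python
import hashlib
import itertools
import string

# Hypothetical target: a fast, unsalted hash of a 4-letter lowercase password.
target = hashlib.sha256(b"kite").hexdigest()

def brute_force(target_hex: str, length: int = 4):
    """Try every lowercase string of the given length until one matches."""
    for candidate in itertools.product(string.ascii_lowercase, repeat=length):
        word = "".join(candidate).encode()
        if hashlib.sha256(word).hexdigest() == target_hex:
            return word
    return None

recovered = brute_force(target)  # searches at most 26**4 = 456,976 candidates

# A key derivation function deliberately slows down each guess; the salt
# also prevents precomputed rainbow tables from being reused across users.
slow = hashlib.pbkdf2_hmac("sha256", b"kite", b"per-user-salt", 100_000)
```

With a plain hash, the entire 4-letter search space falls in well under a second on commodity hardware; with 100,000 PBKDF2 iterations, the same search would take roughly five orders of magnitude longer.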

## Illustration

An example of the potential use of a cryptographic hash is as follows: Alice poses a tough math problem to Bob and claims that she has solved it. Bob would like to try it himself, but would yet like to be sure that Alice is not bluffing. Therefore, Alice writes down her solution, computes its hash, and tells Bob the hash value (whilst keeping the solution secret). Then, when Bob comes up with the solution himself a few days later, Alice can prove that she had the solution earlier by revealing it and having Bob hash it and check that it matches the hash value given to him earlier. (This is an example of a simple commitment scheme; in actual practice, Alice and Bob will often be computer programs, and the secret would be something less easily spoofed than a claimed puzzle solution.)
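Alice's trick above can be sketched as a minimal hash-based commitment scheme. The random nonce is an addition not spelled out in the story: it blinds low-entropy solutions so Bob cannot simply guess-and-hash candidate answers against the commitment. This is an illustration, not a production scheme:

```python
import hashlib
import os

def commit(solution: bytes) -> tuple[str, bytes]:
    """Return (commitment, nonce). The random nonce hides guessable solutions."""
    nonce = os.urandom(16)
    digest = hashlib.sha256(nonce + solution).hexdigest()
    return digest, nonce

def verify(commitment: str, nonce: bytes, solution: bytes) -> bool:
    """Recompute the hash from the revealed values and compare."""
    return hashlib.sha256(nonce + solution).hexdigest() == commitment

# Alice commits now, and reveals (nonce, solution) only after Bob finishes.
c, n = commit(b"x = 42")
print(verify(c, n, b"x = 42"))
```

Hiding follows from pre-image resistance; binding follows from collision resistance, since opening the commitment to a different solution would require a collision.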

## Applications

### Verifying the integrity of messages and files

An important application of secure hashes is the verification of message integrity. Comparing message digests (hash digests over the message) calculated before, and after, transmission can determine whether any changes have been made to the message or file. MD5, SHA-1, or SHA-2 hash digests are sometimes published on websites or forums to allow verification of integrity for downloaded files, [ 8 ] including files retrieved using file sharing such as mirroring. This practice establishes a chain of trust as long as the hashes are posted on a trusted site – usually the originating site – authenticated by HTTPS. Using a cryptographic hash and a chain of trust detects malicious changes to the file. Non-cryptographic error-detecting codes such as cyclic redundancy checks only prevent against non-malicious alterations of the file, since an intentional spoof can readily be crafted to have the colliding code value.
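Verifying a downloaded file against a published digest amounts to streaming the file through a hash and comparing hex strings. A minimal sketch (the temporary file stands in for a real download):

```python
import hashlib
import os
import tempfile

def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: write a small stand-in file and check it against the expected digest.
data = b"hello world\n"
expected = hashlib.sha256(data).hexdigest()  # the "published" digest
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)
digest = file_digest(path)
os.remove(path)
print(digest == expected)
```

Reading in fixed-size chunks keeps memory use constant even for multi-gigabyte downloads.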

### Signature generation and verification

Almost all digital signature schemes require a cryptographic hash to be calculated over the message. This allows the signature calculation to be performed on the relatively small, statically sized hash digest. The message is considered authentic if the signature verification succeeds given the signature and recalculated hash digest over the message. So the message integrity property of the cryptographic hash is used to create secure and efficient digital signature schemes.

### Proof-of-work

A proof-of-work system (or protocol, or function) is an economic measure to deter denial-of-service attacks and other service abuses such as spam on a network by requiring some work from the service requester, usually meaning processing time by a computer. A key feature of these schemes is their asymmetry: the work must be moderately hard (but feasible) on the requester side but easy to check for the service provider. One popular system – used in Bitcoin mining and Hashcash – uses partial hash inversions to prove that work was done, to unlock a mining reward in Bitcoin, and as a good-will token to send an e-mail in Hashcash. The sender is required to find a message whose hash value begins with a number of zero bits. The average work that the sender needs to perform in order to find a valid message is exponential in the number of zero bits required in the hash value, while the recipient can verify the validity of the message by executing a single hash function. For instance, in Hashcash, a sender is asked to generate a header whose 160-bit SHA-1 hash value has the first 20 bits as zeros. The sender will, on average, have to try 2^19 times to find a valid header.
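The partial hash inversion described above can be sketched in a few lines. This toy version uses SHA-256 rather than Hashcash's SHA-1, and appends a decimal nonce to the data; 16 leading zero bits are demanded so the search stays cheap (about 2^16 attempts on average):

```python
import hashlib

def proof_of_work(data: bytes, zero_bits: int) -> int:
    """Find a nonce so sha256(data + nonce) starts with zero_bits zero bits."""
    target = 1 << (256 - zero_bits)  # digests below this have the zero prefix
    nonce = 0
    while True:
        digest = hashlib.sha256(data + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

# The requester does the expensive search...
nonce = proof_of_work(b"hello", 16)
# ...while the provider verifies with a single hash computation.
digest = hashlib.sha256(b"hello" + str(nonce).encode()).digest()
```

Each additional required zero bit doubles the sender's expected work while leaving verification at one hash, which is exactly the asymmetry these schemes rely on.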

### File or data identifier

A message digest can also serve as a means of reliably identifying a file; several source code management systems, including Git, Mercurial and Monotone, use the sha1sum of various types of content (file content, directory trees, ancestry data, etc.) to uniquely identify them. Hashes are used to identify files on peer-to-peer filesharing networks. For example, in an ed2k link, an MD4-variant hash is combined with the file size, providing sufficient information for locating file sources, downloading the file, and verifying its contents. Magnet links are another example. Such file hashes are often the top hash of a hash list or a hash tree, which allows for additional benefits. One of the main applications of a hash function is to allow the fast look-up of data in a hash table. Being hash functions of a particular kind, cryptographic hash functions lend themselves well to this application too. However, compared with standard hash functions, cryptographic hash functions tend to be much more expensive computationally. For this reason, they tend to be used in contexts where it is necessary for users to protect themselves against the possibility of forgery (the creation of data with the same digest as the expected data) by potentially malicious participants.
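Git's content identifiers mentioned above can be reproduced directly: a blob's id is the SHA-1 of a small header (`blob`, the content length, and a NUL byte) followed by the content itself. A sketch:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Git object id for a blob: SHA-1 over 'blob <size>\\0' plus the content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Matches `echo "hello world" | git hash-object --stdin`
blob_id = git_blob_id(b"hello world\n")
print(blob_id)
```

Because the id is derived from the content, identical files stored anywhere in any repository share one object, and any corruption is detectable by rehashing.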

## Hash functions based on block ciphers

There are several methods to use a block cipher to build a cryptographic hash function, specifically a one-way compression function. The methods resemble the block cipher modes of operation usually used for encryption. Many well-known hash functions, including MD4, MD5, SHA-1 and SHA-2, are built from block-cipher-like components designed for the purpose, with feedback to ensure that the resulting function is not invertible. SHA-3 finalists included functions with block-cipher-like components (for example, Skein, BLAKE) though the function finally selected, Keccak, was built on a cryptographic sponge instead. A standard block cipher such as AES can be used in place of these custom block ciphers; that might be useful when an embedded system needs to implement both encryption and hashing with minimal code size or hardware area. However, that approach can have costs in efficiency and security. The ciphers in hash functions are built for hashing: they use large keys and blocks, can efficiently change keys every block, and have been designed and vetted for resistance to related-key attacks. General-purpose ciphers tend to have different design goals. In particular, AES has key and block sizes that make it nontrivial to use to generate long hash values; AES encryption becomes less efficient when the key changes each block; and related-key attacks make it potentially less secure for use in a hash function than for encryption.
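One classic way to turn a block cipher into a one-way compression function is the Davies–Meyer construction: each step computes H_i = E(key = m_i, H_{i-1}) XOR H_{i-1}, where the feed-forward XOR is what makes the step non-invertible. The sketch below uses a made-up 64-bit "cipher" purely for illustration; it is not secure and none of the names are from a real library:

```python
MASK = (1 << 64) - 1

def toy_cipher(key: int, block: int) -> int:
    """Placeholder 64-bit block cipher: a few rounds of keyed mixing (insecure)."""
    x = block
    for r in range(4):
        x = (x + key + r) & MASK       # inject the key each round
        x ^= x >> 13                   # diffuse high bits into low bits
        x = (x * 0x9E3779B97F4A7C15) & MASK  # multiply by an odd constant
    return x

def davies_meyer(blocks: list[int], iv: int = 0x0123456789ABCDEF) -> int:
    """Chain the cipher over message blocks with a feed-forward XOR."""
    h = iv
    for m in blocks:
        # Message block is the cipher KEY; the feed-forward XOR of the old
        # state is what makes each step one-way.
        h = toy_cipher(m, h) ^ h
    return h

digest = davies_meyer([1, 2, 3])
print(hex(digest))
```

Real designs such as SHA-2 follow the same shape but with a purpose-built cipher whose key schedule is cheap to change every block.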

## Hash function design

### Merkle–Damgård construction

The Merkle–Damgård hash construction

A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equally sized blocks, and operating on them in sequence using a one-way compression function. The compression function can either be specially designed for hashing or be built from a block cipher. A hash function built with the Merkle–Damgård construction is as resistant to collisions as is its compression function; any collision for the full hash function can be traced back to a collision in the compression function. The last block processed should also be unambiguously length padded; this is crucial to the security of this construction. This construction is called the Merkle–Damgård construction. Most common classical hash functions, including SHA-1 and MD5, take this form.
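The construction can be sketched end to end with a toy 64-bit compression function. Everything here is illustrative and insecure; the point is the structure: length-strengthened padding, then chaining the compression function over fixed-size blocks:

```python
import struct

MASK = (1 << 64) - 1

def toy_compress(state: int, block: bytes) -> int:
    """Toy (insecure) compression function mixing a block into a 64-bit state."""
    for byte in block:
        state = ((state ^ byte) * 0x100000001B3) & MASK  # FNV-style mixing
    return state

def md_hash(message: bytes, block_size: int = 8) -> int:
    """Merkle-Damgard chaining with length-strengthening padding."""
    # Pad with a 0x80 byte, zeros, then the 8-byte bit length of the message.
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % block_size)
    padded += struct.pack(">Q", len(message) * 8)
    state = 0xCBF29CE484222325  # arbitrary initial value (IV)
    for i in range(0, len(padded), block_size):
        state = toy_compress(state, padded[i:i + block_size])
    return state

h = md_hash(b"hello")
```

Appending the message length (Merkle–Damgård strengthening) is what lets the security proof reduce any full-hash collision to a compression-function collision.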

### Wide pipe versus narrow pipe

A straightforward application of the Merkle–Damgård construction, where the size of the hash output is equal to the internal state size (between each compression step), results in a narrow-pipe hash design. This design causes many inherent flaws, including length-extension, multicollisions, [ 9 ] long message attacks, generate-and-paste attacks, [ citation needed ] and also cannot be parallelized. As a result, modern hash functions are built on wide-pipe constructions that have a larger internal state size – which range from tweaks of the Merkle–Damgård construction [ 9 ] to new constructions such as the sponge construction and HAIFA construction. [ 11 ] None of the entrants in the NIST hash function competition use a classical Merkle–Damgård construction. Meanwhile, truncating the output of a longer hash, such as used in SHA-512/256, also defeats many of these attacks. [ 13 ]
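The truncation idea can be sketched with `hashlib`: cutting SHA-512's 64-byte output down to 32 bytes hides half of the internal state from an attacker, which is what defeats length extension. Note this sketch is not the standardized SHA-512/256, which additionally uses different initial values:

```python
import hashlib

def truncated_sha512(data: bytes) -> bytes:
    """Wide state, narrow output: keep only 256 of SHA-512's 512 output bits.

    An attacker mounting a length-extension attack would need the full
    internal state, but only half of it ever leaves this function.
    """
    return hashlib.sha512(data).digest()[:32]

tag = truncated_sha512(b"message")
```

The real SHA-512/256 changes the IV as well so that its outputs can never be confused with truncated plain SHA-512 digests.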

## Use in building other cryptographic primitives

Hash functions can be used to build other cryptographic primitives. For these other primitives to be cryptographically secure, care must be taken to build them correctly. Message authentication codes (MACs) (also called keyed hash functions) are often built from hash functions. HMAC is such a MAC. Just as block ciphers can be used to build hash functions, hash functions can be used to build block ciphers. Luby–Rackoff constructions using hash functions can be provably secure if the underlying hash function is secure. Also, many hash functions (including SHA-1 and SHA-2) are built by using a special-purpose block cipher in a Davies–Meyer or other construction. That cipher can also be used in a conventional mode of operation, without the same security guarantees; for example, SHACAL, BEAR and LION. Pseudorandom number generators (PRNGs) can be built using hash functions. This is done by combining a (secret) random seed with a counter and hashing it. Some hash functions, such as Skein, Keccak, and RadioGatún, output an arbitrarily long stream and can be used as a stream cipher, and stream ciphers can also be built from fixed-length digest hash functions. Often this is done by first building a cryptographically secure pseudorandom number generator and then using its stream of random bytes as keystream. SEAL is a stream cipher that uses SHA-1 to generate internal tables, which are then used in a keystream generator more or less unrelated to the hash algorithm. SEAL is not guaranteed to be as strong (or weak) as SHA-1. Similarly, the key expansion of the HC-128 and HC-256 stream ciphers makes heavy use of the SHA-256 hash function.
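The seed-plus-counter PRNG idea described above can be sketched directly. This is an illustration of the pattern, not a standardized DRBG (real designs such as NIST's Hash_DRBG add reseeding and backtracking resistance):

```python
import hashlib

class HashPRNG:
    """Sketch of a counter-mode hash PRNG: output block i is H(seed || i)."""

    def __init__(self, seed: bytes):
        self.seed = seed
        self.counter = 0

    def read(self, n: int) -> bytes:
        """Return n pseudorandom bytes by hashing seed || counter repeatedly."""
        out = b""
        while len(out) < n:
            block = hashlib.sha256(
                self.seed + self.counter.to_bytes(8, "big")
            ).digest()
            out += block
            self.counter += 1
        return out[:n]

rng = HashPRNG(b"secret seed")
stream = rng.read(100)
```

Used as a keystream, XORing `stream` into a plaintext gives exactly the hash-based stream cipher construction the text describes.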

## Concatenation

Concatenating outputs from multiple hash functions provides collision resistance as good as the strongest of the algorithms included in the concatenated result. [ citation needed ] For example, older versions of Transport Layer Security (TLS) and Secure Sockets Layer (SSL) used concatenated MD5 and SHA-1 sums. This ensures that a method to find collisions in one of the hash functions does not defeat data protected by both hash functions. [ citation needed ] For Merkle–Damgård construction hash functions, the concatenated function is as collision-resistant as its strongest component, but not more collision-resistant. [ citation needed ] Antoine Joux observed that 2-collisions lead to n-collisions: if it is feasible for an attacker to find two messages with the same MD5 hash, then they can find as many additional messages with that same MD5 hash as they desire, with no greater difficulty. Among those n messages with the same MD5 hash, there is likely to be a collision in SHA-1. The additional work needed to find the SHA-1 collision (beyond the exponential birthday search) requires only polynomial time. [ 17 ]
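The concatenated MD5 ∥ SHA-1 value used by older TLS/SSL versions is trivial to reproduce; both algorithms remain available in `hashlib` for compatibility purposes:

```python
import hashlib

def md5_sha1_concat(data: bytes) -> str:
    """Concatenated MD5 || SHA-1 hex digests, in the style of old TLS/SSL."""
    return hashlib.md5(data).hexdigest() + hashlib.sha1(data).hexdigest()

tag = md5_sha1_concat(b"handshake transcript")
print(tag)  # 32 hex chars of MD5 followed by 40 of SHA-1
```

As Joux's result explains, the 288-bit combined output is only about as collision-resistant as SHA-1 alone, which is why modern TLS dropped the construction.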

## Cryptographic hash algorithms

There are many cryptographic hash algorithms; this section lists a few algorithms that are referenced relatively often. A more extensive list can be found on the page containing a comparison of cryptographic hash functions.

### MD5

MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function, MD4, and was specified in 1992 as RFC 1321. Collisions against MD5 can be calculated within seconds, which makes the algorithm unsuitable for most use cases where a cryptographic hash is required. MD5 produces a digest of 128 bits (16 bytes).
### SHA-1

SHA-1 was developed as part of the U.S. Government's Capstone project. The original specification – now commonly called SHA-0 – of the algorithm was published in 1993 under the title Secure Hash Standard, FIPS PUB 180, by U.S. government standards agency NIST (National Institute of Standards and Technology). It was withdrawn by the NSA shortly after publication and was superseded by the revised version, published in 1995 in FIPS PUB 180-1 and commonly designated SHA-1. Collisions against the full SHA-1 algorithm can be produced using the shattered attack and the hash function should be considered broken. SHA-1 produces a hash digest of 160 bits (20 bytes). Documents may refer to SHA-1 as just "SHA", even though this may conflict with the other Secure Hash Algorithms such as SHA-0, SHA-2, and SHA-3.
### RIPEMD

RIPEMD (RACE Integrity Primitives Evaluation Message Digest) is a family of cryptographic hash functions developed in Leuven, Belgium, by Hans Dobbertin, Antoon Bosselaers, and Bart Preneel at the COSIC research group at the Katholieke Universiteit Leuven, and first published in 1996. RIPEMD was based upon the design principles used in MD4 and is similar in performance to the more popular SHA-1. RIPEMD-160 has, however, not been broken. As the name implies, RIPEMD-160 produces a hash digest of 160 bits (20 bytes).

### Whirlpool

Whirlpool is a cryptographic hash function designed by Vincent Rijmen and Paulo S. L. M. Barreto, who first described it in 2000. Whirlpool is based on a substantially modified version of the Advanced Encryption Standard (AES). Whirlpool produces a hash digest of 512 bits (64 bytes).
### SHA-2

SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA), first published in 2001. They are built using the Merkle–Damgård structure, from a one-way compression function itself built using the Davies–Meyer structure from a (classified) specialized block cipher. SHA-2 basically consists of two hash algorithms: SHA-256 and SHA-512. SHA-224 is a variant of SHA-256 with different starting values and truncated output. SHA-384 and the lesser-known SHA-512/224 and SHA-512/256 are all variants of SHA-512. SHA-512 is more secure than SHA-256 and is commonly faster than SHA-256 on 64-bit machines such as AMD64. The output size in bits is given by the extension to the "SHA" name, so SHA-224 has an output size of 224 bits (28 bytes); SHA-256, 32 bytes; SHA-384, 48 bytes; and SHA-512, 64 bytes.
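The output sizes listed above can be checked directly against the standard library, which exposes the whole family:

```python
import hashlib

# Digest sizes (in bytes) of the main SHA-2 family members.
sizes = {name: hashlib.new(name).digest_size
         for name in ["sha224", "sha256", "sha384", "sha512"]}
print(sizes)  # {'sha224': 28, 'sha256': 32, 'sha384': 48, 'sha512': 64}
```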
### SHA-3

SHA-3 (Secure Hash Algorithm 3) was released by NIST on August 5, 2015. SHA-3 is a subset of the broader cryptographic primitive family Keccak. The Keccak algorithm is the work of Guido Bertoni, Joan Daemen, Michael Peeters, and Gilles Van Assche. Keccak is based on a sponge construction, which can also be used to build other cryptographic primitives such as a stream cipher. SHA-3 provides the same output sizes as SHA-2: 224, 256, 384, and 512 bits. Configurable output sizes can also be obtained using the SHAKE-128 and SHAKE-256 functions. Here the -128 and -256 extensions to the name imply the security strength of the function rather than the output size in bits.
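SHAKE's configurable output size is visible in `hashlib`, where the caller passes the desired length (in bytes) to `digest()`/`hexdigest()`. Because SHAKE is an extendable-output function, a shorter request is simply a prefix of a longer one:

```python
import hashlib

# SHAKE-128: the caller chooses how many output bytes to squeeze out.
out16 = hashlib.shake_128(b"data").hexdigest(16)  # 16 bytes -> 32 hex chars
out64 = hashlib.shake_128(b"data").hexdigest(64)  # 64 bytes -> 128 hex chars
print(out16)
```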

### BLAKE2

BLAKE2, an improved version of BLAKE, was announced on December 21, 2012. It was created by Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Christian Winnerlein with the goal of replacing the widely used but broken MD5 and SHA-1 algorithms. When run on 64-bit x64 and ARM architectures, BLAKE2b is faster than SHA-3, SHA-2, SHA-1, and MD5. Although BLAKE and BLAKE2 have not been standardized as SHA-3 has, BLAKE2 has been used in many protocols including the Argon2 password hash, for the high efficiency that it offers on modern CPUs. As BLAKE was a candidate for SHA-3, BLAKE and BLAKE2 both offer the same output sizes as SHA-3 – including a configurable output size.
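BLAKE2's configurable output size, and its built-in keyed mode (which serves as a MAC without needing the HMAC wrapper), are both exposed in the standard library:

```python
import hashlib

# Configurable digest size: request a 32-byte (256-bit) output.
h32 = hashlib.blake2b(b"message", digest_size=32).hexdigest()

# Built-in keyed mode: BLAKE2 can act as a MAC directly, no HMAC needed.
keyed = hashlib.blake2b(b"message", key=b"secret", digest_size=32).hexdigest()
print(h32)
```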

### BLAKE3

BLAKE3, an improved version of BLAKE2, was announced on January 9, 2020. It was created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and Zooko Wilcox-O'Hearn. BLAKE3 is a single algorithm, in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. The BLAKE3 compression function is closely based on that of BLAKE2s, with the biggest difference being that the number of rounds is reduced from 10 to 7. Internally, BLAKE3 is a Merkle tree, and it supports higher degrees of parallelism than BLAKE2.

## Attacks on cryptographic hash algorithms

There is a long list of cryptographic hash functions, but many have been found to be vulnerable and should not be used. For instance, NIST selected 51 hash functions [ 19 ] as candidates for round 1 of the SHA-3 hash competition, of which 10 were considered broken and 16 showed significant weaknesses and therefore did not make it to the next round; more information can be found on the main article about the NIST hash function competitions.

Even if a hash function has never been broken, a successful attack against a weakened version may undermine the experts' confidence. For instance, in August 2004 collisions were found in several then-popular hash functions, including MD5. [ 20 ] These weaknesses called into question the security of stronger algorithms derived from the weak hash functions – in particular, SHA-1 (a strengthened version of SHA-0), RIPEMD-128, and RIPEMD-160 (both strengthened versions of RIPEMD). [ 21 ] On August 12, 2004, Joux, Carribault, Lemuel, and Jalby announced a collision for the full SHA-0 algorithm. Joux et al. accomplished this using a generalization of the Chabaud and Joux attack. They found that the collision had complexity 2^51 and took about 80,000 CPU hours on a supercomputer with 256 Itanium 2 processors – equivalent to 13 days of full-time use of the supercomputer. [ citation needed ] In February 2005, an attack on SHA-1 was reported that would find collisions in about 2^69 hashing operations, rather than the 2^80 expected for a 160-bit hash function. In August 2005, another attack on SHA-1 was reported that would find collisions in 2^63 operations. Other theoretical weaknesses of SHA-1 have been known, [ 22 ] [ 23 ] and in February 2017 Google announced a collision in SHA-1. [ 24 ] Security researchers recommend that modern applications can avoid these problems by using later members of the SHA family, such as SHA-2, or using techniques such as randomized hashing [ 1 ] that do not require collision resistance.
A successful, practical attack broke MD5 as used within certificates for Transport Layer Security in 2008. [ 25 ] Many cryptographic hashes are based on the Merkle–Damgård construction. All cryptographic hashes that directly use the full output of a Merkle–Damgård construction are vulnerable to length extension attacks. This makes the MD5, SHA-1, RIPEMD-160, Whirlpool, and the SHA-256 / SHA-512 hash algorithms all vulnerable to this particular attack. SHA-3, BLAKE2, BLAKE3, and the truncated SHA-2 variants are not vulnerable to this type of attack. [ citation needed ]