Skip to content

Cryptographic Hash

Labs for GNU/Linux · macOS · Windows

Hash function

A hash function is a function that takes in data of any length and outputs data of a fixed length, called its hash.

Hash functions are deterministic. With the same hash function, the same input data will always result in the same hash.

For a good hash function, the time needed to compute grows with the size of the data, because the hash is determined by all of the data. Cryptographic hash functions have this property.

Below are two pieces of data and their 256-bit SHA-2 hash.

Data (hexadecimal) Text (UTF-8) SHA-2 256 hash
(empty) (empty) e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
48656c6c6f20776f726c64 Hello world 64ec88ca00b268e5ba1a35678a1b5316d212f4f366b2477232534a8aeca37f3c

Because there are infinite possible inputs but only finitely many hash values, there must be infinite pieces of data that hashes to the same value. This is called collision.

Cryptographic hash function

A cryptographic hash function is also:

  • Irreversible. The only way to find possible inputs from a hash is to try inputs one by one and check if the hash matched. It is also infeasible to make any inference about the input data from the hash alone.

  • Chaotic. The mapping from data to hashes should seem random. Two similar pieces of data will result in seemingly uncorrelated hashes. Therefore, it is infeasible to construct an input that hashes to a given value (purposefully construct a collision).

Data (hexadecimal) SHA-2 256 hash
089f8954a605c945 e7ad94f8f1338ce71c0b033d881abafd978cb332a56f1fe2d79082565aafc5ee
089f8954a605c946 581c858eba8d0c8a9e418f35193ca72536f91f55a7610b7001d0c0330bc8b1b6

Hashing is useful when we need to make sure two pieces of data are the same without comparing any meaningful details in them.

Currently, the most commonly used and secure cryptographic hash functions are the SHA-2 and SHA-3 families.

Usages and insecurities

Hash algorithms can be used in the following ways:

  • Compare hashes only. Used when any meaningful details are secrets or unnecessary. Password verification.

  • Derive symmetric encryption key. The hash of the user input is used as the symmetric encryption key, rather than the user input itself.

  • Verify data. The hash is trusted by some other method, and then one computes the hash of the data to verity its authenticity.

In some cases, a slow hash algorithm is chosen to improve security.

Different attacks target different use cases. Some are more interested in producing a certain hash (collision or second preimage), while some are more interested in getting the input data (preimage).

An intelligent finding in the algorithm may make attacking faster than brute force. When such attacks are found, the hash function should be obsoleted.