1. Hash Algorithm Working Procedure (General Structure)
- Core Component: Hash algorithms repeatedly use a compression function, denoted as .
- Compression Function Inputs:
- An -bit input from the previous step, called the chaining variable ().
- A -bit block () of the message being hashed.
- Compression Function Output: An -bit output, which serves as the chaining variable for the next step.
- Initialization:
- The chaining variable starts with an initial value (), specified by the algorithm. This is often denoted as .
- Processing Blocks:
- The input message is divided into blocks: , each of size bits.
- The compression function is applied iteratively:
- for .
- Final Hash Value: The final output of the last compression step is the hash value (or message digest) of the entire message .
- Diagram Description (Figure 11.8):
- Shows a series of blocks representing the compression function .
- Each block takes the previous chaining variable ( of size ) and the current message block ( of size ) as input.
- Each block produces the next chaining variable ( of size ).
- Starts with and the first block .
- Ends with the final hash after processing the last block .
- Key:
- : Initial Value
- : Chaining variable after step
- : input block (message block)
- : Compression algorithm
- : Number of input blocks
- : Length of hash code (and chaining variable)
- : Length of input block
- Note on Compression: Often, the block size is larger than the hash output size , hence the term “compression”.
3. Motivation: Signing Long Messages
- Context: Digital signatures (e.g., RSA, discrete log-based) have limitations on the size of the message they can sign directly.
- Example: RSA message length is limited by the modulus size (e.g., - bits, or - bytes).
- Real-world messages (emails, files) are often much larger.
- Question: How to efficiently and securely compute signatures for large messages?
- Naive Approach (Similar to ECB mode in block ciphers):
- Divide the large message into smaller blocks , each small enough for the signature algorithm.
- Sign each block separately using the private key to get signatures .
- Diagram Description (Fig 11.1 - Insecure approach): Shows message blocks to . Each block is independently fed into a “sig” box (signing algorithm) with the private key , producing individual signatures to .

- Problems with the Naive Approach:
- Problem 1: High Computational Load:
- Asymmetric operations (like modular exponentiation in RSA) are computationally intensive.
- Signing and verifying many small blocks takes significant time and energy, especially for large messages (email attachments, multimedia).
- Problem 2: Message Overhead:
- The total signature length is the same as the message length (or larger).
- Example: A MB message yields a MB RSA signature, requiring MB total transmission.
- Problem 3: Security Limitations (Most Serious):
- Lack of protection for the whole message integrity.
- Attacks Possible:
- An attacker (Oscar) can remove blocks and their corresponding signatures.
- Oscar can re-order blocks and their signatures.
- Oscar can reassemble new messages from fragments of blocks and signatures from previous messages.
- Manipulations within a block are prevented by the signature, but manipulations of the blocks are not.
- Problem 1: High Computational Load:
- Desired Solution: One short signature for a message of arbitrary length.
- Solution using Hash Functions:
- Compute a short, fixed-size “fingerprint” (the hash value ) of the entire message using a cryptographic hash function .
- Sign only the hash value using the private key to get a single, short signature .
- Diagram Description (Fig 11.2 - Signing with hash function): Shows message blocks to being fed into a hash function . The single output hash value is then fed into the “sig” box with the private key to produce one signature .

4. Security Requirements of Hash Functions
- Key Characteristic: Unlike other crypto algorithms, hash functions typically do not use keys.
- Need for Security: Despite not encrypting or using keys, weaknesses in hash functions can compromise the security of applications (like digital signatures).
- Three Central Security Properties:
- Preimage Resistance (One-Wayness)
- Second Preimage Resistance (Weak Collision Resistance)
- Collision Resistance (Strong Collision Resistance)
- Diagram Description (Fig 11.4 - Three security properties):

4.1 Preimage Resistance (One-Wayness)
- Definition: Given a hash output , it must be computationally infeasible to find any input message such that .
- Analogy: Easy to compute from , but hard to compute from .
- Security Level: For an -bit hash function, a brute-force attack requires trying, on average, inputs.
4.2 Second Preimage Resistance (Weak Collision Resistance)
- Definition: Given a specific input message , it must be computationally infeasible to find a different input message (where ) such that .
- Importance: Essential for digital signatures. If an attacker can find a second preimage for a signed message , they could claim the signature for also applies to .
- Security Level: For an -bit hash function, a brute-force attack requires trying, on average, inputs to match the hash of the given .
4.3 Collision Resistance (Strong Collision Resistance)
- Definition: It must be computationally infeasible to find any pair of distinct input messages (where ) such that .
- Difference from Second Preimage: The attacker can choose both messages ( and ), whereas for second preimage, is fixed. This makes finding collisions generally easier.
- Importance: Prevents attackers from creating two different messages (e.g., a legitimate contract and a fraudulent one) that have the same hash, then tricking someone into signing the hash corresponding to the legitimate one, while later claiming it applies to the fraudulent one.
- Attack Method: Often relies on the Birthday Attack.
- Security Level: Due to the Birthday Attack, finding a collision for an -bit hash function requires approximately operations, significantly less than .
4.4 Summary: Properties of Hash Functions (Page 5 List)
- Arbitrary Message Size: Can be applied to messages of any size (though practical limits like SHA-512’s exist).
- Fixed Output Length: Produces a hash value of a fixed length .
- Efficiency: Computing is relatively easy and fast.
- Preimage Resistance: Given , infeasible to find such that . (One-way).
- Second Preimage Resistance: Given , infeasible to find such that . (Weak collision resistance).
- Collision Resistance: Infeasible to find any pair such that . (Strong collision resistance).
5. Collision Resistance and the Birthday Attack
- Pigeonhole Principle: Collisions must exist if the number of possible inputs is larger than the number of possible outputs. Hash functions map potentially infinite inputs to a fixed number () of outputs.
- Question: How hard is it to find a collision?
- Initial Guess: Might seem as hard as finding a second preimage (requiring operations for an -bit` hash).
- Surprising Result: Finding a collision is much easier, requiring only about operations due to the Birthday Attack.
5.1 The Birthday Paradox (Analogy)
- Question: How many people are needed in a room so there’s a reasonable chance (e.g., 50%) that at least two share the same birthday? (Assume 365 days).
- Intuition: Might guess around people.
- Actual Result: Far fewer people are needed.
- Calculating Probability of No Collision:
- 1 Person:
- 2 People: The second person must have a different birthday than the first.
- 3 People: The third person must have a different birthday than the first two.
- t People:
- Calculating Probability of At Least One Collision:
- Result for 50% Chance: For people:
- Result for 90% Chance: Achieved with only people.
5.2 Application to Hash Functions
-
Analogy Mapping:
- People → Messages
- Birthdays → Hash values
- Number of days (365) → Number of possible hash values (for an -bit` hash)
-
Attacker’s Goal (Collision Search): Hash different messages and check if any two messages produce the same hash value .
-
Probability of No Collision among Hash Values:
- Using product notation:
-
Approximation for Probability:
- Recall the approximation for small . Since for typical and large , we can use this.
- The arithmetic series sum is .
- Substituting the sum:
-
Finding the Number of Messages Needed for a Collision:
- Let be the desired probability of finding at least one collision.
- Take the natural logarithm:
- For large (which is typical in practice), , so .
- (Equation 11.1)
-
Most Important Consequence of Birthday Attack:
- The number of messages needed to find a collision with a reasonable probability is roughly proportional to the square root of the number of possible output values ().
- Implication: To achieve bits of security against collision attacks, the hash function needs an output length of bits.
-
Example (80-bit Hash):
- Hash output length bits.
- Desired success probability (50%).
- Calculate :
- Since ,
- So, approximately messages need to be hashed. This confirms the estimate ().
6. Hash Functions Based on Cipher Block Chaining (CBC)
- Concept: Use a symmetric block cipher (like DES, AES) in a CBC-like mode, but without a secret key, to construct a hash function.
- Rabin’s Proposal [RABI78] (Example):
- Divide the message into fixed-size blocks (block size matches the cipher’s block size).
- Use a symmetric encryption algorithm (e.g., DES).
- Set an initial value .
- Iterate: for .
- Note: This uses the previous “hash” as the “key” input to the block cipher for encrypting the current message block . (Or sometimes - the exact structure varies). The formula given is .
- The final hash code is .
- Similarity to CBC: Resembles Cipher Block Chaining mode of encryption.
- Key Difference: No secret key is used or required.
- Security Considerations:
- Birthday Attack: Like any hash code, this construction is subject to the birthday attack.
- Vulnerability Example (DES): If DES (which has a -bit block size and thus produces a -bit` hash code in this construction) is used, the system is vulnerable. Finding collisions requires only about operations, which is computationally feasible.