1. Hash Algorithm Working Procedure (General Structure)

  • Core Component: Hash algorithms repeatedly use a compression function, denoted as .
  • Compression Function Inputs:
    • An -bit input from the previous step, called the chaining variable ().
    • A -bit block () of the message being hashed.
  • Compression Function Output: An -bit output, which serves as the chaining variable for the next step.
  • Initialization:
    • The chaining variable starts with an initial value (), specified by the algorithm. This is often denoted as .
  • Processing Blocks:
    • The input message is divided into blocks: , each of size bits.
    • The compression function is applied iteratively:
      • for .
  • Final Hash Value: The final output of the last compression step is the hash value (or message digest) of the entire message .
  • Diagram Description (Figure 11.8):
    • Shows a series of blocks representing the compression function .
    • Each block takes the previous chaining variable ( of size ) and the current message block ( of size ) as input.
    • Each block produces the next chaining variable ( of size ).
    • Starts with and the first block .
    • Ends with the final hash after processing the last block .
    • Key:
      • : Initial Value
      • : Chaining variable after step
      • : input block (message block)
      • : Compression algorithm
      • : Number of input blocks
      • : Length of hash code (and chaining variable)
      • : Length of input block
  • Note on Compression: Often, the block size is larger than the hash output size , hence the term “compression”.

3. Motivation: Signing Long Messages

  • Context: Digital signatures (e.g., RSA, discrete log-based) have limitations on the size of the message they can sign directly.
    • Example: RSA message length is limited by the modulus size (e.g., - bits, or - bytes).
    • Real-world messages (emails, files) are often much larger.
  • Question: How to efficiently and securely compute signatures for large messages?
  • Naive Approach (Similar to ECB mode in block ciphers):
    1. Divide the large message into smaller blocks , each small enough for the signature algorithm.
    2. Sign each block separately using the private key to get signatures .
    • Diagram Description (Fig 11.1 - Insecure approach): Shows message blocks to . Each block is independently fed into a “sig” box (signing algorithm) with the private key , producing individual signatures to .
  • Problems with the Naive Approach:
    • Problem 1: High Computational Load:
      • Asymmetric operations (like modular exponentiation in RSA) are computationally intensive.
      • Signing and verifying many small blocks takes significant time and energy, especially for large messages (email attachments, multimedia).
    • Problem 2: Message Overhead:
      • The total signature length is the same as the message length (or larger).
      • Example: A MB message yields a MB RSA signature, requiring MB total transmission.
    • Problem 3: Security Limitations (Most Serious):
      • Lack of protection for the whole message integrity.
      • Attacks Possible:
        • An attacker (Oscar) can remove blocks and their corresponding signatures.
        • Oscar can re-order blocks and their signatures.
        • Oscar can reassemble new messages from fragments of blocks and signatures from previous messages.
      • Manipulations within a block are prevented by the signature, but manipulations of the blocks are not.
  • Desired Solution: One short signature for a message of arbitrary length.
  • Solution using Hash Functions:
    1. Compute a short, fixed-size “fingerprint” (the hash value ) of the entire message using a cryptographic hash function .
    2. Sign only the hash value using the private key to get a single, short signature .
    • Diagram Description (Fig 11.2 - Signing with hash function): Shows message blocks to being fed into a hash function . The single output hash value is then fed into the “sig” box with the private key to produce one signature .

4. Security Requirements of Hash Functions

  • Key Characteristic: Unlike other crypto algorithms, hash functions typically do not use keys.
  • Need for Security: Despite not encrypting or using keys, weaknesses in hash functions can compromise the security of applications (like digital signatures).
  • Three Central Security Properties:
    1. Preimage Resistance (One-Wayness)
    2. Second Preimage Resistance (Weak Collision Resistance)
    3. Collision Resistance (Strong Collision Resistance)
  • Diagram Description (Fig 11.4 - Three security properties):

4.1 Preimage Resistance (One-Wayness)

  • Definition: Given a hash output , it must be computationally infeasible to find any input message such that .
  • Analogy: Easy to compute from , but hard to compute from .
  • Security Level: For an -bit hash function, a brute-force attack requires trying, on average, inputs.

4.2 Second Preimage Resistance (Weak Collision Resistance)

  • Definition: Given a specific input message , it must be computationally infeasible to find a different input message (where ) such that .
  • Importance: Essential for digital signatures. If an attacker can find a second preimage for a signed message , they could claim the signature for also applies to .
  • Security Level: For an -bit hash function, a brute-force attack requires trying, on average, inputs to match the hash of the given .

4.3 Collision Resistance (Strong Collision Resistance)

  • Definition: It must be computationally infeasible to find any pair of distinct input messages (where ) such that .
  • Difference from Second Preimage: The attacker can choose both messages ( and ), whereas for second preimage, is fixed. This makes finding collisions generally easier.
  • Importance: Prevents attackers from creating two different messages (e.g., a legitimate contract and a fraudulent one) that have the same hash, then tricking someone into signing the hash corresponding to the legitimate one, while later claiming it applies to the fraudulent one.
  • Attack Method: Often relies on the Birthday Attack.
  • Security Level: Due to the Birthday Attack, finding a collision for an -bit hash function requires approximately operations, significantly less than .

4.4 Summary: Properties of Hash Functions (Page 5 List)

  1. Arbitrary Message Size: Can be applied to messages of any size (though practical limits like SHA-512’s exist).
  2. Fixed Output Length: Produces a hash value of a fixed length .
  3. Efficiency: Computing is relatively easy and fast.
  4. Preimage Resistance: Given , infeasible to find such that . (One-way).
  5. Second Preimage Resistance: Given , infeasible to find such that . (Weak collision resistance).
  6. Collision Resistance: Infeasible to find any pair such that . (Strong collision resistance).

5. Collision Resistance and the Birthday Attack

  • Pigeonhole Principle: Collisions must exist if the number of possible inputs is larger than the number of possible outputs. Hash functions map potentially infinite inputs to a fixed number () of outputs.
  • Question: How hard is it to find a collision?
  • Initial Guess: Might seem as hard as finding a second preimage (requiring operations for an -bit` hash).
  • Surprising Result: Finding a collision is much easier, requiring only about operations due to the Birthday Attack.

5.1 The Birthday Paradox (Analogy)

  • Question: How many people are needed in a room so there’s a reasonable chance (e.g., 50%) that at least two share the same birthday? (Assume 365 days).
  • Intuition: Might guess around people.
  • Actual Result: Far fewer people are needed.
  • Calculating Probability of No Collision:
    • 1 Person:
    • 2 People: The second person must have a different birthday than the first.
    • 3 People: The third person must have a different birthday than the first two.
    • t People:
  • Calculating Probability of At Least One Collision:
  • Result for 50% Chance: For people:
  • Result for 90% Chance: Achieved with only people.

5.2 Application to Hash Functions

  • Analogy Mapping:

    • People Messages
    • Birthdays Hash values
    • Number of days (365) Number of possible hash values (for an -bit` hash)
  • Attacker’s Goal (Collision Search): Hash different messages and check if any two messages produce the same hash value .

  • Probability of No Collision among Hash Values:

    • Using product notation:
  • Approximation for Probability:

    1. Recall the approximation for small . Since for typical and large , we can use this.
    2. The arithmetic series sum is .
    3. Substituting the sum:
  • Finding the Number of Messages Needed for a Collision:

    1. Let be the desired probability of finding at least one collision.
    2. Take the natural logarithm:
    3. For large (which is typical in practice), , so .
    4. (Equation 11.1)
  • Most Important Consequence of Birthday Attack:

    • The number of messages needed to find a collision with a reasonable probability is roughly proportional to the square root of the number of possible output values ().
    • Implication: To achieve bits of security against collision attacks, the hash function needs an output length of bits.
  • Example (80-bit Hash):

    • Hash output length bits.
    • Desired success probability (50%).
    • Calculate :
      • Since ,
      • So, approximately messages need to be hashed. This confirms the estimate ().

6. Hash Functions Based on Cipher Block Chaining (CBC)

  • Concept: Use a symmetric block cipher (like DES, AES) in a CBC-like mode, but without a secret key, to construct a hash function.
  • Rabin’s Proposal [RABI78] (Example):
    1. Divide the message into fixed-size blocks (block size matches the cipher’s block size).
    2. Use a symmetric encryption algorithm (e.g., DES).
    3. Set an initial value .
    4. Iterate: for .
      • Note: This uses the previous “hash” as the “key” input to the block cipher for encrypting the current message block . (Or sometimes - the exact structure varies). The formula given is .
    5. The final hash code is .
  • Similarity to CBC: Resembles Cipher Block Chaining mode of encryption.
  • Key Difference: No secret key is used or required.
  • Security Considerations:
    • Birthday Attack: Like any hash code, this construction is subject to the birthday attack.
    • Vulnerability Example (DES): If DES (which has a -bit block size and thus produces a -bit` hash code in this construction) is used, the system is vulnerable. Finding collisions requires only about operations, which is computationally feasible.