Title

  • Mini-topic: External Sort-Merge Algorithm (Page 701)
    • Micro-topic: Figure 15.4 External sorting using sort-merge. (Page 703)
  • Mini-topic: Cost Analysis of External Sort-Merge (Page 702)

External Sort-Merge Algorithm (Page 701)

The External Sort-Merge Algorithm is the most common technique for sorting relations that don’t fit in memory. It works in two main stages:

  1. Run Creation Stage:

    • Read M blocks of the relation into memory (where M is the number of available memory blocks).
    • Sort the in-memory data using an efficient internal sorting algorithm (like quicksort).
    • Write the sorted data (a “run”) to a temporary file on disk.
    • Repeat until the entire relation is processed, creating multiple sorted runs (Ri).
  2. Merge Stage:

    • If Number of Run (N) M, one pass merge is enough.
    • If the number of runs (N) is less than or equal to M, merge them in a single pass using an N-way merge:
      • Read one block from each run into memory buffers.
      • Repeatedly find the smallest tuple among all buffers (considering the sort order).
      • Write the smallest tuple to the output.
      • If a buffer becomes empty, read the next block from the corresponding run (if any).
    • If N > M, perform merge in multiple passes.
    • If the number of runs is greater than M, perform multiple merge passes:
      • Each pass merges at most M-1 runs (because one buffer block is for output).
      • Repeat passes until the number of runs is less than M.
      • A final pass merges the remaining runs into the sorted result.

Example

2.2 Query procesing, p.18

Cost Analysis of External Sort-Merge (Page 702)

The cost of external sort-merge is dominated by disk I/O. We can estimate it as follows:

  • br: Number of blocks containing the relation r.
  • M: number of blocks in the main memory buffer available for sorting.
  • bb number of blocks for buffering input run.
  1. Run Creation:

    • Reads every block and writes it out: block transfers.
    • Number of initial runs:
  2. Merge Stage:

    • Each pass (except possibly the last) reads and writes every block.
    • With each pass, number of runs decreases by factor of ().
    • Number of merge passes (excluding last pass):
    • Total number of block transfer except last pass =
  3. Total Block Transfers:

  4. Disk Seeks:

    • Run generation phase:
    • Each merge pass read input: seeks
    • Each merge pass output, if on same disk: seeks.
    • Total Seeks (excluding the final pass):