16.1 Failure Classification
-
Goal: Database systems must maintain atomicity (all-or-nothing transactions) and durability (committed changes persist) despite failures. Recovery schemes restore the database to a consistent, correct state after a failure.
-
High Availability: Minimizing the duration during which the database is unavailable after a failure.
-
Failure Types:
- Transaction Failure:
- Logical Error: The transaction cannot proceed due to internal conditions. Examples:
- Bad input.
- Data not found.
- Overflow (numeric value too large).
- Resource limit exceeded.
- System Error: The entire system enters an undesirable state, preventing the transaction from continuing. Example:
- Deadlock (transactions waiting for each other).
- The transaction can often be retried later.
- Logical Error: The transaction cannot proceed due to internal conditions. Examples:
- System Crash:
- Hardware malfunction or software bug (in database software or operating system).
- Causes loss of volatile storage (e.g., RAM, cache) content.
- Nonvolatile storage (e.g., disk) content remains intact. This is called the fail-stop assumption.
- Disk Failure:
- A disk block loses its contents. Causes:
- Head crash.
- Failure during data transfer.
- Recovery relies on:
- Copies of data on other disks.
- Archival backups (e.g., DVDs, tapes).
- A disk block loses its contents. Causes:
- Transaction Failure:
16.2 Storage
- Classification of Storage Media: .
-
Storage Types:
- Volatile Storage:
- Information is lost when the system crashes.
- Examples: Main memory (RAM), CPU caches.
- Nonvolatile Storage:
- Information survives system crashes.
- Examples: Disk drives, solid-state drives (SSDs), magnetic tape.
- Stable Storage:
- A theoretical ideal: Information is never lost.
- Approximated in practice, but never perfectly achieved.
- Volatile Storage:
-
16.2.1 Stable-Storage Implementation
- Goal: Approximate stable storage to ensure data survival.
- Method:
- Replicate information across multiple nonvolatile storage devices (typically disks).
- These devices should have independent failure modes (i.e., they are unlikely to fail simultaneously).
- Controlled updates to ensure data consistency even during transfer failures.
- RAID Systems: Use redundancy to protect against disk failures. Mirrored disks (RAID 1) provide the simplest and fastest form, keeping two copies of each block.
- Remote Backup Systems: Maintain a copy of each stable storage block at a remote site (via a network). Protects against disasters (fire, flood) affecting the primary site.
- Data-Transfer Failure Outcomes:
- Successful Completion: Data transferred correctly to the destination.
- Partial Failure: A failure occurs during transfer; the destination block contains incorrect information.
- Total Failure: Failure occurs early enough that the destination block remains unchanged (and thus still valid).
- Stable-Storage Write Protocol (using two physical blocks):
- Write the information to the first physical block.
- Only after the first write completes successfully, write the same information to the second physical block.
- The output operation is considered complete only after the second write completes successfully.
- Recovery Procedure:
- Case 1: Both blocks identical and no errors: No action is needed.
- Case 2: One block has a detectable error: Replace the erroneous block with the content of the other (good) block.
- Case 3: Blocks differ, but no errors: Replace the content of the first block with the content of the second block.
- Key Point: This protocol ensures that a write to stable storage either completely succeeds (all copies updated) or results in no change (original data preserved).
-
16.2.2 Data Access
- Database Location: Primarily resides on nonvolatile storage (usually disks). Only parts of the database are in main memory at any given time.
- Blocks:
- Database is partitioned into fixed-length storage units called blocks.
- Blocks are the units of data transfer between disk and main memory.
- Assumption: A single data item does not span multiple blocks.
- Physical Blocks: Blocks residing on the disk.
- Buffer Blocks: Blocks residing temporarily in main memory (in the disk buffer).
- Disk Buffer: The area of memory holding buffer blocks.
- Block Movement Operations:
- : Transfers physical block from disk to a buffer block in main memory.
- : Transfers buffer block from main memory to disk, replacing the corresponding physical block on disk.
- Transaction Work Area: Each transaction has a private work area in memory. Copies of data items accessed and updated by are kept here.
- First Access:
- Data Access Operations (for transaction ):
- :
- If block (containing data item ) is not already in main memory, issue .
- Assign the value of (from the buffer block ) to the transaction’s local variable .
- :
- If block (containing data item ) is not already in main memory, issue .
- Assign the value of the transaction’s local variable to within the buffer block .
- :
- Important Note: and may require transferring blocks from disk to memory, but they do not directly cause blocks to be written back to disk.
- Force-Output: is performed when the buffer manager needs the memory space or the database system wants to make changes persistent on disk.
16.3 Recovery and Atomicity
- Central Problem: System crashes can occur during transaction execution, potentially leaving the database in an inconsistent state, violating atomicity.
- Example: Transaction \ T_i \ transfer \ \50 \ from \ account \ A \ to \ B.$
- Initial \ values: \ A = \1000, \ B= $2000$
- Final \ state \ after \ system \ restart: \ A = \950 \ and \ B=$2000.$
-
Solution: Output information describing the modifications to stable storage before actually modifying the database. This is achieved using a log.
-
16.3.1 Log Records
- Log: A sequence of log records. Records all update activities performed on the database. The log must reside in stable storage.
- Update Log Record Structure:
- : Transaction identifier (unique ID of the transaction).
- : Data-item identifier (location on disk: block ID and offset within the block).
- : Old value of (before the write).
- : New value of (after the write).
- Other Log Record Types:
- : Transaction has begun execution.
- : Transaction has successfully completed (committed).
- : Transaction has been aborted (rolled back).
- Crucial Rule: The log record for a write operation must be written to stable storage before the corresponding database modification is applied.
-
16.3.2 Database Modification
- Log-First Principle: A log record is always created before the corresponding database modification.
- Undo Operation: Using a log record, restore a data item to its old value ( in the log record).
- Redo Operation: Using a log record, set a data item to its new value ( in the log record).
- Modification Steps:
- Transaction performs computation in memory.
- Transaction modifies data in the disk buffer.
- Database executes operation.
- Deferred Modification:
- Database modifications are performed only after the transaction commits.
- Transactions work on local copies of data items.
- Immediate Modification:
- Database modifications can occur while the transaction is still active.
- This is the approach primarily covered in the chapter.
- Recovery Algorithm Requirements:
- Handle committed transactions whose changes might only exist in the buffer (not yet written to disk).
- Handle active transactions that modified the database and need to be aborted due to a failure.
-
16.3.3 Concurrency Control and Recovery
- Critical Restriction: If transaction modifies data item , no other transaction can modify until either commits or aborts.
- Enforcement:
- Strict Two-Phase Locking: Acquire exclusive locks on updated items and hold them until commit/abort.
- Snapshot Isolation and Validation: Also acquire exclusive locks (at validation time) and hold them until commit.
-
16.3.4 Transaction Commit
- .
-
16.3.5 Using the Log to Redo and Undo Transactions
- :
- Sets all data items updated by to their new values, using information from the log records.
- Order matters: Redo operations must be performed in the same order as the original updates.
- :
- Restores all data items updated by to their old values, using information from the log records.
- Writes redo-only log records during the undo process (see Section 16.4.1).
- Ends by writing a log record.
- Recovery After a System Crash:
- Undo: If the log contains but no corresponding or record, transaction must be undone.
- Redo: If the log contains and either or , transaction must be redone. (Even if aborted, redo-only records exist).
- :
-
16.3.6 Checkpoints
- Motivation: Searching the entire log during recovery is time-consuming, and many redone transactions might have already written their updates to disk. Checkpoints reduce this overhead.
- Simple Checkpoint Procedure:
- Output all log records currently in main memory to stable storage.
- Output all modified buffer blocks to disk.
- Output a log record of the form to stable storage, where is a list of all transactions that are active at the time of the checkpoint.
- Recovery Using Checkpoints:
- Find the most recent record in the log (search backward).
- Identify the set of transactions, , consisting of:
- Transactions in the list .
- Transactions that started after the record was written.
- Undo: For each transaction in that has no or record in the log, execute .
- Redo: For each transaction in that has a or record in the log, execute .
- Fuzzy Checkpoint: A more advanced technique that allows transactions to perform updates during the checkpointing process (discussed later).
16.4 Recovery Algorithm
-
16.4.1 Transaction Rollback (during normal operation, not after a crash)
- Scan Log Backward: Scan the log backward from the end. For each log record of transaction of the form that is found:
- Write the old value to data item .
- Write a special redo-only log record: . This records the undo operation itself. These are called compensation log records.
- Note: Redo-only records do not need an “old value” field because undo operations are never undone.
- Stop: Stop the backward scan when the record is found. Write a record to the log.
- Scan Log Backward: Scan the log backward from the end. For each log record of transaction of the form that is found:
-
16.4.2 Recovery After a System Crash
-
Redo Phase: Scan the log forward from the last checkpoint.
- Initialize a list called
undo-listto the list from the most recent record. - For each log record encountered:
- If it’s a normal update record or a redo-only record :
- Perform the redo operation: Write the value to data item .
- If it’s a record:
- Add transaction to the
undo-list.
- Add transaction to the
- If it’s a or record:
- Remove transaction from the
undo-list.
- Remove transaction from the
- If it’s a normal update record or a redo-only record :
- At the end of the redo phase,
undo-listcontains all transactions that were incomplete (neither committed nor fully rolled back) at the time of the crash.
- Initialize a list called
-
Undo Phase: Scan the log backward from the end.
-
For each log record encountered that belongs to a transaction currently in the
undo-list:- Perform the undo operation (as described in Section 16.4.1), including writing redo-only log records.
-
When a record is found for a transaction that is in
undo-list:- Write a record to the log.
- Remove from
undo-list.
-
The undo phase terminates when
undo-listbecomes empty (all incomplete transactions have been rolled back).
-
- Repeating History: The redo phase replays all updates since the last checkpoint, including updates made by transactions that were later aborted (or are being aborted during recovery). This is called “repeating history.” It simplifies the recovery algorithm, even though it might seem redundant.
-