19.6 Availability

  • Core Goal: High availability - The database system should continue functioning (or “remain available”) even in the presence of failures. This is especially important in distributed systems, where the larger number of components makes individual failures more likely.
  • Robustness: The ability to continue operation despite various types of failures.

Key Concepts

  • Failure Detection: The system must be able to detect that a failure has occurred. This can be done through:

    • Monitoring message transmissions and acknowledgments. Repeated failures to receive acknowledgments suggest a link or site failure.
    • Timeouts: If a response is not received within a certain time, assume failure.
  • Reconfiguration: After a failure is detected, the system must reconfigure itself to:

    • Continue processing transactions (possibly with reduced performance).
    • Isolate the failed component(s).
  • Recovery: When a failed component (site or link) is repaired, the system must integrate it back into the active system.

  • Distinguishing Failures is Hard: It’s often difficult to distinguish between:

    • Site failure.
    • Link failure.
    • Network partition.

    This ambiguity makes designing robust systems challenging. A site might appear to be down when it’s actually just isolated due to a network partition.
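The timeout-based detection described above can be sketched as follows. This is an illustrative sketch only; the names `TIMEOUT`, `suspected_failures`, and `last_heard` are assumptions, not part of any real system.

```python
import time

# Hypothetical timeout-based failure detector: each site records the time of
# the last message (or heartbeat acknowledgment) received from every peer,
# and suspects any peer that has been silent longer than TIMEOUT seconds.
TIMEOUT = 5.0  # assumed threshold; real systems tune this to network latency

def suspected_failures(last_heard, now=None):
    """Return the set of site IDs that have exceeded the timeout.

    last_heard maps site ID -> timestamp of the most recent message.
    Note: a suspected site may merely be partitioned away, not down,
    which is exactly the ambiguity discussed above.
    """
    now = time.time() if now is None else now
    return {site for site, t in last_heard.items() if now - t > TIMEOUT}
```

A site flagged by such a detector should be treated as *inaccessible* rather than definitively failed, since a network partition produces the same symptoms.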

Types of Failures (and basic handling)

  • Message Loss: Handled by retransmission (e.g., by transport protocols such as TCP).
  • Link Failure:
    • The network may try to find an alternative route.
    • If no alternative route exists, it may indicate a network partition.
  • Site Failure: Detected when other sites cannot communicate with it.
  • Network Partition: The network is split into two or more partitions that cannot communicate with each other. A partition can consist of a single node.

Reconfiguration Procedures

When a failure is detected, a reconfiguration procedure must be initiated.

  • Aborting Transactions:

    • Transactions active at a failed/inaccessible site at the time of failure should be aborted.
    • This is important because these transactions might hold locks at active sites, blocking other transactions. Waiting for the failed site to recover could cause significant delays.
    • With data replication, reads and updates are possible even if some replicas are inaccessible.
  • Updating the Catalog:

    • If replicated data were stored at a failed/inaccessible site, the catalog (system metadata) must be updated.
    • This prevents queries from trying to access the unavailable copy.
    • When a failed site recovers, the catalog must be updated again, and the data at the site brought up-to-date.
  • Electing a New Server (if necessary):

    • If a central server (e.g., name server, concurrency coordinator, deadlock detector) fails, an election must be held to choose a replacement.
  • Consistency Constraints (during reconfiguration): Reconfiguration schemes must be designed to work correctly even in the case of network partitions. Specifically, they must avoid these situations:

    • Two or more central servers being elected in different partitions.
    • Multiple partitions independently updating a replicated data item.

19.6.1 Majority-Based Approach

  • Extends: The majority protocol for concurrency control (Section 19.5.1.4).
  • Version numbers: Each data object stores a version number.
  • Mechanism:

    • Locking: To read or write an object, a transaction must obtain locks on a majority of its replicas.
    • Reading: Read operations examine all replicas on which a lock was obtained and use the value from the replica with the highest version number. (Optionally, they may also write that value back to replicas with older versions.)
    • Writing:
      1. Read all replicas (like a read operation) to determine the highest version number.
      2. The new version number is one greater than the highest version number.
      3. Write to all locked replicas (which must be a majority) and update their version numbers.
  • Failure Tolerance: Transactions can proceed as long as:

    • A majority of replicas of all written objects are available at commit time.
    • A majority of replicas of every object read are available, so that the highest version number can be found.
    • The two-phase commit protocol can proceed using only available sites.
  • Reintegration: Trivial (no special action needed) because:

    • Writes already updated a majority.
    • Reads will find at least one up-to-date replica (due to the majority requirement).
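The majority-based read and write steps above can be sketched as follows. This is a minimal model, assuming each replica is a `(version, value)` pair keyed by site; the names `majority_read`, `majority_write`, and `QuorumError` are illustrative.

```python
# Sketch of the majority-based protocol with version numbers. Replica state
# and site availability are modeled as plain dicts/sets; locking and 2PC are
# abstracted away.

class QuorumError(Exception):
    pass

def majority_read(replicas, available):
    """Lock a majority of the replicas; return (version, value) of the newest.

    replicas maps site -> (version, value); available is the set of
    reachable sites. Fails if a majority cannot be locked.
    """
    reachable = {s: v for s, v in replicas.items() if s in available}
    if len(reachable) * 2 <= len(replicas):
        raise QuorumError("majority of replicas not available")
    # The replica with the highest version number holds the current value.
    return max(reachable.values())

def majority_write(replicas, available, new_value):
    """Write new_value with version = (highest version seen) + 1.

    First reads a majority to find the highest version number, then writes
    the new value and version to every locked (available) replica.
    """
    version, _ = majority_read(replicas, available)
    for site in replicas:
        if site in available:
            replicas[site] = (version + 1, new_value)
    return version + 1
```

Because every write reaches a majority, any two majorities intersect, so a later read is guaranteed to see at least one replica carrying the latest version number; this intersection argument is also why reintegration needs no special catch-up step.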

19.6.2 Read One, Write All Available Approach

  • Based on: A special case of quorum consensus (biased protocol).
  • All sites are given unit weight.
  • Read quorum = 1.
  • Write quorum = n (all sites).
  • Version numbers are not needed in this case.
  • Mechanism:

    • Read: Read any available replica (and obtain a read lock there).
    • Write: Attempt to write to all replicas (and obtain write locks). If a site is down, proceed without it.
  • Advantages: Simple, allows reads even if some replicas are unavailable.

  • Disadvantages:

    • Writes can be blocked if any replica is unavailable (in the basic “read one, write all” scheme).
    • The “read one, write all available” variant can lead to inconsistencies if there are network partitions. Different partitions could update the same data item.
    • Reintegration is complex: After a failure, a recovering site needs to determine which updates it missed. This is non-trivial.
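The read-one/write-all-available mechanism can be sketched as follows; the function names and the dict-based replica model are illustrative assumptions.

```python
# Sketch of "read one, write all available": reads use any single replica,
# writes go to every replica that is currently reachable. No version numbers.

def rowa_read(replicas, available):
    """Read from any one available replica."""
    for site, value in replicas.items():
        if site in available:
            return value
    raise RuntimeError("no replica available")

def rowa_write(replicas, available, new_value):
    """Write to all available replicas; skip unreachable ones.

    Caution: under a network partition, each partition updates only its own
    subset of replicas, producing the inconsistency noted above. A skipped
    replica silently becomes stale, which is why reintegration is hard.
    """
    written = []
    for site in replicas:
        if site in available:
            replicas[site] = new_value
            written.append(site)
    return written
```

Note how a replica at a down site is simply skipped: after recovery, a read served by that replica would return a stale value unless reintegration first brings it up to date.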

19.6.3 Site Reintegration

  • Challenge: When a failed site or link recovers, it must be integrated back into the system.

  • Requirements:

    • Update system tables to reflect the rejoined site/link.
    • If the site had replicas, bring its data up-to-date. This is not simple because updates might have occurred while the site was down.
  • Naive Solution (Undesirable): Halt the entire system while the site rejoins. This is usually unacceptable due to disruption.

  • Better Solution: Allow concurrent updates while the site reintegrates.

    • The site must catch up on missed updates before granting any new read/write locks.
    • If a link recovers, all sites should be notified promptly (because partitioning restricts operations).
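The catch-up step above can be sketched with a simple log-replay model. This is a hypothetical simplification: it assumes updates are identified by a log sequence number (LSN) obtainable from an up-to-date peer, and the names `reintegrate`, `local_state`, and `peer_log` are illustrative.

```python
# Hypothetical sketch of site reintegration: before granting any new locks,
# the recovering site replays the updates it missed while it was down.

def reintegrate(local_state, local_lsn, peer_log):
    """Apply all log records the site missed; return the new LSN.

    peer_log is a list of (lsn, key, value) records from an up-to-date peer.
    Only after this completes may the site grant read/write locks again.
    """
    for lsn, key, value in peer_log:
        if lsn > local_lsn:          # skip updates already applied locally
            local_state[key] = value
            local_lsn = lsn
    return local_lsn
```

In practice the site must also coordinate with concurrent updates arriving during the replay, which is what makes reintegration without halting the system non-trivial.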

19.6.4 Comparison with Remote Backup

  • Remote Backup (Section 16.9): Data and log records are replicated at a backup site. Concurrency control and recovery are handled at the primary site.

    • Advantages: Avoids two-phase commit, lower overhead, transactions only contact one site.
    • Disadvantages: Less availability than full replication.
  • Replication (in this chapter): Data is replicated at multiple sites.

    • Advantages: Higher availability (can use majority protocols).
    • Disadvantages: More overhead (two-phase commit, communication with multiple sites).

19.6.5 Coordinator Selection

  • Problem: If a coordinator fails, a new one must be chosen.

  • Approaches:

    • Backup Coordinator: A designated backup site maintains enough state to take over immediately if the primary coordinator fails.

      • Advantages: Fast recovery.
      • Disadvantages: Overhead of maintaining the backup state.
    • Election Algorithms: If there’s no backup (or the backup also fails), an election algorithm is used to choose a new coordinator dynamically.

      • Requirements:
        • Must choose a unique coordinator.
        • Must work even with multiple failures.
      • Example: Bully Algorithm.
    • Bully Algorithm (Simplified):

      1. Each site has a unique ID.
      2. The coordinator is always the active site with the highest ID.
      3. If a site suspects the coordinator has failed, it sends “election” messages to sites with higher IDs.
      5. If no higher-ID site responds, the site declares itself the coordinator.
      6. If a site with a higher ID responds, the initiator defers and waits for that site to announce itself as coordinator.
      7. When a site with a higher ID than the current coordinator recovers, it “bullies” the lower-ID coordinator and takes over.
      8. The chosen coordinator should also verify that it can reach a majority of sites, to avoid multiple coordinators arising in different partitions.
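The outcome of the simplified bully election above can be sketched as follows. This toy model abstracts messaging away entirely; `bully_elect` and the `alive` set are illustrative assumptions, and the real algorithm involves recursive elections and timeouts.

```python
# Toy sketch of the bully algorithm over a known set of site IDs. "alive"
# stands in for "responds to an election message within the timeout".

def bully_elect(site_id, all_sites, alive):
    """Run an election initiated by site_id; return the new coordinator.

    The initiator contacts every higher-ID site. If none responds, it
    declares itself coordinator. If any responds, the election cascades
    upward, so the highest-ID live site ends up winning.
    """
    higher = [s for s in all_sites if s > site_id and s in alive]
    if not higher:
        return site_id      # no higher site answered: declare self
    return max(higher)      # highest-ID live site becomes coordinator
```

The `max(higher)` shortcut relies on the bully invariant that higher-ID sites win their own elections; a full implementation would exchange election, answer, and coordinator messages to reach the same result.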