Question (5 Marks):

Drawing upon the provided text, explain how MongoDB addresses the challenges of High Availability and Scalability, which are often faced by traditional RDBMS. Describe the two primary mechanisms MongoDB employs for these purposes, outlining their basic function and key components as mentioned in the chapter overview.


Answer:

MongoDB addresses High Availability and Scalability, two areas where traditional RDBMS are often limited, through two core features: Replication and Sharding.

  1. Replication (for High Availability): (2.5 Marks)

    • Purpose: Replication provides data redundancy and high availability, ensuring the system can recover from hardware failures or service interruptions and that data remains accessible.
    • Mechanism: MongoDB uses Replica Sets, which are groups of MongoDB servers that all maintain the same dataset.
    • Components: A replica set consists of one Primary node, which receives all write operations and records them in its Oplog (operations log), and one or more Secondary nodes, which replicate the data by copying and applying operations from the primary’s Oplog.
    • Function: This ensures multiple copies of the data exist. If the primary fails, a secondary can be elected as the new primary, minimizing downtime. Read operations can also be directed to secondaries to distribute load.
  2. Sharding (for Scalability): (2.5 Marks)

    • Purpose: Sharding provides horizontal scalability (scaling out) to handle very large datasets or high throughput requirements that might exceed the capacity of a single server.
    • Mechanism: It works by distributing a large dataset across multiple servers, known as Shards.
    • Components: Each Shard is an independent database (or replica set) that holds a portion of the total data. Collectively, all shards function as a single logical database.
    • Function: This reduces the amount of data and the number of operations each individual shard needs to manage (e.g., splitting 1 TB evenly across 4 shards leaves roughly 256 GB per shard, taking 1 TB as 1,024 GB). Queries can often be routed only to the relevant shard(s), distributing the workload.
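
The two mechanisms above can be illustrated with a small in-memory sketch. This is a toy model, not MongoDB's implementation: the class names (Node, ReplicaSet), the helper shard_for, the MD5-based hash, and the "promote the most up-to-date member" election rule are all assumptions made for illustration. It only shows the ideas of replaying an oplog on secondaries, promoting a secondary when the primary fails, and spreading documents across shards by a shard key.

```python
import hashlib


class Node:
    """Toy replica-set member: a dict of documents plus an applied-op counter."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far


class ReplicaSet:
    """Toy replica set: one primary, the rest secondaries (hypothetical model)."""

    def __init__(self, names):
        self.members = [Node(n) for n in names]
        self.primary = self.members[0]
        self.oplog = []  # ordered log of write operations

    def write(self, key, value):
        # All writes go to the primary and are recorded in the oplog.
        self.oplog.append((key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self.oplog)

    def replicate(self):
        # Each member replays the oplog entries it has not applied yet.
        for node in self.members:
            for key, value in self.oplog[node.applied:]:
                node.data[key] = value
            node.applied = len(self.oplog)

    def fail_primary(self):
        # Simplified election (an assumption): promote the most up-to-date survivor.
        self.members.remove(self.primary)
        self.primary = max(self.members, key=lambda n: n.applied)
        return self.primary.name


def shard_for(key, num_shards):
    """Pick a shard by hashing the shard key (toy stand-in for hashed sharding)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Replication: write to the primary, replicate, then lose the primary.
rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write("user:1", {"name": "Ada"})
rs.replicate()
new_primary = rs.fail_primary()
print(new_primary, rs.primary.data)

# Sharding: distribute 1000 documents over 4 shards by user_id.
shards = [dict() for _ in range(4)]
for user_id in range(1000):
    shards[shard_for(user_id, 4)][user_id] = {"user_id": user_id}
print([len(s) for s in shards])

# Per-shard share of a 1 TB (1,024 GB) dataset split evenly over 4 shards.
gb_per_shard = 1024 / 4
print(gb_per_shard)
```

After the failover the surviving member still holds the replicated document, and a lookup for a given user_id only needs to touch the one shard that shard_for selects.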

Question (5 Marks):

MongoDB utilizes a specific data format internally and offers significant flexibility in how data is structured within its containers.

a) What is the primary data format MongoDB uses for storing documents, and why is it preferred over JSON for internal use? (1 Mark)

b) Explain the concepts of Collections and Documents in MongoDB, relating them to their RDBMS equivalents mentioned in the text. (2 Marks)

c) Describe what is meant by a Dynamic Schema in MongoDB and explain its significance. (1 Mark)

d) What is the role and importance of the _id field within a MongoDB document? (1 Mark)


Answer:

a) MongoDB stores documents internally as BSON (Binary JSON). It is preferred over text-based JSON because it is a binary format designed to be compact and fast for machines to parse and traverse, and it supports data types that JSON lacks, such as dates and binary data. (1 Mark)
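
To make the "binary format" point concrete, the following sketch hand-encodes a very small document using the BSON layout for two element types only (UTF-8 strings and 32-bit integers). The function name encode_bson is hypothetical and this is not a full BSON encoder. It shows that every document starts with its total byte length and every field carries a type tag, which is what lets a reader skip through the data without scanning text character by character.

```python
import json
import struct


def encode_bson(doc):
    """Encode a flat dict of str -> (str | 32-bit int) values as BSON bytes."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # field name as a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # Type 0x02 = string: int32 byte length (including NUL), then bytes.
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # Type 0x10 = 32-bit integer, little-endian.
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported value type in this sketch: {type(value)}")
    # Total length = 4-byte length prefix + elements + trailing NUL byte.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


doc = {"hello": "world"}
bson_bytes = encode_bson(doc)
json_bytes = json.dumps(doc, separators=(",", ":")).encode("utf-8")

# The first four bytes give the document length, so a reader can skip it whole.
total_length = struct.unpack_from("<i", bson_bytes, 0)[0]
print(bson_bytes)
print(len(bson_bytes), len(json_bytes), total_length)
```

For this one-field document the BSON encoding is 22 bytes against 17 bytes for compact JSON, so the size advantage depends on the data; the consistent benefit shown here is the typed, length-prefixed layout that is quick to traverse.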

b) In MongoDB: (2 Marks)

    • A Collection is analogous to a Table in RDBMS. It is a container that holds a group of MongoDB documents and exists within a single database.
    • A Document is analogous to a Row/Record/Tuple in RDBMS. It represents a single data record stored within a collection and is composed of field-and-value pairs (like a JSON object).

c) Dynamic Schema means that documents within the same collection do not need to have the same set of fields, the same data types for a given field, or the same structure. This provides great flexibility: the schema can evolve easily, without the predefined table structure that an RDBMS requires. (1 Mark)
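
As an illustration of a dynamic schema, the sketch below models a collection as a plain Python list and each document as a dict. The collection name products and all field names are made up for the example, and a list of dicts is only a stand-in for a real MongoDB collection.

```python
# A collection modelled as a list; each document is a dict of field-value pairs.
products = []

# Three documents in the same collection with different fields and types.
products.append({"_id": 1, "name": "Laptop", "price": 999.0})
products.append({"_id": 2, "name": "E-book", "price": "9.99", "formats": ["epub", "pdf"]})
products.append({"_id": 3, "title": "Gift card", "details": {"amount": 25}})

# Each document carries its own set of fields.
field_sets = [sorted(doc.keys()) for doc in products]
print(field_sets)

# Code that reads the collection has to tolerate missing fields.
with_price = [doc["_id"] for doc in products if "price" in doc]
print(with_price)

# The same field may hold different data types in different documents.
price_types = {type(doc["price"]).__name__ for doc in products if "price" in doc}
print(sorted(price_types))
```

Nothing has to be altered up front to store the third document's new fields, which is the flexibility described above; the cost is that reading code must handle documents that lack a field or store it with a different type.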

d) The _id field is a mandatory field in every MongoDB document that acts as the primary key. Its value must be unique within the collection. It is used to uniquely identify and search for documents, and an index is automatically created on this field for efficient lookups. MongoDB can automatically generate a unique ObjectId value for _id if one is not provided. (1 Mark)
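
The behaviour of _id can be sketched with a toy in-memory collection. The Collection class, its method names, and new_object_id are hypothetical and are not a MongoDB driver API. The generated value imitates the 12-byte ObjectId layout (4-byte timestamp, 5-byte per-process random value, 3-byte counter) as an approximation only, and the dict keyed by _id stands in for the automatic index.

```python
import itertools
import os
import struct
import time

# Per-process random value and a counter with a random start (assumed layout).
_PROCESS_RANDOM = os.urandom(5)
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id():
    """Return an ObjectId-like value as 24 hex characters (12 bytes)."""
    timestamp = struct.pack(">I", int(time.time()))
    counter = (next(_counter) % (1 << 24)).to_bytes(3, "big")
    return (timestamp + _PROCESS_RANDOM + counter).hex()


class Collection:
    """Toy collection that enforces a unique _id on every document."""

    def __init__(self):
        self._by_id = {}  # plays the role of the automatic index on _id

    def insert_one(self, doc):
        doc = dict(doc)  # copy so the caller's dict is not modified
        if "_id" not in doc:
            doc["_id"] = new_object_id()  # generated when none is supplied
        if doc["_id"] in self._by_id:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._by_id[doc["_id"]] = doc
        return doc["_id"]

    def find_by_id(self, _id):
        # Direct lookup through the index instead of scanning every document.
        return self._by_id.get(_id)


users = Collection()
auto_id = users.insert_one({"name": "Ada"})   # _id generated automatically
users.insert_one({"_id": 7, "name": "Lin"})   # _id supplied by the caller
print(auto_id, users.find_by_id(7))

try:
    users.insert_one({"_id": 7, "name": "Duplicate"})
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The second insert with _id 7 is rejected, mirroring the uniqueness rule, while the first document receives a generated identifier because none was provided.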