Batch 1: Questions 1-50

Quiz Questions (1-25)

  1. A financial application requires that every transaction report read must absolutely reflect the sum of all confirmed deposits and withdrawals up to that microsecond, across all distributed nodes. Which CAP property is being strictly prioritized here, potentially at the cost of responsiveness during network issues? a) Availability (A) b) Partition Tolerance (P) c) Consistency (C) d) Durability (D)

  2. Your Cassandra cluster has a Replication Factor (RF) of 5. To ensure that a read operation always sees the absolute latest successfully written data, even if some nodes are slow or temporarily down, what is the minimum Read Consistency Level ($R$) and Write Consistency Level ($W$) combination that guarantees this (assuming the overlap rule $R + W > RF$)? a) R=ONE, W=ONE b) R=QUORUM, W=ONE c) R=QUORUM, W=QUORUM d) R=ALL, W=ALL

  3. A startup is building a social media platform where user profiles can evolve rapidly – users might add new fields like ‘hobbies’, ‘job_title’, or remove ‘relationship_status’ at any time. They choose MongoDB. Which core MongoDB characteristic makes this easy without requiring database schema migrations? a) Replication using Replica Sets b) BSON data format c) Dynamic Schema d) Sharding capabilities

  4. During a network split in a distributed system, one set of servers continues processing transactions, while another isolated set also continues processing (potentially conflicting) transactions to ensure users can still use the service. This design choice explicitly sacrifices which CAP property during the partition? a) Availability (A) b) Partition Tolerance (P) c) Consistency (C) d) Scalability (S)

  5. In Cassandra, Node A receives a write request for data whose primary replica belongs to Node C. However, Node A cannot reach Node C due to a temporary network glitch. Node A stores the write locally with metadata indicating it’s for Node C, planning to send it later when Node C recovers. This mechanism is known as: a) Read Repair b) Anti-Entropy c) Hinted Handoff d) Commit Log Flush

  6. A large dataset needs to be stored, and analysis shows that 15% of the query processing time is inherently sequential (cannot be parallelized). According to Amdahl’s Law, even if you could use an infinite number of servers for the parallel part, the maximum theoretical speedup you could achieve is approximately: a) 15x b) 85x c) 6.67x ($1/0.15$) d) Cannot be determined without knowing the number of processors.

  7. Which component in Cassandra is responsible for calculating a hash value from a row’s partition key to determine which node(s) in the cluster should store the data? a) Gossip Protocol b) Partitioner c) Snitch d) Commit Log

  8. A MongoDB deployment uses a Replica Set with one Primary and two Secondaries. A client application is configured to perform reads from the ‘nearest’ node to minimize latency, accepting that data might be slightly stale. Which read preference is likely being used? a) primary b) primaryPreferred c) secondary d) nearest

  9. Consider the BASE properties. The idea that the system state might change over time even without direct user input, as background processes work to synchronize replicas, relates most closely to which property? a) Basically Available b) Soft State c) Eventual Consistency d) Strong Consistency

  10. An RDBMS system managing inventory across multiple warehouses needs to ensure that when an item is transferred, the count is decremented in the source warehouse and incremented in the destination warehouse atomically – either both happen or neither happens. Which traditional distributed transaction protocol is often used for this, ensuring strict consistency but potentially limiting scalability? a) Gossip Protocol b) Two-Phase Commit (2PC) c) Paxos d) Raft

  11. A developer stores user profile pictures, averaging 2MB each, directly within user documents in MongoDB. As the application scales, they notice performance degradation and challenges with document size limits. What MongoDB feature should they have ideally used for storing these images? a) Embedded Documents b) Indexing c) GridFS d) Sharding

  12. In Cassandra, each write is kept in memory for speed, but to prevent loss if the node crashes, it is also appended sequentially to an on-disk log. What is this in-memory structure called? a) SSTable b) Bloom Filter c) Memtable d) Commit Log

  13. A company runs its application across two data centers (DC1, DC2). They use Cassandra with NetworkTopologyStrategy. They configure RF=3 for DC1 and RF=2 for DC2. How many total replicas of each row will exist across the entire cluster? a) 2 b) 3 c) 5 d) 6

  14. Which NoSQL database type is specifically optimized for storing and navigating data based on relationships between entities, like social networks or recommendation engines? a) Document Store b) Key-Value Store c) Columnar Database d) Graph Database

  15. When a read request in Cassandra triggers the comparison of data from multiple replicas, and an inconsistency is found, the system initiates a background process to update the stale replicas. This mechanism is called: a) Hinted Handoff b) Anti-Entropy c) Read Repair d) Gossip

  16. MongoDB uses a binary-encoded format for storing documents, which is more efficient for parsing and storage than human-readable JSON. What is this format called? a) XML b) BSON c) Protocol Buffers d) YAML

  17. You need to design a distributed cache where fast lookups are critical, but occasional data staleness is acceptable. The primary requirement is extremely high availability even during network partitions. Which CAP model does this system lean towards? a) CA (Consistency, Availability) b) CP (Consistency, Partition Tolerance) c) AP (Availability, Partition Tolerance) d) AC (Availability, Consistency - same as CA)

  18. Which Cassandra feature involves nodes periodically exchanging state information with random peers to maintain cluster membership and detect failures? a) Partitioner b) Replication Strategy c) Gossip Protocol d) Tunable Consistency

  19. The mandatory unique identifier field present in every MongoDB document, which is automatically indexed, is named: a) id b) primary_key c) _id d) doc_id

  20. If a database system prioritizes ensuring that every request receives a non-error response, even if that response contains slightly outdated data during a network failure, it exemplifies which CAP guarantee? a) Consistency b) Availability c) Partition Tolerance d) Atomicity

  21. A database system stores data in tables where each row can have vastly different columns or fields from other rows within the same table. Which NoSQL category does this most closely resemble? a) Key-Value Store b) Graph Database c) Document Store d) Relational Database

  22. In the context of scaling databases, adding more RAM and faster CPUs to an existing single database server is an example of: a) Horizontal Scaling (Scale Out) b) Vertical Scaling (Scale Up) c) Sharding d) Replication

  23. The process in Cassandra where the contents of a full Memtable are sorted and written to an immutable file on disk is called: a) Compaction b) Flushing c) Gossiping d) Repairing

  24. You are using Cassandra with RF=3 and have set both Read and Write Consistency Levels to QUORUM. How many nodes must acknowledge a write for it to succeed, and how many nodes must respond to a read request? a) Write: 1, Read: 1 b) Write: 2, Read: 2 ($\lfloor 3/2 \rfloor + 1 = 2$) c) Write: 3, Read: 3 d) Write: 2, Read: 3

  25. Which term describes the process of splitting a large database table or collection horizontally across multiple machines to distribute load and storage? a) Replication b) Indexing c) Sharding d) Normalization


Fill-in-the-Blank Questions (26-50)

  1. Cassandra’s architecture, where all nodes are equal and there is no central controlling node, is described as ______ or peer-to-peer.
  2. According to the CAP theorem, it is impossible for a distributed data store to simultaneously guarantee Consistency, Availability, and ______ ______.
  3. MongoDB stores data in flexible, JSON-like structures called ______, which are grouped into collections.
  4. In Cassandra, the ______ ______ determines the total number of copies of data stored across the cluster for fault tolerance.
  5. Systems designed according to BASE principles prioritize availability and accept ______ consistency, meaning data will become consistent over time if updates cease.
  6. A MongoDB ______ ______ is a group of mongod instances (one primary, multiple secondaries) that host the same data set for redundancy.
  7. The sequential, append-only log on disk where Cassandra writes data before the Memtable for durability is the ______ ______.
  8. If a system guarantees that a read operation always returns the result of the most recently completed write, it is providing strong ______.
  9. MongoDB’s mechanism for storing files larger than the 16MB document size limit by breaking them into parts is called ______.
  10. The communication protocol used by Cassandra nodes to discover and share state information about each other is known as the ______ protocol.
  11. For a distributed system requiring Partition Tolerance, the fundamental trade-off defined by the CAP theorem is between Consistency and ______.
  12. The binary representation format used internally by MongoDB for data storage and transfer, which is more efficient than JSON, is ______.
  13. In Cassandra, the process of comparing data replicas in the background to fix inconsistencies, often using Merkle trees, is called ___________.
  14. The property in the BASE acronym that indicates the system’s state may change over time without explicit input due to background consistency processes is ______ ______.
  15. The MongoDB server process itself is typically named ______, while the command-line client shell is often mongo or mongosh.
  16. In Cassandra, immutable, sorted data files written to disk from Memtables are called ______.
  17. Amdahl’s Law provides a formula to estimate the theoretical ______ achievable by parallelizing a portion of a task.
  18. The NetworkTopologyStrategy in Cassandra is preferred over SimpleStrategy when deploying across multiple ______ ______ or racks.
  19. The ‘B’ and ‘A’ in BASE stand for ______ ______.
  20. In MongoDB replication, the log of operations on the primary node that secondaries use to replicate changes is called the ______.
  21. Data that lacks a predefined model and doesn’t fit neatly into relational tables, such as text documents or video files, is classified as ______ data.
  22. The consistency level in Cassandra that requires acknowledgement from a majority ($\lfloor RF/2 \rfloor + 1$) of replica nodes is called ______.
  23. In RDBMS terminology, a row or record corresponds to a ______ in MongoDB terminology.
  24. The Two-Phase Commit (2PC) protocol is designed to ensure ______ (the ‘A’ in ACID) across distributed transactions.
  25. Scaling a database system by adding more servers to distribute the data and workload is known as ______ scaling or scaling out.

Solutions Batch 1 (1-50)

Quiz Solutions:

  1. c) Consistency (C)
  2. c) R=QUORUM, W=QUORUM (Requires $R + W > RF$: QUORUM of 5 is $\lfloor 5/2 \rfloor + 1 = 3$, so $3 + 3 = 6 > 5$)
  3. c) Dynamic Schema
  4. c) Consistency (C)
  5. c) Hinted Handoff
  6. c) 6.67x ($1/0.15 \approx 6.67$)
  7. b) Partitioner
  8. d) nearest
  9. b) Soft State
  10. b) Two-Phase Commit (2PC)
  11. c) GridFS
  12. c) Memtable
  13. c) 5 ($3 + 2$)
  14. d) Graph Database
  15. c) Read Repair
  16. b) BSON
  17. c) AP (Availability, Partition Tolerance)
  18. c) Gossip Protocol
  19. c) _id
  20. b) Availability
  21. c) Document Store
  22. b) Vertical Scaling (Scale Up)
  23. b) Flushing
  24. b) Write: 2, Read: 2
  25. c) Sharding
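
The arithmetic behind Questions 2, 6, 13 and 24 can be checked with a few lines of Python. This is a minimal sketch under simplifying assumptions: QUORUM is taken as $\lfloor RF/2 \rfloor + 1$ replicas, the overlap rule $R + W > RF$ is applied to plain node counts, and the helper names (`quorum`, `read_sees_latest_write`, `total_replicas`, `amdahl_speedup`) are hypothetical, not part of any Cassandra driver.

```python
import math
from typing import Dict


def quorum(rf: int) -> int:
    """Majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def read_sees_latest_write(r: int, w: int, rf: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return r + w > rf


def total_replicas(rf_per_dc: Dict[str, int]) -> int:
    """With NetworkTopologyStrategy each data center keeps its own RF copies."""
    return sum(rf_per_dc.values())


def amdahl_speedup(parallel_fraction: float, processors: float) -> float:
    """Amdahl's Law, S = 1 / ((1 - P) + P / N); processors may be math.inf."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


if __name__ == "__main__":
    q = quorum(5)                                    # Q2: RF = 5
    print(q, read_sees_latest_write(q, q, 5))        # 3 True
    print(read_sees_latest_write(q, 1, 5))           # R=QUORUM, W=ONE: 3 + 1 = 4, False
    print(quorum(3))                                 # Q24: 2
    print(total_replicas({"DC1": 3, "DC2": 2}))      # Q13: 5
    print(round(amdahl_speedup(0.85, math.inf), 2))  # Q6: 6.67
```

The same `amdahl_speedup` call with $P = 0.95$ and $N = 10$ gives about 6.90x, which is the worked value for Batch 2, Question 4.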

Fill-in-the-Blank Solutions:

  1. masterless / decentralized
  2. Partition Tolerance
  3. documents
  4. Replication Factor (RF)
  5. eventual
  6. Replica Set
  7. Commit Log
  8. Consistency
  9. GridFS
  10. Gossip
  11. Availability
  12. BSON
  13. Anti-Entropy
  14. Soft State
  15. mongod
  16. SSTables
  17. speedup
  18. data centers
  19. Basically Available
  20. Oplog
  21. unstructured
  22. QUORUM
  23. Document
  24. atomicity
  25. horizontal
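
The Cassandra write path behind Quiz Questions 12 and 23 and Fill-in-the-Blank items 7 and 16 (Commit Log, Memtable, flush, SSTables) can be pictured with a toy single-node model. This is a hypothetical sketch, not Cassandra's storage engine: it ignores timestamps, tombstones, Bloom filters, compaction and commit-log segment recycling, and `ToyNode` and its methods are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple


class ToyNode:
    """Hypothetical, heavily simplified model of one node's storage engine."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []              # append-only, sequential
        self.memtable: Dict[str, str] = {}                        # in-memory, mutable
        self.sstables: List[Tuple[Tuple[str, str], ...]] = []     # immutable, sorted
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. durability: append to the commit log
        self.memtable[key] = value            # 2. speed: update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Sort the memtable and write it out as a new immutable SSTable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # simplification: flushed entries no longer need replay

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:                  # most recent data first
            return self.memtable[key]
        for table in reversed(self.sstables):     # then newest SSTable to oldest
            for k, v in table:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    node = ToyNode(memtable_limit=2)
    node.write("user:2", "bob")
    node.write("user:1", "ann")   # second write fills the memtable and triggers a flush
    node.write("user:1", "amy")   # newer value lives only in the memtable and commit log
    print(node.sstables)                              # [(('user:1', 'ann'), ('user:2', 'bob'))]
    print(node.read("user:1"), node.read("user:2"))   # amy bob
```

Because the SSTable is never modified after the flush, the newer value for `user:1` shadows the older one at read time rather than overwriting it on disk; real Cassandra later reconciles such versions during compaction.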

Okay, here is the second batch of 50 questions (Questions 51-100), continuing the mix of quiz and fill-in-the-blank formats based on the provided materials.

Batch 2: Questions 51-100


Quiz Questions (51-75)

  1. A Cassandra cluster spans two data centers, DC1 and DC2. To ensure that writes are acknowledged by a majority of replicas in each data center before returning success to the client (providing strong cross-DC durability), which Write Consistency Level should be used? a) QUORUM b) LOCAL_QUORUM c) EACH_QUORUM d) ALL

  2. You are designing a system where the primary goal is extremely fast writes. You are willing to accept a small risk of data loss if a node crashes immediately after acknowledging a write, but before the data is fully durable. Which Cassandra Write Consistency Level offers the fastest acknowledgement but the lowest durability guarantee? a) ONE b) QUORUM c) ANY (acknowledged once the write is stored on any node, even if only as a hint held by the coordinator) d) ALL

  3. A development team is migrating from a traditional RDBMS to MongoDB. They are used to enforcing strict data types and presence of columns using ALTER TABLE. What fundamental difference must they adapt to regarding MongoDB’s collections? a) Collections require manual index creation. b) Collections store data in BSON, not text. c) Collections do not enforce a schema across documents. d) Collections are automatically sharded by default.

  4. Consider Amdahl’s Law: $S = \frac{1}{(1 - P) + P/N}$. If a task is 95% parallelizable ($P = 0.95$) and run on 10 processors ($N = 10$), what is the approximate maximum speedup? a) ~20x b) ~10x c) ~6.90x d) ~9.5x

  5. Which characteristic of Cassandra directly contributes to its “No Single Point of Failure” design? a) Use of SSTables for storage b) Tunable Consistency levels c) Masterless (peer-to-peer) architecture d) Support for CQL

  6. In a MongoDB replica set, which member is responsible for receiving all write operations from clients? a) Any Secondary node b) The Arbiter node c) The Primary node d) The mongos router

  7. A system built using the BASE philosophy guarantees that if updates stop, all replicas will eventually hold the same latest value. This specific guarantee is known as: a) Basic Availability b) Soft State c) Eventual Consistency d) Atomicity

  8. You need to store large, infrequently changing files like historical scanned documents (unstructured, static data). While a file system could work, you also need some metadata search capabilities. Which type of database system is often suitable for such scenarios, balancing storage and basic querying? a) In-memory Key-Value store b) Transactional RDBMS c) Document Store (e.g., MongoDB with GridFS) d) Graph Database

  9. In the Cassandra write path, after data is written to the Commit Log, where is it written next to allow for fast reads of recent data before it’s flushed to disk? a) SSTable b) Memtable c) Bloom Filter d) Index File

  10. A company wants its Cassandra cluster to be highly available for reads within its local data center, even if the network connection to other data centers is down. They can tolerate slightly stale data from other DCs. Which Read Consistency Level best suits this need? a) ONE b) QUORUM c) LOCAL_QUORUM d) EACH_QUORUM

  11. What is the primary motivation for using BSON over JSON within MongoDB? a) Human readability b) Better support for complex data types and efficient parsing/serialization c) Compatibility with XML standards d) Built-in schema enforcement

  12. If a distributed system guarantees Partition Tolerance (P), according to the CAP theorem, what must it fundamentally trade-off between? a) Scalability and Performance b) Consistency (C) and Availability (A) c) Durability and Latency d) Read performance and Write performance

  13. Which Cassandra mechanism helps ensure data consistency by having nodes periodically compare data summaries (like Merkle trees) and repair discrepancies proactively, independent of read operations? a) Hinted Handoff b) Read Repair c) Anti-Entropy d) Gossip

  14. MongoDB’s _id field, if not provided by the user, is automatically generated as an ObjectId. What information is encoded within this ObjectId? a) Timestamp, Machine ID, Process ID, Counter b) User ID, Session ID, Random Number c) Collection Name, Database Name, Timestamp d) Shard Key, Chunk ID, Timestamp

  15. A key difference between scaling up (vertical) and scaling out (horizontal) is that: a) Scaling up involves adding more machines. b) Scaling out involves upgrading hardware on a single machine. c) Scaling out involves distributing the load/data across multiple machines. d) Scaling up is always more cost-effective.

  16. Which data type classification would best fit a live stock market ticker feed? a) Structured and Static b) Unstructured and Static c) Structured and Dynamic d) Unstructured and Dynamic

  17. The Cassandra SimpleStrategy for replica placement is generally not recommended for production deployments primarily because: a) It requires manual configuration of replica locations. b) It is not aware of network topology (racks/data centers), potentially placing replicas in the same failure domain. c) It only supports a Replication Factor of 1. d) It performs poorly compared to NetworkTopologyStrategy.

  18. What does the “Soft State” in the BASE acronym imply about a distributed system? a) The system uses solid-state drives (SSDs). b) The system’s state may change over time without explicit input as consistency is achieved. c) The system provides weak security guarantees. d) The system requires manual intervention to maintain state.

  19. In MongoDB, how are relationships between documents typically handled, contrasting with JOIN operations in RDBMS? a) Using foreign key constraints enforced by the database. b) Primarily through embedding related data within a document or using application-level references (linking). c) Using predefined VIEWs. d) MongoDB does not support relationships between documents.

  20. Which component of the Two-Phase Commit (2PC) protocol is responsible for making the final decision (commit or abort) after receiving votes from participants? a) Any Participant b) The Coordinator c) The Transaction Log d) The Client Application

  21. Cassandra is often described as “column-oriented,” but a more precise description might be a partitioned row store or column-family store. This means data is primarily organized by: a) Individual columns across all rows. b) Rows, but columns within a row can be grouped into families, and rows are partitioned across nodes. c) A graph structure of nodes and edges. d) Key-value pairs where the value is always a single primitive type.

  22. You need to perform a complex data aggregation task in MongoDB, involving multiple stages like filtering, grouping, sorting, and transforming data. Which MongoDB feature provides a framework for such multi-stage processing? a) CRUD operations (find, update) b) Indexing c) Aggregation Pipeline d) GridFS

  23. When considering the CAP theorem, traditional single-node relational databases (like standard MySQL or PostgreSQL installations) are typically considered which type of system? a) AP (Available, Partition Tolerant) b) CP (Consistent, Partition Tolerant) c) CA (Consistent, Available - assuming no partitions) d) BASE system

  24. What is the primary purpose of the Oplog in a MongoDB replica set? a) To store user authentication credentials. b) To log all write operations on the primary for secondaries to replicate. c) To store large binary files (GridFS). d) To maintain schema information for collections.

  25. If a Cassandra write operation uses Consistency Level ONE, where must the write be successfully written before the coordinator acknowledges success to the client? a) Only the Memtable of one replica node. b) Only the Commit Log of one replica node. c) Both the Commit Log and Memtable of at least one replica node. d) An SSTable on at least one replica node.


Fill-in-the-Blank Questions (76-100)

  1. The CAP theorem was first proposed by Eric Brewer as a conjecture and is sometimes called ______’s Theorem.
  2. Distributing data copies across multiple servers to improve fault tolerance and read scalability is known as ______.
  3. In contrast to RDBMS tables, MongoDB ______ do not enforce a uniform structure for the documents they contain.
  4. Cassandra’s tunable consistency allows developers to choose a trade-off between consistency, availability, and ______.
  5. The process of splitting a large dataset across multiple MongoDB servers (shards) to handle large data volumes or high throughput is called ______.
  6. The ______ property of BASE indicates that the system prioritizes responding to requests, potentially with stale data.
  7. The Cassandra data structure stored on disk that is immutable and contains sorted key-value pairs flushed from a Memtable is the ______.
  8. A query in MongoDB using db.collection.find({ age: { $gt: 25 } }) is an example of a ______ query, retrieving documents based on field values.
  9. The failure detection mechanism in Cassandra relies heavily on the ______ protocol where nodes exchange state information.
  10. A system that sacrifices Availability during a partition to ensure that all accessible data is consistent follows the ______ model in CAP terms.
  11. The unique, automatically generated 12-byte identifier for MongoDB documents is called an ______.
  12. In Cassandra’s write path, the ______ ______ provides durability against node crashes before data is flushed to SSTables.
  13. ______ consistency is a weaker model than strong consistency, guaranteeing that eventually, all replicas will converge if updates cease.
  14. The specific replication strategy in Cassandra that is aware of data center and rack locations for intelligent replica placement is ______.
  15. Medical imaging data (e.g., MRI scans) that doesn’t change after being created is an example of ______ and ______ data.
  16. The component in a sharded MongoDB cluster that routes queries from applications to the appropriate shard(s) is the ______ process.
  17. Read Repair in Cassandra helps fix inconsistencies discovered during a ______ operation.
  18. The first phase of the Two-Phase Commit (2PC) protocol, where participants indicate if they are ready to commit, is called the ______ or Prepare phase.
  19. Data organized with a predefined schema, fitting neatly into rows and columns, is known as ______ data.
  20. According to Amdahl’s Law, the maximum speedup achievable through parallelization is limited by the ______ portion of the task.
  21. Cassandra’s ability to easily add or remove nodes from the cluster without downtime showcases its ______ ______.
  22. The JSON specification provides a human-readable format, while MongoDB uses the binary ______ format for efficiency.
  23. A Cassandra consistency level of ______ requires a response from all configured replicas for a read operation.
  24. In a MongoDB replica set, if the primary node fails, an election occurs to promote one of the ______ nodes to become the new primary.
  25. The fundamental concept in NoSQL databases that allows documents within the same collection to have different fields or structures is ______ ______.

Solutions Batch 2 (51-100)

Quiz Solutions:

  1. c) EACH_QUORUM
  2. c) ANY (ONE requires the write to reach the Commit Log and Memtable of at least one replica; ANY is weaker still and can succeed with only a hint stored on the coordinator when all replicas are down)
  3. c) Collections do not enforce a schema across documents.
  4. c) ~6.90x ($1/(0.05 + 0.095) = 1/0.145 \approx 6.90$)
  5. c) Masterless (peer-to-peer) architecture
  6. c) The Primary node
  7. c) Eventual Consistency
  8. c) Document Store (e.g., MongoDB with GridFS)
  9. b) Memtable
  10. c) LOCAL_QUORUM
  11. b) Better support for complex data types and efficient parsing/serialization
  12. b) Consistency (C) and Availability (A)
  13. c) Anti-Entropy
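  14. a) Timestamp, Machine ID, Process ID, Counter (the classic layout; in current MongoDB versions the machine and process IDs are replaced by a 5-byte random value generated once per process)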
  15. c) Scaling out involves distributing the load/data across multiple machines.
  16. c) Structured and Dynamic
  17. b) It is not aware of network topology (racks/data centers), potentially placing replicas in the same failure domain.
  18. b) The system’s state may change over time without explicit input as consistency is achieved.
  19. b) Primarily through embedding related data within a document or using application-level references (linking).
  20. b) The Coordinator
  21. b) Rows, but columns within a row can be grouped into families, and rows are partitioned across nodes.
  22. c) Aggregation Pipeline
  23. c) CA (Consistent, Available - assuming no partitions)
  24. b) To log all write operations on the primary for secondaries to replicate.
  25. c) Both the Commit Log and Memtable of at least one replica node.
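
Question 14’s answer can be made concrete by splitting an ObjectId’s 12 bytes by hand. This sketch uses only the Python standard library (not the `bson` package) and assumes the documented layout: a 4-byte big-endian timestamp in seconds, 5 middle bytes (machine and process ID in the classic layout, a per-process random value in current versions), and a 3-byte counter. The function name `split_object_id` is hypothetical, and the sample ID is just an example value.

```python
from datetime import datetime, timezone
from typing import Tuple


def split_object_id(hex_id: str) -> Tuple[int, str, int]:
    """Split a 24-hex-character ObjectId into (timestamp, middle_bytes_hex, counter)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    timestamp = int.from_bytes(raw[0:4], "big")  # seconds since the Unix epoch
    middle = raw[4:9].hex()                      # machine + process ID (classic) or random value
    counter = int.from_bytes(raw[9:12], "big")   # incrementing counter
    return timestamp, middle, counter


if __name__ == "__main__":
    ts, middle, counter = split_object_id("507f1f77bcf86cd799439011")
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())  # creation time (UTC)
    print(middle, hex(counter))                                     # bcf86cd799 0x439011
```

Because the timestamp occupies the leading bytes, ObjectIds generated later sort roughly after earlier ones, which is why sorting on `_id` approximates insertion order.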

Fill-in-the-Blank Solutions:

  1. Brewer
  2. replication
  3. collections
  4. latency
  5. sharding
  6. Basically Available
  7. SSTable
  8. read / find / dynamic
  9. Gossip
  10. CP (Consistent, Partition Tolerant)
  11. ObjectId
  12. Commit Log
  13. Eventual
  14. NetworkTopologyStrategy
  15. static, unstructured
  16. mongos
  17. read
  18. Voting
  19. structured
  20. sequential
  21. elastic scalability
  22. BSON
  23. ALL
  24. secondary
  25. dynamic schema
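
The two phases of 2PC (Batch 1 Quiz Question 10, Batch 2 Quiz Question 20 and Fill-in-the-Blank item 18) can be sketched as a coordinator that collects votes and then broadcasts one decision. This is a minimal in-memory sketch with hypothetical class and function names; it leaves out the durable logging, timeouts and coordinator-failure handling (the protocol’s well-known blocking problem) that a real implementation needs.

```python
from typing import List


class Participant:
    """Hypothetical resource manager, e.g. one warehouse's inventory database."""

    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1 (voting/prepare): vote YES only if the local change can be committed
        self.state = "prepared" if self.can_commit else "voted_no"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    """Coordinator: commit everywhere only if every participant voted YES."""
    votes = [p.prepare() for p in participants]  # phase 1: collect every vote
    if all(votes):
        for p in participants:                   # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:                       # phase 2: global abort
        p.abort()
    return "aborted"


if __name__ == "__main__":
    source = Participant("warehouse_A", can_commit=True)   # decrement stock
    dest = Participant("warehouse_B", can_commit=False)    # e.g. a local constraint fails
    print(two_phase_commit([source, dest]))                # aborted
    print(source.state, dest.state)                        # aborted aborted
```

A single NO vote aborts the whole transfer, so the source and destination counts never diverge; that all-or-nothing outcome is the atomicity 2PC provides, at the cost of every participant waiting on the coordinator.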