MongoDB Architecture

 Topics:

  1. Data Model
  2. MongoDB Instance
  3. MongoDB Server Components
  4. Storage Engine
  5. Replication
  6. Indexing

1. Data Model

  1. Documents
    • The basic unit of data in MongoDB.
    • Equivalent to a row in relational databases, but more flexible.
    • Stored as BSON (Binary JSON), which supports nested documents and arrays.
  2. Collections
    • A group of MongoDB documents.
    • Equivalent to a table in relational DBs.
    • No schema is enforced by default, so documents in the same collection can have different structures (optional schema validation can be added).
  3. Databases
    • Top-level container for collections.
    • A single MongoDB instance can host multiple databases.
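To make the hierarchy concrete, here is a minimal sketch that models databases, collections, and documents as plain Python dicts and lists. It is only an analogy: a real server stores BSON on disk, not Python objects, and the database, collection, and field names (shop, customers, analytics, etc.) are hypothetical.

```python
from __future__ import annotations

# Toy, in-memory picture of the hierarchy (illustration only; the server stores BSON):
# instance -> databases -> collections -> documents.
instance: dict[str, dict[str, list[dict]]] = {
    "shop": {                                   # hypothetical database
        "customers": [                          # hypothetical collection
            # A document with a nested sub-document and an array.
            {"_id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["vip"]},
            # Same collection, different shape: no schema is enforced by default.
            {"_id": 2, "name": "Linus", "email": "linus@example.com"},
        ],
        "orders": [],
    },
    "analytics": {"events": []},                # a second database in the same instance
}


def field_names(collection: list[dict]) -> list[set[str]]:
    """Return the set of top-level field names of each document."""
    return [set(doc) for doc in collection]


customers = instance["shop"]["customers"]
print(field_names(customers))
```

The two documents in customers have different field sets, which is exactly the flexibility a relational table with a fixed column list would not allow.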

2. MongoDB Instance

In MongoDB, an instance is a running copy of the MongoDB server software: typically a mongod process running on a machine, whether locally, on a server, or in the cloud.

The hierarchy inside a single instance looks like this:
MongoDB Instance (mongod)
├── Database1
│   ├── CollectionA
│   │   ├── Document1
│   │   └── Document2
│   └── CollectionB
└── Database2
    └── CollectionC

3. MongoDB Server Components

mongod (MongoDB Daemon)

  • mongod is the core server process of MongoDB.
  • It’s what you start to actually run the MongoDB database on your machine or server.
  • When you say “run MongoDB,” you’re really starting mongod.
  1. Handling CRUD Operations
    • Create: Insert documents into collections.
    • Read: Query data using filters and projections.
    • Update: Modify documents.
    • Delete: Remove documents or collections.
  2. Memory Management
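    • Caches frequently accessed data in memory (RAM) for fast reads.
    • With WiredTiger, uses its own internal cache plus the operating system's file system cache; memory-mapped files were used only by the older MMAPv1 engine.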
  3. Data Storage
    • Manages reading/writing data to disk.
    • Handles journaling and delegates on-disk storage to a pluggable storage engine (such as WiredTiger).
  4. Replication
    • Keeps multiple copies of your data across servers (replica sets).
    • Provides high availability and automatic failover.
  5. Sharding
    • Distributes data across multiple machines (shards) for horizontal scalability.
    • Allows MongoDB to handle large datasets and high throughput.
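The four CRUD operations that mongod handles can be mimicked with a tiny in-memory collection. This is a hypothetical sketch: ToyCollection is not a real driver class, and although its method names echo the common driver methods (insert_one, find, update_one, delete_one), it only supports exact-match filters on top-level fields and a {"$set": ...} update, nothing like the full query language.

```python
from __future__ import annotations

import copy
from typing import Any


class ToyCollection:
    """Hypothetical in-memory stand-in for a collection; not a real driver API."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []
        self._next_id = 1

    @staticmethod
    def _matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
        # Only exact-match equality on top-level fields is supported.
        return all(doc.get(k) == v for k, v in flt.items())

    def insert_one(self, doc: dict[str, Any]) -> Any:
        # Create: store a copy and assign an _id if the caller did not supply one.
        doc = copy.deepcopy(doc)
        if "_id" not in doc:
            doc["_id"] = self._next_id
            self._next_id += 1
        self._docs.append(doc)
        return doc["_id"]

    def find(self, flt: dict[str, Any], projection: list[str] | None = None) -> list[dict[str, Any]]:
        # Read: filter, then optionally keep only the projected fields (plus _id).
        hits = [d for d in self._docs if self._matches(d, flt)]
        if projection is None:
            return copy.deepcopy(hits)
        keep = set(projection) | {"_id"}
        return [{k: v for k, v in d.items() if k in keep} for d in hits]

    def update_one(self, flt: dict[str, Any], update: dict[str, Any]) -> int:
        # Update: apply {"$set": {...}} to the first matching document.
        for d in self._docs:
            if self._matches(d, flt):
                d.update(update.get("$set", {}))
                return 1
        return 0

    def delete_one(self, flt: dict[str, Any]) -> int:
        # Delete: remove the first matching document.
        for i, d in enumerate(self._docs):
            if self._matches(d, flt):
                del self._docs[i]
                return 1
        return 0


books = ToyCollection()
books.insert_one({"title": "Dune", "stock": 3})
books.insert_one({"title": "Neuromancer", "stock": 1})
books.update_one({"title": "Dune"}, {"$set": {"stock": 2}})
books.delete_one({"title": "Neuromancer"})
print(books.find({}, projection=["title"]))  # [{'title': 'Dune', '_id': 1}]
```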

mongo Shell / Compass / Drivers

  • mongo shell: a JavaScript-based command-line interface for interacting with mongod. The legacy mongo shell was removed in MongoDB 6.0; its replacement is mongosh.
  • Compass: a GUI for querying and analyzing documents.
  • Drivers: language-specific libraries (Node.js, Python, Java, etc.) that let applications connect to MongoDB and perform operations programmatically.

mongos

  • mongos is the query router used in sharded MongoDB clusters.
  • It's not a database itself; it acts as an intermediary between clients and the shards.
  1. Query Routing:
    • Directs client requests to the correct shard(s) based on the shard key.
    • Hides the complexity of the sharded architecture from the client.
  2. Load Balancing:
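    • Spreads queries across shards, so the read and write load is shared by several machines. Balancing the data itself (migrating chunks between shards) is the job of the balancer, which runs on the primary of the config server replica set.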
  3. Aggregation of Results:
    • If a query spans multiple shards, mongos collects and merges the results before sending them back to the client.
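The routing rules above can be sketched with a toy, range-based chunk map. Everything here is a simplified assumption: the shard key user_id, the split points, and the shard contents are hypothetical, and real clusters also support hashed sharding and far richer queries. The sketch shows the two paths mongos can take: a targeted query when the filter contains the shard key, and a scatter-gather query (ask every shard, then merge) when it does not.

```python
from __future__ import annotations

import bisect

# Hypothetical chunk map, like the metadata mongos caches from the config servers.
# Chunk i covers [CHUNK_LOWER_BOUNDS[i], CHUNK_LOWER_BOUNDS[i + 1]) of the shard key.
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 5000]  # -inf plays the role of MinKey
CHUNK_OWNERS = ["shard1", "shard2", "shard3"]

# Hypothetical documents held by each shard.
SHARD_DATA = {
    "shard1": [{"user_id": 7, "country": "NO"}],
    "shard2": [{"user_id": 1200, "country": "NO"}, {"user_id": 4999, "country": "BR"}],
    "shard3": [{"user_id": 9000, "country": "NO"}],
}


def shard_for(user_id: int) -> str:
    """Targeted routing: find the chunk whose range contains the shard key value."""
    idx = bisect.bisect_right(CHUNK_LOWER_BOUNDS, user_id) - 1
    return CHUNK_OWNERS[idx]


def route(query: dict) -> tuple[list[str], list[dict]]:
    """Return (shards contacted, merged results) for an equality-only query."""
    if "user_id" in query:
        targets = [shard_for(query["user_id"])]  # shard key present: one shard
    else:
        targets = list(SHARD_DATA)               # no shard key: scatter-gather
    merged: list[dict] = []
    for shard in targets:
        merged += [d for d in SHARD_DATA[shard]
                   if all(d.get(k) == v for k, v in query.items())]
    return targets, merged


print(route({"user_id": 1200}))   # only shard2 is contacted
print(route({"country": "NO"}))   # all three shards are contacted, results merged
```

This is why the choice of shard key matters: queries that include it touch one shard, while queries that omit it fan out to the whole cluster.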

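Sharded Cluster Architecture:

Client
  |
  v
[mongos]  ← Query router (caches cluster metadata read from the config servers)
  |
  v
[Shards]: shard1, shard2, shard3... (each is a replica set)

[Config Servers]: a replica set that stores metadata about the cluster (e.g., which data is on which shard); mongos reads it to decide where to send each query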

4. Storage Engine in MongoDB

A Storage Engine is the low-level component in MongoDB that manages how data is stored, updated, and retrieved from the disk.

Think of it as the “brain” behind data storage: it decides how data is laid out on your hard drive or SSD, how it is cached in memory, and how writes are made durable.

MongoDB Storage Engines

1. WiredTiger (Default Engine)

  • Default since MongoDB 3.2.
  • Modern and high-performance.
Feature | Description
🔄 Document-level concurrency | Multiple documents in the same collection can be read and written at the same time, which suits high-load applications.
🗜️ Compression | Reduces disk usage with Snappy (the default; fast) or zlib (better compression); zstd is also available from MongoDB 4.2.
🧠 Caching | Frequently used data is kept in memory for faster access.
🧾 Journaling | A write-ahead log lets MongoDB recover recent writes after a crash.
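WiredTiger compresses data blocks before writing them to disk. The sketch below does not reproduce WiredTiger's on-disk format; it only runs Python's standard zlib module over a batch of repetitive, JSON-encoded documents (a stand-in for BSON, with a hypothetical sensor schema) to show why collection data tends to compress well. Snappy is not in Python's standard library, so only zlib is shown.

```python
import json
import zlib

# Hypothetical batch of similar documents; repeated field names and values
# are typical of a collection and are what makes compression effective.
docs = [{"sensor": "temp-01", "unit": "celsius", "reading": 20 + i % 5} for i in range(1000)]
raw = json.dumps(docs).encode("utf-8")

for level in (1, 6, 9):  # zlib levels: 1 = fastest, 9 = smallest output
    packed = zlib.compress(raw, level)
    assert zlib.decompress(packed) == raw  # compression is lossless
    print(f"level {level}: {len(raw)} -> {len(packed)} bytes "
          f"({len(packed) / len(raw):.1%} of original)")
```

The same trade-off shows up in WiredTiger's options: a faster compressor saves CPU, a stronger one saves disk space.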

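2. MMAPv1 (Deprecated and Removed)

  • The default engine before MongoDB 3.2.
  • Deprecated in MongoDB 4.0 and removed in MongoDB 4.2.
  • Used collection-level locking, so concurrent writes to the same collection had to wait for each other.
  • Not available for new projects on current MongoDB versions.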

5. Replication (High Availability)

Replica Set

A Replica Set is a group of mongod instances that maintain the same data set, ensuring high availability and data redundancy.

Key Benefits:

  • Automatic failover
  • Data redundancy
  • Read scalability (with read preferences)

Roles in a Replica Set:

  • Primary:
    • Handles all write operations.
    • Also serves read operations unless the read preference is changed.
    • Only one primary exists at a time.
  • Secondary:
    • Continuously replicates the primary's oplog (operation log) and applies those operations to its own copy of the data.
    • Can be configured to serve read operations (based on read preference).
    • Can be elected primary if the current primary fails (unless its priority is set to 0).
  • Arbiter:
    • Does not store data.
    • Votes in elections but can never become primary, which helps a set reach a majority (and break ties).
    • Useful when you need an odd number of voting members but don’t want the overhead of another full, data-bearing member.
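An election succeeds only when a strict majority of the voting members agrees. The worked arithmetic below uses the usual majority formula with example member counts to show why an arbiter lets a set with two data-bearing members survive the loss of its primary:

```latex
\begin{aligned}
\text{majority}(n) &= \left\lfloor \tfrac{n}{2} \right\rfloor + 1 \\
n = 2\ (\text{primary} + \text{secondary}) &:\ \text{majority} = 2;\ \text{one member down leaves } 1 < 2 \text{ votes, so no primary} \\
n = 3\ (\text{primary} + \text{secondary} + \text{arbiter}) &:\ \text{majority} = 2;\ \text{primary down leaves } 2 \ge 2 \text{ votes, so the secondary can win}
\end{aligned}
```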

Failover Process:

If the primary node goes down:

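  • Once the primary has been unreachable for longer than the election timeout (electionTimeoutMillis, 10 seconds by default), an eligible secondary calls an election.
  • The members vote, and the secondary that wins a majority of votes becomes the new primary.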
  • Client drivers will detect the change and reroute operations accordingly.
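The failover steps above can be sketched as a toy election. This is a deliberate simplification, not MongoDB's actual protocol (which uses terms, vote requests, and priority takeover): here every member has one vote, an election needs a majority of members to be up, and among the electable survivors the one with the most recent oplog position wins, with priority as a tie-breaker. The Member fields and node names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Member:
    name: str
    up: bool = True
    arbiter: bool = False   # arbiters vote but can never become primary
    priority: float = 1.0   # priority 0 members can never become primary
    optime: int = 0         # simplified position of the last applied oplog entry


def elect(members: list[Member]) -> str | None:
    """Toy election: requires a majority of votes; prefers the freshest data."""
    votes_available = sum(m.up for m in members)  # one vote per reachable member
    majority = len(members) // 2 + 1
    if votes_available < majority:
        return None                               # no majority, so no new primary
    candidates = [m for m in members if m.up and not m.arbiter and m.priority > 0]
    if not candidates:
        return None
    # Most recent optime first, then higher priority.
    return max(candidates, key=lambda m: (m.optime, m.priority)).name


rs = [
    Member("node-a", optime=100),               # current primary
    Member("node-b", optime=100, priority=2),
    Member("node-c", optime=95),
]
rs[0].up = False                                # the primary goes down
print(elect(rs))                                # node-b
```

Preferring the most up-to-date member mirrors the real rule that voters refuse candidates whose data is behind their own, which keeps acknowledged writes from being lost.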

6. Indexing in MongoDB

Indexes in MongoDB are like the index in a book: they help MongoDB find data faster without reading everything.

Without an index, MongoDB has to scan every document in a collection (called a collection scan, shown as COLLSCAN in explain output), which is slow for large datasets.
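The difference can be illustrated by comparing a linear scan with a binary search over a sorted list of keys. This is an analogy under stated assumptions: MongoDB indexes are B-trees, and the sorted (key, position) list here is a hypothetical stand-in that gives the same O(log n) lookup idea; the 100,000 documents and the email field are invented for the example.

```python
from __future__ import annotations

import bisect

# Hypothetical collection of 100,000 documents.
docs = [{"_id": i, "email": f"user{i:06d}@example.com"} for i in range(100_000)]


def collection_scan(email: str) -> tuple[dict | None, int]:
    """Check documents one by one; return (match, number of documents examined)."""
    for examined, doc in enumerate(docs, start=1):
        if doc["email"] == email:
            return doc, examined
    return None, len(docs)


# Toy single-field index: sorted (key, document position) pairs.
index = sorted((doc["email"], pos) for pos, doc in enumerate(docs))
index_keys = [key for key, _ in index]


def index_lookup(email: str) -> dict | None:
    """Binary-search the sorted keys, then jump straight to the document."""
    i = bisect.bisect_left(index_keys, email)
    if i < len(index_keys) and index_keys[i] == email:
        return docs[index[i][1]]
    return None


target = "user099999@example.com"
doc, examined = collection_scan(target)
print(examined)              # 100000: every document was examined
print(index_lookup(target))  # found with about 17 comparisons (log2 of 100,000)
```

The price of an index is extra storage and extra work on every write, since the index must be updated along with the documents.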
