🧠 Simple Definition (Word-for-word)

Embed when: data is accessed together always, child data has no independent existence, limited number of children (avoid unbounded arrays).

⚡ Super Simple Line

Reference when: data is accessed independently, many-to-many relationships, data is shared across documents, child array could grow large (>100 items).

⚡ Key Details & Explanation

Embed when: data is accessed together always, child data has no independent existence, limited number of children (avoid unbounded arrays). Reference when: data is accessed independently, many-to-many relationships, data is shared across documents, child array could grow large (>100 items). Rule of thumb: embed for one-to-few, reference for one-to-many or many-to-many. MongoDB has a 16MB document size limit. A practical answer should also mention how I would verify the choice, for example by checking query plans, measuring response time, or testing behavior under concurrent writes.

⚡ One-line Interview Answer

Embed when: data is accessed together always, child data has no independent existence, limited number of children (avoid unbounded arrays).

🧠 Simple Definition (Word-for-word)

The aggregation pipeline is a sequence of stages that transform documents.

⚡ Super Simple Line

Key stages: $match (filter, like WHERE), $group (aggregate, like GROUP BY), $project (reshape documents), $lookup (join), $sort, $limit, $skip, $unwind (flatten arrays), $addFields, $facet (multiple pipelines in parallel).

⚡ Key Details & Explanation

The aggregation pipeline is a sequence of stages that transform documents; each stage receives the output of the previous one. Key stages: $match (filter, like WHERE), $group (aggregate, like GROUP BY), $project (reshape documents), $lookup (join), $sort, $limit, $skip, $unwind (flatten arrays), $addFields, $facet (multiple pipelines in parallel). Example: [{$match:{status:'active'}},{$group:{_id:'$userId',total:{$sum:'$amount'}}}] keeps only active documents, then sums amount per userId. Put $match as early as possible so it can use an index and later stages process fewer documents.
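
To make the example pipeline concrete, here is a rough plain-JavaScript sketch of what those two stages compute on an in-memory array. The sample orders and the helper names matchStage and groupSumStage are made up for illustration; MongoDB runs the real stages server-side.

```javascript
// Hypothetical sample data standing in for a collection of orders.
const orders = [
  { userId: "u1", status: "active", amount: 30 },
  { userId: "u2", status: "active", amount: 15 },
  { userId: "u1", status: "active", amount: 20 },
  { userId: "u2", status: "cancelled", amount: 99 },
];

// Rough equivalent of { $match: { status: 'active' } }: keep matching documents.
function matchStage(docs, field, value) {
  return docs.filter((doc) => doc[field] === value);
}

// Rough equivalent of { $group: { _id: '$userId', total: { $sum: '$amount' } } }.
function groupSumStage(docs, keyField, sumField) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

// Each stage consumes the output of the previous one, like a pipeline.
const result = groupSumStage(matchStage(orders, "status", "active"), "userId", "amount");
console.log(result); // u1 totals 50, u2 totals 15
```

The cancelled order never reaches the grouping step, which is why filtering first keeps later stages cheap.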

⚡ One-line Interview Answer

The aggregation pipeline is a sequence of stages that transform documents.

🧠 Simple Definition (Word-for-word)

A Database Index is a data structure (typically a B-Tree or Hash) that improves the speed of data retrieval operations on a table at the cost of additional write time and storage space. MongoDB supports various index types (single field, compound, multikey, text, geospatial). A compound index indexes multiple fields together to support queries on those fields, while a text index tokenizes and stems strings to support search operations.

⚡ Super Simple Line

Compound Index = one index over several fields in a fixed order, used for filtering and sorting (e.g., { lastName: 1, age: -1 }).
Text Index = parses words for keyword search (e.g., { description: "text" }).

🧪 Code Example: Compound vs Text in MongoDB

// 1. Creating a Compound Index
db.users.createIndex({ lastName: 1, age: -1 });
// Supports queries on: { lastName } and { lastName, age }

// 2. Creating a Text Index (only one per collection)
db.articles.createIndex({ content: "text", title: "text" });
// Query with search keywords:
db.articles.find({ $text: { $search: "database index performance" } });

🚨 The ESR Rule for Compound Indexes (Interview Gold)

When creating a compound index, structure the keys in this order:

Equality: Fields tested for exact matches (e.g., status: "active").
Sort: Fields used to order the results (e.g., createdAt: -1).
Range: Fields tested for ranges (e.g., age: { $gt: 21 }).
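
As a small illustration of the ESR ordering above, the sketch below assembles an index key document from the three groups of fields. buildEsrIndexKeys is a hypothetical helper written for this note, not a MongoDB API; the field names come from the examples in the list.

```javascript
// Hypothetical helper: orders index keys as Equality, then Sort, then Range.
function buildEsrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) keys[field] = direction;
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1; // skip fields already placed earlier
  }
  return keys;
}

// Query shape: status = "active", age > 21, newest first.
const indexKeys = buildEsrIndexKeys({
  equality: ["status"],
  sort: { createdAt: -1 },
  range: ["age"],
});
console.log(indexKeys); // { status: 1, createdAt: -1, age: 1 }
```

The resulting object has the shape that createIndex expects, so in mongosh it could be passed to something like db.users.createIndex(indexKeys).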

⚡ One-line Interview Answer

Compound indexes sort multiple specific fields sequentially to speed up filter-sort queries, while text indexes tokenize string fields to facilitate full-text search queries.

🧠 Simple Definition (Word-for-word)

WiredTiger is MongoDB's default storage engine since 3.2.

⚡ Super Simple Line

It uses document-level concurrency control with MVCC (Multi-Version Concurrency Control): readers see a consistent snapshot of data at the start of their operation; writers don't block readers.

⚡ Key Details & Explanation

WiredTiger is MongoDB's default storage engine since 3.2. It uses document-level concurrency control with MVCC (Multi-Version Concurrency Control): readers see a consistent snapshot of data at the start of their operation; writers don't block readers. Each document has a history of versions; readers get the version current at their read timestamp. This is different from collection-level locking in older MMAPv1.

⚡ One-line Interview Answer

WiredTiger is MongoDB's default storage engine since 3.2.

🧠 Simple Definition (Word-for-word)

MongoDB supports multi-document ACID transactions across replica sets and sharded clusters. In MongoDB, write operations are atomic at the single-document level by default. When an application requires updating multiple documents across collections reliably, it uses session-based transactions to guarantee Atomicity, Consistency, Isolation, and Durability.

⚡ Super Simple Line

MongoDB is ACID-compliant: operations are atomic for single documents by default, and multi-document transactions can be run using Sessions.

🧪 Transaction Code Example (Node.js)

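const { MongoClient } = require("mongodb"); // MongoClient is a named export
const client = new MongoClient("mongodb://localhost:27017");
const db = client.db(); // default database from the connection string
// Note: transactions need a replica set or sharded cluster, not a standalone server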

async function transferFunds(fromUserId, toUserId, amount) {
  const session = client.startSession();
  
  try {
    session.startTransaction();
    
    // Perform operations passing the session
    await db.collection("accounts").updateOne(
      { userId: fromUserId },
      { $inc: { balance: -amount } },
      { session }
    );
    
    await db.collection("accounts").updateOne(
      { userId: toUserId },
      { $inc: { balance: amount } },
      { session }
    );
    
    await session.commitTransaction(); // Commit changes to DB
    console.log("Transaction committed successfully.");
  } catch (error) {
    await session.abortTransaction(); // Rollback if error
    console.error("Transaction aborted due to error:", error);
  } finally {
    await session.endSession(); // endSession returns a promise
  }
}

⚠️ When to use Transactions in MongoDB

Avoid using transactions as a crutch for poor schema design. As a rule of thumb, the large majority of MongoDB writes (the often-quoted figure is 80-90%) should rely on embedding related data in one document, so that single-document atomicity is enough.
Only use transactions when updating multiple independent collections (e.g., billing ledgers, inventory vs user balances) where data consistency is absolutely critical.

⚡ One-line Interview Answer

MongoDB guarantees ACID properties by default at the single-document level and supports multi-document transactions using sessions for multi-collection write consistency.

🧠 Simple Definition (Word-for-word)

An SQL JOIN is an operation used to combine rows from two or more tables based on a related column between them. An INNER JOIN returns rows that have matching values in both tables. A LEFT JOIN returns all rows from the left table, and the matched rows from the right table (filling unmatched right columns with NULL). A FULL OUTER JOIN returns all rows from both tables: matched rows are combined, and rows without a match on either side are kept, with the other side's columns filled with NULL.

⚡ Super Simple Line

INNER = only matching rows in both tables.
LEFT = all left rows + matched right rows (NULL if no match).
FULL = all rows from both tables, padded with NULL for missing matches.

🧪 SQL Code Example

-- 1. INNER JOIN
SELECT orders.id, customers.name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;

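-- 2. LEFT JOIN
SELECT customers.name, orders.id
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id;

-- 3. FULL OUTER JOIN (supported by PostgreSQL and SQL Server; MySQL has no FULL OUTER JOIN)
SELECT customers.name, orders.id
FROM customers
FULL OUTER JOIN orders ON customers.id = orders.customer_id;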

📊 Visual Comparison (Venn Summary)

Join Type	Left Table Rows	Right Table Rows	Unmatched Values
INNER JOIN	Only matching rows	Only matching rows	Discarded
LEFT JOIN	All rows	Only matching rows	Filled with `NULL` on right
FULL OUTER JOIN	All rows	All rows	Filled with `NULL` on both sides

⚡ One-line Interview Answer

INNER JOIN returns only intersecting records between tables, LEFT JOIN returns all left-side records with matching right-side records, and FULL JOIN combines all rows from both tables, filling mismatches with NULL.

🧠 Simple Definition (Word-for-word)

Window functions perform calculations across a set of rows related to the current row without collapsing them (unlike GROUP BY).

⚡ Super Simple Line

Syntax: FUNCTION() OVER (PARTITION BY col ORDER BY col).

⚡ Key Details & Explanation

Window functions perform calculations across a set of rows related to the current row without collapsing them (unlike GROUP BY). Syntax: FUNCTION() OVER (PARTITION BY col ORDER BY col). Example: SELECT name, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank FROM employees.
ROW_NUMBER(): unique sequential number within the partition, even for ties.
RANK(): ties get the same rank, with gaps after (1, 2, 2, 4).
DENSE_RANK(): ties get the same rank, no gaps (1, 2, 2, 3).
LAG/LEAD: access a previous/following row in the partition's order.
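
The difference between the three ranking functions is easiest to see on a list with a tie. The following is a minimal sketch that mimics their results for one partition in plain JavaScript; the salary values and the rankRows helper are invented for illustration.

```javascript
// Hypothetical rows for one department, already sorted by salary DESC.
const salaries = [900, 800, 800, 700];

// Returns the three ranking values per row for a sorted list of values.
function rankRows(sortedValues) {
  let rank = 0;
  let denseRank = 0;
  return sortedValues.map((value, index) => {
    const isTie = index > 0 && value === sortedValues[index - 1];
    if (!isTie) {
      rank = index + 1; // RANK jumps to the row position, leaving gaps after ties
      denseRank += 1;   // DENSE_RANK advances by one for each new value
    }
    return { value, rowNumber: index + 1, rank, denseRank };
  });
}

const ranked = rankRows(salaries);
console.log(ranked.map((r) => r.rank));      // [ 1, 2, 2, 4 ]
console.log(ranked.map((r) => r.denseRank)); // [ 1, 2, 2, 3 ]
```

Every row is still present in the output, which is the key contrast with GROUP BY.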

⚡ One-line Interview Answer

Window functions perform calculations across a set of rows related to the current row without collapsing them (unlike GROUP BY).

🧠 Simple Definition (Word-for-word)

CTE (Common Table Expression): WITH cte_name AS (SELECT ...) SELECT * FROM cte_name.

⚡ Super Simple Line

Benefits: readable and reusable within the query, can reference the CTE multiple times, recursive CTEs for hierarchical data.

⚡ Key Details & Explanation

CTE (Common Table Expression): WITH cte_name AS (SELECT ...) SELECT * FROM cte_name. Benefits: readable and reusable within the query, can reference the CTE multiple times, recursive CTEs for hierarchical data. Use over subquery when: the same subquery would be repeated, for recursive queries (org charts, category trees), when readability matters. Performance: since PostgreSQL 12, a non-recursive, side-effect-free CTE that is referenced only once is inlined into the outer query and planned like a subquery; earlier versions always materialized CTEs, which made them an optimization fence. The MATERIALIZED and NOT MATERIALIZED keywords override the default.

⚡ One-line Interview Answer

CTE (Common Table Expression): WITH cte_name AS (SELECT ...) SELECT * FROM cte_name.

🧠 Simple Definition (Word-for-word)

A Database Transaction is a sequence of operations performed as a single logical unit of work. Transactions must satisfy ACID properties: Atomicity (all operations succeed or all rollback), Consistency (data transitions between valid states), Isolation (concurrent transactions do not interfere), and Durability (committed updates survive crashes). PostgreSQL implements Isolation using MVCC (Multi-Version Concurrency Control) and Durability using a Write-Ahead Log (WAL).

⚡ Super Simple Line

ACID = Atomicity (all-or-nothing), Consistency (rules kept), Isolation (separated runs), Durability (saved permanently).

🛠️ PostgreSQL Under the Hood (MVCC & WAL)

Isolation (MVCC): Plain reads in PostgreSQL take no row locks. When a row is updated, PostgreSQL writes a new version of the row (a tuple) next to the old one instead of overwriting it. Readers keep seeing the version visible to their snapshot, so readers and writers do not block each other; VACUUM later removes versions that no transaction can still see.
Durability (WAL): Every change is first recorded in the Write-Ahead Log (WAL), and at commit the WAL records are flushed to disk before the transaction is reported as committed. The modified table pages can be written later, because after a power loss or crash PostgreSQL replays the WAL to restore committed changes.

⚡ One-line Interview Answer

PostgreSQL guarantees ACID properties using Write-Ahead Logging (WAL) for crash recovery durability and Multi-Version Concurrency Control (MVCC) to support concurrent isolated reads and writes.

🧠 Simple Definition (Word-for-word)

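Pessimistic locking: lock the row when reading to prevent others from modifying it, using SELECT ... FOR UPDATE.

⚡ Super Simple Line

Optimistic locking: don't lock; keep a version field and check on update that it hasn't changed.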

⚡ Key Details & Explanation

Pessimistic locking: lock the row when reading to prevent others from modifying it, using SELECT ... FOR UPDATE. The lock is held until the transaction commits or rolls back. Use when conflicts are frequent and expensive. Optimistic locking: don't lock; instead, include a version field. On update, check the version hasn't changed and increment it: UPDATE ... SET version = version + 1 WHERE id=? AND version=?. If 0 rows are affected, someone else updated first, so re-read and retry. Use when conflicts are rare: concurrency is better, but the application has to handle conflicts itself.
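
The optimistic version check can be sketched without a database. In this minimal illustration an in-memory Map stands in for the table and updateIfVersionMatches is a hypothetical function that plays the role of the conditional UPDATE, returning the number of rows affected.

```javascript
// In-memory stand-in for a table; a real system would run
// UPDATE ... WHERE id = ? AND version = ? against the database.
const rows = new Map([[1, { id: 1, balance: 100, version: 1 }]]);

// Returns the number of "rows affected": 1 on success, 0 on a version conflict.
function updateIfVersionMatches(id, expectedVersion, changes) {
  const row = rows.get(id);
  if (!row || row.version !== expectedVersion) return 0;
  rows.set(id, { ...row, ...changes, version: row.version + 1 });
  return 1;
}

// Two clients read the same row at version 1.
const readByA = { ...rows.get(1) };
const readByB = { ...rows.get(1) };

const affectedA = updateIfVersionMatches(1, readByA.version, { balance: readByA.balance - 10 });
const affectedB = updateIfVersionMatches(1, readByB.version, { balance: readByB.balance - 30 });

console.log(affectedA, affectedB); // 1 0
// Client B must re-read the row (now version 2) and retry its update.
```

Without the version check, client B would have overwritten A's change with a balance computed from stale data (a lost update).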

⚡ One-line Interview Answer

Pessimistic locking: lock the row when reading to prevent others from modifying it, using SELECT ... FOR UPDATE.

🧠 Simple Definition (Word-for-word)

EXPLAIN ANALYZE executes the query and shows actual runtime statistics.

⚡ Super Simple Line

Look for: Seq Scan on large tables (should be Index Scan for selective queries), high actual rows vs estimated rows mismatch (stale statistics — run ANALYZE), nested loop joins on large result sets (consider hash join), high cost nodes, sort operations without an index.

⚡ Key Details & Explanation

EXPLAIN ANALYZE executes the query and shows actual runtime statistics. Because the statement really runs, wrap INSERT/UPDATE/DELETE in BEGIN ... ROLLBACK when analyzing them. Look for: Seq Scan on large tables (should be Index Scan for selective queries), a large mismatch between estimated and actual rows (stale statistics; run ANALYZE), nested loop joins on large result sets (a hash join is usually cheaper there), high-cost nodes, and sort operations without a supporting index. Fix by adding indexes on filter/join/sort columns and running VACUUM ANALYZE.

⚡ One-line Interview Answer

EXPLAIN ANALYZE executes the query and shows actual runtime statistics.

🧠 Simple Definition (Word-for-word)

Database Normalization is the process of organizing relational database schemas to minimize data redundancy and dependency anomalies, typically structured through Normal Forms (1NF, 2NF, 3NF). Denormalization is the intentional optimization of adding redundant data or grouping tables to speed up read query execution times by avoiding complex JOIN operations.

⚡ Super Simple Line

Normalization = split tables to eliminate redundancy (saves space, ensures consistency).
Denormalization = merge tables to speed up read queries (speeds up read-heavy dashboards).

📋 The Three Normal Forms (1NF, 2NF, 3NF)

First Normal Form (1NF): All table columns must contain atomic (indivisible) values, and there must be no repeating groups.
Second Normal Form (2NF): Must be in 1NF, and all non-key columns must be fully functionally dependent on the entire primary key (removes partial dependencies).
Third Normal Form (3NF): Must be in 2NF, and no non-key columns can depend on other non-key columns (removes transitive dependencies).

🔥 When to Denormalize Intentionally

Read-Heavy Reporting: In analytical databases or dashboards where generating reports requires joining 8+ tables, denormalizing speeds up execution time dramatically.
Caching: Storing calculated totals (e.g., total_orders on a Customer table) to avoid running aggregate queries on millions of rows frequently.

⚡ One-line Interview Answer

Normalization organizes a schema to prevent data anomalies and redundancy, while denormalization intentionally introduces redundancy to speed up reads in read-heavy applications.

🧠 Simple Definition (Word-for-word)

Redis is an in-memory data store.

⚡ Super Simple Line

Structures: String (simple cache, counters — INCR, SET/GET), List (queues — LPUSH/RPOP), Hash (object storage — HSET/HGET), Set (unique collections — SADD/SMEMBERS), Sorted Set (leaderboards — ZADD with score, ZRANGE), Stream (event log — XADD/XREAD).

⚡ Key Details & Explanation

Redis is an in-memory data store. Structures: String (simple cache, counters: INCR, SET/GET), List (queues: LPUSH/RPOP), Hash (object storage: HSET/HGET), Set (unique collections: SADD/SMEMBERS), Sorted Set (leaderboards: ZADD with score, ZRANGE), Stream (event log: XADD/XREAD). Strings for caching JSON. Sorted Sets for rate limiting and leaderboards. Lists for job queues. Pub/Sub for real-time messaging.

⚡ One-line Interview Answer

Redis is an in-memory data store.

🧠 Simple Definition (Word-for-word)

Cache-aside (lazy loading): app checks cache first, on miss fetches from DB and populates cache.

⚡ Super Simple Line

Cache-aside is the most common caching pattern.

⚡ Key Details & Explanation

Cache-aside (lazy loading): the app checks the cache first; on a miss it fetches from the DB and populates the cache. Most common.
Write-through: every write goes to the cache and the DB in the same operation, so the cache is always current but write latency increases.
Write-behind (write-back): write to the cache immediately and sync to the DB asynchronously; faster writes, but data is lost if the cache fails before the sync.
TTL: set an expiry time on cached data; simplest, but stale data is possible until the entry expires.
Event-driven invalidation: invalidate specific keys when the underlying data changes.
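
A minimal sketch of cache-aside with a TTL and explicit invalidation follows. It assumes an in-process Map as the cache and a plain function as the database; getUser, loadUserFromDb, and invalidateUser are hypothetical names, and a real deployment would typically use Redis and asynchronous calls instead.

```javascript
// Hypothetical stand-ins: a Map as the cache and a function as the database.
const cache = new Map();
let dbReads = 0;

function loadUserFromDb(id) {
  dbReads += 1;
  return { id, name: `user-${id}` };
}

// Cache-aside read: check the cache, fall back to the DB on a miss, then populate.
function getUser(id, ttlMs = 60000, now = Date.now()) {
  const entry = cache.get(id);
  if (entry && entry.expiresAt > now) return entry.value; // hit
  const value = loadUserFromDb(id);                        // miss or expired
  cache.set(id, { value, expiresAt: now + ttlMs });
  return value;
}

// Event-driven invalidation: drop the key when the underlying data changes.
function invalidateUser(id) {
  cache.delete(id);
}

getUser(1);           // miss: reads the DB
getUser(1);           // hit: served from the cache
invalidateUser(1);
getUser(1);           // miss again after invalidation
console.log(dbReads); // 2
```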

⚡ One-line Interview Answer

Cache-aside (lazy loading): app checks cache first, on miss fetches from DB and populates cache.

🧠 Simple Definition (Word-for-word)

Cache stampede (dogpile): cached value expires, many concurrent requests all miss the cache simultaneously and all query the DB at once — DB gets hammered.

⚡ Super Simple Line

Prevention: probabilistic early expiration (re-cache before expiry based on probability), mutex/lock (first request gets the lock, others wait for the cached result), background refresh (asynchronously refresh the cache before it expires), stale-while-revalidate pattern.

⚡ Key Details & Explanation

Cache stampede (dogpile): a cached value expires, many concurrent requests all miss the cache simultaneously, and all query the DB at once, so the DB gets hammered. Prevention: probabilistic early expiration (re-cache before expiry based on probability), mutex/lock (first request gets the lock, others wait for the cached result), background refresh (asynchronously refresh the cache before it expires), stale-while-revalidate pattern.
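
The mutex/lock idea can be sketched inside a single Node.js process by sharing one in-flight promise per key (sometimes called single-flight). This is an illustration under assumptions: slowDbLoad and getWithSingleFlight are hypothetical names, and across multiple servers a distributed lock would be needed instead of a local Map.

```javascript
// Hypothetical stand-ins: a Map as the cache and a slow async loader as the DB.
const cache = new Map();
const inFlight = new Map(); // key -> promise of the load currently in progress
let dbQueries = 0;

async function slowDbLoad(key) {
  dbQueries += 1;
  await new Promise((resolve) => setTimeout(resolve, 20));
  return `value-for-${key}`;
}

// Only the first caller on a miss queries the DB; concurrent callers share its promise.
async function getWithSingleFlight(key) {
  if (cache.has(key)) return cache.get(key);
  if (!inFlight.has(key)) {
    const load = slowDbLoad(key)
      .then((value) => {
        cache.set(key, value);
        return value;
      })
      .finally(() => inFlight.delete(key));
    inFlight.set(key, load);
  }
  return inFlight.get(key);
}

async function demo() {
  // 50 concurrent requests arrive right after the key expired.
  const requests = Array.from({ length: 50 }, () => getWithSingleFlight("product:1"));
  const values = await Promise.all(requests);
  return { distinctValues: new Set(values).size, dbQueries };
}

demo().then((summary) => console.log(summary)); // { distinctValues: 1, dbQueries: 1 }
```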

⚡ One-line Interview Answer

Cache stampede (dogpile): cached value expires, many concurrent requests all miss the cache simultaneously and all query the DB at once — DB gets hammered.

🧠 Simple Definition (Word-for-word)

ORM pros: type safety with Prisma, faster development, migrations as code, database-agnostic, prevents SQL injection by default.

⚡ Super Simple Line

ORM cons: can generate inefficient queries (N+1, unnecessary joins), abstraction leaks in complex queries, harder to use DB-specific features (window functions, CTEs, lateral joins).

⚡ Key Details & Explanation

ORM pros: type safety with Prisma, faster development, migrations as code, database-agnostic, prevents SQL injection by default. ORM cons: can generate inefficient queries (N+1, unnecessary joins), abstraction leaks in complex queries, harder to use DB-specific features (window functions, CTEs, lateral joins). Raw SQL pros: full control and optimization. Raw SQL cons: verbose, manual parameterization, no type safety without additional tooling. For complex reporting/analytics, use raw SQL. For CRUD, ORM is faster.

⚡ One-line Interview Answer

ORM pros: type safety with Prisma, faster development, migrations as code, database-agnostic, prevents SQL injection by default.

🧠 Simple Definition (Word-for-word)

Prisma Migrate generates SQL migration files from schema changes (prisma migrate dev).

⚡ Super Simple Line

Each migration is recorded in _prisma_migrations table.

⚡ Key Details & Explanation

Prisma Migrate generates SQL migration files from schema changes (prisma migrate dev). Each migration is recorded in the _prisma_migrations table. In production: prisma migrate deploy applies pending migrations in order. If a migration fails midway, it is marked as failed in the table and later deploys are blocked until it is resolved. Fix: either complete the change manually and mark the migration with prisma migrate resolve --applied, or undo the partial change, mark it with prisma migrate resolve --rolled-back, and deploy a corrected migration. Always back up before migrating production. Use a deployment pipeline that runs migrations before deploying new code.

⚡ One-line Interview Answer

Prisma Migrate generates SQL migration files from schema changes (prisma migrate dev).

🧠 Simple Definition (Word-for-word)

N+1 happens any time you first fetch a list of records, then run one extra query per record for related data.

⚡ Super Simple Line

Example: fetch 100 orders, then query the customer for each order separately, resulting in 101 queries.

⚡ Key Details & Explanation

N+1 happens any time you first fetch a list of records, then run one extra query per record for related data. Example: fetch 100 orders, then query the customer for each order separately, resulting in 101 queries. It causes latency and unnecessary database load. Fix it with joins, batching, eager loading, or preloading relations in your ORM. This problem is common in REST APIs and server-rendered pages too, not just GraphQL.
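
To show where the 101 queries come from and how batching removes them, here is a small sketch with an in-memory "database" that counts queries. All function names and data are hypothetical; findCustomersByIds plays the role of a single WHERE id IN (...) query.

```javascript
// Hypothetical in-memory "database" that counts how many queries it serves.
const customers = new Map([[1, { id: 1, name: "Ada" }], [2, { id: 2, name: "Lin" }]]);
const orders = Array.from({ length: 100 }, (_, i) => ({ id: i + 1, customerId: (i % 2) + 1 }));
let queryCount = 0;

function findOrders() { queryCount += 1; return orders; }
function findCustomerById(id) { queryCount += 1; return customers.get(id); }
function findCustomersByIds(ids) { // like WHERE id IN (...)
  queryCount += 1;
  return ids.map((id) => customers.get(id));
}

// N+1: one query for the list, then one query per order.
function loadOrdersNPlusOne() {
  return findOrders().map((order) => ({ ...order, customer: findCustomerById(order.customerId) }));
}

// Batched: one query for the list, one query for all distinct customers.
function loadOrdersBatched() {
  const list = findOrders();
  const ids = [...new Set(list.map((order) => order.customerId))];
  const byId = new Map(findCustomersByIds(ids).map((c) => [c.id, c]));
  return list.map((order) => ({ ...order, customer: byId.get(order.customerId) }));
}

queryCount = 0;
loadOrdersNPlusOne();
const nPlusOneQueries = queryCount; // 101

queryCount = 0;
loadOrdersBatched();
const batchedQueries = queryCount; // 2

console.log({ nPlusOneQueries, batchedQueries });
```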

⚡ One-line Interview Answer

N+1 happens any time you first fetch a list of records, then run one extra query per record for related data.

🧠 Simple Definition (Word-for-word)

Soft delete marks a record as deleted, usually with deletedAt or isDeleted, but keeps it in the database.

⚡ Super Simple Line

Use it when you need recovery, auditability, legal retention, or to preserve references from related records.

⚡ Key Details & Explanation

Soft delete marks a record as deleted, usually with deletedAt or isDeleted, but keeps it in the database. Use it when you need recovery, auditability, legal retention, or to preserve references from related records. Hard delete physically removes the row and is simpler when data truly should disappear. Tradeoff: soft delete adds query complexity because every read must filter deleted rows unless you centralize that logic.

⚡ One-line Interview Answer

Soft delete marks a record as deleted, usually with deletedAt or isDeleted, but keeps it in the database.

🧠 Simple Definition (Word-for-word)

A unique constraint lets the database enforce that no two rows share the same value or value combination, such as email or (userId, providerId).

⚡ Super Simple Line

This is critical because application-level pre-checks like 'SELECT then INSERT' still race under concurrency.

⚡ Key Details & Explanation

A unique constraint lets the database enforce that no two rows share the same value or value combination, such as email or (userId, providerId). This is critical because application-level pre-checks like 'SELECT then INSERT' still race under concurrency. The database is the final source of truth. Best practice: rely on unique constraints, catch the conflict error, and return a clean 409 Conflict or retry path from the API.

⚡ One-line Interview Answer

A unique constraint lets the database enforce that no two rows share the same value or value combination, such as email or (userId, providerId).

🧠 Simple Definition (Word-for-word)

A Deadlock is a concurrency conflict where two or more database transactions are unable to proceed because each is waiting for a lock held by the other. Databases resolve deadlocks by detecting the cycle and aborting one of the transactions. Deadlocks are prevented by locking resources in a consistent order, keeping transactions short, indexing foreign keys, and using lock timeout configurations with retry mechanisms.

⚡ Super Simple Line

Deadlock = Transaction A holds lock X and wants Y; Transaction B holds Y and wants X. Neither can proceed until the database aborts one of them.

🌳 Deadlock Flow Visualization

Transaction 1: locks Row A ────────────► tries to lock Row B (BLOCKED)
                                              ▲
                                              │
Transaction 2: tries to lock Row A ◄──────────┴─── locks Row B

🛠️ How to Prevent Deadlocks

Consistent Locking Order: Ensure all application code locks resources in the exact same sequence (e.g., always update accounts before users).
Keep Transactions Short: Avoid network requests or expensive operations inside transactions to minimize lock hold duration.
Lock Timeouts & Retries: Configure the database to abort transactions that wait too long for locks, and retry aborted transactions in backend code (a try/catch inside a bounded retry loop).
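
Two of the prevention techniques above can be sketched in a few lines: a consistent lock order and a bounded retry loop. This is an illustration, not production code: lockOrder and withDeadlockRetry are hypothetical helpers, the transaction is simulated, and the error code "40P01" is PostgreSQL's deadlock_detected SQLSTATE (other databases report deadlocks differently).

```javascript
// Consistent locking order: always lock the smaller id first, whatever the transfer direction.
function lockOrder(fromId, toId) {
  return [fromId, toId].sort((a, b) => a - b);
}

// Retry wrapper: re-run the transaction when the database reports a deadlock.
function withDeadlockRetry(runTransaction, maxAttempts = 3) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return runTransaction(attempt);
    } catch (error) {
      // Give up on other errors, or once the attempt budget is used up.
      if (error.code !== "40P01" || attempt >= maxAttempts) throw error;
    }
  }
}

// Simulated transaction that is chosen as the deadlock victim on its first attempt.
const outcome = withDeadlockRetry((attempt) => {
  if (attempt === 1) throw Object.assign(new Error("deadlock detected"), { code: "40P01" });
  return `committed on attempt ${attempt}`;
});

console.log(lockOrder(42, 7), lockOrder(7, 42)); // both [ 7, 42 ]
console.log(outcome); // committed on attempt 2
```

Real transaction code would be asynchronous and would usually wait briefly (with some jitter) between attempts so the competing transaction can finish.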

⚡ One-line Interview Answer

Deadlocks happen when concurrent transactions block each other in a circular lock dependency, and they are prevented by ordering lock acquisitions consistently and keeping transaction scopes small.

🧠 Simple Definition (Word-for-word)

A subquery is basically a query inside another query.

⚡ Super Simple Line

It’s used to get intermediate results that are then used by the main query.

⚡ Key Details & Explanation

A subquery is basically a query inside another query. It’s used to get intermediate results that are then used by the main query.

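For a non-correlated subquery like the one below, the inner query can be evaluated once and its result passed to the outer query. A correlated subquery references columns of the outer query and is conceptually evaluated once per outer row. Either way, subqueries help break complex problems into smaller steps.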

SELECT name
FROM employees
WHERE salary > (
  SELECT AVG(salary) FROM employees
);

⚡ One-line Interview Answer

A subquery is basically a query inside another query.

🧠 Simple Definition (Word-for-word)

A primary key uniquely identifies each record in a table and cannot be null.

⚡ Super Simple Line

A foreign key creates a relationship between two tables by referencing a primary key (or another unique key) in the other table.

⚡ Key Details & Explanation

A primary key uniquely identifies each record in a table and cannot be null. A table has at most one primary key, which may span several columns.
A foreign key creates a relationship between two tables by referencing a primary key (or another unique key) in the other table. The database rejects values that have no matching row in the referenced table.

⚡ One-line Interview Answer

A primary key uniquely identifies each record in a table and cannot be null.

🧠 Simple Definition (Word-for-word)

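A stored procedure is a named block of SQL (often with procedural logic) that is stored in the database and invoked by name.

⚡ Super Simple Line

It can improve performance and makes logic reusable.

⚡ Key Details & Explanation

A stored procedure is a named block of SQL (often with procedural logic) that is stored in the database and invoked by name. It can improve performance by running several statements in one round trip to the server, and many engines cache its parsed form or execution plan. It also makes logic reusable across applications.

⚡ One-line Interview Answer

A stored procedure is a named block of SQL stored in the database and invoked by name.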

🧠 Simple Definition (Word-for-word)

Network - If there is a network issue, such as high latency or low bandwidth, it can cause queues to build up as requests take longer to reach their destination and responses take longer to return.

⚡ Super Simple Line

Database - This is a common bottleneck, especially if the database is not optimized or if there are too many read/write operations happening simultaneously.

⚡ Key Details & Explanation

Network - If there is a network issue, such as high latency or low bandwidth, it can cause queues to build up as requests take longer to reach their destination and responses take longer to return.
Database - This is a common bottleneck, especially if the database is not optimized or if there are too many read/write operations happening simultaneously.
Application Server - If the application server is not able to handle the incoming requests efficiently, it can lead to a backlog of requests waiting to be processed.
External APIs - If the application relies on external APIs, any latency or downtime in those APIs can cause queues to build up in the application as it waits for responses.
Application Code - Inefficient code can lead to longer processing times, which can cause queues to build up as requests take longer to complete.

Root Cause of Queue Build-up

Inefficient or slow processing in the application code.
Insufficient / limited resources (CPU, memory) on the application server.
Serial Resource Access - If multiple requests are trying to access the same resource (e.g., a file, a database record) and the access is serialized, it can lead to queues building up as requests wait for their turn.
Database performance issues, such as slow queries or lack of indexing.
Network latency or bandwidth issues.
External API latency or downtime.

Note: When designing a system, try to avoid queue build-up, and in an existing system, find where queues are forming. Identifying potential bottlenecks and addressing them proactively improves overall performance and user experience.

⚡ One-line Interview Answer

Network - If there is a network issue, such as high latency or low bandwidth, it can cause queues to build up as requests take longer to reach their destination and responses take longer to return.

🧠 Simple Definition (Word-for-word)

DELETE removes rows and can be rolled back.

⚡ Super Simple Line

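TRUNCATE removes all rows quickly; whether it can be rolled back depends on the database.

⚡ Key Details & Explanation

DELETE removes rows one by one, optionally filtered by WHERE, and can be rolled back inside a transaction.
TRUNCATE removes all rows quickly without scanning them. It is transactional in PostgreSQL and SQL Server, but causes an implicit commit in MySQL and Oracle, so it cannot be rolled back there.
DROP deletes the entire table: its structure, data, and indexes.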

⚡ One-line Interview Answer

DELETE removes rows and can be rolled back.

🧠 Simple Definition (Word-for-word)

WHERE is used to filter rows before grouping, while HAVING is used to filter groups after aggregation.

⚡ Super Simple Line

So HAVING is used with GROUP BY.

⚡ Key Details & Explanation

WHERE is used to filter rows before grouping, while HAVING is used to filter groups after aggregation. So HAVING is used with GROUP BY.
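
The order of the two filters can be illustrated with a plain-JavaScript sketch that mirrors a query of the form SELECT dept, COUNT(*) FROM employees WHERE active = true GROUP BY dept HAVING COUNT(*) >= 2. The rows and column names are invented for this example.

```javascript
// Hypothetical rows standing in for an employees table.
const employees = [
  { dept: "eng", active: true },
  { dept: "eng", active: true },
  { dept: "ops", active: true },
  { dept: "ops", active: false },
];

// WHERE: filter individual rows before grouping.
const activeRows = employees.filter((row) => row.active);

// GROUP BY: count rows per department.
const counts = new Map();
for (const row of activeRows) counts.set(row.dept, (counts.get(row.dept) || 0) + 1);

// HAVING: filter the groups using the aggregated value.
const bigDepts = [...counts]
  .filter(([, headcount]) => headcount >= 2)
  .map(([dept, headcount]) => ({ dept, headcount }));

console.log(bigDepts); // [ { dept: 'eng', headcount: 2 } ]
```

Here "ops" has two rows in total, but only one survives the WHERE step, so its group fails the HAVING condition.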

⚡ One-line Interview Answer

WHERE is used to filter rows before grouping, while HAVING is used to filter groups after aggregation.

🧠 Simple Definition (Word-for-word)

Relational databases (SQL) store data in structured tables with fixed schemas and relationships, enforcing strict ACID compliance. Non-relational databases (NoSQL) store semi-structured data as documents, key-values, or graphs, prioritizing dynamic schemas and horizontal scaling.

⚡ Super Simple Line

Use SQL (PostgreSQL) when data consistency and complex relationships are critical; use NoSQL (MongoDB) when data shapes change rapidly and you need horizontal scaling.

📊 Comparison Table

Feature	Relational (SQL / PostgreSQL)	Document (NoSQL / MongoDB)
Data Model	Structured tables with rows and columns	Semi-structured JSON-like BSON documents
Schema	Strict, predefined, enforced at database level	Dynamic, flexible, schema-on-read
Scaling	Typically Vertical (scale up CPU/RAM)	Horizontal (sharding across multiple nodes)
Relationships	Rich JOIN support, foreign key constraints	Normalized references or denormalized embedding
ACID Safety	Full transactional consistency out of the box	Atomic single-document writes by default; multi-document ACID transactions since 4.0; tunable read/write concerns

⚡ One-line Interview Answer

Choose PostgreSQL for transactional systems with complex joins and strict integrity, and MongoDB for flexible schemas, high-throughput writes, and easy horizontal scaling.

🧠 Simple Definition (Word-for-word)

Redis is an in-memory database that stores data in RAM for sub-millisecond latency. To prevent data loss, it provides two persistence options: RDB (Redis Database), which takes point-in-time binary snapshots of the dataset, and AOF (Append Only File), which logs every write operation to disk as it is received.

⚡ Super Simple Line

RDB = backup snapshots taken at specific time intervals; AOF = continuous append-only log of every single database modification.

📊 Comparison Table

Feature	RDB (Redis Database Snapshots)	AOF (Append Only File)
Mechanism	Point-in-time snapshot of the dataset	Appends every write command to a log file
Durability	Lower (loses changes since last snapshot)	Higher (with the default everysec fsync policy, at most about 1s of writes is lost)
Performance	High (a forked child process writes the snapshot; the fork itself can pause briefly on large datasets)	Slight overhead (depends on the appendfsync policy: always, everysec, or no)
Recovery Speed	Very fast (directly loads binary dataset into memory)	Slower (re-executes all logged commands in order)
File Size	Compact single binary file	Larger log file (compacted by periodic AOF rewrite)

⚡ One-line Interview Answer

RDB provides fast recovery and minimal overhead through periodic snapshots, while AOF maximizes durability by logging every write command, and they are typically combined in production.

🧠 Simple Definition (Word-for-word)

Scaling a database increases its performance and storage capacity under load. Read Replication copies data from a single primary database to multiple read-only replicas to distribute read traffic. Sharding breaks database tables horizontally into smaller partitions (shards) across separate database servers to scale both reads and writes.

⚡ Super Simple Line

Replication = clone the entire database to scale read queries; Sharding = split tables into parts across nodes to scale both reads and writes.

📊 Comparison Table

Feature	Read Replication (Replicas)	Sharding (Horizontal Partitioning)
Scales What?	Read traffic capacity	Read and Write capacity, plus storage volume
Data Distribution	Every replica holds a full copy of the dataset	Each shard holds a subset of the dataset
Writes Target	Must target the primary write node only	Writes are routed across different shards based on shard key
Application Design	Requires separating read and write queries	Requires shard-key routing and careful handling of cross-shard queries
Implementation Cost	Low (native feature in PostgreSQL, MySQL, MongoDB)	High (significant operational and code complexity)

⚡ One-line Interview Answer

Read replication handles high read traffic by distributing queries across full copies of the data, while sharding partitions a database horizontally to scale write capacity and move past the storage limits of a single node.
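
The two routing decisions can be sketched as follows. The node names, the round-robin policy, and the modulo-hash placement are assumptions chosen for illustration; real systems usually delegate this to a driver, proxy, or the database's own router.

```javascript
// Hypothetical node names; a real deployment would hold connections, not strings.
const primary = "primary-db";
const replicas = ["replica-1", "replica-2"];
const shards = ["shard-0", "shard-1", "shard-2"];

// Replication: every write goes to the primary, reads rotate across replicas.
let nextReplica = 0;
function routeReplicated(isWrite) {
  if (isWrite) return primary;
  const node = replicas[nextReplica];
  nextReplica = (nextReplica + 1) % replicas.length;
  return node;
}

// FNV-1a-style 32-bit hash over the string's character codes.
// Used here only as a simple deterministic hash.
function hashKey(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Sharding: the shard key alone decides which node owns the row,
// so the same key routes to the same shard for both reads and writes.
function routeSharded(shardKey) {
  return shards[hashKey(String(shardKey)) % shards.length];
}

console.log(routeReplicated(true));   // write target
console.log(routeSharded("user:42")); // owner shard for this key
```

One caveat with plain modulo placement: changing the number of shards remaps most keys, which is why production systems often prefer consistent hashing or range-based chunks.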

🧠 Simple Definition (Word-for-word)

Zero-downtime database migrations update schemas without blocking database traffic or causing errors in active application servers. This is achieved by making all changes backward-compatible so that both the old and new versions of the application code can query the database simultaneously during the deployment.

⚡ Super Simple Line

Never rename or drop columns directly; instead, use the Expand-Contract pattern: add first, migrate data in background batches, and drop old columns only when stable.

🛠️ The Expand-Contract (Parallel Run) Pattern Steps

1. Expand Phase: Add the new column or table to the database. Make it nullable or assign a default value. Do not delete or rename anything.
2. Deploy Code (Dual Write): Deploy application updates that write to both the old and new columns, but continue reading from the old column.
3. Backfill Data: Run a background job to copy data from the old column to the new column in small, controlled batches so that no single statement holds locks for long.
4. Deploy Code (Switch Read): Deploy application updates that read and write exclusively using the new column.
5. Contract Phase: Safely drop the old column and clean up any database triggers or temporary tables.
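
Steps 2 and 3 above can be sketched with an in-memory array standing in for a table. The column names name (old) and display_name (new), the sample rows, and the helpers updateName and backfillBatch are hypothetical; in a real database each batch would be a short UPDATE in its own transaction.

```javascript
// Hypothetical table after the Expand phase: `display_name` was added as nullable.
const users = [
  { id: 1, name: "Ada", display_name: null },
  { id: 2, name: "Linus", display_name: null },
  { id: 3, name: "Grace", display_name: null },
  { id: 4, name: "Alan", display_name: null },
  { id: 5, name: "Edsger", display_name: null },
];

// Step 2 (dual write): new writes populate both columns; reads still use `name`.
function updateName(row, value) {
  row.name = value;
  row.display_name = value;
}

// Step 3 (backfill): copy old -> new for at most `batchSize` rows that are still empty.
// Returns how many rows were touched; 0 means the backfill is complete.
function backfillBatch(rows, batchSize) {
  const pending = rows.filter((r) => r.display_name === null).slice(0, batchSize);
  for (const row of pending) row.display_name = row.name;
  return pending.length;
}

updateName(users[0], "Ada L."); // live traffic during the migration

let batches = 0;
while (backfillBatch(users, 2) > 0) batches++;

const consistent = users.every((r) => r.display_name === r.name);
console.log({ batches, consistent });
```

Because the backfill skips rows that already have a value, it does not overwrite anything written by the dual-write path, and it can be stopped and resumed safely.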

⚡ One-line Interview Answer

Zero-downtime migrations rely on the Expand-Contract pattern to introduce schema changes in backward-compatible phases, allowing older and newer code deployments to safely run in parallel.

Databases

When would you embed a document vs reference it in MongoDB?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

How does the MongoDB aggregation pipeline work?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

What are database indexes? What indexes does MongoDB support, how do they improve performance, and when should you use (or avoid) compound vs text indexes?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

🧪 Code Example: Compound vs Text in MongoDB

🚨 The ESR Rule for Compound Indexes (Interview Gold)

⚡ One-line Interview Answer

What is the WiredTiger storage engine and MVCC?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

How do you handle transactions in MongoDB? Explain MongoDB's support for transactions and ACID properties.

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

🧪 Transaction Code Example (Node.js)

⚠️ When to use Transactions in MongoDB

⚡ One-line Interview Answer

What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN? Explain types of SQL JOINs.

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

🧪 SQL Code Example

📊 Visual Comparison (Venn Summary)

⚡ One-line Interview Answer

What are window functions? Example with RANK().

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

What is a CTE? When over a subquery?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

What is a database transaction? Explain ACID properties and how they are implemented (e.g., in PostgreSQL).

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

🛠️ PostgreSQL Under the Hood (MVCC & WAL)

⚡ One-line Interview Answer

Difference between optimistic and pessimistic locking?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

How do you use EXPLAIN ANALYZE? What should you look for?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

What is database normalization? When is denormalization intentional, and why is it important in database design?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

📋 The Three Normal Forms (1NF, 2NF, 3NF)

🔥 When to Denormalize Intentionally

⚡ One-line Interview Answer

What is Redis? What data structures does it support?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

What is cache invalidation? What strategies exist?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer

What is a cache stampede and how do you prevent it?

🧠 Simple Definition (Word-for-word)

⚡ Super Simple Line

⚡ Key Details & Explanation

⚡ One-line Interview Answer