🧠 Simple Definition (Word-for-word)
Embed when: data is always accessed together, child data has no independent existence, and the number of children is bounded (avoid unbounded arrays).
⚡ Super Simple Line
Reference when: data is accessed independently, many-to-many relationships, data is shared across documents, child array could grow large (>100 items).
⚡ Key Details & Explanation
Embed when: data is always accessed together, child data has no independent existence, and the number of children is bounded (avoid unbounded arrays). Reference when: data is accessed independently, the relationship is many-to-many, data is shared across documents, or the child array could grow large (>100 items). Rule of thumb: embed for one-to-few, reference for one-to-many or many-to-many. MongoDB has a 16 MB document size limit, which an unbounded embedded array can eventually hit. A practical answer should also mention how I would verify the choice, for example by checking query plans, measuring response time, or testing behavior under concurrent writes.
⚡ One-line Interview Answer
Embed when: data is always accessed together, child data has no independent existence, and the number of children is bounded (avoid unbounded arrays).
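A minimal sketch of the two shapes, using plain JavaScript objects in place of real documents. The names here (`addresses`, `orders`, `userId`, `ordersForUser`) are hypothetical and only illustrate the one-to-few versus one-to-many split.

```javascript
// Embedded: a user's few addresses live inside the user document (one-to-few).
const userWithEmbeddedAddresses = {
  _id: "u1",
  name: "Asha",
  addresses: [
    { label: "home", city: "Pune" },
    { label: "work", city: "Mumbai" },
  ],
};

// Referenced: orders can grow without bound, so each order is its own
// document that points back to the user by id (one-to-many).
const user = { _id: "u1", name: "Asha" };
const orders = [
  { _id: "o1", userId: "u1", total: 40 },
  { _id: "o2", userId: "u1", total: 15 },
  { _id: "o3", userId: "u2", total: 99 },
];

// Reading referenced data takes a second lookup (here: an in-memory filter).
function ordersForUser(userId) {
  return orders.filter((order) => order.userId === userId);
}

// One read returns the user together with all embedded addresses.
console.log(userWithEmbeddedAddresses.addresses.length);
console.log(ordersForUser(user._id).map((order) => order._id));
```

The embedded shape answers "show the profile" in a single read; the referenced shape keeps the user document small no matter how many orders accumulate.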
🧠 Simple Definition (Word-for-word)
The aggregation pipeline is a sequence of stages that transform documents.
⚡ Super Simple Line
Key stages: $match (filter, like WHERE), $group (aggregate, like GROUP BY), $project (reshape documents), $lookup (join), $sort, $limit, $skip, $unwind (flatten arrays), $addFields, $facet (multiple pipelines in parallel).
⚡ Key Details & Explanation
The aggregation pipeline is a sequence of stages that transform documents; each stage receives the output of the previous one. Key stages: $match (filter, like WHERE), $group (aggregate, like GROUP BY), $project (reshape documents), $lookup (join), $sort, $limit, $skip, $unwind (flatten arrays), $addFields, $facet (multiple pipelines in parallel). Example: [{$match:{status:'active'}},{$group:{_id:'$userId',total:{$sum:'$amount'}}}]
⚡ One-line Interview Answer
The aggregation pipeline is a sequence of stages that transform documents.
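To make the stage-by-stage flow concrete, here is a small in-memory emulation of the $match and $group example in plain JavaScript. It is a sketch of what the two stages compute, not MongoDB's implementation; the sample documents and helper names (`matchActive`, `groupByUser`, `runPipeline`) are assumptions.

```javascript
// Hypothetical sample documents standing in for a collection.
const docs = [
  { userId: "a", status: "active", amount: 10 },
  { userId: "a", status: "active", amount: 5 },
  { userId: "b", status: "inactive", amount: 7 },
  { userId: "b", status: "active", amount: 3 },
];

// Stage 1, like {$match: {status: 'active'}}: keep only matching documents.
const matchActive = (input) => input.filter((d) => d.status === "active");

// Stage 2, like {$group: {_id: '$userId', total: {$sum: '$amount'}}}.
const groupByUser = (input) => {
  const totals = new Map();
  for (const d of input) {
    totals.set(d.userId, (totals.get(d.userId) || 0) + d.amount);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// A pipeline feeds each stage's output into the next stage.
const runPipeline = (input, stages) =>
  stages.reduce((current, stage) => stage(current), input);

const result = runPipeline(docs, [matchActive, groupByUser]);
console.log(result);
```

Filtering first means the grouping stage only sees the three active documents, which is the usual reason to place $match early.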
🧠 Simple Definition (Word-for-word)
A Database Index is a data structure (typically a B-Tree or Hash) that improves the speed of data retrieval operations on a table at the cost of additional write time and storage space. MongoDB supports various index types (single field, compound, multikey, text, geospatial). A compound index indexes multiple fields together to support queries on those fields, while a text index tokenizes and stems strings to support search operations.
⚡ Super Simple Line
Compound Index = indexes several fields in a defined order for exact-match and sort queries (e.g., { lastName: 1, age: -1 }).
Text Index = tokenizes words for keyword search (e.g., { description: "text" }).
🧪 Code Example: Compound vs Text in MongoDB
// 1. Creating a Compound Index
db.users.createIndex({ lastName: 1, age: -1 });
// Supports queries on: { lastName } and { lastName, age }
// 2. Creating a Text Index (only one per collection)
db.articles.createIndex({ content: "text", title: "text" });
// Query with search keywords:
db.articles.find({ $text: { $search: "database index performance" } });
🚨 The ESR Rule for Compound Indexes (Interview Gold)
When creating a compound index, structure the keys in this order:
Equality: Fields tested for exact matches (e.g., status: "active").
Sort: Fields used to order the results (e.g., createdAt: -1).
Range: Fields tested for ranges (e.g., age: { $gt: 21 }).
⚡ One-line Interview Answer
Compound indexes store several fields in a defined key order to speed up filter-and-sort queries, while text indexes tokenize string fields to support full-text search queries.
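The ESR ordering can be sketched as a small helper that assembles an index key spec in Equality, Sort, Range order. The helper `buildEsrIndexSpec` and its argument shape are hypothetical; only the resulting key order matters.

```javascript
// Build a compound index key spec in Equality -> Sort -> Range order.
// `equality` and `range` are arrays of field names; `sort` maps field -> 1 | -1.
function buildEsrIndexSpec({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of Object.entries(sort)) spec[field] = direction;
  for (const field of range) spec[field] = 1;
  return spec;
}

// Query shape this index is meant for:
//   find({ status: "active", age: { $gt: 21 } }).sort({ createdAt: -1 })
const spec = buildEsrIndexSpec({
  equality: ["status"],
  sort: { createdAt: -1 },
  range: ["age"],
});

// The spec could then be passed to createIndex in the shell.
console.log(spec);
```

JavaScript objects keep string keys in insertion order, so the key order of `spec` is the field order the index would use.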
🧠 Simple Definition (Word-for-word)
WiredTiger is MongoDB's default storage engine since 3.2.
⚡ Super Simple Line
It uses document-level concurrency control with MVCC (Multi-Version Concurrency Control): readers see a consistent snapshot of data at the start of their operation; writers don't block readers.
⚡ Key Details & Explanation
WiredTiger is MongoDB's default storage engine since 3.2. It uses document-level concurrency control with MVCC (Multi-Version Concurrency Control): readers see a consistent snapshot of data at the start of their operation; writers don't block readers. Each document has a history of versions; readers get the version current at their read timestamp. This is different from collection-level locking in older MMAPv1.
⚡ One-line Interview Answer
WiredTiger is MongoDB's default storage engine since 3.2.
🧠 Simple Definition (Word-for-word)
MongoDB supports multi-document ACID transactions across replica sets and sharded clusters. In MongoDB, write operations are atomic at the single-document level by default. When an application requires updating multiple documents across collections reliably, it uses session-based transactions to guarantee Atomicity, Consistency, Isolation, and Durability.
⚡ Super Simple Line
MongoDB is ACID-compliant: operations are atomic for single documents by default, and multi-document transactions can be run using Sessions.
🧪 Transaction Code Example (Node.js)
const { MongoClient } = require("mongodb");
const client = new MongoClient("mongodb://localhost:27017");
// Assumption: the database name "bank" is illustrative; use your own.
const db = client.db("bank");

async function transferFunds(fromUserId, toUserId, amount) {
  const session = client.startSession();
  try {
    session.startTransaction();
    // Pass the session to every operation so it joins the transaction
    const accounts = db.collection("accounts");
    await accounts.updateOne(
      { userId: fromUserId },
      { $inc: { balance: -amount } },
      { session }
    );
    await accounts.updateOne(
      { userId: toUserId },
      { $inc: { balance: amount } },
      { session }
    );
    await session.commitTransaction(); // Commit changes to DB
    console.log("Transaction committed successfully.");
  } catch (error) {
    await session.abortTransaction(); // Roll back if any step failed
    console.error("Transaction aborted due to error:", error);
    throw error; // Let the caller decide whether to retry
  } finally {
    await session.endSession();
  }
}
⚠️ When to use Transactions in MongoDB
Avoid using transactions as a crutch for poor schema design. Most MongoDB writes should rely on embedding related data in one document, so that single-document atomicity is enough.
Only use transactions when updating multiple independent collections (e.g., billing ledgers, or inventory and user balances) where data consistency is absolutely critical.
⚡ One-line Interview Answer
MongoDB guarantees ACID properties by default at the single-document level and supports multi-document transactions using sessions for multi-collection write consistency.
🧠 Simple Definition (Word-for-word)
An SQL JOIN combines rows from two or more tables based on a related column between them. An INNER JOIN returns only rows that have matching values in both tables. A LEFT JOIN returns all rows from the left table plus the matched rows from the right table (filling unmatched right columns with NULL). A FULL OUTER JOIN returns all rows from both tables, filling the columns of whichever side has no match with NULL.
⚡ Super Simple Line
INNER = only matching rows in both tables.
LEFT = all left rows + matched right rows (NULL if no match).
FULL = all rows from both tables, padded with NULL for missing matches.
🧪 SQL Code Example
-- 1. INNER JOIN
SELECT orders.id, customers.name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;
-- 2. LEFT JOIN
SELECT customers.name, orders.id
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id;
📊 Visual Comparison (Venn Summary)
| Join Type | Left Table Rows | Right Table Rows | Unmatched Values |
|---|---|---|---|
| INNER JOIN | Only matching rows | Only matching rows | Discarded |
| LEFT JOIN | All rows | Only matching rows | Filled with NULL on right |
| FULL OUTER JOIN | All rows | All rows | Filled with NULL on both sides |
⚡ One-line Interview Answer
INNER JOIN returns only intersecting records between tables, LEFT JOIN returns all left-side records with matching right-side records, and FULL JOIN combines all rows from both tables, filling mismatches with NULL.
🧠 Simple Definition (Word-for-word)
Window functions perform calculations across a set of rows related to the current row without collapsing them (unlike GROUP BY).
⚡ Super Simple Line
Syntax: FUNCTION() OVER (PARTITION BY col ORDER BY col).
⚡ Key Details & Explanation
Window functions perform calculations across a set of rows related to the current row without collapsing them (unlike GROUP BY). Syntax: FUNCTION() OVER (PARTITION BY col ORDER BY col).
Example: SELECT name, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank FROM employees;
ROW_NUMBER(): unique sequential number within the partition.
RANK(): ties get the same rank, with gaps after them.
DENSE_RANK(): ties get the same rank, with no gaps.
LAG/LEAD: access a value from the previous/next row.
⚡ One-line Interview Answer
Window functions perform calculations across a set of rows related to the current row without collapsing them (unlike GROUP BY).
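The difference between ROW_NUMBER, RANK, and DENSE_RANK is easiest to see on tied values. The sketch below emulates the three functions in JavaScript for a single partition ordered by salary descending; the sample rows and the function name `rankBySalaryDesc` are assumptions.

```javascript
// Hypothetical rows, already limited to one department (one partition).
const employees = [
  { name: "Ana", salary: 900 },
  { name: "Ben", salary: 900 },
  { name: "Cy", salary: 700 },
  { name: "Di", salary: 500 },
];

// Emulates ROW_NUMBER(), RANK() and DENSE_RANK() OVER (ORDER BY salary DESC).
function rankBySalaryDesc(rows) {
  const sorted = [...rows].sort((a, b) => b.salary - a.salary);
  let rank = 0;
  let denseRank = 0;
  let previousSalary = null;
  return sorted.map((row, index) => {
    if (row.salary !== previousSalary) {
      rank = index + 1; // jumps past tied rows, leaving gaps
      denseRank += 1; // advances by one per distinct value, no gaps
      previousSalary = row.salary;
    }
    return { ...row, rowNumber: index + 1, rank, denseRank };
  });
}

const ranked = rankBySalaryDesc(employees);
console.log(ranked);
```

Every input row is still present in the output, which is the "without collapsing" property; only the extra ranking columns are added.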
🧠 Simple Definition (Word-for-word)
CTE (Common Table Expression): WITH cte_name AS (SELECT ...) SELECT * FROM cte_name.
⚡ Super Simple Line
Benefits: readable and reusable within the query, can reference the CTE multiple times, recursive CTEs for hierarchical data.
⚡ Key Details & Explanation
CTE (Common Table Expression): WITH cte_name AS (SELECT ...) SELECT * FROM cte_name. Benefits: readable and reusable within the query, can reference the CTE multiple times, recursive CTEs for hierarchical data. Use over a subquery when: the same subquery would be repeated, for recursive queries (org charts, category trees), or when readability matters. Performance is similar in modern PostgreSQL: since version 12 the planner inlines a non-recursive, side-effect-free CTE that is referenced once, so it is planned like a subquery, while a CTE referenced multiple times is still materialized by default.
⚡ One-line Interview Answer
CTE (Common Table Expression): WITH cte_name AS (SELECT ...) SELECT * FROM cte_name.
🧠 Simple Definition (Word-for-word)
A Database Transaction is a sequence of operations performed as a single logical unit of work. Transactions must satisfy ACID properties: Atomicity (all operations succeed or all rollback), Consistency (data transitions between valid states), Isolation (concurrent transactions do not interfere), and Durability (committed updates survive crashes). PostgreSQL implements Isolation using MVCC (Multi-Version Concurrency Control) and Durability using a Write-Ahead Log (WAL).
⚡ Super Simple Line
ACID = Atomicity (all-or-nothing), Consistency (rules kept), Isolation (separated runs), Durability (saved permanently).
🛠️ PostgreSQL Under the Hood (MVCC & WAL)
Isolation (MVCC): Plain reads in PostgreSQL do not take row-level locks. When a row is updated, PostgreSQL writes a new version of that row (a new tuple) and keeps the old version until no running transaction can still see it. Readers continue to read the version visible to their snapshot, so readers do not block writers and writers do not block readers.
Durability (WAL): Every change is first recorded sequentially in the Write-Ahead Log (WAL). At commit, the WAL records are flushed to disk before the commit is acknowledged, while the modified table pages can be written later. This guarantees that committed transactions can be replayed after a power loss or crash.
⚡ One-line Interview Answer
PostgreSQL guarantees ACID properties using Write-Ahead Logging (WAL) for crash recovery durability and Multi-Version Concurrency Control (MVCC) to support concurrent isolated reads and writes.
🧠 Simple Definition (Word-for-word)
Pessimistic locking: lock the row when reading so that others cannot modify it (SELECT ... FOR UPDATE).
⚡ Super Simple Line
Optimistic locking: don't lock; check a version column at update time and retry on conflict.
⚡ Key Details & Explanation
Pessimistic locking: lock the row when reading so that others cannot modify it (SELECT ... FOR UPDATE). Use when conflicts are frequent and expensive. Optimistic locking: don't lock; instead, include a version field. On update, check that the version hasn't changed (UPDATE ... WHERE id=? AND version=?). If 0 rows are affected, someone else updated first, so re-read and retry. Use when conflicts are rare: concurrency is better, but the application must handle the retries.
⚡ One-line Interview Answer
Pessimistic locking: lock the row when reading so that others cannot modify it (SELECT ... FOR UPDATE).
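The optimistic version check can be sketched with an in-memory table. This is an illustration under stated assumptions: the `rows` map stands in for a table with a `version` column, and `updateIfVersionMatches` and `withdraw` are hypothetical helpers, not a database API.

```javascript
// In-memory stand-in for a table with a `version` column (hypothetical data).
const rows = new Map([[1, { id: 1, balance: 100, version: 1 }]]);

// Emulates: UPDATE ... SET balance = ?, version = version + 1
//           WHERE id = ? AND version = ?   (returns the affected row count)
function updateIfVersionMatches(id, expectedVersion, newBalance) {
  const row = rows.get(id);
  if (!row || row.version !== expectedVersion) return 0;
  rows.set(id, { id, balance: newBalance, version: expectedVersion + 1 });
  return 1;
}

// Read, compute, try to write; on a version conflict re-read and retry.
function withdraw(id, amount, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const snapshot = rows.get(id);
    const affected = updateIfVersionMatches(
      id,
      snapshot.version,
      snapshot.balance - amount
    );
    if (affected === 1) return true;
  }
  return false;
}

const stale = rows.get(1); // another writer reads the row at version 1
withdraw(1, 30); // this write lands first and bumps the version
// The stale writer's update now matches zero rows, so nothing is overwritten.
const staleWrite = updateIfVersionMatches(1, stale.version, stale.balance - 50);
console.log(staleWrite, rows.get(1));
```

The stale write is rejected rather than silently clobbering the first update, which is the lost-update case the version column exists to catch.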
🧠 Simple Definition (Word-for-word)
EXPLAIN ANALYZE executes the query and shows actual runtime statistics.
⚡ Super Simple Line
Look for: Seq Scan on large tables (should be Index Scan for selective queries), high actual rows vs estimated rows mismatch (stale statistics — run ANALYZE), nested loop joins on large result sets (consider hash join), high cost nodes, sort operations without an index.
⚡ Key Details & Explanation
EXPLAIN ANALYZE executes the query and shows actual runtime statistics. Look for: Seq Scan on large tables (should be Index Scan for selective queries), high actual rows vs estimated rows mismatch (stale statistics — run ANALYZE), nested loop joins on large result sets (consider hash join), high cost nodes, sort operations without an index. Fix by adding indexes on filter/join/sort columns and running VACUUM ANALYZE.
⚡ One-line Interview Answer
EXPLAIN ANALYZE executes the query and shows actual runtime statistics.
🧠 Simple Definition (Word-for-word)
Database Normalization is the process of organizing relational database schemas to minimize data redundancy and dependency anomalies, typically structured through Normal Forms (1NF, 2NF, 3NF). Denormalization is the intentional optimization of adding redundant data or grouping tables to speed up read query execution times by avoiding complex JOIN operations.
⚡ Super Simple Line
Normalization = split tables to eliminate redundancy (saves space, ensures consistency).
Denormalization = merge tables or duplicate data to speed up reads (e.g., read-heavy dashboards).
📋 The Three Normal Forms (1NF, 2NF, 3NF)
First Normal Form (1NF): All table columns must contain atomic (indivisible) values, and there must be no repeating groups.
Second Normal Form (2NF): Must be in 1NF, and all non-key columns must be fully functionally dependent on the entire primary key (removes partial dependencies).
Third Normal Form (3NF): Must be in 2NF, and no non-key columns can depend on other non-key columns (removes transitive dependencies).
🔥 When to Denormalize Intentionally
Read-Heavy Reporting: In analytical databases or dashboards where generating reports requires joining 8+ tables, denormalizing speeds up execution time dramatically.
Caching: Storing calculated totals (e.g., total_orders on a Customer table) to avoid frequently running aggregate queries on millions of rows.
⚡ One-line Interview Answer
Normalization organizes a schema to prevent data anomalies and redundancy, while denormalization intentionally introduces redundancy to optimize read performance in read-heavy applications.
🧠 Simple Definition (Word-for-word)
Redis is an in-memory data store.
⚡ Super Simple Line
Structures: String (simple cache, counters — INCR, SET/GET), List (queues — LPUSH/RPOP), Hash (object storage — HSET/HGET), Set (unique collections — SADD/SMEMBERS), Sorted Set (leaderboards — ZADD with score, ZRANGE), Stream (event log — XADD/XREAD).
⚡ Key Details & Explanation
Redis is an in-memory data store. Structures: String (simple cache, counters: INCR, SET/GET), List (queues: LPUSH/RPOP), Hash (object storage: HSET/HGET), Set (unique collections: SADD/SMEMBERS), Sorted Set (leaderboards: ZADD with score, ZRANGE), Stream (event log: XADD/XREAD). Typical uses: Strings for caching JSON, Sorted Sets for rate limiting and leaderboards, Lists for job queues, Pub/Sub for real-time messaging.
⚡ One-line Interview Answer
Redis is an in-memory data store.
🧠 Simple Definition (Word-for-word)
Cache-aside (lazy loading): app checks cache first, on miss fetches from DB and populates cache.
⚡ Super Simple Line
Cache-aside is the most common caching pattern.
⚡ Key Details & Explanation
Cache-aside (lazy loading): app checks cache first, on miss fetches from DB and populates cache. Most common. Write-through: write to cache and DB in the same operation, so the cache is always current but write latency increases. Write-behind (write-back): write to cache immediately and sync to DB asynchronously, which gives faster writes at the risk of data loss. TTL: set an expiry time on cached data; simplest, but stale data is possible until expiry. Event-driven invalidation: invalidate specific keys when the underlying data changes.
⚡ One-line Interview Answer
Cache-aside (lazy loading): app checks cache first, on miss fetches from DB and populates cache.
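A minimal cache-aside sketch with a TTL and explicit invalidation, using a `Map` in place of a real cache. The loader `loadUserFromDb`, the `dbReads` counter, and the default TTL are assumptions made for illustration.

```javascript
// Hypothetical slow data source; `dbReads` counts how often it is hit.
let dbReads = 0;
function loadUserFromDb(id) {
  dbReads += 1;
  return { id, name: `user-${id}` };
}

const cache = new Map(); // key -> { value, expiresAt }

// Cache-aside read: check the cache, fall back to the DB on a miss,
// then populate the cache with a TTL so stale entries eventually expire.
function getUser(id, ttlMs = 60000, now = Date.now()) {
  const entry = cache.get(id);
  if (entry && entry.expiresAt > now) return entry.value;
  const value = loadUserFromDb(id);
  cache.set(id, { value, expiresAt: now + ttlMs });
  return value;
}

// Event-driven invalidation: drop the key when the underlying row changes.
function invalidateUser(id) {
  cache.delete(id);
}

getUser(1); // miss, so the DB is read
getUser(1); // hit, served from the cache
invalidateUser(1);
getUser(1); // miss again after invalidation
console.log(dbReads);
```

Passing `now` as a parameter is only there to make expiry easy to exercise without waiting; a real implementation would typically let the cache server handle TTLs.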
🧠 Simple Definition (Word-for-word)
Cache stampede (dogpile): cached value expires, many concurrent requests all miss the cache simultaneously and all query the DB at once — DB gets hammered.
⚡ Super Simple Line
Prevention: probabilistic early expiration (re-cache before expiry based on probability), mutex/lock (first request gets the lock, others wait for the cached result), background refresh (asynchronously refresh the cache before it expires), stale-while-revalidate pattern.
⚡ Key Details & Explanation
Cache stampede (dogpile): a cached value expires, many concurrent requests miss the cache simultaneously, and all of them query the DB at once, overloading it. Prevention: probabilistic early expiration (re-cache before expiry based on probability), mutex/lock (first request gets the lock, others wait for the cached result), background refresh (asynchronously refresh the cache before it expires), stale-while-revalidate pattern.
⚡ One-line Interview Answer
Cache stampede (dogpile): cached value expires, many concurrent requests all miss the cache simultaneously and all query the DB at once — DB gets hammered.
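The mutex idea can be sketched inside a single Node.js process by sharing one in-flight promise per key, so only the first miss reaches the database. The names `loadReportFromDb`, `getReport`, and `dbCalls` are hypothetical, and the 10 ms delay merely simulates query latency.

```javascript
// Hypothetical expensive loader; `dbCalls` counts how often it actually runs.
let dbCalls = 0;
async function loadReportFromDb(key) {
  dbCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulate latency
  return `report:${key}`;
}

const cache = new Map(); // key -> cached value
const inFlight = new Map(); // key -> promise for a load already in progress

// The first caller for a key starts the load; concurrent callers for the same
// key await that same promise instead of each querying the database.
async function getReport(key) {
  if (cache.has(key)) return cache.get(key);
  if (inFlight.has(key)) return inFlight.get(key);

  const pending = loadReportFromDb(key)
    .then((value) => {
      cache.set(key, value);
      return value;
    })
    .finally(() => inFlight.delete(key));

  inFlight.set(key, pending);
  return pending;
}

// 50 concurrent misses for the same key share one database call.
const demo = Promise.all(Array.from({ length: 50 }, () => getReport("daily")));
demo.then((values) => console.log(values.length, dbCalls));
```

This only coordinates requests within one process; across several application servers the same idea needs a shared lock, for example one held in the cache itself.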
🧠 Simple Definition (Word-for-word)
ORM pros: type safety with Prisma, faster development, migrations as code, database-agnostic, prevents SQL injection by default.
⚡ Super Simple Line
ORM cons: can generate inefficient queries (N+1, unnecessary joins), abstraction leaks in complex queries, harder to use DB-specific features (window functions, CTEs, lateral joins).
⚡ Key Details & Explanation
ORM pros: type safety with Prisma, faster development, migrations as code, database-agnostic, prevents SQL injection by default. ORM cons: can generate inefficient queries (N+1, unnecessary joins), abstraction leaks in complex queries, harder to use DB-specific features (window functions, CTEs, lateral joins). Raw SQL pros: full control and optimization. Raw SQL cons: verbose, manual parameterization, no type safety without additional tooling. For complex reporting/analytics, use raw SQL. For CRUD, ORM is faster.
⚡ One-line Interview Answer
ORM pros: type safety with Prisma, faster development, migrations as code, database-agnostic, prevents SQL injection by default.
🧠 Simple Definition (Word-for-word)
Prisma Migrate generates SQL migration files from schema changes (prisma migrate dev).
⚡ Super Simple Line
Each migration is recorded in _prisma_migrations table.
⚡ Key Details & Explanation
Prisma Migrate generates SQL migration files from schema changes (prisma migrate dev). Each migration is recorded in the _prisma_migrations table. In production, prisma migrate deploy applies pending migrations in order. If a migration fails midway, it is marked as failed in that table. To recover, repair the database manually and mark the migration as applied or rolled back with prisma migrate resolve, or ship a follow-up migration. Always back up before migrating production, and use a deployment pipeline that runs migrations before deploying new code.
⚡ One-line Interview Answer
Prisma Migrate generates SQL migration files from schema changes (prisma migrate dev).
🧠 Simple Definition (Word-for-word)
N+1 happens any time you first fetch a list of records, then run one extra query per record for related data.
⚡ Super Simple Line
Example: fetch 100 orders, then query the customer for each order separately, resulting in 101 queries.
⚡ Key Details & Explanation
N+1 happens any time you first fetch a list of records, then run one extra query per record for related data. Example: fetch 100 orders, then query the customer for each order separately, resulting in 101 queries. It causes latency and unnecessary database load. Fix it with joins, batching, eager loading, or preloading relations in your ORM. This problem is common in REST APIs and server-rendered pages too, not just GraphQL.
⚡ One-line Interview Answer
N+1 happens any time you first fetch a list of records, then run one extra query per record for related data.
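The query-count difference can be shown with in-memory stand-ins for the two tables. Everything here is a sketch: `queryCount` represents database round trips, and the `find...` helpers are hypothetical rather than any ORM's API.

```javascript
// Hypothetical in-memory "tables"; `queryCount` stands in for DB round trips.
const customers = new Map([
  [1, { id: 1, name: "Ana" }],
  [2, { id: 2, name: "Ben" }],
]);
const orders = [
  { id: 10, customerId: 1 },
  { id: 11, customerId: 2 },
  { id: 12, customerId: 1 },
];
let queryCount = 0;

const findOrders = () => {
  queryCount += 1;
  return orders;
};
const findCustomerById = (id) => {
  queryCount += 1;
  return customers.get(id);
};
// Like SELECT ... WHERE id IN (...): one round trip for many ids.
const findCustomersByIds = (ids) => {
  queryCount += 1;
  return ids.map((id) => customers.get(id));
};

// N+1: one query for the list, then one query per order.
function loadOrdersNaive() {
  return findOrders().map((o) => ({
    ...o,
    customer: findCustomerById(o.customerId),
  }));
}

// Batched: one query for the list, one query for all distinct customers.
function loadOrdersBatched() {
  const list = findOrders();
  const ids = [...new Set(list.map((o) => o.customerId))];
  const byId = new Map(findCustomersByIds(ids).map((c) => [c.id, c]));
  return list.map((o) => ({ ...o, customer: byId.get(o.customerId) }));
}

loadOrdersNaive();
const naiveQueries = queryCount; // 1 list query + 3 per-order queries
queryCount = 0;
const batchedOrders = loadOrdersBatched();
const batchedQueries = queryCount; // 1 list query + 1 batched lookup
console.log({ naiveQueries, batchedQueries });
```

The naive version grows with the number of orders, while the batched version stays at two round trips regardless of list size.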
🧠 Simple Definition (Word-for-word)
Soft delete marks a record as deleted, usually with deletedAt or isDeleted, but keeps it in the database.
⚡ Super Simple Line
Use it when you need recovery, auditability, legal retention, or to preserve references from related records.
⚡ Key Details & Explanation
Soft delete marks a record as deleted, usually with deletedAt or isDeleted, but keeps it in the database. Use it when you need recovery, auditability, legal retention, or to preserve references from related records. Hard delete physically removes the row and is simpler when data truly should disappear. Tradeoff: soft delete adds query complexity because every read must filter deleted rows unless you centralize that logic.
⚡ One-line Interview Answer
Soft delete marks a record as deleted, usually with deletedAt or isDeleted, but keeps it in the database.
🧠 Simple Definition (Word-for-word)
A unique constraint lets the database enforce that no two rows share the same value or value combination, such as email or (userId, providerId).
⚡ Super Simple Line
This is critical because application-level pre-checks like 'SELECT then INSERT' still race under concurrency.
⚡ Key Details & Explanation
A unique constraint lets the database enforce that no two rows share the same value or value combination, such as email or (userId, providerId). This is critical because application-level pre-checks like 'SELECT then INSERT' still race under concurrency. The database is the final source of truth. Best practice: rely on unique constraints, catch the conflict error, and return a clean 409 Conflict or retry path from the API.
⚡ One-line Interview Answer
A unique constraint lets the database enforce that no two rows share the same value or value combination, such as email or (userId, providerId).
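The "rely on the constraint, catch the conflict" approach can be sketched as follows. The `Set` stands in for a table with a unique constraint on email, and `UniqueViolationError`, `insertUser`, and `registerUser` are hypothetical names, not a driver API.

```javascript
// In-memory stand-in for a table with a UNIQUE constraint on `email`.
class UniqueViolationError extends Error {}

const emails = new Set();
function insertUser(email) {
  // The "database" enforces uniqueness at write time.
  if (emails.has(email)) {
    throw new UniqueViolationError(`duplicate email: ${email}`);
  }
  emails.add(email);
  return { email };
}

// Handler: no SELECT-then-INSERT pre-check; attempt the insert and translate
// the constraint violation into a 409 Conflict.
function registerUser(email) {
  try {
    return { status: 201, body: insertUser(email) };
  } catch (error) {
    if (error instanceof UniqueViolationError) {
      return { status: 409, body: { message: "Email already registered" } };
    }
    throw error; // anything else is a genuine failure
  }
}

const first = registerUser("a@example.com");
const second = registerUser("a@example.com");
console.log(first.status, second.status);
```

With a real database the handler would match the driver's specific error instead, such as SQLSTATE 23505 (unique_violation) in PostgreSQL or duplicate key error code 11000 in MongoDB.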
🧠 Simple Definition (Word-for-word)
A Deadlock is a concurrency conflict where two or more database transactions are unable to proceed because each is waiting for a lock held by the other. Databases resolve deadlocks by detecting the cycle and aborting one of the transactions. Deadlocks are prevented by locking resources in a consistent order, keeping transactions short, indexing foreign keys, and using lock timeout configurations with retry mechanisms.
⚡ Super Simple Line
Deadlock = Transaction A holds lock X and wants Y; Transaction B holds Y and wants X (neither can proceed until the database aborts one of them).
🌳 Deadlock Flow Visualization
Transaction 1: locks Row A ────────────► tries to lock Row B (BLOCKED)
▲
│
Transaction 2: tries to lock Row A ◄──────────┴─── locks Row B
🛠️ How to Prevent Deadlocks
Consistent Locking Order: Ensure all application code locks resources in the exact same sequence (e.g., always update accounts before users).
Keep Transactions Short: Avoid network requests or expensive operations inside transactions to minimize lock hold duration.
Lock Timeouts & Retries: Configure databases to abort transactions that wait too long for locks, and implement automatic retries (a try/catch retry loop) in backend logic.
⚡ One-line Interview Answer
Deadlocks happen when concurrent transactions block each other in a circular lock dependency, and they are prevented by ordering lock acquisitions consistently and keeping transaction scopes small.
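Consistent lock ordering can be as simple as sorting the resource ids before locking them. A minimal sketch, assuming numeric account ids and a hypothetical helper `lockOrder`:

```javascript
// Return the ids to lock in a fixed global order (ascending), regardless of
// which account is the sender and which is the receiver.
function lockOrder(fromId, toId) {
  return [fromId, toId].sort((a, b) => a - b);
}

// Two opposite transfers request their locks in the same sequence, so neither
// can hold the second lock while waiting for the first.
const transferAToB = lockOrder(7, 3);
const transferBToA = lockOrder(3, 7);
console.log(transferAToB, transferBToA);
```

In SQL the same effect comes from issuing the row locks (for example SELECT ... FOR UPDATE) in that sorted order inside each transaction.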
🧠 Simple Definition (Word-for-word)
A subquery is basically a query inside another query.
⚡ Super Simple Line
It’s used to get intermediate results that are then used by the main query.
⚡ Key Details & Explanation
A subquery is basically a query inside another query. It’s used to get intermediate results that are then used by the main query.
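For an uncorrelated subquery like the one below, the inner query is logically evaluated first and its result is passed to the outer query. A correlated subquery, which references columns of the outer query, is instead evaluated per outer row. Either way, subqueries help break complex problems into smaller steps.
SELECT name
FROM employees
WHERE salary > (
  SELECT AVG(salary) FROM employees
);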
⚡ One-line Interview Answer
A subquery is basically a query inside another query.
🧠 Simple Definition (Word-for-word)
A primary key uniquely identifies each record in a table and cannot be null.
⚡ Super Simple Line
A foreign key creates a relationship between two tables by referencing a primary (or unique) key in another table.
⚡ Key Details & Explanation
- A primary key uniquely identifies each record in a table and cannot be null.
- A foreign key creates a relationship between two tables by referencing a primary (or unique) key in another table.
⚡ One-line Interview Answer
A primary key uniquely identifies each record in a table and cannot be null.
🧠 Simple Definition (Word-for-word)
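A stored procedure is a named block of SQL and procedural logic stored in the database and invoked by name.
⚡ Super Simple Line
It improves reusability and can improve performance.
⚡ Key Details & Explanation
A stored procedure is a named block of SQL and procedural logic stored in the database and invoked by name. Depending on the database, it is precompiled or its execution plan is cached. It improves reusability because multiple applications can call the same logic, and it can improve performance by reducing network round trips between the application and the database.
⚡ One-line Interview Answer
A stored procedure is a named block of SQL and procedural logic stored in the database and invoked by name.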
🧠 Simple Definition (Word-for-word)
Network - If there is a network issue, such as high latency or low bandwidth, it can cause queues to build up as requests take longer to reach their destination and responses take longer to return.
⚡ Super Simple Line
Database - This is a common bottleneck, especially if the database is not optimized or if there are too many read/write operations happening simultaneously.
⚡ Key Details & Explanation
- Network - If there is a network issue, such as high latency or low bandwidth, it can cause queues to build up as requests take longer to reach their destination and responses take longer to return.
- Database - This is a common bottleneck, especially if the database is not optimized or if there are too many read/write operations happening simultaneously.
- Application Server - If the application server is not able to handle the incoming requests efficiently, it can lead to a backlog of requests waiting to be processed.
- External APIs - If the application relies on external APIs, any latency or downtime in those APIs can cause queues to build up in the application as it waits for responses.
- Application Code - Inefficient code can lead to longer processing times, which can cause queues to build up as requests take longer to complete.
Root Cause of Queue Build-up
- Slow, inefficient processing in the application code.
- Insufficient / limited resources (CPU, memory) on the application server.
- Serial Resource Access - If multiple requests are trying to access the same resource (e.g., a file, a database record) and the access is serialized, it can lead to queues building up as requests wait for their turn.
- Database performance issues, such as slow queries or lack of indexing.
- Network latency or bandwidth issues.
- External API latency or downtime.
Note: When designing a system, avoid letting queues build up, and in a running system find where they are building up. Identifying potential bottlenecks and addressing them proactively improves the overall performance and user experience of the application.
⚡ One-line Interview Answer
Network - If there is a network issue, such as high latency or low bandwidth, it can cause queues to build up as requests take longer to reach their destination and responses take longer to return.
🧠 Simple Definition (Word-for-word)
DELETE removes rows and can be rolled back.
⚡ Super Simple Line
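TRUNCATE removes all rows quickly; whether it can be rolled back depends on the database.
⚡ Key Details & Explanation
DELETE removes rows (optionally filtered with WHERE) and can be rolled back. TRUNCATE removes all rows quickly; it can be rolled back inside a transaction in PostgreSQL and SQL Server, but it commits implicitly in MySQL and Oracle. DROP deletes the entire table, including its structure.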
⚡ One-line Interview Answer
DELETE removes rows and can be rolled back.
🧠 Simple Definition (Word-for-word)
WHERE is used to filter rows before grouping, while HAVING is used to filter groups after aggregation.
⚡ Super Simple Line
So HAVING is used with GROUP BY.
⚡ Key Details & Explanation
WHERE is used to filter rows before grouping, while HAVING is used to filter groups after aggregation. So HAVING is used with GROUP BY.
⚡ One-line Interview Answer
WHERE is used to filter rows before grouping, while HAVING is used to filter groups after aggregation.
🧠 Simple Definition (Word-for-word)
Relational databases (SQL) store data in structured tables with fixed schemas and relationships, enforcing strict ACID compliance. Non-relational databases (NoSQL) store semi-structured data as documents, key-values, or graphs, prioritizing dynamic schemas and horizontal scaling.
⚡ Super Simple Line
Use SQL (PostgreSQL) when data consistency and complex relationships are critical; use NoSQL (MongoDB) when data shapes change rapidly and you need horizontal scaling.
📊 Comparison Table
| Feature | Relational (SQL / PostgreSQL) | Document (NoSQL / MongoDB) |
|---|---|---|
| Data Model | Structured tables with rows and columns | Semi-structured JSON-like BSON documents |
| Schema | Strict, predefined, enforced at database level | Dynamic, flexible, schema-on-read |
| Scaling | Typically Vertical (scale up CPU/RAM) | Horizontal (sharding across multiple nodes) |
| Relationships | Rich JOIN support, foreign key constraints | Normalized references or denormalized embedding |
| ACID Safety | Full transactional consistency out of the box | Atomic single-document writes by default; multi-document transactions available; tunable read/write concerns |
⚡ One-line Interview Answer
Choose PostgreSQL for transactional systems with complex joins and strict integrity, and MongoDB for flexible schemas, high-throughput writes, and easy horizontal scaling.
🧠 Simple Definition (Word-for-word)
Redis is an in-memory database that stores data in RAM for sub-millisecond latency. To prevent data loss, it provides two persistence options: RDB (Redis Database), which takes point-in-time binary snapshots of the dataset, and AOF (Append Only File), which logs every write operation to disk as it is received.
⚡ Super Simple Line
RDB = backup snapshots taken at specific time intervals; AOF = continuous append-only log of every single database modification.
📊 Comparison Table
| Feature | RDB (Redis Database Snapshots) | AOF (Append Only File) |
|---|---|---|
| Mechanism | Point-in-time database snapshot | Appends every write command to a log file |
| Durability | Lower (loses changes since last snapshot) | Higher (with the everysec fsync policy, at most about 1 s of writes is lost) |
| Performance | High (main thread forks background process) | Some write overhead (depends on the fsync policy: always, everysec, or no) |
| Recovery Speed | Very fast (directly loads binary dataset into memory) | Slower (re-executes all logged commands in order) |
| File Size | Compact single binary file | Larger log file (compacted periodically by an AOF rewrite) |
⚡ One-line Interview Answer
RDB provides fast recovery and minimal overhead through periodic snapshots, while AOF maximizes durability by logging every write command, and they are typically combined in production.
🧠 Simple Definition (Word-for-word)
Scaling a database increases its performance and storage capacity under load. Read Replication copies data from a single primary database to multiple read-only replicas to distribute read traffic. Sharding breaks database tables horizontally into smaller partitions (shards) across separate database servers to scale both reads and writes.
⚡ Super Simple Line
Replication = clone the entire database to scale read queries; Sharding = split tables into parts across nodes to scale both reads and writes.
📊 Comparison Table
| Feature | Read Replication (Replicas) | Sharding (Horizontal Partitioning) |
|---|---|---|
| Scales What? | Read traffic capacity | Read and Write capacity, plus storage volume |
| Data Distribution | Every replica holds a full copy of the dataset | Each shard holds a subset of the dataset |
| Writes Target | Must target the primary write node only | Writes are routed across different shards based on shard key |
| Application Design | Requires separating read and write queries | Requires shard-key routing and careful handling of cross-shard queries |
| Implementation Cost | Low (native feature in PostgreSQL, MySQL, MongoDB) | High (significant operational and code complexity) |
⚡ One-line Interview Answer
Read replication handles high read traffic by distributing queries across full copies of the data, while sharding partitions a database horizontally to scale write capacity and total data size beyond a single node.
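Shard-key routing can be sketched as a hash of the key taken modulo the number of shards. The hash below is a simple illustrative one, not what any particular database uses, and `hashKey`, `shardFor`, and the user ids are assumptions.

```javascript
// Simple deterministic string hash (illustrative only, not a production hash).
function hashKey(key) {
  let hash = 0;
  for (const char of String(key)) {
    hash = (hash * 31 + char.codePointAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Route a shard key to one of `shardCount` shards.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Distribute some hypothetical user ids across four shards.
const shards = [[], [], [], []];
for (const userId of ["u1", "u2", "u3", "u4", "u5", "u6"]) {
  shards[shardFor(userId, shards.length)].push(userId);
}
console.log(shards);
```

The same key always lands on the same shard, which is what lets reads find earlier writes. Plain modulo routing moves most keys when the shard count changes, which is one reason real systems often use range-based or consistent-hashing schemes instead.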
🧠 Simple Definition (Word-for-word)
Zero-downtime database migrations update schemas without blocking database traffic or causing errors in active application servers. This is achieved by making all changes backward-compatible so that both the old and new versions of the application code can query the database simultaneously during the deployment.
⚡ Super Simple Line
Never rename or drop columns directly; instead, use the Expand-Contract pattern: add first, migrate data in background batches, and drop old columns only when stable.
🛠️ The Expand-Contract (Parallel Run) Pattern Steps
1. Expand Phase: Add the new column or table to the database. Make it nullable or assign a default value. Do not delete or rename anything.
2. Deploy Code (Dual Write): Deploy application updates that write to both the old and new columns, but continue reading from the old column.
3. Backfill Data: Run a background job to copy data from the old column to the new column in small, controlled batches to avoid long-held locks.
4. Deploy Code (Switch Read): Deploy application updates that read and write exclusively using the new column.
5. Contract Phase: Safely drop the old column and clean up any database triggers or temporary tables.
⚡ One-line Interview Answer
Zero-downtime migrations rely on the Expand-Contract pattern to introduce schema changes in backward-compatible phases, allowing older and newer code deployments to safely run in parallel.
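The backfill step of the pattern can be sketched as a loop over small batches. The in-memory `rows` array stands in for a table mid-migration, and the column names `fullName` and `displayName`, like the helper `backfillBatch`, are hypothetical.

```javascript
// Hypothetical rows mid-migration: `fullName` is the old column,
// `displayName` is the new nullable column added in the expand phase.
const rows = Array.from({ length: 10 }, (_, i) => ({
  id: i + 1,
  fullName: `User ${i + 1}`,
  displayName: null,
}));

// Copy old -> new for at most `batchSize` rows that are not yet backfilled.
// Returns how many rows were updated so the caller knows when to stop.
function backfillBatch(batchSize) {
  const pending = rows
    .filter((row) => row.displayName === null)
    .slice(0, batchSize);
  for (const row of pending) row.displayName = row.fullName;
  return pending.length;
}

// Loop in small batches; a real job would pause between batches.
let batches = 0;
while (backfillBatch(3) > 0) batches += 1;
const allBackfilled = rows.every((row) => row.displayName === row.fullName);
console.log(batches, allBackfilled);
```

Because each batch only touches rows that are still empty, the job can be stopped and restarted safely, and rows written by the dual-write code in the meantime are left alone.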