Grouped by difficulty and topic. Use this as a preparation guide before technical rounds, or as a self-assessment to identify which areas need more practice.
Questions are grouped into five categories in increasing order of difficulty. Start from the category that matches your current level. If you can answer every question in a category without hesitation, move to the next. If you hesitate on more than two in a row, that category is where your preparation should focus.
Category 1: Basic SQL – Foundations and Syntax
Every data analyst interview starts here. These questions filter out candidates who have listed SQL on a resume without actually using it. They should be answered immediately, without calculation.
Q1. What is the difference between WHERE and HAVING?
A: WHERE filters rows before aggregation. HAVING filters groups after aggregation. WHERE cannot reference aggregate functions like COUNT() or SUM() – HAVING can.
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE status = 'active'
GROUP BY department
HAVING COUNT(*) > 5;
Q2. What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
A: INNER JOIN returns only rows with matching values in both tables. LEFT JOIN returns all rows from the left table and matched rows from the right — unmatched right rows are NULL. FULL OUTER JOIN returns all rows from both tables — unmatched rows on either side are NULL.
Q3. What does the DISTINCT keyword do?
A: Removes duplicate rows from the result set. SELECT DISTINCT returns only unique combinations of the selected columns.
SELECT DISTINCT department FROM employees;
Q4. What is the difference between DELETE, TRUNCATE, and DROP?
A: DELETE removes specific rows (can use WHERE, is logged, can be rolled back). TRUNCATE removes all rows from a table quickly (minimal logging, faster than DELETE). DROP removes the entire table and its structure from the database.
Q5. How do you sort results in descending order?
A: Use ORDER BY column_name DESC. Ascending is the default and can be specified with ASC.
SELECT name, salary FROM employees ORDER BY salary DESC;
Q6. What does NULL mean in SQL and how do you test for it?
A: NULL means the absence of a value — it is not zero or an empty string. NULL cannot be compared with = or !=. Use IS NULL or IS NOT NULL to test for it.
SELECT * FROM orders WHERE delivery_date IS NULL;
Q7. What is the difference between UNION and UNION ALL?
A: UNION combines result sets and removes duplicates. UNION ALL combines result sets and keeps all rows including duplicates. UNION ALL is faster because it skips the deduplication step.
Q8. What is a primary key?
A: A column or combination of columns that uniquely identifies each row in a table. Primary keys cannot contain NULL values and must be unique across all rows.
Q9. What is a foreign key?
A: A column in one table that references the primary key of another table. It enforces referential integrity — you cannot insert a foreign key value that does not exist in the referenced table.
Q10. How do you limit the number of rows returned in a query?
A: In MySQL and PostgreSQL use LIMIT. In SQL Server use TOP. In Oracle use FETCH FIRST n ROWS ONLY or ROWNUM.
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10;
Category 2: Joins and Filtering – Multi-Table Queries
These questions are tested in nearly every data analyst technical round. The join questions in particular distinguish candidates who understand relational data from those who only know single-table operations.
Q11. How do you find rows in Table A that have no match in Table B?
A: Use a LEFT JOIN and filter for NULL on a column from Table B — specifically the join key. This is the anti-join pattern.
SELECT a.* FROM orders a
LEFT JOIN deliveries b ON a.order_id = b.order_id
WHERE b.order_id IS NULL;
Q12. What is a self-join and when would you use it?
A: A self-join joins a table to itself. Used when rows in a table have a relationship to other rows in the same table — such as an employee table where each employee has a manager_id that references another employee_id in the same table.
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
Q13. What is a CROSS JOIN?
A: Returns the Cartesian product — every row from the first table paired with every row from the second table. A 10-row table crossed with a 5-row table produces 50 rows. Used in date spine generation and combination analysis.
Q14. How do you join on multiple conditions?
A: Add additional conditions after the first join condition using AND.
SELECT * FROM orders o
JOIN shipments s ON o.order_id = s.order_id
AND o.warehouse_id = s.warehouse_id;
Q15. What is the difference between a subquery and a JOIN?
A: A subquery is a nested SELECT inside another query — it can appear in WHERE, FROM, or SELECT clauses. A JOIN combines tables horizontally. Subqueries in FROM produce a derived table; subqueries in WHERE filter rows. JOINs are often more readable and sometimes faster, but subqueries are more flexible for certain filtering patterns.
Q16. How do you filter rows based on a list of values?
A: Use IN. Use NOT IN to exclude a list.
SELECT * FROM products WHERE category IN ('Electronics', 'Clothing', 'Home');
SELECT * FROM products WHERE category NOT IN ('Electronics', 'Clothing');
Q17. How do you filter rows based on a pattern in a text column?
A: Use LIKE with wildcards. % matches any sequence of characters. _ matches exactly one character.
SELECT * FROM customers WHERE email LIKE '%@gmail.com';
SELECT * FROM products WHERE sku LIKE 'A_B%';
Q18. How do you return rows where a column value falls between two values?
A: Use BETWEEN. It is inclusive on both ends.
SELECT * FROM orders WHERE order_date BETWEEN '2025-01-01' AND '2025-03-31';
Q19. How do you write a query that returns customers who placed more than 5 orders?
A: GROUP BY customer, count orders, filter with HAVING.
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;
Q20. How do you combine filter conditions using AND and OR correctly?
A: AND requires both conditions true. OR requires at least one. When combining both, use parentheses to control evaluation order. Without parentheses, AND evaluates before OR, which can produce unintended results.
— Correct: parentheses make intent explicit
WHERE (status = ‘shipped’ OR status = ‘delivered’)
AND order_date >= ‘2025-01-01’
Category 3: Aggregation and Grouping – Business Logic Queries
These questions test whether you can translate business questions into SQL logic. Most real analyst work lives here.
Q21. How do you calculate the percentage each group contributes to the total?
A: Divide the group count or sum by the total using a subquery or a window function.
SELECT department,
COUNT(*) AS dept_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct
FROM employees
GROUP BY department;
Q22. How do you find the second highest value in a column?
A: Use a subquery with MAX and exclude the maximum, or use DENSE_RANK() with window functions.
— Subquery approach
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- Window function approach (preferred)
SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees) sub
WHERE rnk = 2 LIMIT 1;
Q23. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?
A: COUNT(*) counts all rows including NULLs. COUNT(column) counts non-NULL values in that column. COUNT(DISTINCT column) counts unique non-NULL values.
Q24. How do you calculate a running total?
A: Use SUM() with a window function and ORDER BY inside OVER().
SELECT order_date, revenue,
SUM(revenue) OVER (ORDER BY order_date) AS running_total
FROM daily_revenue;
Q25. How do you pivot rows into columns?
A: Use conditional aggregation with CASE WHEN or PIVOT syntax in dialects that support it.
SELECT month,
SUM(CASE WHEN category = 'Electronics' THEN revenue ELSE 0 END) AS electronics,
SUM(CASE WHEN category = 'Clothing' THEN revenue ELSE 0 END) AS clothing
FROM sales
GROUP BY month;
Q26. How do you find duplicate rows in a table?
A: GROUP BY the columns that should be unique, then filter with HAVING COUNT(*) > 1.
SELECT email, COUNT(*) AS cnt
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
Q27. How do you calculate month-over-month revenue change?
A: Use LAG() to retrieve the previous month’s revenue, then calculate the difference or percentage change.
SELECT month, revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_revenue,
ROUND((revenue - LAG(revenue,1) OVER (ORDER BY month))
* 100.0 / LAG(revenue,1) OVER (ORDER BY month), 2) AS pct_change
FROM monthly_revenue;
Q28. How do you calculate a 7-day rolling average?
A: Use AVG() with ROWS BETWEEN 6 PRECEDING AND CURRENT ROW inside an OVER() clause.
SELECT date, revenue,
AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg
FROM daily_revenue;
Q29. How do you find customers who have not placed any orders?
A: LEFT JOIN customers to orders and filter for NULL on the orders join key.
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.customer_id IS NULL;
Q30. How do you get the first and last order date per customer in one query?
A: Use MIN() and MAX() with GROUP BY.
SELECT customer_id,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id;
Category 4: Window Functions and CTEs – Advanced Analyst SQL
These questions separate candidates who can answer the question from candidates who can answer it elegantly. Window functions and CTEs are the most tested advanced concepts in senior analyst and analytics engineer interviews.
Q31. What is a window function and how is it different from GROUP BY?
A: A window function performs a calculation across a set of rows related to the current row without collapsing those rows into one. GROUP BY collapses rows. A window function keeps all rows and adds a computed column alongside them.
Q32. What is PARTITION BY and how does it differ from GROUP BY?
A: PARTITION BY divides rows into groups for window function calculations but does not collapse the rows. GROUP BY collapses rows into one per group. PARTITION BY is used inside OVER() with window functions — it cannot be used outside that context.
Q33. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
A: ROW_NUMBER() assigns a unique sequential number regardless of ties. RANK() assigns the same rank to ties but skips subsequent ranks (1, 1, 3). DENSE_RANK() assigns the same rank to ties without gaps (1, 1, 2).
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;
Q34. How do you return the top N rows per group?
A: Use ROW_NUMBER() OVER (PARTITION BY group ORDER BY metric DESC) in a CTE or subquery, then filter WHERE row_num <= N in the outer query.
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
FROM products
)
SELECT * FROM ranked WHERE rn <= 3;
Q35. What is a CTE and why is it preferred over a subquery?
A: A CTE (Common Table Expression) is a named temporary result set defined with the WITH clause. It is preferred over subqueries because it is readable, reusable within the query, and easier to debug by isolating each step with a meaningful name.
Q36. How do you chain multiple CTEs?
A: Define additional CTEs after the first one, separated by commas, all under the same WITH clause.
WITH base AS (
SELECT * FROM orders WHERE status = 'completed'
),
agg AS (
SELECT customer_id, SUM(amount) AS total
FROM base GROUP BY customer_id
)
SELECT * FROM agg WHERE total > 1000;
Q37. What does LAG() do and when would you use it?
A: LAG(column, n) returns the value of column from n rows before the current row in the defined window. Used to compare a row’s value to a previous period’s value — for example, calculating month-over-month changes without a self-join.
Q38. What does LEAD() do?
A: LEAD(column, n) returns the value of column from n rows after the current row. Used to compare a current value to a future one — such as finding the next event date or the next transaction amount.
Q39. How do you calculate a cumulative distribution?
A: Use CUME_DIST() or PERCENT_RANK() window functions.
SELECT name, score,
CUME_DIST() OVER (ORDER BY score) AS cumulative_dist
FROM test_results;
Q40. How do you find the previous non-NULL value in a column?
A: In most SQL dialects, use LAST_VALUE() with a window frame, or use a conditional window function. In Spark SQL and some others, LAST(column, true) OVER (…) handles this. The general pattern is to assign a group number to consecutive NULLs using a conditional COUNT, then take MAX or FIRST value within each group.
Category 5: Optimisation, Design, and Scenario Questions
These questions appear in senior analyst, analytics engineer, and data engineering interviews. They test whether you understand why queries behave the way they do, not just how to write them.
Q41. What is a query execution plan and why should you read one?
A: An execution plan shows how the database engine plans to execute a query — which indexes it will use, how it will join tables, what the estimated cost of each step is. Reading it identifies slow steps such as full table scans, expensive hash joins, or missing index usage. Most databases show it with EXPLAIN or EXPLAIN ANALYZE.
Q42. What is an index and how does it affect query performance?
A: An index is a data structure that speeds up reads on a column by maintaining a sorted reference. Queries filtering or joining on indexed columns run faster. Indexes slow down writes (INSERT, UPDATE, DELETE) because the index must be maintained. Over-indexing is a real problem in write-heavy tables.
Q43. What is the difference between a clustered and a non-clustered index?
A: A clustered index determines the physical sort order of the table — there can be only one per table and it is usually the primary key. A non-clustered index is a separate structure that points back to the table rows. PostgreSQL uses ‘heap’ for the main table and creates all indexes as non-clustered by default.
Q44. What is query optimisation and what are the most common techniques?
A: Query optimisation reduces the time and resources a query consumes. Common techniques: filter early (apply WHERE before joining large tables), avoid SELECT *, use indexes on join and filter columns, use CTEs or temp tables to materialise intermediate results, avoid functions on indexed columns in WHERE clauses, and prefer EXISTS over IN for large subquery sets.
Q45. Why should you avoid using SELECT * in production queries?
A: SELECT * retrieves all columns, including ones not needed. This increases I/O, network transfer, and memory usage. It also breaks queries silently when table schemas change — columns reorder, new columns appear, or columns are dropped. Always specify the columns you need explicitly.
Q46. What is the difference between a view and a materialised view?
A: A view is a saved query that is executed at read time — it always reflects current data but adds execution overhead on every call. A materialised view stores the query result physically and must be refreshed periodically. Materialised views are faster to read but may not reflect the most current data.
Q47. How do you handle slowly changing dimensions (SCD) in SQL?
A: SCD refers to how you track historical changes to dimension data. SCD Type 1 overwrites the old value. SCD Type 2 inserts a new row with the new value and marks the old row as inactive with an effective date range. Type 2 is the most common in data warehouse design because it preserves history.
Q48. You have a query that runs in 45 seconds. The business needs it under 5 seconds. What do you do?
A: Start with EXPLAIN ANALYZE to read the execution plan. Identify the most expensive step. Check whether the join columns are indexed. Check whether filtering is happening before or after large joins. Look for full table scans. Consider materialising intermediate CTEs. Check whether the query can be simplified or the problem reframed to require less data.
Q49. What is a star schema and how does it differ from a snowflake schema?
A: A star schema has one central fact table connected directly to multiple dimension tables. A snowflake schema normalises dimension tables further dimensions connect to sub-dimensions. Star schemas are faster to query because they require fewer joins. Snowflake schemas are more storage-efficient but add query complexity.
Q50. How do you deduplicate a table keeping only the most recent record per entity?
A: Use ROW_NUMBER() partitioned by the entity identifier, ordered by the timestamp descending, then keep only row number 1.
WITH deduped AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM customers
)
SELECT * FROM deduped WHERE rn = 1;
How to Use This List for Interview Preparation
Use it as a self-assessment first. Read each question. Answer it out loud without looking at the answer. Then check. Mark every question where your answer was wrong, incomplete, or where you hesitated for more than ten seconds.
The questions you marked are your preparation list. Work through them in order within each category. For questions involving window functions and CTEs (categories 4 and 5), write the query on paper or in a SQL editor rather than just reading the answer. The act of writing it reinforces the syntax in a way that reading does not.
Read Also:
How to Build a 10 LPA Data Analyst Naukri Profile That Gets 5–6 Interview Calls Daily
Best 10 Job Portals for Freshers in 2026
5 Best Laptops Under ₹60,000 for Data Analysts in 2026
Job Notification Join us on Telegram: Click here
Job Notification Join us on WhatsApp: Click here