DataBricks Series - SQL Fundamentals
SQL (Structured Query Language) is a standardized programming language used to manage, query, and
manipulate relational databases. It allows users to retrieve, update, insert, and delete data efficiently.
If you're learning SQL, i recommending focusing on following topics:
- SELECT + WHERE + JOIN (most used)
- GROUP BY + HAVING (summarizing data)
- ORDER BY + LIMIT (controlling results)
- CTEs + Subqueries (for complex logic)
- Window Functions (advanced analytics)
- Indexes (for performance tuning)
We will be focusing on most used concepts - which upon mastery will allow you to accomplish 80% of the
- Basic Querying (SELECT, FROM, WHERE)
Lets assume we are tasked with loading data from students table, which have achieved grades > 85.
Select *
FROM students
WHERE grade >= 85;In this query Select * tells the SQL engine to load all available columns in the table students and apply the filter to the grade column.
- Filtering Data (WHERE, AND, OR, IN, LIKE, BETWEEN)
SELECT * FROM students
WHERE grade BETWEEN 80 AND 90
AND department IN ('Computer Science', 'Math');- Sorting and Limiting Results (ORDER BY, LIMIT)
SELECT * FROM students
ORDER BY grade DESC
LIMIT 10;- Aggregation (COUNT, SUM, AVG, MIN, MAX, GROUP BY, HAVING)
SELECT department, AVG(grade) AS avg_grade
FROM students
GROUP BY department
HAVING AVG(grade) > 80;Here we can replace AVG by any of the following:
- COUNT (count rows)
- SUM (total sum of values)
- AVG (average value)
- MIN/MAX (smallest/largest value)
- GROUP BY (aggregates by category)
- HAVING (filters aggregated results)
5. Joins (INNER, LEFT, RIGHT, FULL)
SELECT s.first_name, s.last_name, c.course_name
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id;- Subqueries (Nested Queries)
SELECT first_name, last_name
FROM students
WHERE student_id IN (
SELECT student_id FROM enrollments WHERE course_id = 101
);- Common Table Expressions (CTEs)
WITH top_students AS (
SELECT student_id, grade FROM students WHERE grade > 90
)
SELECT s.first_name, s.last_name, t.grade
FROM top_students t
JOIN students s ON s.student_id = t.student_id;WITH (defines temporary query results)
- Window Functions (OVER, PARTITION BY)
SELECT student_id, grade,
FROM enrollments;
RANK() OVER (PARTITION BY course_id ORDER BY grade DESC) AS rankConcepts to know:
- OVER (defines how results are computed)
- PARTITION BY (grouping for window calculations)
- RANK(), DENSE_RANK(), ROW_NUMBER() (ranking functions)
- Updating Data (INSERT, UPDATE, DELETE)
UPDATE students
SET grade = 95
WHERE student_id = 101;
INSERT INTO students (first_name, last_name, grade)
VALUES ('John', 'Doe', 88);
DELETE FROM students WHERE grade < 50;Extensions that we need to know that are super important.
- INSERT INTO (adds new records)
- UPDATE (modifies existing records)
- DELETE (removes records)
Note: Always use WHERE with UPDATE and DELETE to prevent accidental changes!
- Indexing for Performance (INDEX, EXPLAIN)
CREATE INDEX idx_students ON students (last_name);Concepts to know:
- CREATE INDEX (speeds up searches)
- EXPLAIN (analyzes query execution plan)