Last time we covered some basic keywords for getting started in SQL. That on its own isn’t much use because what kind of a database only has one table?
In this blog, we are going to make SQL come alive, using multiple tables. We’ll learn about joins and aggregate functions.
We use multiple tables because we want to:
- organize the data into groups that make sense
- avoid repeating data
- make changes in only one place, when we update data records
- have data that is easy to use and fast to process
The way that we connect tables together is by using a JOIN clause. Joins have two parts JOIN and ON. JOIN identifies the table that we want to add additional data from. ON specifies how to join the tables. It matches a column from each table and combines rows where the values in the columns are equal.
Continuing the example of our imaginary basketball database. Here’s an oversimplified schema or ERD (Entity Relationship Diagram)
SELECT player.last_name, game.points
ON game.player_id = player.id
ORDER BY game.points DESC;
The ON clause is written in the format table1.column1 = table2.column2
INNER JOIN: combines data that exists in both tables. This is the default JOIN in SQL.
LEFT JOIN — combines data that exists in both tables and all of the rows from the FROM table.
RIGHT JOIN — combines data that exists in both tables and all of the rows from the JOIN table. Note this join is rarely used as we could get the same effect by reversing the FROM and JOIN tables in a LEFT JOIN
OUTER JOIN — combines all data from both tables, matching rows when the is a match and placing NULL in places where there is no match.
Database tables will usually have a primary key. This is a column that is unique for every row in the table. The primary key is also usually the first column in the table. Foreign keys are the primary key appearing in another table. The rows of the foreign keys are not unique.
Databases may not need primary keys if tables are unrelated, or if it is a table using to join two tables with a many-to-many relationship. A primary key can also be a combination of columns. This is called a composite key.
We use the primary-foreign key relationship to JOIN tables.
When we JOIN tables together it can be easier to use an alias. Typing out the full table and column name quickly gets tiresome, and it takes up extra room. We can replace the full name with an alias by adding a space after the table or column name and then typing the alias — often just a single letter. You can also use the AS statement, but it is not necessary. The following lines have the same effect.
FROM table AS tFROM table t
Databases are great in combining and accumulating data. These methods are called aggregate functions. Here are the most common examples.
COUNT the number of rows in a column.
SUM the total of all the values in a column.
AVG average of all the values in a column.
NULL is a case to be aware of when using aggregations. NULL is not considered a value, it is different from a zero or a blank space. It is where data doesn’t exist. (To identify NULL we use IS NULL rather than = NULL.) COUNT(*) will include NULL, counting on individual columns counts rows where that column is not NULL.
Aggregate Functions aggregate across the entire table, but sometimes you only want to aggregate across part of the table. We can do this by using GROUP BY, which allows us to aggregate groups independently of each other.
Whenever we use aggregations in the SELECT statement, any column that is not aggregated must be included in the GROUP BY. GROUP BY always goes between WHERE and ORDER BY.
DISTINCT is used in SELECT statements, and it gives us unique rows for all columns in the SELECT statement. You only need to use DISTINCT one time in a SELECT statement. Note that using DISTINCT can slow your queries down.
The WHERE clause doesn’t allow you to filter on aggregate columns. If we want to filter after an aggregate, we need to use the HAVING clause.
# players who have scored over 1000 points for their team
SELECT t.name, p.last_name, SUM(g.points) points
FROM player p
JOIN game g
ON g.player_id = p.id
JOIN team t
ON t.player_id = p.id
GROUP BY t.name, p.last_name
HAVING SUM(g.points) > 1000
ORDER BY points DESC;
Here’s a summary of how to write everything we’ve seen so far.
- GROUP BY
- ORDER BY
In terms of execution order SELECT comes between HAVING and ORDER BY.
Putting it together | Intermediate SQL - Mode Analytics
Welcome to the Intermediate SQL Tutorial! If you skipped the Basic SQL Tutorial, you should take a quick peek at this…