Getting Started With SQL Part 2

Andrew Jamieson
4 min readNov 23, 2020

Joining Tables

Last time we covered some basic keywords for getting started in SQL. That on its own isn’t much use because what kind of a database only has one table?

In this blog, we are going to make SQL come alive, using multiple tables. We’ll learn about joins and aggregate functions.

Multiple Tables

We use multiple tables because we want to:

  • organize the data into groups that make sense
  • avoid repeating data
  • make changes in only one place, when we update data records
  • have data that is easy to use and fast to process
Photo by Roman Kraft on Unsplash

JOIN

The way that we connect tables together is by using a JOIN clause. Joins have two parts JOIN and ON. JOIN identifies the table that we want to add additional data from. ON specifies how to join the tables. It matches a column from each table and combines rows where the values in the columns are equal.

Continuing the example of our imaginary basketball database. Here’s an oversimplified schema or ERD (Entity Relationship Diagram)

Basketball ERD
SELECT player.last_name, game.points
FROM player
JOIN game
ON game.player_id = player.id
ORDER BY game.points DESC;

The ON clause is written in the format table1.column1 = table2.column2

Join Types

INNER JOIN: combines data that exists in both tables. This is the default JOIN in SQL.

LEFT JOIN — combines data that exists in both tables and all of the rows from the FROM table.

RIGHT JOIN — combines data that exists in both tables and all of the rows from the JOIN table. Note this join is rarely used as we could get the same effect by reversing the FROM and JOIN tables in a LEFT JOIN

OUTER JOIN — combines all data from both tables, matching rows when the is a match and placing NULL in places where there is no match.

Photo by Everyday basics on Unsplash

Primary Key

Database tables will usually have a primary key. This is a column that is unique for every row in the table. The primary key is also usually the first column in the table. Foreign keys are the primary key appearing in another table. The rows of the foreign keys are not unique.

Databases may not need primary keys if tables are unrelated, or if it is a table using to join two tables with a many-to-many relationship. A primary key can also be a combination of columns. This is called a composite key.

We use the primary-foreign key relationship to JOIN tables.

Aliases

When we JOIN tables together it can be easier to use an alias. Typing out the full table and column name quickly gets tiresome, and it takes up extra room. We can replace the full name with an alias by adding a space after the table or column name and then typing the alias — often just a single letter. You can also use the AS statement, but it is not necessary. The following lines have the same effect.

FROM table AS tFROM table t
Photo by Alexandre Debiève on Unsplash

Aggregations

Databases are great in combining and accumulating data. These methods are called aggregate functions. Here are the most common examples.

COUNT the number of rows in a column.

SUM the total of all the values in a column.

MIN and MAX the lowest and highest values.

AVG average of all the values in a column.

NULL is a case to be aware of when using aggregations. NULL is not considered a value, it is different from a zero or a blank space. It is where data doesn’t exist. (To identify NULL we use IS NULL rather than = NULL.) COUNT(*) will include NULL, counting on individual columns counts rows where that column is not NULL.

GROUP BY

Aggregate Functions aggregate across the entire table, but sometimes you only want to aggregate across part of the table. We can do this by using GROUP BY, which allows us to aggregate groups independently of each other.

Whenever we use aggregations in the SELECT statement, any column that is not aggregated must be included in the GROUP BY. GROUP BY always goes between WHERE and ORDER BY.

DISTINCT

DISTINCT is used in SELECT statements, and it gives us unique rows for all columns in the SELECT statement. You only need to use DISTINCT one time in a SELECT statement. Note that using DISTINCT can slow your queries down.

HAVING

The WHERE clause doesn’t allow you to filter on aggregate columns. If we want to filter after an aggregate, we need to use the HAVING clause.

# players who have scored over 1000 points for their team
SELECT t.name, p.last_name, SUM(g.points) points
FROM player p
JOIN game g
ON g.player_id = p.id
JOIN team t
ON t.player_id = p.id
GROUP BY t.name, p.last_name
HAVING SUM(g.points) > 1000
ORDER BY points DESC;

Query Order

Here’s a summary of how to write everything we’ve seen so far.

  1. SELECT
  2. FROM
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. HAVING
  7. ORDER BY
  8. LIMIT

In terms of execution order SELECT comes between HAVING and ORDER BY.

References:

--

--

Andrew Jamieson

Andrew has an analytics background with over 20 years of experience in various industries, working with world-leading brands.