Aggregate functions

Overview

COUNT, SUM, AVG, MIN, and MAX are aggregate functions that can be used with or without the GROUP BY and HAVING clauses. When used without GROUP BY or HAVING, these functions perform the action that they are named after. For example, let’s look at our database from the joins page:

Orders
id	description	customer_id	value
1	Water bottle	1	15.00
2	Key chain	1	7.50
3	Computer	3	2000.00
4	Thumb drive	3	25.00
5	Notebook	4	9.00
6	Shampoo		5.00
7	Paper		4.00

Customers
id	first_name	last_name	email
1	Natalie	Wright	wright@example.com
2	Ana	Sousa	sousa@example.com
3	Ben	Schwartz	schwartz@example.com
4	Chen	Xi	xi@example.com
5	Frank	Zhang	zhang@example.com
6	Tianchi	Liu	liu@example.com
7	Jake	Jons	jons@example.com

To count the number of rows in a table, for example, the orders table, you could run the following query.

SELECT COUNT(*) FROM orders;

COUNT(*)
7

COUNT(*)

As you can see, by default the column header for the resulting row is COUNT(*). To rename the column, you can create an alias.

SELECT COUNT(*) AS 'number of results' FROM orders;

number of results
7

number of results

COUNT only returns the number of rows returned by the query. If the query only returns 4 rows, 4 will be the result of COUNT.

SELECT COUNT(*) AS 'number of results' FROM orders WHERE value > 10;

number of results
3

number of results

In COUNT(*), the * is a wildcard that returns all columns from the selected table. In our examples with the orders table, this is the same as running the following query:

SELECT id, description, customer_id, value FROM orders;

You can, of course, selected a specific column, however, if there are any NULL values for the selected column, they will not be counted.

SELECT COUNT(customer_id) FROM orders;

COUNT(customer_id)
5

COUNT(customer_id)

The rest of the functions work the same way. For example, SUM returns the sum of all of the values in the selected rows.

SELECT SUM(value) AS 'our sum' FROM orders WHERE value > 10;

our sum
2040.0

our sum

2040.0

MIN, MAX, and AVG return the minimum, maximum, and average, respectively.

SELECT MIN(value) AS minimum, MAX(value) AS maximum, AVG(value) AS avg FROM orders WHERE value > 10;

minimum	maximum	avg
15.0	2000.0	680.0

minimum

maximum

avg

15.0

2000.0

680.0

`GROUP BY`

The capabilities of the aggregate functions are not realized until used in conjunction with the GROUP BY clause. The GROUP BY clause goes after the WHERE clause (if WHERE clause is present). GROUP BY forces the SQL engine to calculate the aggregate function after the data has been grouped by one or more columns. For example, both of the following queries work. The first calculates the sum of the value of all orders by the category of the order. The second does the same thing, but excludes all rows where value ⇐ 10 before performing the operations.

SELECT category, SUM(value) FROM orders GROUP BY category;

category\|SUM(value)
beauty products\|5.0
electronics\|2025.0
misc\|22.5
office\|13.0

category|SUM(value)

beauty products|5.0

electronics|2025.0

misc|22.5

office|13.0

SELECT category, SUM(value) FROM orders WHERE value > 10 GROUP BY category;

category	SUM(value)
electronics	2025.0
misc	15.0

category	COUNT(*)
beauty products	1
electronics	2
misc	2
office	2

category	COUNT(*)	SUM(value)	MIN(value)	MAX(value)	AVG(value)
beauty products	1	5.0	5.0	5.0	5.0
electronics	2	2025.0	25.0	2000.0	1012.5
misc	2	22.5	7.5	15.0	11.25
office	2	13.0	4.0	9.0	6.5

`HAVING`

HAVING adds one more tool to the tool chest. HAVING is similar to the WHERE clause but instead of being applied to every row before grouping, it is applied on the aggregated data. A single query can have both a WHERE clause and a HAVING clause. The WHERE clause will filter the data before aggregation, and the HAVING clause will filter the data after aggregation. The following is an example that tries to demonstrate this slightly subtle difference.

In this example, the SUM of the value is calculated for each category, and afterwards, only results are included where the SUM of the value is greater than 10.

SELECT
	category,
	SUM(value)
FROM
	orders
GROUP BY
	category
HAVING
	value > 10;

The following are also equivalent:

SELECT
	category,
	SUM(value)
FROM
	orders
GROUP BY
	category
HAVING
	SUM(value) > 10;

SELECT
	category,
	SUM(value) AS 's_value'
FROM
	orders
GROUP BY
	category
HAVING
	s_value > 10;

category	sum(value)
electronics	2025.0
misc	22.5

category	sum(value)
electronics	2025.0
misc	15.0

Examples

All of the following examples are taken from the chinook database found at /anvil/projects/tdm/data/chinook/Chinook_Sqlite.sqlite

Find the average length (in seconds) of the tracks with genre `Pop`.

Click to see solution

SELECT
	AVG(Milliseconds / 1000.0) AS avg
FROM
	Track
WHERE
	genreId = 9;

avg
229.0341

avg

229.0341

Figure out what the longest track (in seconds) is for the "Bossa Nova" genre.

Click to see solution

SELECT
	GenreId
FROM
	Genre
WHERE
	Name = 'Bossa Nova';

GenreId
11

GenreId

SELECT
	*,
	MAX(Milliseconds / 1000.0) AS Seconds
FROM
	Track
WHERE
	genreId = 11;

TrackId	Name	AlbumId	MediaTypeId	GenreId	Composer	Milliseconds	Bytes	UnitPrice	Seconds
646	Samba Da Bênção	52	1	11	NA	409965	13490008	0.99	409.965

TrackId

Name

AlbumId

MediaTypeId

GenreId

Composer

Milliseconds

Bytes

UnitPrice

Seconds

646

Samba Da Bênção

409965

13490008

0.99

409.965

Get the average price per hour for Bossa Nova music (`genreId` of 11).

Click to see solution

SELECT
	AVG(UnitPrice / (Milliseconds / 1000.0 / 3600)) AS 'Price per Hour'
FROM
	Track
WHERE
	genreId = 11;

Price per Hour
17.7131539609759

Price per Hour

17.7131539609759

Get the average time (in seconds) for tracks by genre.

Click to see solution

SELECT
	genreId,
	AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
FROM
	Track
GROUP BY
	genreId;

GenreId	Average seconds per track
1	283.9100
2	291.7554
3	309.7494
4	234.3538
5	134.6435
6	270.3598
7	232.8593
8	247.1778
9	229.0341
10	244.3709

You can take this 1 step further by getting the genre names. This requires using an inner join, however.

SELECT
	g.Name,
	track_time. 'Average seconds per track'
FROM
	Genre AS g
	INNER JOIN (
		SELECT
			genreId,
			AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
		FROM
			Track
		GROUP BY
			genreId) AS track_time ON g.GenreId = track_time.GenreId
ORDER BY
	track_time. 'Average seconds per track' DESC;

Name	Average seconds per track
Sci Fi & Fantasy	2911.7830
Science Fiction	2625.5491
Drama	2575.2838
TV Shows	2145.0410
Comedy	1585.2637
Metal	309.7494
Electronica/Dance	302.9858
Heavy Metal	297.4529
Classical	293.8676
Jazz	291.7554

Figure out the average price per track for each genre.

Click to see solution

SELECT
	genreId,
	AVG(UnitPrice) AS 'Average price per track'
FROM
	Track
GROUP BY
	genreId;

GenreId	Average price per track
1	0.99
2	0.99
3	0.99
4	0.99
5	0.99
6	0.99
7	0.99
8	0.99
9	0.99
10	0.99

What is the average number of tracks per album?

Click to see solution

SELECT
	AVG(trackCount)
FROM (
	SELECT
		COUNT(*) AS trackCount
	FROM
		Track
	GROUP BY
		albumId) AS track_count;

AVG(trackCount)
10.0951

AVG(trackCount)

10.0951

Figure out the average number of tracks per album genre.

Click to see solution

SELECT
	genreId,
	AVG(trackCount)
FROM (
	SELECT
		genreId,
		COUNT(*) AS trackCount
	FROM
		Track
	GROUP BY
		albumId) AS track_count
GROUP BY
	genreId;

genreId	AVG(trackCount)
1	11.41379
2	10.00000
3	10.90625
4	14.43478
5	12.00000
6	13.85714
7	14.81579
8	15.00000
9	16.00000
10	10.75000

Of course, we can add the names of genres in, using joins.

SELECT
	Name,
	avg_track_count. 'Average Track Count'
FROM
	Genre AS g
	INNER JOIN (
		SELECT
			genreId,
			AVG(trackCount) AS 'Average Track Count'
		FROM (
			SELECT
				genreId,
				COUNT(*) AS trackCount
			FROM
				Track
			GROUP BY
				albumId) AS track_count
		GROUP BY
			genreId) AS avg_track_count ON g.GenreId = avg_track_count.genreId;

Name	Average Track Count
Rock	11.41379
Jazz	10.00000
Metal	10.90625
Alternative & Punk	14.43478
Rock And Roll	12.00000
Blues	13.85714
Latin	14.81579
Reggae	15.00000
Pop	16.00000
Soundtrack	10.75000

Aggregate functions

Overview

GROUP BY

HAVING

Examples

Find the average length (in seconds) of the tracks with genre Pop.

Figure out what the longest track (in seconds) is for the "Bossa Nova" genre.

Get the average price per hour for Bossa Nova music (genreId of 11).

Get the average time (in seconds) for tracks by genre.

Figure out the average price per track for each genre.

What is the average number of tracks per album?

Figure out the average number of tracks per album genre.

`GROUP BY`

`HAVING`

Find the average length (in seconds) of the tracks with genre `Pop`.

Get the average price per hour for Bossa Nova music (`genreId` of 11).