Books     Data Engineering
Intermediate SQL

Aggregate functions

Aggregate functions as the name suggests enable aggregating rows some manners. SQL supports five basic aggregations: summation, minimum and maximum values, mean value, count of data points in a data set.

Summation - SUM

SUM aggregates the sum of a certain column over an entire data set. An entire data set may mean an entire table of a filtered portion thereof. For instance, given an orders table with the following row structure (id, occurred_at, total) populated as follows


1, '2018-03-01', 9.1
2, '2016-02-01', 7.2
3, '2017-06-01', 8.4
4, '2017-03-01', 12.3

The total sales over the lifetime of the business can be aggregated as follows


SELECT SUM(total) FROM orders;

which returns 37.0 because 9.1 + 7.2 + 8.4 + 12.3 = 37.0.

The total sales for the year 2017 could be found by using a WHERE clause and a date function called date_part which extracts a date part from a date.


SELECT SUM(total)
FROM orders
WHERE date_part('year', occurred_at) = 2017;

-- 20.7

Minimum and maximum values - MIN, MAX

MIN and MAX find the minimum and maximum values over an entire data set, respectively. In the same spirit of finding the total sales over the lifetime of the business in the SUM example above, the minimum and maximum values of order totals can be found as follows


SELECT
  MIN(total) AS minimum,
  MAX(total) AS maximum
FROM orders;

-- minimum  | maximum
-- ---------+---------
-- 7.2      | 12.3

For the minimum and maximum of orders in 2017, the following query could be used


SELECT
  MIN(total) AS minimum,
  MAX(total) AS maximum
FROM orders
WHERE date_part('year', occurred_at) = 2017;

-- minimum  | maximum
-- ---------+---------
-- 8.4      | 12.3

Averaging - AVG

AVG aggregates the mean value of a certain column over an entire data set. In same spirit of finding the total sales over the lifetime of the business, the average total for an order over the life time of the business can be found as follows


SELECT
AVG(total) AS average
FROM orders;

-- average
-- ---------
-- 9.25

For the average total of orders in 2017, the following query could be used


SELECT
AVG(total) AS average
FROM orders
WHERE date_part('year', occurred_at) = 2017;

-- average
-- ------------------
-- 10.3499999046326

Counting values - COUNT

COUNT counts the rows in a data set. To find the number of orders over the lifetime of business


SELECT
COUNT (*) AS num_orders
FROM orders;

-- num_orders
-- ------------
-- 4

Note that COUNT(*) counts a row if it at least contains a single non-NULL column. We could have used COUNT(1) or COUNT(id) to achieve the same result.


SELECT
  COUNT (1) AS num_orders
FROM orders
WHERE date_part('year', occurred_at) = 2017;
-- num_orders
-- ------------
-- 2