Aggregate functions

The GROUP BY clause
The HAVING clause

The aggregate functions – COUNT, SUM, AVG, MAX, MIN and LIST – don't handle NULL in the same way as ordinary functions and operators. Instead of returning NULL as soon as a NULL operand is encountered, they only take non-NULL fields into consideration while computing the outcome. That is, if you have this table:

MyTable
ID	Name	Amount
1	John	37
2	Jack	`NULL`
3	Jim	5
4	Joe	12
5	Josh	`NULL`

...the statement select sum(Amount) from MyTable returns 54, which is 37 + 5 + 12. Had all five fields been summed, the result would have been NULL. For AVG, the non-NULL fields are summed and the sum divided by the number of non-NULL fields.

There is one exception to this rule: COUNT(*) returns the count of all rows, even rows whose fields are all NULL. But COUNT(FieldName) behaves like the other aggregate functions in that it only counts rows where the specified field is not NULL.

Another thing worth knowing is that COUNT(*) and COUNT(FieldName) never return NULL: if there are no rows in the set, both functions return 0. COUNT(FieldName) also returns 0 if all FieldName fields in the set are NULL. The other aggregate functions return NULL in such cases. Be warned that SUM even returns NULL if used on an empty set, which is contrary to common logic (if there are no rows, the average, maximum and minimum are undefined, but the sum is known to be zero).

Now let's put all that knowledge in a table for your easy reference:

Table 7. Aggregate function results with different column states

Function	Results
Function	Empty set	All-null set or column	Other sets or columns
COUNT(*)	0	Total number of rows	Total number of rows
COUNT(Field)	0	0	Number of rows where Field is not `NULL`
MAX, MIN	`NULL`	`NULL`	Max or min value found in the column
SUM	`NULL`	`NULL`	Sum of non-`NULL` values in the column
AVG	`NULL`	`NULL`	Average of non-`NULL` values in the column. This equals SUM(Field) / COUNT(Field).^[a]
LIST^[b]	`NULL`	`NULL`	Comma-separated string concatenation of non-`NULL` values in the column
^[a]If Field is of an integer type, AVG is always rounded towards 0. For instance, 6 non-null INT records with a sum of -11 yield an average of -1, not -2. ^[b]LIST was added in Firebird 2.1

The GROUP BY clause

A GROUP BY clause doesn't change the aggregate function logic described above, except that it is now applied to each group individually rather than to the result set as a whole. Suppose you have a table Employee, with fields Dept and Salary which both allow NULLs, and you run this query:

SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept

The result may look like this (the row where Dept is <null> may be at the top or bottom, depending on your Firebird version):

DEPT                     SUM
====== =====================
<null>             219465.19
000                266643.00
100                155262.50
110                130442.81
115              13480000.00
120                   <null>
121                110000.00
123                390500.00

First notice that the people whose department is unknown (NULL) are grouped together, although you can't say that they have the same value in the Dept field. But the alternative would have been to give each of those records a “group” of their own. Not only would this possibly add a huge number of lines to the output, but it would also defeat the purpose of grouping: those lines wouldn't be aggregates, but simple “SELECT Dept, Salary” rows. So it makes sense to group the NULL depts by their state and the rest by their value.

Anyway, the Dept field is not what interests us most. What does the aggregate SUM column tell us? That all salaries are non-NULL, except in department 120? No. All we can say is that in every department except 120, there is at least one employee with a known salary in the database. Each department may contain NULL salaries; in dept. 120 all the salaries are NULL.

You can find out more by throwing in one or more COUNT() columns. For instance, if you want to know the number of NULL salaries in each group, add a column “COUNT(*) – COUNT(Salary)”.

Counting frequencies

A GROUP BY clause can be used to report the frequencies with which values occur in a table. In that case you use the same field name several times in the query statement. Let's say you have a table TT with a column A whose contents are { 3, 8, NULL, 6, 8, -1, NULL, 3, 1 }. To get a frequencies report, you could use:

SELECT A, COUNT(A) FROM TT GROUP BY A

which would give you this result:

           A        COUNT
============ ============
          -1            1
           1            1
           3            2
           6            1
           8            2
      <null>            0

Oops – something went wrong with the NULL count, but what? Remember that COUNT(FieldName) skips all NULL fields, so with COUNT(A) the count of the <null> group can only ever be 0. Reformulate your query like this:

SELECT A, COUNT(*) FROM TT GROUP BY A

and the correct value will be returned (in casu 2).

The HAVING clause

HAVING clauses can place extra restrictions on the output rows of an aggregate query – just like WHERE clauses do in record-by-record queries. A HAVING clause can impose conditions on any output column or combination of columns, aggregate or not.

As far as NULL is concerned, the following two facts are worth knowing (and hardly surprising, I would guess):

Rows for which the HAVING condition evaluates to NULL won't be included in the result set. (“Only true is good enough.”)
“HAVING <col> IS [NOT] NULL” is a legal and often useful condition, whether <col> is aggregate or not. (But if <col> is non-aggregate, you may save the engine some work by changing HAVING to WHERE and placing the condition before the “GROUP BY” clause. This goes for any condition on non-aggregate columns.)

For instance, adding the following clause to the example query from the “GROUP BY” paragraph:

...HAVING Dept IS NOT NULL

will prevent the first row from being output, whereas this one:

...HAVING SUM(Salary) IS NOT NULL

suppresses the sixth row (the one with Dept = 120).