SQL Server Essentials: Using the SQL GROUP BY Clause

Programming-code

SQL Server’s T-SQL query language has a plethora of data retrieval options. That includes the ability to query a SQL database, as well as providing a sum function, a count aggregate function, and grouping columns. The SQL GROUP BY clause can make use of all of these capabilities. In this article, I’ll show you how the GROUP BY clause can be used by itself or in conjunction with one of the aggregate functions to perform calculations based on the grouping.

How does the SQL GROUP BY clause work?

The SQL GROUP BY clause is used with the SELECT statement to arrange data into groups of identical data. It follows clause follows the WHERE clause in a SELECT statement.  

This clause is most often used with aggregate functions like SUM, COUNT, AVG, MAX, or MIN, which allow you to perform calculations with grouped data. You can group the data by any of the different column types including numbers like product IDs, dates, or characters like descriptions.

The basic GROUP BY syntax is quite simple:

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

Here, you can see the GROUP BY clause used with a basic SELECT statement. The SELECT statement retrieves two columns, and the second column uses an aggregation function.

The use of an aggregation function is not technically required, but it’s a very common adjunct to the GROUP BY clause. The grouping will be performed on the data contained in the first column.

Let’s dive in deeper and see the GROUP BY clause in action.

Using SQL GROUP BY with a SELECT statement

First, let’s look at just using the SQL GROUP BY clause with a simple SELECT statement without adding an aggregation function. The following listing retrieves and groups the ‘Product ID’ column from the SalesLT.SalesOrderDetail table in the AdventureWorksLT2019 sample database.

USE AdventureWorksLT2019;

SELECT ProductID FROM SalesLT.SalesOrderDetail
  GROUP BY ProductID;

This essentially returns a list of all of the ProductIDs that have been ordered, as you can see in the following figure.

SQL Group By clause example
We get a list of all of the ProductIDs that have been ordered (Image credit: Petri/Michael Otey)

Using SQL GROUP BY with aggregate functions

It’s more common to use the SQL GROUP BY clause along with one of the aggregation functions that can give you a count, total, average, or summary of all the columns that you have grouped together. The following statement uses GROUP BY with the SalesLT.Product table from the AdventureWorksLT2019 database to find the total number of products in each product category:

USE AdventureWorksLT2019;

SELECT ProductCategoryID, COUNT(*) AS TotalProducts
  FROM SalesLT.Product
   GROUP BY ProductCategoryID;

In this example, the SELECT statement returns the ‘ProductCategoryID’ column from the SalesLT.Product table, and it counts the number of products in each category using the COUNT(*) aggregate function. Here, the GROUP BY clause groups the results by the ‘ProductCategoryID’ while the SELECT COUNT function calculates the total number of products in each product category. The results of the COUNT(*) function use the alias ‘TotalProducts’.

The results of this query are shown in the following figure. This output shows the total number of products in each product category, as calculated using the GROUP BY clause. The summary information is returned by the COUNT(*) aggregate function.

Using SQL GROUP BY with aggregate functions
Using SQL GROUP BY with aggregate functions (Image credit: Petri/Michael Otey)

Using SQL GROUP BY with multiple columns

The SQL GROUP BY function is not limited to working with a single column. It can also be used with multiple columns simultaneously in the same query.

Let’s take a look at an example of using the GROUP BY clause with two columns from the SalesLT.Product table in the AdventureWorksLT2019 database. The following query shows how to retrieve the total number of products for each product category and product model.

USE AdventureWorksLT2019;

SELECT ProductCategoryID, ProductModelID, COUNT(*) AS TotalProducts
  FROM SalesLT.Product
  GROUP BY ProductCategoryID, ProductModelID;

Here, the ‘ProductCategoryID’ and ‘ProductModelID’ columns are both retrieved and we’re using the COUNT(*) aggregate function to count the number of products in each product category and subcategory. In this example, the GROUP BY clause groups the results of both the ‘ProductCategoryID’ and ‘ProductModelID’ columns. This creates two distinct groups based on the two different columns.

You can see the results of this query in the following figure. These results show the total number of products and product models and the total count is retrieved using the column alias of TotalProducts.

Using SQL GROUP BY with multiple columns
Using SQL GROUP BY with multiple columns (Image credit: Petri/Michael Otey)

Using SQL GROUP BY with the HAVING clause

Another common use of the GROUP BY clause is to combine it with the HAVING clause. Indeed, you can use the GROUP BY clause with the HAVING clause to filter the results of a query based on the conditions the HAVING clause applies to the data groupings. The condition can be applied to a regular data type column or to an aggregate function.

Let’s look at an example of using the SQL GROUP BY clause along with the HAVING clause with the AdventureWorksLT2019 database. This example shows which customers have placed orders in the SalesLT.SalesOrderHeader table with a total order amount greater than $2000. Here, the GROUP BY clause is used to group orders by customer, and then the HAVING clause filters out customers based on the total order amount.

USE AdventureWorksLT2019;

SELECT  c.CustomerID, c.FirstName, c.LastName, SUM(soh.TotalDue) AS TotalOrderAmount
  FROM SalesLT.Customer c
 JOIN SalesLT.SalesOrderHeader soh ON c.CustomerID = soh.CustomerID
  GROUP BY c.CustomerID, c.FirstName, c.LastName
 HAVING SUM(soh.TotalDue) > 2000;
  • In this code listing, the SalesLT.Customer table is joined with the SalesLT.SalesOrderHeader table on the ‘CustomerID’ column.
  • The SELECT list retrieves the ‘CustomerID’, ‘FirstName’, ‘LastName’, and ‘TotalOrderAmount’.
  • The GROUP BY clause is used to group the results by customer.
  • The SUM() aggregate function calculates the total order amount for each customer.
  • The HAVING clause filters the grouped results to only pick up customers whose total order amount is greater than $2000.

You can see the results of this code listing in the following figure:

SQLGroupBy4
Using SQL GROUP BY with the HAVING clause (Image credit: Petri/Michael Otey)

Using SQL GROUP BY with the ORDER BY clause

You can also use SQL GROUP BY clause with the ORDER BY clause to sort the result set by one or more columns. The following example uses the AdventureWorksLT2019 database. To illustrate the use of ORDER BY clause with the GROUP BY clause, you can simply take the previous example code and add the ORDER BY clause to sort the results displaying the largest order amounts in descending order.

USE AdventureWorksLT2019;

SELECT  c.CustomerID, c.FirstName, c.LastName, SUM(soh.TotalDue) AS TotalOrderAmount
  FROM SalesLT.Customer c
   JOIN SalesLT.SalesOrderHeader soh ON c.CustomerID = soh.CustomerID
  GROUP BY c.CustomerID, c.FirstName, c.LastName
   HAVING SUM(soh.TotalDue) > 2000
   ORDER BY TotalOrderAmount DESC;

You can see the results in the following figure. These results show the list of customers with orders over $2000 in descending order, with the customers having the largest orders at the top.

Using SQL GROUP BY with the ORDER BY clause
Using SQL GROUP BY with the ORDER BY clause (Image credit: Petri/Michael Otey)

As you can see, you can do quite a lot of things with the SQL GROUP BY clause. I showed you how the GROUP BY clause can be used with multiple columns, as well as how the HAVING and ORDER BY clauses can be added to filter and sort the results.

As usual, feel free to ask any questions about the SQL GROUP BY clause. And if you want to learn more about SQL Server, feel free to check out previous articles in my SQL Server Essential series.