Qualify

Snowflake QUALIFY function

In a SELECT statement, the QUALIFY clause filters the results of window functions.

QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses.

In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed.

QUALIFY <predicate>

The QUALIFY clause simplifies queries that require filtering on the result of window functions. Without QUALIFY, filtering requires nesting. 

The QUALIFY clause requires at least one window function to be specified in at least one of the following clauses of the SELECT statement:

  • The SELECT column list.
  • The filter predicate of the QUALIFY clause.
  • Expressions in the SELECT list, including window functions, can be referred to by the column alias defined in the SELECT list.
  • QUALIFY supports aggregates and subqueries in the predicate. For aggregates, the same rules as for the HAVING clause apply.
  • The word QUALIFY is a reserved word.
  • The Snowflake syntax for QUALIFY is not part of the ANSI standard.

The QUALIFY clause simplifies SQL queries by eliminating the need for nested subqueries when filtering results based on window functions.

Aggregate functions can be used in the predicate per the same rules as the HAVING clause. This includes filtering based on a window aggregate calculation.

For example:

         QUALIFY SUM(amount) OVER (PARTITION BY id) > 100

Filter based on window function in SELECT:

SELECT ...,
  ROW_NUMBER() OVER (PARTITION BY id) AS row_num 
FROM data
QUALIFY row_num BETWEEN 1 AND 5

Snowflake QUALIFY enables elegant filtering based on window functions without subqueries. Its syntax fits cleanly within standard SELECT statements.

Using QUALIFY instead of WHERE

select 
        salesperson_id, 
        region, 
        sales_amount, 
        sales_date,
        row_number() over (partition by sales_date order by sales_amount desc) sales_date_rank
from sales
qualify sales_date_rank = 1;

Order in which Snowflake process SELECT statement

It is important to note that any filtering in the from and where clause happen before any filtering in qualify. Also, any aggregation and aggregate filtering (group by and having) come before this as well.

EXAMPLE Expand source

-- use the ROW_NUMBER() function to return only the first row in each partition. Note how the window function can be used with QUALIFY (first example) or in the SELECT clause with a simple QUALIFY (second example)
SELECT i, p, o
    FROM qt
    QUALIFY ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) = 1
    ;
SELECT i, p, o, ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) AS row_num
    FROM qt
    QUALIFY row_num = 1
    ;

EXAMPLE – Duplicate records Expand source

select * 
from my_table
qualify count(*) over (partition by id) > 1
order by id;

EXAMPLE Expand source

--The QUALIFY clause can also be combined with aggregates and can have subqueries in the predicate. 
SELECT c2, SUM(c3) OVER (PARTITION BY c2) as r
  FROM t1
  WHERE c3 < 4
  GROUP BY c2, c3
  HAVING SUM(c1) > 3
  QUALIFY r IN (
                SELECT MIN(c1)
                  FROM test
                 GROUP BY c2
                 HAVING MIN(c1) > 3
               );

EXAMPLE – with and without QUALIFY Expand source

-- Without QUALIFY.    Use a nested query with a window function and an outer query that filters these results:
SELECT *
FROM (
  SELECT store_id, sales_date, total_sales,
         ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_sales DESC) AS rank
  FROM store_sales
) AS subquery
WHERE rank <= 3;
-- With QUALIFY, you can incorporate this filtering directly in the main query, which simplifies the overall approach
SELECT  store_id, sales_date, total_sales
FROM    store_sales
QUALIFY ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_sales DESC) <= 3;
-- OR if we move the window function to SELECT clause
SELECT  store_id, sales_date, total_sale
        ROW_NUMBER() OVER (PARTITION BY store_id ORDER BY total_sales DESC) as rank
FROM    store_sales
QUALIFY rank <= 3;

Third highest salary

WINDOWS functions

1. Aggregate Window Functions

Aggregate window functions compute a single result from a set of input values, similar to aggregate functions used with GROUP BY. However, they operate on a window of rows defined by the OVER() clause.

  • SUM: Calculates the total sum of a numeric column over a specified window.
  • AVG: Computes the average value of a numeric column over a specified window.
  • COUNT: Returns the number of rows in a specified window.

2. Ranking Window Functions

Ranking window functions assign a rank to each row within a partition of a result set, based on the values of specified columns.

  • ROW_NUMBER: Assigns a unique number to each row within a partition, starting from 1.
  • RANK: Assigns a rank to each row within a partition, with the same rank assigned to rows with equal values.
  • DENSE_RANK: Similar to RANK, but without gaps in the ranking sequence for rows with equal values.

3. Value Window Functions

Value window functions return a value for each row based on the values of other rows in the same window.

  • FIRST_VALUE: Returns the first value in a specified window of rows.
  • LAST_VALUE: Returns the last value in a specified window of rows.
  • LAG: Returns the value of a specified column from a preceding row in the window.

4. Statistical Window Functions

Statistical window functions perform statistical calculations over a specified window of rows.

  • STDDEV: Computes the standard deviation of a numeric column over a specified window.
  • VARIANCE: Calculates the variance of a numeric column over a specified window.
  • COVAR: Computes the covariance between two numeric columns over a specified window.

5. Percentile Window Functions

Percentile window functions calculate percentile values for a specified window of rows.

  • PERCENT_RANK: Computes the relative rank of a row within a partition as a percentage.
  • CUME_DIST: Calculates the cumulative distribution of a value within a partition.
  • NTILE: Divides rows in a partition into a specified number of groups and assigns a group number to each row.

6. Navigation Window Functions

Navigation window functions return a value from a specified row within a window.

  • LEAD: Returns the value of a specified column from a following row in the window.
  • NTH_VALUE: Returns the value of a specified column from the nth row in the window.
  • NTILE: Divides rows in a partition into a specified number of groups and assigns a group number to each row.

7. Custom Window Functions

Custom window functions allow users to define their own calculations and logic for window functions, providing flexibility and customization for specific data analysis tasks.

  • Custom Aggregates: Users can define custom aggregate functions to perform specific calculations over a window of rows.
  • Custom Rankings: Custom ranking functions can be defined to assign ranks based on user-defined criteria.
  • Custom Navigation: Users can create custom navigation functions to return values from specific rows within a window.

BEST PRACTICE

How to use the QUALIFY clause effectively in Snowflake?

Using the QUALIFY clause effectively in Snowflake requires an understanding of its syntax, structure, and best practices. By following a step-by-step approach, users can leverage the full power of the QUALIFY clause for complex data analysis tasks.

1. Understand the Execution Order

Before using the QUALIFY clause, it’s important to understand its place in the execution order of a query. QUALIFY is evaluated after window functions, so ensure that your window functions are correctly defined and computed before applying QUALIFY.

2. Define the Window Functions

To use QUALIFY, you must have at least one window function in your SELECT statement. Define the window functions you need for your analysis, ensuring that they are correctly specified with the appropriate PARTITION BY and ORDER BY clauses.

3. Specify the QUALIFY Predicate

After defining the window functions, specify the QUALIFY predicate to filter the results. The predicate can involve multiple window functions, aggregate functions, or ranking functions, allowing for complex filtering criteria.

4. Test and Validate the Query

Once the QUALIFY clause is specified, test and validate the query to ensure that it produces the desired results. Check for any syntax errors or logical issues, and make necessary adjustments to the window functions or QUALIFY predicate as needed.

5. Optimize for Performance

For large datasets, performance optimization is crucial. Consider indexing relevant columns, minimizing the number of window functions used, and optimizing the query structure to improve execution times and reduce resource usage.

6. Combine with Other SQL Clauses

QUALIFY can be combined with other SQL clauses, such as SELECT, WHERE, GROUP BY, and HAVING, to create more sophisticated queries. Leverage the full power of SQL by integrating QUALIFY into your existing workflows and data analysis tasks.