SQL Questions - Ready for Interview

June 26, 2024

1. Use case for each of the functions Rank, Dense_Rank & Row_Number

Row Number - the ROW_NUMBER() function is used to assign a unique integer to every row that is returned by a query.

Syntax:

ROW_NUMBER() OVER(
    [PARTITION BY column_1, column_2, …]
    [ORDER BY column_3, column_4, …]
)

Let’s analyze the above syntax:

The set of rows on which the ROW_NUMBER() function operates is called a window.
The PARTITION BY clause is used to divide the query set results.
The ORDER BY clause inside the OVER clause is used to set the order in which the query result will be displayed.

Query Format:

SELECT
    mammal_id,
    mammal_name,
    animal_id,
    ROW_NUMBER () OVER (
           ORDER BY mammal_name
        )
FROM
    Mammals;

Rank:

The RANK() function assigns a rank to every row within a partition of a result set.

For each partition, the rank of the first row is 1. The RANK() function adds the number of tied rows to the tied rank to calculate the rank of the next row, so the ranks may not be sequential. In addition, rows with the same values will get the same rank.

Syntax:

RANK() OVER ( [PARTITION BY partition_expression, ... ] ORDER BY sort_expression [ASC | DESC], ... )

In this syntax:

First, the PARTITION BY clause distributes rows of the result set into partitions to which the RANK() function is applied.
Then, the ORDER BY clause specifies the order of rows in each a partition to which the function is applied.

The RANK() function can be useful for creating top-N and bottom-N reports.

Assigns a unique number for each row starting with 1,except for rows that have duplicate values,in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.

Important Point: This function is very similar to the ROW_NUMBER() function. The only difference is that identical rows are marked with the same rank. Also, please note that if the function skipped a number while ranking (because of row similarity), that rank will be skipped.

Dense Rank:

When using DENSE_RANK, the same rules apply as we described for RANK(), with one difference - when similar rows are detected, the next rank in line isn't skipped.

Example:

1	`select` `name, DENSE_RANK() OVER(ORDER` `BY` `name)` `from` `fruits;`

Union vs Union All

Both UNION and UNION ALL combine the result of two or more tables.
The result set of UNION does not contain duplicate rows, while the result set of UNION ALL returns all the rows from both tables.
The execution time of UNION ALL is less than the execution time of UNION as it does not remove the duplicate rows.

Search This Blog

The Future of QA Engineering: Harnessing Spark Hadoop in Cloud Environments

SQL Questions - Ready for Interview

Example:

Comments

Post a Comment

Popular posts from this blog

Spark Debugging And performance Optimzation

Spark File Format

Basics of Spark