Posts

Showing posts from June, 2024

SQL Questions - Ready for Interview

1. Use case for each of the functions Rank, Dense_Rank & Row_Number

Row Number: The ROW_NUMBER() function assigns a unique sequential integer to every row returned by a query.

Syntax: ROW_NUMBER() OVER ( [PARTITION BY column_1, column_2, …] [ORDER BY column_3, column_4, …] )

Let’s analyze the above syntax: The set of rows on which the ROW_NUMBER() function operates is called a window. The PARTITION BY clause divides the result set into partitions, and the numbering restarts at 1 in each partition. The ORDER BY clause inside the OVER clause sets the order in which row numbers are assigned within each partition; on its own it does not guarantee the order in which the final result is displayed.

Query Format: SELECT mammal_id, mammal_name, animal_id, ROW_NUMBER() OVER ( ORDER BY mammal_name ) FROM Mammals;

Rank: The RANK() function assigns a rank to every row within a partition of a result set. For each partition, the rank of the first row is 1. The RANK() ...
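To see how the three functions differ on tied values, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later). The Mammals table and its columns come from the query above, but the sample rows are made up for illustration, and the ordering by animal_id is only chosen so that two rows tie.

```python
import sqlite3

# In-memory database; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Mammals (mammal_id INTEGER, mammal_name TEXT, animal_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Mammals VALUES (?, ?, ?)",
    [(1, "Bat", 10), (2, "Cat", 10), (3, "Dog", 20), (4, "Elk", 30)],
)

# Bat and Cat share animal_id = 10, so they tie in the ranking.
# ROW_NUMBER gets mammal_id as a tie-breaker to keep its output deterministic.
rows = conn.execute(
    """
    SELECT mammal_name,
           ROW_NUMBER() OVER (ORDER BY animal_id, mammal_id) AS row_num,
           RANK()       OVER (ORDER BY animal_id)            AS rnk,
           DENSE_RANK() OVER (ORDER BY animal_id)            AS dense_rnk
    FROM Mammals
    ORDER BY mammal_id
    """
).fetchall()

for row in rows:
    print(row)
# ('Bat', 1, 1, 1)
# ('Cat', 2, 1, 1)
# ('Dog', 3, 3, 2)
# ('Elk', 4, 4, 3)
```

ROW_NUMBER never repeats a value, RANK leaves a gap after the tie (1, 1, 3), and DENSE_RANK continues without a gap (1, 1, 2).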

Spark Technique

The number of stages depends on wide transformations: number of stages = number of wide transformations + 1. Number of tasks in a stage = number of partitions.

Repartition vs Coalesce: repartition can both increase and decrease the number of partitions in the RDD, while coalesce can only decrease the number of partitions. Repartition involves a full shuffle, whereas coalesce does not perform a full shuffle. Instead, coalesce reduces the partitions more efficiently: for example, two partitions on the same node are combined to form one partition. When you have to increase the number of partitions, you should use repartition; this increases the parallelism.
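The difference can be sketched with a toy model in plain Python. This is not the Spark API: the functions coalesce, repartition and num_stages below are hypothetical stand-ins written only to illustrate the rules above, and real Spark decides how records and partitions are placed across executors in its own way.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy coalesce: merge whole neighbouring partitions; never increases the count."""
    if n >= len(partitions):
        return [list(p) for p in partitions]  # asking for more changes nothing
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each source partition moves as a unit into one target partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy repartition: full shuffle, every record is redistributed individually."""
    records = [r for part in partitions for r in part]
    shuffled: List[Partition] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        shuffled[i % n].append(record)  # round-robin placement (an assumption)
    return shuffled


def num_stages(wide_transformations: int) -> int:
    """Rule of thumb from the text: stages = wide transformations + 1."""
    return wide_transformations + 1


data = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 partitions, so 4 tasks per stage

fewer = coalesce(data, 2)        # whole partitions merged, no record-level shuffle
more = repartition(data, 8)      # only repartition can raise the count
unchanged = coalesce(data, 8)    # coalesce cannot go from 4 up to 8

print(fewer)            # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(len(more))        # 8
print(len(unchanged))   # 4
print(num_stages(2))    # 3
```

In the coalesce result the original partitions stay together, whereas repartition touches every record, which is why it costs a full shuffle.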