Spark Technique
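The number of stages depends on the number of wide transformations (transformations that require a shuffle, such as reduceByKey or groupByKey).
For a job with a simple linear lineage, the number of stages = number of wide transformations + 1, because each shuffle splits the job at a stage boundary.
The number of tasks in a stage = the number of partitions that stage processes (one task per partition).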
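The stage and task rules above can be made concrete with a small toy model. This is a Python simulation of a single job with a linear lineage, not Spark's API. The names WIDE_OPS and tasks_per_stage are hypothetical, and WIDE_OPS is a deliberately incomplete, illustrative set of shuffle-producing operations. Each wide operation starts a new stage, and each stage runs one task per partition.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: an illustrative, incomplete set of RDD operations that cause a shuffle.
WIDE_OPS = {"reduceByKey", "groupByKey", "distinct", "repartition"}


def tasks_per_stage(source_partitions: int,
                    lineage: List[Tuple[str, Optional[int]]]) -> List[int]:
    """Toy model: split a linear lineage into stages and return each stage's task count.

    Each lineage entry is (operation name, output partitions). The partition count
    is only used for wide ops; narrow ops (map, filter, ...) are pipelined into the
    current stage and keep its partition count.
    """
    tasks = [source_partitions]  # the first stage reads the source partitions
    for op, out_partitions in lineage:
        if op in WIDE_OPS:
            # Shuffle boundary: start a new stage with one task per output partition.
            # Toy assumption: if no count is given, reuse the previous stage's count.
            tasks.append(out_partitions if out_partitions is not None else tasks[-1])
    return tasks


lineage = [("map", None), ("filter", None), ("reduceByKey", 4),
           ("map", None), ("groupByKey", 2)]
tasks = tasks_per_stage(8, lineage)
print("stages:", len(tasks))       # stages: 3  (2 wide transformations + 1)
print("tasks per stage:", tasks)   # tasks per stage: [8, 4, 2]
```

Here map and filter add no stages, while reduceByKey and groupByKey each add one, giving 2 + 1 = 3 stages; the task counts simply follow the partition count of each stage.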
Repartition vs. Coalesce:
- Repartition can both increase and decrease the number of partitions of an RDD, while coalesce can only decrease them (on the RDD API, coalesce(n, shuffle = true) can also increase them, but it then shuffles just like repartition).
- Repartition always performs a full shuffle, while coalesce does not. Instead, coalesce merges existing partitions, preferring partitions that are on the same node (for example, two partitions on one node are combined into one), so little data has to move across the network and the operation is much cheaper.
- When you need to increase the number of partitions, use repartition. More partitions means more tasks per stage, which increases parallelism.
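The differences above can be sketched with a toy Python model in which a dataset is a list of partitions and each partition is a list of records. This is an illustrative simulation, not Spark's implementation. The hypothetical coalesce merges contiguous runs of whole partitions (real Spark also takes data locality into account), and the hypothetical repartition redistributes every record round-robin.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Toy coalesce without shuffle: merge contiguous runs of whole partitions."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if n >= len(parts):
        # Without a shuffle a partition cannot be split, so the count cannot grow.
        return [list(p) for p in parts]
    merged: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Partition i moves, intact, into output partition i * n // len(parts).
        merged[i * n // len(parts)].extend(part)
    return merged


def repartition(parts: Partitions, n: int) -> Partitions:
    """Toy full shuffle: every record is reassigned round-robin to one of n partitions."""
    if n < 1:
        raise ValueError("n must be >= 1")
    out: Partitions = [[] for _ in range(n)]
    records = (r for part in parts for r in part)
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(parts, 2))          # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(len(coalesce(parts, 8)))     # 4: coalesce cannot increase the count
print(repartition(parts, 2))       # [[1, 3, 5, 7], [2, 4, 6, 8]]
print(len(repartition(parts, 8)))  # 8
```

In the model, coalesce moves whole partitions as units, while repartition touches every single record. That per-record redistribution is the work a full shuffle does over the network in Spark.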