Basics of Spark
SparkContext vs SparkSession

SparkContext is the original entry point for accessing Spark functionality through RDDs. SparkSession, introduced in Spark 2.0, provides a single point of entry for DataFrames, Spark SQL, streaming, and Hive, subsuming the older HiveContext, SQLContext, and StreamingContext (a minimal example of creating a SparkSession appears at the end of this section).

Lazy Evaluation

When you apply a transformation such as sort to your data in Spark, it is important to understand that Spark operates under a principle known as lazy evaluation. Calling a transformation does not immediately manipulate your data. Instead, Spark queues up the series of transformations, building a plan for how it will eventually execute them across the cluster once an action requires a result. Lazy evaluation ensures that transformations like sort do not trigger any immediate work on the data. The transformation is registered as a part of the ...
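The following is a minimal sketch of creating a SparkSession and reaching the underlying SparkContext through it, as described in the SparkContext vs SparkSession discussion above. The app name and the small sample data are illustrative assumptions, not part of the original text; it assumes a local PySpark installation.

```python
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession -- the unified entry point since Spark 2.0
spark = (
    SparkSession.builder
    .appName("basics-of-spark")   # hypothetical app name for illustration
    .getOrCreate()
)

# The older RDD-oriented entry point is still reachable through the session
sc = spark.sparkContext

# DataFrame API via the session
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# RDD API via the context
rdd = sc.parallelize([1, 2, 3])

print(df.count(), rdd.count())

spark.stop()
```

With SparkSession there is no need to construct a SQLContext or HiveContext separately; the session exposes those capabilities directly, and the SparkContext remains available for RDD-level work.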
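To make lazy evaluation concrete, here is a small sketch: a sort (orderBy) returns almost instantly because it is only recorded in the execution plan, while the subsequent action (take) is what actually runs the work. The row count and timing approach are illustrative assumptions.

```python
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-evaluation-demo").getOrCreate()

# A DataFrame large enough that sorting would take noticeable time if executed eagerly
df = spark.range(0, 5_000_000)

start = time.time()
sorted_df = df.orderBy(df.id.desc())   # transformation: only added to the plan
print(f"orderBy returned in {time.time() - start:.3f}s (no data touched yet)")

# Optional: inspect the plan Spark has built up so far
sorted_df.explain()

start = time.time()
top = sorted_df.take(5)                # action: the queued plan is now executed
print(f"take(5) ran in {time.time() - start:.3f}s, result: {top}")

spark.stop()
```

Running this locally, the orderBy call typically returns in milliseconds, while take(5) triggers the full sort across the data, which is exactly the deferred execution the section describes.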