Spark Read Modes & Details on Spark Session

 Spark reads files in three modes:

  1. Permissive: This is the default read mode in Spark. If a datatype mismatch (or an otherwise malformed value) is encountered while creating the dataframe, Spark sets that value to NULL and processes the rest of the records normally.
  2. Drop Malformed: In this mode, any malformed records are dropped and only the well-formed records are processed.
  3. Fail Fast: Errors out immediately on encountering the first malformed record.

It is therefore important to choose the mode that matches the business requirement.
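The behaviour of the three modes can be illustrated with a small pure-Python sketch (this is not Spark's implementation; in real PySpark you would pass the mode via `spark.read.option("mode", "PERMISSIVE")` when reading CSV/JSON):

```python
# Toy illustration of Spark's three read modes (not Spark itself).
# Each "record" is a raw string that is expected to parse as an integer.

def read_records(raw_records, mode="PERMISSIVE"):
    """Parse records the way Spark's readers handle malformed rows."""
    parsed = []
    for raw in raw_records:
        try:
            parsed.append(int(raw))
        except ValueError:
            if mode == "PERMISSIVE":        # keep the row, null out the bad value
                parsed.append(None)
            elif mode == "DROPMALFORMED":   # silently drop the bad row
                continue
            elif mode == "FAILFAST":        # abort on the first bad row
                raise ValueError(f"Malformed record: {raw!r}")
    return parsed

data = ["1", "2", "oops", "4"]
print(read_records(data, "PERMISSIVE"))     # [1, 2, None, 4]
print(read_records(data, "DROPMALFORMED"))  # [1, 2, 4]
```

Note how Permissive preserves the row count while Drop Malformed silently shrinks the result, which is exactly why the mode must be a conscious business decision.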

     Creation of Spark Session:

          - Spark Session acts as an entry point to the Spark cluster. To run code on the cluster, a Spark Session has to be created.

          - To work with higher-level APIs like DataFrames and Spark SQL, a Spark Session has to be created to run the code across the cluster.

          - To work at the RDD level, a Spark Context is required.

          - Spark Session acts as an umbrella that encapsulates and unifies the different contexts like Spark Context, Hive Context, SQL Context…
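In PySpark the actual entry point is built with `SparkSession.builder.appName(...).getOrCreate()` from `pyspark.sql`. As a dependency-free sketch of the "umbrella" idea, here is a toy session (hypothetical classes, not Spark's API) that unifies the contexts behind one handle and reuses an existing session the way `getOrCreate()` does:

```python
# Toy sketch of the "umbrella" role of a Spark Session (hypothetical classes;
# real code would use pyspark.sql.SparkSession.builder...getOrCreate()).

class ToyContext:
    def __init__(self, name):
        self.name = name

class ToySparkSession:
    _active = None  # getOrCreate() reuses one session, like Spark's builder

    def __init__(self, app_name):
        self.app_name = app_name
        # One session object unifies the lower-level contexts.
        self.spark_context = ToyContext("SparkContext")
        self.sql_context = ToyContext("SQLContext")
        self.hive_context = ToyContext("HiveContext")

    @classmethod
    def get_or_create(cls, app_name="app"):
        if cls._active is None:
            cls._active = cls(app_name)
        return cls._active

spark = ToySparkSession.get_or_create("demo")
same = ToySparkSession.get_or_create()
print(spark is same)             # True: the builder reuses the session
print(spark.spark_context.name)  # SparkContext
```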


Why do we need a Spark Session when we already have a Spark Context?

  • Spark Session encapsulates the different contexts (Spark, Hive, SQL) and allows them to be accessed from a single session.

  • It allows more than one Spark Session in a single application, each with its own isolated environment.

The same Spark Context is shared across all the Spark Sessions created in an application.
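In PySpark this is what `spark.newSession()` gives you: a new session that shares the underlying Spark Context but has its own temporary views and SQL configuration. A dependency-free sketch of that sharing-with-isolation pattern (toy classes, not Spark's API):

```python
# Toy model of multiple sessions sharing one context (hypothetical classes;
# in PySpark this corresponds to spark.newSession(), which shares the
# SparkContext but isolates temp views and SQL conf).

class SharedContext:
    """Stands in for the single SparkContext behind all sessions."""
    pass

class Session:
    def __init__(self, context=None):
        self.context = context or SharedContext()  # shared cluster connection
        self.temp_views = {}                       # isolated per session
        self.conf = {}                             # isolated per session

    def new_session(self):
        # Share the underlying context, start with fresh isolated state.
        return Session(context=self.context)

s1 = Session()
s2 = s1.new_session()
s1.temp_views["sales"] = "df1"
print(s1.context is s2.context)   # True: one shared context
print("sales" in s2.temp_views)   # False: views are session-scoped
```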


 
