Spark Read Modes & Details on Spark Session

 Spark reads files in one of three modes:

  1. Permissive: This is the default read mode in Spark. If a datatype mismatch is encountered while creating the DataFrame, the offending value is set to NULL and the rest of the record and results are processed unaffected.
  2. Drop Malformed: In this mode, any malformed records are dropped and only the well-formed records are processed.
  3. Fail Fast: The read errors out on encountering the first malformed record.

So, it is important to choose the appropriate mode based on the business requirement. A minimal sketch of the three modes follows.
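
To make this concrete, here is a minimal PySpark sketch of the three modes; the file name people.csv and the name/age schema are illustrative assumptions, not from the original post:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("read-modes-demo").getOrCreate()

# An explicit schema so that a non-integer "age" value counts as a
# datatype mismatch (the schema and people.csv are assumptions for the demo).
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])

# PERMISSIVE (default): mismatched values become NULL; other records are kept.
df_permissive = (spark.read.schema(schema)
                 .option("mode", "PERMISSIVE")
                 .csv("people.csv"))

# DROPMALFORMED: malformed records are dropped entirely.
df_dropped = (spark.read.schema(schema)
              .option("mode", "DROPMALFORMED")
              .csv("people.csv"))

# FAILFAST: the job errors out on the first malformed record.
df_failfast = (spark.read.schema(schema)
               .option("mode", "FAILFAST")
               .csv("people.csv"))
```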

     Creation of Spark Session:

          - Spark Session acts as the entry point to the Spark cluster: to run code on the cluster, a Spark Session has to be created.
          - To work with higher-level APIs like DataFrames and Spark SQL, a Spark Session is required to run the code across the cluster.
          - To work at the RDD level, a Spark Context is required.
          - Spark Session acts as an umbrella that encapsulates and unifies the different contexts like Spark Context, Hive Context, SQL Context, etc. (a sketch follows below).
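
As a minimal sketch in PySpark (the app name and local[*] master are illustrative assumptions), a session is built once and the underlying Spark Context is reached through it:

```python
from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession, the entry point for DataFrame and Spark SQL work.
spark = (SparkSession.builder
         .appName("session-demo")   # illustrative app name
         .master("local[*]")        # assumption: local mode for the example
         .getOrCreate())

# The underlying SparkContext is exposed on the session for RDD-level work.
sc = spark.sparkContext
rdd = sc.parallelize([1, 2, 3])
print(rdd.count())  # 3
```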


Need for Spark Session when we already have Spark Context?

  • Spark Session encapsulates the different contexts like Spark Context, Hive Context, and SQL Context, and allows them all to be accessed from a single session.

  • It supports more than one Spark Session in a single application, each with its own isolated environment (for example, session-scoped configurations and temporary views).


The same Spark Context is shared across all the Spark Sessions created in an application, as the sketch below demonstrates.
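
A minimal sketch of this isolation in PySpark, using newSession() to create a second session; the view name numbers is an illustrative assumption:

```python
from pyspark.sql import SparkSession

spark1 = SparkSession.builder.appName("multi-session-demo").getOrCreate()

# newSession() returns a new session with its own SQL configuration and
# temporary views, but backed by the same SparkContext.
spark2 = spark1.newSession()

# A temp view registered in one session is not visible in the other.
spark1.range(5).createOrReplaceTempView("numbers")
print([t.name for t in spark1.catalog.listTables()])  # ['numbers']
print([t.name for t in spark2.catalog.listTables()])  # []

# Both sessions share the same underlying SparkContext.
print(spark1.sparkContext is spark2.sparkContext)  # True
```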


 
