Spark Read Modes & Details on Spark Session
Spark can read files in three modes:
- Permissive: This is the default read mode in Spark. If a datatype mismatch (or any other malformed value) is encountered while creating the DataFrame, Spark replaces the offending value with NULL and processes the rest of the records without impact.
- Drop Malformed: In this mode, any malformed records are dropped and only the well-formed records are processed.
- Fail Fast: Errors out as soon as a malformed record is encountered.
So, it is very important to choose the mode that matches the business requirement, as the sketch below shows.
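For illustration, here is a minimal sketch of the three modes using the CSV reader. The file path and header option are assumptions, and a SparkSession named `spark` is assumed to already exist (as in spark-shell):

```scala
// Assumes an existing SparkSession named `spark`; the path is illustrative.

// PERMISSIVE (default): values that fail to parse become NULL,
// and the remaining records are processed normally.
val dfPermissive = spark.read
  .option("header", "true")
  .option("mode", "PERMISSIVE")
  .csv("/data/orders.csv")

// DROPMALFORMED: records that fail to parse are dropped entirely.
val dfDropped = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED")
  .csv("/data/orders.csv")

// FAILFAST: an exception is thrown on the first malformed record.
val dfStrict = spark.read
  .option("header", "true")
  .option("mode", "FAILFAST")
  .csv("/data/orders.csv")
```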
Creation of Spark Session:
- Spark Session acts as the entry point to a Spark cluster. To run code on the cluster, a Spark Session has to be created.
- To work with the higher-level APIs like DataFrames and Spark SQL, a Spark Session is required to run the code across the cluster.
- To work at the RDD level, a Spark Context is required.
- Spark Session acts as an umbrella that encapsulates and unifies the different contexts: Spark Context, Hive Context, SQL Context, and so on.
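A minimal sketch of creating a Spark Session with the builder API; the app name and master URL here are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSessionDemo")
  .master("local[*]")        // local mode; point this at your cluster manager instead
  // .enableHiveSupport()    // optional: adds Hive support (the old HiveContext functionality)
  .getOrCreate()

// The underlying Spark Context is available from the session
// whenever RDD-level work is needed.
val sc = spark.sparkContext
```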
Why do we need a Spark Session when we already have a Spark Context?
- Spark Session encapsulates the different contexts (Spark Context, Hive Context, SQL Context) and allows all of them to be accessed from a single session.
- A single application may need more than one Spark Session, each with its own isolated environment (for example, separate SQL configurations and temporary views). The same Spark Context is shared across all the Spark Sessions created, as the sketch below shows.
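A minimal sketch of session isolation; the view name is illustrative, and `spark` is assumed to be an existing SparkSession:

```scala
// newSession() gives a second session with its own isolated state.
val spark2 = spark.newSession()

// Temporary views are scoped per session:
spark.range(5).createOrReplaceTempView("nums")
spark.sql("SELECT COUNT(*) FROM nums").show()   // works in the original session
// spark2.sql("SELECT COUNT(*) FROM nums")      // would fail: view not visible here

// ...but both sessions share the same underlying Spark Context.
assert(spark.sparkContext eq spark2.sparkContext)
```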