Spark Read Modes & Details on Spark Session
Spark can read files in three modes:
- Permissive: This is the default read mode in Spark. If a datatype mismatch (or any other malformed value) is encountered while creating the DataFrame, Spark replaces the offending value with NULL and processes the rest of the records without impact.
- Drop Malformed: In this mode, any malformed records are dropped and only the well-formed records are processed.
- Fail Fast: Errors out as soon as a malformed record is encountered.
So, it is very important to choose the mode that matches the business requirement, as the sketch below shows.
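For illustration, here is a minimal sketch of the three modes using the CSV reader. The file path and header option are assumptions, and a SparkSession named `spark` is assumed to already exist (as in spark-shell):

```scala
// Assumes an existing SparkSession named `spark`; the path is illustrative.

// PERMISSIVE (default): values that fail to parse become NULL,
// and the remaining records are processed normally.
val dfPermissive = spark.read
  .option("header", "true")
  .option("mode", "PERMISSIVE")
  .csv("/data/orders.csv")

// DROPMALFORMED: records that fail to parse are dropped entirely.
val dfDropped = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED")
  .csv("/data/orders.csv")

// FAILFAST: an exception is thrown on the first malformed record.
val dfStrict = spark.read
  .option("header", "true")
  .option("mode", "FAILFAST")
  .csv("/data/orders.csv")
```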
Creation of Spark Session:
- Spark Session acts as the entry point to a Spark cluster. To run code on the cluster, a Spark Session has to be created.
- To work with the higher-level APIs like DataFrames and Spark SQL, a Spark Session is required to run the code across the cluster.
- To work at the RDD level, a Spark Context is required.
- Spark Session acts as an umbrella that encapsulates and unifies the different contexts: Spark Context, Hive Context, SQL Context, and so on.
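A minimal sketch of creating a Spark Session with the builder API; the app name and master URL here are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSessionDemo")
  .master("local[*]")        // local mode; point this at your cluster manager instead
  // .enableHiveSupport()    // optional: adds Hive support (the old HiveContext functionality)
  .getOrCreate()

// The underlying Spark Context is available from the session
// whenever RDD-level work is needed.
val sc = spark.sparkContext
```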
Why do we need a Spark Session when we already have a Spark Context?
- Spark Session encapsulates the different contexts (Spark Context, Hive Context, SQL Context) and allows all of them to be accessed from a single session.
- A single application may need more than one Spark Session, each with its own isolated environment (for example, separate SQL configurations and temporary views). The same Spark Context is shared across all the Spark Sessions created, as the sketch below shows.
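A minimal sketch of session isolation; the view name is illustrative, and `spark` is assumed to be an existing SparkSession:

```scala
// newSession() gives a second session with its own isolated state.
val spark2 = spark.newSession()

// Temporary views are scoped per session:
spark.range(5).createOrReplaceTempView("nums")
spark.sql("SELECT COUNT(*) FROM nums").show()   // works in the original session
// spark2.sql("SELECT COUNT(*) FROM nums")      // would fail: view not visible here

// ...but both sessions share the same underlying Spark Context.
assert(spark.sparkContext eq spark2.sparkContext)
```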