

SparkSession

SparkSession, introduced in version 2.0, is the entry point to the underlying Spark functionality for programmatically working with Spark RDD, DataFrame, and Dataset.


SparkContext has been available since Spark 1.x (JavaSparkContext for Java) and used to be the entry point to Spark and PySpark before SparkSession was introduced in 2.0. A SparkSession is created using the SparkSession.builder() builder pattern, and its object spark is available by default in spark-shell. Creating a SparkSession instance is the first statement you would write to program with RDD, DataFrame, and Dataset.
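A minimal sketch of creating a SparkSession in Scala; the master URL and application name are illustrative values, not taken from the original text:

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder()
  .master("local[*]")        // run locally on all cores; pass a cluster URL in production
  .appName("SparkExamples")  // any descriptive application name
  .getOrCreate()             // reuse an existing session if one is already running

getOrCreate() either returns the current session or builds a new one, which is why the same statement is safe both in spark-shell (where spark already exists) and in a standalone application.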

In this section of the Apache Spark tutorial, you will learn different concepts of the Spark Core library with examples in Scala code. Spark Core is the main base library of Spark; it provides the abstractions for distributed task dispatching, scheduling, and basic I/O functionality. Before getting your hands dirty with Spark programming, have your development environment set up to run the Spark examples using IntelliJ IDEA.
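As a first taste of the Spark Core API, here is a minimal sketch of transformations and actions on an RDD; the dataset and names are illustrative:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("RDDBasics").getOrCreate()

val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5)) // distribute a local collection
val doubled = rdd.map(_ * 2)                                 // transformation: lazy, only records the lineage
println(doubled.collect().mkString(", "))                    // action: triggers the distributed job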
Spark History Server

On the Spark Web UI, you can see how the operations are executed. The Spark History Server keeps a log of all completed Spark applications you submit by spark-submit or spark-shell, and it is very helpful when you are doing Spark performance tuning, as you can cross-check a previous application run against the current one. Before you start the history server, you first need to set the configuration below in spark-defaults.conf.
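A minimal spark-defaults.conf sketch; the event log directory is an illustrative local path, not from the original text:

spark.eventLog.enabled true
spark.eventLog.dir file:///tmp/spark-events
spark.history.fs.logDirectory file:///tmp/spark-events

spark.eventLog.enabled makes running applications write their event logs, and spark.history.fs.logDirectory tells the history server where to read them from; both should point at a directory that already exists.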
Now, start the history server on Linux or Mac by running:

$SPARK_HOME/sbin/start-history-server.sh

If you are running Spark on Windows, you can start the history server with the command below.

$SPARK_HOME/bin/spark-class.cmd org.apache.spark.deploy.history.HistoryServer

By default, the history server listens on port 18080, and you can access it from the browser at http://localhost:18080. By clicking on each App ID, you will get the details of the application in the Spark web UI.
To run Spark on Windows, download the winutils.exe file from winutils and copy it to the %SPARK_HOME%\bin folder. Winutils binaries are different for each Hadoop version, hence download the right version for the Hadoop your Spark distribution was built against (for example, spark-3.0.0-bin-hadoop2.7 expects the Hadoop 2.7 winutils). Then add the Spark bin directory to your PATH and launch spark-shell, as sketched below.
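A sketch of the Windows setup in a Command Prompt, assuming Spark is extracted to C:\apps\spark-3.0.0-bin-hadoop2.7; the SPARK_HOME and HADOOP_HOME lines are additions here, since Hadoop's libraries typically locate winutils.exe through HADOOP_HOME:

set SPARK_HOME=C:\apps\spark-3.0.0-bin-hadoop2.7
set HADOOP_HOME=%SPARK_HOME%
set PATH=%PATH%;%SPARK_HOME%\bin
spark-shell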
Apache Spark Architecture

Apache Spark works in a master-slave architecture, where the master is called the "Driver" and the slaves are called "Workers". When you run a Spark application, the Spark Driver creates a context that is the entry point to your application; all operations (transformations and actions) are executed on worker nodes, and the resources are managed by the Cluster Manager.
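To make the roles concrete, here is a sketch of submitting an application to a standalone cluster manager with spark-submit; the master URL, class name, and jar path are illustrative:

$SPARK_HOME/bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.SparkApp \
  target/spark-app.jar

The driver runs the main method of com.example.SparkApp, negotiates executors on the worker nodes through the cluster manager at spark://master-host:7077, and ships the transformations and actions to those executors.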
