Apache Spark is an open-source big data processing engine that has become very popular in the IT world because of how fast it processes huge datasets. It is a hot skill, and you should learn it as soon as possible if you want to make a career in the big data world.
In this tutorial, I’ll show you how to install Spark and Scala on your Windows computer so that you can practice on your own machine, since not everyone has access to a real-world cluster. It doesn’t matter whether you are using Windows 7 or Windows 10; it will work either way. The installation is much the same on Linux or Mac. Just follow these few steps.
Install Spark On Windows
Step 1: Download Spark
Download Spark from http://spark.apache.org/downloads.html. Choose the Spark 2.2 version, as 2.3 is not stable yet at the time of writing and the other tools I cover in this tutorial support Spark 2.2.
Under package type, select "Pre-built for Apache Hadoop 2.7 and later". Later we will install a pseudo-Hadoop, because we need to fool Spark into thinking that Hadoop is present on our system.
Download the .tgz file.
After downloading the .tgz file, extract it on your computer and copy the extracted folder to your C drive under the folder name spark (for example, C:\spark).
Step 2: Download Winutils.exe
Download the winutils.exe file from https://sundog-spark.s3.amazonaws.com/winutils.exe. Yes, it is an executable file, and I promise it doesn’t have any virus in it. If you prefer, you can download it from other sources available on the internet. Just make sure it matches the Hadoop version your Spark package was built for (2.7 here), since winutils.exe is a Hadoop utility rather than part of Spark itself.
Put it on your C drive in a folder named winutils, inside a subfolder named bin (that is, C:\winutils\bin).
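If you happen to have Python 3 on the machine (this tutorial does not otherwise need it), a short sketch like the one below can confirm that both downloads ended up where the later steps expect them. The paths C:\spark and C:\winutils are assumptions based on the folder names used above, so change them if you picked other locations. The helper name missing_files is made up for this example.

```python
from pathlib import Path

# Assumed locations, following the folder names used in this tutorial.
SPARK_DIR = Path(r"C:\spark")
WINUTILS_DIR = Path(r"C:\winutils")


def missing_files(spark_dir, winutils_dir):
    """Return the expected files that are not present on disk."""
    expected = [
        Path(spark_dir) / "bin" / "spark-shell.cmd",  # Windows launcher shipped with Spark
        Path(winutils_dir) / "bin" / "winutils.exe",  # file downloaded in Step 2
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_files(SPARK_DIR, WINUTILS_DIR)
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("Spark and winutils.exe are in place.")
```

An empty result means both files were found. Anything listed under "Missing" points to a folder that was extracted or named differently.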
Step 3: Download Java
Make sure that Java is installed on your computer and that it is Java 1.8 (Java 8), not Java 9. This, too, is to avoid a version mismatch. If you don’t have Java installed, you can install it from here:
http://www.oracle.com/technetwork/java/javase/downloads/index.html (use Java 1.8)
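To check which Java the command prompt actually picks up, you can run java -version and read the version string. The sketch below does the same thing programmatically, assuming Python 3.6+ is available (it is not required by anything else in this tutorial). The function name java_major_version is made up for this example. It relies on the fact that Java 8 and earlier report themselves as "1.8.0_x", while Java 9 and later report "9", "11.0.x" and so on.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Returns None if no version string can be found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # old scheme: "1.8.0_171" means Java 8
    return int(first)       # new scheme: "9", "11.0.2", ...


if __name__ == "__main__":
    try:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(["java", "-version"],
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        major = java_major_version(result.stderr)
    except FileNotFoundError:
        major = None  # java is not on the Path at all
    print("OK: Java 8 found" if major == 8 else f"Expected Java 8, found: {major}")
```

If this reports anything other than Java 8, either install Java 1.8 or make sure its folder comes first on your Path.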
Step 4: Set Environment Variables
Set the paths for Spark, Java and Hadoop in the Environment Variables dialog box.
- Press the Windows button.
- Type environment variables in the search box.
- Click the Environment Variables button.
- Click the New button in the User variables box.
- Create a variable named HADOOP_HOME and set it to the winutils folder, that is, the folder that contains bin\winutils.exe (not the bin subfolder itself).
- Similarly, create SPARK_HOME and set it to the folder where you installed Spark on your C drive.
- Do the same for Java, naming the variable JAVA_HOME (if you haven’t done it before).
- Now select the Path variable and click Edit.
- Add %JAVA_HOME%\bin and %SPARK_HOME%\bin here.
- Click OK to save.
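Once the variables are saved, the setup above can be sanity-checked with a short script. This is only a sketch, assuming Python 3.6+ is available and that you run it from a newly opened command prompt, so the new values are visible and the %...% references in Path are already expanded. The function name check_spark_env is made up for this example. The variable names are the ones set in the list above, compared case-insensitively as Windows does.

```python
import os


def check_spark_env(env):
    """Return a list of problems found in an environment-variable mapping."""
    # Windows treats variable names case-insensitively, so compare in upper case.
    env = {name.upper(): value for name, value in env.items()}
    problems = [f"{name} is not set"
                for name in ("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME")
                if not env.get(name)]

    # Path entries are separated by ';' on Windows.
    path_entries = [entry.strip().rstrip("\\").lower()
                    for entry in env.get("PATH", "").split(";")
                    if entry.strip()]
    for name in ("JAVA_HOME", "SPARK_HOME"):
        home = env.get(name)
        if home:
            wanted = (home.rstrip("\\") + "\\bin").lower()
            if wanted not in path_entries:
                problems.append(f"{name}\\bin is not on Path")
    return problems


if __name__ == "__main__":
    for line in check_spark_env(os.environ) or ["Environment looks fine."]:
        print(line)
```

The check only looks at the variables themselves. It does not verify that the folders they point to exist.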
Step 5: Run Spark
Open a command prompt, go to the directory where Spark is installed, and type spark-shell. You should see the Spark startup banner followed by a scala> prompt.
Bingo, you are good to go. You can type all your Scala commands here.
Let me know in the comments if you face any issues.