![how to install apache spark on linux](https://data-flair.training/blogs/wp-content/uploads/sites/2/2016/08/Screenshot-from-2016-08-05-114643.png)
The SPARK_HOME environment variable could also be set using the .bashrc file or similar user or system profile scripts. You will need to do this if you wish to persist the SPARK_HOME variable beyond the current session. Open the PySpark shell by running the pyspark command from any directory.
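As a minimal sketch of that profile entry, assuming Spark lives under /opt/spark (the directory the package is moved to elsewhere in this article):

```shell
# Lines to append to ~/.bashrc (or a system-wide profile script such as
# /etc/profile.d/spark.sh) so SPARK_HOME survives new sessions.
# /opt/spark matches the install location used in this article.
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"
```

After editing the file, run `source ~/.bashrc` (or open a new shell) for the change to take effect.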
HOW TO INSTALL APACHE SPARK ON LINUX UPDATE
sudo apt-get update

Alternatively, you could use the Oracle JDK instead.
![how to install apache spark on linux](https://phoenixnap.com/kb/wp-content/uploads/2021/04/edit-profile.png)
If Java 1.7 or higher is not installed, install the Java 1.7 runtime and development environments using Ubuntu’s APT (Advanced Packaging Tool).
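The "Java 1.7 or higher" check can be scripted with a small version comparison before deciding whether to install anything. This helper (`java_min_ok` is a hypothetical name, not from the article) is a sketch relying on GNU sort's version ordering:

```shell
# java_min_ok VERSION MINIMUM — succeeds when VERSION >= MINIMUM,
# by asking GNU sort -V which of the two strings orders first.
java_min_ok() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Typical use: parse the version out of `java -version`, then install if too old.
# ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
# java_min_ok "$ver" 1.7 || sudo apt-get install <openjdk-7 package>
```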
![how to install apache spark on linux](https://i.ytimg.com/vi/7tDOUrl7Aoc/maxresdefault.jpg)
If Java 1.7 or higher is not installed, install the Java 1.7 runtime and development environments using the OpenJDK yum packages (alternatively, you could use the Oracle JDK instead): sudo yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel

Confirm Java was successfully installed: $ java -version

OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

Extract the Spark package and create SPARK_HOME: tar -xzf spark-1.5.2-bin-hadoop2.6.tgz

sudo mv spark-1.5.2-bin-hadoop2.6 /opt/spark

Open the PySpark shell by running the pyspark command from any directory (as you’ve added the Spark bin directory to the PATH). If Spark has been successfully installed, you should see the following output (with informational logging messages omitted for brevity):

Welcome to ...

SparkContext available as sc, HiveContext available as sqlContext.

You should see a similar result by running the spark-shell command from any directory.

Run the included Pi Estimator example by executing the following command: spark-submit --class org.apache.spark.examples.SparkPi --master local $SPARK_HOME/lib/spark-examples*.jar 10

If the installation was successful, you should see something similar to the following result (omitting the informational log messages). Note, this is an estimator program, so the actual result may vary: Pi is roughly 3.140576

Try it Yourself: Install Spark on Ubuntu/Debian Linux

In this example, I’m installing Spark on an Ubuntu 14.04 LTS Linux distribution. As with the Red Hat example, Python 2.7 is already installed with the operating system, so we do not need to install Python.
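The SparkPi job estimates Pi by Monte-Carlo sampling. The same computation, minus the cluster, can be sketched in plain awk to show what the job is doing (the sample count and seed here are arbitrary):

```shell
# Monte-Carlo Pi: throw n random points into the unit square and count how
# many land inside the quarter circle; that fraction approximates Pi/4.
pi_estimate=$(awk 'BEGIN {
  srand(1)
  n = 200000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1) hits++
  }
  printf "%.6f", 4 * hits / n
}')
echo "Pi is roughly $pi_estimate"
```

As with spark-submit's output, the value drifts from run to run unless the seed is fixed, which is why the article calls it an estimator.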
HOW TO INSTALL APACHE SPARK ON LINUX DOWNLOAD
Try it Yourself: Install Spark on Red Hat/CentOS

In this example, I’m installing Spark on a Red Hat Enterprise Linux 7.1 instance. However, the same installation steps would apply to CentOS distributions as well. As shown in Figure 3.1, download the spark-1.5.2-bin-hadoop2.6.tgz package from your local mirror into your home directory using wget or curl.
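A sketch of the download step; the article's "local mirror" URL isn't given, so archive.apache.org (the Apache release archive) is used here as an example source:

```shell
# Build the package name and URL for Spark 1.5.2 pre-built against Hadoop 2.6.
SPARK_VERSION="1.5.2"
PKG="spark-${SPARK_VERSION}-bin-hadoop2.6.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${PKG}"

# Fetch into the home directory with wget (or: curl -O "$URL"):
# wget -P "$HOME" "$URL"
echo "$URL"
```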