docker start bdu_spark
docker attach bdu_spark

Start Spark:
$SPARK_HOME/bin/spark-shell
-
Open up a docker terminal.
-
Create a new subdirectory /home/virtuser/SparkPi:
mkdir -p /home/virtuser/SparkPi
- Under the SparkPi directory, set up the typical directory structure for your application. Once that is in place and you have your application code written, you can package it up into a JAR using sbt and run it using spark-submit.
mkdir -p /home/virtuser/SparkPi/src/main/scala
- The SparkPi.scala file goes under the src/main/scala/ directory. Change to that directory (cd /home/virtuser/SparkPi/src/main/scala) and create the file:
cat > SparkPi.scala
- At this point, copy and paste the SparkPi.scala contents into the newly created file (see the accompanying src for the full listing).
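For reference, a minimal SparkPi.scala in the style of the canonical Spark 1.x example looks roughly like this (a sketch assuming the SparkContext API of that era, not necessarily the exact lab listing):

import scala.math.random
import org.apache.spark._

// Estimates Pi with a Monte Carlo method: throw random darts at the
// 2x2 square centered on the origin and count how many land inside
// the unit circle.
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}

The fraction of points that land inside the circle approaches pi/4, so multiplying by 4 gives the estimate; parallelize spreads the sampling across the given number of partitions.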
-
To quit out of the file, type CTRL + D
-
Remember, you can have any business logic you need for your application in your Scala class; this is just a sample class. Let's spend a few moments analyzing the content of SparkPi.scala. Type in the following to view the content:
more SparkPi.scala
(Steps 8-14: see the source code.)
-
At this point, you have completed the SparkPi.scala class. The application depends on the Spark API, so you will also include an sbt configuration file, sparkpi.sbt, which declares the dependency on spark-core. Change back to the top directory of the SparkPi folder:
cd ../../..
-
and create this file.
cat > sparkpi.sbt
- Copy and paste this into the sparkpi.sbt file:
name := "SparkPi Project" version := "1.0" scalaVersion := "2.10.4" libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"
NOTE: This is the result of which spark: which spark-shell /opt/ibm/spark-1.4.0-bin-hadoop2.6/bin/spark-shell so I will try with that Scala version is 2.10.4
name := "SparkPi Project" version := "1.0" scalaVersion := "2.10.4" libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"
- Now your folder structure under SparkPi should look like this:
./sparkpi.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SparkPi.scala
- While in the top directory of the SparkPi application, run the sbt tool to create the JAR file:
sbt package
The initial build will take a long time because sbt has to download all of the dependencies. Step out for a cup of coffee or tea, or grab a snack.
Note: You may need to get back into the bash shell and start the Hadoop service. Use these commands:
docker start bdu_spark
docker attach bdu_spark
/etc/bootstrap.sh
- Make sure you are in your SparkPi directory.
cd /home/virtuser/SparkPi
- Submit the application with spark-submit (note: this step did not work in my environment):
$SPARK_HOME/bin/spark-submit \
  --class "SparkPi" \
  --master local[4] \
  target/scala-2.10/sparkpi-project_2.10-1.0.jar
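One sanity check worth trying when the submit fails (my suggestion, not part of the original lab): list the JAR's contents to confirm the compiled class is really there and really named SparkPi:
jar tf target/scala-2.10/sparkpi-project_2.10-1.0.jar | grep SparkPi
If SparkPi.class is missing or sits under a package, adjust the --class argument to the fully qualified name.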
For the Python example, you are going to create a Python application to calculate Pi. Running a Python application is actually quite simple. For applications that use custom classes or third-party libraries, you would add the dependencies to spark-submit through its --py-files argument by packing them into a .zip file, as sketched below.
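For illustration, a submit with bundled dependencies might look like this (deps.zip and mylib/ are made-up names for this sketch; the PythonPi example itself needs no extra libraries):
zip -r deps.zip mylib/
$SPARK_HOME/bin/spark-submit --master local[4] --py-files deps.zip PythonPi.py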
- Create a PythonPi directory under /home/virtuser.
mkdir /home/virtuser/PythonPi
-
Change into the new PythonPi directory:
cd /home/virtuser/PythonPi
-
Create a Python file. Type in:
cat > PythonPi.py
-
In PythonPi.py, paste these lines of code:
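For reference, the canonical Spark 1.x Python pi estimator looks roughly like this (a sketch based on the standard example shipped with Spark, not necessarily the exact lab listing):

import sys
from random import random
from operator import add

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonPi")
    # Partition count can be passed as the first argument; default to 2.
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        # Sample a random point in the 2x2 square centered on the origin.
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    sc.stop()

As with the Scala version, the fraction of sampled points that land inside the unit circle approximates pi/4.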