This directory contains some examples of processing Divolte events with Spark using Python.
The examples are:
divolte_spark_example_notebook.ipynb
: An IPython notebook which demonstrates how to interactively process Divolte events using Spark.

divolte_spark_example.py
: A standalone Python script which can be submitted using spark-submit.
These have been tested with the Spark distribution included with CDH.
Our examples use a helper library provided by our Divolte Spark project, which is built with SBT. Once SBT is installed, you can build the helper library:
% git clone https://github.com/divolte/divolte-spark.git
% cd divolte-spark
% sbt assembly
% DIVOLTE_SPARK_JAR="$(echo "$PWD"/target/scala-*/divolte-spark-assembly-*.jar)"
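If you would rather locate the built JAR from Python (for example, from a launcher script), the same glob pattern can be resolved with the standard library. This is only a minimal sketch: `find_assembly_jar` is a hypothetical helper, not part of Divolte Spark, and it assumes the build produces exactly one assembly JAR under `target/scala-*/`.

```python
import glob
import os


def find_assembly_jar(project_dir):
    """Return the absolute path of the single assembly JAR under project_dir.

    Hypothetical helper for illustration; it is not part of Divolte Spark.
    """
    # Same pattern as the shell glob: target/scala-*/divolte-spark-assembly-*.jar
    pattern = os.path.join(
        project_dir, "target", "scala-*", "divolte-spark-assembly-*.jar"
    )
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        # No match usually means the build has not run yet;
        # several matches suggest stale artifacts from earlier builds.
        raise RuntimeError(
            "expected exactly one assembly JAR, found %d" % len(matches)
        )
    return os.path.abspath(matches[0])
```

Calling `find_assembly_jar(".")` from inside the `divolte-spark` checkout should return the path to use as `DIVOLTE_SPARK_JAR`. Failing when there is no match, or more than one, avoids silently passing an unexpanded pattern on to Spark.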
To start the IPython notebook:
% DIVOLTE_SPARK_JAR="<PATH_TO_DIVOLTE_SPARK_JAR>"
% export IPYTHON=1
% export IPYTHON_OPTS="notebook"
% pyspark --jars "$DIVOLTE_SPARK_JAR" --driver-class-path "$DIVOLTE_SPARK_JAR"
You should set DIVOLTE_SPARK_JAR to match the location of the helper library built in the previous section.
If run locally, your browser should automatically open to the notebook. If not, open the URL displayed by IPython in your browser.
To execute the standalone example:
% DIVOLTE_SPARK_JAR="<PATH_TO_DIVOLTE_SPARK_JAR>"
% spark-submit --jars "$DIVOLTE_SPARK_JAR" --driver-class-path "$DIVOLTE_SPARK_JAR" divolte_spark_example.py DIVOLTE_LOG_PATH
As with the IPython notebook example, you should set DIVOLTE_SPARK_JAR to match the location where you built the helper library.
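The spark-submit invocation passes the same JAR twice, once to `--jars` and once to `--driver-class-path`, which is easy to get wrong when typing it by hand. As a rough sketch, the argument list could be assembled in Python as shown here; `spark_submit_args` and `run_example` are hypothetical names introduced for illustration, and the code assumes `spark-submit` is on your `PATH`.

```python
import subprocess


def spark_submit_args(jar, script, log_path):
    """Build the argument list for the spark-submit command shown above."""
    return [
        "spark-submit",
        "--jars", jar,               # ship the helper library with the job
        "--driver-class-path", jar,  # and also put it on the driver's classpath
        script,
        log_path,                    # the DIVOLTE_LOG_PATH argument of the script
    ]


def run_example(jar, log_path, script="divolte_spark_example.py"):
    """Run the standalone example and return spark-submit's exit code."""
    return subprocess.call(spark_submit_args(jar, script, log_path))
```

Passing the arguments as a list rather than as a single shell string means that paths containing spaces need no extra quoting.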