Sgx-Spark

This is Apache Spark with modifications to run security-sensitive code inside Intel SGX enclaves. The implementation leverages sgx-lkl, a library OS that makes it possible to run Java-based applications inside SGX enclaves.

Docker quick start

This guide shows how to run Sgx-Spark in a few simple steps using Docker. Most of the setup and deployment is wrapped within Docker containers, so compilation and deployment should be smooth.

Preparing the Sgx-Spark Docker environment

  • Clone this Sgx-Spark repository

  • Build the Sgx-Spark base image. The name of the resulting Docker image is sgxspark. This process might take a while (30-60 minutes):

      sgx-spark/dockerfiles$ docker build -t sgxspark .
    
  • Prepare the disk image required by sgx-lkl. Due to Docker restrictions, this step currently cannot be performed as part of the Docker build above and is therefore platform-dependent. The process has been tested successfully on Ubuntu 16.04 and Arch Linux:

      sgx-spark/lkl$ make prepare-image
    
  • Create a Docker network device that will be used for communication by the Docker containers. Note that for user-defined networks, Docker provides an embedded DNS server, so workers can find the Spark master by name.

      sgx-spark$ docker network create sgxsparknet
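
    To confirm that the user-defined network and its embedded DNS are in place, it can be inspected with standard Docker commands:

      sgx-spark$ docker network ls | grep sgxsparknet
      sgx-spark$ docker network inspect sgxsparknet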
    

Running Sgx-Spark jobs using Docker

From within the directory sgx-spark/dockerfiles, run the Sgx-Spark master node, the Sgx-Spark worker node, and the actual Sgx-Spark job as follows. A quick way to check that the worker has registered with the master is sketched after this list.

  • Run the Sgx-Spark master node:

      sgx-spark/dockerfiles$ docker run \
      --user user \
      --env-file $(pwd)/docker-env \
      --net sgxsparknet \
      --name sgxspark-docker-master \
      -p 7077:7077 \
      -p 8082:8082 \
      -ti sgxspark /sgx-spark/master.sh
    
  • Run the Sgx-Spark worker node:

      sgx-spark/dockerfiles$ docker run \
      --user user \
      --memory="4g" \
      --shm-size="8g" \
      --env-file $(pwd)/docker-env \
      --net sgxsparknet \
      --privileged \
      -v $(pwd)/../lkl:/spark-image:ro \
      -ti sgxspark /sgx-spark/worker-and-enclave.sh
    
  • Run the Sgx-Spark job as follows.

    As of writing, the three jobs below are known to be fully supported:

    • WordCount

        sgx-spark/dockerfiles$ docker run \
        --user user \
        --memory="4g" \
        --shm-size="8g" \
        --env-file $(pwd)/docker-env \
        --net sgxsparknet \
        --privileged \
        -v $(pwd)/../lkl:/spark-image:ro \
        -e SPARK_JOB_CLASS=org.apache.spark.examples.MyWordCount \
        -e SPARK_JOB_NAME=WordCount \
        -e SPARK_JOB_ARG0=README.md \
        -e SPARK_JOB_ARG1=output \
        -ti sgxspark /sgx-spark/driver-and-enclave.sh
      
    • KMeans

        sgx-spark/dockerfiles$ docker run \
        --user user \
        --memory="4g" \
        --shm-size="8g" \
        --env-file $(pwd)/docker-env \
        --net sgxsparknet \
        --privileged \
        -v $(pwd)/../lkl:/spark-image:ro \
        -e SPARK_JOB_CLASS=org.apache.spark.examples.mllib.KMeansExample \
        -e SPARK_JOB_NAME=KMeans \
        -e SPARK_JOB_ARG0=data/mllib/kmeans_data.txt \
        -ti sgxspark /sgx-spark/driver-and-enclave.sh
      
    • LineCount

        sgx-spark/dockerfiles$ docker run \
        --user user \
        --memory="4g" \
        --shm-size="8g" \
        --env-file $(pwd)/docker-env \
        --net sgxsparknet \
        --privileged \
        -v $(pwd)/../lkl:/spark-image:ro \
        -e SPARK_JOB_CLASS=org.apache.spark.examples.LineCount \
        -e SPARK_JOB_NAME=LineCount \
        -e SPARK_JOB_ARG0=SgxREADME.md \
        -ti sgxspark /sgx-spark/driver-and-enclave.sh
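
Once the master and worker containers are up, the master's web UI can be used to check that the worker has registered. The master container above maps port 8082, which is assumed here to serve the Spark master web UI. A minimal reachability check from the Docker host (assuming curl is installed); alternatively, open http://localhost:8082 in a browser:

$ curl -sf http://localhost:8082/ > /dev/null && echo "Spark master web UI reachable"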
      

Native compilation, installation and deployment

To run Sgx-Spark natively, proceed as follows.

Install package dependencies

Install all required dependencies. For Ubuntu 16.04, these can be installed as follows:

$ sudo apt-get update
$ sudo apt-get install -y --no-install-recommends scala libtool autoconf curl xutils-dev git build-essential maven openjdk-8-jdk ssh bc python autogen wget autotools-dev sudo automake

Compile and install Google Protocol Buffer 2.5.0

Hadoop, and thus Spark, depends on Google Protocol Buffers (GPB) version 2.5.0:

  • Make sure to uninstall any other versions of GPB (a removal and verification sketch follows this list)

  • Install GPB v2.5.0. Instructions for Ubuntu 16.04 are as follows:

      $ cd /tmp
      /tmp$ wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
      /tmp$ tar xvf protobuf-2.5.0.tar.gz
      /tmp$ cd protobuf-2.5.0
      /tmp/protobuf-2.5.0$ ./autogen.sh && ./configure && make && sudo make install
      /tmp/protobuf-2.5.0$ sudo apt-get install -y --no-install-recommends libprotoc-dev
    

    Instructions for Arch Linux are available at https://stackoverflow.com/a/29799354/2273470.
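
A hedged sketch of the uninstall and verification steps on Ubuntu 16.04 (the package names below are the distribution's own protobuf packages; adjust for other distributions):

$ sudo apt-get remove -y protobuf-compiler libprotobuf-dev   # run before building v2.5.0 from source
$ sudo ldconfig        # refresh the linker cache after "make install"
$ protoc --version     # expected output: libprotoc 2.5.0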

Compile sgx-lkl

As Sgx-Spark uses sgx-lkl, the latter must be downloaded and compiled first. As of writing (June 14, 2018), sgx-lkl should be compiled from the cleanup-musl branch. Please follow the sgx-lkl documentation and ensure that your sgx-lkl installation runs simple Java applications successfully.

Compile Sgx-Spark

  • Compile Sgx-Spark:

      sgx-spark$ build/mvn -DskipTests package

  • As part of this compilation process, a modified Hadoop library is also compiled. Copy the Hadoop JAR file into the Sgx-Spark jars directory (a quick check follows this list):

      sgx-spark$ cp hadoop-2.6.5-src/hadoop-common-project/hadoop-common/target/hadoop-common-2.6.5.jar assembly/target/scala-2.11/jars/
    
  • Sgx-Spark ships with a native C library (libringbuff.so) that enables shared-memory-based communication between two JVMs. Compile and install it as follows:

      sgx-spark/C$ make install
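
As a quick check that both artifacts are in place, list the modified Hadoop JAR (path taken from the step above) and search for the native library; the install location of libringbuff.so is not fixed here, so the find is only a best-effort check:

sgx-spark$ ls -l assembly/target/scala-2.11/jars/hadoop-common-2.6.5.jar
sgx-spark$ find . -name 'libringbuff.so'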
    

Prepare the Sgx-Spark disk images that will be run using sgx-lkl

  • Adjust the file sgx-spark/lkl/Makefile for your environment:

    The variable SGX_LKL must point to your sgx-lkl directory (see Compile sgx-lkl above). A sketch for setting it on the make command line instead follows this list.

  • Build the Sgx-Spark disk image required for sgx-lkl:

      sgx-spark/lkl$ make clean all
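
Assuming the Makefile uses standard GNU make variable handling, SGX_LKL can also be set on the command line instead of editing the file; /path/to/sgx-lkl below is a placeholder for your sgx-lkl checkout:

sgx-spark/lkl$ make SGX_LKL=/path/to/sgx-lkl clean all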
    

Run Sgx-Spark using sgx-lkl

Finally, we are ready to run (i) the Sgx-Spark master node, (ii) the Sgx-Spark worker node, (iii) the worker's enclave, (iv) the Sgx-Spark client, and (v) the client's enclave. In the following commands, replace: <hostname> with the master node's actual hostname; <sgx-lkl> with the path to your sgx-lkl installation.

Note: After running each example, make sure to (i) restart all processes, (ii) delete all shared memory files in /dev/shm.
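
A minimal cleanup sketch; it assumes nothing else on the machine keeps state in /dev/shm, so review the listing before deleting anything:

$ ls /dev/shm/
$ rm -f /dev/shm/*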

  • If you run all the nodes locally, you need to add the following line to variables.sh:

      export SPARK_LOCAL_IP=127.0.0.1
    
  • Run the Master node

      sgx-spark$ ./master.sh
    
  • Run the Worker node

      sgx-spark$ ./worker.sh
    
  • Run the enclave for the Worker node

      sgx-spark$ ./worker-enclave.sh
    
  • Run the enclave for the driver program. This is the process that will output the job results!

      sgx-spark$ ./driver-enclave.sh
    
  • Finally, submit a Spark job. The result will be output by the driver-enclave process started in the previous step.

    • WordCount

        sgx-spark$ ./submitwordcount.sh
      
    • KMeans

        sgx-spark$ ./submitkmeans.sh
      
    • LineCount

        sgx-spark$ ./submitlinecount.sh
      

Native execution of the same Spark installation

To run the above installation without SGX, start your environment as follows:

  • Start the Master node as above

  • Start the Worker node as above, but change the environment variable SGX_ENABLED from true to false

  • Do not start the enclaves

  • Submit the Spark job as above, but likewise change the environment variable SGX_ENABLED from true to false (a sketch follows this list)
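
For example, assuming the scripts pick up an SGX_ENABLED value exported in the environment (otherwise change the value where it is defined, e.g. in variables.sh):

sgx-spark$ SGX_ENABLED=false ./worker.sh
sgx-spark$ SGX_ENABLED=false ./submitwordcount.sh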

Important developer notes

Code changes and recompilation

There are a few important things to keep in mind when developing Sgx-Spark:

  • Whenever you change parts of the code, you must recompile Spark:

      sgx-spark$ mvn package -DskipTests
    

    In some situations, the above command does not recompile all of the changed files. In that case, issue:

      sgx-spark$ mvn clean package -DskipTests
    
  • After changing the Sgx-Spark code and recompiling the Java/Scala code (see above), you always need to rebuild the lkl image used by sgx-lkl:

      sgx-spark/lkl$ make clean all
    
  • If you changed parts of the Hadoop code (in the directory hadoop-2.6.5-src), you also need to copy the resulting JAR file:

      sgx-spark$ cp hadoop-2.6.5-src/hadoop-common-project/hadoop-common/target/hadoop-common-2.6.5.jar assembly/target/scala-2.11/jars/
    
  • Lastly, do not forget to remove all related shared memory files in /dev/shm/ before running your next experiment! A combined rebuild-and-clean sketch follows this list.
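
The rebuild steps above can be chained into a single sequence; a hedged sketch run from the Sgx-Spark root directory, with all paths taken from this section:

sgx-spark$ build/mvn -DskipTests package        # recompile the Spark/Scala code
sgx-spark$ cp hadoop-2.6.5-src/hadoop-common-project/hadoop-common/target/hadoop-common-2.6.5.jar assembly/target/scala-2.11/jars/
sgx-spark$ make -C lkl clean all                # rebuild the lkl disk image
sgx-spark$ ls /dev/shm/                         # review before deleting
sgx-spark$ rm -f /dev/shm/*                     # assumes nothing else uses /dev/shm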

Running without sgx-lkl

Development with sgx-lkl can be tedious. For development purposes, a special flag allows the enclave side of Sgx-Spark to run in a regular JVM rather than on top of sgx-lkl. To use this feature, run the enclave JVMs using the scripts worker-enclave-nosgx.sh and driver-enclave-nosgx.sh.

Under the hood, these scripts set environment variable DEBUG_IS_ENCLAVE_REAL=false (defaults to true) and provide the JVM with a value for environment variable SGXLKL_SHMEM_FILE. Note that the value of SGXLKL_SHMEM_FILE must be the same as the one provided for the corresponding Worker (worker.sh) and Driver (driver.sh).
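
For illustration, a hedged sketch: the shared memory file name below is a hypothetical placeholder, and this assumes a value exported in the environment takes precedence; otherwise adjust the value inside the scripts so that the Worker and its enclave-side JVM use the same file:

sgx-spark$ SGXLKL_SHMEM_FILE=sgx-spark-worker-shmem ./worker.sh
sgx-spark$ SGXLKL_SHMEM_FILE=sgx-spark-worker-shmem ./worker-enclave-nosgx.sh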