- CATALAN GRIS LUCIA
- RODRIGUEZ INSERTE PAU
- MAROUF DANIEL
The objective of this work is to help students put into practice the concepts learnt in the theory lessons and gain proficiency in Spark and other related Big Data technologies. In this exercise, students are required to develop a Spark application that builds a machine learning model for a real-world problem using real-world data: predicting the arrival delay of commercial flights.
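The run command at the end of this README selects a model named RegularizedLinearRegression. As a hedged illustration of what such a model does, here is a minimal ridge (L2-regularized) linear regression with a single feature, written in plain Python rather than Spark. It assumes, hypothetically, that departure delay (DepDelay) is used to predict arrival delay (ArrDelay); the function name fit_ridge_1d and the toy numbers are ours and not part of the project. Spark's pyspark.ml.regression.LinearRegression (with regParam and elasticNetParam) scales its loss and penalty differently and standardizes features by default, so its coefficients will not match this sketch exactly.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y ~ w * x + b, minimizing sum((w*x + b - y)**2) + lam * w**2.

    Only the slope w is penalized; the intercept b is recovered from the
    means once w is known. Illustrative sketch, not the project's code.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered cross-product and sum of squares of the feature.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx + lam == 0:
        raise ValueError("constant feature with lam == 0 has no unique fit")
    # Setting the derivative to zero gives w * (sxx + lam) = sxy.
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b


# Toy data (hypothetical): every flight arrives 2 minutes later than it departed.
dep_delay = [0, 10, 20, 30]
arr_delay = [2, 12, 22, 32]
print(fit_ridge_1d(dep_delay, arr_delay, 0.0))    # (1.0, 2.0)
print(fit_ridge_1d(dep_delay, arr_delay, 500.0))  # (0.5, 9.5)
```

Increasing lam shrinks the slope towards zero and lets the intercept absorb more of the mean delay, which is the trade-off a regularized model tunes.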
Dataset: Data Expo 2009: Airline on-time data. Put the data files into source/data/.
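As a quick sanity check before running Spark, the sketch below reads the (DepDelay, ArrDelay) pairs from every file in source/data/. It assumes the files keep the Data Expo column names and use "NA" for missing values, and that they are either plain .csv or bzip2-compressed .csv.bz2 files (the form in which, as far as we know, the yearly files were distributed). The function load_delays is hypothetical and is not part of the repository.

```python
import bz2
import csv
from pathlib import Path


def _open_text(path):
    """Open a .csv or .csv.bz2 file as text suitable for the csv module."""
    if path.suffix == ".bz2":
        return bz2.open(path, "rt", encoding="latin-1", newline="")
    return path.open(encoding="latin-1", newline="")


def load_delays(data_dir="source/data"):
    """Return (DepDelay, ArrDelay) pairs from all CSV files in data_dir.

    Rows where either delay is empty or "NA" (for example cancelled or
    diverted flights) are skipped.
    """
    base = Path(data_dir)
    paths = sorted(list(base.glob("*.csv")) + list(base.glob("*.csv.bz2")))
    pairs = []
    for path in paths:
        with _open_text(path) as f:
            for row in csv.DictReader(f):
                dep = (row.get("DepDelay") or "").strip()
                arr = (row.get("ArrDelay") or "").strip()
                if dep in ("", "NA") or arr in ("", "NA"):
                    continue
                pairs.append((float(dep), float(arr)))
    return pairs
```

For example, print(len(load_delays())) from the repository root reports how many usable rows were found.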
We recommend running the application with Docker, because it works on all major operating systems.
First, install Docker (see the Docker Installation guide and the User Manual in the Docker docs).
Clone the repository and build the Docker image. The build takes around 3 minutes (the first build may take longer, as some files need to be downloaded):
```bash
git clone https://github.com/Maroufd/bigdataprojectgroup27
cd bigdataprojectgroup27
docker build -t spark_app .
docker run -v "$(pwd)/source:/job" spark_app:latest /job/main.py --model "RegularizedLinearRegression"
```
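The --model argument passed to main.py selects which model to train. The sketch below shows one way such an option could be parsed with Python's argparse. It is hypothetical: this README only names RegularizedLinearRegression, so that is the only accepted choice here, and using it as the default is our assumption rather than the project's documented behaviour.

```python
import argparse

# Only model name confirmed by this README; main.py may accept others.
KNOWN_MODELS = ["RegularizedLinearRegression"]


def parse_args(argv=None):
    """Parse command-line options for a hypothetical main.py."""
    parser = argparse.ArgumentParser(
        description="Train a model that predicts flight arrival delay with Spark."
    )
    parser.add_argument(
        "--model",
        choices=KNOWN_MODELS,
        default="RegularizedLinearRegression",  # assumed default
        help="name of the model to train",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Selected model: {args.model}")
```

Restricting --model with choices makes argparse reject a misspelled model name with a usage error before any Spark job is started.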