Spark Practical Work First Semester 2021/2022

Group C27 - BIG DATA - UNIVERSIDAD POLITÉCNICA DE MADRID

Group Members:

  • CATALAN GRIS LUCIA
  • RODRIGUEZ INSERTE PAU
  • MAROUF DANIEL

About our Project

The objective of this work is to help students put into practice the concepts learned in the theory lessons and become proficient with Spark and related Big Data technologies. In this exercise, students are required to develop a Spark application that builds a machine learning model for a real-world problem using real-world data: predicting the arrival delay of commercial flights.

Data

The project uses the Data Expo 2009: Airline on time data dataset. Download the data and put it into source/data/.
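
Before launching a job, it can help to confirm that the data actually landed in source/data/. The following is a minimal sketch, not part of the project code: the helper name find_data_files is hypothetical, and it assumes the dataset files are yearly CSV files, either plain (.csv) or bz2-compressed (.csv.bz2).

```python
from pathlib import Path


def find_data_files(data_dir="source/data"):
    """Return the sorted CSV files (plain or bz2-compressed) found in data_dir.

    Hypothetical helper for illustration; the accepted file extensions
    are an assumption about how the dataset was downloaded.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory not found: {root}")
    # Keep only files whose names end in .csv or .csv.bz2.
    files = sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.endswith((".csv", ".csv.bz2"))
    )
    if not files:
        raise FileNotFoundError(f"No .csv or .csv.bz2 files in {root}")
    return files
```

Calling find_data_files() from the repository root returns the list of data files, or raises FileNotFoundError with a message naming the directory that was checked.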

Installation

We recommend Docker because it provides the same environment on every operating system.

1. Docker

First, install Docker by following the Docker Installation guide. For further details, see the User Manual Docs.

Docker Build

Clone the repository and build the Docker image. The build takes around 3 minutes (the first build may take longer because it needs to download some files).

git clone https://github.com/Maroufd/bigdataprojectgroup27
cd bigdataprojectgroup27
docker build -t spark_app .

Run the Docker container

docker run -v "$(pwd)/source:/job" spark_app:latest /job/main.py --model "RegularizedLinearRegression"
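
The command above passes a --model option to /job/main.py. As an illustration of how such an option could be read, here is a minimal sketch using argparse. It is an assumption about the script's interface, not the actual contents of main.py: the function name parse_args and the default value are hypothetical, and "RegularizedLinearRegression" is the only model name taken from this README.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; --model names the model to train.

    Hypothetical sketch: the real main.py may define its options differently.
    """
    parser = argparse.ArgumentParser(
        description="Predict the arrival delay of commercial flights"
    )
    parser.add_argument(
        "--model",
        default="RegularizedLinearRegression",  # assumed default, taken from the example command
        help="name of the model to train",
    )
    # argv=None makes argparse read sys.argv[1:], as a script normally would.
    return parser.parse_args(argv)
```

For example, parse_args(["--model", "RegularizedLinearRegression"]).model yields the string given on the command line, which the script could then use to choose which model to build.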

Root Docker
