
This is a project I made using PySpark and Jupyter Notebook for my "Data Analytics" course in the 6th semester of college.


Vasanthagokul/pyspark-project


pyspark-project

  • The project consists of the following steps:
    • Initializing the connection to Spark through PySpark
    • Data loading and preprocessing
    • Data cleaning
    • Data normalization
    • Splitting the data
  • The notebook works with two datasets:
    • One was used for clustering with k-means
    • The other was used for random forest classification
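
The notebook's own code is not reproduced in this README. As a rough, hypothetical illustration of what the cleaning, normalization, and splitting steps do, the sketch below implements them on plain Python tuples so it runs without a Spark installation. In PySpark these steps would typically correspond to DataFrame.dropna(), pyspark.ml.feature.MinMaxScaler (applied to a vector column, usually built with VectorAssembler) and DataFrame.randomSplit([0.8, 0.2], seed=...). That mapping is an assumption rather than the notebook's confirmed approach. The toy rows, the 0.8/0.2 ratio, and the function names are all hypothetical.

```python
import random

# Hypothetical toy rows of (feature_a, feature_b); None stands for a missing value.
RAW_ROWS = [(1.0, 10.0), (2.0, None), (3.0, 30.0), (None, 40.0), (5.0, 50.0), (4.0, 20.0)]


def clean(rows):
    """Data cleaning: drop every row that contains a missing value."""
    return [row for row in rows if all(value is not None for value in row)]


def min_max_normalize(rows):
    """Data normalization: rescale each column to [0, 1] via (x - min) / (max - min)."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return [
        tuple(
            # A constant column has max == min, so map it to the midpoint 0.5.
            (value - lo) / (hi - lo) if hi > lo else 0.5
            for value, lo, hi in zip(row, mins, maxs)
        )
        for row in rows
    ]


def random_split(rows, train_fraction=0.8, seed=42):
    """Splitting the data: send each row to train with probability train_fraction, else to test.

    Because each row is assigned by an independent random draw, the resulting
    sizes only approximate the requested proportions.
    """
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return train, test


cleaned = clean(RAW_ROWS)
normalized = min_max_normalize(cleaned)
train, test = random_split(normalized)
print(f"{len(cleaned)} clean rows -> {len(train)} train / {len(test)} test")
```

Normalizing before splitting keeps the sketch short. In a real pipeline, fitting the scaler on the training split only avoids leaking test-set statistics.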
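
For the clustering dataset, PySpark's usual tool is pyspark.ml.clustering.KMeans (for example KMeans(k=2, seed=1) fitted on a DataFrame with a features vector column). By default it picks initial centers with the k-means|| scheme. To show the underlying idea without Spark, the hypothetical sketch below runs plain Lloyd's k-means iterations in Python with hand-picked initial centers. The points, k = 2, and the function names are assumptions for illustration only.

```python
# Hypothetical 2-D points forming two well-separated groups.
POINTS = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Assignment step: label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))
        for point in points
    ]


def kmeans(points, initial_centers, max_iter=20):
    """Lloyd's k-means: alternate assignment and mean-update steps until centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for i, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Update step: move the center to the mean of its assigned points.
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(old_center)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


centers, labels = kmeans(POINTS, initial_centers=[POINTS[0], POINTS[3]])
print("labels:", labels)
print("centers:", centers)
```

Because the result depends on the starting centers, Spark's KMeans exposes a seed parameter so that runs are reproducible.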

The entire project was built on Apache Spark using its Python API, PySpark.

Please clone the repository with git lfs clone, because the datasets may not be accessible from a plain ZIP download.

Versions used: Apache Spark 3.1.1, Python 3.8.3
