PySpark_Tutorials

This repo contains code for practising PySpark.

Contents
1. rdd_.ipynb -> This notebook covers the basics of RDDs.
2. Pyspark_Intro.ipynb -> This notebook contains code for creating RDDs, building PySpark DataFrames from RDDs, and converting PySpark DataFrames to Pandas DataFrames.
3. Working_with_Hive_and_PySpark_in_Google_Cloud_Dataproc.ipynb -> This notebook explains how to save PySpark DataFrames to Hive tables and how to run this code on Google Cloud Dataproc.
4. PySpark_Advanced.ipynb -> This notebook takes a deeper look at DataFrames, handling different types of data, Spark SQL, and some advanced RDD concepts.
5. Algoscale_Assignment.ipynb -> This notebook contains the solution to the take-home assignment round of the interview for the Data Engineer position at AlgoScale.
6. AlgoScale_Interview_Problems.ipynb -> This notebook contains the solutions to the technical round of the interview for the Data Engineer position at AlgoScale.