Project work completed for the Data Mining course during my Masters at USC.
This assignment introduces Spark RDDs and how to use them to answer various queries. It consists of 3 tasks, which are explained in Assignment_1/assignment_description.pdf
- Implemented the SON algorithm in both Python and Scala using the Apache Spark framework to find frequent itemsets in two datasets (simulated + real) while satisfying both the time and support-threshold constraints.
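The two SON passes can be mimicked without Spark by treating list slices as partitions — a minimal sketch (my own simplified A-Priori subroutine, not the assignment's submitted code; in the real implementation each chunk would be a `mapPartitions` task):

```python
from itertools import combinations

def apriori(baskets, support):
    """A-Priori on one chunk: return all itemsets frequent within this chunk."""
    counts = {}
    for b in baskets:
        for item in b:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= support}
    result = set(frequent)
    k = 2
    while frequent:
        # Candidate k-sets: unions of frequent (k-1)-sets that have size k.
        cands = {a | b for a in frequent for b in frequent if len(a | b) == k}
        counts = {c: sum(1 for bk in baskets if c <= set(bk)) for c in cands}
        frequent = {s for s, c in counts.items() if c >= support}
        result |= frequent
        k += 1
    return result

def son(baskets, support, num_chunks=2):
    """SON: locally frequent itemsets are the only global candidates."""
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]
    # Pass 1: run A-Priori per chunk with a proportionally lowered support.
    candidates = set()
    for chunk in chunks:
        candidates |= apriori(chunk, max(1, support // num_chunks))
    # Pass 2: count every candidate over the full dataset and filter.
    return {c for c in candidates
            if sum(1 for b in baskets if c <= set(b)) >= support}
```

Pass 1 can produce false positives (locally but not globally frequent sets), which is why pass 2 recounts every candidate; it can never produce false negatives.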
- A description of the tasks and how to run the code is given in Assignment_2/assignment_description.pdf
This assignment consists of 2 parts:
- Implemented Locality Sensitive Hashing (LSH) using both the Cosine and Jaccard similarity measures.
- The dataset used here is the Yelp dataset.
- Implemented collaborative-filtering recommendation systems (model-based, user-based, and item-based) using Pearson correlation.
- This task was also part of a 3-round competition project in which we had to improve the performance and efficiency of our recommendation system and beat the improved baseline.
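For the Jaccard side of part 1, the core idea is MinHash signatures plus banding. A minimal sketch (my own toy parameters — 50 hash functions, 25 bands — not the tuned values from the submitted code):

```python
import random
from collections import defaultdict

def minhash_signatures(sets, num_hashes=50, seed=0):
    """One signature per set: min over hashed element ids for each hash fn."""
    rng = random.Random(seed)
    universe = sorted({x for s in sets for x in s})
    ids = {x: i for i, x in enumerate(universe)}
    p = 2147483647  # large prime for the (a*x + b) % p hash family
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    return [[min((a * ids[x] + b) % p for x in s) for a, b in params]
            for s in sets]

def lsh_candidates(sigs, bands=25):
    """Band the signatures; sets sharing any whole band become candidate pairs."""
    rows = len(sigs[0]) // bands
    buckets = defaultdict(list)
    cands = set()
    for i, sig in enumerate(sigs):
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            for j in buckets[key]:
                cands.add((j, i))
            buckets[key].append(i)
    return cands
```

With `b` bands of `r` rows each, a pair of Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s**r)**b`, which is the knob the assignment's precision/recall targets constrain.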
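For part 2, the item-based weight and prediction can be sketched in a few lines. This is a simplified illustration (the `ratings` layout — a dict mapping item to `{user: rating}` — and the neighborhood size `k=3` are my assumptions, not the submitted code's choices):

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated users (the item-item CF weight)."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    ma = sum(ratings_a[u] for u in common) / len(common)
    mb = sum(ratings_b[u] for u in common) / len(common)
    num = sum((ratings_a[u] - ma) * (ratings_b[u] - mb) for u in common)
    den = (sqrt(sum((ratings_a[u] - ma) ** 2 for u in common)) *
           sqrt(sum((ratings_b[u] - mb) ** 2 for u in common)))
    return num / den if den else 0.0

def predict(user, item, ratings, k=3):
    """Weighted average of the user's ratings on the k most similar items."""
    sims = []
    for other, r in ratings.items():
        if other != item and user in r:
            sims.append((pearson(ratings[item], r), r[user]))
    sims = sorted(sims, reverse=True)[:k]
    num = sum(w * r for w, r in sims)
    den = sum(abs(w) for w, r in sims)
    return num / den if den else 0.0
```

In the competition rounds, most of the gains typically come from how sparse co-rating is handled (the `len(common) < 2` fallback here is the crudest possible choice) and from blending this memory-based score with the model-based one.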
A description of the tasks and how to run the code is given in Assignment_3/assignment_description.pdf
- Explored the Spark GraphFrames library and implemented the Girvan-Newman algorithm to detect communities in a graph, a task with widespread applications.
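The Girvan-Newman idea — repeatedly remove the edge with the highest betweenness until the graph splits — can be sketched in plain Python. This is a from-scratch illustration using Brandes' betweenness accumulation on an adjacency-set graph (my own representation, not the assignment's Spark version):

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness on an unweighted graph."""
    bet = defaultdict(float)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1      # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                                        # BFS from source s
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                    # back-propagate credit
            w = stack.pop()
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Each undirected edge is credited from both endpoints' BFS trees.
    return {e: b / 2 for e, b in bet.items()}

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s]); seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w); comp.add(w); q.append(w)
        comps.append(comp)
    return comps

def girvan_newman_step(adj):
    """Remove highest-betweenness edges until the component count grows."""
    adj = {v: set(ws) for v, ws in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v); adj[v].discard(u)
    return components(adj)
```

On two triangles joined by a bridge, the bridge carries every cross-triangle shortest path, so it is removed first and the graph splits into the two obvious communities.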
- Description of the tasks and how to run the code is given in Assignment_4/assignment_description.pdf
This assignment is an introduction to data streams and how to work with large data streams. It consists of 3 parts.
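As one classic example of the stream-processing mindset this assignment builds (shown here as a generic illustration — the description above does not name the specific algorithms used), reservoir sampling keeps a uniform fixed-size sample from a stream whose length is unknown in advance:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Uniform sample of size k from a stream, using O(k) memory."""
    rng = random.Random(seed)
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(n)         # keep the n-th item w.p. k/n
            if j < k:
                sample[j] = item
    return sample
```

The point of techniques like this is that the stream is seen exactly once and memory stays constant, no matter how many items flow past.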