Project work completed for the Data Mining course during my Masters at USC.
This assignment introduces Spark RDDs and how to use them to answer various queries. It consists of 3 tasks, which are explained in Assignment_1/assignment_description.pdf
- Implemented the SON algorithm in both Python and Scala using the Apache Spark framework to find frequent itemsets in two datasets (simulated + real) while satisfying both the time and support-threshold constraints.
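The two SON passes can be mimicked without Spark by treating list slices as partitions — a minimal sketch (my own simplified A-Priori subroutine, not the assignment's submitted code; in the real implementation each chunk would be a `mapPartitions` task):

```python
from itertools import combinations

def apriori(baskets, support):
    """A-Priori on one chunk: return all itemsets frequent within this chunk."""
    counts = {}
    for b in baskets:
        for item in b:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= support}
    result = set(frequent)
    k = 2
    while frequent:
        # Candidate k-sets: unions of frequent (k-1)-sets that have size k.
        cands = {a | b for a in frequent for b in frequent if len(a | b) == k}
        counts = {c: sum(1 for bk in baskets if c <= set(bk)) for c in cands}
        frequent = {s for s, c in counts.items() if c >= support}
        result |= frequent
        k += 1
    return result

def son(baskets, support, num_chunks=2):
    """SON: locally frequent itemsets are the only global candidates."""
    chunks = [baskets[i::num_chunks] for i in range(num_chunks)]
    # Pass 1: run A-Priori per chunk with a proportionally lowered support.
    candidates = set()
    for chunk in chunks:
        candidates |= apriori(chunk, max(1, support // num_chunks))
    # Pass 2: count every candidate over the full dataset and filter.
    return {c for c in candidates
            if sum(1 for b in baskets if c <= set(b)) >= support}
```

Pass 1 can produce false positives (locally but not globally frequent sets), which is why pass 2 recounts every candidate; it can never produce false negatives.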
- A description of the tasks and how to run the code is given in Assignment_2/assignment_description.pdf
This assignment consists of 2 parts:
- Implemented Locality Sensitive Hashing (LSH) using both the Cosine and Jaccard similarity measures.
- The dataset used here is the Yelp dataset.
- Implemented collaborative-filtering recommendation systems (model-based, user-based, and item-based) using Pearson correlation.
- This task was also part of a 3-round competition project in which we had to improve the performance and efficiency of our recommendation system and beat the improved baseline.
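For the Jaccard side of part 1, the core idea is MinHash signatures plus banding. A minimal sketch (my own toy parameters — 50 hash functions, 25 bands — not the tuned values from the submitted code):

```python
import random
from collections import defaultdict

def minhash_signatures(sets, num_hashes=50, seed=0):
    """One signature per set: min over hashed element ids for each hash fn."""
    rng = random.Random(seed)
    universe = sorted({x for s in sets for x in s})
    ids = {x: i for i, x in enumerate(universe)}
    p = 2147483647  # large prime for the (a*x + b) % p hash family
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    return [[min((a * ids[x] + b) % p for x in s) for a, b in params]
            for s in sets]

def lsh_candidates(sigs, bands=25):
    """Band the signatures; sets sharing any whole band become candidate pairs."""
    rows = len(sigs[0]) // bands
    buckets = defaultdict(list)
    cands = set()
    for i, sig in enumerate(sigs):
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            for j in buckets[key]:
                cands.add((j, i))
            buckets[key].append(i)
    return cands
```

With `b` bands of `r` rows each, a pair of Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s**r)**b`, which is the knob the assignment's precision/recall targets constrain.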
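For part 2, the item-based weight and prediction can be sketched in a few lines. This is a simplified illustration (the `ratings` layout — a dict mapping item to `{user: rating}` — and the neighborhood size `k=3` are my assumptions, not the submitted code's choices):

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated users (the item-item CF weight)."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    ma = sum(ratings_a[u] for u in common) / len(common)
    mb = sum(ratings_b[u] for u in common) / len(common)
    num = sum((ratings_a[u] - ma) * (ratings_b[u] - mb) for u in common)
    den = (sqrt(sum((ratings_a[u] - ma) ** 2 for u in common)) *
           sqrt(sum((ratings_b[u] - mb) ** 2 for u in common)))
    return num / den if den else 0.0

def predict(user, item, ratings, k=3):
    """Weighted average of the user's ratings on the k most similar items."""
    sims = []
    for other, r in ratings.items():
        if other != item and user in r:
            sims.append((pearson(ratings[item], r), r[user]))
    sims = sorted(sims, reverse=True)[:k]
    num = sum(w * r for w, r in sims)
    den = sum(abs(w) for w, r in sims)
    return num / den if den else 0.0
```

In the competition rounds, most of the gains typically come from how sparse co-rating is handled (the `len(common) < 2` fallback here is the crudest possible choice) and from blending this memory-based score with the model-based one.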
A description of the tasks and how to run the code is given in Assignment_3/assignment_description.pdf
- Explored the Spark GraphFrames library and implemented the Girvan-Newman algorithm to detect communities in a graph, a task with widespread applications.
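The Girvan-Newman idea — repeatedly remove the edge with the highest betweenness until the graph splits — can be sketched in plain Python. This is a from-scratch illustration using Brandes' betweenness accumulation on an adjacency-set graph (my own representation, not the assignment's Spark version):

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Brandes' algorithm for edge betweenness on an unweighted graph."""
    bet = defaultdict(float)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1      # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                                        # BFS from source s
            v = q.popleft(); stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                    # back-propagate credit
            w = stack.pop()
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Each undirected edge is credited from both endpoints' BFS trees.
    return {e: b / 2 for e, b in bet.items()}

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s]); seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w); comp.add(w); q.append(w)
        comps.append(comp)
    return comps

def girvan_newman_step(adj):
    """Remove highest-betweenness edges until the component count grows."""
    adj = {v: set(ws) for v, ws in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v); adj[v].discard(u)
    return components(adj)
```

On two triangles joined by a bridge, the bridge carries every cross-triangle shortest path, so it is removed first and the graph splits into the two obvious communities.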
- Description of the tasks and how to run the code is given in Assignment_4/assignment_description.pdf
This assignment is an introduction to data streams and how to work with large data streams. It consists of 3 parts.
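As one classic example of the stream-processing mindset this assignment builds (shown here as a generic illustration — the description above does not name the specific algorithms used), reservoir sampling keeps a uniform fixed-size sample from a stream whose length is unknown in advance:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Uniform sample of size k from a stream, using O(k) memory."""
    rng = random.Random(seed)
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(n)         # keep the n-th item w.p. k/n
            if j < k:
                sample[j] = item
    return sample
```

The point of techniques like this is that the stream is seen exactly once and memory stays constant, no matter how many items flow past.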