Amrish-Goel/Data-Mining-Projects

Amrish-Goel-Data-Mining-Projects

All project work from the Data Mining coursework during the Master's program at USC.

HW 1

Familiarization with Spark RDDs using Python and Scala

This assignment was an introduction to Spark RDDs and to using them to compute the results of various queries. It has 3 tasks, which are explained in Assignment_1/assignment_description.pdf

HW 2

SON algorithm to find frequent itemsets

  1. Implemented the SON algorithm in both Python and Scala on the Apache Spark framework to find frequent itemsets in two datasets (one simulated, one real) while satisfying both the time and the threshold constraints.
  2. The tasks and the instructions for running the code are described in Assignment_2/assignment_description.pdf
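The two passes of SON can be sketched in plain Python, without Spark, as shown here. This is a minimal illustration and not the code from Assignment_2: the function names `apriori` and `son`, the `num_chunks` parameter, and the choice of Apriori as the in-chunk algorithm are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, support):
    """Return every itemset (as a frozenset) found in at least `support` baskets."""
    baskets = [frozenset(b) for b in baskets]
    counts = Counter(item for b in baskets for item in b)
    current = {frozenset([i]) for i, c in counts.items() if c >= support}
    frequent = set(current)
    k = 2
    while current:
        items = sorted({i for s in current for i in s})
        # Keep a k-item candidate only if all of its (k-1)-subsets are frequent.
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        ]
        current = {
            c for c in candidates
            if sum(1 for b in baskets if c <= b) >= support
        }
        frequent |= current
        k += 1
    return frequent


def son(baskets, support, num_chunks=2):
    """Two-pass SON: local candidates per chunk, then a global count."""
    if not baskets:
        return set()
    total = len(baskets)
    size = -(-total // num_chunks)  # ceiling division
    chunks = [baskets[i:i + size] for i in range(0, total, size)]

    # Pass 1: find itemsets that are frequent in some chunk, using a
    # threshold scaled to the chunk size (rounded up with integer math).
    candidates = set()
    for chunk in chunks:
        local_support = -(-support * len(chunk) // total)
        candidates |= apriori(chunk, local_support)

    # Pass 2: count every candidate over the full data set.
    all_baskets = [frozenset(b) for b in baskets]
    return {
        c for c in candidates
        if sum(1 for b in all_baskets if c <= b) >= support
    }
```

In a Spark version, each chunk would roughly correspond to one RDD partition, with pass 1 run per partition and pass 2 as a count over the whole RDD.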

HW 3

LSH and Recommendation System to find similar products

This assignment consists of 2 parts.

Part 1

  1. Implemented Locality Sensitive Hashing (LSH) using both the cosine and the Jaccard similarity measures.
  2. The dataset used is the Yelp dataset.
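For the Jaccard case, one common way to build LSH is MinHash signatures followed by banding. The sketch below illustrates that idea only and is not the assignment code: the names `minhash_signature` and `lsh_candidate_pairs`, the hash family `(a*x + b) mod prime`, and the default parameters are assumptions, and item ids are assumed to be integers.

```python
import random
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def minhash_signature(item_ids, hash_params, prime):
    """One minimum per hash function h(x) = (a*x + b) mod prime."""
    return [min((a * x + b) % prime for x in item_ids) for a, b in hash_params]


def lsh_candidate_pairs(sets, num_hashes=20, bands=10, seed=0):
    """Return pairs of keys whose signatures agree on at least one band.

    `sets` maps a key (e.g. a business id) to a non-empty set of integer ids.
    """
    if num_hashes % bands != 0:
        raise ValueError("num_hashes must be divisible by bands")
    rng = random.Random(seed)
    prime = 2_147_483_647
    hash_params = [
        (rng.randrange(1, prime), rng.randrange(0, prime))
        for _ in range(num_hashes)
    ]
    rows = num_hashes // bands
    signatures = {
        key: minhash_signature(ids, hash_params, prime)
        for key, ids in sets.items()
    }

    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for key, sig in signatures.items():
            # Keys with identical rows in this band share a bucket.
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(key)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates
```

Candidate pairs would then be checked with the exact `jaccard` function to discard false positives. Cosine similarity would need a different hash family, such as random hyperplanes, which is not shown here.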

Part 2

  1. Implemented collaborative-filtering recommendation systems (model-based, user-based, item-based) using Pearson correlation.
  2. This task was also part of a 3-round competition project in which we had to improve the performance and efficiency of our recommendation system and beat the improved baseline.

The tasks and the instructions for running the code are described in Assignment_3/assignment_description.pdf
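As an illustration of the item-based variant, the sketch below computes a Pearson weight between two items over their co-rating users and predicts a rating as a weighted average. It is a minimal example under stated assumptions, not the competition code: the names `pearson` and `predict`, the neighbourhood size `k`, taking means over co-rated users only, and dropping non-positive weights are all choices made here.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two {user: rating} dicts over co-rating users."""
    common = ratings_a.keys() & ratings_b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[u] for u in common) / len(common)
    mean_b = sum(ratings_b[u] for u in common) / len(common)
    num = sum((ratings_a[u] - mean_a) * (ratings_b[u] - mean_b) for u in common)
    den = sqrt(sum((ratings_a[u] - mean_a) ** 2 for u in common)) * sqrt(
        sum((ratings_b[u] - mean_b) ** 2 for u in common)
    )
    return num / den if den else 0.0


def predict(user, item, item_ratings, k=2):
    """Predict `user`'s rating of `item` from the k most similar rated items.

    `item_ratings` maps item -> {user: rating}. Returns None when no
    positively correlated neighbour exists.
    """
    target = item_ratings[item]
    neighbours = []
    for other, ratings in item_ratings.items():
        if other != item and user in ratings:
            weight = pearson(target, ratings)
            if weight > 0:
                neighbours.append((weight, ratings[user]))
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    if not top:
        return None
    return sum(w * r for w, r in top) / sum(w for w, _ in top)
```

The user-based variant follows the same pattern with the roles of users and items swapped.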

HW 4

Girvan-Newman algorithm for community detection

  1. Explored the Spark GraphFrames library and implemented the Girvan-Newman algorithm to detect communities in a graph, a technique with widespread applications.
  2. The tasks and the instructions for running the code are described in Assignment_4/assignment_description.pdf
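The core loop of Girvan-Newman, computing edge betweenness and removing the highest-scoring edges until the graph splits, can be sketched in plain Python as follows. This is an illustrative sketch, not the assignment code: the function names are hypothetical, the graph is assumed to be undirected and unweighted, and the loop stops at the first split rather than selecting the split by modularity.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Betweenness of each undirected edge, via one BFS per root node."""
    betweenness = defaultdict(float)
    for root in adj:
        dist = {root: 0}
        paths = defaultdict(float)  # number of shortest paths from root
        paths[root] = 1.0
        parents = defaultdict(list)
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    parents[nb].append(node)
        # Push credit from the deepest nodes back towards the root.
        credit = {n: 1.0 for n in order}
        for node in reversed(order):
            for p in parents[node]:
                share = credit[node] * paths[p] / paths[node]
                betweenness[tuple(sorted((p, node)))] += share
                credit[p] += share
    # Each shortest path was counted once from each endpoint.
    return {edge: value / 2 for edge, value in betweenness.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the component count grows."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break
        top = max(scores.values())
        for (u, v), value in scores.items():
            if abs(value - top) < 1e-9:
                adj[u].discard(v)
                adj[v].discard(u)
    return components(adj)
```

For two triangles joined by a single bridge edge, the bridge has the highest betweenness and is removed first, leaving the two triangles as communities.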

HW 5

This assignment is an introduction to data streams and to working with large data streams. It consists of 3 parts:

Part 1 Bloom Filtering
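A Bloom filter answers "have I possibly seen this item?" with a fixed amount of memory, allowing false positives but never false negatives. The sketch below is a minimal illustration rather than the assignment code: the class name, the default sizes, and the use of seeded SHA-256 as the hash family are assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; one byte per bit is used for readability."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))
```

On a stream, each arriving item is tested with `item in bloom` before `bloom.add(item)`, so an item reported as seen is either a repeat or a false positive.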

Part 2 Flajolet-Martin
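The Flajolet-Martin algorithm estimates the number of distinct items in a stream from the longest run of trailing zero bits seen in the hashed items. The sketch below is illustrative only and is not the assignment code: the function names, the seeded SHA-256 hash family, the 32-bit hash width, and the grouping parameters are assumptions.

```python
import hashlib
import statistics


def trailing_zeros(n, width=32):
    """Number of trailing zero bits in n (width when n is 0)."""
    if n == 0:
        return width
    return (n & -n).bit_length() - 1


def flajolet_martin(stream, num_hashes=16, group_size=4):
    """Estimate the distinct count: average 2**R within groups, median across."""
    max_zeros = [0] * num_hashes
    for item in stream:
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            value = int.from_bytes(digest[:4], "big")  # 32-bit hash value
            max_zeros[seed] = max(max_zeros[seed], trailing_zeros(value))

    estimates = [2 ** r for r in max_zeros]
    groups = [
        estimates[i:i + group_size]
        for i in range(0, num_hashes, group_size)
    ]
    averages = [sum(g) / len(g) for g in groups]
    return statistics.median(averages)
```

Because only the maximum per hash function is kept, repeated items do not change the estimate, which is what makes the method suitable for counting distinct elements in a stream. The estimate is coarse, and an empty stream yields 1.0 in this sketch.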

Part 3 Twitter Stream Analysis using Twitter API
