albert037037037/Massive-Data-Analysis

NTHU 109Fall Massive Data Analysis

Homework 1 Matrix Multiplication

Write a MapReduce program in PySpark to solve a matrix multiplication problem.

  • Given two 500 × 500 matrices, output their product.
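A minimal sketch of the classic one-pass MapReduce matrix multiplication. It is written in plain Python so it runs without a Spark cluster. The input format is an assumption: each matrix entry arrives as a hypothetical `(matrix, row, col, value)` tuple. The map phase routes every entry to each output cell `(i, k)` that needs it, and the reduce phase joins on the shared index `j` and sums the products.

```python
from collections import defaultdict

def map_entries(entries, n_rows_m, n_cols_n):
    """Map phase: route each matrix entry to every output cell (i, k) it contributes to."""
    for matrix, row, col, value in entries:
        if matrix == "M":
            # M[i][j] is needed by every cell (i, k) of the product
            for k in range(n_cols_n):
                yield (row, k), ("M", col, value)
        else:
            # N[j][k] is needed by every cell (i, k) of the product
            for i in range(n_rows_m):
                yield (i, col), ("N", row, value)

def reduce_cell(values):
    """Reduce phase: pair M[i][j] with N[j][k] on the shared index j and sum the products."""
    m_vals, n_vals = {}, {}
    for tag, j, value in values:
        (m_vals if tag == "M" else n_vals)[j] = value
    return sum(v * n_vals[j] for j, v in m_vals.items() if j in n_vals)

def matmul_mapreduce(entries, n_rows_m, n_cols_n):
    """Multiply M by N from (matrix, row, col, value) tuples.

    n_rows_m is M's row count and n_cols_n is N's column count.
    """
    groups = defaultdict(list)  # stands in for Spark's shuffle / groupByKey
    for key, value in map_entries(entries, n_rows_m, n_cols_n):
        groups[key].append(value)
    return {key: reduce_cell(vals) for key, vals in groups.items()}

# Usage: a 2 x 2 example
M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
entries = [("M", i, j, M[i][j]) for i in range(2) for j in range(2)]
entries += [("N", j, k, N[j][k]) for j in range(2) for k in range(2)]
print(matmul_mapreduce(entries, 2, 2))  # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```

In PySpark the same shape would map onto RDD `flatMap` for the map phase, `groupByKey` for the shuffle, and `mapValues` for the reduce. For 500 × 500 inputs, each entry is emitted 500 times, which is the usual cost of this one-pass approach.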

Homework 2 PageRank

Calculate the PageRank of every node in the given simulated network.

  • Use the formula below
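A minimal PageRank sketch in plain Python. It assumes the common "taxed" power-iteration update r' = βMr + (1 − β)/N with a hypothetical β = 0.8 and a directed edge list as input. The assignment's own formula may differ in details such as β or dead-end handling. Each iteration is a map (every page sends β · rank / out-degree along its out-links) followed by a reduce (sum what each page receives).

```python
from collections import defaultdict

def pagerank(edges, beta=0.8, iterations=50):
    """Power iteration for taxed PageRank over a list of directed (src, dst) edges."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}  # start from the uniform vector
    for _ in range(iterations):
        # Map: each page sends beta * rank / out-degree to each page it links to.
        # Reduce: sum the contributions arriving at each page.
        contributions = defaultdict(float)
        for src, dsts in out_links.items():
            share = beta * rank[src] / len(dsts)
            for dst in dsts:
                contributions[dst] += share
        # Teleport: give the missing mass back uniformly. Without dead ends this is
        # exactly (1 - beta) / n; dead-end mass is also redistributed (an assumption).
        leaked = 1.0 - sum(contributions.values())
        rank = {node: contributions[node] + leaked / n for node in nodes}
    return rank

# Usage: C links to B, but nothing links to C, so C only receives teleport mass
ranks = pagerank([("A", "B"), ("B", "A"), ("C", "B")])
print(sorted(ranks, key=ranks.get, reverse=True))  # ['B', 'A', 'C']
```

Redistributing the leaked mass keeps the ranks summing to 1 even when the network has dead ends. This is one common choice, not necessarily the one the assignment specifies.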

Homework 3 K-means

Cluster the data with k-means, implemented as MapReduce on PySpark.
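A minimal sketch of k-means as repeated map and reduce steps, in plain Python so it runs without Spark. The map step keys each point by its nearest centroid, and the reduce step averages each cluster's points into a new centroid. Euclidean distance, tuple-valued points, and keeping an empty cluster's old centroid are all assumptions here, not details taken from the assignment.

```python
import math
from collections import defaultdict

def closest(point, centroids):
    """Index of the centroid nearest to point under Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans_step(points, centroids):
    """One k-means iteration expressed as a map (assign) and a reduce (average)."""
    # Map: key every point by the id of its nearest centroid
    clusters = defaultdict(list)
    for p in points:
        clusters[closest(p, centroids)].append(p)
    # Reduce: the new centroid is the coordinate-wise mean of its members;
    # an empty cluster keeps its previous centroid (an assumption)
    new_centroids = []
    for c, old in enumerate(centroids):
        members = clusters.get(c)
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(tuple(old))
    return new_centroids

def kmeans(points, centroids, max_iter=20):
    """Repeat kmeans_step until the centroids stop moving or max_iter is reached."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        updated = kmeans_step(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, [(0, 0), (10, 10)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

On Spark, one iteration would typically `map` each point to `(cluster_id, (point, 1))` and then `reduceByKey` to add up coordinate sums and counts, so only per-cluster totals cross the network.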

Homework 4 Finding Similar Articles

Given a set of BBCSports articles, implement LSH with MapReduce to find similar articles.

  • Three steps:
    1. Shingling
    2. Minhashing
    3. Locality-Sensitive Hashing (LSH)
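The three steps can be sketched in plain Python as follows. This is a single-machine illustration, not a MapReduce job. It makes several assumptions: word 3-shingles hashed with CRC32, 100 random linear hash functions `(a*x + b) mod PRIME` standing in for row permutations, and 20 bands of 5 rows. The names `find_similar`, `PRIME`, and the band/row counts are hypothetical choices.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = 4294967311  # a prime just above 2**32, larger than any CRC32 value

def shingles(text, k=3):
    """Step 1, shingling: the set of k-word shingles, each hashed to a 32-bit int with CRC32."""
    words = text.lower().split()
    return {zlib.crc32(" ".join(words[i:i + k]).encode()) for i in range(len(words) - k + 1)}

def minhash_signature(shingle_set, hash_params):
    """Step 2, minhashing: for each h(x) = (a*x + b) mod PRIME keep the minimum over the set."""
    return [min(((a * x + b) % PRIME for x in shingle_set), default=PRIME)
            for a, b in hash_params]

def lsh_candidates(signatures, bands, rows):
    """Step 3, LSH: documents identical on all rows of at least one band become candidates."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for docs in buckets.values():
        pairs.update(combinations(sorted(docs), 2))
    return pairs

def find_similar(docs, k=3, bands=20, rows=5, seed=0):
    """Run all three steps on a dict of doc_id -> text and return candidate pairs."""
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, 2**32), rng.randrange(2**32)) for _ in range(bands * rows)]
    signatures = {doc_id: minhash_signature(shingles(text, k), hash_params)
                  for doc_id, text in docs.items()}
    return lsh_candidates(signatures, bands, rows)

docs = {
    "a": "the striker scored twice in the second half of the match",
    "b": "the striker scored twice in the second half of the final",
    "c": "rain delayed the cricket test on the opening day",
}
print(find_similar(docs))
```

With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b. Candidates are usually re-checked with exact Jaccard similarity, since LSH can return false positives and miss some true pairs.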

Term Project Recommendation System

Implement item-item collaborative filtering.

  • Similarity: cosine similarity on mean-subtracted (centered) ratings
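A minimal sketch of item-item collaborative filtering with mean-centered cosine similarity, in plain Python. It assumes ratings are stored as a hypothetical `item -> {user: rating}` dict. It reads "subtract mean" as subtracting each item's mean over the users who rated it. Predictions are a similarity-weighted average of the user's ratings on the k most similar items. Keeping only positive similarities and k = 2 are assumptions, not details from the project.

```python
import math

def center(ratings):
    """Subtract the item's mean rating from each of its ratings (ratings: user -> score)."""
    mean = sum(ratings.values()) / len(ratings)
    return {user: score - mean for user, score in ratings.items()}

def cosine(u, v):
    """Cosine similarity of sparse vectors stored as dicts; missing users count as 0."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def predict(item_ratings, target, user, k=2):
    """Weighted average of the user's ratings on the k items most similar to target."""
    centered = {item: center(r) for item, r in item_ratings.items()}
    neighbours = sorted(
        ((cosine(centered[target], centered[item]), item)
         for item, r in item_ratings.items() if item != target and user in r),
        reverse=True,
    )[:k]
    # keep positively similar neighbours only (an assumption)
    neighbours = [(s, item) for s, item in neighbours if s > 0]
    if not neighbours:
        return None
    return sum(s * item_ratings[item][user] for s, item in neighbours) / sum(s for s, _ in neighbours)

item_ratings = {
    "m1": {"u1": 5, "u2": 4, "u3": 1},
    "m2": {"u1": 4, "u2": 5, "u3": 2, "u4": 4},
    "m3": {"u1": 1, "u2": 2, "u3": 5, "u4": 1},
}
print(predict(item_ratings, "m1", "u4"))  # 4.0
```

Here m3 is negatively correlated with m1 after centering, so only m2 contributes to the prediction. Centering first makes the cosine similarity equivalent to a Pearson-style correlation over co-rated users, which keeps generous and harsh raters from dominating the similarity.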

About

NTHU CS Massive Data Analysis 109Fall