Write a MapReduce program in PySpark to multiply two matrices
- Given two 500 × 500 matrices, output their product
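The matrix multiplication task can be sketched in plain Python as a map phase that keys every entry by the shared index j, followed by a join and a sum per output cell. This is only an illustration under assumptions: the sparse `(row, col, value)` entry format and the names `map_a`, `map_b`, and `multiply` are hypothetical and do not come from the assignment.

```python
from collections import defaultdict


def map_a(entry):
    """Key an entry A[i][j] by its column index j."""
    i, j, value = entry
    return j, (i, value)


def map_b(entry):
    """Key an entry B[j][k] by its row index j."""
    j, k, value = entry
    return j, (k, value)


def multiply(a_entries, b_entries):
    """Multiply two matrices given as lists of (row, col, value) triples."""
    # Map phase: group both matrices by the shared index j.
    a_by_j = defaultdict(list)
    for entry in a_entries:
        j, payload = map_a(entry)
        a_by_j[j].append(payload)
    b_by_j = defaultdict(list)
    for entry in b_entries:
        j, payload = map_b(entry)
        b_by_j[j].append(payload)

    # Join on j, emit partial products keyed by (i, k), then sum per key.
    result = defaultdict(float)
    for j, a_values in a_by_j.items():
        for i, a in a_values:
            for k, b in b_by_j.get(j, []):
                result[(i, k)] += a * b
    return dict(result)
```

In PySpark the same shape would roughly correspond to mapping both RDDs with `map`, combining them with `join`, and summing the partial products with `reduceByKey`.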
Compute PageRank for the given simulated network
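A minimal PageRank sketch in plain Python follows. It makes several assumptions that the task does not state: the network is an adjacency dict, every node appears as a key with at least one outgoing link (no dangling nodes), the damping factor is 0.85, and the iteration count is fixed. The function name `pagerank` is hypothetical.

```python
from collections import defaultdict


def pagerank(links, damping=0.85, iterations=20):
    """Iterate PageRank over links, a dict of node -> list of destinations.

    Assumes every node is a key in links and has at least one destination.
    """
    nodes = sorted(links)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Map phase: each node sends an equal share of its rank to its neighbours.
        contribs = defaultdict(float)
        for src, dests in links.items():
            share = ranks[src] / len(dests)
            for dest in dests:
                contribs[dest] += share
        # Reduce phase: sum the received shares and apply the damping factor.
        ranks = {
            node: (1.0 - damping) / n + damping * contribs[node]
            for node in nodes
        }
    return ranks
```

With PySpark RDDs, the map phase is typically a `flatMap` over (node, neighbours, rank) records and the reduce phase a `reduceByKey` that sums contributions.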
Cluster the data with k-means, implemented as MapReduce in PySpark
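One k-means iteration fits the MapReduce pattern well: the map step assigns each point to its nearest centroid, and the reduce step averages the points per centroid. The sketch below is plain Python and assumes points are numeric tuples, initial centroids are supplied by the caller, and the number of iterations is fixed. The names `closest`, `kmeans_step`, and `kmeans` are hypothetical.

```python
def closest(point, centroids):
    """Return the index of the centroid nearest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def kmeans_step(points, centroids):
    """Run one assign-and-update iteration and return the new centroids."""
    # Map phase: emit (centroid index, (point, 1)); here folded into running sums.
    sums = {}
    for point in points:
        cid = closest(point, centroids)
        total, count = sums.get(cid, ((0.0,) * len(point), 0))
        sums[cid] = (tuple(t + p for t, p in zip(total, point)), count + 1)

    # Reduce phase: divide each coordinate sum by the count.
    # A centroid that received no points keeps its previous position.
    updated = list(centroids)
    for cid, (total, count) in sums.items():
        updated[cid] = tuple(t / count for t in total)
    return updated


def kmeans(points, centroids, iterations=10):
    """Repeat kmeans_step a fixed number of times."""
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids
```

In PySpark the assignment would usually be a `map` over the points RDD and the averaging a `reduceByKey` over (sum, count) pairs, with the centroids collected to the driver between iterations.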
Given a set of BBCSports articles, implement LSH using MapReduce to find similar articles.
- Three steps:
  - Shingling
  - Minhashing
  - Locality-Sensitive Hashing (LSH)
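The three steps can be sketched in plain Python as follows. Everything specific here is an assumption rather than part of the task: character shingles of length 5, CRC32 as the shingle-to-integer mapping, hash functions of the form (a*x + b) mod p, and the band and row counts chosen by the caller. The function names are hypothetical, and each document is assumed to be at least k characters long.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # a Mersenne prime, used as the hash modulus


def shingles(text, k=5):
    """Step 1: turn a document into a set of integer ids of its k-character shingles."""
    text = " ".join(text.lower().split())
    return {
        zlib.crc32(text[i:i + k].encode("utf-8"))
        for i in range(len(text) - k + 1)
    }


def make_hash_params(n, seed=0):
    """Draw n (a, b) pairs for hash functions h(x) = (a * x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash(shingle_set, params):
    """Step 2: the signature is the minimum hash value per hash function."""
    return [min((a * s + b) % PRIME for s in shingle_set) for a, b in params]


def candidate_pairs(signatures, bands, rows):
    """Step 3: documents that agree on every row of some band become candidates."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for docs in buckets.values():
        pairs.update(combinations(sorted(docs), 2))
    return pairs
```

A possible usage: build `signatures = {doc_id: minhash(shingles(text), params)}` with `params = make_hash_params(bands * rows)`, then pass it to `candidate_pairs`. Candidate pairs are only likely matches, so they would normally be verified afterwards, for example by the Jaccard similarity of their shingle sets.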
Implement Item-item Collaborative Filtering
- Similarity: cosine similarity computed after subtracting the mean rating (centered cosine)
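A minimal sketch of the similarity measure, in plain Python. It assumes each item's ratings are a dict of user -> rating, that the mean subtracted is the item's own mean over the users who rated it, and that a missing rating counts as 0 after centering. The task does not specify these details, and the names `center` and `centered_cosine` are hypothetical.

```python
from math import sqrt


def center(ratings):
    """Subtract the item's mean rating from each of its ratings."""
    mean = sum(ratings.values()) / len(ratings)
    return {user: r - mean for user, r in ratings.items()}


def centered_cosine(ratings_a, ratings_b):
    """Cosine similarity of two items after mean-centering their ratings."""
    a, b = center(ratings_a), center(ratings_b)
    # Only users who rated both items contribute to the dot product,
    # because a missing rating is treated as 0 after centering.
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item with constant ratings has no direction
    return dot / (norm_a * norm_b)
```

To predict a user's rating for an item, one common approach is a similarity-weighted average of that user's ratings on the most similar items. In PySpark the pairwise similarities could be built by grouping ratings per user and emitting item pairs.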