Big Data Modeling & Analytics

Course Description:

This course covers big data and its role in modern business
intelligence, where it is used to produce actionable insight
for new business needs. The course is lab-led and built on
open-source software. Students will learn the fundamentals of
MapReduce, the Spark framework, NoSQL databases, PySpark, and
Amazon Athena. The class focuses on the storage, processing,
and analysis aspects of big data. Students will use a Spark
cluster and MapReduce fundamentals to solve big data problems.

Course Concepts

The main focus of this class is to cover the following concepts:

  • Concepts of Big Data

    • Cluster Computing
    • Scale-up Architecture: Why or Why Not
    • Scale-out Architecture: Why or Why Not
    • Scale-out Architectures (using Hadoop, Spark, PySpark)
    • Fault Tolerance: How?
    • Data Replication: How?
  • Distributed Computing

    • Cluster Computing (Master and Worker Nodes)
    • Distributed and Parallel Algorithms
  • Distributed File Systems

    • Hadoop Distributed File System
    • Amazon S3
  • MapReduce

  • Spark

    • Apache Spark
    • Spark Cluster Computing
    • Using Spark, PySpark, and Python to learn MapReduce and distributed computing
    • Spark RDDs
    • Spark DataFrames
    • SQL for NoSQL Data: How?
  • Amazon Athena
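
As a rough illustration of the MapReduce model listed above, here is a minimal single-process Python sketch of a word count. It is not course material: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, and everything runs on one machine rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data big clusters", "data moves to clusters"]
    print(word_count(sample))
```

In PySpark, the same pattern is commonly written against an RDD with `flatMap`, `map`, and `reduceByKey`, where the framework performs the shuffle across worker nodes instead of the in-memory dictionary used here.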