Big Data Modeling & Analytics

Course Description:

This course is about big data and its role in carrying
out modern business intelligence for actionable insight
to address new business needs. This is a lab-driven course
grounded in open-source software. Students will learn the
fundamentals of MapReduce, the Spark framework, NoSQL
databases, PySpark, and Amazon Athena. The class will
focus on the storage, processing, and analysis aspects of
big data. Students will use Spark clusters and MapReduce
fundamentals to solve big data problems.
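To make "MapReduce fundamentals" concrete, here is a minimal single-process sketch in plain Python of the classic word-count job, split into the map, shuffle, and reduce phases. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen for illustration; a real framework such as Hadoop or Spark runs each phase in parallel across worker nodes, whereas this sketch runs everything in one process.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, which the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for a single key into one result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases over every input line.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big insight", "spark and mapreduce process big data"]
    print(word_count(docs))
```

Because each map call only sees one line and each reduce call only sees one key's values, both phases can be spread across machines; only the shuffle requires moving data between nodes.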

Course Concepts

The main focus of this class is to cover the following concepts:

Concepts of Big Data
- Cluster Computing
- Scale-up Architecture: Why or Why Not
- Scale-out Architecture: Why or Why Not
- Scale-out Architectures (using Hadoop, Spark, PySpark)
- Fault Tolerance: How?
- Data Replication: How?
Distributed Computing
- Cluster Computing (Master and Worker Nodes)
- Distributed and Parallel Algorithms
Distributed File Systems
- Hadoop Distributed File System
- Amazon S3
MapReduce
- MapReduce Paradigm
- MapReduce Algorithms
Spark
- Apache Spark
- Spark Cluster Computing
- Use Spark, PySpark, and Python to teach MapReduce and distributed computing
- Spark RDDs
- Spark DataFrames
- SQL for NoSQL Data: How?
Amazon Athena
- Serverless Architectures
- Amazon Athena
- Amazon Athena, S3, Data Partitioning
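The Athena, S3, and data partitioning topics can be illustrated with a small simulation. Athena can read tables stored in S3 using Hive-style partition folders (for example `year=2024/month=01/`), and a query that filters on partition columns only needs to scan the matching folders; since Athena bills by the amount of data scanned, this pruning matters. The sketch below is a plain-Python model of that idea under stated assumptions: it does not call any AWS API, and the `S3Object` type, the `sales/` prefix, the file names, and the sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Object:
    # Hypothetical stand-in for an object listed under an S3 prefix.
    key: str
    size_bytes: int


def parse_partitions(key: str) -> dict[str, str]:
    # Extract Hive-style "name=value" folder segments, skipping the file name.
    partitions = {}
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions


def prune(objects: list[S3Object], predicate: dict[str, str]) -> list[S3Object]:
    # Keep only objects whose partition values match every predicate column,
    # mimicking how a WHERE clause on partition columns limits what is scanned.
    return [
        obj
        for obj in objects
        if all(parse_partitions(obj.key).get(col) == val for col, val in predicate.items())
    ]


if __name__ == "__main__":
    objects = [
        S3Object("sales/year=2023/month=12/part-0000.parquet", 500),
        S3Object("sales/year=2024/month=01/part-0000.parquet", 700),
        S3Object("sales/year=2024/month=02/part-0000.parquet", 650),
    ]
    selected = prune(objects, {"year": "2024", "month": "01"})
    scanned = sum(obj.size_bytes for obj in selected)
    total = sum(obj.size_bytes for obj in objects)
    print(f"scanned {scanned} of {total} bytes")
```

Choosing partition columns that match the most common query filters (such as dates) is what makes this pruning effective in practice.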
