-
This course is about data warehousing and its role in modern business intelligence: turning data into actionable insight that addresses new business needs.
-
What is a data warehouse? A data warehouse is the central component of a modern data stack, which is a combination of software tools used to collect, process, and store data on a well-integrated, cloud-based data platform.
-
Data warehouses have solved the problem of analyzing massive amounts of structured, semi-structured, and unstructured data, and they are cost-effective, performant, and easy to use. Note that unstructured data (such as images and log data) cannot be analyzed directly with SQL.
-
Data warehouses are the foundation for reporting, ad hoc analysis, business intelligence, and machine learning, and they enable collaboration among diverse users and stakeholders across organizations of all sizes.
-
This class will provide students with the conceptual background and hands-on data analytics skills needed to use a data warehouse effectively. Throughout the course, students will work on an end-to-end development project, building a working data platform for New York City transit data. Using actual taxi, rideshare, bike share, and weather data, students will answer real-world analytics questions, such as "How do location and time of day affect trip length?" and "How does weather affect transit preferences?" By the end, students will have the skills, tools, and techniques needed to take a real-world data project from problem statement to prototype to production.
- Implement data ingestion techniques (ETL)
- Write SQL for data analytics, including time series
- Transform data using SQL and Big Data Analytics
- Compare modern and classic strategies of data modeling
- Understand data warehouse architecture
- Maintain data quality
- Create reports, analyses, and visualizations
- Write OLAP queries