This is material for developers who have some experience with Go and statistics and want to learn how to work with data to make better decisions. We believe these classes are perfect for data analysts/scientists/engineers interested in working in Go or Go programmers interested in doing data analysis.
Note: This material has been designed to be taught in a classroom environment. The code is well commented but missing some of the contextual concepts and ideas that will be covered in class.
This material introduces the basic principles and procedures of rigorous data analysis and motivates the use of Go in this context. Once you are done with this material you will understand what data analysis is and how Go can help developers maintain integrity while using data to make decisions.
This material covers the gathering, organization, and parsing of data to and from local and remote sources. Once you are done with this material you will understand how to interact with data stored in various places and in various formats, how to parse and clean that data, and how to output the cleaned and parsed data.
Data Gathering, Organization, and Parsing
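As a rough illustration of the parse-and-clean step described above, here is a minimal sketch that uses only the standard library's `encoding/csv` package. The `Measurement` type, the `parseCSV` helper, and the `id,value` column layout are assumptions made for this example and are not taken from the course material.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strconv"
	"strings"
)

// Measurement is a hypothetical record type used only for this illustration.
type Measurement struct {
	ID    int
	Value float64
}

// parseCSV reads CSV text that starts with an "id,value" header line and
// returns the rows that parse cleanly. Rows with the wrong number of fields
// or with malformed numbers are skipped rather than treated as fatal.
func parseCSV(data string) ([]Measurement, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // accept ragged rows so we can filter them ourselves
	r.TrimLeadingSpace = true

	// Discard the header line.
	if _, err := r.Read(); err != nil {
		return nil, fmt.Errorf("reading header: %w", err)
	}

	var out []Measurement
	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		if len(rec) != 2 {
			continue // wrong number of fields
		}
		id, err := strconv.Atoi(strings.TrimSpace(rec[0]))
		if err != nil {
			continue // ID is not an integer
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(rec[1]), 64)
		if err != nil {
			continue // value is not a number
		}
		out = append(out, Measurement{ID: id, Value: v})
	}
	return out, nil
}

func main() {
	// Inline sample data; a real program would read from a file or a remote source.
	const raw = "id,value\n1,2.5\n2,not-a-number\n3,4.0\n4\n"

	rows, err := parseCSV(raw)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range rows {
		fmt.Printf("%d %.1f\n", m.ID, m.Value)
	}
}
```

Silently skipping bad rows keeps the sketch short. In practice you would likely count or log the rejected rows so that the cleaning step stays auditable.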
This material covers the organization of data into matrices and matrix operations. Once you are done with this material you will understand how to form matrices within Go programs and how to utilize those matrices to perform various types of matrix operations.
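To make the idea of forming and operating on matrices concrete, the following sketch represents a matrix as a slice of rows and implements matrix multiplication by hand. The `Matrix` type and `multiply` function are hypothetical names for this example. The course may well rely on a dedicated numerical library instead, which this overview does not specify.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Matrix is a hypothetical dense, row-major matrix type for illustration.
// Every row is assumed to have the same length.
type Matrix [][]float64

// multiply returns the product a*b, or an error when the number of columns
// in a does not match the number of rows in b.
func multiply(a, b Matrix) (Matrix, error) {
	if len(a) == 0 || len(b) == 0 || len(a[0]) != len(b) {
		return nil, errors.New("incompatible matrix dimensions")
	}
	rows, inner, cols := len(a), len(b), len(b[0])

	out := make(Matrix, rows)
	for i := range out {
		out[i] = make([]float64, cols)
		for j := 0; j < cols; j++ {
			// Dot product of row i of a with column j of b.
			for k := 0; k < inner; k++ {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out, nil
}

func main() {
	a := Matrix{{1, 2}, {3, 4}}
	b := Matrix{{5, 6}, {7, 8}}

	c, err := multiply(a, b)
	if err != nil {
		log.Fatal(err)
	}
	for _, row := range c {
		fmt.Println(row)
	}
}
```

A slice of slices is easy to read but scatters rows across memory. Numerical libraries typically store a matrix in one contiguous slice for better performance.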
This material covers statistical measures and operations key to day-to-day data science work. Once you are done with this material you will understand how to compute sound summary statistics, describe and visualize distributions, quantify hypotheses, and transform data sets with techniques such as dimensionality reduction.
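As a small example of the summary measures mentioned above, this sketch computes the mean, median, and sample variance of a data set using only the standard library. The function names and the sample data are assumptions made for illustration, and the functions assume a non-empty input (two or more values for the variance).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic mean of xs. It assumes len(xs) > 0.
func mean(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of xs, averaging the two middle values
// when the length is even. It sorts a copy so the caller's slice is untouched.
func median(xs []float64) float64 {
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	n := len(sorted)
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// sampleVariance returns the unbiased sample variance (divisor n-1).
// It assumes len(xs) > 1.
func sampleVariance(xs []float64) float64 {
	m := mean(xs)
	var ss float64
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return ss / float64(len(xs)-1)
}

func main() {
	data := []float64{2, 4, 4, 4, 5, 5, 7, 9} // made-up sample values

	fmt.Printf("mean:    %.3f\n", mean(data))
	fmt.Printf("median:  %.3f\n", median(data))
	fmt.Printf("std dev: %.3f\n", math.Sqrt(sampleVariance(data)))
}
```

The sample variance divides by n-1 rather than n. That choice fits data that is a sample from a larger population, which is the usual situation in day-to-day analysis.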
This material introduces various types of machine learning along with evaluation and validation techniques. Once you are done with this material you will understand how to train, evaluate, validate, and utilize various models (e.g., for regression, clustering, and classification).
This material introduces techniques for reproducible and scalable data science deployments. Once you are done with this material you will understand how to Dockerize your Go data analysis code and how to integrate the resulting Docker images into distributed data pipelines.
Deployment, Distributed Processing
All material is licensed under the Apache License Version 2.0, January 2004.