This repo has been inspired by these:
I want to track the progress of my studies in this broad area. I do not intend to list a huge number of resources or courses, only the ones I have completed so far and the ones I have in mind as next steps (in the short, medium or even long term). In other words, these are valuable resources that may become part of my plan in the future.
It is not easy to classify subjects in Data Science. Some courses clearly correspond to a single category, while others may belong to more than one. I have tried to simplify the categories of interest into what you can see in the table of contents. There may still be some inconsistencies, but I think I am happy with the result :)
- Introductory Courses in Data Science.
- General Courses in Data Science.
- Data Analysis.
- Machine Learning.
- Octave/Matlab.
- Python.
- R.
- Text Mining and NLP.
- Data Visualization and Reporting.
- Python.
- R.
- JavaScript.
- Probability and Statistics.
- Big Data.
- Books.
- Other courses in Computer Science.
1. INTRODUCTORY COURSES IN DATA SCIENCE (back to top ↑)
Python (back to top ↑)
- Intro to Python for Data Science (by Filip Schouwenaars from DataCamp and Jonathan Sanito from Microsoft at edX). ~ 12 hours.
- Python Basics.
- List - A Data Structure.
- Functions and Packages.
- `NumPy`.
- Plotting with `Matplotlib`.
- Control Flow and `Pandas`.
- Course Certificate ✓.
- Intro to Python for Data Science (by Filip Schouwenaars at DataCamp). ~ 4 hours.
- Python Basics.
- Python Lists.
- Functions and Packages.
- `NumPy`.
- Course Certificate ✓.
- Intermediate Python for Data Science (by Filip Schouwenaars at DataCamp). ~ 4 hours.
- `Matplotlib`.
- Dictionaries & `Pandas`.
- Logic, Control Flow and Filtering.
- Loops.
- Case Study: Hacker Statistics.
- Course Certificate ✓.
- Python Data Science Toolbox (Part 1) (by Hugo Bowne-Anderson at DataCamp). ~ 3 hours.
- Writing your own functions.
- Default arguments, variable-length arguments and scope.
- Lambda functions and error-handling.
- Course Certificate ✓.
- Python Data Science Toolbox (Part 2) (by Hugo Bowne-Anderson at DataCamp). ~ 4 hours.
- Using iterators in PythonLand.
- List comprehensions and generators.
- Bringing it all together!
- Course Certificate ✓.
- Data Types for Data Science (by Jason Myers at DataCamp). ~ 4 hours.
- Fundamental data types.
- Dictionaries - the root of Python.
- Meet the collections module.
- Handling Dates and Times.
- Answering Data Science Questions.
R (back to top ↑)
- Mathematical and Statistical Software (by Yosu Yurramendi, Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
- Basic concepts of R.
- Data structures in R.
- Data files management with R.
- Graphics with R.
- Programming structures in R.
- Statistical and Mathematical Computing.
- The Data Scientist’s Toolbox (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 4 hours.
- Installing the Toolbox.
- Conceptual Issues.
- Course Project Submission & Evaluation.
- Course Certificate ✓.
- R Programming (by Roger D. Peng, Jeff Leek and Brian Caffo from Johns Hopkins University at Coursera). ~ 28 hours.
- Overview of R, R data types and objects, reading and writing data.
- Control structures, functions, scoping rules, dates and times.
- Loop functions, debugging tools.
- Simulation, code profiling.
- Course Certificate ✓.
- Introduction to R (by Jonathan Cornelissen at DataCamp). ~ 4 hours.
- Intro to basics.
- Vectors.
- Matrices.
- Factors.
- Data frames.
- Lists.
- Course Certificate ✓.
- Intermediate R (by Filip Schouwenaars at DataCamp). ~ 6 hours.
- Conditionals and Control Flow.
- Loops.
- Functions.
- The apply family.
- Utilities.
- Course Certificate ✓.
- Intermediate R - Practice (by Filip Schouwenaars at DataCamp). Same sections as the previous course. ~ 4 hours.
- Writing Functions in R (by Hadley Wickham and Charlotte Wickham at DataCamp). ~ 4 hours.
- A quick refresher.
- When and how you should write a function.
- Functional programming.
- Advanced inputs and outputs.
- Robust functions.
- Writing Efficient R Code (by Colin Gillespie at DataCamp). ~ 4 hours.
- The Art of Benchmarking.
- Fine Tuning: Efficient Base R.
- Diagnosing Problems: Code Profiling.
- Turbo Charged Code: Parallel Programming.
- Object-Oriented Programming in R: `S3` and `R6` (by Richie Cotton at DataCamp). ~ 4 hours.
- Introduction to Object-Oriented Programming.
- Using `S3`.
- Using `R6`.
- `R6` Inheritance.
- Advanced `R6` Usage.
SQL (back to top ↑)
- Intro to SQL for Data Science (by Nick Carchedi at DataCamp). ~ 4 hours.
- Selecting columns.
- Filtering rows.
- Aggregate Functions.
- Sorting, grouping and joins.
- Course Certificate ✓.
- Joining Data in PostgreSQL (by Chester Ismay at DataCamp). ~ 5 hours.
- Introduction to joins.
- Outer joins and cross joins.
- Set theory clauses.
- Subqueries.
2. GENERAL COURSES IN DATA SCIENCE (back to top ↑)
Python (back to top ↑)
- Master thesis corresponding to the Master in Computational Engineering and Intelligent Systems program (by Javier Estraviz. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 18 ECTS ~ 450 hours. Expected: 2018.
- Intro to Data Science (by Dave Holtz and Cheng-Han Lee at Udacity). ~ 80 hours.
- Introduction to Data Science.
- Data Wrangling.
- Data Analysis.
- Data Visualization.
- MapReduce.
R (back to top ↑)
- 15.071x The Analytics Edge (by Dimitris Bertsimas from MITx at edX). ~ 120 hours.
- An Introduction to Analytics.
- Linear Regression.
- Logistic Regression.
- Trees.
- Text Analytics.
- Clustering.
- Visualization.
- Linear Optimization.
- Integer Optimization.
- Course Certificate ✓.
- Data Science Capstone (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 35 hours.
- Overview, Understanding the Problem, and Getting the Data.
- Exploratory Data Analysis and Modeling.
- Prediction Model.
- Creative Exploration.
- Data Product.
- Slide Deck.
- Final Project Submission and Evaluation.
3. DATA ANALYSIS (back to top ↑)
Python (back to top ↑)
- Importing Data in Python (Part 1) (by Hugo Bowne-Anderson at DataCamp). ~ 3 hours.
- Introduction and flat files.
- Importing data from other file types.
- Working with relational databases in Python.
- Course Certificate ✓.
- Importing Data in Python (Part 2) (by Hugo Bowne-Anderson at DataCamp). ~ 2 hours.
- Importing data from the Internet.
- Interacting with APIs to import data from the web.
- Diving deep into the `Twitter API`.
- Course Certificate ✓.
- Cleaning Data in Python (by Daniel Chen at DataCamp). ~ 4 hours.
- Exploring your data.
- Tidying data for analysis.
- Combining data for analysis.
- Cleaning data for analysis.
- Case study.
- Course Certificate ✓.
- pandas Foundations (by Dhavide Aruliah at DataCamp). ~ 4 hours.
- Data ingestion & inspection.
- Exploratory data analysis.
- Time series in `pandas`.
- Case Study - Sunlight in Austin.
- Course Certificate ✓.
- Manipulating DataFrames with `pandas` (by Dhavide Aruliah at DataCamp). ~ 4 hours.
- Extracting and transforming data.
- Advanced indexing.
- Rearranging and reshaping data.
- Grouping data.
- Bringing it all together.
- Course Certificate ✓.
- Merging DataFrames with `pandas` (by Dhavide Aruliah at DataCamp). ~ 4 hours.
- Preparing data.
- Concatenating data.
- Merging data.
- Case Study - Summer Olympics.
- Course Certificate ✓.
- Introduction to Databases in Python (by Jason Myers at DataCamp). ~ 4 hours.
- Basics of Relational Databases.
- Applying Filtering, Ordering and Grouping to Queries.
- Advanced `SQLAlchemy` Queries.
- Creating and Manipulating your own Databases.
- Putting it all together.
- Course Certificate ✓.
- Become a Python Data Analyst (by Alvaro Fuentes at Safari). ~ 4 hours 30 minutes. My github repo.
- The Anaconda Distribution and the Jupyter Notebook.
- Vectorizing Operations with NumPy.
- Pandas: Everyone’s Favorite Data Analysis Library.
- Visualization and Exploratory Data Analysis.
- Statistical Computing with Python.
- Introduction to Predictive Analytics Models.
- Intro to Data Analysis (by Caroline Buckey at Udacity). ~ 60 hours.
- Data Analysis Process.
- NumPy and Pandas for 1D Data.
- NumPy and Pandas for 2D Data.
- Investigate a Dataset.
R (back to top ↑)
- Exploratory Data Analysis (by Iñaki Inza, Itziar Irigoien, Yosu Yurramendi, Javier Muguerza, Ibai Gurrutxaga, José Ignacio Martín, Olatz Arbelaitz, Txus Perez. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 6 ECTS ~ 150 hours.
- General introduction to the problem and basic notions.
- Visualization of a variable and the relationships between variables.
- Unsupervised Classification Methods.
- Supervised Classification Methods.
- Dimensionality reduction methods (factorial methods).
- Time series.
- Getting and Cleaning Data (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
- Finding data and reading different file types.
- Introduction to the most common data storage systems and the appropriate tools to extract data from the web or from databases like `MySQL`.
- Organizing, merging and managing data.
- Text and date manipulation in R.
- Course Certificate ✓.
- Exploratory Data Analysis (by Roger D. Peng, Jeff Leek and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
- The basics of analytic graphics and the base plotting system in R.
- More advanced graphing systems available in R: the `Lattice` system and the `ggplot2` system.
- Some statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques.
- Two case studies in exploratory data analysis.
- Course Certificate ✓.
- Data Analysis with R (at Udacity). ~ 80 hours.
- What is EDA?
- R Basics.
- Explore One Variable.
- Explore Two Variables.
- Explore Many Variables.
- Diamonds and Price Predictions.
- Importing Data in R (Part 1) (by Filip Schouwenaars at DataCamp). ~ 3 hours.
- Importing data from flat files with utils.
- `readr` & `data.table`.
- Importing Excel data.
- Reproducible Excel work with `XLConnect`.
- Importing Data in R (Part 2) (by Filip Schouwenaars at DataCamp). ~ 3 hours.
- Importing data from databases (Part 1).
- Importing data from databases (Part 2).
- Importing data from the web (Part 1).
- Importing data from the web (Part 2).
- Importing data from statistical software packages.
- Cleaning Data in R (by Nick Carchedi at DataCamp). ~ 4 hours.
- Introduction and exploring raw data.
- Tidying data.
- Preparing data for analysis.
- Putting it all together.
- Importing & Cleaning Data in R: Case Studies (by Nick Carchedi at DataCamp). ~ 4 hours.
- Ticket Sales Data.
- MBTA Ridership Data.
- World Food Facts.
- School Attendance Data.
- String Manipulation in R with `stringr` (by Charlotte Wickham at DataCamp). ~ 4 hours.
- String basics.
- Introduction to stringr.
- Pattern matching with regular expressions.
- More advanced matching and manipulation.
- Case Studies.
- Data Manipulation in R with `dplyr` (by Garrett Grolemund at DataCamp). ~ 4 hours.
- Introduction to `dplyr` and `tbls`.
- Select and mutate.
- Filter and arrange.
- Summarise and the pipe operator.
- Group_by and working with databases.
- Joining Data in R with `dplyr` (by Garrett Grolemund at DataCamp). ~ 4 hours.
- Mutating joins.
- Filtering joins and set operations.
- Assembling data.
- Advanced joining.
- Case Study.
- Exploratory Data Analysis in R: Case Study (by David Robinson at DataCamp). ~ 4 hours.
- Data cleaning and summarizing with `dplyr`.
- Data visualization with `ggplot2`.
- Tidy modeling with `broom`.
- Joining and tidying.
- Data Analysis in R, the `data.table` Way (by Matt Dowle and Arun Srinivasan at DataCamp). ~ 4 hours.
- `Data.table` novice.
- `Data.table` yeoman.
- `Data.table` expert.
Weka (back to top ↑)
- Data Mining and Big Data Analysis (by Itziar Irigoien, Javier Muguerza, Ibai Gurrutxaga, José Ignacio Martín, Olatz Arbelaitz, Txus Perez. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
- Introduction to data mining.
- Data mining: from theory to practice through the use of `WEKA` software.
- Preprocessing data for further analysis. Main techniques and filters.
- Introduction of variable selection. Selection of the relevant variables for the analysis. Types of techniques.
- Estimating the percentage of correct classifications and statistical tests for comparing classifiers: evaluation and credibility of the learned models.
- Discussion and study of concrete case studies: Bioinformatics.
4. MACHINE LEARNING (back to top ↑)
Octave/Matlab (back to top ↑)
- Machine Learning (by Andrew Ng from Stanford at Coursera). ~ 55 hours.
- Introduction.
- Linear Regression with One Variable.
- Linear Algebra Review.
- Linear Regression with Multiple Variables.
- Logistic Regression.
- Regularization.
- Neural Networks: Representation.
- Neural Networks: Learning.
- Advice for Applying Machine Learning.
- Machine Learning System Design.
- Support Vector Machines.
- Unsupervised Learning.
- Dimensionality Reduction.
- Anomaly Detection.
- Recommender Systems.
- Large Scale Machine Learning.
- Application Example: Photo OCR.
- Course Certificate ✓.
- Learning from Data (by Yaser Abu-Mostafa at Caltech). ~ 108 hours.
- The Learning Problem.
- Is Learning Feasible?
- The Linear Model I.
- Error and Noise.
- Training versus Testing.
- Theory of Generalization.
- The VC Dimension.
- Bias-Variance Tradeoff.
- The Linear Model II.
- Neural Networks.
- Overfitting.
- Regularization.
- Validation.
- Support Vector Machines.
- Kernel Methods.
- Radial Basis Functions.
- Three Learning Principles.
- Epilogue.
Python (back to top ↑)
- Unsupervised Learning in Python (by Benjamin Wilson at DataCamp). ~ 4 hours.
- Clustering for dataset exploration.
- Visualization with hierarchical clustering and t-SNE.
- Decorrelating your data and dimension reduction.
- Discovering interpretable features.
- Course Certificate ✓.
- Making Predictions with Data and Python (by Alvaro Fuentes at Safari). ~ 4 hours. My github repo.
- Chapter 1 : The Tools for Doing Predictive Analytics with Python.
- Chapter 2 : Visualization Refresher.
- Chapter 3 : Concepts in Predictive Analytics.
- Chapter 4 : Regression: Concepts and Models.
- Chapter 5 : Regression: Predicting Crime, Stock Prices, and Post Popularity.
- Chapter 6 : Classification: Concepts and Models.
- Chapter 7 : Classification: Predicting Bankruptcy, Credit Default, and Spam Text Messages.
- Neural Networks and Deep Learning (by Andrew Ng from deeplearning.ai at Coursera) ~ 12 hours.
- Introduction to deep learning.
- Neural Networks Basics.
- Shallow neural networks.
- Deep Neural Networks.
- Course Certificate ✓.
- Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization (by Andrew Ng from deeplearning.ai at Coursera) ~ 9 hours.
- Practical aspects of Deep Learning.
- Optimization algorithms.
- Hyperparameter tuning, Batch Normalization and Programming Frameworks.
- Course Certificate ✓.
- Structuring Machine Learning Projects (by Andrew Ng from deeplearning.ai at Coursera) ~ 6 hours.
- ML Strategy (1).
- ML Strategy (2).
- Course Certificate ✓.
- Deep Learning for Business (by Jong-Moon Chung from Yonsei University at Coursera) ~ 8 hours.
- Deep Learning Products & Services.
- Business with Deep Learning & Machine Learning.
- Deep Learning Computing Systems & Software.
- Basics of Deep Learning Neural Networks.
- Deep Learning with CNN & RNN.
- Deep Learning Project with TensorFlow Playground.
- Course Certificate ✓.
- Supervised Learning with scikit-learn (by Andreas Müller and Hugo Bowne-Anderson at DataCamp). ~ 4 hours.
- Classification.
- Regression.
- Fine-tuning your model.
- Preprocessing and pipelines.
- Machine Learning with the Experts: School Budgets (by Peter Bull at DataCamp). ~ 4 hours.
- Exploring the raw data.
- Creating a simple first model.
- Improving your model.
- Learning from the experts.
- Deep Learning in Python (by Dan Becker at DataCamp). ~ 4 hours.
- Basics of deep learning and neural networks.
- Optimizing a neural network with backward propagation.
- Building deep learning models with `keras`.
- Fine-tuning keras models.
- Introduction to Data Science in Python (by Christopher Brooks from University of Michigan at Coursera) ~ 40 hours.
- Applied Plotting, Charting & Data Representation in Python (by Christopher Brooks from University of Michigan at Coursera) ~ 40 hours.
- Principles of Information Visualization.
- Basic Charting.
- Charting Fundamentals.
- Applied Visualizations.
- Applied Machine Learning in Python (by Kevyn Collins-Thompson from University of Michigan at Coursera) ~ 40 hours.
- Fundamentals of Machine Learning - Intro to SciKit Learn.
- Supervised Machine Learning - Part 1.
- Evaluation.
- Supervised Machine Learning - Part 2.
- Applied Text Mining in Python (by V.G. Vinod Vydiswaran from University of Michigan at Coursera) ~ 40 hours.
- Working with Text in Python.
- Basic Natural Language Processing.
- Classification of Text.
- Topic Modeling.
- Applied Social Network Analysis in Python (by Daniel Romero from University of Michigan at Coursera) ~ 40 hours.
- Machine Learning Foundations: A Case Study Approach (by Carlos Guestrin and Emily Fox from University of Washington at Coursera) ~ 30 hours.
- Welcome.
- Regression: Predicting House Prices.
- Classification: Analyzing Sentiment.
- Clustering and Similarity: Retrieving Documents.
- Recommending Products.
- Deep Learning: Searching for Images.
- Closing Remarks.
- Machine Learning: Regression (by Emily Fox and Carlos Guestrin from University of Washington at Coursera) ~ 30 hours.
- Welcome.
- Simple Linear Regression.
- Multiple Regression.
- Assessing Performance.
- Ridge Regression.
- Feature Selection & Lasso.
- Nearest Neighbors & Kernel Regression.
- Closing Remarks.
- Machine Learning: Classification (by Carlos Guestrin and Emily Fox from University of Washington at Coursera) ~ 35 hours.
- Welcome!
- Linear Classifiers & Logistic Regression.
- Learning Linear Classifiers.
- Overfitting & Regularization in Logistic Regression.
- Decision Trees.
- Preventing Overfitting in Decision Trees.
- Handling Missing Data.
- Boosting.
- Precision-Recall.
- Scaling to Huge Datasets & Online Learning.
- Machine Learning: Clustering & Retrieval (by Emily Fox and Carlos Guestrin from University of Washington at Coursera) ~ 30 hours.
- Welcome.
- Nearest Neighbor Search.
- Clustering with k-means.
- Mixture Models.
- Mixed Membership Modeling via Latent Dirichlet Allocation.
- Hierarchical Clustering & Closing Remarks.
- Convolutional Neural Networks (by Andrew Ng from deeplearning.ai at Coursera).
- Sequence Models (by Andrew Ng from deeplearning.ai at Coursera).
- Neural Networks for Machine Learning (by Geoffrey Hinton from University of Toronto at Coursera). ~ 112 hours.
- Introduction.
- The Perceptron learning procedure.
- The backpropagation learning procedure.
- Learning feature vectors for words.
- Object recognition with neural nets.
- Optimization: How to make the learning go faster.
- Recurrent neural networks.
- More recurrent neural networks.
- Ways to make neural networks generalize better.
- Combining multiple neural networks to improve generalization.
- Hopfield nets and Boltzmann machines.
- Restricted Boltzmann machines (RBMs).
- Stacking RBMs to make Deep Belief Nets.
- Deep neural nets with generative pre-training.
- Modeling hierarchical structure with neural nets.
- Recent applications of deep neural nets.
- Probabilistic Graphical Models 1: Representation (by Daphne Koller from Stanford at Coursera) ~ 75 hours.
- Introduction and Overview.
- Bayesian Network (Directed Models).
- Template Models for Bayesian Networks.
- Structured CPDs for Bayesian Networks.
- Markov Networks (Undirected Models).
- Decision Making.
- Knowledge Engineering & Summary.
- Probabilistic Graphical Models 2: Inference (by Daphne Koller from Stanford at Coursera) ~ 75 hours.
- Inference Overview.
- Variable Elimination.
- Belief Propagation Algorithms.
- MAP Algorithms.
- Sampling Methods.
- Inference in Temporal Models.
- Inference Summary.
- Probabilistic Graphical Models 3: Learning (by Daphne Koller from Stanford at Coursera) ~ 75 hours.
- Learning: Overview.
- Review of Machine Learning Concepts from Prof. Andrew Ng's Machine Learning Class (Optional).
- Parameter Estimation in Bayesian Networks.
- Learning Undirected Models.
- Learning BN Structure.
- Learning BNs with Incomplete Data.
- Learning Summary and Final.
- PGM Wrapup.
- Introduction to Machine Learning (by Katie Malone and Sebastian Thrun at Udacity). ~ 100 hours.
- Welcome to Machine Learning.
- Naive Bayes.
- Support Vector Machines.
- Decision Trees.
- Choose your own Algorithm.
- Datasets and Questions.
- Regressions.
- Outliers.
- Clustering.
- Feature Scaling.
- Machine Learning (by Michael Littman, Charles Isbell and Pushkar Kolhe at Udacity). ~ 160 hours.
- Supervised Learning.
- Unsupervised Learning.
- Reinforcement Learning.
- Deep Learning (by Vincent Vanhoucke and Arpan Chakraborty at Udacity). ~ 120 hours.
- From Machine Learning to Deep Learning.
- Deep Neural Networks.
- Convolutional Neural Networks.
- Deep Models for Text and Sequences.
- Practical Deep Learning For Coders, Part 1 (by Jeremy Howard from fast.ai) ~ 90 hours.
- Image Recognition.
- CNNs.
- Overfitting.
- Embeddings.
- NLP.
- RNNs.
- CNN Architectures.
- Cutting Edge Deep Learning For Coders, Part 2 (by Jeremy Howard from fast.ai) ~ 90 hours.
- Artistic Style.
- Generative Models.
- Multi-Modals & GANs.
- Memory Networks.
- Attentional Models.
- Neural Translation.
- Time Series & Segmentation.
- Creative Applications of Deep Learning with TensorFlow (by Parag Mital at Kadenze) ~ 30 hours.
- Introduction to TensorFlow.
- Training A Network W/ TensorFlow.
- Unsupervised And Supervised Learning.
- Visualizing And Hallucinating Representations.
- Generative Models.
- Neural Networks (by Hugo Larochelle from Université de Sherbrooke).
- Introduction and math revision.
- Feedforward neural network.
- Training neural networks.
- Conditional random fields.
- Training CRFs.
- Restricted Boltzmann machine.
- Autoencoders.
- Deep learning.
- Sparse coding.
- Computer vision.
- Natural language processing.
- Machine Learning (by Nando de Freitas at Oxford University).
- Introduction.
- Linear Prediction.
- Maximum likelihood.
- Regularizers, basis functions and cross-validation.
- Optimisation.
- Logistic regression.
- Back-propagation and layer-wise design of neural nets.
- Neural networks and deep learning with Torch.
- Convolutional neural networks.
- Max-margin learning and siamese networks.
- Recurrent neural networks and LSTMs.
- Hand-writing with recurrent neural networks (Guest speaker: Alex Graves from Google Deepmind).
- Variational autoencoders and image generation (Guest speaker: Karol Gregor from Google Deepmind).
- Reinforcement learning with direct policy search.
- Reinforcement learning with action-value functions.
R (back to top ↑)
- Practical Machine Learning (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
- Prediction, Errors, and Cross Validation.
- The `caret` Package.
- Predicting with trees, Random Forests, & Model Based Predictions.
- Regularized Regression and Combining Predictors.
- Course Certificate ✓.
- Machine Learning Toolbox (by Zachary Deane-Mayer and Max Kuhn at DataCamp). ~ 4 hours.
- Regression models: fitting them and evaluating their performance.
- Classification models: fitting them and evaluating their performance.
- Tuning model parameters to improve performance.
- Preprocessing your data.
- Selecting models: a case study in churn prediction.
- Introduction to Machine Learning (by Vincent Vankrunkelsven and Gilles Inghelbrecht at DataCamp). ~ 6 hours.
- What is Machine Learning.
- Performance measures.
- Classification.
- Regression.
- Clustering.
- Unsupervised Learning in R (by Hank Roark at DataCamp). ~ 4 hours.
- Unsupervised learning in R.
- Hierarchical clustering.
- Dimensionality reduction with PCA.
- Putting it all together with a case study.
- Supervised Learning in R: Regression (by Nina Zumel and John Mount at DataCamp). ~ 4 hours.
- What is Regression?
- Training and Evaluating Regression Models.
- Issues to Consider.
- Dealing with Non-Linear Responses.
- Tree-Based Methods.
5. TEXT MINING AND NLP (back to top ↑)
Python (back to top ↑)
- Natural Language Processing Fundamentals in Python (by Katharine Jarmul at DataCamp). ~ 4 hours.
- Regular expressions & word tokenization.
- Simple topic identification.
- Named-entity recognition.
- Building a "fake news" classifier.
R (back to top ↑)
- Text Mining: Bag of Words (by Ted Kwartler at DataCamp). ~ 4 hours.
- Jumping into text mining with bag of words.
- Word clouds and more interesting visuals.
- Adding to your tm skills.
- Battle of the tech giants for talent.
6. DATA VISUALIZATION AND REPORTING (back to top ↑)
Python (back to top ↑)
- Introduction to Data Visualization with Python (by Bryan Van de Ven at DataCamp). ~ 4 hours.
- Customizing plots.
- Plotting 2D arrays.
- Statistical plots with `seaborn`.
- Analyzing time series and images.
- Course Certificate ✓.
- Interactive Data Visualization with `bokeh` (by Bryan Van de Ven at DataCamp). ~ 4 hours.
- Basic plotting with `bokeh`.
- Layouts, Interactions, and Annotations.
- High-level Charts.
- Building interactive apps with `bokeh`.
- Putting It All Together! A Case Study.
R (back to top ↑)
- Reproducible Research (by Roger D. Peng, Jeff Leek and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
- Concepts, Ideas, & Structure.
- Markdown & `knitr`.
- Reproducible Research Checklist & Evidence-based Data Analysis.
- Case Studies & Commentaries.
- Course Certificate ✓.
- Data Analysis and Visualization (by Arpan Chakraborty at Udacity). ~ 160 hours.
- Programming in R.
- Data Analysis.
- Regression.
- Data Visualization in R (by Ronald Pearson at DataCamp). ~ 4 hours.
- A quick introduction to base R graphics.
- Different plot types.
- Adding details to plots.
- How much is too much?
- Advanced plot customization and beyond.
- Data Visualization in R with lattice (by Deepayan Sarkar at DataCamp). ~ 4 hours.
- Basic plotting with `lattice`.
- Conditioning and the formula interface.
- Controlling scales and graphical parameters.
- Customizing plots using panel functions.
- Extensions and the `lattice` ecosystem.
- Data Visualization with `ggplot2` (Part 1) (by Rick Scavetta at DataCamp). ~ 5 hours.
- Introduction.
- Data.
- Aesthetics.
- Geometries.
- `qplot` and wrap-up.
- Data Visualization with `ggplot2` (Part 2) (by Rick Scavetta at DataCamp). ~ 5 hours.
- Statistics.
- Coordinates and Facets.
- Themes.
- Best Practices.
- Case Study.
- Data Visualization with `ggplot2` (Part 3) (by Rick Scavetta at DataCamp). ~ 6 hours.
- Statistical plots.
- Plots for specific data types (Part 1).
- Plots for specific data types (Part 2).
- `ggplot2` Internals.
- Data Munging and Visualization Case Study.
- Reporting with R Markdown (by Garrett Grolemund at DataCamp). ~ 3 hours.
- Authoring R Markdown Reports.
- Embedding Code.
- Compiling Reports.
- Configuring R Markdown (optional).
- Working with Geospatial Data in R (by Charlotte Wickham at DataCamp). ~ 4 hours.
- Basic mapping with `ggplot2` and `ggmap`.
- Point and polygon data.
- Raster data and color.
- Data import and projections.
JavaScript (back to top ↑)
- Data Visualization and D3.js (by Ryan Orban, Chris Saden and Jonathan Dinu at Udacity) ~ 70 hours.
- Visualization Fundamentals.
- Building Blocks.
- Design Principles.
- Dimple.js.
- Narratives.
- Animation and Interaction.
7. PROBABILITY AND STATISTICS (back to top ↑)
Python (back to top ↑)
- Probabilistic Modeling and Bayesian Networks (by Borja Calvo and Aritz Pérez. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
- Introduction to probabilistic modeling.
- Modeling of joint probabilities through Bayesian networks.
- Automatic learning of Bayesian networks.
- Applications of Bayesian networks: supervised classification.
- Statistical Thinking in Python (Part 1) (by Justin Bois at DataCamp). ~ 3 hours.
- Graphical exploratory data analysis.
- Quantitative exploratory data analysis.
- Thinking probabilistically-- Discrete variables.
- Thinking probabilistically-- Continuous variables.
- Course Certificate ✓.
- Developing Data Products (by Brian Caffo, Jeff Leek and Roger D. Peng from Johns Hopkins University at Coursera). ~ 16 hours.
- `shiny`, `googleVis`, and `plotly`.
- R Markdown and `leaflet`.
- R Packages.
- `swirl` and Course Project.
- Statistical Thinking in Python (Part 2) (by Justin Bois at DataCamp). ~ 4 hours.
- Parameter estimation by optimization.
- Bootstrap confidence intervals.
- Introduction to hypothesis testing.
- Hypothesis test examples.
- Putting it all together: a case study.
- Case Studies in Statistical Thinking (by Justin Bois at DataCamp). ~ 4 hours.
- Fish sleep and bacteria growth: A review of Statistical Thinking I and II.
- Analysis of results of the 2015 FINA World Swimming Championships.
- The "Current Controversy" of the 2013 World Championships.
- Statistical seismology and the Parkfield region.
- Earthquakes and oil mining in Oklahoma.
- Network Analysis in Python (Part 1) (by Eric Ma at DataCamp). ~ 4 hours.
- Introduction to networks.
- Important nodes.
- Structures.
- Bringing it all together.
- Network Analysis in Python (Part 2) (by Eric Ma at DataCamp). ~ 4 hours.
- Bipartite graphs & product recommendation systems.
- Graph projections.
- Comparing graphs & time-dynamic graphs.
- Tying it up!
R (back to top ↑)
- Statistical Inference (by Brian Caffo, Roger D. Peng and Jeff Leek from Johns Hopkins University at Coursera). ~ 28 hours.
- Probability & Expected Values.
- Variability, Distribution, & Asymptotics.
- Intervals, Testing, & Pvalues.
- Power, Bootstrapping, & Permutation Tests.
- Course Certificate ✓.
- Regression Models (by Brian Caffo, Roger D. Peng and Jeff Leek from Johns Hopkins University at Coursera). ~ 16 hours.
- Least Squares and Linear Regression.
- Linear Regression & Multivariable Regression.
- Multivariable Regression, Residuals, & Diagnostics.
- Logistic Regression and Poisson Regression.
- Course Certificate ✓.
- Statistical Learning (by Trevor Hastie and Robert Tibshirani from Stanford at Stanford Online). ~ 50 hours.
- Introduction.
- Overview of Statistical Learning.
- Linear Regression.
- Classification.
- Resampling Methods.
- Linear Model Selection and Regularization.
- Moving Beyond Linearity.
- Tree-Based Methods.
- Support Vector Machines.
- Unsupervised Learning.
- Introduction to Data (by Mine Cetinkaya-Rundel at DataCamp). ~ 4 hours.
- Language of data.
- Study types and cautionary tales.
- Sampling strategies and experimental design.
- Case study.
- Exploratory Data Analysis (by Andrew Bray at DataCamp). ~ 4 hours.
- Exploring Categorical Data.
- Exploring Numerical Data.
- Numerical Summaries.
- Case Study.
- Correlation and Regression (by Ben Baumer at DataCamp). ~ 4 hours.
- Visualizing two variables.
- Correlation.
- Simple linear regression.
- Interpreting regression models.
- Model Fit.
- Multiple and Logistic Regression (by Ben Baumer at DataCamp). ~ 4 hours.
- Parallel Slopes.
- Evaluating and extending parallel slopes model.
- Multiple Regression.
- Logistic Regression.
- Case Study: Italian restaurants in NYC.
- Foundations of Inference (by Jo Hardin at DataCamp). ~ 4 hours.
- Introduction to ideas of inference.
- Completing a randomization test: gender discrimination.
- Hypothesis testing errors: opportunity cost.
- Confidence intervals.
- Foundations of Probability in R (by David Robinson at DataCamp). ~ 4 hours.
- The binomial distribution.
- Laws of probability.
- Bayesian statistics.
- Related distributions.
- Beginning Bayes in R (by Jim Albert at DataCamp). ~ 4 hours.
- Introduction to Bayesian thinking.
- Learning about a binomial probability.
- Learning about a normal mean.
- Bayesian comparisons.
- Spatial Statistics in R (by Barry Rowlingson at DataCamp). ~ 4 hours.
- Introduction.
- Point Pattern Analysis.
- Areal Statistics.
- Geostatistics.
- Sentiment Analysis in R: The Tidy Way (by Julia Silge at DataCamp). ~ 4 hours.
- Tweets across the United States.
- Shakespeare gets Sentimental.
- Analyzing TV News.
- Singing a Happy Song (or Sad?!).
8. BIG DATA (back to top ↑)
General (back to top ↑)
- Introduction to Big Data (2015) (by Ilkay Altintas and Amarnath Gupta from UC San Diego at Coursera). ~ 15 hours.
- Introduction to Big Data.
- Demystifying Data Science.
- Getting Started in Hadoop.
- Course Certificate ✓.
Spark (back to top ↑)
- CS100.1x Introduction to Big Data with Apache Spark (by Anthony D. Joseph from BerkeleyX at edX). ~ 65 hours.
- Data Science Background and Course Software Setup.
- Introduction to Apache Spark.
- Data Management.
- Data Quality, Exploratory Data Analysis, and Machine Learning.
- Lab 4 Introduction to Machine Learning with Apache Spark.
- Course Certificate ✓.
- CS190.1x Scalable Machine Learning (by Ameet Talwalkar from BerkeleyX at edX). ~ 20 hours.
- Course Software Setup.
- Course Overview and Machine Learning Basics.
- Introduction to Apache Spark.
- Linear Regression and Distributed Machine Learning Principles.
- Logistic Regression and Click-through Rate Prediction.
- Principal Component Analysis and Neuroimaging.
- Course Certificate ✓.
- Introduction to Spark in R using sparklyr (by Richie Cotton at DataCamp). ~ 4 hours.
- Light My Fire: Starting To Use Spark With `dplyr` Syntax.
- Tools of the Trade: Advanced `dplyr` Usage.
- Going Native: Use The Native Interface to Manipulate Spark DataFrames.
- Case Study: Learning to be a Machine: Running Machine Learning Models on Spark.
9. BOOKS (back to top ↑)
This is a selection of books on Data Science and related disciplines that come well recommended. The books are listed in descending order of publication date.
| | Title | Author | Publisher | Release Date | Code |
---|---|---|---|---|---|
◻️ | Deep Learning with Python | Francois Chollet | Manning | Jan 2018 (*) | GitHub |
✔️ | Python Tricks: The Book | Dan Bader | Ron Holland Designs | Oct 2017 | |
◻️ | Python for Data Analysis (2nd ed.) | Wes McKinney | O'Reilly | Oct 2017 | GitHub |
◻️ | Python Machine Learning (2nd ed.) | Sebastian Raschka, Vahid Mirjalili | Packt | Sep 2017 | GitHub |
◻️ | An Introduction to Statistical Learning (2nd ed.) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | Springer | Sep 2017 | R code |
◻️ | Deep Learning | Josh Patterson, Adam Gibson | O'Reilly | Aug 2017 | |
◻️ | Fundamentals of Deep Learning | Nikhil Buduma | O'Reilly | Jun 2017 | GitHub |
◻️ | The Elements of Statistical Learning (2nd ed.) | Trevor Hastie, Robert Tibshirani, Jerome Friedman | Springer | May 2017 | Datasets |
◻️ | Practical Statistics for Data Scientists | Peter Bruce, Andrew Bruce | O'Reilly | May 2017 | GitHub |
◻️ | Hands-On Machine Learning with Scikit-Learn and TensorFlow | Aurélien Géron | O'Reilly | Apr 2017 | GitHub |
◻️ | Think Like a Data Scientist | Brian Godsey | Manning | Apr 2017 | |
◻️ | Deep Learning | Ian Goodfellow, Yoshua Bengio, Aaron Courville | MIT Press | Jan 2017 | |
◻️ | Efficient R Programming | Robin Lovelace, Colin Gillespie | O'Reilly | Dec 2016 | GitHub |
◻️ | Python Data Science Handbook | Jake VanderPlas | O'Reilly | Nov 2016 | GitHub |
◻️ | Introduction to Machine Learning with Python | Sarah Guido, Andreas C. Müller | O'Reilly | Oct 2016 | GitHub |
◻️ | Real-World Machine Learning | Henrik Brink, Joseph W. Richards, Mark Fetherolf | Manning | Sep 2016 | GitHub |
◻️ | Algorithms of the Intelligent Web (2nd ed.) | Douglas G. McIlwraith, Haralambos Marmanis, Dmitry Babenko | Manning | Aug 2016 | GitHub |
◻️ | R for Data Science | Garrett Grolemund, Hadley Wickham | O'Reilly | Jul 2016 | GitHub |
◻️ | Introducing Data Science | Davy Cielen, Arno D. B. Meysman, Mohamed Ali | Manning | May 2016 | Code 1, 2 |
◻️ | R Deep Learning Essentials | Joshua F. Wiley | Packt | Mar 2016 | GitHub |
◻️ | R in Action (2nd ed.) | Robert I. Kabacoff | Manning | May 2015 | GitHub |
◻️ | Data Science from Scratch | Joel Grus | O'Reilly | Apr 2015 | GitHub |
◻️ | Data Science at the Command Line | Jeroen Janssens | O'Reilly | Oct 2014 | GitHub |
✔️ | Learning scikit-learn: Machine Learning in Python | Raúl Garreta, Guillermo Moncecchi | Packt | Nov 2013 | GitHub |
(*) Expected publication date
10. OTHER COURSES IN COMPUTER SCIENCE (back to top ↑)
Software Design (back to top ↑)
- Domain-Driven Design Distilled (by Vaughn Vernon at Safari). ~ 4 hours.
- Introduction.
- Lesson 1: DDD for Me.
- Lesson 2: Strategic Design with Bounded Contexts and the Ubiquitous Language.
- Lesson 3: Strategic Design with Subdomains.
- Lesson 4: Strategic Design with Context Mapping.
- Lesson 5: Tactical Design with Aggregates.
- Lesson 6: Tactical Design with Domain Events.
- Lesson 7: Acceleration and Management Tools.
- Summary.
- Microservices: The Big Picture (by Antonio Goncalves at Pluralsight). ~ 2 hours.
- Course Overview.
- What Are Microservices?
- Microservices Elements.
- Are Microservices Right for Your Organization?
JavaScript (back to top ↑)
- Introduction to graphics engines: modeling, animation and graphic representation (by Joseba Makazaga and Aitor Soroa. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). Library `three.js`. 6 ECTS ~ 150 hours.
- Fundamentals: concepts and equipment.
- Geometric models: structures.
- Modeling Systems.
- Imaging systems.
- 3D animation and simulation techniques.
- Practices.
Python (back to top ↑)
- Heuristic Search (by José Antonio Lozano and Roberto Santana. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
- Introduction to optimization.
- Local Search Algorithms.
- Population and Hybrid Algorithms.
- Multiobjective optimization.
- Evaluation of optimization.
- Python Epiphanies. Exploring Fundamental Concepts (by Stuart Williams at Safari). ~ 2.5 hours.
- Introduction to Python Epiphanies.
- Objects.
- Names.
- More About Namespaces.
- Import.
- Functions.
- Decorators.
- How Classes Work.
- Special Methods.
- Iterators and Generators.
- Taking Advantage of First Class Objects.
- Python: Design Patterns (by Jungwoo Ryoo at Lynda.com). ~ 2 hours.
- Understanding Design Patterns.
- Creational Patterns.
- Structural Patterns.
- Behavioural Patterns.
- Design Best Practices.
- Enterprise Software with Python (by Mahmoud Hashemi at Safari). ~ 8 hours.
- Introduction to Enterprise Software with Python.
- Defining the Basics.
- Architecture & Design.
- Best Practices.
- Next Steps.
- Python: Getting Started (by Bo Milanovich at Pluralsight). ~ 3 hours.
- Course overview.
- Introduction.
- Types, Statements, and Other Goodies.
- Functions, Files, Yield, and Lambda.
- Object Oriented Programming - Classes and Why Do We Need Them?
- Putting It All Together - Let’s Make It a Web App.
- Python Tips and Tricks.
- Python Fundamentals (by Austin Bingham and Robert Smallshire at Pluralsight). ~ 5 hours.
- Introduction to the Python Fundamentals Course.
- Getting Started With Python 3.
- Strings and Collections.
- Modularity.
- Objects.
- Collections.
- Handling exceptions.
- Iterables.
- Classes.
- Files and Resource Management.
- Shipping Working and Maintainable Code.
- Intermediate Python Programming (by Jessica McKellar at Safari). ~ 3 hours.
- Introduction.
- Wordplay warm-up.
- Data structures, a practical intermediate introduction.
- Jeopardy database.
- Plotting with Matplotlib.
- Scraping the NASA Astronomy Picture of the Day Website.
R (back to top ↑)
- Computation in Science and Engineering: numerical simulation (by Ander Murua. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 6 ECTS ~ 150 hours.
- Some examples of initial value problems modeled by differential equations and elementary methods of numerical resolution.
- Methods of numerical resolution of ordinary differential equations.
- Computational aspects of the numerical resolution of ordinary differential equations. R package `deSolve`.
- Numerical resolution of systems of linear and non-linear algebraic equations.
- Special methods for stiff problems.
- Introductory examples of numerical resolution of partial differential equations of evolution.
- Image and signal processing (by Mamen Hernández and Josune Gallego. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 4.5 ECTS ~ 112.5 hours.
- Digital Image Basics.
- Processing in the spatial domain.
- Processing in the frequency domain.
- Morphological processing and image segmentation.
- Introduction to sound analysis.
- Digital filters for audio processing.
- Languages and standards for sound processing.
- Sound processing in the time and frequency domain.
- Cryptography (by Itziar Baragaña and Alicia Roca. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 4.5 ECTS ~ 112.5 hours.
- Introduction to cryptography.
- Mathematical fundamentals.
- Stream Encryption.
- Symmetric block encryption.
- Public Key Cryptography.
- Applications.
Miscellaneous (back to top ↑)
- Methodology and research techniques (by Basilio Sierra. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
- Introduction.
- Search for scientific articles.
- Design of articles.
- Real Applications.
- Cracking the Data Science Interview (by Jonathan Dinu and Katie Kent at Safari). ~ 3 hours.
- Introduction.
- Preparing for interviews.
- Try Docker (by Jon Friskics at Codeschool.com). ~ 1 hour.
- Containers & Images.
- Dockerfiles.
- Volumes.
- Introduction to Git for Data Science (by Greg Wilson at DataCamp). ~ 4 hours.
- Basic workflow.
- Repositories.
- Undo.
- Working with branches.
- Collaborating.
- Course Certificate ✓.
- Introduction to Shell for Data Science (by Greg Wilson at DataCamp). ~ 4 hours.
- Manipulating files and directories.
- Manipulating data.
- Combining tools.
- Batch processing.
- Creating new tools.
- Course Certificate ✓.