README-detailed-version.md

DATA SCIENCE ROADMAP

This repo has been inspired by these:

Why this?

I want to track the progress of my studies in this broad area. I do not intend to list a huge number of resources or courses, only the ones I have completed so far and those I have in mind as next steps in the short, medium, or even long term. These are all valuable resources that are, or may become, part of my study plan.

How are things classified here?

It is not easy to classify subjects in Data Science. Some courses clearly belong to a single category, while others fit more than one. I have tried to simplify the categories of interest into what you can see in the table of contents. There may still be some inconsistencies, but I am happy with the result :)


TABLE OF CONTENTS

  1. Introductory Courses in Data Science.
  2. General Courses in Data Science.
  3. Data Analysis.
  4. Machine Learning.
  5. Text Mining and NLP.
  6. Data Visualization and Reporting.
  7. Probability and Statistics.
  8. Big Data.
  9. Books.
  10. Other courses in Computer Science.

1. INTRODUCTORY COURSES IN DATA SCIENCE (back to top ↑)

  • Mathematical and Statistical Software (by Yosu Yurramendi, Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
    1. Basic concepts of R.
    2. Data structures in R.
    3. Data files management with R.
    4. Graphics with R.
    5. Programming structures in R.
    6. Statistical and Mathematical Computing.
  • The Data Scientist’s Toolbox (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 4 hours.
    1. Installing the Toolbox.
    2. Conceptual Issues.
    3. Course Project Submission & Evaluation.
    4. Course Certificate ✓.
  • R Programming (by Roger D. Peng, Jeff Leek and Brian Caffo from Johns Hopkins University at Coursera). ~ 28 hours.
    1. Overview of R, R data types and objects, reading and writing data.
    2. Control structures, functions, scoping rules, dates and times.
    3. Loop functions, debugging tools.
    4. Simulation, code profiling.
    5. Course Certificate ✓.
  • Introduction to R (by Jonathan Cornelissen at DataCamp). ~ 4 hours.
    1. Intro to basics.
    2. Vectors.
    3. Matrices.
    4. Factors.
    5. Data frames.
    6. Lists.
    7. Course Certificate ✓.
  • Intermediate R (by Filip Schouwenaars at DataCamp). ~ 6 hours.
    1. Conditionals and Control Flow.
    2. Loops.
    3. Functions.
    4. The apply family.
    5. Utilities.
    6. Course Certificate ✓.
  • Intermediate R - Practice (by Filip Schouwenaars at DataCamp). Same sections as the previous course. ~ 4 hours.
    1. Course Certificate ✓.
  • Writing Functions in R (by Hadley Wickham and Charlotte Wickham at DataCamp). ~ 4 hours.
    1. A quick refresher.
    2. When and how you should write a function.
    3. Functional programming.
    4. Advanced inputs and outputs.
    5. Robust functions.
  • Writing Efficient R Code (by Colin Gillespie at DataCamp). ~ 4 hours.
    1. The Art of Benchmarking.
    2. Fine Tuning: Efficient Base R.
    3. Diagnosing Problems: Code Profiling.
    4. Turbo Charged Code: Parallel Programming.
  • Object-Oriented Programming in R: S3 and R6 (by Richie Cotton at DataCamp). ~ 4 hours.
    1. Introduction to Object-Oriented Programming.
    2. Using S3.
    3. Using R6.
    4. R6 Inheritance.
    5. Advanced R6 Usage.

2. GENERAL COURSES IN DATA SCIENCE (back to top ↑)

  • 15.071x The Analytics Edge (by Dimitris Bertsimas from MITx at edX). ~ 120 hours.
    1. An Introduction to Analytics.
    2. Linear Regression.
    3. Logistic Regression.
    4. Trees.
    5. Text Analytics.
    6. Clustering.
    7. Visualization.
    8. Linear Optimization.
    9. Integer Optimization.
    10. Course Certificate ✓.
  • Data Science Capstone (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 35 hours.
    1. Overview, Understanding the Problem, and Getting the Data.
    2. Exploratory Data Analysis and Modeling.
    3. Prediction Model.
    4. Creative Exploration.
    5. Data Product.
    6. Slide Deck.
    7. Final Project Submission and Evaluation.

3. DATA ANALYSIS (back to top ↑)

  • Exploratory Data Analysis (by Iñaki Inza, Itziar Irigoien, Yosu Yurramendi, Javier Muguerza, Ibai Gurrutxaga, José Ignacio Martín, Olatz Arbelaitz, Txus Perez. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 6 ECTS ~ 150 hours.
    1. General introduction to the problem and basic notions.
    2. Visualization of a variable and the relationships between variables.
    3. Unsupervised Classification Methods.
    4. Supervised Classification Methods.
    5. Methods of reducing dimensionality (factorial methods).
    6. Time series.
  • Getting and Cleaning Data (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
    1. Finding data and reading different file types.
    2. Introduction to the most common data storage systems and the appropriate tools to extract data from web or from databases like MySQL.
    3. Organizing, merging and managing data.
    4. Text and date manipulation in R.
    5. Course Certificate ✓.
  • Exploratory Data Analysis (by Roger D. Peng, Jeff Leek and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
    1. The basics of analytic graphics and the base plotting system in R.
    2. More advanced graphing systems available in R: the Lattice system and the ggplot2 system.
    3. Some statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques.
    4. Two case studies in exploratory data analysis.
    5. Course Certificate ✓.
  • Data Analysis with R (at Udacity). ~ 80 hours.
    1. What is EDA?
    2. R Basics.
    3. Explore One Variable.
    4. Explore Two Variables.
    5. Explore Many Variables.
    6. Diamonds and Price Predictions.
  • Importing Data in R (Part 1) (by Filip Schouwenaars at DataCamp). ~ 3 hours.
    1. Importing data from flat files with utils.
    2. readr & data.table.
    3. Importing Excel data.
    4. Reproducible Excel work with XLConnect.
  • Importing Data in R (Part 2) (by Filip Schouwenaars at DataCamp). ~ 3 hours.
    1. Importing data from databases (Part 1).
    2. Importing data from databases (Part 2).
    3. Importing data from the web (Part 1).
    4. Importing data from the web (Part 2).
    5. Importing data from statistical software packages.
  • Cleaning Data in R (by Nick Carchedi at DataCamp). ~ 4 hours.
    1. Introduction and exploring raw data.
    2. Tidying data.
    3. Preparing data for analysis.
    4. Putting it all together.
  • Importing & Cleaning Data in R: Case Studies (by Nick Carchedi at DataCamp). ~ 4 hours.
    1. Ticket Sales Data.
    2. MBTA Ridership Data.
    3. World Food Facts.
    4. School Attendance Data.
  • String Manipulation in R with stringr (by Charlotte Wickham at DataCamp). ~ 4 hours.
    1. String basics.
    2. Introduction to stringr.
    3. Pattern matching with regular expressions.
    4. More advanced matching and manipulation.
    5. Case Studies.
  • Data Manipulation in R with dplyr (by Garrett Grolemund at DataCamp). ~ 4 hours.
    1. Introduction to dplyr and tbls.
    2. Select and mutate.
    3. Filter and arrange.
    4. Summarise and the pipe operator.
    5. Group_by and working with databases.
  • Joining Data in R with dplyr (by Garrett Grolemund at DataCamp). ~ 4 hours.
    1. Mutating joins.
    2. Filtering joins and set operations.
    3. Assembling data.
    4. Advanced joining.
    5. Case Study.
  • Exploratory Data Analysis in R: Case Study (by David Robinson at DataCamp). ~ 4 hours.
    1. Data cleaning and summarizing with dplyr.
    2. Data visualization with ggplot2.
    3. Tidy modeling with broom.
    4. Joining and tidying.
  • Data Analysis in R, the data.table Way (by Matt Dowle and Arun Srinivasan at DataCamp). ~ 4 hours.
    1. Data.table novice.
    2. Data.table yeoman.
    3. Data.table expert.
  • Data Mining and Big Data Analysis (by Itziar Irigoien, Javier Muguerza, Ibai Gurrutxaga, José Ignacio Martín, Olatz Arbelaitz, Txus Perez. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
    1. Introduction to data mining.
    2. Data mining: from theory to practice through the use of WEKA software.
    3. Preprocessing data for further analysis. Main techniques and filters.
    4. Introduction to variable selection. Selection of the relevant variables for the analysis. Types of techniques.
    5. Estimating classifier accuracy and statistical tests for comparing classifiers: evaluation and credibility of the learned models.
    6. Discussion and study of concrete case studies: Bioinformatics.

4. MACHINE LEARNING (back to top ↑)

Octave/Matlab (back to top ↑)

  • Machine Learning (by Andrew Ng from Stanford at Coursera). ~ 55 hours.
    1. Introduction.
    2. Linear Regression with One Variable.
    3. Linear Algebra Review.
    4. Linear Regression with Multiple Variables.
    5. Logistic Regression.
    6. Regularization.
    7. Neural Networks: Representation.
    8. Neural Networks: Learning.
    9. Advice for Applying Machine Learning.
    10. Machine Learning System Design.
    11. Support Vector Machines.
    12. Unsupervised Learning.
    13. Dimensionality Reduction.
    14. Anomaly Detection.
    15. Recommender Systems.
    16. Large Scale Machine Learning.
    17. Application Example: Photo OCR.
    18. Course Certificate ✓.
  • Learning from Data (by Yaser Abu-Mostafa at Caltech). ~ 108 hours.
    1. The Learning Problem.
    2. Is Learning Feasible?
    3. The Linear Model I.
    4. Error and Noise.
    5. Training versus Testing.
    6. Theory of Generalization.
    7. The VC Dimension.
    8. Bias-Variance Tradeoff.
    9. The Linear Model II.
    10. Neural Networks.
    11. Overfitting.
    12. Regularization.
    13. Validation.
    14. Support Vector Machines.
    15. Kernel Methods.
    16. Radial Basis Functions.
    17. Three Learning Principles.
    18. Epilogue.
  • Unsupervised Learning in Python (by Benjamin Wilson at DataCamp). ~ 4 hours.
    1. Clustering for dataset exploration.
    2. Visualization with hierarchical clustering and t-SNE.
    3. Decorrelating your data and dimension reduction.
    4. Discovering interpretable features.
    5. Course Certificate ✓.
  • Making Predictions with Data and Python (by Alvaro Fuentes at Safari). ~ 4 hours. My GitHub repo.
    1. Chapter 1: The Tools for Doing Predictive Analytics with Python.
    2. Chapter 2: Visualization Refresher.
    3. Chapter 3: Concepts in Predictive Analytics.
    4. Chapter 4: Regression: Concepts and Models.
    5. Chapter 5: Regression: Predicting Crime, Stock Prices, and Post Popularity.
    6. Chapter 6: Classification: Concepts and Models.
    7. Chapter 7: Classification: Predicting Bankruptcy, Credit Default, and Spam Text Messages.
  • Neural Networks and Deep Learning (by Andrew Ng from deeplearning.ai at Coursera) ~ 12 hours.
    1. Introduction to deep learning.
    2. Neural Networks Basics.
    3. Shallow neural networks.
    4. Deep Neural Networks.
    5. Course Certificate ✓.
  • Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization (by Andrew Ng from deeplearning.ai at Coursera) ~ 9 hours.
    1. Practical aspects of Deep Learning.
    2. Optimization algorithms.
    3. Hyperparameter tuning, Batch Normalization and Programming Frameworks.
    4. Course Certificate ✓.
  • Structuring Machine Learning Projects (by Andrew Ng from deeplearning.ai at Coursera) ~ 6 hours.
    1. ML Strategy (1).
    2. ML Strategy (2).
    3. Course Certificate ✓.
  • Deep Learning for Business (by Jong-Moon Chung from Yonsei University at Coursera) ~ 8 hours.
    1. Deep Learning Products & Services.
    2. Business with Deep Learning & Machine Learning.
    3. Deep Learning Computing Systems & Software.
    4. Basics of Deep Learning Neural Networks.
    5. Deep Learning with CNN & RNN.
    6. Deep Learning Project with TensorFlow Playground.
    7. Course Certificate ✓.
  • Supervised Learning with scikit-learn (by Andreas Müller and Hugo Bowne-Anderson at DataCamp). ~ 4 hours.
    1. Classification.
    2. Regression.
    3. Fine-tuning your model.
    4. Preprocessing and pipelines.
  • Machine Learning with the Experts: School Budgets (by Peter Bull at DataCamp). ~ 4 hours.
    1. Exploring the raw data.
    2. Creating a simple first model.
    3. Improving your model.
    4. Learning from the experts.
  • Deep Learning in Python (by Dan Becker at DataCamp). ~ 4 hours.
    1. Basics of deep learning and neural networks.
    2. Optimizing a neural network with backward propagation.
    3. Building deep learning models with keras.
    4. Fine-tuning keras models.
  • Introduction to Data Science in Python (by Christopher Brooks from University of Michigan at Coursera) ~ 40 hours.
  • Applied Plotting, Charting & Data Representation in Python (by Christopher Brooks from University of Michigan at Coursera) ~ 40 hours.
    1. Principles of Information Visualization.
    2. Basic Charting.
    3. Charting Fundamentals.
    4. Applied Visualizations.
  • Applied Machine Learning in Python (by Kevyn Collins-Thompson from University of Michigan at Coursera) ~ 40 hours.
    1. Fundamentals of Machine Learning - Intro to SciKit Learn.
    2. Supervised Machine Learning - Part 1.
    3. Evaluation.
    4. Supervised Machine Learning - Part 2.
  • Applied Text Mining in Python (by V.G. Vinod Vydiswaran from University of Michigan at Coursera) ~ 40 hours.
    1. Working with Text in Python.
    2. Basic Natural Language Processing.
    3. Classification of Text.
    4. Topic Modeling.
  • Applied Social Network Analysis in Python (by Daniel Romero from University of Michigan at Coursera) ~ 40 hours.
  • Machine Learning Foundations: A Case Study Approach (by Carlos Guestrin and Emily Fox from University of Washington at Coursera) ~ 30 hours.
    1. Welcome.
    2. Regression: Predicting House Prices.
    3. Classification: Analyzing Sentiment.
    4. Clustering and Similarity: Retrieving Documents.
    5. Recommending Products.
    6. Deep Learning: Searching for Images.
    7. Closing Remarks.
  • Machine Learning: Regression (by Emily Fox and Carlos Guestrin from University of Washington at Coursera) ~ 30 hours.
    1. Welcome.
    2. Simple Linear Regression.
    3. Multiple Regression.
    4. Assessing Performance.
    5. Ridge Regression.
    6. Feature Selection & Lasso.
    7. Nearest Neighbors & Kernel Regression.
    8. Closing Remarks.
  • Machine Learning: Classification (by Carlos Guestrin and Emily Fox from University of Washington at Coursera) ~ 35 hours.
    1. Welcome!
    2. Linear Classifiers & Logistic Regression.
    3. Learning Linear Classifiers.
    4. Overfitting & Regularization in Logistic Regression.
    5. Decision Trees.
    6. Preventing Overfitting in Decision Trees.
    7. Handling Missing Data.
    8. Boosting.
    9. Precision-Recall.
    10. Scaling to Huge Datasets & Online Learning.
  • Machine Learning: Clustering & Retrieval (by Emily Fox and Carlos Guestrin from University of Washington at Coursera) ~ 30 hours.
    1. Welcome.
    2. Nearest Neighbor Search.
    3. Clustering with k-means.
    4. Mixture Models.
    5. Mixed Membership Modeling via Latent Dirichlet Allocation.
    6. Hierarchical Clustering & Closing Remarks.
  • Convolutional Neural Networks (by Andrew Ng from deeplearning.ai at Coursera).
  • Sequence Models (by Andrew Ng from deeplearning.ai at Coursera).
  • Neural Networks for Machine Learning (by Geoffrey Hinton from University of Toronto at Coursera). ~ 112 hours.
    1. Introduction.
    2. The Perceptron learning procedure.
    3. The backpropagation learning procedure.
    4. Learning feature vectors for words.
    5. Object recognition with neural nets.
    6. Optimization: How to make the learning go faster.
    7. Recurrent neural networks.
    8. More recurrent neural networks.
    9. Ways to make neural networks generalize better.
    10. Combining multiple neural networks to improve generalization.
    11. Hopfield nets and Boltzmann machines.
    12. Restricted Boltzmann machines (RBMs).
    13. Stacking RBMs to make Deep Belief Nets.
    14. Deep neural nets with generative pre-training.
    15. Modeling hierarchical structure with neural nets.
    16. Recent applications of deep neural nets.
  • Probabilistic Graphical Models 1: Representation (by Daphne Koller from Stanford at Coursera) ~ 75 hours.
    1. Introduction and Overview.
    2. Bayesian Network (Directed Models).
    3. Template Models for Bayesian Networks.
    4. Structured CPDs for Bayesian Networks.
    5. Markov Networks (Undirected Models).
    6. Decision Making.
    7. Knowledge Engineering & Summary.
  • Probabilistic Graphical Models 2: Inference (by Daphne Koller from Stanford at Coursera) ~ 75 hours.
    1. Inference Overview.
    2. Variable Elimination.
    3. Belief Propagation Algorithms.
    4. MAP Algorithms.
    5. Sampling Methods.
    6. Inference in Temporal Models.
    7. Inference Summary.
  • Probabilistic Graphical Models 3: Learning (by Daphne Koller from Stanford at Coursera) ~ 75 hours.
    1. Learning: Overview.
    2. Review of Machine Learning Concepts from Prof. Andrew Ng's Machine Learning Class (Optional).
    3. Parameter Estimation in Bayesian Networks.
    4. Learning Undirected Models.
    5. Learning BN Structure.
    6. Learning BNs with Incomplete Data.
    7. Learning Summary and Final.
    8. PGM Wrapup.
  • Introduction to Machine Learning (by Katie Malone and Sebastian Thrun at Udacity). ~ 100 hours.
    1. Welcome to Machine Learning.
    2. Naive Bayes.
    3. Support Vector Machines.
    4. Decision Trees.
    5. Choose your own Algorithm.
    6. Datasets and Questions.
    7. Regressions.
    8. Outliers.
    9. Clustering.
    10. Feature Scaling.
  • Machine Learning (by Michael Littman, Charles Isbell and Pushkar Kolhe at Udacity). ~ 160 hours.
    1. Supervised Learning.
    2. Unsupervised Learning.
    3. Reinforcement Learning.
  • Deep Learning (by Vincent Vanhoucke and Arpan Chakraborty at Udacity). ~ 120 hours.
    1. From Machine Learning to Deep Learning.
    2. Deep Neural Networks.
    3. Convolutional Neural Networks.
    4. Deep Models for Text and Sequences.
  • Practical Deep Learning For Coders, Part 1 (by Jeremy Howard from fast.ai) ~ 90 hours.
    1. Image Recognition.
    2. CNNs.
    3. Overfitting.
    4. Embeddings.
    5. NLP.
    6. RNNs.
    7. CNN Architectures.
  • Cutting Edge Deep Learning For Coders, Part 2 (by Jeremy Howard from fast.ai) ~ 90 hours.
    1. Artistic Style.
    2. Generative Models.
    3. Multi-Modals & GANs.
    4. Memory Networks.
    5. Attentional Models.
    6. Neural Translation.
    7. Time Series & Segmentation.
  • Creative Applications of Deep Learning with TensorFlow (by Parag Mital at Kadenze) ~ 30 hours.
    1. Introduction to TensorFlow.
    2. Training A Network W/ TensorFlow.
    3. Unsupervised And Supervised Learning.
    4. Visualizing And Hallucinating Representations.
    5. Generative Models.
  • Neural Networks (by Hugo Larochelle from Université de Sherbrooke).
    0. Introduction and math revision.
    1. Feedforward neural network.
    2. Training neural networks.
    3. Conditional random fields.
    4. Training CRFs.
    5. Restricted Boltzmann machine.
    6. Autoencoders.
    7. Deep learning.
    8. Sparse coding.
    9. Computer vision.
    10. Natural language processing.
  • Machine Learning (by Nando de Freitas at Oxford University).
    1. Introduction.
    2. Linear Prediction.
    3. Maximum likelihood.
    4. Regularizers, basis functions and cross-validation.
    5. Optimisation.
    6. Logistic regression.
    7. Back-propagation and layer-wise design of neural nets.
    8. Neural networks and deep learning with Torch.
    9. Convolutional neural networks.
    10. Max-margin learning and siamese networks.
    11. Recurrent neural networks and LSTMs.
    12. Hand-writing with recurrent neural networks (Guest speaker: Alex Graves from Google Deepmind).
    13. Variational autoencoders and image generation (Guest speaker: Karol Gregor from Google Deepmind).
    14. Reinforcement learning with direct policy search.
    15. Reinforcement learning with action-value functions.
  • Practical Machine Learning (by Jeff Leek, Roger D. Peng and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
    1. Prediction, Errors, and Cross Validation.
    2. The caret Package.
    3. Predicting with trees, Random Forests, & Model Based Predictions.
    4. Regularized Regression and Combining Predictors.
    5. Course Certificate ✓.
  • Machine Learning Toolbox (by Zachary Deane-Mayer and Max Kuhn at DataCamp). ~ 4 hours.
    1. Regression models: fitting them and evaluating their performance.
    2. Classification models: fitting them and evaluating their performance.
    3. Tuning model parameters to improve performance.
    4. Preprocessing your data.
    5. Selecting models: a case study in churn prediction.
  • Introduction to Machine Learning (by Vincent Vankrunkelsven and Gilles Inghelbrecht at DataCamp). ~ 6 hours.
    1. What is Machine Learning.
    2. Performance measures.
    3. Classification.
    4. Regression.
    5. Clustering.
  • Unsupervised Learning in R (by Hank Roark at DataCamp). ~ 4 hours.
    1. Unsupervised learning in R.
    2. Hierarchical clustering.
    3. Dimensionality reduction with PCA.
    4. Putting it all together with a case study.
  • Supervised Learning in R: Regression (by Nina Zumel and John Mount at DataCamp). ~ 4 hours.
    1. What is Regression?
    2. Training and Evaluating Regression Models.
    3. Issues to Consider.
    4. Dealing with Non-Linear Responses.
    5. Tree-Based Methods.

5. TEXT MINING AND NLP (back to top ↑)

  • Text Mining: Bag of Words (by Ted Kwartler at DataCamp). ~ 4 hours.
    1. Jumping into text mining with bag of words.
    2. Word clouds and more interesting visuals.
    3. Adding to your tm skills.
    4. Battle of the tech giants for talent.

6. DATA VISUALIZATION AND REPORTING (back to top ↑)

  • Reproducible Research (by Roger D. Peng, Jeff Leek and Brian Caffo from Johns Hopkins University at Coursera). ~ 16 hours.
    1. Concepts, Ideas, & Structure.
    2. Markdown & knitr.
    3. Reproducible Research Checklist & Evidence-based Data Analysis.
    4. Case Studies & Commentaries.
    5. Course Certificate ✓.
  • Data Analysis and Visualization (by Arpan Chakraborty at Udacity). ~ 160 hours.
    1. Programming in R.
    2. Data Analysis.
    3. Regression.
  • Data Visualization in R (by Ronald Pearson at DataCamp). ~ 4 hours.
    1. A quick introduction to base R graphics.
    2. Different plot types.
    3. Adding details to plots.
    4. How much is too much?
    5. Advanced plot customization and beyond.
  • Data Visualization in R with lattice (by Deepayan Sarkar at DataCamp). ~ 4 hours.
    1. Basic plotting with lattice.
    2. Conditioning and the formula interface.
    3. Controlling scales and graphical parameters.
    4. Customizing plots using panel functions.
    5. Extensions and the lattice ecosystem.
  • Data Visualization with ggplot2 (Part 1) (by Rick Scavetta at DataCamp). ~ 5 hours.
    1. Introduction.
    2. Data.
    3. Aesthetics.
    4. Geometries.
    5. qplot and wrap-up.
  • Data Visualization with ggplot2 (Part 2) (by Rick Scavetta at DataCamp). ~ 5 hours.
    1. Statistics.
    2. Coordinates and Facets.
    3. Themes.
    4. Best Practices.
    5. Case Study.
  • Data Visualization with ggplot2 (Part 3) (by Rick Scavetta at DataCamp). ~ 6 hours.
    1. Statistical plots.
    2. Plots for specific data types (Part 1).
    3. Plots for specific data types (Part 2).
    4. ggplot2 Internals.
    5. Data Munging and Visualization Case Study.
  • Reporting with R Markdown (by Garrett Grolemund at DataCamp). ~ 3 hours.
    1. Authoring R Markdown Reports.
    2. Embedding Code.
    3. Compiling Reports.
    4. Configuring R Markdown (optional).
  • Working with Geospatial Data in R (by Charlotte Wickham at DataCamp). ~ 4 hours.
    1. Basic mapping with ggplot2 and ggmap.
    2. Point and polygon data.
    3. Raster data and color.
    4. Data import and projections.

JavaScript (back to top ↑)

  • Data Visualization and D3.js (by Ryan Orban, Chris Saden and Jonathan Dinu at Udacity) ~ 70 hours.
    1. Visualization Fundamentals.
    2. Building Blocks.
    3. Design Principles.
    4. Dimple js.
    5. Narratives.
    6. Animation and Interaction.

7. PROBABILITY AND STATISTICS (back to top ↑)

  • Probabilistic Modeling and Bayesian Networks (by Borja Calvo and Aritz Pérez. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
    1. Introduction to probabilistic modeling.
    2. Modeling of joint probabilities through Bayesian networks.
    3. Automatic learning of Bayesian networks.
    4. Applications of Bayesian networks: supervised classification.
  • Statistical Thinking in Python (Part 1) (by Justin Bois at DataCamp). ~ 3 hours.
    1. Graphical exploratory data analysis.
    2. Quantitative exploratory data analysis.
    3. Thinking probabilistically: Discrete variables.
    4. Thinking probabilistically: Continuous variables.
    5. Course Certificate ✓.
  • Developing Data Products (by Brian Caffo, Jeff Leek and Roger D. Peng from Johns Hopkins University at Coursera). ~ 16 hours.
    1. shiny, googleVis, and plotly.
    2. R Markdown and leaflet.
    3. R Packages.
    4. swirl and Course Project.
  • Statistical Thinking in Python (Part 2) (by Justin Bois at DataCamp). ~ 4 hours.
    1. Parameter estimation by optimization.
    2. Bootstrap confidence intervals.
    3. Introduction to hypothesis testing.
    4. Hypothesis test examples.
    5. Putting it all together: a case study.
  • Case Studies in Statistical Thinking (by Justin Bois at DataCamp). ~ 4 hours.
    1. Fish sleep and bacteria growth: A review of Statistical Thinking I and II.
    2. Analysis of results of the 2015 FINA World Swimming Championships.
    3. The "Current Controversy" of the 2013 World Championships.
    4. Statistical seismology and the Parkfield region.
    5. Earthquakes and oil mining in Oklahoma.
  • Network Analysis in Python (Part 1) (by Eric Ma at DataCamp). ~ 4 hours.
    1. Introduction to networks.
    2. Important nodes.
    3. Structures.
    4. Bringing it all together.
  • Network Analysis in Python (Part 2) (by Eric Ma at DataCamp). ~ 4 hours.
    1. Bipartite graphs & product recommendation systems.
    2. Graph projections.
    3. Comparing graphs & time-dynamic graphs.
    4. Tying it up!
  • Statistical Inference (by Brian Caffo, Roger D. Peng and Jeff Leek from Johns Hopkins University at Coursera). ~ 28 hours.
    1. Probability & Expected Values.
    2. Variability, Distribution, & Asymptotics.
    3. Intervals, Testing, & Pvalues.
    4. Power, Bootstrapping, & Permutation Tests.
    5. Course Certificate ✓.
  • Regression Models (by Brian Caffo, Roger D. Peng and Jeff Leek from Johns Hopkins University at Coursera). ~ 16 hours.
    1. Least Squares and Linear Regression.
    2. Linear Regression & Multivariable Regression.
    3. Multivariable Regression, Residuals, & Diagnostics.
    4. Logistic Regression and Poisson Regression.
    5. Course Certificate ✓.
  • Statistical Learning (by Trevor Hastie and Robert Tibshirani from Stanford at Stanford Online). ~ 50 hours.
    1. Introduction.
    2. Overview of Statistical Learning.
    3. Linear Regression.
    4. Classification.
    5. Resampling Methods.
    6. Linear Model Selection and Regularization.
    7. Moving Beyond Linearity.
    8. Tree-Based Methods.
    9. Support Vector Machines.
    10. Unsupervised Learning.
  • Introduction to Data (by Mine Cetinkaya-Rundel at DataCamp). ~ 4 hours.
    1. Language of data.
    2. Study types and cautionary tales.
    3. Sampling strategies and experimental design.
    4. Case study.
  • Exploratory Data Analysis (by Andrew Bray at DataCamp). ~ 4 hours.
    1. Exploring Categorical Data.
    2. Exploring Numerical Data.
    3. Numerical Summaries.
    4. Case Study.
  • Correlation and Regression (by Ben Baumer at DataCamp). ~ 4 hours.
    1. Visualizing two variables.
    2. Correlation.
    3. Simple linear regression.
    4. Interpreting regression models.
    5. Model Fit.
  • Multiple and Logistic Regression (by Ben Baumer at DataCamp). ~ 4 hours.
    1. Parallel Slopes.
    2. Evaluating and extending parallel slopes model.
    3. Multiple Regression.
    4. Logistic Regression.
    5. Case Study: Italian restaurants in NYC.
  • Foundations of Inference (by Jo Hardin at DataCamp). ~ 4 hours.
    1. Introduction to ideas of inference.
    2. Completing a randomization test: gender discrimination.
    3. Hypothesis testing errors: opportunity cost.
    4. Confidence intervals.
  • Foundations of Probability in R (by David Robinson at DataCamp). ~ 4 hours.
    1. The binomial distribution.
    2. Laws of probability.
    3. Bayesian statistics.
    4. Related distributions.
  • Beginning Bayes in R (by Jim Albert at DataCamp). ~ 4 hours.
    1. Introduction to Bayesian thinking.
    2. Learning about a binomial probability.
    3. Learning about a normal mean.
    4. Bayesian comparisons.
  • Spatial Statistics in R (by Barry Rowlingson at DataCamp). ~ 4 hours.
    1. Introduction.
    2. Point Pattern Analysis.
    3. Areal Statistics.
    4. Geostatistics.
  • Sentiment Analysis in R: The Tidy Way (by Julia Silge at DataCamp). ~ 4 hours.
    1. Tweets across the United States.
    2. Shakespeare gets Sentimental.
    3. Analyzing TV News.
    4. Singing a Happy Song (or Sad?!).

8. BIG DATA (back to top ↑)

General (back to top ↑)

  • CS100.1x Introduction to Big Data with Apache Spark (by Anthony D. Joseph from BerkeleyX at edX). ~ 65 hours.
    1. Data Science Background and Course Software Setup.
    2. Introduction to Apache Spark.
    3. Data Management.
    4. Data Quality, Exploratory Data Analysis, and Machine Learning.
    5. Lab 4 Introduction to Machine Learning with Apache Spark.
    6. Course Certificate ✓.
  • CS190.1x Scalable Machine Learning (by Ameet Talwalkar from BerkeleyX at edX). ~ 20 hours.
    1. Course Software Setup.
    2. Course Overview and Machine Learning Basics.
    3. Introduction to Apache Spark.
    4. Linear Regression and Distributed Machine Learning Principles.
    5. Logistic Regression and Click-through Rate Prediction.
    6. Principal Component Analysis and Neuroimaging.
    7. Course Certificate ✓.
  • Introduction to Spark in R using sparklyr (by Richie Cotton at DataCamp). ~ 4 hours.
    1. Light My Fire: Starting To Use Spark With dplyr Syntax.
    2. Tools of the Trade: Advanced dplyr Usage.
    3. Going Native: Use The Native Interface to Manipulate Spark DataFrames.
    4. Case Study: Learning to be a Machine: Running Machine Learning Models on Spark.

9. BOOKS (back to top ↑)

This is a selection of books on Data Science and related disciplines that I have heard good things about. The books are listed in descending order of publication date.

| Title | Author | Publisher | Release Date | Code |
|---|---|---|---|---|
| ◻️ Deep Learning with Python | Francois Chollet | Manning | Jan 2018 (*) | GitHub |
| ✔️ Python Tricks: The Book | Dan Bader | Ron Holland Designs | Oct 2017 | |
| ◻️ Python for Data Analysis (2nd ed.) | Wes McKinney | O'Reilly | Oct 2017 | GitHub |
| ◻️ Python Machine Learning (2nd ed.) | Sebastian Raschka, Vahid Mirjalili | Packt | Sep 2017 | GitHub |
| ◻️ An Introduction to Statistical Learning (2nd ed.) | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | Springer | Sep 2017 | R code |
| ◻️ Deep Learning | Josh Patterson, Adam Gibson | O'Reilly | Aug 2017 | |
| ◻️ Fundamentals of Deep Learning | Nikhil Buduma | O'Reilly | Jun 2017 | GitHub |
| ◻️ The Elements of Statistical Learning (2nd ed.) | Trevor Hastie, Robert Tibshirani, Jerome Friedman | Springer | May 2017 | Datasets |
| ◻️ Practical Statistics for Data Scientists | Peter Bruce, Andrew Bruce | O'Reilly | May 2017 | GitHub |
| ◻️ Hands-On Machine Learning with Scikit-Learn and TensorFlow | Aurélien Géron | O'Reilly | Apr 2017 | GitHub |
| ◻️ Think Like a Data Scientist | Brian Godsey | Manning | Apr 2017 | |
| ◻️ Deep Learning | Ian Goodfellow, Yoshua Bengio, Aaron Courville | MIT Press | Jan 2017 | |
| ◻️ Efficient R Programming | Robin Lovelace, Colin Gillespie | O'Reilly | Dec 2016 | GitHub |
| ◻️ Python Data Science Handbook | Jake VanderPlas | O'Reilly | Nov 2016 | GitHub |
| ◻️ Introduction to Machine Learning with Python | Sarah Guido, Andreas C. Müller | O'Reilly | Oct 2016 | GitHub |
| ◻️ Real-World Machine Learning | Henrik Brink, Joseph W. Richards, Mark Fetherolf | Manning | Sep 2016 | GitHub |
| ◻️ Algorithms of the Intelligent Web (2nd ed.) | Douglas G. McIlwraith, Haralambos Marmanis, Dmitry Babenko | Manning | Aug 2016 | GitHub |
| ◻️ R for Data Science | Garrett Grolemund, Hadley Wickham | O'Reilly | Jul 2016 | GitHub |
| ◻️ Introducing Data Science | Davy Cielen, Arno D. B. Meysman, Mohamed Ali | Manning | May 2016 | Code 1, 2 |
| ◻️ R Deep Learning Essentials | Joshua F. Wiley | Packt | Mar 2016 | GitHub |
| ◻️ R in Action (2nd ed.) | Robert I. Kabacoff | Manning | May 2015 | GitHub |
| ◻️ Data Science from Scratch | Joel Grus | O'Reilly | Apr 2015 | GitHub |
| ◻️ Data Science at the Command Line | Jeroen Janssens | O'Reilly | Oct 2014 | GitHub |
| ✔️ Learning scikit-learn: Machine Learning in Python | Raúl Garreta, Guillermo Moncecchi | Packt | Nov 2013 | GitHub |

(*) Expected publication date

10. OTHER COURSES IN COMPUTER SCIENCE (back to top ↑)

Software Design (back to top ↑)

  • Domain-Driven Design Distilled (by Vaughn Vernon at Safari). ~ 4 hours.
    1. Introduction.
    2. Lesson 1: DDD for Me.
    3. Lesson 2: Strategic Design with Bounded Contexts and the Ubiquitous Language.
    4. Lesson 3: Strategic Design with Subdomains.
    5. Lesson 4: Strategic Design with Context Mapping.
    6. Lesson 5: Tactical Design with Aggregates.
    7. Lesson 6: Tactical Design with Domain Events.
    8. Lesson 7: Acceleration and Management Tools.
    9. Summary.
  • Microservices: The Big Picture (by Antonio Goncalves at Pluralsight). ~ 2 hours.
    1. Course Overview.
    2. What Are Microservices?
    3. Microservices Elements.
    4. Are Microservices Right for Your Organization?

JavaScript (back to top ↑)

  • Introduction to graphics engines: modeling, animation and graphic representation (by Joseba Makazaga and Aitor Soroa. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). Library three.js. 6 ECTS ~ 150 hours.
    1. Fundamentals: concepts and equipment.
    2. Geometric models: structures.
    3. Modeling Systems.
    4. Imaging systems.
    5. 3D animation and simulation techniques.
    6. Practices.
  • Heuristic Search (by José Antonio Lozano and Roberto Santana. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 3 ECTS ~ 75 hours.
    1. Introduction to optimization.
    2. Local Search Algorithms.
    3. Population and Hybrid Algorithms.
    4. Multiobjective optimization.
    5. Evaluation of optimization.
  • Python Epiphanies. Exploring Fundamental Concepts (by Stuart Williams at Safari). ~ 2.5 hours.
    1. Introduction to Python Epiphanies.
    2. Objects.
    3. Names.
    4. More About Namespaces.
    5. Import.
    6. Functions.
    7. Decorators.
    8. How Classes Work.
    9. Special Methods.
    10. Iterators and Generators.
    11. Taking Advantage of First Class Objects.
  • Python: Design Patterns (by Jungwoo Ryoo at Lynda.com). ~ 2 hours.
    1. Understanding Design Patterns.
    2. Creational Patterns.
    3. Structural Patterns.
    4. Behavioural Patterns.
    5. Design Best Practices.
  • Enterprise Software with Python (by Mahmoud Hashemi at Safari). ~ 8 hours.
    1. Introduction to Enterprise Software with Python.
    2. Defining the Basics.
    3. Architecture & Design.
    4. Best Practices.
    5. Next Steps.
  • Python: Getting Started (by Bo Milanovich at Pluralsight). ~ 3 hours.
    1. Course Overview.
    2. Introduction.
    3. Types, Statements, and Other Goodies.
    4. Functions, Files, Yield, and Lambda.
    5. Object Oriented Programming - Classes and Why Do We Need Them?
    6. Putting It All Together - Let’s Make It a Web App.
    7. Python Tips and Tricks.
  • Python Fundamentals (by Austin Bingham and Robert Smallshire at Pluralsight). ~ 5 hours.
    1. Introduction to the Python Fundamentals Course.
    2. Getting Started With Python 3.
    3. Strings and Collections.
    4. Modularity.
    5. Objects.
    6. Collections.
    7. Handling Exceptions.
    8. Iterables.
    9. Classes.
    10. Files and Resource Management.
    11. Shipping Working and Maintainable Code.
  • Intermediate Python Programming (by Jessica McKellar at Safari). ~ 3 hours.
    1. Introduction.
    2. Wordplay warm-up.
    3. Data structures, a practical intermediate introduction.
    4. Jeopardy database.
    5. Plotting with Matplotlib.
    6. Scraping the NASA Astronomy Picture of the Day Website.
  • Computation in Science and Engineering: numerical simulation (by Ander Murua. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 6 ECTS ~ 150 hours.
    1. Some examples of initial value problems modeled by differential equations and elementary methods of numerical resolution.
    2. Methods of numerical resolution of ordinary differential equations.
    3. Computational aspects of the numerical resolution of ordinary differential equations. R package deSolve.
    4. Numerical resolution of systems of linear and non-linear algebraic equations.
    5. Special methods for stiff problems.
    6. Introductory examples of numerical resolution of partial differential equations of evolution.
  • Image and signal processing (by Mamen Hernández and Josune Gallego. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 4.5 ECTS ~ 112.5 hours.
    1. Digital Image Basics.
    2. Processing in the spatial domain.
    3. Processing in the frequency domain.
    4. Morphological processing and image segmentation.
    5. Introduction to sound analysis.
    6. Digital filters for audio processing.
    7. Languages and standards for sound processing.
    8. Sound processing in the time and frequency domain.
  • Cryptography (by Itziar Baragaña and Alicia Roca. Master in Computational Engineering and Intelligent Systems, University of the Basque Country). 4.5 ECTS ~ 112.5 hours.
    1. Introduction to cryptography.
    2. Mathematical fundamentals.
    3. Stream Encryption.
    4. Symmetric block encryption.
    5. Public Key Cryptography.
    6. Applications.

Miscellaneous (back to top ↑)