This repository provides materials for a session of the I2DS Tools for Data Science workshop, held at the Hertie School, Berlin, in November 2022. The student-run workshop is part of the course Introduction to Data Science, taught by Simon Munzert at the Hertie School in Fall 2022.
This session will help you dive into data wrangling with data.table. data.table is an R package that provides an enhanced version of the data.frame. Essentially, data.table is a Swiss Army knife for the entire suite of data wrangling tasks. Importantly, data.table is strongly performance-oriented, which makes it fast and memory-efficient. On large datasets in particular, it is typically among the fastest options for in-memory data wrangling in R. In the session, we will introduce the mechanics of data.table along a logical sequence of data wrangling tasks. While a new syntax can seem intimidating at first, it is well worth picking up some data.table basics if you plan to work with big data in R.
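To give a first impression of the syntax, here is a minimal sketch of the general `DT[i, j, by]` form and of modification by reference with `:=`. The `flights` table and its columns are hypothetical toy data invented for this illustration; they are not part of the session materials.

```r
library(data.table)

# Hypothetical toy data, invented for illustration only
flights <- data.table(
  carrier = c("AA", "AA", "BB", "BB", "BB"),
  origin  = c("BER", "MUC", "BER", "BER", "MUC"),
  delay   = c(10, -5, 30, 0, 15)
)

# General form DT[i, j, by]:
# take the rows matching i, compute j, grouped by `by`
mean_delay <- flights[origin == "BER", .(mean_delay = mean(delay)), by = carrier]

# := adds (or updates) a column by reference, without copying the table
flights[, late := delay > 0]

# .N is a built-in symbol holding the number of rows in each group
n_by_carrier <- flights[, .N, by = carrier]

print(mean_delay)
print(n_by_carrier)
```

Because `:=` changes `flights` in place, no reassignment is needed; this is one reason data.table can be memory-efficient on large tables.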
There are five learning objectives for this session: (1) grasp the use cases and strengths of data.table, (2) understand the general semantics of data.table, (3) learn to use data.table across different data wrangling tasks, (4) apply what you have learnt in practical exercises, and (5) know how to continue your learning journey independently.
- data.table overview on CRAN (cran.r-project.org)
- DataCamp R For Data Science data.table Cheat Sheet
- YouTube tutorial: data.table in R
The material in this repository is made available under the MIT license.
Gresa Smolica prepared the script and practice materials.
Amin Oueslati prepared the presentation and the script.