R is a powerful programming language and software environment widely used for statistical analysis, data visualization, and machine learning. It provides a vast array of tools and libraries that make it a popular choice among data scientists, statisticians, and researchers.
R excels in statistical analysis and is equipped with a rich set of functions for descriptive statistics, hypothesis testing, regression analysis, time series analysis, and multivariate techniques. This makes it a preferred choice for researchers and analysts working with data from various fields, such as social sciences, finance, healthcare, and environmental studies.
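As a rough illustration of those built-in functions, the sketch below runs a descriptive summary, a hypothesis test, and a regression using only base R. It assumes the `mtcars` example dataset that ships with R; this is not the dataset used in the tutorial, and the variable names are chosen purely for illustration.

```r
# mtcars is a built-in example dataset (32 cars); it is an assumption
# made for this sketch, not the tutorial's own data.
data(mtcars)

# Descriptive statistics for fuel economy (miles per gallon).
mpg_mean <- mean(mtcars$mpg)
mpg_sd <- sd(mtcars$mpg)
print(summary(mtcars$mpg))

# Hypothesis test: compare mean mpg between automatic (am = 0) and
# manual (am = 1) cars. t.test() runs a Welch two-sample test by default.
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Regression analysis: simple linear model of mpg on car weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```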
Moreover, R offers exceptional data visualization capabilities. Its built-in graphics system lets users create a wide variety of static plots to explore and present data effectively, and add-on packages extend this to interactive visualizations. Additionally, packages like ggplot2 provide a grammar-of-graphics approach, enabling users to construct complex and customizable plots with ease.
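To give a feel for the grammar-of-graphics style, here is a minimal sketch that builds a plot layer by layer. It assumes ggplot2 is already installed and again uses the built-in `mtcars` data rather than the tutorial's dataset; the aesthetic mappings and labels are illustrative choices.

```r
library(ggplot2)

# Start from data plus aesthetic mappings, then add layers with `+`.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +                                      # scatter layer
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # per-group linear fit
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing the object renders the plot.
print(p)
```

Each `+` adds one component (a geometry, a statistical summary, or labels), so a plot can be adjusted by swapping a single layer instead of rewriting the whole call.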
In recent years, R has gained popularity in the field of machine learning. Packages such as caret, randomForest, and keras offer powerful tools for building and evaluating predictive models. R's integration with other languages, such as Python, allows users to leverage popular machine learning frameworks like TensorFlow and scikit-learn within their R workflow.
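As one possible example of building and evaluating a predictive model, the sketch below trains a random forest classifier and checks its accuracy on held-out rows. It assumes the randomForest package is installed and uses the built-in `iris` dataset; the 100/50 split, the seed, and `ntree = 200` are arbitrary choices for illustration.

```r
library(randomForest)

set.seed(42)  # make the random split and the forest reproducible

# Hold out 50 of the 150 iris rows for evaluation.
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Fit a random forest that predicts species from the four measurements.
rf <- randomForest(Species ~ ., data = train, ntree = 200)

# Evaluate: share of held-out flowers classified correctly.
pred <- predict(rf, test)
accuracy <- mean(pred == test$Species)
print(accuracy)
```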
Download and install both R and RStudio: https://posit.co/download/rstudio-desktop/
This tutorial consists of R Markdown files. For a walkthrough of working with R Markdown files in RStudio, see this video: https://www.youtube.com/watch?v=DNS7i2m4sB0
If you have Git installed, run the following command in a terminal:
git clone https://github.com/bioinfodlsu/basic-r-tutorial
If Git is not installed, click the green Code button near the top right of the repository and choose Download ZIP. Once the zipped folder has been downloaded, extract its contents.
- Introduction to R Syntax
- Groups of Data: Vectors, Matrices & Lists
- Learn R: Data Frames
- Manipulating Data with dplyr
- Learn R: Fundamentals of Data Visualization with ggplot2
- Descriptive Statistics
- Inferential Statistics
This tutorial references the following resources:
- https://www.kaggle.com/code/hamelg/intro-to-r-index/notebook
- https://uclouvain-cbio.github.io/WSBIM1207/sec-dplyr.html
The dataset used in this tutorial was downloaded using INPHARED in September 2022:
- Cook, R., Brown, N., Redgwell, T., Rihtman, B., Barnes, M., Clokie, M., Stekel, D. J., Hobman, J. L., Jones, M. A., & Millard, A. (2021). INfrastructure for a PHAge REference Database: Identification of large-scale biases in the current collection of cultured phage genomes. PHAGE, 2(4), 214-223. http://doi.org/10.1089/phage.2021.0007
- Daphne Janelyn L. Go ([email protected])
- Mark Edward M. Gonzales ([email protected])
These materials were originally created for the Basic R Workshop, jointly organized by the Bioinformatics Lab and the Systems and Computational Biology Unit, De La Salle University, on July 12, 2023.