4.1 — Creating an R Notebook

Objectives

An R Notebook is a report-like document that combines text explaining a dataset with chunks of executable R code, whose output is displayed inline alongside the code. This journal covers my experience writing my very first R Notebook!
Time estimated: 2 d; taken 2 w; date started: 2020-02-01; date completed: 2020-02-18

Progress

I had a problem creating a new R Notebook in RStudio because several of the required packages were not installed. Specifically, the error came from the yaml package, which RStudio was not able to install.

- It turned out I had to install the package manually in R and then re-open RStudio.
- To check which packages were already installed: > installed.packages()
- In R, go to Packages & Data > Package Installer, select the yaml package, and click Install Selected.
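The same check-and-install step can also be done from the R console instead of the menu. The following is a minimal sketch, not the exact commands used here: ``missing_packages`` is a hypothetical helper name, and the install call is guarded so that it only runs in an interactive session.

```r
# Return the names in `pkgs` that are not currently installed.
# (`missing_packages` is a hypothetical helper, not part of base R.)
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!pkgs %in% installed]
}

needed <- c("yaml")  # the package named in the error above
to_install <- missing_packages(needed)

# Only install when something is missing and a person is at the console.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

After installing, RStudio still needs to be re-opened so that it picks up the new package.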

I kept seeing prompts such as "Update all/some/none? [a/s/n]:" and "Do you want to install from sources the package which needs compilation? (Yes/no/cancel)" on every run, which was annoying.
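For reference, both prompts can usually be silenced from code. This is only a sketch under assumptions: it assumes the "Update all/some/none?" prompt came from ``BiocManager::install()`` (the journal does not say which call produced it), and ``quiet_bioc_install`` is a hypothetical wrapper name.

```r
# Tell install.packages() never to ask about compiling from source.
old_opts <- options(install.packages.compile.from.source = "never")

# Hypothetical wrapper: install Bioconductor packages without the
# "Update all/some/none?" prompt (assumes the prompt came from BiocManager).
quiet_bioc_install <- function(pkgs) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    stop("BiocManager is not installed")
  }
  BiocManager::install(pkgs, update = FALSE, ask = FALSE)
}
```

Calling ``options(old_opts)`` afterwards restores the previous setting.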

I realized my mistake: I was not running RStudio from the Docker image that was provided for us, which already contains the necessary packages. With the image, it was much easier to get RStudio up and running remotely!
I was able to download the data and supplementary files using the sample code provided.
My data contained over 63,000 rows, and it was hard to decide what to keep. I decided to go about it in this order:
- First, check for duplicate row names (there were none!).
- Define the groups:
  - "Treatment": what was transduced in the test; one of empty, ELF1, cell, or R8A
  - "trial_num": the replicate number
  - "Test_run": a concatenation of the two columns Treatment and mock_or_IFN, giving the name of a sample test run
- Then map the 63,678 Ensembl (ENSG) gene IDs to HUGO symbols.
- Remove the rows with low counts (this reduced the data to 14,935 rows).
- Keep the rows that do not map to a HUGO symbol.
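The steps above can be sketched in base R on a toy dataset. Everything in this block is illustrative: the counts, the ``ENSG_A``-style IDs, the ``GENE1``-style symbols, the ``_`` separator in Test_run, and the filter rule (more than 1 count per million in at least 2 samples) are assumptions, not the values or thresholds used in the assignment. A real mapping would come from an annotation source rather than a hand-written table.

```r
# Sample table; Test_run concatenates Treatment and mock_or_IFN.
samples <- data.frame(
  Treatment   = c("empty", "empty", "ELF1", "ELF1"),
  mock_or_IFN = c("mock", "IFN", "mock", "IFN"),
  trial_num   = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)
samples$Test_run <- paste(samples$Treatment, samples$mock_or_IFN, sep = "_")

# Toy counts: rows are genes, columns are sample test runs (invented values).
count_matrix <- matrix(
  c(900000, 850000, 920000, 880000,
         0,      1,      0,      2,
    100000, 150000,  80000, 120000,
        30,     25,     40,     35),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"), samples$Test_run)
)

# Step 1: check for duplicate row names.
n_dup <- sum(duplicated(rownames(count_matrix)))

# Step 2: map Ensembl IDs to symbols with a toy lookup table.
# match() yields NA for IDs with no symbol (here ENSG_D).
id_map <- data.frame(
  ensembl_id  = c("ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc_symbol = c("GENE1", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)
hgnc_symbol <- id_map$hgnc_symbol[match(rownames(count_matrix), id_map$ensembl_id)]

# Step 3: remove low-count rows using counts per million (CPM).
cpm <- t(t(count_matrix) / colSums(count_matrix)) * 1e6
keep <- rowSums(cpm > 1) >= 2

# Step 4: assemble the result; unmapped rows are kept with an NA symbol.
result <- data.frame(
  ensembl_id  = rownames(count_matrix)[keep],
  hgnc_symbol = hgnc_symbol[keep],
  count_matrix[keep, , drop = FALSE],
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(result)
```

In this toy example the low-count gene is dropped, while the gene without a symbol stays in the table because filtering never looks at the symbol column.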

Conclusions and Outlook

Working with RStudio was challenging. It was hard to understand why something was not compiling. After some googling, I tried uninstalling and re-installing R and RStudio to start fresh, but I still had problems with certain packages such as ``dplyr``.
RPR-GEO2R.R was somewhat helpful in guiding me initially, but it did not provide much information for solving the package issues I was seeing. Once I got past setting up RStudio, it was a great reference.
I found normalization to be the most challenging part of the assignment. I attempted to follow the normalization methods provided in the lecture slides, but I could not see any major differences between my normalized data and the original data, and I was not able to draw any meaningful interpretations from the normalized data.
My data included a lot of outliers at both extremes, so I assumed TMM (trimmed mean of M-values) would let me trim the upper and lower percentages and see a difference. However, I should have also tried another normalization approach.
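Assuming the method referred to above is TMM as implemented in the Bioconductor package edgeR, a minimal sketch of the normalization step looks like this. The counts are simulated, the sample and group names are made up, and the edgeR calls only run if the package is installed.

```r
# Simulated counts: 1000 genes x 4 samples (values are random, not real data).
set.seed(1)
toy_counts <- matrix(
  rpois(1000 * 4, lambda = 100),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), c("a_1", "a_2", "b_1", "b_2"))
)
group <- c("a", "a", "b", "b")  # hypothetical group labels

if (requireNamespace("edgeR", quietly = TRUE)) {
  d <- edgeR::DGEList(counts = toy_counts, group = group)
  d <- edgeR::calcNormFactors(d, method = "TMM")

  # One scaling factor per sample; values near 1 mean little adjustment.
  norm_factors <- d$samples$norm.factors
  print(norm_factors)

  # CPM values that take the TMM factors into account.
  normalized_cpm <- edgeR::cpm(d)
}
```

If the normalization factors all come out close to 1, plots of the normalized and original data will look nearly identical, which is one possible reason for seeing no major difference.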
Ultimately, I was able to produce a file containing my cleaned, normalized data, including the Ensembl gene IDs with their associated HUGO symbols.

This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License.
