4.1 — Creating an R Notebook
narasi15 edited this page Feb 18, 2020 · 13 revisions
An R Notebook is a report-like document that combines text explaining a dataset with interactive, executable R code chunks whose output is displayed right below the code. This journal covers my experience writing my very first R Notebook!
Time estimated: 2 d; taken 2 w; date started: 2020-02-01; date completed: 2020-02-18
- I had a problem creating a new R Notebook in RStudio because many of the required packages were not installed. Specifically, the error came from the ``yaml`` package, which RStudio was not able to install.
- Turns out I had to install the package manually in R and then re-open RStudio.
- ``> installed.packages()``
- In R, go to Packages & Data > Package Installer, select the ``yaml`` package, and hit Install Selected.
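The same check and install can also be done from the R console instead of the Package Installer menu. This is only a minimal sketch, assuming the package is available from the configured CRAN mirror; `missing_packages` is a hypothetical helper name, not something from the course material.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

needed <- c("yaml")
to_install <- missing_packages(needed)

# Install only what is missing, and only in an interactive session,
# so that knitting a notebook never triggers an install.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

After an install, RStudio still needs to be re-opened, as noted above.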
- I kept seeing prompts like ``Update all/some/none? [a/s/n]:`` and ``Do you want to install from sources the package which needs compilation? (Yes/no/cancel)`` on each run, which was annoying.
- I realized my mistake was that I was not running RStudio from the Docker image provided for us, which already has the necessary packages. Using the image made it much easier to get RStudio up and running remotely!
- I was able to download the data and supplementary files using the sample code provided.
- My data contained over 63,000 rows of information, and it was hard to decide what to keep. I decided to go about it in this order:
- First check for any duplicate row name (there were none!)
- Define the groups:
- "Treatment": tells us what was transduced in this test; it could be either empty, ELF1, cell or R8A
- "trial_num": the replicate number
- "Test_run": a concatenation of the two columns Treatment and mock_or_IFN, giving the name of a sample test run
- Then map the 63,678 Ensembl (ENSG) genes to HUGO symbols.
- Remove the rows with low counts (this reduced the data to 14,935 rows).
- Keep the rows that do not map to a HUGO symbol
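The cleaning order above can be sketched in base R on a toy count matrix. Everything here is an assumption made for illustration: the counts are made up, the gene IDs and symbols are placeholders, the sample names are assumed to follow a `Treatment_mockOrIFN_trial` pattern, and the cutoff (CPM above 1 in at least 3 samples) is a hypothetical threshold, not necessarily the one I used.

```r
# Toy counts standing in for the real data (all values are made up).
counts <- matrix(
  c(1200000, 1100000, 1300000, 1250000,
          0,       1,       0,       1,
     800000,  900000,  700000,  750000,
          5,       0,       3,       1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),  # placeholder gene IDs
    c("empty_mock_1", "empty_IFN_1", "ELF1_mock_1", "ELF1_IFN_1")
  )
)

# 1. Check for duplicate row names.
n_dup <- sum(duplicated(rownames(counts)))

# 2. Define the groups from the (assumed) sample naming pattern.
parts <- do.call(rbind, strsplit(colnames(counts), "_"))
samples <- data.frame(
  Treatment = parts[, 1],
  mock_or_IFN = parts[, 2],
  trial_num = as.integer(parts[, 3]),
  row.names = colnames(counts),
  stringsAsFactors = FALSE
)
samples$Test_run <- paste(samples$Treatment, samples$mock_or_IFN, sep = "_")

# 3. Map gene IDs to symbols with a placeholder lookup table;
#    IDs without a match get NA instead of being dropped.
id_map <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B"),
  hgnc_symbol = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)
hgnc <- id_map$hgnc_symbol[match(rownames(counts), id_map$ensembl_gene_id)]

# 4. Remove low-count rows: keep genes with CPM > 1 in at least 3 samples.
cpm <- t(t(counts) / colSums(counts)) * 1e6
min_samples <- 3  # hypothetical threshold
keep <- rowSums(cpm > 1) >= min_samples

# 5. Assemble the result, keeping rows that have no symbol.
result <- data.frame(
  ensembl_gene_id = rownames(counts)[keep],
  hgnc_symbol = hgnc[keep],
  counts[keep, , drop = FALSE],
  check.names = FALSE, row.names = NULL, stringsAsFactors = FALSE
)
print(result)
```

In this toy run the two low-count genes are dropped, and the surviving unmapped gene stays in the table with `NA` as its symbol.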
- Working with RStudio was challenging. It was hard to understand why something was not compiling. After some googling I tried uninstalling and re-installing R and RStudio to start fresh, but I still had problems with certain packages like ``dplyr``.
- The RPR-GEO2R.R script was a little helpful in guiding me initially, but it did not provide much information for solving the package issues I was seeing. Once I got past the issue of setting up RStudio, it was a great reference.
- I found normalization to be the most challenging part of the assignment. I attempted to follow the normalization methods provided in the lecture slides; however, I could not see any major differences between my normalized data and the original data, so I was not able to draw any meaningful interpretation from the post-normalization results.
- My data included a lot of outliers at both extremes, so I assumed TMM (trimmed mean of M-values) would let me trim the upper and lower percentages and see a difference. However, I should have also tried another normalization approach.
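One way to check whether normalization changed anything is to look at the scaling factors themselves and compare per-sample medians before and after. The sketch below uses made-up counts; `log2_cpm` is a hypothetical helper (a simplified log2 counts-per-million, not edgeR's own `cpm`), and the TMM factors are only computed if edgeR happens to be installed. Factors close to 1 would mean the effective library sizes barely change, which could explain plots that look nearly identical.

```r
# Hypothetical helper: log2 counts-per-million with optional scaling factors.
log2_cpm <- function(counts, norm_factors = rep(1, ncol(counts))) {
  eff_lib <- colSums(counts) * norm_factors  # effective library sizes
  log2(t(t(counts) / eff_lib) * 1e6 + 1)     # +1 avoids log2(0)
}

# Made-up counts: 1000 genes x 4 samples.
set.seed(1)
counts <- matrix(rpois(4000, lambda = 50), ncol = 4,
                 dimnames = list(NULL, paste0("sample", 1:4)))

# Fall back to factors of 1 (no change) when edgeR is unavailable.
norm_factors <- rep(1, ncol(counts))
if (requireNamespace("edgeR", quietly = TRUE)) {
  d <- edgeR::calcNormFactors(edgeR::DGEList(counts = counts))  # TMM by default
  norm_factors <- d$samples$norm.factors
}

before <- log2_cpm(counts)
after <- log2_cpm(counts, norm_factors)

# Largest shift in any per-sample median caused by the scaling factors.
max_shift <- max(abs(apply(after, 2, median) - apply(before, 2, median)))
print(norm_factors)
print(max_shift)
```

Side-by-side boxplots of `before` and `after` would show the same comparison visually.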
- Ultimately, I was able to produce a file that contained my cleaned, normalized data including the Ensembl genes with their associated HUGO symbols.
This copyrighted material is licensed under a Creative Commons Attribution 4.0 International License. Follow the link to learn more.