
10 — Assignment #2

narasi15 edited this page Mar 4, 2020 · 16 revisions


Objectives

This journal entry outlines my progress on Assignment 2.
Time estimated: 2 d; taken: 2+ d; date started: 2020-03-01; date completed:


Progress

This section describes the dataset used in this assignment. The dataset had to be re-created because I could not produce one in which every row had a unique HUGO symbol, which is required when the symbols are used as the row names of the data frame.

Steps to get RStudio up and running (review):

  • Open the Docker QuickStart Terminal
  • docker pull risserlin/bcb420-base-image
  • docker run -e PASSWORD=pass --rm -p 8787:8787 risserlin/bcb420-base-image
  • Go to ip_address_of_the_docker_machine:8787 in a browser to open RStudio, and log in with username rstudio and password pass
  • Use docker ps to list the running containers and their port mappings
  • Use Ctrl + C to terminate the docker run process

Dataset: GSE136864
Note: Refer to journal entry Assignment 1 for details on the experiment.

Problem 1: Could not read the normalized-data CSV file into a table in R
  • I executed the R Notebook from A1 and added the following line to write my collected, cleaned, and normalized data to a CSV file. I also had to set the working directory to the Files pane location via Session -> Set Working Directory. After exporting and downloading the file to my local directory, I had to manually shift the HUGO symbol and Ensembl gene identifier columns to the preferred positions.
write.table(counts_filtered, file="counts_filtered.csv", sep=",")
I saw this error when I tried to load my CSV into R:
Error in `[.data.frame`(normalized_count_data, , 3:ncol(normalized_count_data)) : undefined columns selected
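
The manual column shifting and the load error above are both consistent with how the file was written and read: write.table includes row names by default, and read.table expects whitespace-separated fields unless told otherwise. Below is a minimal sketch of a round trip that avoids both, using made-up toy data. The names toy, csv_path, reloaded, and expr_part are hypothetical and not from the assignment.

```r
# Toy stand-in for the normalized counts (hypothetical values, not GSE136864 data)
toy <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc_symbol     = c("GENE1", "GENE2", "GENE3"),
  sample_1        = c(1.5, 2.0, 0.3),
  sample_2        = c(0.7, 1.1, 2.4),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV; row.names = FALSE keeps the header
# aligned with the data columns
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, file = csv_path, row.names = FALSE)

# read.csv() assumes comma-separated fields and a header row
reloaded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Columns 3 onward can now be selected as the numeric part of the table
expr_part <- reloaded[, 3:ncol(reloaded)]
```

The same file should also load with read.table(csv_path, header = TRUE, sep = ","), since read.csv is a wrapper around read.table with different defaults.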

  • Loading the CSV didn't work, so I switched to a txt file and was able to modify the columns within R.
# Modify the table: reorder and rename the identifier columns
counts_filt <- counts_filtered
head(counts_filt)
# Drop the row names
rownames(counts_filt) <- NULL
# Move column 19 (HUGO symbol) next to column 1 (Ensembl gene ID)
counts_filt <- counts_filt[, c(1, 19, 2:18)]
names(counts_filt)[1] <- "ensembl_gene_id"
names(counts_filt)[2] <- "hgnc_symbol"
#counts_filt$ensembl_gene_id <- rownames(counts_filt)
head(counts_filt)

# Write to a tab-separated txt file (sep="" would run the fields together)
write.table(counts_filt, file="counts_filtered.txt", sep="\t")
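
The reordering above depends on the column positions (1 and 19) staying fixed. As an alternative, here is a minimal sketch that selects and renames the identifier columns by name, using a made-up toy data frame. The names toy_counts, gene_id, symbol, id_cols, and reordered are hypothetical and stand in for the real table.

```r
# Hypothetical toy frame: Ensembl ID in the first column, symbol in the last
toy_counts <- data.frame(
  gene_id  = c("ENSG_A", "ENSG_B"),
  sample_1 = c(10, 20),
  sample_2 = c(30, 40),
  symbol   = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)

# Put the two identifier columns first, then every remaining column
# in its existing order
id_cols <- c("gene_id", "symbol")
reordered <- toy_counts[, c(id_cols, setdiff(names(toy_counts), id_cols))]

# Rename by matching the old names rather than by position
names(reordered)[names(reordered) == "gene_id"] <- "ensembl_gene_id"
names(reordered)[names(reordered) == "symbol"]  <- "hgnc_symbol"
```

Selecting by name keeps working if the number of sample columns changes, whereas a hard-coded index vector would need to be updated.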
  • I finally realized that I had been using read.table when I should have used read.csv. read.table defaults to whitespace-separated fields, whereas read.csv defaults to comma-separated fields with a header row, so the latter loaded my normalized data.
  • I had some issues using model.matrix to assemble the factors that I wanted to use.
  • My heatmap also seemed to be too large, as I kept running into out-of-space issues.
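
For the design-matrix step mentioned above, here is a minimal sketch of assembling factors, assuming the function meant is model.matrix() from R's stats package. The sample annotation is made up: the samples data frame and its condition and batch factors are hypothetical, since the factors actually used in the assignment are not recorded in this entry.

```r
# Hypothetical sample annotation (not the real GSE136864 design)
samples <- data.frame(
  sample    = paste0("s", 1:6),
  condition = factor(rep(c("control", "treated"), each = 3)),
  batch     = factor(rep(c("b1", "b2", "b3"), times = 2))
)

# Design matrix with an intercept, one condition term, and two batch terms;
# the first level of each factor is absorbed into the intercept
design <- model.matrix(~ condition + batch, data = samples)

# Inspect which coefficients the model would estimate
colnames(design)
```

Each row of design corresponds to one sample, so its row count should match the number of sample columns in the expression table.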




Conclusions and Outlook

  • I found the majority of the steps and procedures easy to understand.
  • Unfortunately, due to a lack of time while balancing other course assignments and midterms, I was not able to complete the assignment. I have, however, finished the majority of it.

External Resources