Geo_and_GEOmetadb

dxjasmine edited this page Jan 31, 2020 · 3 revisions

Objective

  Time estimated: 60 mins; taken 60 mins; date started: 2020-01-30; date completed: 2020-01-31

Procedures

1. Install the package

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
if (!requireNamespace("GEOmetadb", quietly = TRUE))
  BiocManager::install("GEOmetadb")
library(GEOmetadb)

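2. Get the most recent metadata: the SQLite file

# Download the database only if there is no local copy yet
if (!file.exists('GEOmetadb.sqlite')) getSQLiteFile()
# Check the size and modification time of the local copy
file.info('GEOmetadb.sqlite')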
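
The modification time reported by file.info() is one way to decide whether the local copy is recent enough. A minimal sketch, assuming a hypothetical helper `sqlite_is_stale()` and an arbitrary 30-day cutoff (neither is part of GEOmetadb):

```r
# Hypothetical helper (not part of GEOmetadb): TRUE when the file is missing
# or was last modified more than `max_age_days` days ago.
sqlite_is_stale <- function(path, max_age_days = 30) {
  if (!file.exists(path)) {
    return(TRUE)
  }
  age_days <- as.numeric(difftime(Sys.time(), file.info(path)$mtime,
                                  units = "days"))
  age_days > max_age_days
}

# Usage (commented out because getSQLiteFile() downloads a large file):
# if (sqlite_is_stale("GEOmetadb.sqlite")) GEOmetadb::getSQLiteFile()
```

The cutoff is a judgment call; a shorter one means more frequent downloads.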

3. Connect to the GEO metadata database

con <- dbConnect(SQLite(),'GEOmetadb.sqlite')
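
Once connected, it can help to check which tables and fields exist before writing a query. The sketch below uses an in-memory stand-in database so that it runs without the real file; `demo_con` and the two toy tables are assumptions covering only a small subset of the schema. With the real database, the same DBI calls can be pointed at `con`.

```r
library(DBI)

# In-memory stand-in so the sketch runs without GEOmetadb.sqlite;
# the tables below are an assumed, reduced subset of the real schema.
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(demo_con,
          "CREATE TABLE gse (gse TEXT, title TEXT, submission_date TEXT)")
dbExecute(demo_con,
          "CREATE TABLE gpl (gpl TEXT, title TEXT, organism TEXT)")

# List every table, then the fields of each one.
tables <- dbListTables(demo_con)
fields <- lapply(setNames(tables, tables),
                 function(tbl) dbListFields(demo_con, tbl))
print(fields)

dbDisconnect(demo_con)
```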

4. Construct an SQL query. I am looking for datasets that are:

  • RNA-seq data
  • human
  • submitted within the last 5 years
  • related to lung cancer
  • accompanied by a supplementary file that contains counts

sql <- paste("SELECT DISTINCT gse.title,gse.gse, gpl.title,",
             " gse.submission_date,",
             " gse.supplementary_file",
             "FROM",
             "  gse JOIN gse_gpl ON gse_gpl.gse=gse.gse",
             "  JOIN gpl ON gse_gpl.gpl=gpl.gpl",
             "WHERE",
             "  gse.submission_date > '2015-01-01' AND",
             "  gse.title LIKE '%lung cancer%' AND",
             "  gpl.organism LIKE '%Homo sapiens%' AND",
             "  gpl.title LIKE '%HiSeq%' ",
             "  ORDER BY gse.submission_date DESC",sep=" ")

5. Run the query and keep the supplementary files that contain count data

rs <- dbGetQuery(con,sql)
counts_files <- rs$supplementary_file[grep(rs$supplementary_file,
                                           pattern = "count",ignore.case = TRUE)]
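
As an alternative to pasting values into the SQL string, the search terms can be passed as bound parameters. This is a sketch rather than the method used above: `con_demo`, the toy tables, and every row in them are invented for illustration, and `gpl.title AS platform` is an added alias that keeps the two title columns apart. It also shows that SQLite's LIKE ignores case for ASCII text by default.

```r
library(DBI)

# Toy in-memory tables; every accession and title is invented.
con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con_demo, "gse", data.frame(
  gse = c("GSE_A", "GSE_B", "GSE_C"),
  title = c("Lung cancer RNA-seq", "Breast cancer RNA-seq",
            "Old lung cancer study"),
  submission_date = c("2018-03-01", "2019-05-01", "2012-07-01"),
  stringsAsFactors = FALSE
))
dbWriteTable(con_demo, "gse_gpl", data.frame(
  gse = c("GSE_A", "GSE_B", "GSE_C"),
  gpl = c("GPL_X", "GPL_X", "GPL_X"),
  stringsAsFactors = FALSE
))
dbWriteTable(con_demo, "gpl", data.frame(
  gpl = "GPL_X",
  title = "Illumina HiSeq (toy platform)",
  organism = "Homo sapiens",
  stringsAsFactors = FALSE
))

# Each '?' is filled, in order, from the `params` list below.
sql_demo <- paste(
  "SELECT DISTINCT gse.gse, gse.title, gpl.title AS platform,",
  "  gse.submission_date",
  "FROM gse",
  "  JOIN gse_gpl ON gse_gpl.gse = gse.gse",
  "  JOIN gpl ON gse_gpl.gpl = gpl.gpl",
  "WHERE gse.submission_date > ?",
  "  AND gse.title LIKE ?",
  "  AND gpl.organism LIKE ?",
  "ORDER BY gse.submission_date DESC"
)

hits <- dbGetQuery(con_demo, sql_demo,
                   params = list("2015-01-01", "%LUNG CANCER%",
                                 "%Homo sapiens%"))
dbDisconnect(con_demo)
```

Binding the values keeps the query text fixed, so changing the date cutoff or the search term does not require editing the SQL.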

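A supplementary_file value can list several files in one string, so filtering the whole field only shows that some file mentions "count". The following sketch keeps the accession attached and pulls out the individual file names. The `;` separator, the `rs_demo` data frame, and its URLs are assumptions made up for illustration.

```r
# Toy result set; accessions and URLs are invented.
rs_demo <- data.frame(
  gse = c("GSE_A", "GSE_B"),
  supplementary_file = c(
    paste0("ftp://example.org/GSE_A/GSE_A_raw_counts.txt.gz;\t",
           "ftp://example.org/GSE_A/GSE_A_RAW.tar"),
    "ftp://example.org/GSE_B/GSE_B_RAW.tar"
  ),
  stringsAsFactors = FALSE
)

# Keep whole rows so each accession stays attached to its files.
has_counts <- grepl("count", rs_demo$supplementary_file, ignore.case = TRUE)
rs_counts <- rs_demo[has_counts, ]

# Split each field into URLs (assumed ';' separator), keep those that
# mention "count", and reduce each URL to its file name.
count_file_names <- lapply(
  setNames(rs_counts$supplementary_file, rs_counts$gse),
  function(field) {
    urls <- trimws(strsplit(field, ";", fixed = TRUE)[[1]])
    basename(urls[grepl("count", urls, ignore.case = TRUE)])
  }
)
print(count_file_names)
```
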
Results

SQLite file
 It is the SQLite database that stores the metadata associated with all GEO records, including samples (GSM), platforms (GPL), series (GSE), and datasets (GDS).
 Once downloaded, it allows SQL queries to be run against the metadata locally.
 ref: https://bioconductor.org/packages/devel/bioc/vignettes/GEOmetadb/inst/doc/GEOmetadb.html

Conclusion

Outlook for the next task

Notes and References
