Geo_and_GEOmetadb
dxjasmine edited this page Jan 31, 2020
·
3 revisions
Time estimated: 60 mins; taken 60 mins; date started: 2020-01-30; date completed: 2020-01-31
1. Install the package
    # Install BiocManager from CRAN if needed, then GEOmetadb from Bioconductor
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    if (!requireNamespace("GEOmetadb", quietly = TRUE))
        BiocManager::install("GEOmetadb")
    library(GEOmetadb)
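2. Get the most recent metadata: the SQLite file
    # Download GEOmetadb.sqlite only if it is not already in the working directory
    if (!file.exists('GEOmetadb.sqlite')) getSQLiteFile()
    # Check the size and modification time of the local copy
    file.info('GEOmetadb.sqlite')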
3. Connect to the GEO metadata database
    con <- dbConnect(SQLite(), 'GEOmetadb.sqlite')
4. Construct the SQL query. I am looking for datasets that are:
- RNA-Seq data
- human
- submitted within the last 5 years
- related to lung cancer
- accompanied by a supplementary file containing counts
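The five-year window in the list above could be computed from the current date instead of being typed in by hand. This is a minimal sketch in base R; `cutoff_date` is a hypothetical helper name, not part of GEOmetadb, and it assumes the submission date is stored as a `YYYY-MM-DD` string so that plain string comparison orders dates correctly.

```r
# Hypothetical helper: ISO date string for `years` years before `today`.
cutoff_date <- function(today = Sys.Date(), years = 5) {
  # seq() on a Date accepts steps such as "-5 years"; take the second element
  start <- seq(today, length.out = 2, by = paste0("-", years, " years"))[2]
  format(start, "%Y-%m-%d")
}

# Build the date condition for the WHERE clause of the query
date_clause <- sprintf("gse.submission_date > '%s'", cutoff_date())
```

The resulting string could stand in for a hard-coded date condition, so the query keeps meaning "the last five years" whenever it is rerun.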
    # Join series (gse) to platforms (gpl) through the gse_gpl link table
    sql <- paste("SELECT DISTINCT gse.title, gse.gse, gpl.title,",
                 "  gse.submission_date,",
                 "  gse.supplementary_file",
                 "FROM",
                 "  gse JOIN gse_gpl ON gse_gpl.gse = gse.gse",
                 "  JOIN gpl ON gse_gpl.gpl = gpl.gpl",
                 "WHERE",
                 "  gse.submission_date > '2015-01-01' AND",
                 "  gse.title LIKE '%lung cancer%' AND",
                 "  gpl.organism LIKE '%Homo sapiens%' AND",
                 "  gpl.title LIKE '%HiSeq%'",
                 "ORDER BY gse.submission_date DESC", sep = " ")
5. Run the query and ensure the results contain count data
    rs <- dbGetQuery(con, sql)
    # Keep only the supplementary file entries whose names mention "count"
    counts_files <- rs$supplementary_file[grep("count", rs$supplementary_file,
                                               ignore.case = TRUE)]
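The filter above keeps only the file column, so the link back to the series is lost. The sketch below shows one way to keep whole rows and then pull out the individual count files. `rs_demo` is a hypothetical stand-in for `rs`; the accessions and URLs are invented, and the assumption that several files are packed into one field separated by `;` should be checked against the real data.

```r
# Hypothetical stand-in for the `rs` data frame returned by dbGetQuery();
# the accessions and file names below are invented for illustration.
rs_demo <- data.frame(
  gse = c("GSE_A", "GSE_B", "GSE_C"),
  supplementary_file = c(
    "ftp://example.org/GSE_A_raw_counts.txt.gz",
    "ftp://example.org/GSE_B_RAW.tar",
    "ftp://example.org/GSE_C_FPKM.txt.gz;\tftp://example.org/GSE_C_Counts.csv.gz"
  ),
  stringsAsFactors = FALSE
)

# Keep whole rows (not just the file column) whose file list mentions "count"
has_counts <- grepl("count", rs_demo$supplementary_file, ignore.case = TRUE)
rs_counts <- rs_demo[has_counts, ]

# Assumed separator ";": split each file list and keep only the count files
files <- trimws(unlist(strsplit(rs_counts$supplementary_file, ";", fixed = TRUE)))
count_files <- files[grepl("count", files, ignore.case = TRUE)]
```

Keeping `rs_counts` as a data frame means the series accession and title stay next to each matching file.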
- SQLite file
GEOmetadb.sqlite is the SQLite database that stores the metadata associated with all GEO record types, including GSM (samples), GPL (platforms), GSE (series) and GDS (datasets). Once downloaded, it enables SQL queries against the metadata locally. Ref: https://bioconductor.org/packages/devel/bioc/vignettes/GEOmetadb/inst/doc/GEOmetadb.html
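To see how such queries work without downloading the full file, the join pattern can be tried on a tiny in-memory SQLite database. This is only a sketch: the three toy tables reuse the table and column names from the query in step 4, but their contents are invented and the real schema has many more columns. It assumes the DBI and RSQLite packages are installed.

```r
library(DBI)
library(RSQLite)

# Toy in-memory database; all rows are invented for illustration
con_demo <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con_demo, "gse", data.frame(
  gse = c("GSE_A", "GSE_B"),
  title = c("Lung cancer RNA-seq", "Mouse liver arrays"),
  stringsAsFactors = FALSE
))
dbWriteTable(con_demo, "gpl", data.frame(
  gpl = c("GPL_X", "GPL_Y"),
  title = c("Illumina HiSeq (Homo sapiens)", "Some array platform"),
  organism = c("Homo sapiens", "Mus musculus"),
  stringsAsFactors = FALSE
))
# Link table: which series was run on which platform
dbWriteTable(con_demo, "gse_gpl", data.frame(
  gse = c("GSE_A", "GSE_B"),
  gpl = c("GPL_X", "GPL_Y"),
  stringsAsFactors = FALSE
))

# Same join shape as in step 4, with aliases for the two title columns
hits <- dbGetQuery(con_demo, paste(
  "SELECT gse.gse, gse.title AS series_title, gpl.title AS platform_title",
  "FROM gse JOIN gse_gpl ON gse_gpl.gse = gse.gse",
  "JOIN gpl ON gse_gpl.gpl = gpl.gpl",
  "WHERE gse.title LIKE '%lung cancer%'",
  "AND gpl.organism LIKE '%Homo sapiens%'"
))
dbDisconnect(con_demo)
```

Aliasing the two `title` columns keeps the series title and the platform title distinguishable in the result.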