Extracting Pubmed ID (PMID) for a list of refseq accession number #40

tauqeer9 · 2019-04-28T04:55:48Z

Hi
I have a list of ~500 RefSeq accession numbers (all bacterial genomes). About 100 of them are complete genomes (CP009681 etc.), while the others are assemblies (LALG00000000.1 etc.). Most of them are unpublished and therefore have no PMID associated with them. My goal is to identify the accession numbers that are published and extract the PMID associated with each. In other words, I want to extract the PMID for each accession number, if available.
The accession numbers are in a file, accessions.txt, one accession per line. Here is what I have done so far:

cat accessions.txt | epost -db nuccore -format acc | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticleSet -element PMID

27152133 26048971 25767217 25250641 24970829 24962815 24723721 24051324 23770143

The above output is not in the format I need. I need the output in this format:

CP009681 27152133
CP010295 25767217
CP010296 25767217
CP007176 25250641
LALG01000000 26048971
LALH01000000 26048971
LALI01000000 26048971

Any help will be highly appreciated

Tauqeer

vkkodali · 2019-04-29T14:55:32Z

This is how I would do it:

# Look up each accession separately so the PMID stays paired with it.
for acc in $(cat accessions.txt); do
    pmid=$(esearch -db nuccore -query "$acc" | elink -db nuccore -target pubmed | esummary -format uid)
    echo -e "$acc\t$pmid"
done

Output:
CP009681        27152133
CP009828
CP010295        25767217
CP010296        25767217
CP006630
CP007176        25250641
LALG00000000.1  26048971
LALH00000000.1  26048971
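
Accessions without a linked publication (CP009828 and CP006630 above) come back with an empty second column, so the published ones can be picked out with a small filter. The following is only a minimal sketch, not part of EDirect: the helper names `published_only` and `join_pmids` are hypothetical, it assumes the tab-separated output of the loop above, and it assumes that an accession linked to several papers would return one PMID per line.

```shell
# Hypothetical helper: keep only rows whose second (PMID) column is non-empty.
# Assumes tab-separated "accession<TAB>pmid" lines on stdin.
published_only() {
    awk -F'\t' 'NF >= 2 && $2 != ""'
}

# Hypothetical helper: collapse several PMIDs (one per line) into a single
# comma-separated field, so each accession stays on one output line.
join_pmids() {
    paste -sd, -
}

# Example with rows taken from the output above.
printf 'CP009681\t27152133\nCP009828\t\nCP010295\t25767217\n' | published_only
```

Inside the loop, `join_pmids` could be appended to the end of the `pmid=$(...)` pipeline if any accession turns out to have more than one PMID.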

tauqeer9 · 2019-04-29T22:28:38Z

Thank you so much. It works perfectly and saves a lot of time.
