Extracting Pubmed ID (PMID) for a list of refseq accession number #40

Open
tauqeer9 opened this issue Apr 28, 2019 · 2 comments

@tauqeer9

Hi,
I have a list of ~500 RefSeq accession numbers (all bacterial genomes). About 100 of them are complete genomes (CP009681 etc.), while the others are assemblies (LALG00000000.1 etc.). Most of them are unpublished and therefore have no PMID associated with them. My goal is to identify the accession numbers that have been published and to extract the PMID for each of them, where one is available.
The accession numbers are in a file, accessions.txt, one accession per line. Here is what I have done so far:

cat accessions.txt | epost -db nuccore -format acc | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticleSet -element PMID

27152133 26048971 25767217 25250641 24970829 24962815 24723721 24051324 23770143

The output above is not in the format I need: it lists only the PMIDs, without the accession each one belongs to. I need the output in this format:

CP009681 27152133
CP010295 25767217
CP010296 25767217
CP007176 25250641
LALG01000000 26048971
LALH01000000 26048971
LALI01000000 26048971

Any help would be highly appreciated.

Tauqeer

@vkkodali

This is how I would do it:

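for acc in $(cat accessions.txt); do
    # Look up one accession in nuccore, follow its links to PubMed, and print the linked PMID(s)
    pmid=$(esearch -db nuccore -query "$acc" | elink -db nuccore -target pubmed | esummary -format uid)
    echo -e "$acc\t$pmid"
done

Output (accessions with no linked PubMed record have an empty second column):

CP009681        27152133
CP009828
CP010295        25767217
CP010296        25767217
CP006630
CP007176        25250641
LALG00000000.1  26048971
LALH00000000.1  26048971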
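
Since the stated goal is to identify only the published accessions, the rows without a PMID can be dropped afterwards. The following is a minimal sketch, assuming the loop's output is tab-separated (as `echo -e "$acc\t$pmid"` would produce). The function name `published_only` is hypothetical and is not part of EDirect.

```shell
# Hypothetical helper (not part of EDirect): read tab-separated
# "accession<TAB>pmid" rows on stdin and keep only the rows whose
# second (PMID) column is present and non-empty.
published_only() {
    awk -F'\t' 'NF >= 2 && $2 != ""'
}

# Example with rows shaped like the output above; CP009828 has no PMID
# and is therefore filtered out.
printf 'CP009681\t27152133\nCP009828\t\nCP010295\t25767217\n' | published_only
```

In practice, the loop's output could be piped straight into the function, or saved to a file first and filtered later.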

@tauqeer9

Thank you so much. It works perfectly and saves a lot of time.
