Hi
I have a list of ~500 RefSeq accession numbers (all bacterial genomes). About 100 of them are complete genomes (CP009681 etc.), while the others are assemblies (LALG00000000.1 etc.). Most of them are not published and therefore do not have a PMID associated with them. My goal is to identify the accession numbers that are published and extract the PMID associated with each one, if available.
The accession numbers are in a file, accessions.txt, one accession per line. Here is what I have done so far:
cat accessions.txt | epost -db nuccore -format acc | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticleSet -element PMID
27152133 26048971 25767217 25250641 24970829 24962815 24723721 24051324 23770143
The above output is not in the format I need: the PMIDs are not paired with their accession numbers. I need the output in this format:
CP009681 27152133
CP010295 25767217
CP010296 25767217
CP007176 25250641
LALG01000000 26048971
LALH01000000 26048971
LALI01000000 26048971
Any help will be highly appreciated.
Tauqeer