
Get publication counts per gene per year #39

Open
sanyalab opened this issue Mar 29, 2019 · 6 comments

Comments

@sanyalab

Hi

I would like to write an EDirect query to extract the number of publications per gene per year. The group I am interested in is Viridiplantae. So for all species under this group, given a date range, I would like to get the publication count for each gene in that species. The final output I am looking for is something like:

YEAR Genus_Species Gene_Symbol Publication_Count
1970 Arabidopsis thaliana PHYA 3
1971 Arabidopsis thaliana PHYA 2

I can get [PDAT] to work with -db pubmed, but not [GENE] or [ORGN]. Need help, thanks.

@vkkodali

You may want to take a look at the gene2pubmed.gz file to see if you can use data from there. For a given list of taxids, you can get a list of all PMIDs associated with each GeneID. From there, you can probably join the PDAT for each PMID.
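The join described above can be sketched offline. The snippet below assumes the standard three-column layout of gene2pubmed (tax_id, GeneID, PubMed_ID, tab-separated, one header line); the sample rows are invented for illustration, and the real file would come from the NCBI gene DATA FTP area as gene2pubmed.gz:

```shell
# Fabricated stand-in for (a gunzipped) gene2pubmed; real data comes from NCBI's FTP site.
printf '#tax_id\tGeneID\tPubMed_ID\n3702\t816394\t10000001\n3702\t816394\t10000002\n9606\t7157\t10000003\n' > gene2pubmed.sample

# Keep only Arabidopsis thaliana rows (taxid 3702) and count PMIDs per GeneID.
# The header line is skipped automatically because its first field is "#tax_id".
awk -F'\t' '$1 == "3702" { count[$2]++ } END { for (g in count) print g, count[g] }' gene2pubmed.sample
# prints: 816394 2
```

The same awk filter extends to a list of taxids by testing membership in an array instead of a single equality.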

@sanyalab
Author

Hi

I have gotten this far. For an example gene ID (816394) in the taxon Arabidopsis thaliana (txid3702), I can get the count of all PubMed articles related to this gene:

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed

After this, the next step is to download the articles in XML or docsum format and filter them by publication date [PDAT]. This is the strategy I am using. I tried this next command, but got the error "Too many requests":

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticle -element PubDate

I don't know how to get around this. Thanks for the help
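For the date-filtering step itself, the year in a fetched PubmedArticle record sits under PubDate/Year, so once the XML is saved to disk it can be filtered locally without re-fetching. A minimal stand-in on a fabricated fragment, using sed rather than xtract so it runs without EDirect installed (the element nesting follows PubMed XML, but the values are invented):

```shell
# Fabricated PubmedArticle fragment; a real one comes from efetch -format xml.
cat > article.sample.xml <<'EOF'
<PubmedArticle>
  <Journal>
    <JournalIssue>
      <PubDate>
        <Year>1971</Year>
        <Month>Jun</Month>
      </PubDate>
    </JournalIssue>
  </Journal>
</PubmedArticle>
EOF

# Pull out just the publication year from the PubDate block.
sed -n 's/.*<Year>\([0-9]*\)<\/Year>.*/\1/p' article.sample.xml
# prints: 1971
```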

@vkkodali

the error was "Too many requests"

Are you using the eUtils API keys?

@sanyalab
Author

sanyalab commented Apr 1, 2019

Hi vkkodali

The initial part of the error looks like this:

429 Too Many Requests
No do_post output returned from 'https://eutils.be-md.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&query_key=3&WebEnv=NCID_1_19870438_130.14.18.97_9001_1553967351_471391598_0MetA0_S_MegaStore&rettype=text&retmode=text&retstart=0&retmax=100&edirect=7.40&tool=edirect&email=sanyalab@lxjh218'
Result of do_post http request is
$VAR1 = bless( {
                 '_protocol' => 'HTTP/1.1',
                 '_content' => '{"error":"API rate limit exceeded","api-key":"170.54.61.190","count":"4","limit":"3"}',
                 '_rc' => 429,
                 '_headers' => bless( {
                                        'connection' => 'close',
                                        'x-ratelimit-limit' => '3',
                                        'date' => 'Sat, 30 Mar 2019 17:35:51 GMT',
                                        'vary' => 'Accept-Encoding',
                                        'client-peer' => '130.14.29.110:443',

After that, I get truncated output: the query should yield 154 articles, but I get 54. Thanks for the help.

@vkkodali

vkkodali commented Apr 1, 2019

You need to create an API key as described in the 'How do I get a key?' section here. After that, either run the following command before executing esearch or, for a more permanent fix, add it to your .bashrc file:

export NCBI_API_KEY='abcdef1234567890'
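EDirect picks the key up from the environment automatically, and with a key the per-IP limit rises from 3 to 10 requests per second. A quick sanity check that the variable is actually visible to your session (the value below is the placeholder from above, not a real key):

```shell
# Placeholder key; substitute the real one from your NCBI account settings.
export NCBI_API_KEY='abcdef1234567890'

# Confirm the variable is exported before running esearch/efetch.
[ -n "$NCBI_API_KEY" ] && echo "NCBI_API_KEY is set"
# prints: NCBI_API_KEY is set
```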

@sanyalab
Author

sanyalab commented Apr 1, 2019

It worked!!! Thanks a bunch vkkodali.

One unrelated comment: I download specific EST and cDNA datasets from NCBI every quarter using a combination of epost and efetch. I sometimes hit this issue there too, and I rerun after a gap of 250 seconds. Exporting the API key should take care of this too, right?

Thanks for your help
