
Get publication counts per gene per year #39

Open
sanyalab opened this issue Mar 29, 2019 · 6 comments

Comments

@sanyalab

Hi

I would like to write an EDirect query to extract the number of publications per gene per year. The group I am interested in is Viridiplantae. So for all species under this group, given a date range, I would like to get the publication count for each gene in that species. The final output I am looking for is something like:

YEAR Genus_Species Gene_Symbol Publication_Count
1970 Arabidopsis thaliana PHYA 3
1971 Arabidopsis thaliana PHYA 2

I can get [PDAT] to work with -db pubmed, but not [GENE] or [ORGN]. Need help, thanks.

@vkkodali

You may want to take a look at the gene2pubmed.gz file to see if you can use data from there. For a given list of taxids, you can get a list of all PMIDs associated with each GeneID. From there, you can probably join the PDAT for each PMID.
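The join described above can be sketched offline. The snippet below assumes the standard three-column layout of gene2pubmed (tax_id, GeneID, PubMed_ID, tab-separated, one header line); the sample rows are invented for illustration, and the real file would come from the NCBI gene DATA FTP area as gene2pubmed.gz:

```shell
# Fabricated stand-in for (a gunzipped) gene2pubmed; real data comes from NCBI's FTP site.
printf '#tax_id\tGeneID\tPubMed_ID\n3702\t816394\t10000001\n3702\t816394\t10000002\n9606\t7157\t10000003\n' > gene2pubmed.sample

# Keep only Arabidopsis thaliana rows (taxid 3702) and count PMIDs per GeneID.
# The header line is skipped automatically because its first field is "#tax_id".
awk -F'\t' '$1 == "3702" { count[$2]++ } END { for (g in count) print g, count[g] }' gene2pubmed.sample
# prints: 816394 2
```

The same awk filter extends to a list of taxids by testing membership in an array instead of a single equality.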

@sanyalab
Author

Hi

I have gotten this far. For an example gene ID (816394) in the taxon Arabidopsis thaliana (txid3702), I can get the count of all PubMed articles related to this gene:

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed

After this, the next step is to download the articles in XML or docsum format and filter them by publication date [PDAT]. This is the strategy I am using. I tried this next command, but got the error "Too many requests":

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticle -element PubDate

I don't know how to get around this. Thanks for the help
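For the date-filtering step itself, the year in a fetched PubmedArticle record sits under PubDate/Year, so once the XML is saved to disk it can be filtered locally without re-fetching. A minimal stand-in on a fabricated fragment, using sed rather than xtract so it runs without EDirect installed (the element nesting follows PubMed XML, but the values are invented):

```shell
# Fabricated PubmedArticle fragment; a real one comes from efetch -format xml.
cat > article.sample.xml <<'EOF'
<PubmedArticle>
  <Journal>
    <JournalIssue>
      <PubDate>
        <Year>1971</Year>
        <Month>Jun</Month>
      </PubDate>
    </JournalIssue>
  </Journal>
</PubmedArticle>
EOF

# Pull out just the publication year from the PubDate block.
sed -n 's/.*<Year>\([0-9]*\)<\/Year>.*/\1/p' article.sample.xml
# prints: 1971
```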

@vkkodali

the error was "Too many requests"

Are you using the eUtils API keys?

@sanyalab
Author

sanyalab commented Apr 1, 2019

Hi vkkodali

The initial part of the error looks like this:

429 Too Many Requests
No do_post output returned from 'https://eutils.be-md.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&query_key=3&WebEnv=NCID_1_19870438_130.14.18.97_9001_1553967351_471391598_0MetA0_S_MegaStore&rettype=text&retmode=text&retstart=0&retmax=100&edirect=7.40&tool=edirect&email=sanyalab@lxjh218'
Result of do_post http request is
$VAR1 = bless( {
                 '_protocol' => 'HTTP/1.1',
                 '_content' => '{"error":"API rate limit exceeded","api-key":"170.54.61.190","count":"4","limit":"3"}',
                 '_rc' => 429,
                 '_headers' => bless( {
                                        'connection' => 'close',
                                        'x-ratelimit-limit' => '3',
                                        'date' => 'Sat, 30 Mar 2019 17:35:51 GMT',
                                        'vary' => 'Accept-Encoding',
                                        'client-peer' => '130.14.29.110:443',

After that, I get truncated output: the query should yield 154 articles, but I get 54. Thanks for the help.

@vkkodali

vkkodali commented Apr 1, 2019

You need to create an API key as described in the 'How do I get a key?' section here. After that, either run the following command before executing esearch or, for a more permanent fix, add it to your .bashrc file:

export NCBI_API_KEY='abcdef1234567890'
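EDirect picks the key up from the environment automatically, and with a key the per-IP limit rises from 3 to 10 requests per second. A quick sanity check that the variable is actually visible to your session (the value below is the placeholder from above, not a real key):

```shell
# Placeholder key; substitute the real one from your NCBI account settings.
export NCBI_API_KEY='abcdef1234567890'

# Confirm the variable is exported before running esearch/efetch.
[ -n "$NCBI_API_KEY" ] && echo "NCBI_API_KEY is set"
# prints: NCBI_API_KEY is set
```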

@sanyalab
Author

sanyalab commented Apr 1, 2019

It worked!!! Thanks a bunch vkkodali.

One unrelated comment: I download specific EST and cDNA datasets from NCBI every quarter using a combination of epost and efetch. I sometimes hit this issue there too, and I rerun after a gap of 250 seconds. Exporting the API key should take care of this too, right?

Thanks for your help
