Merge pull request #49 from dvklopfenstein/dvk
Summarize NIH citation data for a set of PMIDs
dvklopfenstein authored Dec 31, 2022
2 parents 52f7bb3 + e266295 commit 25b8e24
Showing 17 changed files with 346 additions and 65 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## Summary

* [**Unreleased**](#unreleased)
* [**Release 2022-12-30 v0.0.42**](#release-2022-12-30-v0042) Added summarize_papers script
* [**Release 2022-12-06 v0.0.41**](#release-2022-12-06-v0041) setup.py updates for make target, install
* [**Release 2022-11-26 v0.0.40**](#release-2022-11-28-v0040) Added pmidcite.scripts.icite; pip3, not pip from Python2
* [**Release 2022-11-26 v0.0.38**](#release-2022-11-26-v0038) Added instructions, and console_script to run script, icite
@@ -42,6 +43,11 @@

### Unreleased

### release 2022-12-30 v0.0.42
* ADDED summarize_papers script
* ADDED requests package as a pre-requisite
* CHANGED API to NCBI E-utils such that a missing LID (Local ID) is ignored on a PubMed entry

### release 2022-12-06 v0.0.41
* CHANGED setup.py PACKAGES variable to run install make target

56 changes: 54 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
# PubMed ID (PMID) Cite
# PubMedj ID (PMID) Cite

[![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Python%20library%20to%20download%20pubmed%20citation%20counts%20and%20data,%20given%20a%20PMID&url=https://github.com/dvklopfenstein/pmidcite&via=dvklopfenstein&hashtags=pubmed,pmid,citations,pubmed2cite,writingtips,scientificwriting)
[![build](https://github.com/dvklopfenstein/pmidcite/actions/workflows/build.yml/badge.svg)](https://github.com/dvklopfenstein/pmidcite/actions/workflows/build.yml)
@@ -20,6 +20,7 @@ Contact: [email protected]
* [**1) Download citation counts and data for a research paper**](https://github.com/dvklopfenstein/pmidcite#1-download-citation-counts-and-data-for-a-research-paper)
* [**2) Forward citation search**](https://github.com/dvklopfenstein/pmidcite#2-forward-citation-search): following a paper's *Cited by* links or *Forward snowballing*
* [**3) Backward citation search**](https://github.com/dvklopfenstein/pmidcite#3-backward-citation-search): following the links to a paper's references or *Backward snowballing*
* [**4) Summarize a group of citations**](https://github.com/dvklopfenstein/pmidcite#4-summarize-a-group-of-citations)

## 1) Download citation counts and data for a research paper
```$ icite -H 26032263```
@@ -56,6 +57,57 @@ Also known as following links to a paper's references or *Backward snowballing*
or
```$ icite -H; icite 26032263 -r | sort -k6 -r```

## 4) Summarize a group of citations
* 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`):
`icite -H 30022098`
* 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`):
`icite 30022098 -c -o goatools_cites.txt`
* 4c) Summarize the overall performance of the 300+ citing papers contained in `goatools_cites.txt`:
`summarize_papers goatools_cites.txt -p TOP CIT CLI`

### 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`):
```
$ icite -H 30022098
COL 2 3 4 5 6 7 8 9 10 au[11](authors)
TYP PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title
TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
```

Paper with PMID `30022098` is cited by `318`(`cit`) other research papers and `1`(`cli`) clinical study. It has `23` references(`ref`).

### 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`):
```
$ icite 30022098 -c -o goatools_cites.txt
```

The requested paper (PMID=`30022098`) is described in one line in `goatools_cites.txt`:
```
$ grep TOP goatools_cites.txt
TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses.
```

The paper (PMID=`30022098`) is cited by 318(`CIT`) research papers plus 1(`CLI`) clinical study:
```
$ grep CIT goatools_cites.txt | wc -l
318
$ grep CLI goatools_cites.txt | wc -l
1
```
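
The same tally can be sketched in Python. This is a minimal illustration, not part of pmidcite: it assumes every line of the file starts with its label (`TOP`, `CIT`, `CLI`) as the first whitespace-separated field, as in the `TOP` line above. The function name `count_labels` and the two citing-paper lines are hypothetical placeholders.

```python
from collections import Counter


def count_labels(lines):
    """Count lines by their leading label (e.g. TOP, CIT, CLI)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


# One real line from the README plus two made-up placeholder lines
sample = [
    "TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS",
    "CIT 00000001 placeholder citing paper",
    "CLI 00000002 placeholder clinical study",
]
counts = count_labels(sample)
print(counts["TOP"], counts["CIT"], counts["CLI"])
```

To apply it to a real download, pass an open file handle, for example `count_labels(open('goatools_cites.txt'))`. Unlike `grep CIT`, which matches the text anywhere on a line, this sketch only looks at the first field.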

### 4c) Summarize all the papers in `goatools_cites.txt`
**NEW FUNCTIONALITY; INPUT REQUESTED: What would you like to see?** [Open an issue](https://github.com/dvklopfenstein/pmidcite/issues) to comment.
```
$ summarize_papers goatools_cites.txt -p TOP CIT CLI
i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% 4 years:2018-2022 320 papers goatools_cites.txt
```

* Output is on one line so many files containing sets of PMIDs may be compared. TBD: Add multiline verbose option.
* The groups are from newest(`i`) to top-performing(`4`), great(`3`), very good(`2`), and overlooked(`1` and `0`)
* The percentages of papers in `goatools_cites.txt` in each group follow the group name
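
Because the summary is a single line, it is easy to post-process when comparing several files. The sketch below parses the sample line shown above into a dictionary. It is an illustration only: the layout is inferred from that one sample line, and `parse_summary` is a hypothetical helper, not a pmidcite function.

```python
import re


def parse_summary(line):
    """Split one summary line into group percentages, paper count, and filename."""
    # Each group appears as NAME=NNN.N%, e.g. i=033.4%
    groups = {name: float(pct)
              for name, pct in re.findall(r'(\S+)=(\d+(?:\.\d+)?)%', line)}
    papers = re.search(r'(\d+) papers', line)
    return {
        'groups': groups,
        'papers': int(papers.group(1)) if papers else None,
        'filename': line.split()[-1],  # assumes the filename is the last field
    }


line = ("i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% "
        "4 years:2018-2022 320 papers goatools_cites.txt")
summary = parse_summary(line)
print(summary['groups'])
print(summary['papers'], summary['filename'])
```

In the sample line the six percentages add up to 99.9, so a consumer of this output should allow for rounding rather than expect exactly 100.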



# PubMed vs Google Scholar
<p align="center">
<img src="https://github.com/dvklopfenstein/pmidcite/raw/main/docs/images/Search_Features_GS_v_PubMed.png" alt="Google Scholar vs PubMed" width="600"/>
@@ -456,4 +508,4 @@ Fiorini N ... Lu Zhiyong
[email protected]
https://orcid.org/0000-0003-0161-7603

Copyright (C) 2019-present [pmidcite](https://dvklopfenstein.github.io/pmidcite/), DV Klopfenstein. All rights reserved.
Copyright (C) 2019-present [pmidcite](https://dvklopfenstein.github.io/pmidcite/), DV Klopfenstein, PhD. All rights reserved.
3 changes: 3 additions & 0 deletions makefile
@@ -18,6 +18,9 @@ p:
d:
find src -regextype posix-extended -regex "[a-z./]*" -type d

cli:
find src/pmidcite/cli -name \*.py

diff0:
git diff --compact-summary

11 changes: 8 additions & 3 deletions setup.py
@@ -10,6 +10,8 @@
from setuptools import setup
# import versioneer

__copyright__ = 'Copyright (C) 2019, DV Klopfenstein, PhD. All rights reserved'
__author__ = 'DV Klopfenstein, PhD'

NAME = 'pmidcite'

@@ -42,7 +44,7 @@ def get_long_description():
setup(
name=NAME,
## version=versioneer.get_version(),
version='0.0.41',
version='0.0.42',
author='DV Klopfenstein, PhD',
author_email='[email protected]',
## cmdclass=versioneer.get_cmdclass(),
@@ -55,6 +57,7 @@ def get_long_description():
entry_points={
'console_scripts':[
'icite=pmidcite.scripts.icite:main',
'summarize_papers=pmidcite.scripts.icite:summarize_papers',
],
},
# https://pypi.org/classifiers/
@@ -68,9 +71,11 @@ def get_long_description():
'Topic :: Scientific/Engineering :: Information Analysis',
],
url='http://github.com/dvklopfenstein/pmidcite',
description="Augment's a PubMed literature search with citation data from NIH-OCC's iCite.",
description="Turbocharge a PubMed literature search using citation data from the NIH",
# https://packaging.python.org/guides/making-a-pypi-friendly-readme/
long_description=get_long_description(),
long_description_content_type='text/markdown',
# install_requires=['docopt'],
install_requires=['requests'],
)

# Copyright (C) 2019, DV Klopfenstein, PhD. All rights reserved
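
Each `console_scripts` entry in the `setup.py` hunk above has the form `name=package.module:function`: installing the package creates a command called `name` that calls `function`. The sketch below shows roughly how such a string maps to a callable. It is a simplified illustration, not how setuptools or pip implement it, and `resolve_entry_point` is a hypothetical helper. A standard-library target is used so the example runs without pmidcite installed.

```python
from importlib import import_module


def resolve_entry_point(spec):
    """Resolve 'name=package.module:function' to (name, callable)."""
    name, target = spec.split('=', 1)
    module_name, _, attr = target.partition(':')
    func = getattr(import_module(module_name.strip()), attr.strip())
    return name.strip(), func


# Hypothetical entry using a standard-library function as the target
name, func = resolve_entry_point('dumps_demo=json:dumps')
print(name, func([1, 2]))
```

A resolver like this fails with `ImportError` or `AttributeError` when the module or function named in the entry does not exist, which is a quick way to sanity-check a new entry before releasing.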
6 changes: 3 additions & 3 deletions src/bin/icite.py
@@ -1,8 +1,8 @@
#!/usr/bin/env python3
"""Given a PubMed ID (PMID), return a list of citing publications"""

__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
__author__ = "DV Klopfenstein"
__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
__author__ = "DV Klopfenstein, PhD"

from pmidcite.cli.icite import NIHiCiteCli # get_argparser
from pmidcite.cfg import get_cfgparser
@@ -16,4 +16,4 @@ def main():
if __name__ == '__main__':
main()

# Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.
# Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved.
2 changes: 1 addition & 1 deletion src/pmidcite/__version__.py
@@ -1,3 +1,3 @@
"""Version of pmidcite project"""

__version__ = '0.0.41'
__version__ = '0.0.42'
29 changes: 14 additions & 15 deletions src/pmidcite/cfg.py
@@ -1,7 +1,7 @@
"""Manage pmidcite Configuration"""

__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
__author__ = "DV Klopfenstein"
__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
__author__ = "DV Klopfenstein, PhD"

from os import environ
from os import getcwd
@@ -57,7 +57,7 @@ class Cfg(object):
}

def __init__(self, check=True, prt=stdout, prt_fullname=True):
self.cfgfile = self._init_cfgfilename()
self.cfgfile = self._init_cfgfilename(prt)
self.cfgparser = self._get_dflt_cfgparser()
if check:
self._run_chk(prt, prt_fullname)
@@ -135,14 +135,14 @@ def _get_dirname_str(dirname):
"""Convert None to the str, "None", as needed by configparser"""
return 'None' if dirname is None or dirname == 'None' else dirname

def get_nihgrouper(self):
def get_nihgrouper(self, min1=None, min2=None, min3=None, min4=None):
"""Get an NIH Grouper with default values from the cfg file"""
cfg = self.cfgparser['pmidcite']
return NihGrouper(
float(cfg['group1_min']),
float(cfg['group2_min']),
float(cfg['group3_min']),
float(cfg['group4_min']))
float(cfg['group1_min'] if not min1 else min1),
float(cfg['group2_min'] if not min2 else min2),
float(cfg['group3_min'] if not min3 else min3),
float(cfg['group4_min'] if not min4 else min4))
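
Review note on the new `get_nihgrouper` signature: the `if not minN` test treats an explicit `0` the same as `None`, so a caller cannot override a configured minimum with zero. Whether zero is a meaningful threshold in pmidcite is not stated in this commit. The sketch below shows the same override-or-default pattern with an `is None` check instead. It is a standalone illustration: `pick_minimums` is a hypothetical name, a plain dict stands in for the config parser section, and the numbers are placeholders, not the package defaults.

```python
def pick_minimums(cfg, min1=None, min2=None, min3=None, min4=None):
    """Return four group minimums, preferring explicit arguments over config values."""
    keys = ('group1_min', 'group2_min', 'group3_min', 'group4_min')
    overrides = (min1, min2, min3, min4)
    return tuple(
        float(cfg[key] if override is None else override)
        for key, override in zip(keys, overrides))


# Placeholder values standing in for the [pmidcite] config section
cfg = {'group1_min': '10', 'group2_min': '20', 'group3_min': '30', 'group4_min': '40'}
print(pick_minimums(cfg))          # all values come from the config
print(pick_minimums(cfg, min1=0))  # an explicit zero is kept as an override
```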

def _run_chk(self, prt, prt_fullname):
if not self.rd_rc(prt, prt_fullname):
@@ -215,19 +215,18 @@ def prt_rcfile_dflt(self, prt=stdout):
cfgparser = self._get_dflt_cfgparser()
cfgparser.write(prt)

def _init_cfgfilename(self):
def _init_cfgfilename(self, prt=None):
"""Get the configuration filename"""
if self.envvar in environ:
cfgfile = environ[self.envvar]
if exists(cfgfile):
return cfgfile
print('**WARNING: NO pmidcite CONFIG FILE FOUND AT {ENVVAR}={F}'.format(
F=cfgfile, ENVVAR=self.envvar))
if not exists(self.dfltcfgfile):
print('**WARNING: NO pmidcite CONFIG FILE FOUND: {F}'.format(
F=self.dfltcfgfile))
if prt:
prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND AT {self.envvar}={cfgfile}\n')
if not exists(self.dfltcfgfile) and prt:
prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND: {self.dfltcfgfile}\n')
return self.dfltcfgfile



# Copyright (C) 2019-present DV Klopfenstein. All rights reserved.
# Copyright (C) 2019-present DV Klopfenstein, PhD. All rights reserved.
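
The `_init_cfgfilename` change above replaces unconditional `print` calls with writes to an optional stream, so a caller can silence the warnings by passing `prt=None`. A minimal standalone sketch of that lookup order follows. It mirrors the logic in the hunk but is not the library code: `resolve_cfgfile` is a hypothetical free function, and the paths are made-up locations assumed not to exist.

```python
import io
from os.path import exists


def resolve_cfgfile(envvar, dfltcfgfile, environ, prt=None):
    """Return the config filename; write warnings to prt only if a stream is given."""
    if envvar in environ:
        cfgfile = environ[envvar]
        if exists(cfgfile):
            return cfgfile
        if prt:
            prt.write(f'**WARNING: NO CONFIG FILE FOUND AT {envvar}={cfgfile}\n')
    if not exists(dfltcfgfile) and prt:
        prt.write(f'**WARNING: NO CONFIG FILE FOUND: {dfltcfgfile}\n')
    return dfltcfgfile


# Made-up paths that are assumed not to exist on the machine running this
env = {'PMIDCITECONF': '/no/such/dir/custom.cfg'}
log = io.StringIO()
path = resolve_cfgfile('PMIDCITECONF', '/no/such/dir/default.cfg', env, prt=log)
print(path)
print(log.getvalue(), end='')
```

Passing a stream instead of printing directly also makes the warnings testable, since a test can capture them in a `StringIO` as done here.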
16 changes: 8 additions & 8 deletions src/pmidcite/cli/icite.py
@@ -1,7 +1,7 @@
"""Manage args for NIH iCite run for one PubMed ID (PMID)"""

__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
__author__ = "DV Klopfenstein"
__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
__author__ = "DV Klopfenstein, PhD"

from sys import stdout
import argparse
@@ -11,7 +11,7 @@
from pmidcite.cli.utils import get_outfile
from pmidcite.cli.utils import get_pmids
from pmidcite.cli.entry_keyset import get_details_cites_refs
from pmidcite.icite.nih_grouper import NihGrouper
from pmidcite.icite.nih_grouper import get_nihgrouper
from pmidcite.icite.downloader import get_downloader
from pmidcite.icite.downloader import prt_hdr
from pmidcite.icite.downloader import prt_keys
@@ -61,10 +61,10 @@ def get_argparser(self):
help='Load and print a descriptive list of citations and references for each paper.')
parser.add_argument(
'-c', '--load_citations', action='store_true', default=False,
help='Load and print a descriptive list of citations for each paper.')
help='Load and print the papers and clinical studies that cited the requested paper.')
parser.add_argument(
'-r', '--load_references', action='store_true', default=False,
help='Load and print a descriptive list of references for each paper.')
help='Load and print the references for each requested paper.')
# pylint: disable=line-too-long
parser.add_argument(
'-R', '--no_references', action='store_true',
@@ -120,7 +120,7 @@ def cli(self):
"""Run iCite/PubMed using command-line interface"""
argparser = self.get_argparser()
args = self._get_args(argparser)
## print('ICITE ARGS ../pmidcite/src/pmidcite/cli/icite.py', args)
##print('ICITE ARGS ../pmidcite/src/pmidcite/cli/icite.py', args)
self._run(args, argparser)

def _run(self, args, argparser):
@@ -173,7 +173,7 @@ def _get_downloader(args):
args.load_citations,
args.load_references,
args.no_references)
groupobj = NihGrouper(args.min1, args.min2, args.min3, args.min4)
groupobj = get_nihgrouper(args.min1, args.min2, args.min3, args.min4)
return get_downloader(
details_cites_refs,
groupobj,
@@ -261,4 +261,4 @@ def _prt_no_icite(pmids):
Ps=' '.join(str(p) for p in pmids)))


# Copyright (C) 2019-present DV Klopfenstein. All rights reserved.
# Copyright (C) 2019-present DV Klopfenstein, PhD. All rights reserved.
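
The `-c` and `-r` options edited above are plain `store_true` flags. The sketch below shows how such flags behave in isolation. It is a cut-down illustration, not the real `icite` parser: the `pmids` positional argument and the `icite_sketch` program name are assumptions made for the example, and the help strings are paraphrased.

```python
import argparse


def build_parser():
    """Build a small parser with the two boolean flags discussed above."""
    parser = argparse.ArgumentParser(prog='icite_sketch')
    # Hypothetical positional argument holding the requested PMIDs
    parser.add_argument('pmids', nargs='*', help='PubMed IDs to look up')
    parser.add_argument(
        '-c', '--load_citations', action='store_true', default=False,
        help='Load and print the papers that cite each requested paper.')
    parser.add_argument(
        '-r', '--load_references', action='store_true', default=False,
        help='Load and print the references for each requested paper.')
    return parser


args = build_parser().parse_args(['30022098', '-c'])
print(args.pmids, args.load_citations, args.load_references)
```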
30 changes: 19 additions & 11 deletions src/pmidcite/cli/summarize_papers.py
@@ -5,6 +5,7 @@
from pmidcite.cli.utils import prt_loc_rcfile
from pmidcite.cli.utils import get_files_exists
from pmidcite.summarize_papers import SummarizePapers
from pmidcite.icite.top_cit_ref import TopCitRef

__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved."
__author__ = "DV Klopfenstein, PhD"
@@ -15,11 +16,12 @@ class SummarizePapersCli:

def __init__(self, cfg):
self.cfg = cfg
self.topcitref = TopCitRef()

def get_argparser(self):
"""Argument parser for summarizing the citations on set(s) of papers"""
parser = ArgumentParser(
description="Summarize NIH's citation on a set(s) of papers",
description="Summarize NIH's citation data on a set(s) of papers",
add_help=False)
##cfg = self.cfg
# https://docs.python.org/3/library/argparse.html
@@ -48,28 +50,34 @@ def get_argparser(self):
parser.add_argument(
'--print-rcfile', action='store_true',
help='Print the location of the pmidcite configuration file (env var: PMIDCITECONF)')
self.topcitref.add_arguments(parser)
return parser


def cli(self):
"""Run citation summary on a set(s) of PMIDs"""
argparser = self.get_argparser()
args = argparser.parse_args()
print('ARGS CITE SUMMARY ../pmidcite/src/pmidcite/cli/summarize_papers.py', args)
##print('ARGS CITE SUMMARY ../pmidcite/src/pmidcite/cli/summarize_papers.py', args)
if args.print_rcfile:
prt_loc_rcfile(self.cfg, stdout)
return
files = get_files_exists(args.files)
files = get_files_exists(args.files, stdout)
if args.help or not files:
argparser.print_help()
print('\nHelp message printed because: -h or --help == True')
return
##self._run(args, argparser)
nih_grouper = self.cfg.get_nihgrouper()
##print(f'\nHelp message printed because: -h or --help == {args.help} or {args.files}')
nih_grouper = self.cfg.get_nihgrouper(args.min1, args.min2, args.min3, args.min4)
self._summarize_papers(files, nih_grouper, self.topcitref.adjust_args(args.paper_labels))
if args.prt_nihgrpr:
print(nih_grouper)

@staticmethod
def _summarize_papers(files, nih_grouper, top_cit_refs):
"""Summarize papers"""
for filename in files:
sumpap = SummarizePapers.from_file(filename, nih_grouper)
sumpap = SummarizePapers.from_file(
filename=filename,
nih_grouper=nih_grouper,
top_cit_ref=top_cit_refs)
print(sumpap.str_oneline())
return


# Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved.
13 changes: 7 additions & 6 deletions src/pmidcite/eutils/pubmed/rdwr.py
@@ -1,7 +1,7 @@
"""Write Python module for downloaded abstracts."""

__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
__author__ = "DV Klopfenstein"
__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
__author__ = "DV Klopfenstein, PhD"

import sys
import os
@@ -151,8 +151,9 @@ def _lid_add_to_dict(fld2objs, fld, line, pmid):
if fld not in fld2objs:
fld2objs[fld] = {}
key0 = line.rfind('[')
# TBD Change these fatals to messages
assert key0 != -1, '**FATAL LID: {} {}'.format(fld, line)
if key0 == -1:
##print(f'**WARNING Local ID (LID): {fld} KEY({key0}) {line}')
return
assert line[-1] == ']', '**FATAL LID: {} {}'.format(fld, line)
key = line[key0 + 1:-1]
val = line[:key0].strip()
@@ -347,7 +348,7 @@ def _init_date(self, fld2objs, fld, str_date, pmid):

#### mtch = match(r'(\d{4} \S{3} \d{1,2})\s*-', str_date)
#### if mtch:
#### fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b %d")
## fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b %d")
#### mtch = match(r'(\d{4} \S{3})\w?\s*-', str_date)
#### if mtch:
#### fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b")
@@ -453,4 +454,4 @@ def _extract_fldvals(self, line):
self.fldvals[-1][1].append(line_body)


# Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.
# Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved.
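
The `_lid_add_to_dict` hunk above changes a missing `[` in a LID (Local ID) line from a fatal assertion into a silent skip. A standalone sketch of that parsing rule follows. It is an illustration under assumptions, not the library function: `parse_lid` is a hypothetical name, the input values are made up, and a missing closing `]` raises `ValueError` here where the original keeps an `assert`.

```python
def parse_lid(line):
    """Split 'value [key]' into (key, value); return None if there is no '[key]' suffix."""
    key0 = line.rfind('[')
    if key0 == -1:
        return None  # tolerated: the entry simply has no bracketed key
    if not line.endswith(']'):
        raise ValueError(f'malformed LID: {line}')
    return line[key0 + 1:-1], line[:key0].strip()


# Made-up identifiers for illustration
print(parse_lid('10.1000/example.123 [doi]'))
print(parse_lid('identifier-without-brackets'))
```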