From 0c3eb4581acf1436e90b542ac95e95eeb5ca7cf3 Mon Sep 17 00:00:00 2001 From: dvklopfenstein Date: Thu, 22 Dec 2022 00:04:07 -0500 Subject: [PATCH 1/8] Added summarize papers functionality --- README.md | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 47 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 1416195..8717fca 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# PubMed ID (PMID) Cite +# PubMedj ID (PMID) Cite [![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Python%20library%20to%20download%20pubmed%20citation%20counts%20and%20data,%20given%20a%20PMID&url=https://github.com/dvklopfenstein/pmidcite&via=dvklopfenstein&hashtags=pubmed,pmid,citations,pubmed2cite,writingtips,scientificwriting) [![build](https://github.com/dvklopfenstein/pmidcite/actions/workflows/build.yml/badge.svg)](https://github.com/dvklopfenstein/pmidcite/actions/workflows/build.yml) @@ -20,6 +20,7 @@ Contact: dvklopfenstein@protonmail.com * [**1) Download citation counts and data for a research paper**](https://github.com/dvklopfenstein/pmidcite#1-download-citation-counts-and-data-for-a-research-paper) * [**2) Forward citation search**](https://github.com/dvklopfenstein/pmidcite#2-forward-citation-search): following a paper's *Cited by* links or *Forward snowballing* * [**3) Backward citation search**](https://github.com/dvklopfenstein/pmidcite#3-backward-citation-search): following the links to a paper's references or *Backward snowballing* +* [**4) Summarize a group of citations**](https://github.com/dvklopfenstein/pmidcite#4-summarize-a-group-of-citations): ## 1) Download citation counts and data for a research paper ```$ icite -H 26032263``` @@ -56,6 +57,50 @@ Also known as following links to a paper's references or *Backward snowballing* or ```$ icite -H; icite 26032263 -r | sort -k6 -r``` +## 4) Summarize a group of citations +Examine a paper with PMID `30022098`. 
Print the column headers(`-H`): +``` +$ icite -H 30022098 +COL 2 3 4 5 6 7 8 9 10 au[11](authors) +TYP PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title +TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses. +``` + +Paper with PMID `30022098` is cited by `318`(`cit`) other reserch papers and `1`(`cli`) clinical study. It has `23` references. + +Download and save details about the citing papers(`-c`) into a file(`-o goatools_cites.txt`): +``` +$ icite 30022098 -c -o goatools_cites.txt +``` + +The requested paper (PMID=`30022098`) is described in one line in `goatools_cites.txt`: +``` +$ grep TOP goatools_cites.txt +TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses. +``` + +The paper (PMID=`30022098`) is cited by 318(`CIT`) research papers plus 1(`CLI`) clinical study: +``` +$ grep CIT goatools_cites.txt | wc -l +318 + +$ grep CLI goatools_cites.txt | wc -l +1 +``` + +**NEW FUNCTIONALITY; INPUT REQUESTED: What would you like to see?** [Open an issue](https://github.com/dvklopfenstein/pmidcite/issues) to comment. +Summarize all the papers in `goatools_cites.txt` +``` +$ summarize_papers goatools_cites.txt -p TOP CIT CLI +i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% 4 years:2018-2022 320 papers goatools_cites.txt +``` + +* Output is on one line so many files containing sets of PMIDs may be compared. TBD: Add multiline verbose option. +* The groups are from newest('i`) to top-performing(`4`), great(`3`), very good(`2`), and overlooked(`1` and `0`) +* The percentages in each group follow the group name + + + # PubMed vs Google Scholar

Google Scholar vs PubMed @@ -456,4 +501,4 @@ Fiorini N ... Lu Zhiyong dvklopfenstein@protonmail.com https://orcid.org/0000-0003-0161-7603 -Copyright (C) 2019-present [pmidcite](https://dvklopfenstein.github.io/pmidcite/), DV Klopfenstein. All rights reserved. +Copyright (C) 2019-present [pmidcite](https://dvklopfenstein.github.io/pmidcite/), DV Klopfenstein, PhD. All rights reserved. From 9fcdf2dbc71db964f2a48235b215b07daded93ca Mon Sep 17 00:00:00 2001 From: dvklopfenstein Date: Thu, 22 Dec 2022 00:14:48 -0500 Subject: [PATCH 2/8] spelling/format --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 8717fca..e34188e 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ Contact: dvklopfenstein@protonmail.com * [**1) Download citation counts and data for a research paper**](https://github.com/dvklopfenstein/pmidcite#1-download-citation-counts-and-data-for-a-research-paper) * [**2) Forward citation search**](https://github.com/dvklopfenstein/pmidcite#2-forward-citation-search): following a paper's *Cited by* links or *Forward snowballing* * [**3) Backward citation search**](https://github.com/dvklopfenstein/pmidcite#3-backward-citation-search): following the links to a paper's references or *Backward snowballing* -* [**4) Summarize a group of citations**](https://github.com/dvklopfenstein/pmidcite#4-summarize-a-group-of-citations): +* [**4) Summarize a group of citations**](https://github.com/dvklopfenstein/pmidcite#4-summarize-a-group-of-citations) ## 1) Download citation counts and data for a research paper ```$ icite -H 26032263``` @@ -58,7 +58,7 @@ or ```$ icite -H; icite 26032263 -r | sort -k6 -r``` ## 4) Summarize a group of citations -Examine a paper with PMID `30022098`. Print the column headers(`-H`): +### 4a) Examine a paper with PMID `30022098`. 
Print the column headers(`-H`): ``` $ icite -H 30022098 COL 2 3 4 5 6 7 8 9 10 au[11](authors) @@ -66,9 +66,9 @@ TYP PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses. ``` -Paper with PMID `30022098` is cited by `318`(`cit`) other reserch papers and `1`(`cli`) clinical study. It has `23` references. +Paper with PMID `30022098` is cited by `318`(`cit`) other reserch papers and `1`(`cli`) clinical study. It has `23` references(`ref`). -Download and save details about the citing papers(`-c`) into a file(`-o goatools_cites.txt`): +### 4b) Download and save the details about each citing paper(`-c`) into a file(`-o goatools_cites.txt`): ``` $ icite 30022098 -c -o goatools_cites.txt ``` @@ -88,8 +88,8 @@ $ grep CLI goatools_cites.txt | wc -l 1 ``` -**NEW FUNCTIONALITY; INPUT REQUESTED: What would you like to see?** [Open an issue](https://github.com/dvklopfenstein/pmidcite/issues) to comment. -Summarize all the papers in `goatools_cites.txt` +### 4c) Summarize all the papers in `goatools_cites.txt` +**NEW FUNCTIONALITY; INPUT REQUESTED: What would you like to see?** [Open an issue](https://github.com/dvklopfenstein/pmidcite/issues) to comment. ``` $ summarize_papers goatools_cites.txt -p TOP CIT CLI i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% 4 years:2018-2022 320 papers goatools_cites.txt From f1c554e90d759a09984f558131646fbbe524bd61 Mon Sep 17 00:00:00 2001 From: dvklopfenstein Date: Thu, 22 Dec 2022 00:22:50 -0500 Subject: [PATCH 3/8] Summarize many PMIDs in one line --- README.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e34188e..a8d6934 100644 --- a/README.md +++ b/README.md @@ -58,6 +58,13 @@ or ```$ icite -H; icite 26032263 -r | sort -k6 -r``` ## 4) Summarize a group of citations +* 4a) Examine a paper with PMID `30022098`. 
Print the column headers(`-H`): +`icite -H 30022098` +* 4b) Download and save the details about each citing paper(`-c`) into a file(`-o goatools_cites.txt`): +`icite 30022098 -c -o goatools_cites.txt` +* 4c) Summarize all the papers in `goatools_cites.txt` +`summarize_papers goatools_cites.txt -p TOP CIT CLI` + ### 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`): ``` $ icite -H 30022098 @@ -96,8 +103,8 @@ i=033.4% 4=003.4% 3=020.9% 2=021.9% 1=015.9% 0=004.4% 4 years:2018-2022 320 ``` * Output is on one line so many files containing sets of PMIDs may be compared. TBD: Add multiline verbose option. -* The groups are from newest('i`) to top-performing(`4`), great(`3`), very good(`2`), and overlooked(`1` and `0`) -* The percentages in each group follow the group name +* The groups are from newest(`i`) to top-performing(`4`), great(`3`), very good(`2`), and overlooked(`1` and `0`) +* The percentages of papers in `goatools_cites.txt` in each group follow the group name From 1e8e9547a9eb9b7a45f1dafb3901cfc125bea102 Mon Sep 17 00:00:00 2001 From: dvklopfenstein Date: Thu, 22 Dec 2022 00:45:07 -0500 Subject: [PATCH 4/8] Adding functionality to summarize a set of papers in one line --- CHANGELOG.md | 4 ++ README.md | 8 +-- makefile | 3 + setup.py | 3 +- src/pmidcite/__version__.py | 2 +- src/pmidcite/cfg.py | 16 ++--- src/pmidcite/cli/icite.py | 16 ++--- src/pmidcite/cli/summarize_papers.py | 30 +++++--- src/pmidcite/icite/nih_grouper.py | 35 +++++++-- src/pmidcite/icite/top_cit_ref.py | 47 ++++++++++++ src/pmidcite/summarize_papers.py | 103 +++++++++++++++++++++++++++ src/tests/test_speed_api_dnld.py | 6 +- src/tests/test_speed_dnld_load.py | 6 +- src/tests/test_topcitref_args.py | 36 ++++++++++ 14 files changed, 269 insertions(+), 46 deletions(-) create mode 100644 src/pmidcite/icite/top_cit_ref.py create mode 100644 src/pmidcite/summarize_papers.py create mode 100755 src/tests/test_topcitref_args.py diff --git a/CHANGELOG.md b/CHANGELOG.md 
index 2881971..6cf51c6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,7 @@ ## Summary * [**Unreleased**](#unreleased) +* [**Release 2022-12-22 v0.0.42**](#release-2022-12-22-v0042) Added summarize_papers script * [**Release 2022-12-06 v0.0.41**](#release-2022-12-06-v0041) setup.py updates for make target, install * [**Release 2022-11-26 v0.0.40**](#release-2022-11-28-v0040) Added pmidcite.scripts.icite; pip3, not pip from Python2 * [**Release 2022-11-26 v0.0.38**](#release-2022-11-26-v0038) Added instructions, and console_script to run script, icite @@ -42,6 +43,9 @@ ### Unreleased +### release 2022-12-22 v0.0.42 +* ADDED summarize_papers script + ### release 2022-12-06 v0.0.41 * CHANGED setup.py PACKAGES variable to run install make target diff --git a/README.md b/README.md index a8d6934..d6e1122 100644 --- a/README.md +++ b/README.md @@ -60,9 +60,9 @@ or ## 4) Summarize a group of citations * 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`): `icite -H 30022098` -* 4b) Download and save the details about each citing paper(`-c`) into a file(`-o goatools_cites.txt`): +* 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`): `icite 30022098 -c -o goatools_cites.txt` -* 4c) Summarize all the papers in `goatools_cites.txt` +* 4c) Summarize the overall performance of the 300+ citing papers contained in `goatools_cites.txt` `summarize_papers goatools_cites.txt -p TOP CIT CLI` ### 4a) Examine a paper with PMID `30022098`. Print the column headers(`-H`): @@ -73,9 +73,9 @@ TYP PMID RP HAMCc % G YEAR cit cli ref au[00](authors) title TOP 30022098 R. .A..c 100 4 2018 318 1 23 au[14](D V Klopfenstein) GOATOOLS: A Python library for Gene Ontology analyses. ``` -Paper with PMID `30022098` is cited by `318`(`cit`) other reserch papers and `1`(`cli`) clinical study. It has `23` references(`ref`). 
+Paper with PMID `30022098` is cited by `318`(`cit`) other research papers and `1`(`cli`) clinical study. It has `23` references(`ref`). -### 4b) Download and save the details about each citing paper(`-c`) into a file(`-o goatools_cites.txt`): +### 4b) Download the details about each paper(`-c`) that cites `30022098` into a file(`-o goatools_cites.txt`): ``` $ icite 30022098 -c -o goatools_cites.txt ``` diff --git a/makefile b/makefile index 4b9b8f7..dbbb089 100644 --- a/makefile +++ b/makefile @@ -18,6 +18,9 @@ p: d: find src -regextype posix-extended -regex "[a-z./]*" -type d +cli: + find src/pmidcite/cli -name \*.py + diff0: git diff --compact-summary diff --git a/setup.py b/setup.py index 6eab562..d372944 100755 --- a/setup.py +++ b/setup.py @@ -42,7 +42,7 @@ def get_long_description(): setup( name=NAME, ## version=versioneer.get_version(), - version='0.0.41', + version='0.0.42', author='DV Klopfenstein, PhD', author_email='dvklopfenstein@protonmail.com', ## cmdclass=versioneer.get_cmdclass(), @@ -55,6 +55,7 @@ def get_long_description(): entry_points={ 'console_scripts':[ 'icite=pmidcite.scripts.icite:main', + 'summarize_papers=pmidcite.scripts.icite:summarize_papers', ], }, # https://pypi.org/classifiers/ diff --git a/src/pmidcite/__version__.py b/src/pmidcite/__version__.py index 4705298..a81b6f2 100644 --- a/src/pmidcite/__version__.py +++ b/src/pmidcite/__version__.py @@ -1,3 +1,3 @@ """Version of pmidcite project""" -__version__ = '0.0.41' +__version__ = '0.0.42' diff --git a/src/pmidcite/cfg.py b/src/pmidcite/cfg.py index c880c75..018e150 100644 --- a/src/pmidcite/cfg.py +++ b/src/pmidcite/cfg.py @@ -1,7 +1,7 @@ """Manage pmidcite Configuration""" -__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved." -__author__ = "DV Klopfenstein" +__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved." 
+__author__ = "DV Klopfenstein, PhD" from os import environ from os import getcwd @@ -135,14 +135,14 @@ def _get_dirname_str(dirname): """Convert None to the str, "None", as needed by configparser""" return 'None' if dirname is None or dirname == 'None' else dirname - def get_nihgrouper(self): + def get_nihgrouper(self, min1=None, min2=None, min3=None, min4=None): """Get an NIH Grouper with default values from the cfg file""" cfg = self.cfgparser['pmidcite'] return NihGrouper( - float(cfg['group1_min']), - float(cfg['group2_min']), - float(cfg['group3_min']), - float(cfg['group4_min'])) + float(cfg['group1_min'] if not min1 else min1), + float(cfg['group2_min'] if not min2 else min2), + float(cfg['group3_min'] if not min3 else min3), + float(cfg['group4_min'] if not min4 else min4)) def _run_chk(self, prt, prt_fullname): if not self.rd_rc(prt, prt_fullname): @@ -230,4 +230,4 @@ def _init_cfgfilename(self): -# Copyright (C) 2019-present DV Klopfenstein. All rights reserved. +# Copyright (C) 2019-present DV Klopfenstein, PhD. All rights reserved. diff --git a/src/pmidcite/cli/icite.py b/src/pmidcite/cli/icite.py index 7f4bfac..db35437 100644 --- a/src/pmidcite/cli/icite.py +++ b/src/pmidcite/cli/icite.py @@ -1,7 +1,7 @@ """Manage args for NIH iCite run for one PubMed ID (PMID)""" -__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved." -__author__ = "DV Klopfenstein" +__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved." 
+__author__ = "DV Klopfenstein, PhD" from sys import stdout import argparse @@ -11,7 +11,7 @@ from pmidcite.cli.utils import get_outfile from pmidcite.cli.utils import get_pmids from pmidcite.cli.entry_keyset import get_details_cites_refs -from pmidcite.icite.nih_grouper import NihGrouper +from pmidcite.icite.nih_grouper import get_nihgrouper from pmidcite.icite.downloader import get_downloader from pmidcite.icite.downloader import prt_hdr from pmidcite.icite.downloader import prt_keys @@ -61,10 +61,10 @@ def get_argparser(self): help='Load and print a descriptive list of citations and references for each paper.') parser.add_argument( '-c', '--load_citations', action='store_true', default=False, - help='Load and print a descriptive list of citations for each paper.') + help='Load and print the papers and clinical studies that cited the requested paper.') parser.add_argument( '-r', '--load_references', action='store_true', default=False, - help='Load and print a descriptive list of references for each paper.') + help='Load and print the references for each requested paper.') # pylint: disable=line-too-long parser.add_argument( '-R', '--no_references', action='store_true', @@ -120,7 +120,7 @@ def cli(self): """Run iCite/PubMed using command-line interface""" argparser = self.get_argparser() args = self._get_args(argparser) - ## print('ICITE ARGS ../pmidcite/src/pmidcite/cli/icite.py', args) + ##print('ICITE ARGS ../pmidcite/src/pmidcite/cli/icite.py', args) self._run(args, argparser) def _run(self, args, argparser): @@ -173,7 +173,7 @@ def _get_downloader(args): args.load_citations, args.load_references, args.no_references) - groupobj = NihGrouper(args.min1, args.min2, args.min3, args.min4) + groupobj = get_nihgrouper(args.min1, args.min2, args.min3, args.min4) return get_downloader( details_cites_refs, groupobj, @@ -261,4 +261,4 @@ def _prt_no_icite(pmids): Ps=' '.join(str(p) for p in pmids))) -# Copyright (C) 2019-present DV Klopfenstein. All rights reserved. 
+# Copyright (C) 2019-present DV Klopfenstein, PhD. All rights reserved. diff --git a/src/pmidcite/cli/summarize_papers.py b/src/pmidcite/cli/summarize_papers.py index 08e96bb..7195643 100644 --- a/src/pmidcite/cli/summarize_papers.py +++ b/src/pmidcite/cli/summarize_papers.py @@ -5,6 +5,7 @@ from pmidcite.cli.utils import prt_loc_rcfile from pmidcite.cli.utils import get_files_exists from pmidcite.summarize_papers import SummarizePapers +from pmidcite.icite.top_cit_ref import TopCitRef __copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved." __author__ = "DV Klopfenstein, PhD" @@ -15,11 +16,12 @@ class SummarizePapersCli: def __init__(self, cfg): self.cfg = cfg + self.topcitref = TopCitRef() def get_argparser(self): """Argument parser for summarizing the citations on set(s) of papers""" parser = ArgumentParser( - description="Summarize NIH's citation on a set(s) of papers", + description="Summarize NIH's citation data on a set(s) of papers", add_help=False) ##cfg = self.cfg # https://docs.python.org/3/library/argparse.html @@ -48,28 +50,34 @@ def get_argparser(self): parser.add_argument( '--print-rcfile', action='store_true', help='Print the location of the pmidcite configuration file (env var: PMIDCITECONF)') + self.topcitref.add_arguments(parser) return parser - def cli(self): """Run citation summary on a set(s) of PMIDs""" argparser = self.get_argparser() args = argparser.parse_args() - print('ARGS CITE SUMMARY ../pmidcite/src/pmidcite/cli/summarize_papers.py', args) + ##print('ARGS CITE SUMMARY ../pmidcite/src/pmidcite/cli/summarize_papers.py', args) if args.print_rcfile: prt_loc_rcfile(self.cfg, stdout) - return - files = get_files_exists(args.files) + files = get_files_exists(args.files, stdout) if args.help or not files: argparser.print_help() - print('\nHelp message printed because: -h or --help == True') - return - ##self._run(args, argparser) - nih_grouper = self.cfg.get_nihgrouper() + ##print(f'\nHelp message printed 
because: -h or --help == {args.help} or {args.files}') + nih_grouper = self.cfg.get_nihgrouper(args.min1, args.min2, args.min3, args.min4) + self._summarize_papers(files, nih_grouper, self.topcitref.adjust_args(args.paper_labels)) + if args.prt_nihgrpr: + print(nih_grouper) + + @staticmethod + def _summarize_papers(files, nih_grouper, top_cit_refs): + """Summarize papers""" for filename in files: - sumpap = SummarizePapers.from_file(filename, nih_grouper) + sumpap = SummarizePapers.from_file( + filename=filename, + nih_grouper=nih_grouper, + top_cit_ref=top_cit_refs) print(sumpap.str_oneline()) - return # Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved. diff --git a/src/pmidcite/icite/nih_grouper.py b/src/pmidcite/icite/nih_grouper.py index 7e89d1d..a4e497e 100644 --- a/src/pmidcite/icite/nih_grouper.py +++ b/src/pmidcite/icite/nih_grouper.py @@ -1,10 +1,22 @@ """Groups papers using the NIH percentile""" -__copyright__ = "Copyright (C) 2021-present, DV Klopfenstein. All rights reserved." -__author__ = "DV Klopfenstein" +__copyright__ = "Copyright (C) 2021-present, DV Klopfenstein, PhD. All rights reserved." 
+__author__ = "DV Klopfenstein, PhD" from collections import namedtuple +def get_nihgrouper(min1, min2, min3, min4): + """Get NihGrouper, given NIH percentile dividers""" + args = {} + if min1: + args['group1_min'] = min1 + if min2: + args['group2_min'] = min2 + if min3: + args['group3_min'] = min3 + if min4: + args['group4_min'] = min4 + return NihGrouper(**args) class NihGrouper: """Groups papers using the NIH percentile""" @@ -18,6 +30,8 @@ def __init__(self, group1_min=2.1, group2_min=15.7, group3_min=83.9, group4_min= self.min2 = group2_min self.min3 = group3_min self.min4 = group4_min + assert group1_min and group2_min and group3_min and group4_min, \ + f'DIVIDERS MUST BE FLOATs: {str(self)}' #print(f'group1_min: {group1_min}') #print(f'group2_min: {group2_min}') #print(f'group3_min: {group3_min}') @@ -31,6 +45,7 @@ def str_group(self, nih_percentile): def get_group(self, nih_percentile): """Assign group numbers to the NIH percentile values using the 68-95-99.7 rule""" # No NIH percentile yet assigned. This paper should be checked out. 
+ ##print('DVK SSSSSSSSSS', str(self)) if nih_percentile is None or nih_percentile == -1: return 5 # 2.1% -3 SD: Very low citation rate @@ -52,17 +67,23 @@ def add_arguments(self, parser): """Add NIH grouper arguments to the parser""" # pylint: disable=line-too-long parser.add_argument( - '-1', metavar='group1_min', dest='min1', default=self.min1, type=float, + ##'-1', metavar='group1_min', dest='min1', default=self.min1, type=float, + '-1', metavar='group1_min', dest='min1', type=float, help='Minimum NIH percentile to be placed in group 1 (default: {D})'.format(D=self.min1)) parser.add_argument( - '-2', metavar='group2_min', dest='min2', default=self.min2, type=float, + '-2', metavar='group2_min', dest='min2', type=float, help='Minimum NIH percentile to be placed in group 2 (default: {D})'.format(D=self.min2)) parser.add_argument( - '-3', metavar='group3_min', dest='min3', default=self.min3, type=float, + '-3', metavar='group3_min', dest='min3', type=float, help='Minimum NIH percentile to be placed in group 3 (default: {D})'.format(D=self.min3)) parser.add_argument( - '-4', metavar='group4_min', dest='min4', default=self.min4, type=float, + '-4', metavar='group4_min', dest='min4', type=float, help='Minimum NIH percentile to be placed in group 4 (default: {D})'.format(D=self.min4)) + # --print-NIH-dividers => prt_nihgrpr=True + # => prt_nihgrpr=False + parser.add_argument( + '--print-NIH-dividers', dest='prt_nihgrpr', action='store_true', + help='Print the NIH percentile grouper divider percentages') def get_list(self): """Get the dividing values as a list""" @@ -74,4 +95,4 @@ def __str__(self): self.min1, self.min2, self.min3, self.min4) -# Copyright (C) 2021-present DV Klopfenstein. All rights reserved. +# Copyright (C) 2021-present DV Klopfenstein, PhD. All rights reserved. 
diff --git a/src/pmidcite/icite/top_cit_ref.py b/src/pmidcite/icite/top_cit_ref.py new file mode 100644 index 0000000..a2d8749 --- /dev/null +++ b/src/pmidcite/icite/top_cit_ref.py @@ -0,0 +1,47 @@ +"""Manage paper labels: TOP CIT CLI REF""" + +__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved." +__author__ = "DV Klopfenstein, PhD" + + +class TopCitRef: + """Manage paper labels: TOP CIT CLI REF""" + + label_list = [ + 'TOP', # Paper of interest + 'CIT', # A paper (not a clinical study) citing the paper of interest + 'CLI', # A clinical study paper citing the paper of interest + 'REF', # A paper in the reference list of the paper of interest + ] + + label_set = set(label_list) + + choices = label_list + ['CITS', 'ALL'] + + def add_arguments(self, parser): + """Manage paper labels arguments: TOP CIT CLI REF""" + # pylint: disable=line-too-long + parser.add_argument( + '-p', metavar='labels', dest='paper_labels', type=str, nargs='*', + default=['TOP',], + choices=self.choices, + help=f'Paper label choices: {" ".join(self.choices)} (default: TOP)', + ) + + def adjust_args(self, args_paper_labels): + """Given labels and aliases (CITS, ALL), return official label names""" + if not args_paper_labels: + return None + ret = set() + arg_set = set(args_paper_labels) + if 'ALL' in arg_set: + ret.update(self.label_list) + return ret + if 'CITS' in arg_set: + ret.add('CIT') + ret.add('CLI') + ret.update(arg_set.intersection(self.label_list)) + return ret + + +# Copyright (C) 2022-present DV Klopfenstein, PhD. All rights reserved. 
diff --git a/src/pmidcite/summarize_papers.py b/src/pmidcite/summarize_papers.py new file mode 100644 index 0000000..54d3c33 --- /dev/null +++ b/src/pmidcite/summarize_papers.py @@ -0,0 +1,103 @@ +"""Summarize NIH citation data for requested papers from the commandline or in files""" + +from collections import namedtuple +from collections import defaultdict + +__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved." +__author__ = "DV Klopfenstein, PhD" + + +class SummarizePapers: + """Summarize NIH citation data for requested papers from the commandline or in files""" + + def __init__(self, name, nih_grouper=None): + self.name = name + self.nts = None + self.num_papers_all = None + self.nihgrpr = nih_grouper + + def str_oneline(self): + """Get str that is a one-line summary of many papers/citations""" + grp2nts = self._get_stats_grpr() if self.nihgrpr else self._get_stats_nogrpr() + years = self.get_years() + year_min = min(years) + year_max = max(years) + return '{NIHP} {Ys:3} years:{Y0:4}-{Y1:4} {N:5} papers {NAME}'.format( + NIHP=self._str_group_percs(grp2nts), + Ys=year_max-year_min, + Y0=year_min, + Y1=year_max, + N=self.num_papers_all, + NAME=self.name) + + def get_years(self): + """Get the years of all publications""" + return list(nt.year for nt in self.nts) + + def _str_group_percs(self, grp2nts): + """Get percentages of papers in each group""" + lst = [] + for grp in ['i', '4', '3', '2', '1', '0']: + num_papers_grp = len(grp2nts[grp]) if grp2nts else 0 + abc = '{G}={P}'.format( + G=grp, + P='{:05.1f}%'.format( + num_papers_grp/self.num_papers_all*100) if num_papers_grp != 0 else "......") + lst.append(abc) + return ' '.join(lst) + + def _get_stats_grpr(self): + """Get summary information for list of papers""" + grp2nts = defaultdict(list) + grpr = self.nihgrpr + for ntd in self.nts: + grp2nts[grpr.str_group(ntd.nih_perc)].append(ntd) + ##print('DDDDDDDD', ntd) + return grp2nts + + def _get_stats_nogrpr(self): + """Get 
summary information for list of papers""" + grp2nts = defaultdict(list) + for ntd in self.nts: + grp2nts[ntd.nih_group].append(ntd) + return grp2nts + + @staticmethod + def read_lines(filename, top_cit_ref): + """Read paper citation lines""" + if top_cit_ref is None: + top_cit_ref = {'TOP',} # TOP, CIT, CLI, REF + nts = [] + nto = namedtuple('iciteline', ( + 'line pmid aart nih_perc nih_group year num_cite_all num_cite num_clin num_refs')) + with open(filename) as ifstrm: + for line in ifstrm: + if line[:3] in top_cit_ref: + flds = line.split(maxsplit=10) + if flds[1].isdigit(): + num_cite = int(flds[7]) + num_clin = int(flds[8]) + nts.append(nto( + line=line.rstrip(), + pmid=int(flds[1]), + aart=f'{flds[2]} {flds[3]}', + nih_perc=int(flds[4]), + nih_group=flds[5], # -i or a number + year=int(flds[6]), + num_cite_all=num_cite + num_clin, + num_cite=num_cite, + num_clin=num_clin, + num_refs=int(flds[9]))) + return nts + + # -- Constructors ------------------------------------------------------------ + @classmethod + def from_file(cls, filename, nih_grouper=None, top_cit_ref=None): + """Get SummarizePapers instance, given a file filled with icite lines w/TOP|CIT|CLI|REF""" + obj = cls(filename, nih_grouper) + obj.nts = obj.read_lines(filename, top_cit_ref) + obj.num_papers_all = len(obj.nts) + return obj + + +# Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved. diff --git a/src/tests/test_speed_api_dnld.py b/src/tests/test_speed_api_dnld.py index 8860dca..9afa1a3 100755 --- a/src/tests/test_speed_api_dnld.py +++ b/src/tests/test_speed_api_dnld.py @@ -13,7 +13,7 @@ from tests.pmids_i3 import PMIDS -def test_dnld_speed(): +def test_speed_api_dnld(): """Test speed for download NIH citation data""" force_dnld = True dnldr = _init_dnldr(force_dnld) @@ -60,6 +60,6 @@ def _init_dnldr(force_dnld): if __name__ == '__main__': - test_dnld_speed() + test_speed_api_dnld() -# Copyright (C) 2021-present, DV Klopfenstein. All rights reserved. 
+# Copyright (C) 2021-present, DV Klopfenstein, PhD. All rights reserved. diff --git a/src/tests/test_speed_dnld_load.py b/src/tests/test_speed_dnld_load.py index cc2b6d2..9058a4e 100755 --- a/src/tests/test_speed_dnld_load.py +++ b/src/tests/test_speed_dnld_load.py @@ -16,7 +16,7 @@ from tests.pmids_i3 import PMIDS -def test_dnld_speed(): +def test_speed_dnld_load(): """Test speed for download NIH citation data""" fout_log = 'test_speed_dnld_load.log' num = 5000 @@ -82,6 +82,6 @@ def _run_download(dnldr, pmids): if __name__ == '__main__': - test_dnld_speed() + test_speed_dnld_load() -# Copyright (C) 2021-present, DV Klopfenstein. All rights reserved. +# Copyright (C) 2021-present, DV Klopfenstein, PhD. All rights reserved. diff --git a/src/tests/test_topcitref_args.py b/src/tests/test_topcitref_args.py new file mode 100755 index 0000000..da26087 --- /dev/null +++ b/src/tests/test_topcitref_args.py @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +"""Test paper label args: TOP CIT CLI REF and aliases ALL CITS""" + +from pmidcite.icite.top_cit_ref import TopCitRef + +__copyright__ = "Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved." 
+__author__ = "DV Klopfenstein, PhD" + + +ADJ = TopCitRef().adjust_args + +def test_topcitref_args(): + """Test paper label args: TOP CIT CLI REF and aliases ALL CITS""" + # pylint: disable=bad-whitespace + + # Arguments Expected paper labels + # ---------------------- ------------------------------- + _chk(0, set(), None) + _chk(1, {'ALL',}, {'TOP', 'CIT', 'CLI', 'REF'}) + _chk(2, {'CITS',}, {'CIT', 'CLI'}) + _chk(3, {'TOP', 'CITS',}, {'TOP', 'CIT', 'CLI'}) + _chk(4, {'TOP', 'CITS', 'REF'}, {'TOP', 'CIT', 'CLI', 'REF'}) + _chk(5, {'TOP', 'MOCK', 'REF'}, {'TOP', 'REF'}) + + +def _chk(num, args, exp): + """Check that args produces correct paper label cites""" + act = ADJ(args) + assert act == exp, f'TEST {num} ACT({act}) != EXP({exp}) WITH ARGS({args})' + print(f'**PASSED TEST {num:2}: ARGS({args}) ADJUSTED TO {exp}') + + +if __name__ == '__main__': + test_topcitref_args() + +# Copyright (C) 2022-present, DV Klopfenstein, PhD. All rights reserved. From 09c46198142aee946e2678c8aac238df7fd81d3b Mon Sep 17 00:00:00 2001 From: dvklopfenstein Date: Fri, 30 Dec 2022 11:39:27 -0500 Subject: [PATCH 5/8] If no Local ID (LID), dont add to dict --- src/pmidcite/eutils/pubmed/rdwr.py | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/src/pmidcite/eutils/pubmed/rdwr.py b/src/pmidcite/eutils/pubmed/rdwr.py index ec8a6b3..076987c 100755 --- a/src/pmidcite/eutils/pubmed/rdwr.py +++ b/src/pmidcite/eutils/pubmed/rdwr.py @@ -1,7 +1,7 @@ """Write Python module for downloaded abstracts.""" -__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved." -__author__ = "DV Klopfenstein" +__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved." 
+__author__ = "DV Klopfenstein, PhD" import sys import os @@ -151,8 +151,9 @@ def _lid_add_to_dict(fld2objs, fld, line, pmid): if fld not in fld2objs: fld2objs[fld] = {} key0 = line.rfind('[') - # TBD Change these fatals to messages - assert key0 != -1, '**FATAL LID: {} {}'.format(fld, line) + if key0 == -1: + ##print(f'**WARNING Local ID (LID): {fld} KEY({key0}) {line}') + return assert line[-1] == ']', '**FATAL LID: {} {}'.format(fld, line) key = line[key0 + 1:-1] val = line[:key0].strip() @@ -347,7 +348,7 @@ def _init_date(self, fld2objs, fld, str_date, pmid): #### mtch = match(r'(\d{4} \S{3} \d{1,2})\s*-', str_date) #### if mtch: - #### fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b %d") + ## fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b %d") #### mtch = match(r'(\d{4} \S{3})\w?\s*-', str_date) #### if mtch: #### fld2objs[fld] = datetime.datetime.strptime(mtch.group(1), "%Y %b") @@ -453,4 +454,4 @@ def _extract_fldvals(self, line): self.fldvals[-1][1].append(line_body) - # Copyright (C) 2019-present, DV Klopfenstein. All rights reserved. + # Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved. From fdcc92fbcfc85cddfddea08d95e13d612bea4823 Mon Sep 17 00:00:00 2001 From: dvklopfenstein Date: Fri, 30 Dec 2022 11:41:27 -0500 Subject: [PATCH 6/8] Don't print cfg file not found --- src/bin/icite.py | 6 +++--- src/pmidcite/cfg.py | 13 ++++++------- src/pmidcite/scripts/icite.py | 6 +++--- 3 files changed, 12 insertions(+), 13 deletions(-) diff --git a/src/bin/icite.py b/src/bin/icite.py index df17e01..3290e05 100755 --- a/src/bin/icite.py +++ b/src/bin/icite.py @@ -1,8 +1,8 @@ #!/usr/bin/env python3 """Given a PubMed ID (PMID), return a list of citing publications""" -__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved." -__author__ = "DV Klopfenstein" +__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved." 
+__author__ = "DV Klopfenstein, PhD"
 
 from pmidcite.cli.icite import NIHiCiteCli  # get_argparser
 from pmidcite.cfg import get_cfgparser
@@ -16,4 +16,4 @@ def main():
 if __name__ == '__main__':
     main()
 
-# Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.
+# Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved.
diff --git a/src/pmidcite/cfg.py b/src/pmidcite/cfg.py
index 018e150..d25ef73 100644
--- a/src/pmidcite/cfg.py
+++ b/src/pmidcite/cfg.py
@@ -57,7 +57,7 @@ class Cfg(object):
     }
 
     def __init__(self, check=True, prt=stdout, prt_fullname=True):
-        self.cfgfile = self._init_cfgfilename()
+        self.cfgfile = self._init_cfgfilename(prt)
         self.cfgparser = self._get_dflt_cfgparser()
         if check:
             self._run_chk(prt, prt_fullname)
@@ -215,17 +215,16 @@ def prt_rcfile_dflt(self, prt=stdout):
         cfgparser = self._get_dflt_cfgparser()
         cfgparser.write(prt)
 
-    def _init_cfgfilename(self):
+    def _init_cfgfilename(self, prt=None):
         """Get the configuration filename"""
         if self.envvar in environ:
             cfgfile = environ[self.envvar]
             if exists(cfgfile):
                 return cfgfile
-            print('**WARNING: NO pmidcite CONFIG FILE FOUND AT {ENVVAR}={F}'.format(
-                F=cfgfile, ENVVAR=self.envvar))
-        if not exists(self.dfltcfgfile):
-            print('**WARNING: NO pmidcite CONFIG FILE FOUND: {F}'.format(
-                F=self.dfltcfgfile))
+            if prt:
+                prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND AT {self.envvar}={cfgfile}\n')
+        if not exists(self.dfltcfgfile) and prt:
+            prt.write(f'**WARNING: NO pmidcite CONFIG FILE FOUND: {self.dfltcfgfile}\n')
         return self.dfltcfgfile
 
 
diff --git a/src/pmidcite/scripts/icite.py b/src/pmidcite/scripts/icite.py
index 5ee02cc..a4239ea 100755
--- a/src/pmidcite/scripts/icite.py
+++ b/src/pmidcite/scripts/icite.py
@@ -1,7 +1,7 @@
 """Given a PubMed ID (PMID), return a list of citing publications"""
 
-__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein. All rights reserved."
-__author__ = "DV Klopfenstein"
+__copyright__ = "Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved."
+__author__ = "DV Klopfenstein, PhD"
 
 from pmidcite.cli.icite import NIHiCiteCli  # get_argparser
 from pmidcite.cfg import get_cfgparser
@@ -12,4 +12,4 @@ def main():
     NIHiCiteCli(get_cfgparser(prt=None)).cli()
 
 
-# Copyright (C) 2019-present, DV Klopfenstein. All rights reserved.
+# Copyright (C) 2019-present, DV Klopfenstein, PhD. All rights reserved.
From a467a64db97d7dcd9bdc50dfddaa825e290fb787 Mon Sep 17 00:00:00 2001
From: dvklopfenstein
Date: Fri, 30 Dec 2022 11:41:49 -0500
Subject: [PATCH 7/8] Add requests package as a requirement

---
 setup.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/setup.py b/setup.py
index d372944..f579d26 100755
--- a/setup.py
+++ b/setup.py
@@ -10,6 +10,8 @@ from setuptools import setup
 # import versioneer
 
+__copyright__ = 'Copyright (C) 2019, DV Klopfenstein, PhD. All rights reserved'
+__author__ = 'DV Klopfenstein, PhD'
 
 NAME = 'pmidcite'
@@ -69,9 +71,11 @@ def get_long_description():
         'Topic :: Scientific/Engineering :: Information Analysis',
     ],
     url='http://github.com/dvklopfenstein/pmidcite',
-    description="Augment's a PubMed literature search with citation data from NIH-OCC's iCite.",
+    description="Turbocharge a PubMed literature search using citation data from the NIH",
     # https://packaging.python.org/guides/making-a-pypi-friendly-readme/
     long_description=get_long_description(),
     long_description_content_type='text/markdown',
-    # install_requires=['docopt'],
+    install_requires=['requests'],
 )
+
+# Copyright (C) 2019, DV Klopfenstein, PhD. All rights reserved
From e266295f784f12f0aebfa90be025a0fffabce9fd Mon Sep 17 00:00:00 2001
From: dvklopfenstein
Date: Fri, 30 Dec 2022 11:42:37 -0500
Subject: [PATCH 8/8] Added new functionality to ADDED/CHANGED lines

---
 CHANGELOG.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6cf51c6..826591f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -43,8 +43,10 @@
 
 ### Unreleased
 
-### release 2022-12-22 v0.0.42
+### release 2022-12-30 v0.0.42
 * ADDED summarize_papers script
+* ADDED requests package as a pre-requisite
+* CHANGED API to NCBI E-utils such that a missing LID (Local ID) is ignored on a PubMed entry
 
 ### release 2022-12-06 v0.0.41
 * CHANGED setup.py PACKAGES variable to run install make target
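
Reviewer note on the LID change in PATCH 5/8: the sketch below illustrates the new "ignore a missing LID" behavior in isolation. It is a simplified stand-in, not the code in `rdwr.py`: the helper name `split_lid`, its return convention, and the sample strings are assumptions made for illustration. Only the `rfind('[')` check, the early return, and the key/value slicing are taken from the diff.

```python
def split_lid(line):
    """Split an LID value such as '10.1000/example [doi]' into (key, value).

    Hypothetical helper, not part of pmidcite. Returns None when the line
    has no '[' at all, mirroring the early return added in PATCH 5/8.
    """
    key0 = line.rfind('[')
    if key0 == -1:
        # Before the patch this case raised an AssertionError
        return None
    # A '[' without a closing ']' is still treated as fatal, as in the diff
    assert line[-1] == ']', f'**FATAL LID: {line}'
    key = line[key0 + 1:-1]
    val = line[:key0].strip()
    return key, val


if __name__ == '__main__':
    print(split_lid('10.1000/example [doi]'))  # ('doi', '10.1000/example')
    print(split_lid('no-bracket-here'))        # None
```

A caller would skip the dictionary update whenever `None` comes back, which is the effect the CHANGELOG entry describes.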