Submitting Author: (@T-Strojny)
All current maintainers: @T-Strojny
Package Name: BlockingPy
One-Line Description of Package: Blocking records for record linkage and deduplication with Approximate Nearest Neighbor algorithms.
Repository Link: https://github.com/ncn-foreigners/BlockingPy
Version submitted: v0.1.7
EiC: TBD
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD
Code of Conduct & Commitment to Maintain Package
I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Include a brief paragraph describing what your package does: BlockingPy is a package that speeds up record linkage and deduplication tasks by using Approximate Nearest Neighbor (ANN) algorithms to create blocks with candidate record pairs. When linking or deduplicating large datasets, comparing all possible record pairs becomes computationally infeasible. BlockingPy solves this by using ANN algorithms to quickly identify similar records while significantly reducing the number of required comparisons.
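To make the scale of the problem concrete, the following is a minimal sketch of the pair-count arithmetic behind the paragraph above. It does not use BlockingPy itself; the function names, the dataset size, and the block sizes are hypothetical values chosen for illustration only.

```python
from math import comb


def full_comparisons(n: int) -> int:
    """Number of unordered record pairs when every record is compared with every other."""
    return comb(n, 2)


def blocked_comparisons(block_sizes: list[int]) -> int:
    """Number of pairs when comparisons are restricted to records in the same block."""
    return sum(comb(size, 2) for size in block_sizes)


# Hypothetical dataset: 100,000 records split into 20,000 blocks of 5 records each.
n_records = 100_000
block_sizes = [5] * 20_000

print(full_comparisons(n_records))       # 4999950000
print(blocked_comparisons(block_sizes))  # 200000
```

Under these assumed numbers, blocking cuts the candidate pairs from roughly five billion to two hundred thousand. The actual reduction depends on how the blocks turn out for a given dataset.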
Scope
Please indicate which category or categories.
Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
Data processing/munging: BlockingPy transforms raw data into feature vectors and applies ANN algorithms and graphs to reduce the comparison space, which enables scalable record linkage and deduplication.
Who is the target audience and what are scientific applications of this package?
BlockingPy is aimed at data scientists, researchers, and analysts who work with large datasets that require record matching or deduplication and who need a scalable approach.
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
There are many record linkage packages, but ours specializes in the blocking task and takes a novel approach: the use of ANN algorithms.
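The pipeline described in the answers above (raw text to feature vectors, nearest-neighbor search, then a graph whose components become blocks) can be sketched roughly as follows. This is not BlockingPy's API. It is a standard-library illustration that assumes character bigram counts as features and uses exact brute-force cosine search as a stand-in for an ANN index. All function names here are hypothetical.

```python
from collections import Counter
from math import sqrt


def shingles(text: str, n: int = 2) -> Counter:
    """Represent a string as counts of its character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def block(records: list[str]) -> list[int]:
    """Link each record to its nearest neighbor and return one block id per record,
    where blocks are the connected components of the resulting graph."""
    if len(records) < 2:
        return [0] * len(records)

    vectors = [shingles(r) for r in records]
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, vec in enumerate(vectors):
        # Exact brute-force search (assumption for this sketch);
        # an ANN index would replace this step at scale.
        nearest = max(
            (j for j in range(len(records)) if j != i),
            key=lambda j: cosine(vec, vectors[j]),
        )
        parent[find(i)] = find(nearest)

    # Renumber component roots to consecutive block ids in order of first appearance.
    ids: dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(records))]


records = ["john smith", "jon smith", "maria garcia", "maria garcya"]
print(block(records))  # [0, 0, 1, 1]
```

Only pairs inside the same block then need a detailed comparison. Because the brute-force search here is itself quadratic, the sketch only shows the shape of the approach. The speed-up comes from replacing that search with an approximate index.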
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:
No inquiry was made
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
does not violate the Terms of Service of any service it interacts with.
The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
The package is deposited in a long-term repository with the DOI:
Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.
Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.
Confirm each of the following by checking the box.
Please fill out our survey
This helps us track submissions and improve our peer review process. We will also ask our reviewers and editors to fill this out.
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.