Readme update #71

Closed
wants to merge 311 commits into from
Commits
663f08d
corp names function update
npashilkar Jan 31, 2024
45008e6
Merge pull request #11 from dsi-clinic/line_1_from_full_address
averyschoen Jan 31, 2024
1ab1d42
updated corp names
npashilkar Jan 31, 2024
b5f764a
addressed comments, but address function back in
Jan 31, 2024
df1dbd1
fixing linter error
Jan 31, 2024
6aad87e
moved dict to constants file
npashilkar Jan 31, 2024
5b4de8c
updated constants file
npashilkar Jan 31, 2024
e4fe9fc
updated constants file
npashilkar Jan 31, 2024
844d20e
updated constants file
npashilkar Jan 31, 2024
976fc3f
updated function
adilkassim Jan 31, 2024
87ea3da
Adding Avery's feedback
Jan 31, 2024
23a8c1f
Adding Avery's feedback
Jan 31, 2024
4081715
saving personal work before merging, no need to look or review @Avery…
Jan 31, 2024
585f63e
Merge branch 'main' of https://github.com/dsi-clinic/2024-winter-clim…
Jan 31, 2024
50537f9
updating requirements.txt to include names-dataset package
adilkassim Jan 31, 2024
3fcbc5b
precommit checks
npashilkar Jan 31, 2024
fe540b6
Merge pull request #18 from dsi-clinic/standardizing-corporate-names
averyschoen Jan 31, 2024
f07dae2
get address number from line 1 function
npashilkar Jan 31, 2024
b21fd52
initial name_rank function
adilkassim Jan 31, 2024
8849f46
get address number from line 1 function
npashilkar Jan 31, 2024
d0086ef
get address number from line 1 function
npashilkar Jan 31, 2024
5f65159
attempt so far at dedup
Feb 1, 2024
28c0034
edited function
adilkassim Feb 1, 2024
71a3174
attempt so far at dedup
Feb 1, 2024
56cde5f
attempt so far at dedup
Feb 1, 2024
72eeffb
progress on dedup function
Feb 1, 2024
161a175
updates on linkage doc, ignore notebooks/Test.ipynb
Feb 1, 2024
8a75d81
Merge pull request #21 from dsi-clinic/get-building-number-from-address
averyschoen Feb 1, 2024
b519fa1
modifications to dedup function, not yet done, no need to review yet
Feb 1, 2024
de9bb3a
merging results for local to remote branches
Feb 1, 2024
4ac551f
passing pre-commits and doctests
adilkassim Feb 2, 2024
38ee4bc
Merge branch 'main' into cleaning_company_column
averyschoen Feb 2, 2024
37dcbf7
Update linkage.py
averyschoen Feb 2, 2024
7f9135f
finished dedup function with helper function to output to a csv_file …
Feb 4, 2024
fb10654
updated function
adilkassim Feb 5, 2024
29ee6bb
made modifications to the deduplication function
Feb 6, 2024
cfa15d0
received a git push error stating that the tip of my branch is behind…
Feb 6, 2024
3d26fde
trying to see what the git branch issues are...no need to review this…
Feb 7, 2024
9646030
Merge pull request #16 from dsi-clinic/cleaning_company_column
averyschoen Feb 7, 2024
5843485
implementing PR feedback
Feb 8, 2024
97b89dd
addressing linter tests failure due to formatting
Feb 8, 2024
61c731f
Merge branch 'main' of https://github.com/dsi-clinic/2024-winter-clim…
Feb 8, 2024
7dc5b70
updating requirements.txt
adilkassim Feb 8, 2024
a3310a1
adding pre_process pipeline funcion
adilkassim Feb 8, 2024
270d532
fixed error in row_matches
nrposner Feb 13, 2024
99dc781
Merge branch 'main' into row_similarity
nrposner Feb 14, 2024
7b3e8f0
fixing linter errors
nrposner Feb 14, 2024
6655192
updates to dedup file and beginning steps on netorkx
Feb 14, 2024
b24041d
Delete notebooks/Test.ipynb
averyschoen Feb 14, 2024
cbe4d1e
Merge pull request #20 from dsi-clinic/clean_rm_unnecessary_row_info
averyschoen Feb 14, 2024
869a2ea
(not complete) splink
npashilkar Feb 14, 2024
21e8575
trying to fix disconnect
Feb 14, 2024
495db81
updated classify
Feb 15, 2024
dbaad50
updated name_rank function
adilkassim Feb 15, 2024
fb48450
splink to .py
npashilkar Feb 15, 2024
a943442
Merge branch 'main' into splink-library-notebook
npashilkar Feb 15, 2024
fbc579c
Update linkage.py
averyschoen Feb 15, 2024
6a41aa0
discovered logic error in dedup function...no need to review yet
Feb 15, 2024
2e28eec
file that tests my deduplicate function
Feb 15, 2024
5621652
file that tests my deduplicate function
Feb 15, 2024
cfb6d26
testing if path to complete_orgs_table.csv is working
Feb 18, 2024
e8f2200
Merge branch 'main' into row_similarity
nrposner Feb 19, 2024
d9356a2
added test for row_matches
Feb 19, 2024
256caf7
changes to linkage
Feb 19, 2024
899263f
merging linkage
Feb 19, 2024
1d11b52
fixing linter issues
Feb 19, 2024
2c03548
fixing test
Feb 19, 2024
1b90fc4
fixing linter errors
Feb 19, 2024
3dd4b4d
fixing typo
Feb 19, 2024
4df8236
fixing typo again
Feb 19, 2024
7aab13e
added match_confidence function
Feb 19, 2024
8796fa6
fixing linter
Feb 19, 2024
f932563
removed duplicate test
Feb 19, 2024
ce54095
updating classifier
Feb 19, 2024
ff02e3d
fixing linter
Feb 19, 2024
caa3f99
Merge pull request #15 from dsi-clinic/row_similarity
averyschoen Feb 19, 2024
e8d246d
Merge branch 'main' into name_uniqueness
adilkassim Feb 19, 2024
4e35327
slight formatting changes
adilkassim Feb 19, 2024
0ed5538
Merge pull request #23 from dsi-clinic/name_uniqueness
averyschoen Feb 19, 2024
c3c8def
preprocess file and function initial commit
adilkassim Feb 19, 2024
97e78ae
update readme
Feb 19, 2024
71cbb37
adding tests to appropriate winter repo
Feb 19, 2024
204c330
Merge branch 'main' of https://github.com/dsi-clinic/2024-winter-clim…
Feb 19, 2024
cccc7cc
slight edits
adilkassim Feb 19, 2024
25eaf60
fixing linter errors
Feb 19, 2024
57c6070
removing preprocess function from linkage.py
adilkassim Feb 19, 2024
2776636
slight changes
adilkassim Feb 19, 2024
d3df75b
changing branches, no need to review
Feb 19, 2024
531453c
changing branches, no need to review
Feb 19, 2024
75082e4
finishing up with dedup func
Feb 19, 2024
1ea09b4
Renaming File
adilkassim Feb 19, 2024
8be737b
Merge pull request #27 from dsi-clinic/afs/update_readme
trevorspreadbury Feb 19, 2024
6d5cee8
update on function to add nodes and their attributes to graph
Feb 19, 2024
e92192b
checking for issue with linter test
Feb 19, 2024
6e04344
Delete tests/tester.ipynb
averyschoen Feb 20, 2024
2ce3d66
Merge pull request #26 from dsi-clinic/dedup_func_testing
averyschoen Feb 21, 2024
976d2af
Saving notebook on networkx
Feb 21, 2024
462cbbc
Merge branch 'main' into preprocess
adilkassim Feb 21, 2024
9534af9
combine test files
Feb 21, 2024
f91ee0e
precommit
Feb 21, 2024
06e91e0
modified get_likely_name function to accomodate non-str inputs
Feb 21, 2024
2a30961
Delete tests directory
averyschoen Feb 21, 2024
9399f10
Merge pull request #30 from dsi-clinic/afs/fix_testing_files
averyschoen Feb 21, 2024
2bea6e7
finishing merge process, no need to review
Feb 21, 2024
e007f3c
splink notebook
npashilkar Feb 21, 2024
89af1cb
splink function clean-up
npashilkar Feb 21, 2024
811165a
Merge branch 'main' into splink-library-notebook
npashilkar Feb 21, 2024
a5eb7a1
splink function clean-up
npashilkar Feb 22, 2024
8a43ca5
Merge branch 'splink-library-notebook' of https://github.com/dsi-clin…
npashilkar Feb 22, 2024
ae1db64
splink function clean-up2
npashilkar Feb 22, 2024
21af2c9
updates
adilkassim Feb 22, 2024
4d7bdfb
adding output csv
adilkassim Feb 22, 2024
fa8c0da
Saving work on networkx branch
Feb 22, 2024
1e4a550
Saving work on networkx branch
Feb 22, 2024
0a043a9
updating docstring of dedup func based on feedback
Feb 22, 2024
cd94c08
pipeline progress so far on network linkage
Feb 24, 2024
22607e7
saving changes in networkx, no need for review
Feb 24, 2024
661feff
updated column names and docstring of dedup func based on Avery's fee…
Feb 24, 2024
4425afe
Merge pull request #31 from dsi-clinic/get_likely_name_func
averyschoen Feb 25, 2024
fa011a7
Merge pull request #32 from dsi-clinic/dedup_func_testing
averyschoen Feb 25, 2024
d58795b
splink output edit
npashilkar Feb 26, 2024
d0f36b6
saving Networkx work before merge...no need to review
Feb 26, 2024
2595098
concluding merge
Feb 26, 2024
f8df69f
saving work for merge, no need to review
Feb 26, 2024
7b2ca08
splink output edits
npashilkar Feb 27, 2024
42ca58e
pipeline changes
adilkassim Feb 28, 2024
77bc2b3
adding removed files
adilkassim Feb 28, 2024
485fe43
basics of makefile and added classify fns
Feb 28, 2024
e4b3a0a
linter fixes
Feb 28, 2024
3ce0c50
modifying classify to fit makefile
Feb 28, 2024
517f909
linter fixes
Feb 28, 2024
244fe94
make should run classification properly
Feb 28, 2024
c273a17
moved names to constants
Feb 28, 2024
9519d67
linter fixes
Feb 28, 2024
0611585
added classification wrapper
Feb 28, 2024
f8c4dc1
linter fix
Feb 28, 2024
3c61937
proper updates
adilkassim Feb 28, 2024
4e32543
removing duplicated function
adilkassim Feb 28, 2024
d94243a
attempting to pass dev checks
adilkassim Feb 28, 2024
4336d3b
modified readme
Feb 28, 2024
df41e42
reformatting files
adilkassim Feb 28, 2024
c687295
add usage instructions
Feb 28, 2024
ddfd126
Merge pull request #34 from dsi-clinic/update_readme
averyschoen Feb 28, 2024
f363bbe
splink changes + deleted notebook
npashilkar Feb 28, 2024
1901341
moved to original makefile
Feb 29, 2024
a7db7d8
Delete utils/Makefile
averyschoen Feb 29, 2024
56a78d8
Merge branch 'main' of https://github.com/dsi-clinic/2024-winter-clim…
Feb 29, 2024
9561f82
Merge pull request #33 from dsi-clinic/make_file
averyschoen Feb 29, 2024
b626fc8
Update linkage.py
averyschoen Feb 29, 2024
807a69d
Merge branch 'main' into splink-library-notebook
npashilkar Feb 29, 2024
1fc2a2a
splink output edits
npashilkar Feb 29, 2024
ecd73d0
Merge pull request #25 from dsi-clinic/splink-library-notebook
averyschoen Feb 29, 2024
bfefce6
Merge branch 'main' into preprocess
adilkassim Feb 29, 2024
26d4773
classify function
adilkassim Feb 29, 2024
609220d
saving work for graph work. No need to review yet
Mar 3, 2024
3266ce7
slight changes
adilkassim Mar 4, 2024
0cdc61a
pulling from main and pipeline additions
adilkassim Mar 4, 2024
d262dee
possible splink implementation fix
adilkassim Mar 4, 2024
6e12bac
Merge branch 'main' of https://github.com/dsi-clinic/2024-winter-clim…
Mar 4, 2024
5a81b23
graph work so far with plotly
Mar 4, 2024
b377acd
Test notebook with functions for merging datasets, no need to review,…
Mar 4, 2024
b8da98e
updating splink function
adilkassim Mar 4, 2024
0185093
pipeline updates
adilkassim Mar 4, 2024
f05778b
passing linter
adilkassim Mar 4, 2024
6de450d
linter
adilkassim Mar 4, 2024
96b8e0b
updated network graph work
Mar 4, 2024
51cc9de
updated classify test
Mar 4, 2024
4cc7ce4
fix pytest
Mar 4, 2024
7f3483c
updated classify and test_classifier
Mar 4, 2024
94f807c
Revert "fix pytest"
Mar 4, 2024
d62f3b7
Revert "updated classify test"
Mar 4, 2024
621e35a
expanded docstrings for classify
Mar 4, 2024
cdf035a
updated visualizations for the graph
Mar 4, 2024
74996a5
updates to the README files under the output and data directories
Mar 4, 2024
0cebc4c
latest version of networkx work
Mar 5, 2024
4625248
linkage.py clean up including additions to constants.py
npashilkar Mar 5, 2024
133dadc
addressing comments on classify and test_classify
Mar 5, 2024
feda102
Delete utils/tests/test_classifier.py
averyschoen Mar 5, 2024
863cfab
Merge pull request #36 from dsi-clinic/update_classify
averyschoen Mar 5, 2024
9c5ff3c
making revisions to data/README and network.py per Avery's feedback
Mar 5, 2024
cb9613c
Merge branch 'main' of https://github.com/dsi-clinic/2024-winter-clim…
Mar 5, 2024
18a52ff
making revisions to data/README and network.py per Avery's feedback
Mar 5, 2024
743b306
updating readme and makefile as well as location of data for linkage_…
Mar 5, 2024
793b8af
removing unneccessary tests
npashilkar Mar 5, 2024
a571d91
slight update to splink_dedupe function
adilkassim Mar 5, 2024
1db2839
pre-commit fixes
adilkassim Mar 5, 2024
083f92f
last minute modifications to network file. final version
Mar 5, 2024
09aca55
removing main() from file
Mar 5, 2024
269998c
removing main() from file
Mar 5, 2024
d6167df
updated README.md to show networkX portion of the pipeline
Mar 5, 2024
8d62939
saving Test.ipynb work
Mar 5, 2024
c9752a0
Delete notebooks/Test.ipynb
averyschoen Mar 5, 2024
0f7d07e
Update Makefile
averyschoen Mar 5, 2024
3c2005f
Merge pull request #29 from dsi-clinic/networkx_record_linkage
averyschoen Mar 5, 2024
039768b
matching local branch with main
Mar 5, 2024
afda3a6
updated network metrics
npashilkar Mar 5, 2024
f16083f
experimental changes do not push
Mar 5, 2024
1d83e54
tying to stash
Mar 5, 2024
4bd4ca3
likewise
Mar 5, 2024
1e75f1e
merged most recent from main in here
Mar 5, 2024
914c5e3
changed README
Mar 5, 2024
9bf8eaf
modified network
Mar 5, 2024
a9662fd
adding small sample files to test on
Mar 5, 2024
2e6050e
modifications to linkage pipeline for test version, does not include …
Mar 5, 2024
0af314c
Merge pull request #37 from dsi-clinic/linkage-code-clean-up
averyschoen Mar 5, 2024
e0147df
Merge branch 'main' of https://github.com/dsi-clinic/2024-winter-clim…
Mar 6, 2024
d06da14
Merge branch 'main' into preprocess
adilkassim Mar 6, 2024
0a3b4e7
slight modifications to linkage.py for cleaning purposes
Mar 6, 2024
9f980ff
slight modifications to linkage.py for cleaning purposes, now passing…
Mar 6, 2024
d9fb38a
docstring edits
npashilkar Mar 6, 2024
7ebe2a2
slight changes
adilkassim Mar 6, 2024
51f82d4
revert changes to standardize_corp_names...the logic goes through man…
Mar 6, 2024
9a03521
renaming file
adilkassim Mar 6, 2024
85d07b4
adding nameparser to the list of required packages to be downloaded
Mar 6, 2024
d4161f6
updating functions to latest versions
adilkassim Mar 6, 2024
45347e2
slight changes to match function changes in linkage.py
adilkassim Mar 6, 2024
7317230
Merge branch 'main' into final_tech
nrposner Mar 6, 2024
56c46c9
fixing precommit errors
Mar 6, 2024
8d5ab45
fixing errors in tets files
Mar 6, 2024
ec026b6
linter errors
Mar 6, 2024
5871dcb
import errors
Mar 6, 2024
c832f3c
fixing pytest errors
Mar 6, 2024
22200e3
yet more pytest errors
Mar 6, 2024
d20d4f0
linter errors
Mar 6, 2024
d58970a
this version actually works with make
Mar 6, 2024
1010032
linter error
Mar 6, 2024
f25be09
fix makefile to run through docker
trevorspreadbury Mar 6, 2024
e084aba
fix linter issues -- possible blake and isort conflict
trevorspreadbury Mar 6, 2024
24ef142
Merge pull request #39 from dsi-clinic/networkx_record_linkage
averyschoen Mar 6, 2024
9435715
updated docstring
npashilkar Mar 6, 2024
ad2ed0f
slight changes
adilkassim Mar 6, 2024
4b0de47
readme changes
adilkassim Mar 6, 2024
0c79023
data/ readme changes
adilkassim Mar 6, 2024
48470c2
pre-commit formatting changes
adilkassim Mar 6, 2024
3fbf913
Merge pull request #28 from dsi-clinic/preprocess
averyschoen Mar 6, 2024
4838026
Merge branch 'main' into metric-functions-update
naynapashilkar Mar 6, 2024
53db9f6
Merge branch 'main' into fix_makefile
averyschoen Mar 6, 2024
0417af5
Delete utils/linkage_pipeline.py
averyschoen Mar 6, 2024
f4536e2
Update linkage.py
averyschoen Mar 6, 2024
bee198a
Update Makefile
adilkassim Mar 6, 2024
8126372
Update linkage.py
averyschoen Mar 6, 2024
f7c185e
Update network.py
averyschoen Mar 6, 2024
99b4f96
Merge pull request #41 from dsi-clinic/fix_makefile
averyschoen Mar 6, 2024
3a63118
Merge pull request #43 from dsi-clinic/makefile_path
averyschoen Mar 6, 2024
be77596
Update network.py
averyschoen Mar 6, 2024
b3eeefe
Merge pull request #38 from dsi-clinic/metric-functions-update
averyschoen Mar 6, 2024
985c7f8
Update README.md
adilkassim Mar 6, 2024
faa391e
Update README.md
adilkassim Mar 6, 2024
fb7d4ef
Merge pull request #44 from dsi-clinic/readmechanges
trevorspreadbury Mar 6, 2024
7b670e9
updated readme to contain defualt file names and have main access tho…
Mar 6, 2024
edited function
adilkassim committed Feb 1, 2024
commit 28c003433545676a9f09827e29814d40543ff4c4
31 changes: 18 additions & 13 deletions utils/linkage.py
@@ -151,20 +151,25 @@ def name_rank(first_name: str, last_name: str) -> list:
     second element is the element corresponds to the rank of the last name
     """
 
+    if first_name is None or last_name is None:
+        return [None, None]
+
+    if not isinstance(first_name, str) or not isinstance(last_name, str):
+        return [None, None]
+
     first_name_result = nd.search(first_name)
     last_name_result = nd.search(last_name)
-    first_name_rank = 0
-    last_name_rank = 0
-    try:
-        first_name_rank = first_name_result["first_name"]["rank"][
-            "United States"
-        ]
-    except KeyError:
-        pass
-
-    try:
-        last_name_rank = last_name_result["last_name"]["rank"]["United States"]
-    except KeyError:
-        pass
+    first_name_rank = None
+    last_name_rank = None
+
+    if first_name_result and isinstance(first_name_result, dict):
+        first_name_data = first_name_result.get("first_name")
+        if first_name_data and "rank" in first_name_data:
+            first_name_rank = first_name_data["rank"].get("United States", None)
+
+    if last_name_result and isinstance(last_name_result, dict):
+        last_name_data = last_name_result.get("last_name")
+        if last_name_data and "rank" in last_name_data:
+            last_name_rank = last_name_data["rank"].get("United States", None)
 
     return [first_name_rank, last_name_rank]
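
Review note: the change above swaps the `try`/`except KeyError` lookups for guarded `.get()` calls, and a missing rank now comes back as `None` rather than `0`. The sketch below is one way to exercise that branch logic without loading the names dataset. It is not the code in `utils/linkage.py`: `name_rank_sketch`, `us_rank`, `fake_search`, and `FAKE_DATA` are hypothetical names introduced here, the ranks are invented, and the lookup is passed in as a parameter in place of `nd.search`, on the assumption that it returns a nested dict shaped like the one the diff reads from.

```python
from typing import Callable, Optional


def name_rank_sketch(
    first_name: object,
    last_name: object,
    search: Callable[[str], Optional[dict]],
) -> list:
    """Mirror the post-change branches of name_rank with an injected lookup."""
    # Non-string input (None included) short-circuits, as in the diff.
    if not isinstance(first_name, str) or not isinstance(last_name, str):
        return [None, None]

    def us_rank(result: Optional[dict], key: str) -> Optional[int]:
        # Any missing level yields None instead of raising.
        if result and isinstance(result, dict):
            data = result.get(key)
            if data and "rank" in data:
                return data["rank"].get("United States", None)
        return None

    return [
        us_rank(search(first_name), "first_name"),
        us_rank(search(last_name), "last_name"),
    ]


# Hypothetical stand-in for nd.search; the ranks are made up for illustration.
FAKE_DATA = {
    "Ada": {"first_name": {"rank": {"United States": 10}}, "last_name": None},
    "Smith": {"first_name": None, "last_name": {"rank": {"United States": 1}}},
    "Zed": {"first_name": {"rank": {"Canada": 5}}, "last_name": None},
}


def fake_search(name: str) -> Optional[dict]:
    """Return the fake record for a name, or None when it is unknown."""
    return FAKE_DATA.get(name)


if __name__ == "__main__":
    print(name_rank_sketch("Ada", "Smith", fake_search))
    print(name_rank_sketch(None, "Smith", fake_search))
    print(name_rank_sketch("Zed", "Nobody", fake_search))
```

Because a miss is now `None` rather than `0`, any caller that compares or sorts these ranks numerically would need to handle `None` explicitly.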