Improve classify.py #89

Closed
wants to merge 101 commits into from
Commits
1241662
reviewed network.py and added preliminary comments for later refactoring
klee2020 Mar 29, 2024
6289156
reviewed and refactored network.py
klee2020 Mar 30, 2024
4199814
transform questions to 'TODO's, remove meta-comments
trevorspreadbury Apr 2, 2024
0a90bfc
Merge pull request #51 from dsi-clinic/week1_network.py_review
trevorspreadbury Apr 2, 2024
1aa638c
Merge branch 'uchicago-dsi:main' into main
trevorspreadbury Apr 2, 2024
43dd135
deleted old state EDA notebooks
trevorspreadbury Apr 2, 2024
0266b2f
make ruff review notebooks
trevorspreadbury Apr 2, 2024
6dbaa6c
Merge pull request #54 from dsi-clinic/ruff-notebook-support
trevorspreadbury Apr 2, 2024
fbd9d80
added lists of companies in oil and coal from fossil free funds
klee2020 Apr 4, 2024
cae4657
get list of F orgs based from CSVs, conduct EDA on orgs
klee2020 Apr 5, 2024
e8e20d6
moved get_ff_companies.py to src/utils
klee2020 Apr 5, 2024
10e5e76
moved get_ff_companies.py to src/utils AGAIN
klee2020 Apr 5, 2024
5f2af25
Delete data/raw_classification/ff_companies/testing_dfs.py
klee2024 Apr 5, 2024
1810e4d
fixing linter issues
klee2020 Apr 5, 2024
44dcc58
refactoring and modifying network.py to display network viz
bhavyapan Apr 8, 2024
bbac608
edited ReadMe for datafiles, started conducting analysis on the UChic…
klee2020 Apr 8, 2024
28eb3ed
harvard data collection
ygxu01 Apr 8, 2024
f7f1199
dataset eda
ygxu01 Apr 8, 2024
1ffbd71
conducting EDA on company database from 2023 UChicago file
klee2020 Apr 9, 2024
a02a2de
Merge branch 'improve-classify.py' of github.com:dsi-clinic/2024-wint…
klee2020 Apr 9, 2024
80eb827
update action versions
trevorspreadbury Apr 9, 2024
7f97b9d
update precommit version
trevorspreadbury Apr 9, 2024
4d13916
Merge pull request #57 from dsi-clinic/update-github-actions
trevorspreadbury Apr 9, 2024
f45cb89
Merge origin/main with github actions fixes into improve-classify.py
trevorspreadbury Apr 9, 2024
bde1c06
reorganized get_ff_companies.py into pipeline to combine known compan…
klee2020 Apr 10, 2024
9db429d
reorganized get_ff_companies.py into pipeline to compile data from FF…
klee2020 Apr 10, 2024
88d8168
pass linter + add TODOs for org classification pipeline and industry …
trevorspreadbury Apr 11, 2024
4d4a864
Merge pull request #55 from dsi-clinic/improve-classify.py
trevorspreadbury Apr 11, 2024
fe255eb
Merge remote-tracking branch 'origin' into collect_data_harvard
trevorspreadbury Apr 11, 2024
7db8c2c
add TODOs for moving harvard pipeline out of state transformer
trevorspreadbury Apr 11, 2024
e25ae13
add network graph todos
trevorspreadbury Apr 11, 2024
0366b79
Merge remote-tracking branch 'origin' into refactor-network.py
trevorspreadbury Apr 11, 2024
5e8310c
Merge pull request #59 from dsi-clinic/refactor-network.py
trevorspreadbury Apr 11, 2024
253bb06
implemented data aggregation of FFF and relevant InfoGroup data. TODO…
klee2020 Apr 12, 2024
38e7c3b
merged from root repo
klee2020 Apr 12, 2024
e106927
harvard data eda file
ygxu01 Apr 12, 2024
5bba185
Merge branch 'collect_data_harvard' of github.com:dsi-clinic/2024-win…
ygxu01 Apr 12, 2024
d6f2b45
add new transformer, edit harvard file, and edit other transfomer fil…
ygxu01 Apr 14, 2024
b1d1e5f
move eda file to Notebook folder
ygxu01 Apr 14, 2024
0a64adf
add ind_transform file to include functions
ygxu01 Apr 14, 2024
436b020
removed harvard_eda from data folder
ygxu01 Apr 15, 2024
c117c1d
harvard pipeline
ygxu01 Apr 15, 2024
747e73e
aggregated and standardized companies from FFF and InfoGroup datasets…
klee2020 Apr 16, 2024
5ba3782
pipeline update
ygxu01 Apr 16, 2024
e7773e8
update readme
ygxu01 Apr 16, 2024
58f5642
readme.md update
ygxu01 Apr 16, 2024
509b56a
rewriting network viz, adding more details
bhavyapan Apr 22, 2024
0395022
added parent company matching, refactored functions, added testing pa…
klee2020 Apr 22, 2024
96f70d3
removed comments, ran linter
bhavyapan Apr 23, 2024
1c0227a
update on folders
ygxu01 Apr 23, 2024
f9cd407
added data origins and how to download in data/README.md and added pa…
klee2020 Apr 23, 2024
9588111
modification on pipeline file
ygxu01 Apr 29, 2024
47f064d
add election folder
ygxu01 Apr 29, 2024
bd40cf2
remove relocated files
ygxu01 Apr 29, 2024
944ef0c
added a function for simple macro-level viz
bhavyapan Apr 29, 2024
166fb82
refactored parent company code, added UUID, got stock symbols
klee2020 Apr 30, 2024
a86a4eb
removed spacy dependency package for now since it's not being used in…
klee2020 Apr 30, 2024
ed9a9ed
update on dedupe code
ygxu01 Apr 30, 2024
ce20311
rewrite dict calls as literals
trevorspreadbury Apr 30, 2024
6bdfb6a
adding todos
trevorspreadbury Apr 30, 2024
21f00c1
updated classification csv and edited code to retrieve all relevant i…
klee2020 Apr 30, 2024
f0bc931
Merge pull request #61 from dsi-clinic/refactor-network.py
trevorspreadbury May 2, 2024
b2ee382
update requirements to install spacy model
trevorspreadbury May 2, 2024
3e3f0d2
add todos for org classification
trevorspreadbury May 2, 2024
573ad62
Merge remote-tracking branch 'refs/remotes/origin/improve-classify.py…
trevorspreadbury May 2, 2024
808552a
Merge branch 'main' into improve-classify.py
trevorspreadbury May 2, 2024
cd7f650
Merge pull request #60 from dsi-clinic/improve-classify.py
trevorspreadbury May 2, 2024
4f978d2
fix for pre-commit linters
trevorspreadbury May 2, 2024
3e716e0
ignore formatting commit in git blame
trevorspreadbury May 2, 2024
8216ce0
corrected updated import path in notebook
trevorspreadbury May 2, 2024
3508ecd
add election readme todo
trevorspreadbury May 2, 2024
2dc3423
Merge branch 'main' into collect_data_harvard
trevorspreadbury May 2, 2024
f3f3d5f
Merge pull request #56 from dsi-clinic/collect_data_harvard
trevorspreadbury May 2, 2024
6044136
dedupe functio update
ygxu01 May 3, 2024
8141e3d
added link in readme file
ygxu01 May 3, 2024
0350b7d
refactored company classification pipeline into different files and e…
klee2020 May 7, 2024
961f3f8
Merge branch 'improve-classify.py' of github.com:dsi-clinic/2024-wint…
klee2020 May 7, 2024
3ac77b1
fixed linkage testing notebook and added docstring to classify_InfoGr…
klee2020 May 7, 2024
a07cf60
dedupe update
ygxu01 May 7, 2024
c773f0d
Add license
timhannifan May 13, 2024
b03287d
cleaned up classification pipeline code and attempted linkage
klee2020 May 14, 2024
d672937
finalised visualization code
bhavyapan May 16, 2024
a6f968c
tx statetransformer update
ygxu01 May 17, 2024
4f810ba
implemented within company linkage and applied to campaign finance data
klee2020 May 22, 2024
51acd62
fixed code that was there for testing
klee2020 May 22, 2024
c9acc84
update on transform pipelines
ygxu01 May 22, 2024
648f995
code clean up and format, update documentation
bhavyapan May 22, 2024
4294e3c
Merge branch 'refactor-network.py' of github.com:dsi-clinic/2024-wint…
trevorspreadbury May 22, 2024
7ea3244
Merge pull request #64 from dsi-clinic/refactor-network.py
trevorspreadbury May 22, 2024
bad8d30
final commit
klee2020 May 22, 2024
79e36e8
script folder update
ygxu01 May 22, 2024
47cc1f0
update utils folder
ygxu01 May 22, 2024
cf41007
Merge branch 'collect_data_harvard' of github.com:dsi-clinic/2024-win…
trevorspreadbury May 22, 2024
1786e04
remove unused variable
trevorspreadbury May 22, 2024
c7b2076
Merge branch 'main' into collect_data_harvard
trevorspreadbury May 22, 2024
4e4336e
fix formatting
trevorspreadbury May 22, 2024
e325e25
ignore formatting changes in git blame
trevorspreadbury May 22, 2024
af1f546
Merge pull request #63 from dsi-clinic/collect_data_harvard
trevorspreadbury May 22, 2024
f904f4a
Merge branch 'main' into improve-classify.py
trevorspreadbury May 22, 2024
2097732
restore tests and rename constants
trevorspreadbury May 22, 2024
02a88e9
Merge branch 'improve-classify.py' of github.com:dsi-clinic/2024-wint…
trevorspreadbury May 22, 2024
fixed code that was there for testing
klee2020 committed May 22, 2024
commit 51acd62e3b02de1102880919d9567f63ce47ebfc
6 changes: 3 additions & 3 deletions src/utils/constants.py
@@ -78,9 +78,9 @@
 cl.jaro_winkler_at_thresholds(
 "company_name", [0.9, 0.6]
 ), # threshold will catch typos and shortenings,
-cl.exact_match(
-"zipcode"
-), # want to get rid of this actually bc too many false positives
+# cl.exact_match(
+# "zipcode"
+# ), # want to get rid of this actually bc too many false positives
 cl.jaro_winkler_at_thresholds("address", [0.9, 0.6]),
 ],
 }
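
For readers unfamiliar with the `[0.9, 0.6]` thresholds kept in this diff, here is a minimal pure-Python sketch of how a Jaro-Winkler score can be bucketed into comparison levels. It is not the `cl` library's implementation. The function names (`jaro`, `jaro_winkler`, `comparison_level`) and the level numbering (0 = strongest match) are assumptions made for illustration only.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def comparison_level(s1: str, s2: str, thresholds=(0.9, 0.6)) -> int:
    """Return the index of the first threshold the score reaches.

    Hypothetical numbering: 0 = at least 0.9, 1 = at least 0.6,
    len(thresholds) = below every threshold.
    """
    score = jaro_winkler(s1, s2)
    for level, threshold in enumerate(thresholds):
        if score >= threshold:
            return level
    return len(thresholds)
```

Under this sketch, `comparison_level("exxon mobil corp", "exxon mobile corp")` lands in the top bucket, while an unrelated pair such as `"abc"` and `"xyz"` falls below both thresholds. An exact zipcode match has no such gradation: two unrelated companies in the same zipcode agree fully on that field, which is consistent with the false positives noted in the commented-out comparison.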