h #12

Open
wants to merge 145 commits into base: main

Changes from 1 commit
145 commits
aa7eb11
Update .gitignore to exclude OSX specific files
dhesenkamp Oct 4, 2021
f6e4072
Merge remote-tracking branch 'upstream/main' into main
dhesenkamp Oct 5, 2021
00d49da
Merge branch 'lbechberger:main' into main
imartirosov Oct 5, 2021
beee080
Added uniform classifier
dhesenkamp Oct 5, 2021
d806664
Added F1 score evaluation metric
dhesenkamp Oct 5, 2021
8c1addb
Merge pull request #1 from dhesenkamp/classifier
dhesenkamp Oct 6, 2021
5d35082
Added tweet tokenization
dhesenkamp Oct 6, 2021
4b43523
Update preprocessing.sh
dhesenkamp Oct 6, 2021
5f186cc
Create Documentation.md
dhesenkamp Oct 6, 2021
537acf7
Merge pull request #2 from dhesenkamp/tokenizer
dhesenkamp Oct 6, 2021
ec5eeeb
Modified punctuation_remover.py
dhesenkamp Oct 7, 2021
841745f
Merge pull request #4 from dhesenkamp/tokenizer
dhesenkamp Oct 7, 2021
473af68
Revert "Merge pull request #1 from dhesenkamp/classifier"
dhesenkamp Oct 7, 2021
f7e9e15
Merge branch 'main' of https://github.com/dhesenkamp/MLinPractice
dhesenkamp Oct 7, 2021
7dfd46a
Resolve merge conflict
dhesenkamp Oct 7, 2021
2f6b559
Resolve merge conflict
dhesenkamp Oct 7, 2021
b92fea9
Merge branch 'lbechberger:main' into main
dhesenkamp Oct 7, 2021
a377bcc
Update Documentation.md
dhesenkamp Oct 7, 2021
189843b
Merge branch 'main' of https://github.com/dhesenkamp/MLinPractice
dhesenkamp Oct 7, 2021
69fbf07
Testing of tokenize_input
dhesenkamp Oct 7, 2021
5d4f975
Added stopword remover
dhesenkamp Oct 7, 2021
d577dc3
Refined stopword remover
dhesenkamp Oct 7, 2021
50c422f
Update stopword_remover.py
dhesenkamp Oct 7, 2021
126812d
Further refining of stopword remover
dhesenkamp Oct 8, 2021
e8d5b86
StopwordRemover(), minor changes
dhesenkamp Oct 8, 2021
45e049b
Short info on Cohen's kappa
dhesenkamp Oct 8, 2021
f8f9ef3
Merge pull request #6 from dhesenkamp/stop_word_removal
dhesenkamp Oct 8, 2021
f83dc31
Added Lemmatizer() class
dhesenkamp Oct 8, 2021
ca27622
Added command line arguments etc
dhesenkamp Oct 8, 2021
2e25f66
Merge branch 'lbechberger:main' into main
dhesenkamp Oct 8, 2021
f46090b
Merge pull request #7 from dhesenkamp/lemmatizer
dhesenkamp Oct 11, 2021
4a4f00f
Merge branch 'lbechberger:main' into main
dhesenkamp Oct 11, 2021
0aa2c88
Update README.md
dhesenkamp Oct 11, 2021
71c4ef9
Merge branch 'main' of https://github.com/dhesenkamp/MLinPractice
dhesenkamp Oct 11, 2021
263e39d
Added feature extraction for month
dhesenkamp Oct 12, 2021
84dafa0
Command line args for month extractor
dhesenkamp Oct 12, 2021
1e849de
Trying to resolve merge conflict manually
dhesenkamp Oct 12, 2021
179d236
Merge pull request #8 from dhesenkamp/feature_month
dhesenkamp Oct 12, 2021
04c515a
Merge conflict, readme, documentation
dhesenkamp Oct 12, 2021
c794c7a
Added SentimentAnalyzer class
dhesenkamp Oct 13, 2021
f896c03
SentimentAnalyzer() command line args + script
dhesenkamp Oct 13, 2021
41e6385
readme and documentation for SentimentAnalyzer
dhesenkamp Oct 13, 2021
4fc38ec
Merge feature_sentiment into main
dhesenkamp Oct 13, 2021
aa41d1b
Update readme.md wrt SentimentAnalyser
dhesenkamp Oct 13, 2021
18fd5df
mlflow added to README.md
dhesenkamp Oct 13, 2021
7c59805
Added decision tree classifier
dhesenkamp Oct 13, 2021
5c67318
Param optimization
dhesenkamp Oct 14, 2021
98e646f
Update .gitignore
dhesenkamp Oct 14, 2021
4704f20
Removed .DS_Store
dhesenkamp Oct 14, 2021
f99d6bf
Classifier testing
dhesenkamp Oct 15, 2021
259d0dd
Update .gitignore
dhesenkamp Oct 15, 2021
edd8a82
Added SVM classifier
dhesenkamp Oct 19, 2021
02e9982
Update .gitignore to exlcude mlruns subfolder
dhesenkamp Oct 19, 2021
a8e4099
SVM classifier testing
dhesenkamp Oct 20, 2021
f670e81
Merge pull request for classifier_svm
dhesenkamp Oct 20, 2021
88ffa97
Implemented Photos feature extractor
dhesenkamp Oct 21, 2021
0c6a790
Command line arguments for Photos feature extractor
dhesenkamp Oct 21, 2021
17181d9
Troubleshooting & testing
dhesenkamp Oct 21, 2021
9e61fc4
Testing complete for Photos() feature extractor
dhesenkamp Oct 21, 2021
b1c2e6d
Merge pull request feature_photos
dhesenkamp Oct 21, 2021
065092e
Created & implemented Mentions() feature extractor
dhesenkamp Oct 21, 2021
bfc71e4
Command line args for Mentions() feature extractor
dhesenkamp Oct 21, 2021
bb41b84
Mentions() feature extractor testing
dhesenkamp Oct 21, 2021
690c621
Merge pull request feature_mention
dhesenkamp Oct 21, 2021
bd37301
Merge conflict - manual resolve
dhesenkamp Oct 21, 2021
92fca4b
Update feature_extraction.sh
dhesenkamp Oct 21, 2021
fe60d7e
Manually resolved merge conflict of previous pull request from featur…
dhesenkamp Oct 21, 2021
4aa460e
URL() feature extractor testing
dhesenkamp Oct 21, 2021
31d794c
Variable renaming
dhesenkamp Oct 21, 2021
da54a04
Feature extraction script testing
dhesenkamp Oct 21, 2021
3badd5b
Update stopword_remover.py
dhesenkamp Oct 21, 2021
1b912ba
Updated svm classifier
dhesenkamp Oct 21, 2021
3c61ef7
Pipeline testing
dhesenkamp Oct 21, 2021
2083d71
Created retweets.py, Update extract_features.py
Yannik101010 Oct 21, 2021
f984812
Update examples.py, feature_extraction.py, feature_extraction.sh
Yannik101010 Oct 21, 2021
6890fdb
Create replies.py
Yannik101010 Oct 21, 2021
900e9fb
Update util.py, feature_extraction.py feature_extraction.sh
Yannik101010 Oct 21, 2021
c7e7d1d
Update extract_features.py, replies.py
Yannik101010 Oct 21, 2021
45b5395
Created hastags.py; Update util.py, extract_feature.py, extract_featu…
Yannik101010 Oct 22, 2021
b2198bc
Update classification.sh
dhesenkamp Oct 22, 2021
db5c2f8
Merge branch 'main' of https://github.com/dhesenkamp/MLinPractice
dhesenkamp Oct 22, 2021
160f08a
Merge conflict - random forest classifier
dhesenkamp Oct 22, 2021
a132258
Merge conflict Likes() feature extractor
dhesenkamp Oct 22, 2021
35a70ae
Pipeline testing
dhesenkamp Oct 22, 2021
cac72e5
Create daytime.py
Yannik101010 Oct 23, 2021
eea0bbd
Update daytime.py, extract_feature.sh, extract_feature.py, util.py, e…
Yannik101010 Oct 23, 2021
ed23f5c
Update run_classifier.py, classification.sh
Yannik101010 Oct 23, 2021
38db958
Daytime() feature extractor (added one-hot)
dhesenkamp Oct 24, 2021
2ea07f8
Update classification.sh
dhesenkamp Oct 24, 2021
62da2ee
Merge conflict
dhesenkamp Oct 24, 2021
ff83062
manual merge
dhesenkamp Oct 24, 2021
a455a0f
Merge pull request #15 from dhesenkamp/feature_daytime
dhesenkamp Oct 24, 2021
40fe057
Update daytime.py
Yannik101010 Oct 24, 2021
ce10d4b
Update daytime.py
Yannik101010 Oct 25, 2021
cdf5305
Update .gitignore
dhesenkamp Oct 25, 2021
a6eaaf4
Update .gitignore
dhesenkamp Oct 25, 2021
d710a30
Update Documentation.md
dhesenkamp Oct 25, 2021
081795c
Update Documentation.md
dhesenkamp Oct 26, 2021
762fe55
Corrections, code examples
dhesenkamp Oct 26, 2021
cbc45cc
Added documentation for lemmatization
dhesenkamp Oct 26, 2021
3d5debc
Lemmatization
dhesenkamp Oct 27, 2021
fa5696c
Fixed StopwordRemover()
dhesenkamp Oct 27, 2021
bd44893
Added all feature extraction steps
dhesenkamp Oct 27, 2021
bc01ffc
Merge pull request #16 from dhesenkamp/documentation-visualization
dhesenkamp Oct 27, 2021
860d200
Created ner.py
Yannik101010 Oct 28, 2021
e3595a9
Update .gitignore
dhesenkamp Oct 29, 2021
3c7aa56
Update .gitignore
dhesenkamp Oct 29, 2021
6e0af72
Untrack classifier.pickle file (too big)
dhesenkamp Oct 29, 2021
ef5e7a2
Updates to .gitignore - untracking of some previously tracked files
dhesenkamp Oct 29, 2021
be6f28b
Minor cleanup, documentation
dhesenkamp Oct 29, 2021
0c96227
Added weights arg to knn classifier
dhesenkamp Oct 29, 2021
56c5d3b
Added criterion for split to decision tree
dhesenkamp Oct 29, 2021
89505da
Added additional cl args for random forest + updated documentation
dhesenkamp Oct 29, 2021
c7f95ea
Removed standardization for random forest (not needed)
dhesenkamp Oct 29, 2021
cad897d
Update examples.py, ner.py, extract_feature.py, feature_extraction.sh
Yannik101010 Oct 30, 2021
5df0545
Fine tuning for NER() feature extractor
dhesenkamp Oct 30, 2021
a29e5d1
Revert "Fine tuning for NER() feature extractor"
dhesenkamp Oct 30, 2021
d48bb55
Fine tuning NER() - manually resolving merge conflict
dhesenkamp Oct 30, 2021
3f94e33
manually resolve merge conflict
dhesenkamp Oct 30, 2021
6a76e57
Merge pull request #17 from dhesenkamp/ner
dhesenkamp Oct 30, 2021
7456d26
Documentation + minor cleanup
dhesenkamp Oct 30, 2021
857007d
Added MLP classifier
dhesenkamp Oct 30, 2021
f8970ae
Merge pull request #18 from dhesenkamp/classifier_mlp
dhesenkamp Oct 30, 2021
01c18f3
Added Gaussian NB classifier
dhesenkamp Oct 30, 2021
b4517de
Changed from Gaussian to Complement NB
dhesenkamp Oct 30, 2021
69b68b8
Updated SentimentAnalyzer to only return pos values
dhesenkamp Oct 30, 2021
5c0f26d
Merge pull request #19 from dhesenkamp/classifier_bayes
dhesenkamp Oct 30, 2021
6532381
Update: Clean Code
Yannik101010 Oct 30, 2021
45ed17b
Updated documentation
dhesenkamp Oct 30, 2021
87025a5
Update README.md
Yannik101010 Oct 30, 2021
0b69335
Merge branch 'Readme' into main1
Yannik101010 Oct 30, 2021
b150f53
Added evaluation section to documentation
dhesenkamp Oct 31, 2021
68e1101
Updated classifier to work for param optimization
dhesenkamp Oct 31, 2021
ecd08a0
Hyperparameter optimization script
dhesenkamp Oct 31, 2021
c3c6665
Update Documentation.md
dhesenkamp Oct 31, 2021
605bb61
Summary plots for evaluation metrics
dhesenkamp Oct 31, 2021
f572b02
Added plots for visualization of results to documentation
dhesenkamp Oct 31, 2021
ef63032
Added more plots with summary stats
dhesenkamp Oct 31, 2021
a325be9
Merge pull request #20 from dhesenkamp/param_optimization
dhesenkamp Oct 31, 2021
eb22e87
Documentation + visuals
dhesenkamp Oct 31, 2021
2a5d349
Merge pull request #21 from dhesenkamp/param_optimization
dhesenkamp Oct 31, 2021
f09d753
Added .py file for plots
dhesenkamp Oct 31, 2021
1ac4d25
Update Documentation.md
dhesenkamp Oct 31, 2021
8c94aba
Added tracking results from param optimization
dhesenkamp Oct 31, 2021
8cebc47
Added missing resources & citations
dhesenkamp Nov 1, 2021
Removed .DS_Store
MacOS specific file explorer setting file
dhesenkamp committed Oct 14, 2021
commit 4704f20af6a2ba0048d363885f5a71d051518642
Binary file removed .DS_Store
Binary file not shown.
Binary file removed code/.DS_Store
Binary file not shown.
Binary file removed code/application/.DS_Store
Binary file not shown.
4 changes: 2 additions & 2 deletions code/classification.sh
@@ -5,10 +5,10 @@ mkdir -p data/classification/

 # run feature extraction on training set (may need to fit extractors)
 echo " training set"
-python -m code.classification.run_classifier data/dimensionality_reduction/training.pickle -e data/classification/classifier.pickle --tree 5 -s 42 --accuracy --kappa
+python -m code.classification.run_classifier data/dimensionality_reduction/training.pickle -e data/classification/classifier.pickle -m --knn 5 --tree 5 -s 42 -a -k -f1

 # run feature extraction on validation set (with pre-fit extractors)
 echo " validation set"
-python -m code.classification.run_classifier data/dimensionality_reduction/validation.pickle -i data/classification/classifier.pickle --accuracy --kappa
+python -m code.classification.run_classifier data/dimensionality_reduction/validation.pickle -i data/classification/classifier.pickle -m -a -k -f1

 # don't touch the test set, yet, because that would ruin the final generalization experiment!
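
Note on the changed flags: the new commands replace `--accuracy --kappa` with `-a -k` and add `-m`, `--knn 5` and `-f1`. The sketch below shows one way such flags could be declared with Python's `argparse`. It is an illustration only, not the project's `run_classifier`. Everything not visible in the diff is an assumption: the mapping of `-a`/`-k` to `--accuracy`/`--kappa`, the long names `--export_file`, `--import_file`, `--seed` and `--f1_score`, and the idea that `-s` is a random seed. The meaning of `-m` is not stated in the diff, so it is treated as an opaque switch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in classification.sh."""
    parser = argparse.ArgumentParser(description="classifier runner (illustrative sketch)")
    parser.add_argument("input_file", help="pickle file with the data set")
    # -e / -i: assumed to export / import the fitted classifier
    parser.add_argument("-e", "--export_file", default=None)
    parser.add_argument("-i", "--import_file", default=None)
    # -s: assumed to be a random seed
    parser.add_argument("-s", "--seed", type=int, default=None)
    # -m: boolean switch whose meaning is not stated in the diff
    parser.add_argument("-m", action="store_true")
    # classifier options, each taking one integer value as in the diff
    parser.add_argument("--knn", type=int, default=None)
    parser.add_argument("--tree", type=int, default=None)
    # evaluation metrics; the short/long pairing is an assumption
    parser.add_argument("-a", "--accuracy", action="store_true")
    parser.add_argument("-k", "--kappa", action="store_true")
    parser.add_argument("-f1", "--f1_score", action="store_true")
    return parser


# Parse the argument list of the new training command from the diff.
command = (
    "data/dimensionality_reduction/training.pickle "
    "-e data/classification/classifier.pickle "
    "-m --knn 5 --tree 5 -s 42 -a -k -f1"
)
args = build_parser().parse_args(command.split())
print(args.knn, args.tree, args.seed, args.accuracy, args.kappa, args.f1_score)
```

`argparse` accepts a multi-character single-dash option string such as `-f1` as long as it matches exactly, which is why the sketch can declare it next to single-letter flags like `-a` and `-k`.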
Binary file removed code/classification/.DS_Store
Binary file not shown.
Binary file removed code/preprocessing/.DS_Store
Binary file not shown.
Binary file removed data/.DS_Store
Binary file not shown.
Binary file modified data/classification/classifier.pickle
Binary file not shown.