
Free Throws, Blocks and Personal Fouls #12

Open
Goujer opened this issue Mar 21, 2019 · 3 comments

Comments

@Goujer
Contributor

Goujer commented Mar 21, 2019

I noticed that free throws, blocks and personal fouls are not gathered by getSeasonData().
I'm not 100% familiar with the algorithm yet, but I figured I'd add them to the list and see what happens. I am curious, though, why these were omitted from the start; I figured you guys might have a reason.

On another note, many of the missing columns, like TOV, Opp., ORB, MP and others, have been added to sports-reference. I'm working on importing the data into the CSVs, and I am seeing a slight accuracy boost.
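A minimal sketch of what adding the new columns might look like. It assumes each team's season stats sit in one CSV row with a `School` column plus sports-reference-style abbreviations, and that a game is encoded as the element-wise difference of the two teams' stats. The feature lists, the CSV layout, and the `load_team_stats`/`matchup_vector` helpers are all hypothetical; this is not the project's actual getSeasonData().

```python
import csv
import io

# Hypothetical feature lists: some basic stats plus the newly available columns.
# Column names are assumed to match the CSV headers exactly.
BASE_FEATURES = ["PTS", "TRB", "AST", "STL"]
NEW_FEATURES = ["FT", "BLK", "PF", "TOV", "ORB", "MP"]
FEATURES = BASE_FEATURES + NEW_FEATURES


def load_team_stats(csv_text, features=FEATURES):
    """Parse CSV text (one row per team) into {team_name: {column: float}}."""
    stats = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        stats[row["School"]] = {col: float(row[col]) for col in features}
    return stats


def matchup_vector(stats, team_a, team_b, features=FEATURES):
    """Encode a game as team_a's stats minus team_b's stats, one entry per feature."""
    return [stats[team_a][f] - stats[team_b][f] for f in features]
```

Each name appended to `NEW_FEATURES` adds one dimension to every matchup vector, which is the dimensionality trade-off the maintainer raises in reply.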

@adeshpande3
Owner

No particular reason they were omitted. At the beginning, I just wanted to get the basic features in before adding other ones. While adding more features does provide more information, it also creates higher-dimensional vectors, which generally makes it a bit harder for the models to learn, so I'm not sure a stat like minutes played would help that much. Although I could be wrong, and like you mentioned, if you're seeing a slight performance boost, it could be worth it.

@Goujer
Contributor Author

Goujer commented Mar 28, 2019

The increase is very slight, about a 0.3% average increase in accuracy. It's small but consistent.
I'm not the most familiar with sklearn or the models it uses, so when you say it makes learning harder, do you mean harder in that the model could get things wrong, or harder in that it takes more time?
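One way to check whether a small gain like 0.3% is consistent rather than noise is to score both feature sets on the same cross-validation folds and compare them fold by fold. The sketch below uses only the standard library and assumes you already have per-fold accuracies for the baseline and extended feature sets; `compare_folds` is a hypothetical helper, not part of sklearn or this project.

```python
import statistics


def compare_folds(baseline, extended):
    """Summarize paired per-fold accuracies from two feature sets.

    Both lists must come from the same folds, in the same order.
    Returns the mean gain, its standard deviation, and how many folds improved.
    """
    if len(baseline) != len(extended) or len(baseline) < 2:
        raise ValueError("need two equal-length lists with at least 2 folds")
    # Paired differences: positive means the extended feature set did better.
    gains = [e - b for b, e in zip(baseline, extended)]
    return {
        "mean_gain": statistics.mean(gains),
        "stdev_gain": statistics.stdev(gains),
        "folds_improved": sum(g > 0 for g in gains),
        "n_folds": len(gains),
    }
```

If the mean gain is small relative to `stdev_gain`, or only about half the folds improve, the extra features may not be helping. With sklearn, per-fold scores for both feature sets can come from `cross_val_score` given the same splitter, e.g. a `KFold(shuffle=True, random_state=...)` with a fixed seed.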

@adeshpande3
Owner

Harder in the sense that increasing the number of features generally increases the number of training samples you may need to get a sufficiently trained model.

https://en.wikipedia.org/wiki/Curse_of_dimensionality
http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
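A small simulation can make the curse of dimensionality concrete. With a fixed number of uniformly random points, the gap between a query point's nearest and farthest neighbors shrinks relative to the nearest distance as the number of features grows, so "closeness" carries less information unless you add more samples. This sketch uses synthetic uniform data, not the project's CSVs, and `relative_contrast` is an illustrative helper.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (max_dist - min_dist) / min_dist from a random query point
    to n_points uniform random points in the unit hypercube [0, 1]^n_dims."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Same number of samples, increasingly many features.
    for d in (2, 10, 100, 1000):
        print(d, round(relative_contrast(500, d), 3))
```

The printed contrast should fall sharply as `d` grows; the exact values depend on the seed.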
