drop NaN before sending to ML? #3
Also for the text column. I did dropna() on that one too, and the final table now has about 18k rows.
Finally, should the filename column be dropped? The CSV is big enough as it is, might as well trim down the stuff that's not used.
Yes, you're right on dropping the filename column. I added that here: c5b1458. As for another dropna, there should not be anything to drop there if you're joining it with the financial dataframe where that column has already had the nulls dropped: https://github.com/PlatorSolutions/quarterly-earnings-machine-learning-algo/blob/c5b145852bbf8f17f3e472eb5fb319e254a554a3/cloudml_prepare_local_csv.py#L8
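A minimal sketch of the point above, with hypothetical column names (`cik`, `text`, `prc_change_t2` standing in for the real join key and columns): once nulls have been dropped on the financial side, an inner join cannot reintroduce NaN in that column, so a second dropna there is a no-op.

```python
import numpy as np
import pandas as pd

# Hypothetical financial dataframe with some NaN labels
financial = pd.DataFrame({
    "cik": [1, 2, 3, 4],
    "prc_change_t2": [0.05, np.nan, 0.10, np.nan],
})
# Hypothetical text dataframe joined against it
text = pd.DataFrame({"cik": [1, 2, 3, 4], "text": ["a", "b", "c", "d"]})

# Drop nulls on the financial side first, as in cloudml_prepare_local_csv.py
financial = financial.dropna(subset=["prc_change_t2"])

# An inner merge keeps only rows present on both sides, so the
# already-cleaned column stays NaN-free in the result
joined = text.merge(financial, on="cik", how="inner")
assert joined["prc_change_t2"].notna().all()
```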
https://github.com/PlatorSolutions/quarterly-earnings-machine-learning-algo/blob/4ce8e1e829ed6f6ecd74f8abbfaf91114af1201b/cloudml_prepare_local_csv.py#L31
Should there be a `df.dropna(subset=['prc_change_t2'])` here? I collected the data for the last 10 years. I get 91k rows at that point. But if I run `.dropna(subset=['prc_change_t2'])`, only about 20k rows remain. I think the NaN rows should not even be sent to ML.
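The suggestion above can be sketched as follows (assuming a dataframe `df` with the `prc_change_t2` label column; the other column names are hypothetical): rows whose label is NaN carry no training signal, so they are dropped before the CSV is written for the ML job.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the collected data; in the real script this
# would be the joined financial + text dataframe
df = pd.DataFrame({
    "prc_change_t2": [0.03, np.nan, -0.01, np.nan],
    "text": ["q1", "q2", "q3", "q4"],
})

before = len(df)
# Drop rows with a missing label so they are never sent to ML
df = df.dropna(subset=["prc_change_t2"])
print(f"dropped {before - len(df)} rows with NaN labels, {len(df)} remain")
# → dropped 2 rows with NaN labels, 2 remain
```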