drop NaN before sending to ML? #3

FlorinAndrei · 2019-12-20T21:01:37Z

https://github.com/PlatorSolutions/quarterly-earnings-machine-learning-algo/blob/4ce8e1e829ed6f6ecd74f8abbfaf91114af1201b/cloudml_prepare_local_csv.py#L31

Should there be a df.dropna(subset=['prc_change_t2']) here?

I collected the data for the last 10 years. I get 91k rows at that point. But if I run .dropna(subset=['prc_change_t2']), only about 20k rows remain. I think the NaN rows should not even be sent to ML.

FlorinAndrei · 2019-12-20T22:48:20Z

Also for the text column. I did dropna() on that one too, and the final table now has about 18k rows.
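
For illustration, here is a minimal sketch of the two filtering steps described above, run on a made-up four-row frame. Only the column names `prc_change_t2`, `text`, and `filename` come from this thread; the data and the variable names are assumptions, and this is not the code from `cloudml_prepare_local_csv.py`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real table has about 91k rows.
df = pd.DataFrame({
    "filename": ["a.txt", "b.txt", "c.txt", "d.txt"],
    "text": ["q1 report", None, "q3 report", "q4 report"],
    "prc_change_t2": [0.05, 0.01, np.nan, -0.02],
})

n_total = len(df)

# Step 1: keep only rows that have a value in the label column.
labeled = df.dropna(subset=["prc_change_t2"])
n_labeled = len(labeled)

# Step 2: of those, keep only rows that also have text.
complete = labeled.dropna(subset=["text"])
n_complete = len(complete)

print(f"total={n_total}, labeled={n_labeled}, labeled with text={n_complete}")
```

Printing the row count after each step makes it easy to see which column is responsible for most of the loss.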

FlorinAndrei · 2019-12-20T22:50:08Z

Finally, should the filename column even be kept in the CSV? It's not used by ML at all, is it?

The CSV is big enough as it is, might as well trim down the stuff that's not used.
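
A rough sketch of what trimming the column could look like, assuming the frame is written out with pandas. The toy data and variable names are hypothetical; an in-memory buffer stands in for the real CSV file.

```python
import io

import pandas as pd

# Hypothetical toy frame; only the column names come from this thread.
df = pd.DataFrame({
    "filename": ["a.txt", "b.txt"],
    "text": ["q1 report", "q2 report"],
    "prc_change_t2": [0.05, -0.02],
})

# Drop the column the model does not read before writing the CSV.
trimmed = df.drop(columns=["filename"])

# Write to an in-memory buffer here; a real script would pass a file path.
buffer = io.StringIO()
trimmed.to_csv(buffer, index=False)
header = buffer.getvalue().splitlines()[0]
print(header)
```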

Ben-Sherman · 2019-12-21T06:15:25Z

Yes, you're right about dropping the filename column. I added that in c5b1458.

As for another dropna, there should be nothing left to drop there if you're joining with the financial dataframe, where the nulls in that column have already been dropped: https://github.com/PlatorSolutions/quarterly-earnings-machine-learning-algo/blob/c5b145852bbf8f17f3e472eb5fb319e254a554a3/cloudml_prepare_local_csv.py#L8
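
Whether NaNs can reappear after the join depends on the join type, which this thread does not state. The sketch below uses a made-up `key` column and toy data (both assumptions, not taken from the script) to show that an inner join with an already-cleaned frame leaves no NaNs in `prc_change_t2`, while a left join from the text side can bring them back for rows with no financial match.

```python
import numpy as np
import pandas as pd

# Hypothetical financial frame, with null labels dropped up front.
financial = pd.DataFrame({
    "key": [1, 2, 3],
    "prc_change_t2": [0.05, np.nan, -0.02],
}).dropna(subset=["prc_change_t2"])

# Hypothetical text frame; keys 2 and 4 have no cleaned financial row.
text = pd.DataFrame({
    "key": [1, 2, 3, 4],
    "text": ["a", "b", "c", "d"],
})

# Inner join: only keys present on both sides survive.
inner = text.merge(financial, on="key", how="inner")
inner_nans = int(inner["prc_change_t2"].isna().sum())

# Left join: unmatched text rows are kept and get NaN labels.
left = text.merge(financial, on="key", how="left")
left_nans = int(left["prc_change_t2"].isna().sum())

print(f"inner join NaNs: {inner_nans}, left join NaNs: {left_nans}")
```

If the count after the join is nonzero, a second `dropna(subset=['prc_change_t2'])` at that point would still remove rows.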
