Thanks for the issue report. Off the top of my head, I'd guess this is because most encoders convert the input to a DataFrame and create a deep copy of it. Maybe this wasn't the case in older versions. I'd need some time to check whether this is really the reason. I'm also not sure whether the deep copies can safely be removed; there was probably a reason to add them in the first place.
If you want to investigate, feel free; otherwise I'll have a look and keep you posted.
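The deep-copy hypothesis above is easy to illustrate in isolation. This is a minimal sketch (not category_encoders code): it assumes the encoder materializes the input as a DataFrame and deep-copies it, which means a second buffer of the same size is held alongside the caller's data.

```python
import numpy as np
import pandas as pd

# Simulate a large single-column categorical input.
X = pd.DataFrame({"cat": np.random.randint(0, 100, size=1_000_000)})

# Memory held by the original frame vs. a deep copy of it.
original = X.memory_usage(deep=True).sum()
copied = X.copy(deep=True).memory_usage(deep=True).sum()

# The deep copy owns its own buffers, so while both frames are alive
# the process holds roughly twice the data.
print(f"original: {original / 2**20:.1f} MiB, deep copy: {copied / 2**20:.1f} MiB")
```

If an encoder does this once per `fit` and once per `transform`, peak memory can grow by a multiple of the input size, which would match the regression reported here.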
Hi, thanks for your quick reply!
I have observed the same memory usage issue with WOEEncoder (#364). As for the root cause of the memory increase in these two APIs, I tried to find it among the code changes introduced in version 2.0.0, but that was impractical given the number of changes. I will take your suggestion and check whether it is due to the deep copy.
Expected Behavior
Similar memory usage across category_encoders versions, or better performance in newer versions.
Actual Behavior
According to the experiment results, when the category_encoders version is higher than 2.0.0, memory usage is noticeably worse.
Steps to Reproduce the Problem
Step 1: download the dataset above
train & test (63MB)
Step 2: install category_encoders
pip install category_encoders==#version#
Step 3: change the category_encoders version and record the memory usage
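The measurement in Step 3 can be sketched with the standard-library `tracemalloc` module. The helper below is hypothetical (not from the issue); the deep-copy call stands in for the encoder's `fit_transform`, since with category_encoders installed you would pass that call instead.

```python
import tracemalloc

import numpy as np
import pandas as pd


def peak_allocation_mib(fn, *args, **kwargs):
    """Run fn and return (result, peak Python-level allocation in MiB)."""
    tracemalloc.start()
    result = fn(*args, **kwargs)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak / 2**20


# Stand-in workload: a deep copy of the input, mirroring the suspected
# behavior of the encoders. With category_encoders installed, replace the
# lambda with the encoder call under test.
X = pd.DataFrame({"cat": np.random.randint(0, 50, size=200_000)})
_, peak = peak_allocation_mib(lambda df: df.copy(deep=True), X)
print(f"peak allocation: {peak:.1f} MiB")
```

Running this once per installed version and saving the printed peak gives a comparable number for each category_encoders release.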
Specifications