[Model] Distributed version of the model giving Arrow Capacity error #295
Comments
Regarding the number of workers: setting the num_workers parameter to the number of workers I had worked! (See the xgboost documentation.)
Hey @iamyihwa, thanks for the great report. Do you have very long ids? This answer suggests that it may be an issue with large strings. This seems to be coming from the fit step, so I don't think it's related to your transformation. Increasing the number of partitions may help.
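As a rough illustration of why more partitions may help with large strings: Arrow's default string type uses 32-bit offsets, so a single array can hold only about 2 GiB of character data. The sketch below estimates a partition count that keeps the id column of each partition well under that limit. The helper name `suggest_num_partitions`, its parameters, and the default safety factor are assumptions made for illustration; they are not part of mlforecast or Spark.

```python
import math

# Arrow's default (non-"large") string type uses 32-bit offsets, so one
# array can hold at most about 2 GiB of character data.
ARROW_STRING_LIMIT_BYTES = 2**31 - 1


def suggest_num_partitions(n_rows: int, avg_id_bytes: int,
                           safety_factor: float = 4.0) -> int:
    """Hypothetical helper: estimate how many partitions keep the id
    column of each partition comfortably below the Arrow string limit.

    n_rows        -- total number of rows in the training data
    avg_id_bytes  -- average size of one id value in bytes (an estimate)
    safety_factor -- headroom; 4.0 targets a quarter of the limit
    """
    if n_rows < 0 or avg_id_bytes < 0 or safety_factor <= 0:
        raise ValueError("inputs must be non-negative, safety_factor > 0")
    total_bytes = n_rows * avg_id_bytes
    budget_per_partition = ARROW_STRING_LIMIT_BYTES / safety_factor
    # Always return at least one partition.
    return max(1, math.ceil(total_bytes / budget_per_partition))
```

The resulting number could then be used when repartitioning the Spark DataFrame (for example with `DataFrame.repartition`) before calling fit. This only accounts for the id column, so treat it as a lower bound rather than a tuned value.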
Thanks @jmoralez! It worked! As you suggested, I increased the number of partitions and I no longer get that error. Training now runs seamlessly and very fast; however, getting the forecasted results takes a very long time. (Training took 10 minutes with 8 workers, but when I take a glimpse of the forecasted results with .take(), it runs for more than 10 minutes and is still going.)
What would be the best way to train on a very large dataset in mlforecast/neuralforecast, @jmoralez?
The lag transformations can take a long time if you have very long series. Can you try using the built-in ones? They should be significantly faster and also support multithreading, so try enabling that as well.
Also, I don't think Spark is able to know that it can take the first five rows from a single partition, so you can try saving the result first and then getting a subset. Otherwise the whole computation will run only to return 5 rows.
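The "save first, then subset" suggestion could look roughly like the sketch below. It assumes `spark` is a SparkSession and `preds` is the Spark DataFrame holding the forecasts; the function name `preview_saved_predictions` and the Parquet path are hypothetical and chosen only for illustration.

```python
def preview_saved_predictions(spark, preds, path, n=5):
    """Hypothetical helper: write `preds` to Parquet once, then read
    back only `n` rows for inspection.

    spark -- assumed to be a SparkSession
    preds -- assumed to be a Spark DataFrame with the forecasts
    path  -- output location for the Parquet files (caller's choice)
    """
    # Materialize the full result a single time, so the expensive
    # feature and prediction computation is not re-run by later actions.
    preds.write.mode("overwrite").parquet(path)

    # Reading the saved files is cheap, and limit(n) keeps the amount
    # of data brought to the driver small.
    saved = spark.read.parquet(path)
    return saved.limit(n).toPandas()
```

Writing to storage costs one full pass over the data, but every later peek at the results reads from the saved files instead of recomputing the whole pipeline.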
#301 should also make the predict step faster for distributed.
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.
What happened + What you expected to happen
I wonder whether it is due to the target transformation routine, which uses pandas, or whether distributed mlforecast uses pyarrow underneath and there is a size limitation.
Thank you in advance!!!
Versions / Dependencies
0.11.5
Reproduction script
Issue Severity
None