[Feature]: possibility to pause/resume ongoing vectorizer jobs #336
Comments
AFAIK, to reproduce this behavior I would have to create a vectorizer with an empty queue and then load rows little by little at night. The problem is that I would have to wait for the worker to finish each time before adding more rows to the queue. That makes little sense, because this automation should be the worker's own concern, not that of a periodic job that lives outside the worker, in Timescale.
To stop the vectorizer worker, you just need to stop the workers. How you do that depends on the infrastructure you're running. You can set up a job that stops every vectorizer worker container you're running at a specific time of day. If you're running everything locally, you can start the containers, then set a cronjob that runs

On cloud we expose start and stop, but the approach there is different. We import the vectorizer as a library and use a push approach, with the DB generating HTTP events instead of the worker polling for vectorizers. The events are created by using

There are multiple approaches you can take. I'd go with starting and stopping the containers; that seems to be the easiest solution.
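A minimal sketch of the "start and stop the containers on a schedule" idea, assuming the worker runs in a Docker container. The container name `vectorizer-worker` and the 22:00 to 06:00 window are hypothetical placeholders, not values from this thread; only the standard `docker start` and `docker stop` commands are used.

```python
import subprocess
from datetime import datetime, time

# Hypothetical values: replace with your own container names and hours.
WORKER_CONTAINERS = ["vectorizer-worker"]
NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)


def in_night_window(now: time, start: time = NIGHT_START, end: time = NIGHT_END) -> bool:
    """Return True if `now` is inside [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def docker_command(container: str, night: bool) -> list[str]:
    """Build the docker CLI call: start the worker at night, stop it otherwise."""
    return ["docker", "start" if night else "stop", container]


def main() -> None:
    """Bring every listed worker container into the state the current time calls for."""
    night = in_night_window(datetime.now().time())
    for container in WORKER_CONTAINERS:
        subprocess.run(docker_command(container, night), check=True)
```

Calling `main()` from a cron entry (for example once an hour) would keep the workers running only inside the window. Because the worker polls the database for pending work, rows queued during the day should simply wait until the container is started again.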
The problem with this approach is that I might need to start all workers with the "-i" flag, wouldn't I? That does not scale well if I don't know the number of vectorizers in advance, which is what I am aiming for. Or am I misunderstanding the approach?
Hey @dberardo-com, if you don't want to process any vectorizer, then the solution is to stop the worker. If you want the worker to keep processing other vectorizers but skip specific ones, then I'd say this is correct:
I agree this doesn't scale for that use case. I need to double-check something, but I think this is a feature we can add.
What problem does the new feature solve?
I have a very large dataset (wikimedia.en) that I would like to vectorize, and I would like the vectorizer to run only at night.
I am downloading the whole Wikimedia dataset in a single shot, but would like the vectorizer to process it only at night.
What does the feature do?
Implementation challenges
No response
Are you going to work on this feature?
None