How to prevent Dataset from being deleted when running the crawler? #1878
-
I want to have a dataset file that stores the data scraped by my crawler, and I want to append to it each time I start the crawler (i.e. I want to avoid the Dataset file being deleted). So I configured the crawler along these lines:
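A minimal sketch of such a setup, assuming a `CheerioCrawler` and a per-crawler `Configuration` (both are placeholders; the `purgeOnStart` option is the relevant part):

```js
import { CheerioCrawler, Configuration } from 'crawlee';

// Per-crawler configuration: keep the existing storage between runs.
const config = new Configuration({ purgeOnStart: false });

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, pushData }) {
        // pushData() appends a record to the default Dataset.
        await pushData({ url: request.url, title: $('title').text() });
    },
}, config);

await crawler.run(['https://crawlee.dev']);
```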
-
My guess is that the purge is happening before this code is executed, or through something else than the crawler object itself, e.g. if you use some of the static storage helpers. An alternative to what you are doing can be using env vars.
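For example, a sketch assuming the env var mapping `CRAWLEE_PURGE_ON_START` for the `purgeOnStart` option; it has to be set before the global configuration is created:

```js
// main.js — set this before anything from 'crawlee' builds its Configuration.
// Equivalent to running: CRAWLEE_PURGE_ON_START=0 node main.js
process.env.CRAWLEE_PURGE_ON_START = '0';

const { CheerioCrawler } = await import('crawlee');
// ...construct and run the crawler as usual.
```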
-
@B4nan, thank you for your reply. I checked the code but did not find any such line. I created a new project with the default `main.js` template. In this case, if I use the setup sketched below, the crawler never gets inside the request handler when `purgeOnStart` is false. What do you think?
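A sketch of the default template's `main.js` with purging disabled via the global `Configuration` (the `PlaywrightCrawler` body mirrors the template; the exact code may have differed):

```js
import { PlaywrightCrawler, Configuration } from 'crawlee';

// Disable purging globally before the crawler runs.
Configuration.getGlobalConfig().set('purgeOnStart', false);

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, pushData }) {
        const title = await page.title();
        await pushData({ url: request.loadedUrl, title });
        await enqueueLinks();
    },
    maxRequestsPerCrawl: 20,
});

await crawler.run(['https://crawlee.dev']);
```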
-
Thank you @B4nan, I managed it your way. Would it be possible to answer my other question too?
You need to purge on start, otherwise only requests that were never processed can go through. That's the whole point of the auto-purging: to clear the crawler state. If you disable it, things can run only once; after that, the queue will consist only of processed requests, so it's expected that you won't get inside the request handler at all.
You could use a named dataset; those are not purged at all (it's up to you to remove the data, either manually or via the `drop()` method).
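A minimal sketch (the dataset name `my-results` is arbitrary):

```js
import { Dataset } from 'crawlee';

// Named storages are exempt from auto-purging, so records accumulate across runs.
const dataset = await Dataset.open('my-results');
await dataset.pushData({ url: 'https://example.com', scrapedAt: Date.now() });

// Remove the data yourself when you no longer need it:
// await dataset.drop();
```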