How to prevent Dataset from being deleted when running the crawler? #1878

Answered by B4nan
Ramin-Bateni asked this question in Q&A

You need to purge on start, otherwise only requests that were never processed can go through. That's the whole point of the auto-purging: to clear the crawler state. If you disable it, things can run only once; afterwards the queue consists only of already-processed requests, so it's expected that you won't get inside the request handler at all.
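
For context, this is roughly what disabling the auto-purge looks like (a minimal sketch, assuming the purgeOnStart Configuration option; the crawler class and handler body here are placeholders):

import { CheerioCrawler, Configuration } from 'crawlee';

// With purgeOnStart disabled, the default storages survive between runs,
// so a re-run finds every request in the queue already marked as handled
// and the requestHandler is never invoked again.
const crawler = new CheerioCrawler(
  {
    requestHandler: async ({ request }) => {
      // ...reached only on the first run, while the queue still holds unhandled requests
    },
  },
  new Configuration({ purgeOnStart: false }),
);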

You could use a named dataset, those are never purged automatically (it's up to you to remove the data, either manually or via the drop() method).

import { Dataset } from 'crawlee';

// create (or open) a named dataset - named storages survive auto-purging
const ds = await Dataset.open('my-data');

// push to it (request and title come from the request handler scope)
await ds.pushData({
  url: request.loadedUrl,
  title,
});
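
And when the data is no longer needed, the dataset can be cleaned up via the drop method mentioned above:

// remove the named dataset and all of its data
await ds.drop();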

Answer selected by Ramin-Bateni