Hi, we have a use case where a set of URLs is stored in a DB. A process fetches a batch of URLs from the DB and sends them to the crawler. After the crawler finishes the job, we want to set the fetched-at timestamp in the DB.

Let's imagine this job id is 1.

The problem is that one of these pages may contain links we need to crawl further to obtain more information. Say we find 6 links on the first page: we then need to crawl all 6 of those pages and extract their information before we can mark this job (job id 1) done.

Extracting links from the first page and crawling them with Crawly is not a problem. My problem is passing our context (we have an id in the DB for each entity, and each crawl is associated with that entity) so that at some point we can mark the job done in the DB.
What should be my approach with Crawly?
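One possible approach (a sketch under stated assumptions, not a definitive answer): run one spider instance per DB job, pass the job id in through the options accepted by `Crawly.Engine.start_spider/2` (received by the spider's `init/1` callback in recent Crawly versions), and stamp `fetched_at` when the spider's request queue drains and it shuts down. `MyApp.Jobs.mark_done/1` and the `:job_id`/`:start_url` option names are hypothetical placeholders for your own DB layer; the exact signature of `:on_spider_closed_callback` may differ between Crawly versions, so check the docs for yours.

```elixir
# Sketch only — assumes a recent Crawly version where init/1 receives
# start_spider/2 options. MyApp.Jobs is a hypothetical DB module.
defmodule MyApp.JobSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url(), do: "https://example.com"

  # init/1 receives the options given to Crawly.Engine.start_spider/2,
  # so the DB job id travels with this spider run.
  @impl Crawly.Spider
  def init(options) do
    job_id = Keyword.fetch!(options, :job_id)
    url = Keyword.fetch!(options, :start_url)

    # One spider run per job: remember which job this crawl belongs to.
    # (For several concurrent jobs you would key this by crawl_id instead.)
    :persistent_term.put({__MODULE__, :job_id}, job_id)

    [start_urls: [url]]
  end

  @impl Crawly.Spider
  def parse_item(response) do
    links =
      response.body
      |> Floki.parse_document!()
      |> Floki.attribute("a", "href")

    %Crawly.ParsedItem{
      items: [%{url: response.request_url}],
      # Follow-up requests stay inside the same spider run, so they remain
      # associated with the same job id without any extra bookkeeping.
      requests: Enum.map(links, &Crawly.Utils.request_from_url/1)
    }
  end
end

# config/config.exs — the callback fires when a spider stops (e.g. its
# request queue has drained), which is the point where job id 1 is "done":
#
#   config :crawly,
#     on_spider_closed_callback: fn _spider, _crawl_id, _reason ->
#       job_id = :persistent_term.get({MyApp.JobSpider, :job_id})
#       MyApp.Jobs.mark_done(job_id)  # hypothetical: sets fetched_at in DB
#     end

# Kick off one crawl per DB row:
# Crawly.Engine.start_spider(MyApp.JobSpider,
#   job_id: 1,
#   start_url: "https://example.com/page1"
# )
```

The design choice here is to scope the context per spider run rather than per request: since all follow-up requests discovered in `parse_item/1` are processed by the same spider instance, the job id only needs to be attached once, at start time.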