Hi, we have a use case where a set of URLs is stored in a DB. A process fetches a batch of URLs from the DB and sends them to the crawler. After the crawler finishes the job, we want to set the fetched-at timestamp in the DB.

Let's imagine this job id is 1.

The problem is that one of these pages may contain links we need to crawl further to obtain more information. Say we find 6 links on the first page: we then need to crawl all 6 of those pages and extract their information before we can mark this job (job id 1) done.

Extracting links from the first page and crawling them with Crawly is not a problem. My problem is passing our context (we have an id in the DB for each entity, and each crawl is associated with that entity) so that at some point we can mark the job done in the DB.
What should be my approach with Crawly?
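One possible approach (a sketch under stated assumptions, not a definitive answer): run one spider instance per DB job, pass the job id in through the options accepted by `Crawly.Engine.start_spider/2` (received by the spider's `init/1` callback in recent Crawly versions), and stamp `fetched_at` when the spider's request queue drains and it shuts down. `MyApp.Jobs.mark_done/1` and the `:job_id`/`:start_url` option names are hypothetical placeholders for your own DB layer; the exact signature of `:on_spider_closed_callback` may differ between Crawly versions, so check the docs for yours.

```elixir
# Sketch only — assumes a recent Crawly version where init/1 receives
# start_spider/2 options. MyApp.Jobs is a hypothetical DB module.
defmodule MyApp.JobSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url(), do: "https://example.com"

  # init/1 receives the options given to Crawly.Engine.start_spider/2,
  # so the DB job id travels with this spider run.
  @impl Crawly.Spider
  def init(options) do
    job_id = Keyword.fetch!(options, :job_id)
    url = Keyword.fetch!(options, :start_url)

    # One spider run per job: remember which job this crawl belongs to.
    # (For several concurrent jobs you would key this by crawl_id instead.)
    :persistent_term.put({__MODULE__, :job_id}, job_id)

    [start_urls: [url]]
  end

  @impl Crawly.Spider
  def parse_item(response) do
    links =
      response.body
      |> Floki.parse_document!()
      |> Floki.attribute("a", "href")

    %Crawly.ParsedItem{
      items: [%{url: response.request_url}],
      # Follow-up requests stay inside the same spider run, so they remain
      # associated with the same job id without any extra bookkeeping.
      requests: Enum.map(links, &Crawly.Utils.request_from_url/1)
    }
  end
end

# config/config.exs — the callback fires when a spider stops (e.g. its
# request queue has drained), which is the point where job id 1 is "done":
#
#   config :crawly,
#     on_spider_closed_callback: fn _spider, _crawl_id, _reason ->
#       job_id = :persistent_term.get({MyApp.JobSpider, :job_id})
#       MyApp.Jobs.mark_done(job_id)  # hypothetical: sets fetched_at in DB
#     end

# Kick off one crawl per DB row:
# Crawly.Engine.start_spider(MyApp.JobSpider,
#   job_id: 1,
#   start_url: "https://example.com/page1"
# )
```

The design choice here is to scope the context per spider run rather than per request: since all follow-up requests discovered in `parse_item/1` are processed by the same spider instance, the job id only needs to be attached once, at start time.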