
Custom queuing logic? #5

Open
ArsalanDotMe opened this issue Dec 26, 2014 · 2 comments


@ArsalanDotMe
Contributor

It'd be great if we could create custom queues or frontiers and inject them into the crawler as an option or parameter.
What I want to do is use a database to store which urls I have visited and how many times. I also want my own custom logic for url queuing. For example, I could inject the least recently visited urls back into the queue when it becomes empty, to refresh the existing data.

It'd be great if that queuing logic were made into a separate module that we can implement ourselves. Let me know if you understand what I'm trying to say. English is not really my native language.
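
Roughly what I have in mind, as a sketch (all the names here are hypothetical, not the crawler's actual API; a real frontier would keep its state in a database rather than in memory):

```js
// Hypothetical sketch of a swappable url frontier.
function CustomFrontier() {
  this.queue = [];      // urls waiting to be crawled
  this.lastVisit = {};  // url -> timestamp of the last visit
}

CustomFrontier.prototype.enqueue = function(url) {
  if (!(url in this.lastVisit) && this.queue.indexOf(url) === -1) {
    this.queue.push(url);
  }
};

CustomFrontier.prototype.dequeue = function() {
  if (this.queue.length === 0) {
    // Queue ran dry: re-inject the least recently visited urls so
    // the data we already have gets refreshed.
    var self = this;
    this.queue = Object.keys(this.lastVisit).sort(function(a, b) {
      return self.lastVisit[a] - self.lastVisit[b];
    }).slice(0, 10);
  }
  var url = this.queue.shift();
  if (url) {
    this.lastVisit[url] = Date.now();
  }
  return url;
};

// Then something like: new Crawler({ frontier: new CustomFrontier() })
```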

@jculvey
Owner

jculvey commented Dec 28, 2014

Definitely. Your English is great, and I do understand what you're talking about. I do have plans to make the url frontier more like a plugin that people can swap out.

I'm working on doing something similar with redis, which would make it easy to distribute the crawl. I'm going to make the url frontier pluggable, and doc the interface for it so that anyone can write their own. This is something that changes from use case to use case.
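
For a rough idea of the shape I'm going for (the interface and key names below are tentative, nothing is documented yet):

```js
var redis = require('redis');

// Tentative sketch: a frontier backed by a shared redis list, so any
// number of crawler processes can push and pull urls from one queue.
function RedisFrontier(client) {
  this.client = client || redis.createClient();
}

RedisFrontier.prototype.enqueue = function(url, done) {
  this.client.rpush('frontier:urls', url, done);
};

RedisFrontier.prototype.dequeue = function(done) {
  // lpop hands back null once the queue is empty.
  this.client.lpop('frontier:urls', done);
};
```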

@ArsalanDotMe
Contributor Author

I've also modified the url frontier to work with couchdb. It's not a pluggable architecture yet, but mostly it's a matter of implementing the public functions (the ones that don't start with _) in your own class and providing it to the crawler at the start.

One crucial change is that the public functions of the url frontier should all be async. I had to modify crawler.js too to make it work with async methods.
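
As an illustration, the shape I ended up with looks roughly like this (the method names are just examples, and couchdb is accessed through nano here; this isn't the crawler's official interface):

```js
// Illustrative sketch only: each public frontier method takes a
// node-style callback instead of returning a value synchronously.
var nano = require('nano')('http://localhost:5984');
var db = nano.db.use('frontier');

function CouchFrontier() {}

CouchFrontier.prototype.enqueue = function(url, done) {
  db.insert({ _id: url, visits: 0 }, function(err) {
    // A 409 conflict just means we've already recorded this url.
    if (err && err.statusCode !== 409) return done(err);
    done(null);
  });
};

// crawler.js then has to wait on the callback, e.g.:
//   frontier.dequeue(function(err, url) {
//     if (url) crawlPage(url);
//   });
```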

