
Custom queuing logic? #5

Open
ArsalanDotMe opened this issue Dec 26, 2014 · 2 comments


@ArsalanDotMe
Contributor

It'd be great if we could create custom queues or frontiers and inject them into the crawler as an option or parameter.
What I want to do is use a database to store which urls I have visited and how many times. I also want my own custom logic for url queuing. For example, I could inject the least recently visited urls back into the queue when it becomes empty, to refresh the existing data.

It'd be great if that queuing logic were made into a separate module that we can implement ourselves. Let me know if you understand what I'm trying to say. English is not really my native language.
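
Roughly what I have in mind, as a sketch (all the names here are hypothetical, not the crawler's actual API; a real frontier would keep its state in a database rather than in memory):

```js
// Hypothetical sketch of a swappable url frontier.
function CustomFrontier() {
  this.queue = [];      // urls waiting to be crawled
  this.lastVisit = {};  // url -> timestamp of the last visit
}

CustomFrontier.prototype.enqueue = function(url) {
  if (!(url in this.lastVisit) && this.queue.indexOf(url) === -1) {
    this.queue.push(url);
  }
};

CustomFrontier.prototype.dequeue = function() {
  if (this.queue.length === 0) {
    // Queue ran dry: re-inject the least recently visited urls so
    // the data we already have gets refreshed.
    var self = this;
    this.queue = Object.keys(this.lastVisit).sort(function(a, b) {
      return self.lastVisit[a] - self.lastVisit[b];
    }).slice(0, 10);
  }
  var url = this.queue.shift();
  if (url) {
    this.lastVisit[url] = Date.now();
  }
  return url;
};

// Then something like: new Crawler({ frontier: new CustomFrontier() })
```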

@jculvey
Owner

jculvey commented Dec 28, 2014

Definitely. Your English is great, and I do understand what you're talking about. I do have plans to make the url frontier more like a plugin that people can swap out.

I'm working on doing something similar with redis, which would make it easy to distribute the crawl. I'm going to make the url frontier pluggable, and doc the interface for it so that anyone can write their own. This is something that changes from use case to use case.
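
For a rough idea of the shape I'm going for (the interface and key names below are tentative, nothing is documented yet):

```js
var redis = require('redis');

// Tentative sketch: a frontier backed by a shared redis list, so any
// number of crawler processes can push and pull urls from one queue.
function RedisFrontier(client) {
  this.client = client || redis.createClient();
}

RedisFrontier.prototype.enqueue = function(url, done) {
  this.client.rpush('frontier:urls', url, done);
};

RedisFrontier.prototype.dequeue = function(done) {
  // lpop hands back null once the queue is empty.
  this.client.lpop('frontier:urls', done);
};
```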

@ArsalanDotMe
Contributor Author

I've also modified the url frontier to work with couchdb. It's not a pluggable architecture yet, but mostly it's a matter of implementing the public functions (the ones that don't start with _) in your own class and providing it to the crawler at the start.

One crucial change is that the public functions of the url frontier should all be async. I had to modify crawler.js too to make it work with async methods.
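
As an illustration, the shape I ended up with looks roughly like this (the method names are just examples, and couchdb is accessed through nano here; this isn't the crawler's official interface):

```js
// Illustrative sketch only: each public frontier method takes a
// node-style callback instead of returning a value synchronously.
var nano = require('nano')('http://localhost:5984');
var db = nano.db.use('frontier');

function CouchFrontier() {}

CouchFrontier.prototype.enqueue = function(url, done) {
  db.insert({ _id: url, visits: 0 }, function(err) {
    // A 409 conflict just means we've already recorded this url.
    if (err && err.statusCode !== 409) return done(err);
    done(null);
  });
};

// crawler.js then has to wait on the callback, e.g.:
//   frontier.dequeue(function(err, url) {
//     if (url) crawlPage(url);
//   });
```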

