Custom queuing logic? #5
Comments
Definitely. Your English is great, and I do understand what you're talking about. I do have plans to make the URL frontier more like a plugin that people can swap out. I'm working on doing something similar with Redis, which would make it easy to distribute the crawl. I'm going to make the URL frontier pluggable and document its interface so that anyone can write their own. This is something that changes from use case to use case.
I've also modified the URL frontier to work with CouchDB. It's not a [...] One change that is crucial is that the public functions of the URL frontier [...]
On Mon, Dec 29, 2014, 12:13 AM James Culveyhouse
It'd be great if we could create custom queues or frontiers and inject them into the crawler as an option or parameter.
What I want to do is use a database to store which URLs I have visited and how many times. I also want to have my own custom logic for URL queuing. For example, I could inject the least recently visited URLs back into the queue when the queue becomes empty, to update existing data.
It'd be great if that queuing logic were made into a separate module that we can implement ourselves. Let me know if you understand what I'm trying to say. English is not really my native language.
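To make the request concrete, here is a rough sketch of the kind of frontier I mean. It is written in Python only for illustration and is not this project's API: the class `RevisitingFrontier`, its methods (`add`, `next_url`, `mark_visited`, `visit_count`), and the `refill_size` parameter are all hypothetical names, and SQLite stands in for whatever database one would actually use. A logical counter replaces real timestamps so the ordering is easy to follow.

```python
import sqlite3
from collections import deque
from typing import Optional


class RevisitingFrontier:
    """Hypothetical frontier: a FIFO queue plus a visit log kept in SQLite."""

    def __init__(self, db_path: str = ":memory:", refill_size: int = 2) -> None:
        self._queue = deque()  # URLs waiting to be crawled
        self._refill_size = refill_size
        self._tick = 0  # logical clock; a real crawler might store timestamps
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS visits ("
            "url TEXT PRIMARY KEY, "
            "visit_count INTEGER NOT NULL, "
            "last_visit INTEGER NOT NULL)"
        )

    def visit_count(self, url: str) -> int:
        """Return how many times the URL has been visited (0 if never)."""
        row = self._db.execute(
            "SELECT visit_count FROM visits WHERE url = ?", (url,)
        ).fetchone()
        return row[0] if row else 0

    def add(self, url: str) -> None:
        """Queue a newly discovered URL unless it is pending or already visited."""
        if url in self._queue or self.visit_count(url) > 0:
            return
        self._queue.append(url)

    def mark_visited(self, url: str) -> None:
        """Record one more visit and remember when it happened."""
        self._tick += 1
        cur = self._db.execute(
            "UPDATE visits SET visit_count = visit_count + 1, last_visit = ? "
            "WHERE url = ?",
            (self._tick, url),
        )
        if cur.rowcount == 0:  # first visit: no row to update yet
            self._db.execute(
                "INSERT INTO visits (url, visit_count, last_visit) VALUES (?, 1, ?)",
                (url, self._tick),
            )
        self._db.commit()

    def next_url(self) -> Optional[str]:
        """Pop the next URL; when the queue is empty, refill it with the
        least recently visited URLs so existing data gets refreshed."""
        if not self._queue:
            rows = self._db.execute(
                "SELECT url FROM visits ORDER BY last_visit ASC LIMIT ?",
                (self._refill_size,),
            ).fetchall()
            self._queue.extend(url for (url,) in rows)
        return self._queue.popleft() if self._queue else None
```

With something like this, the crawler would only need to call `add` for discovered links, `next_url` to get work, and `mark_visited` after a fetch. The storage and the re-queuing policy would then live entirely inside the module that gets injected.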