New Kingfisher Process integration #745
Dockerizing scrapyd makes it more complicated to deploy new spiders, so I will remove that part and deploy scrapyd normally.
…id "new new generation" in future
I'm not sure why this is needed:

    [deploy:kingfisher]
    url = http://localhost:6800
    project = kingfisher

Maybe it was needed for the Docker deployment? But we will not need it for the regular deployment.
Comment from #656:
I think we decided not to continue this feature in the new version of Kingfisher Process. Analysts can instead check the Scrapy log to find these errors. See #531.
This still needs to be addressed. Update: Done now.
Need to check how the new Process is implemented to see whether this needs to be addressed. Update: See #745
I'm not sure how the bug referenced in this commit could occur: 04135ea
- Disable KingfisherProcessAPI2 if DatabaseStore would be enabled
- Pass spider to helper, instead of setting it in one handler
- Increase logging level for some exceptions
- Use a single RABBIT_URL environment variable
- Put the Kingfisher Process API basic authentication in the URL
- Don't use set_value, since inc_value already uses 0 as default

Style changes:

- if-statements should have the error case in the else branch
- Follow style guide for logging messages
- Increase consistency of variable names
- Use consistent quote characters
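For readers unfamiliar with the "basic authentication in the URL" convention, here is a minimal sketch of how credentials embedded in a URL can be separated from the rest of it using only the standard library. This is not the code from this PR: the function name `split_credentials` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, username, password).

    Hypothetical helper for illustration only. Credentials are returned
    exactly as they appear in the URL (no percent-decoding is applied).
    """
    parts = urlsplit(url)
    # Rebuild the network location without the "user:password@" prefix.
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password


if __name__ == "__main__":
    # Example URL is an assumption, not a real endpoint.
    print(split_credentials("http://user:secret@localhost:8000/api/v1"))
```

With this approach a single setting carries both the endpoint and the credentials, at the cost of needing to keep the full URL out of log messages.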
@jakubkrafka Do you remember how this error occurred? The |
If I understand the integration correctly, then FILES_STORE must be an absolute directory. Otherwise, I'm not sure how Kingfisher Process can reliably read the file. Update: It is now guaranteed to be absolute.
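To illustrate the concern, here is a minimal sketch of one way to make a configured directory absolute before passing a path to another process. It is an assumption about the general technique, not the implementation in this PR. The function name `absolute_files_store` and the `base_dir` parameter are hypothetical.

```python
import os


def absolute_files_store(files_store, base_dir=None):
    """Resolve a FILES_STORE-like setting to an absolute path.

    Hypothetical helper: a relative value is resolved against base_dir
    (default: the current working directory). An absolute value is
    returned normalized but otherwise unchanged.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    # os.path.join discards base_dir when files_store is already absolute.
    return os.path.abspath(os.path.join(base_dir, files_store))


if __name__ == "__main__":
    print(absolute_files_store("data"))
```

A relative path only has meaning relative to the working directory of the process that wrote it, which is why a consumer running elsewhere cannot resolve it reliably.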
The following were sent to the old Kingfisher Process, but are not sent to the new Kingfisher Process:
Got it: FileError does not have files_store or path. I can't reproduce a case where |
- Move ExpectedError to tests/__init__.py
- Configure KingfisherProcessAPI2 in spider_with_files_store
- Re-order test_item_scraped_plucked_item test
- Re-add yields in inlineCallbacks tests
- Ensure KingfisherProcessAPI2.channel is always defined
- Ensure sample is a boolean as expected by Kingfisher Process NG
- Use [] instead of get() to avoid shadowing errors
- Add more tests
- Add docstrings
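Two of the points above can be sketched briefly: coercing a value to a boolean, and using `[]` so that a missing key raises immediately instead of silently yielding `None`. This is an illustration under assumptions, not the code in this commit. The function name `build_payload` and the `collection_id` key are hypothetical, and `sample` is assumed to arrive as a number or `None`.

```python
def build_payload(item, sample):
    """Build a message-like dict (hypothetical shape, for illustration)."""
    return {
        # item["collection_id"] raises KeyError if the key is missing,
        # whereas item.get("collection_id") would hide the problem as None.
        "collection_id": item["collection_id"],
        # Coerce to a strict boolean: None or 0 becomes False, 10 becomes True.
        "sample": bool(sample),
    }


if __name__ == "__main__":
    print(build_payload({"collection_id": 1}, 10))
```

Failing early with `KeyError` keeps the error close to its cause, rather than surfacing later as an unexpected `None` in a downstream consumer.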
- Open files as binary for ijson
- Move instance variables into __init__ method
- Use spider.logger instead of new logger instance
- In-line absolute_crawl_directory (which is not guaranteed to be absolute)