Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Page loaded before all requests done #319

Closed

Conversation

EvaSDK
Copy link
Collaborator

@EvaSDK EvaSDK commented Dec 22, 2016

I was trying to compute checksum of resources downloaded by Ghost.py but some sites stream their content. This is not a problem for regular web content but file that go through unsupported_content do not behave properly.

One resource is created for each chunk of the file received which obvsiouly does not help in any way with getting the complete content that we can expect. The fix here is to just let NetworkAccessManager do its job.

The second problem that was detected with this case is that QtWebKit emits pageLoaded even though the QNetworkReply object in charge of downloading the file is still running. Hence keep a registry of in-flight requests and only allow wait_for_page_loaded to return when both pageLoaded has been emited and no queries are still running.

This is basically what was reported in PR #265 hopefully with a clearer wording.

@EvaSDK
Copy link
Collaborator Author

EvaSDK commented Dec 27, 2016

On the topic of in-flight requests tracking, see also PR #274. It is imho better to handle the registry in one place and not mix registry handling code between NetworkAccessManager and Session.

@EvaSDK
Copy link
Collaborator Author

EvaSDK commented Dec 28, 2016

After more work on the topic, it appears my fix for unsupported content is wrong.

The last commit I added tried to fix normal content downloading by removing the replyReadyRead function because this function accumulated excessive data in reply.data for unsupported content.

The reason for this is that there is most likely another callback connected to this slot that reads the reply's buffer for normal content but not for unsupported one.

I'll push an update shortly.

@EvaSDK EvaSDK force-pushed the page_loaded_before_all_requests_done branch from 0a1da28 to ddea4b3 Compare December 28, 2016 13:14
@EvaSDK
Copy link
Collaborator Author

EvaSDK commented Dec 28, 2016

Everything should be fine now.

@EvaSDK
Copy link
Collaborator Author

EvaSDK commented Feb 6, 2017

Would it be possible to get a feedback on this PR ? Is it of any interest to you ?

@EvaSDK EvaSDK force-pushed the page_loaded_before_all_requests_done branch from 3dfa605 to 6a031c5 Compare February 20, 2017 14:41
@jeanphix
Copy link
Owner

jeanphix commented May 5, 2017

@EvaSDK Could be nice to get this one rebased on dev?

@EvaSDK
Copy link
Collaborator Author

EvaSDK commented May 5, 2017

Actually I have been trying to cleanup that branch a couple of times already but with dev now being python3 only, it is unlikely that I'll work on merging it as the production I run is still using python 2 and this is what I am targeting for the coming weeks.

@jeanphix
Copy link
Owner

jeanphix commented May 5, 2017

@EvaSDK As PySide2 supports python2, we could probably revert this choice.

EvaSDK and others added 21 commits September 22, 2017 11:20
Add missing Mozilla/

Signed-off-by: Gilles Dartiguelongue <[email protected]>
Signed-off-by: Gilles Dartiguelongue <[email protected]>
Keep version introspectable while avoiding ImportError when dealing with
setup.py.
Unsupported content goes through NetworkAccessManager as well,
no need to make it special for downloading.
Some responses take a while to download so have some logs to see what is
going on. This code should probably be enhanced to skip small downloads
and or start emitting logs if downloads takes more than a pre-defined
amount of time but for now it is more helpful as is to help debug
network problems in the current code.
The method actually calls peek, not read. A new method will be added
that uses read and does consume the reply buffer data.
Note that this seems to reveal a problem with requests being still in
flight while the page is considered loaded which might break some script
that relied on the previously broken behavior. Will fix it in an
upcoming merge request.
Also read files in binary mode as this is the expected behavior for
this kind of HTTP transfers.
Behave more like a real browser and only care about text/* Content-Type
when reading content to encode it properly.

Other content is now intended to be available as bytes. Update unittests
to reflect this. Fixes tests under PyQt4 as well.
As written at [1], this might be a cause for the segfaults observed at
interpreter shutdown time.

[1] http://enki-editor.org/2014/08/23/Pyqt_mem_mgmt.html
Also change the generic Exception by the more specific RuntimeError.
Just call in QT event processing and avoid unneeded sleep time.
Reduce time spent just sleeping and allow more QT event processing to
happen according to actual time value passed to sleep and wait_for.
Because super is super.
Most of the time, QtWebkit emits pageLoaded when all resources are
indeed loaded, however when downloading a file directly for example,
the signal is emitted even though content is still flowing down.

Keeping a registry allows delaying closing the session until all
requests created during the session are indeed complete.
Cannot get my mind around this problem so implement a workaround for now.
With all signals properly connected, I could not find a reason to keep this around.
@EvaSDK EvaSDK force-pushed the page_loaded_before_all_requests_done branch from 6a031c5 to a34ae95 Compare October 4, 2017 16:52
@EvaSDK EvaSDK closed this Oct 5, 2017
@EvaSDK EvaSDK deleted the page_loaded_before_all_requests_done branch October 5, 2017 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants