Page loaded before all requests done #319

EvaSDK · 2016-12-22T17:45:50Z

I was trying to compute checksums of resources downloaded by Ghost.py, but some sites stream their content. This is not a problem for regular web content, but files that go through unsupported_content do not behave properly.

One resource is created for each chunk of the file received, which obviously does not help with getting the complete content that we expect. The fix here is to just let NetworkAccessManager do its job.

The second problem detected in this case is that QtWebKit emits pageLoaded even though the QNetworkReply object in charge of downloading the file is still running. Hence, keep a registry of in-flight requests and only allow wait_for_page_loaded to return when both pageLoaded has been emitted and no requests are still running.
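To illustrate the idea, here is a minimal pure-Python sketch of such a registry, written for Python 3. It is not the code from this PR: `InFlightRegistry`, its `started`/`finished` methods, and the `is_loaded` and `process_events` callables are hypothetical stand-ins for the Qt signal handlers and event loop.

```python
import time


class InFlightRegistry:
    """Tracks requests that have started but not yet finished (hypothetical helper)."""

    def __init__(self):
        self._pending = set()

    def started(self, request_id):
        self._pending.add(request_id)

    def finished(self, request_id):
        # discard() tolerates the same reply being reported as finished twice.
        self._pending.discard(request_id)

    def __len__(self):
        return len(self._pending)


def wait_for_page_loaded(is_loaded, registry, process_events, timeout=5.0):
    """Return True once the page is loaded AND no request is still in flight.

    Returns False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        process_events()  # stand-in for pumping the Qt event loop
        if is_loaded() and len(registry) == 0:
            return True
    return False
```

With this shape, a page whose load signal has fired but whose file download is still registered keeps the wait loop running until the download is reported as finished or the timeout expires.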

This is basically what was reported in PR #265, hopefully with clearer wording.

EvaSDK · 2016-12-27T09:53:22Z

On the topic of in-flight requests tracking, see also PR #274. It is imho better to handle the registry in one place and not mix registry handling code between NetworkAccessManager and Session.

EvaSDK · 2016-12-28T11:00:28Z

After more work on the topic, it appears my fix for unsupported content is wrong.

The last commit I added tried to fix normal content downloading by removing the replyReadyRead function because this function accumulated excessive data in reply.data for unsupported content.

The reason for the accumulation is most likely that another callback connected to this signal reads the reply's buffer for normal content, but not for unsupported content.

I'll push an update shortly.

EvaSDK · 2016-12-28T13:55:23Z

Everything should be fine now.

EvaSDK · 2017-02-06T14:00:18Z

Would it be possible to get some feedback on this PR? Is it of any interest to you?

jeanphix · 2017-05-05T05:54:17Z

@EvaSDK Could be nice to get this one rebased on dev?

EvaSDK · 2017-05-05T19:03:25Z

Actually, I have tried to clean up that branch a couple of times already, but with dev now being Python 3 only, it is unlikely that I'll work on merging it: the production system I run still uses Python 2, and that is what I am targeting for the coming weeks.

jeanphix · 2017-05-05T19:27:32Z

@EvaSDK As PySide2 supports python2, we could probably revert this choice.

Some responses take a while to download, so add some logs to see what is going on. This code should probably be enhanced to skip small downloads and/or to start emitting logs only when a download takes more than a pre-defined amount of time, but for now it is more helpful as is for debugging network problems in the current code.

Note that this seems to reveal a problem with requests still being in flight while the page is considered loaded, which might break scripts that relied on the previously broken behavior. Will fix it in an upcoming merge request.

Most of the time, QtWebkit emits pageLoaded when all resources are indeed loaded, however when downloading a file directly for example, the signal is emitted even though content is still flowing down. Keeping a registry allows delaying closing the session until all requests created during the session are indeed complete.

EvaSDK force-pushed the page_loaded_before_all_requests_done branch from 0a1da28 to ddea4b3 Compare December 28, 2016 13:14

EvaSDK force-pushed the page_loaded_before_all_requests_done branch from 3dfa605 to 6a031c5 Compare February 20, 2017 14:41

EvaSDK and others added 21 commits September 22, 2017 11:20

Fix rendering of URLs in logging messages

43fdc8b

Log a message when the QT application is stopping

8a60682

Enable coverage reports from unittests run

a98e18d

Use a modern defaut user agent

b6de6b3

Add missing Mozilla/ Signed-off-by: Gilles Dartiguelongue <[email protected]>

Use xvfbwrapper

e9506a4

Unlock xvfbwrapper dependency to compatible releases

ab72360

Added display server debugging messages

b84d8b0

Signed-off-by: Gilles Dartiguelongue <[email protected]>

Enhance display server handling log messages

0e7faef

Read version without importing the module

1edfb45

Keep version introspectable while avoiding ImportError when dealing with setup.py.
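One common way to do this is to parse the version string out of the source file with a regular expression instead of importing the package. The sketch below is an assumption about the general technique, not the code from this commit; `read_version` and the `__version__` attribute name are illustrative.

```python
import re


def read_version(path):
    """Extract __version__ from a source file without importing it."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    # Match a line such as: __version__ = '0.2.3'
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("no __version__ found in %s" % path)
    return match.group(1)
```

Because the file is only read as text, its own imports are never executed, so a missing Qt binding cannot raise ImportError while setup.py runs.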

Use isort for consistent import sorting

11ecf2c

Remove outdated note about pyside_postinstall script

0d5c71c

Closes jeanphix#271. Refs jeanphix#268 and jeanphix#269.

Leave NetworkAccessManager do its job

167316b

Unsupported content goes through NetworkAccessManager as well, no need to make it special for downloading.

Rename replyReadyRead callback and add docstring

eb0f4b6

The method actually calls peek, not read. A new method will be added that uses read and does consume the reply buffer data.
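The difference between peeking and reading can be shown with Python's own `io` module. This is an analogy only, not Qt code: `io.BufferedReader.peek` returns buffered bytes without advancing the stream position, while `read` consumes them.

```python
import io

stream = io.BufferedReader(io.BytesIO(b"chunk-1chunk-2"))

# peek() may return more bytes than requested, so slice the result.
peeked = stream.peek(7)[:7]

# The peeked bytes are still in the buffer: read() returns them again
# and only now advances the position.
first_read = stream.read(7)
second_read = stream.read(7)

print(peeked, first_read, second_read)
```

A callback that only peeks leaves the data in the buffer for some other consumer. If no other consumer exists, as appears to be the case for unsupported content, the buffer keeps growing, which is why a separate consuming callback is needed.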

Add callback to consume data for unsupported content download

b0e5fb4

Reduce py2/3 difference when dealing with files

efced48

Also read files in binary mode as this is the expected behavior for this kind of HTTP transfers.

Add fonction to deal with varying Qt bindings bytes/string representations

32534db

Restrict content encoding to text types

2861838

Behave more like a real browser and only care about text/* Content-Type when reading content to encode it properly. Other content is now intended to be available as bytes. Update unittests to reflect this. Fixes tests under PyQt4 as well.
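A minimal sketch of that rule, assuming the body arrives as bytes and the Content-Type header as a string. The `decode_body` helper and its `default_charset` parameter are hypothetical names, not part of Ghost.py.

```python
def decode_body(body, content_type, default_charset="utf-8"):
    """Return text for text/* responses and raw bytes for anything else."""
    media_type, _, params = content_type.partition(";")
    if not media_type.strip().lower().startswith("text/"):
        return body  # binary content is left untouched

    # Look for an explicit charset parameter, e.g. "text/html; charset=utf-8".
    charset = default_charset
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"')
    return body.decode(charset, errors="replace")
```

For example, `decode_body(b"\x89PNG", "image/png")` returns the bytes unchanged, while a `text/html; charset=utf-8` body comes back as a decoded string.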

Do not use lambda to declare signal callbacks

00acecd

As written at [1], this might be a cause for the segfaults observed at interpreter shutdown time. [1] http://enki-editor.org/2014/08/23/Pyqt_mem_mgmt.html

Merge branch 'content-encoding-and-pyqt4' into ghost-py-0-2

020d881

EvaSDK added 13 commits September 27, 2017 11:47

Fix calls to Ghost.__del__ on missing binding condition

e37a863

Also change the generic Exception by the more specific RuntimeError.

Actually log application stop if it is started

d2c8da0

Do not use sleep unnecessarily

bf614e5

Just call in QT event processing and avoid unneeded sleep time.

Increase reactivity

b64693b

Reduce time spent just sleeping and allow more QT event processing to happen according to actual time value passed to sleep and wait_for.

Add missing tox.ini to MANIFEST.in

92c48b6

Merge branch 'misc-fixes' into ghost-py-0-2

16aa560

Use super() in NetworkAccessManager

dd5f59f

Because super is super.

Do not error out when receiving QNetworkReply two times

e2ae25a

Log a message when a QtNetworkReply errors out

2cb9433

Handle QNetworkReply not sending required signals

e9d8aff

Cannot get my mind around this problem so implement a workaround for now.

Remove unneeded sleep

f45e897

With all signals properly connected, I could not find a reason to keep this around.

Log in-flight requests count when timeout occurs

a34ae95

EvaSDK force-pushed the page_loaded_before_all_requests_done branch from 6a031c5 to a34ae95 Compare October 4, 2017 16:52

EvaSDK closed this Oct 5, 2017

EvaSDK deleted the page_loaded_before_all_requests_done branch October 5, 2017 12:52
