Add option to download inline links directly after their parent page when downloading recursively #2
base: master
dab4254 to ff0ccd6
prepend non-HTML links
k i get it. that's a better solution. i pushed it to #2
bandwidth
low bandwidth makes this patch more important, because it increases the time between the parent and child downloads, and most pages should support low-bandwidth browsing. e.g. imagevenue probably has a long enough timeout to let img.php load its image on low bandwidth too
217b8c6 to 3a053f9
git patch
make it optional
i can add that. what do u wanna call the option?
3a053f9 to b0c5beb
make it optional
k it's done https://github.com/mirror/wget/pull/2/files
debug message
i changed the DEBUGP that says "Enqueuing" to "Appending"/"Prepending" to tell what's happening
test
there's a test in #2 (comment) /test that compares them
i release the copyright
sounds like a lot of work. i release the copyright. i don't care
bf85000 to 873ee10
64341d9 to a552ea5
9380c61 to 680d9c5
git patch
it's in https://github.com/mirror/wget/pull/2.patch
testenv
i added a test that fails unless the browser queue order is used
680d9c5 to 5cb85cf
wget it
u can download it with wget
attach it
k i attached it
5cb85cf to da3bbb3
c3e1e7b to a1ba90d
it's supposed to test the element order, but symmetric_difference ignores that
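A minimal sketch of why a set-based check can't catch an ordering bug. The file names are made up for illustration and aren't taken from the actual test:

```python
# Hypothetical expected and actual download orders (same files, different order).
expected = ["index.html", "image.png", "page2.html"]
actual = ["index.html", "page2.html", "image.png"]

# symmetric_difference works on sets, so element order is lost:
# the result is empty even though the two orders differ.
set_diff = set(expected).symmetric_difference(actual)

# Comparing the lists directly is order-sensitive and notices the swap.
same_order = expected == actual

print(set_diff)    # empty set
print(same_order)  # False
```

So an order test has to compare sequences (lists or tuples), not sets.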
…when downloading recursively, because it's more likely to download temporary inline links before they expire, since it's more similar to the browsing experience
a1ba90d to f4bd594
because it's more likely to download temporary links before they expire, since it's more similar to the browsing experience
difference from the browsing experience causes problems
the problem is described in #1
discarded LIFO solution
a convoluted LIFO solution with a similar result (the difference is that LIFO results in a bottom-to-top download order) is in #1
non-inline links aren't prepended
ATTR_HTML links are never prepended, and they might point to non-HTML files, which aren't downloaded directly after their parent page even with the browser queue type
if these cause a problem, an option to read the Content-Type header of the link before enqueuing could be added
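A rough sketch of what such a Content-Type check could look like. This is not wget code (wget is written in C). The helper names `is_html_content_type` and `should_prepend` are hypothetical, and the header value is assumed to have been fetched already, e.g. with a HEAD request:

```python
from email.message import Message


def is_html_content_type(header_value):
    """Return True if a Content-Type header value denotes an HTML page."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() drops parameters such as charset and lowercases the type.
    return msg.get_content_type() in ("text/html", "application/xhtml+xml")


def should_prepend(header_value):
    """Hypothetical rule: prepend every link that is not an HTML page."""
    return not is_html_content_type(header_value)


print(should_prepend("image/jpeg"))                # True
print(should_prepend("text/html; charset=UTF-8"))  # False
```

The cost would be an extra request per ATTR_HTML link, which is probably why it only makes sense as an option.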
test
testenv
this checks the link order
changing browser to fifo in Test--spider-r-browser.py makes the test fail
download order
this shows the download order for the test page in #1 /test
the current code sometimes downloads links long after their parent page (see #1 /test)
this patch downloads links directly after their parent page
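A toy simulation of the two queue orders, to show the difference. It's only a sketch: the site layout, the `crawl` function and the `is_inline` flag are made up for illustration, and the real patch changes wget's C queue code, not anything like this:

```python
from collections import deque

# Hypothetical site: page -> links in document order as (url, is_inline).
SITE = {
    "index.html": [("a.html", False), ("logo.png", True), ("b.html", False)],
    "a.html": [("a1.jpg", True), ("a2.jpg", True)],
    "b.html": [("b1.jpg", True)],
}


def crawl(start, queue_type):
    """Return the download order for queue_type 'fifo' or 'browser'."""
    queue = deque([start])
    seen = {start}
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        links = [(u, inline) for u, inline in SITE.get(url, []) if u not in seen]
        seen.update(u for u, _ in links)
        if queue_type == "browser":
            # Prepend inline links, reversed so they keep top-to-bottom order.
            queue.extendleft(reversed([u for u, inline in links if inline]))
            # Non-inline (HTML) links are still appended.
            queue.extend(u for u, inline in links if not inline)
        else:
            # Plain FIFO: everything is appended.
            queue.extend(u for u, _ in links)
    return order


print(crawl("index.html", "fifo"))
print(crawl("index.html", "browser"))
```

With fifo, a1.jpg and a2.jpg are only fetched after logo.png and b.html. With browser, they come directly after a.html.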