I'm trying to grab all the URLs in this search result, but the extension only grabs some of the links despite me loading way more than that into the browser. The filter used is https://archive.org/details/. I scrolled down and loaded the subsequent pages that way, but it seems to only grab what is roughly on screen instead of the entire list (around 177 URLs). You can test this yourself by scrolling down and letting it load more pages, and then searching for a URL from the top of the search results (which won't appear in the extracted list).
There is no limit to the number of links that can be extracted. While there may be some upper limit that I have not tested, I have tested over 2,000 links when I added datatables.
On further review, it seems that archive.org is adding/removing nodes as you scroll, regardless of direction. You can watch in the network tab in developer tools (Ctrl+Shift+I). Nodes are being added when you scroll down or up. It seems the maximum number of links it shows at one time is 300.
That being said, I can possibly see a feature request out of this. You could click Start Collecting and the extension would start collecting links in the current tab; then, when you're done navigating in that tab, you would click Stop Collecting, at which point it would open the results with all collected links.
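To make the Start/Stop idea concrete, here is a minimal sketch of the accumulation step only. The names (`LinkCollector`, `start`, `add`, `stop`) are hypothetical and not taken from the extension's source; the prefix filter mirrors the `https://archive.org/details/` filter mentioned in the report.

```typescript
// Hypothetical sketch: accumulates links between "Start Collecting" and
// "Stop Collecting". None of these names come from the extension itself.
class LinkCollector {
  private seen = new Set<string>();
  private active = false;
  private prefix: string;

  constructor(prefix: string = "") {
    this.prefix = prefix;
  }

  // Start Collecting: reset and begin accepting links.
  start(): void {
    this.seen.clear();
    this.active = true;
  }

  // Called with whatever links are currently in the page. Duplicates are
  // ignored, so links that scroll out of view and back are kept only once.
  add(urls: string[]): void {
    if (!this.active) return;
    for (const url of urls) {
      if (url.startsWith(this.prefix)) this.seen.add(url);
    }
  }

  // Stop Collecting: return everything gathered, in first-seen order.
  stop(): string[] {
    this.active = false;
    return Array.from(this.seen);
  }
}
```

Because the set only ever grows while collecting is active, links that the page removes from the DOM while scrolling would still be in the final list.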
This would require a MutationObserver to listen for added elements. That is non-trivial on its own, and I would also have to figure out how to get it to work on all shadowRoots (like those used by archive.org). This feature request may sit on the back burner for a while, but let me know your thoughts...
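A rough sketch of how that could look, under some assumptions: a MutationObserver does not report changes inside a shadow tree hosted below the node it observes, so each shadow root gets its own observer. The names `NodeLike`, `scan`, and `watch` are made up for illustration, only open shadow roots are reachable (`shadowRoot` is `null` for closed ones), and a shadow root attached to an element after it was inserted would be missed.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
// Real Elements and ShadowRoots satisfy this shape.
interface NodeLike {
  tagName?: string;
  href?: string;
  shadowRoot?: NodeLike | null;
  children?: ArrayLike<NodeLike>;
}

// Depth-first walk that records HTML anchor hrefs and descends into open
// shadow roots, reporting each shadow root it finds.
function scan(
  node: NodeLike,
  out: Set<string>,
  onShadowRoot?: (root: NodeLike) => void
): void {
  if (node.tagName === "A" && node.href) out.add(node.href);
  if (node.shadowRoot) {
    if (onShadowRoot) onShadowRoot(node.shadowRoot);
    scan(node.shadowRoot, out, onShadowRoot);
  }
  const kids = node.children;
  if (kids) {
    for (let i = 0; i < kids.length; i++) scan(kids[i], out, onShadowRoot);
  }
}

// Observe `root` and every shadow root found under it, adding new links to
// `out` as nodes are inserted. Returns a function that stops all observers.
function watch(root: NodeLike, out: Set<string>): () => void {
  const Observer = (globalThis as any).MutationObserver; // browser-only global
  const observers: { disconnect(): void }[] = [];
  const observed = new Set<NodeLike>();

  const observe = (target: NodeLike): void => {
    if (observed.has(target)) return;
    observed.add(target);
    const mo = new Observer((records: { addedNodes: ArrayLike<NodeLike> }[]) => {
      for (const record of records) {
        for (let i = 0; i < record.addedNodes.length; i++) {
          scan(record.addedNodes[i], out, observe);
        }
      }
    });
    mo.observe(target, { childList: true, subtree: true });
    observers.push(mo);
  };

  observe(root);
  scan(root, out, observe); // pick up links and shadow roots already present
  return () => observers.forEach((mo) => mo.disconnect());
}
```

In a content script this might be called as `watch(document.documentElement, links)` on Start Collecting, with the returned function called on Stop Collecting.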
I need the MutationObserver for another feature I want to implement, Live Links, which would always show the number of links on the page in the toolbar icon and optionally extract them to a sidebar that auto-updates with collected links. That is way on the back burner, but the MutationObserver is a good start.
I see, thanks for the explanation. I figured something weird was up, since this is the only extension of this type that actually manages to grab any URLs at all (I suspect it's because of how IA truncates their URLs into h4 headers). Oh well, for now the alternate method I use is to load all the cover thumbnails and then replace the URLs in the network tab. So while that feature would definitely be useful, since I doubt I'm the only one who wants to do something like this, I'm not in any particular rush. Thanks.
Site Link
https://archive.org/details/software?tab=collection&query=-subject%3A%28ps2%29+-subject%3A%28ps1%29+-subject%3A%28sega+saturn%29+-subject%3A%28ps3%29&page=7&sort=-publicdate&and%5B%5D=subject%3A%22PC+Game%22&and%5B%5D=subject%3A%22PC-98%22&and%5B%5D=subject%3A%22IBM+PC%22&and%5B%5D=subject%3A%22macintosh%22&and%5B%5D=subject%3A%22IBM+PC+Compatible%22&and%5B%5D=subject%3A%22mac%22&and%5B%5D=subject%3A%22Doujin%22&and%5B%5D=subject%3A%22Doujin+Games%22&and%5B%5D=subject%3A%22doujin+soft%22&and%5B%5D=subject%3A%22Doujin+games%22&and%5B%5D=subject%3A%22doujin+games%22&and%5B%5D=subject%3A%22Doujin+Game%22&and%5B%5D=subject%3A%22Doujin+game%22&and%5B%5D=subject%3A%22doujin+game%22&and%5B%5D=subject%3A%22doujin%22&and%5B%5D=mediatype%3A%22software%22&and%5B%5D=language%3A%22Japanese%22