How is the list of URLs generated? #17

grigri9 · 2019-08-14T20:24:34Z

First off, this is awesome and I just wanted to say thank you for keeping all this up to date!

Is there some kind of automated process for generating the list of URLs?

It looks like this is pulling from all http://resources.docs.salesforce.com/* URL paths.

I was thinking the https://www.salesforce.com/content/dam/web/en_us/www/documents/ URL path also has a good amount of useful content. There are sales PDFs in there, but also whitepapers, datasheets, and similar items that are very useful.

If this list of URLs is being generated by a Google Custom Search Engine or something similar, it may be worthwhile to add that URL path.

richardvanhook · 2019-08-15T13:26:22Z

Some basic shell scripting and crawling, but also a significant amount of manual work. :-(

Would love to expand it, but unfortunately I'm time-constrained at the moment with my current customer. Will leave this open as a future reminder.

mattandneil · 2020-01-22T02:31:12Z

These steps can be made mechanical. Here's an example that yields about 150 PDF files:

Run a web search using the site operator (use the option that includes omitted results):

google.com/search?q=site:https://resources.docs.salesforce.com/sfdc/pdf&filter=0

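Log the hyperlinks to the console, for copying and pasting into a shell script:

// 'LC20lb' is the class name on the result-title elements; Google's markup may change.
const h3s = document.getElementsByClassName('LC20lb');
for (const h3 of h3s) {
  // Each title sits inside the result's link, so print the parent's href.
  console.log(h3.parentNode.getAttribute('href'));
}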

Go to the next page, rinse and repeat, solving any CAPTCHA along the way.

This finds PDF resources that have been linked on the public internet, e.g., from help files and articles. However, many files in the catalog have zero backlinks and tend to disappear as the docs change over time. An attempt is also made to link retired files at their final version by linking to the specific release number.
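
The search-and-copy steps above could be partly scripted. The following is only a rough sketch, not the process actually used for this repo: `buildSearchUrl` and `filterPdfLinks` are hypothetical helper names, the `start` paging parameter is an assumption about how the search URL paginates, and the host check simply mirrors the resources.docs.salesforce.com path mentioned earlier in the thread.

```javascript
// Hypothetical helper: build the search URL for one results page.
// `filter=0` comes from the example above; `start` is an assumed paging offset.
function buildSearchUrl(sitePath, page = 0, perPage = 10) {
  const params = new URLSearchParams({ q: `site:${sitePath}`, filter: '0' });
  if (page > 0) {
    params.set('start', String(page * perPage));
  }
  return `https://www.google.com/search?${params.toString()}`;
}

// Hypothetical helper: clean up hrefs copied from the console.
// Keeps unique PDF links on the expected host, drops fragments, sorts the result.
function filterPdfLinks(hrefs, host = 'resources.docs.salesforce.com') {
  const unique = new Set();
  for (const href of hrefs) {
    if (typeof href !== 'string') continue; // skip null/undefined entries
    let url;
    try {
      url = new URL(href);
    } catch {
      continue; // skip anything that is not an absolute URL
    }
    url.hash = ''; // treat "#page=3" variants as the same file
    if (url.hostname === host && url.pathname.toLowerCase().endsWith('.pdf')) {
      unique.add(url.toString());
    }
  }
  return [...unique].sort();
}
```

Feeding each page's logged hrefs through `filterPdfLinks` before pasting them into a shell script would remove duplicates and non-PDF results. It does not help with files that have no backlinks, which still need manual tracking.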
