Returning "[]%" for some domains #6

Open
dzoba opened this issue Jul 30, 2014 · 4 comments

Comments

@dzoba

dzoba commented Jul 30, 2014

    ➜  crawl git:(master) ✗ crawl -v https://creativelive.com
    []%

I've confirmed it works on some domains but not on others. Above is the output for one of the domains where it fails.

@heathdutton

I've been noticing this too when trying to crawl a site that is hosted locally. Any idea why?

@markuspfeifenberger

Same for me: https://franks-travelbox.com. I think crawl does not support the HTTPS protocol. I hope HTTPS support can be added in the near future ;-)

@heathdutton

Agreed, the HTTPS protocol appears to be the cause.
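One way a Node.js crawler can fail on HTTPS sites is by sending every request through the `http` module, whose `http.get()` rejects `https:` URLs. Whether crawl fails for exactly this reason is an assumption; the sketch below only illustrates picking the client module from the URL's protocol, and `clientFor` is a hypothetical helper, not part of crawl.

```javascript
const http = require("http");
const https = require("https");

// Hypothetical helper (not part of crawl): pick the Node.js client module
// that matches the URL's protocol. http.get() only accepts "http:" URLs,
// so an https site has to be fetched through the https module.
function clientFor(rawUrl) {
  const { protocol } = new URL(rawUrl); // WHATWG URL, global in Node.js 10+
  if (protocol === "https:") return https;
  if (protocol === "http:") return http;
  throw new Error("Unsupported protocol: " + protocol);
}

// Usage (performs a real network request, so it is left commented out):
// clientFor("https://creativelive.com").get("https://creativelive.com", (res) => {
//   console.log(res.statusCode);
// });
```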

@markuspfeifenberger

Here is the solution, to be changed in lib/crawler.js at line 63:

    if (urlParts.protocol === "https:" || urlParts.protocol === "http:") {

    // strip the trailing colon: "https:" -> "https", "http:" -> "http"
    var protocol = urlParts.protocol.slice(0, -1);
    // use the explicit port if one was given, otherwise the protocol's default
    // (computed before the crawler is constructed so it gets the right port)
    var port = urlParts.port ? urlParts.port : (protocol === "https" ? 443 : 80),
        siteCrawler = new Crawler(urlParts.hostname, urlParts.path, port);

    // configure crawler
    siteCrawler.interval = 10;
    siteCrawler.maxConcurrency = 10;
    siteCrawler.scanSubdomains = true;
    siteCrawler.downloadUnsupported = false;
    // this is needed for https
    siteCrawler.initialProtocol = protocol;
    siteCrawler.initialPort = port;
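The essential part of this patch is deriving the start protocol and port from the URL. Here is a minimal standalone sketch of that logic, assuming Node.js 10+ for the global WHATWG `URL`; `resolveStartPoint` is a hypothetical helper, not part of crawl, and its return shape is an assumption chosen to mirror the `Crawler` arguments in the patch.

```javascript
// Hypothetical helper (not part of crawl): derive the crawler's start
// protocol, hostname, path and port from a URL given on the command line.
function resolveStartPoint(rawUrl) {
  const parsed = new URL(rawUrl); // throws a TypeError on an invalid URL
  const protocol = parsed.protocol.slice(0, -1); // "https:" -> "https"
  if (protocol !== "http" && protocol !== "https") {
    throw new Error("Unsupported protocol: " + parsed.protocol);
  }
  // URL#port is "" when no port is given or it equals the scheme default,
  // so fall back to the protocol's default port only in that case.
  const port = parsed.port !== ""
    ? Number(parsed.port)
    : (protocol === "https" ? 443 : 80);
  return {
    protocol,
    hostname: parsed.hostname,
    path: parsed.pathname + parsed.search, // keep any query string
    port,
  };
}

console.log(resolveStartPoint("https://creativelive.com"));
// { protocol: 'https', hostname: 'creativelive.com', path: '/', port: 443 }
```

Choosing the default only when no port is given means an explicit port such as `https://example.com:80/` is kept as written.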

onlineth mentioned this issue Jun 6, 2017