A Node app that crawls a given website.
npm install -g console-crawler;
console-crawler http://en.wikipedia.org/ --legs=8
console-crawler http://en.wikipedia.org/ --legs=2 --phantom
- This is a Node app, so you'll need node/npm to run it.
- Clone the repo.
- Install the dependencies with npm install.
- Fire up the crawler.
git clone https://github.com/robcolburn/console-crawler;
cd console-crawler;
npm install;
./console-crawler.js http://en.wikipedia.org/ --legs=8;
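Before running the steps above, it can help to confirm that the required tools are actually on your PATH. The following is a minimal sketch, not part of console-crawler itself; `check_tool` is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: report whether a command is available on PATH.
# Prints "found: NAME" on success; prints "missing: NAME" to stderr
# and returns 1 otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the tools used by the clone/install/run steps.
for tool in node npm git; do
  check_tool "$tool" || echo "install $tool before continuing" >&2
done
```

If you plan to pass --phantom, the same helper could be pointed at phantomjs, assuming the crawler expects to find that binary on your PATH.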
- On Mac, you'll likely need the Xcode Command Line Tools installed.
- If you'd like to use PhantomJS, you'll need to download and install it separately, since it has its own binary.
- If you need to target a different "Host", you may just need to edit your hosts file. For instance, say I wanted to hit 5.5.5.5, but with the host of example.com, which isn't ready to go live just yet. I might add the following to my hosts file:
5.5.5.5 example.com
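If you'd rather script that hosts-file change than edit the file by hand, something like the following might work. It is only a sketch: `add_host_entry` is a hypothetical helper, and it is demonstrated here against a scratch file rather than the real hosts file.

```shell
# Hypothetical helper: append "IP HOST" to a hosts-style file,
# but only if that exact line is not already present.
add_host_entry() {
  hosts_file="$1"
  entry="$2 $3"
  # -x matches whole lines, -F treats the entry as a fixed string.
  if grep -qxF -- "$entry" "$hosts_file" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' "$entry" >> "$hosts_file"
}

# Demonstrate on a scratch file; calling twice adds the line once.
scratch="$(mktemp)"
add_host_entry "$scratch" 5.5.5.5 example.com
add_host_entry "$scratch" 5.5.5.5 example.com
cat "$scratch"
rm -f "$scratch"
```

On Mac and Linux the real file is normally /etc/hosts and is writable only by root, so pointing the helper at it would need elevated privileges. Remember to remove the entry once example.com goes live.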