Wappalyzer-puppeteer is a simple library built on top of the Wappalyzer dataset that uncovers the technologies used on websites.
Wappalyzer uses Zombie, which handles most websites but fails on complex, large, or JavaScript-heavy sites.
Since Puppeteer has been stable and available for years, this library combines the best of both worlds: a real browser and Wappalyzer's dataset.
The internal logic is rewritten from scratch, because the original Wappalyzer code relies heavily on Promises and on-the-fly regex parsing.
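To illustrate what avoiding on-the-fly regex parsing can look like, here is a minimal sketch that compiles Wappalyzer-style patterns once, up front, and then reuses them for every page. It is not this library's actual code: the function names (`compilePattern`, `compileApps`, `detect`) are hypothetical, and it assumes patterns follow the `\;`-delimited tag syntax used in apps.json (for example `\;version:\1` and `\;confidence:50`), looking only at `html` patterns.

```javascript
// Hypothetical sketch, not the library's implementation.

// Split "regex\;version:\1\;confidence:50" into a compiled RegExp plus tags.
function compilePattern(raw) {
  const [source, ...tags] = raw.split('\\;');
  const compiled = { regex: new RegExp(source, 'i'), confidence: 100, version: null };
  for (const tag of tags) {
    const idx = tag.indexOf(':');
    if (idx === -1) continue; // ignore malformed tags
    const key = tag.slice(0, idx);
    const value = tag.slice(idx + 1);
    if (key === 'confidence') compiled.confidence = Number(value);
    if (key === 'version') compiled.version = value;
  }
  return compiled;
}

// Compile the html patterns of every app once.
// `apps` mimics a tiny slice of apps.json (assumed shape).
function compileApps(apps) {
  const out = {};
  for (const [name, def] of Object.entries(apps)) {
    // A pattern field may be a single string or an array of strings.
    const patterns = [].concat(def.html || []);
    out[name] = patterns.map(compilePattern);
  }
  return out;
}

// Match the pre-compiled patterns against a page's HTML.
function detect(compiledApps, html) {
  const found = [];
  for (const [name, patterns] of Object.entries(compiledApps)) {
    for (const p of patterns) {
      const m = p.regex.exec(html);
      if (!m) continue;
      // Resolve back-references such as "\1" in the version template.
      const version = p.version
        ? p.version.replace(/\\(\d)/g, (_, n) => m[Number(n)] || '')
        : null;
      found.push({ name, confidence: p.confidence, version });
      break; // one hit per app is enough for this sketch
    }
  }
  return found;
}
```

The point of the design is that `compileApps` runs once at startup, so `detect` does no string splitting or `RegExp` construction per page.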
$ npm i -g wappalyzer-puppeteer # Globally
$ npm i wappalyzer-puppeteer --save # As a dependency
There are three main dependencies for this project:
- Wappalyzer - for apps.json only
- puppeteer-cluster
- puppeteer
wappalyzer [url] [options]
--max-wait=ms Wait no more than ms milliseconds for page resources to load.
--user-agent=str Set the user agent string.
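The two flags above line up with the `maxWait` and `userAgent` keys of the options object used when calling the library from a script. As a rough sketch of how such a mapping could work, the hypothetical `parseCliArgs` helper below converts `--kebab-case=value` flags into camelCase option keys. This is an assumption for illustration, not a description of how the bundled CLI actually parses its arguments.

```javascript
// Hypothetical helper: turn `--max-wait=5000` style flags into an options object.
function parseCliArgs(argv) {
  const options = {};
  let url = null;
  for (const arg of argv) {
    const match = /^--([^=]+)(?:=(.*))?$/.exec(arg);
    if (!match) {
      // The first non-flag argument is treated as the URL.
      if (url === null) url = arg;
      continue;
    }
    // kebab-case -> camelCase: "max-wait" -> "maxWait"
    const key = match[1].replace(/-(\w)/g, (_, c) => c.toUpperCase());
    // A flag without "=value" is treated as a boolean switch.
    const raw = match[2] === undefined ? true : match[2];
    // Purely numeric strings become numbers, so maxWait is usable as milliseconds.
    options[key] = typeof raw === 'string' && /^\d+$/.test(raw) ? Number(raw) : raw;
  }
  return { url, options };
}
```

Called with `process.argv.slice(2)`, this would yield a `url` and an `options` object shaped like the one in the script example.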
const { AppAnalytics, PuppeteerCluster, Cluster } = require('wappalyzer-puppeteer');
const url = 'https://www.wappalyzer.com';
const options = {
  maxWait: 5000,
  userAgent: 'Wappalyzer',
  // puppeteerClusterOptions is passed to puppeteer-cluster
  // More options here: https://github.com/thomasdondorf/puppeteer-cluster
  puppeteerClusterOptions: {
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
    puppeteerOptions: {
      headless: true,
      ignoreHTTPSErrors: true
    }
  }
};
const appAnalytics = new AppAnalytics();
const wappalyzer = new PuppeteerCluster(appAnalytics, options);
// Load apps.json (you can provide your own JSON file as well)
appAnalytics
  .loadAppsjson()
  // start the puppeteer cluster
  .then(() => wappalyzer.startCluster())
  // queue a URL and wait for the result
  .then(() => wappalyzer.analyze(url))
  // do whatever you want with the result
  .then(json => {
    process.stdout.write(`${JSON.stringify(json)}\n`);
  })
  // close the cluster
  .then(() => wappalyzer.closeCluster())
  .catch(error => {
    process.stderr.write(`${error}\n`);
    process.exit(1);
  });
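The same flow can be written with async/await and extended to several URLs. The sketch below uses only the methods shown in the example (`loadAppsjson`, `startCluster`, `analyze`, `closeCluster`); the `analyzeAll` helper itself is hypothetical, and it assumes `analyze()` returns a promise and may be called several times before earlier calls finish.

```javascript
// Hypothetical helper: analyze several URLs with one cluster.
// Assumes analyze() returns a promise and tolerates concurrent calls.
async function analyzeAll(appAnalytics, wappalyzer, urls) {
  await appAnalytics.loadAppsjson();
  await wappalyzer.startCluster();
  try {
    // Queue every URL; one failing URL does not reject the whole batch.
    const settled = await Promise.allSettled(urls.map(url => wappalyzer.analyze(url)));
    return settled.map((outcome, i) => ({
      url: urls[i],
      ok: outcome.status === 'fulfilled',
      result: outcome.status === 'fulfilled' ? outcome.value : String(outcome.reason)
    }));
  } finally {
    // Always shut the cluster down, even if something above threw.
    await wappalyzer.closeCluster();
  }
}
```

With the `appAnalytics` and `wappalyzer` objects from the example, a call would look like `analyzeAll(appAnalytics, wappalyzer, [url])`. The `finally` block is the main design choice here: the cluster is closed on both the success and the failure path.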