node.js - How to scrape JSON from puppeteer? - Stack Overflow #95

kemistep opened this issue Nov 17, 2020 · 0 comments

I log in to a site and it gives me a browser cookie.

I then go to a URL whose response is JSON.

How do I scrape the page after calling await page.goto('blahblahblah.json'); ?

asked Jan 29 '18 at 22:54 by [Amy Coin](https://stackoverflow.com/users/9278676/amy-coin)

Another way, which avoids intermittent issues, is to evaluate the page body once it becomes available and parse it as JSON, e.g.:

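const puppeteer = require('puppeteer');

async function run() {
    // headless: false opens a visible browser window; omit it to run headless.
    const browser = await puppeteer.launch({ headless: false });
    try {
        const page = await browser.newPage();
        await page.goto('https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/package.json');

        // The browser shows a raw JSON response as text inside <body>,
        // so read that text in the page context and parse it there.
        const data = await page.evaluate(() => {
            return JSON.parse(document.querySelector('body').innerText);
        });

        console.log('data now contains the parsed JSON');
        console.log(data);
    } finally {
        // Close the browser even if navigation or parsing fails.
        await browser.close();
    }
}

run().catch(console.error);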

answered Jan 30 '18 at 19:55 by [Rippo](https://stackoverflow.com/users/15410/rippo)
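
As a variation on the answer above (not part of it), the response that page.goto resolves to can be read directly instead of going through the rendered body. The sketch below is a minimal example under stated assumptions: fetchJson is a hypothetical helper name, and page is assumed to be a Puppeteer page that has already logged in, so the session cookie is sent with the request.

```javascript
// Hypothetical helper (not from the answers in this thread): load a JSON URL
// with an existing Puppeteer page and parse the response body in Node.
async function fetchJson(page, url) {
  // page.goto resolves to the main resource's response; it can be null in
  // some cases, so guard against that as well as against error statuses.
  const response = await page.goto(url);
  if (!response || !response.ok()) {
    const status = response ? response.status() : 'no response';
    throw new Error(`Request for ${url} failed: ${status}`);
  }

  // Read the raw body and parse it here so a bad payload gives a clear error.
  const text = await response.text();
  try {
    return JSON.parse(text);
  } catch (err) {
    throw new Error(`Body of ${url} is not valid JSON: ${err.message}`);
  }
}

// Usage sketch, assuming `page` is the same page that performed the login:
//   const data = await fetchJson(page, 'blahblahblah.json');
//   console.log(data);
```

Because the helper only takes a page argument, it can be exercised with a stub object in tests and reused after whatever login steps the site needs.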

You can intercept the network response, like this:

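const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Register the listener before navigating so no response is missed.
  page.on('response', async response => {
    // Use the public url() accessor rather than the private _url field.
    console.log('got response', response.url());
    const data = await response.buffer();
    // Every response overwrites this file, so filter on response.url()
    // if the page loads more than one resource.
    fs.writeFileSync('/tmp/response.json', data);
  });
  await page.goto('https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/package.json', {waitUntil: 'networkidle0'});
  await browser.close();
})();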

answered Jan 30 '18 at 8:21 by [Pasi](https://stackoverflow.com/users/504811/pasi)
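
If the page triggers several requests, the interception approach may need to pick out one response and hand back parsed JSON rather than write every body to disk. The following is a rough sketch along those lines, not the answerer's code: waitForJson is a hypothetical name, and it assumes the page object exposes on and off for the 'response' event, as Puppeteer pages do.

```javascript
// Hypothetical helper: resolve with the parsed JSON body of the first
// successful response whose URL matches targetUrl exactly.
function waitForJson(page, targetUrl) {
  return new Promise((resolve, reject) => {
    const onResponse = async (response) => {
      // Ignore unrelated resources and non-2xx responses.
      if (response.url() !== targetUrl || !response.ok()) return;

      // Stop listening once the response of interest has arrived.
      page.off('response', onResponse);
      try {
        resolve(await response.json());
      } catch (err) {
        reject(err);
      }
    };
    page.on('response', onResponse);
  });
}

// Usage sketch: start listening before navigating, then await both.
//   const pending = waitForJson(page, url);
//   await page.goto(url);
//   const data = await pending;
```

Starting the listener before page.goto matters here, since a response that arrives before the handler is attached would be missed.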



https://stackoverflow.com/questions/48511357/how-to-scrape-json-from-puppeteer
