Queued URLs Not Executing loadPage Method After cluster.queue is Called #547

Open
kewang opened this issue Oct 9, 2024 · 1 comment

kewang commented Oct 9, 2024

Hi @thomasdondorf,

Below is my code. I am currently using puppeteer-cluster to implement a prerendering feature, but I often encounter an issue where a URL has been passed in through ExpressJS, and cluster.queue is executed, but the loadPage method is not triggered for a long time. On average, the render time is around 3000ms, and there are about 5-10 requests per minute. However, there are always some URLs that are already queued with cluster.queue but remain unprocessed, even though the cluster is in an idle state.

I was originally using cluster.execute to handle the requests, but after reading #481, I switched to using cluster.queue, which seems to be the correct approach. Unfortunately, the issue still persists, and I am unsure how to resolve it.

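// launchCluster() is my own helper (not shown); it resolves to the cluster instance.
const browserCluster = await launchCluster();

// Task run by the cluster for each queued job.
// `res` is passed in through the job data because it only exists inside the route handler.
const loadPage = async ({ page, data: { url, res } }) => {
  console.log(`rendered url: ${url}`);

  let response;

  try {
    response = await page.goto(url, {
      waitUntil: "networkidle2",
    });

    if (!response) {
      throw new Error("response is null");
    }
  } catch (error) {
    console.error(`[PUPPETEER-CLUSTER] ${url} ${error}`);

    return res.sendStatus(500);
  }

  const content = await page.content();

  console.log(`[PUPPETEER-CLUSTER] Retrieve ${url}`);

  // Mirror the upstream status code and return the rendered HTML.
  return res.status(response.status()).send(content);
};

router.get("/render", async (req, res) => {
  const url = req.query.url;

  console.log(`queue url: ${url}`);

  // Queue the URL together with the response object so loadPage can reply.
  browserCluster.queue({ url, res }, loadPage);
});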
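
To put a number on how long a job sits between `cluster.queue` and the start of the task, a minimal sketch along these lines might help. It only assumes the `cluster.queue(data, taskFunction)` call already used above; the helper names `withQueueTiming` and `queueWithTimestamp`, the `onWait` callback, and the `queuedAt` field are hypothetical and not part of puppeteer-cluster.

```javascript
// Hypothetical helper: wraps a task so it reports how long the job waited
// in the queue before the cluster started it.
function withQueueTiming(task, onWait) {
  return async ({ page, data }) => {
    // performance.now() is monotonic, so the difference is a real elapsed time.
    const waitedMs = performance.now() - data.queuedAt;
    onWait(data.url, waitedMs);
    return task({ page, data });
  };
}

// Hypothetical helper: queues a URL and stamps the job data with the
// time at which it was queued. `extra` can carry other fields such as `res`.
function queueWithTimestamp(cluster, url, task, extra = {}) {
  return cluster.queue({ ...extra, url, queuedAt: performance.now() }, task);
}
```

Usage would look roughly like `queueWithTimestamp(browserCluster, url, withQueueTiming(loadPage, (u, ms) => console.log(`waited ${ms.toFixed(0)}ms: ${u}`)), { res })`, which makes the stalled jobs visible in the logs.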

sch-28 commented Jan 8, 2025

I had the same issue: sometimes requests were not handled for ~15s. For some reason, Date.now() drifts randomly in the Cluster.work() function.

I think I fixed it in my case by replacing Date.now() with performance.now().
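
For context on why that swap can matter: `Date.now()` reads the wall clock, which can jump when the system time is adjusted, while `performance.now()` is monotonic. The sketch below is not the code from `Cluster.work()`; it is a generic delay check with an injectable clock, and `createDelayGate`, `minDelayMs`, and `tryPass` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: allows an action at most once every `minDelayMs`.
// The clock is injectable; by default it uses the monotonic performance.now().
function createDelayGate(minDelayMs, now = () => performance.now()) {
  let last = -Infinity; // time of the last allowed action

  return {
    // Returns true (and records the time) if enough time has elapsed.
    tryPass() {
      const t = now();
      if (t - last < minDelayMs) {
        return false;
      }
      last = t;
      return true;
    },
  };
}
```

If the injected clock moves backwards, as a wall clock can after a time adjustment, `t - last` becomes negative and the gate keeps refusing until the clock catches up again. Whether that is what happens in the case described above is not confirmed here.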
