Different result when using API and Local #245

Open
KevinG-Acc opened this issue Oct 13, 2022 · 0 comments

Hello! First of all, thanks for your work!

I have an issue. I use this package to analyze web pages, and I get different results when using the web API and when running locally.

Here is the code I am running.

const validator = require('html-validator')
const puppeteer = require('puppeteer')

class W3cService {}

/**
 * This function runs the W3C Validator by calling the W3C API.
 * @param {Array} urlsList list of URLs to be analysed by the W3C HTML Validator
 * @returns a list of HTML issues
 */
W3cService.prototype.w3cAnalysisWithAPI = async function (urlsList) {
  const resultForUrlsList = []
  try {
    for (const url of urlsList) {
      const options = { url }
      console.log(`W3C ANALYSIS : launching analyse for ${url} `)
      const resultForUrl = await validator(options)
      if (resultForUrl.messages[0].type === 'non-document-error') {
        console.error('\x1b[31m%s\x1b[0m', `W3C ANALYSIS : URL ${url} cannot be found`)
        console.log('\x1b[31m%s\x1b[0m', `W3C ANALYSIS : ${url} has been removed from result `)
      } else {
        resultForUrlsList.push(resultForUrl)
        console.log(`W3C ANALYSIS : Analyse ended for ${url} `)
      }
    }
    return resultForUrlsList
  } catch (error) {
    console.error('\x1b[31m%s\x1b[0m', error.message)
  }
}

/**
 * This function runs the validation locally by setting the validator to WHATWG. It needs no external connection to the W3C API.
 * @param {Array} urlsList list of URLs to be analysed by the HTML validator
 * @returns a list of HTML issues
 */
W3cService.prototype.w3cAnalysisLocal = async function (urlsList) {
  // Initializing variables
  const resultForUrlsList = []
  const htmlResults = []

  // Browser arguments
  const browserArgs = [
    '--no-sandbox', // can't run inside docker without
    '--disable-setuid-sandbox', // but security issues
    '--ignore-certificate-errors'
  ]

  // Starting the browser
  const browser = await puppeteer.launch({
    headless: true,
    args: browserArgs,
    ignoreHTTPSErrors: true,
    // Keep gpu horsepower in headless
    ignoreDefaultArgs: [
      '--disable-gpu'
    ]
  })

  try {
    // Extracting the HTML content of each url with puppeteer
    for (const url of urlsList) {
      const page = await browser.newPage()

      try {
        await page.goto(url, { timeout: 0, waitUntil: 'networkidle2' })
        // const html = await page.content()
        const html = await page.evaluate(() => document.querySelector('*').outerHTML)
        console.log(typeof html)

        htmlResults.push({ url, html })
      } catch {
        console.error('\x1b[31m%s\x1b[0m', `W3C ANALYSIS : URL ${url} cannot be found`)
        console.log('\x1b[31m%s\x1b[0m', `W3C ANALYSIS : ${url} has been removed from result `)
      }
      await page.close()
    }
    await browser.close()

    // console.info(htmlResults)
    // Analysing the HTML with W3C
    for (const htmlResult of htmlResults) {
      const options = {
        validator: 'WHATWG',
        // data: htmlResult.html
        data: htmlResult.html,
        isFragment: false
      }
      console.log(`W3C ANALYSIS : launching analyse for ${htmlResult.url} `)
      const resultForHtml = await validator(options)
      // One entry per URL; push(url, result) would append two separate array items
      resultForUrlsList.push({ url: htmlResult.url, ...resultForHtml })
      console.log(`W3C ANALYSIS : Analyse ended for ${htmlResult.url} `)
    }

    return resultForUrlsList
  } catch (error) {
    console.error('\x1b[31m%s\x1b[0m', error)
  }
}
const w3cService = new W3cService()
module.exports = w3cService
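
A side note on `w3cAnalysisWithAPI` above: `resultForUrl.messages[0].type` throws a `TypeError` when a page validates with an empty `messages` array, and because the `catch` sits outside the loop, the remaining URLs are skipped and the function returns `undefined`. A minimal sketch of a safer check follows. `hasNonDocumentError` is a hypothetical helper name, not part of `html-validator`, and it looks at every message rather than only the first one.

```javascript
// Hypothetical helper (not part of html-validator): returns true when the
// validator reported that the document could not be retrieved.
// Array.prototype.some returns false on an empty array, so nothing throws.
function hasNonDocumentError (result) {
  const messages = Array.isArray(result?.messages) ? result.messages : []
  return messages.some(message => message.type === 'non-document-error')
}

console.log(hasNonDocumentError({ messages: [] })) // false
console.log(hasNonDocumentError({ messages: [{ type: 'non-document-error' }] })) // true
```

In the loop, `if (hasNonDocumentError(resultForUrl))` could then replace the direct `messages[0]` access.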

When I run the first function (with the API) on our website (https://ecosonar.org), the result is:


[
    {
        "url": "https://ecosonar.org",
        "messages": [
            {
                "type": "info",
                "lastLine": 1,
                "lastColumn": 123,
                "firstColumn": 101,
                "message": "Trailing slash on void elements has no effect and interacts badly with unquoted attribute values.",
                "extract": "Logo.ico\"><meta charset=\"utf-8\"/><meta ",
                "hiliteStart": 10,
                "hiliteLength": 23
            },
            {
                "type": "info",
                "lastLine": 1,
                "lastColumn": 191,
                "firstColumn": 124,
                "message": "Trailing slash on void elements has no effect and interacts badly with unquoted attribute values.",
                "extract": "=\"utf-8\"/><meta name=\"viewport\" content=\"width=device-width,initial-scale=1\"/><meta ",
                "hiliteStart": 10,
                "hiliteLength": 68
            },
            {
                "type": "info",
                "lastLine": 1,
                "lastColumn": 235,
                "firstColumn": 192,
                "message": "Trailing slash on void elements has no effect and interacts badly with unquoted attribute values.",
                "extract": "scale=1\"/><meta name=\"theme-color\" content=\"#ffffff\"/><meta ",
                "hiliteStart": 10,
                "hiliteLength": 44
            },
            {
                "type": "info",
                "lastLine": 1,
                "lastColumn": 363,
                "firstColumn": 236,
                "message": "Trailing slash on void elements has no effect and interacts badly with unquoted attribute values.",
                "extract": "#ffffff\"/><meta name=\"description\" content=\"EcoSonar the ecodesign and accessibility audit tool to minimize web carbon footprint easily\"/><meta ",
                "hiliteStart": 10,
                "hiliteLength": 128
            },
            {
                "type": "error",
                "lastLine": 1,
                "lastColumn": 461,
                "firstColumn": 364,
                "message": "Bad value “Cache-Control” for attribute “http-equiv” on element “meta”.",
                "extract": " easily\"/><meta http-equiv=\"Cache-Control\" content=\"max-age: 31536000, no-cache, no-store, must-revalidate\"><meta ",
                "hiliteStart": 10,
                "hiliteLength": 98
            },
            {
                "type": "error",
                "lastLine": 1,
                "lastColumn": 506,
                "firstColumn": 462,
                "message": "Bad value “Pragma” for attribute “http-equiv” on element “meta”.",
                "extract": "validate\"><meta http-equiv=\"Pragma\" content=\"no-cache\"><meta ",
                "hiliteStart": 10,
                "hiliteLength": 45
            },
            {
                "type": "error",
                "lastLine": 1,
                "lastColumn": 545,
                "firstColumn": 507,
                "message": "Bad value “Expires” for attribute “http-equiv” on element “meta”.",
                "extract": "no-cache\"><meta http-equiv=\"Expires\" content=\"0\"><link ",
                "hiliteStart": 10,
                "hiliteLength": 39
            },
            {
                "type": "info",
                "lastLine": 1,
                "lastColumn": 589,
                "firstColumn": 546,
                "message": "Trailing slash on void elements has no effect and interacts badly with unquoted attribute values.",
                "extract": "ntent=\"0\"><link rel=\"manifest\" href=\"/manifest.json\"/><title",
                "hiliteStart": 10,
                "hiliteLength": 44
            }
        ]
    }
]

Running it locally requires Puppeteer to extract the HTML (Axios can't be used because the page is rendered with JavaScript, and Axios only retrieves the HTML without executing the scripts). I got the following HTML after line 80:

<!DOCTYPE html><html lang="en" style="height:100%"><head><link rel="icon" href="./EcoSonarLogo.ico"><meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1"><meta name="theme-color" content="#ffffff"><meta name="description" content="EcoSonar the ecodesign and accessibility audit tool to minimize web carbon footprint easily"><meta http-equiv="Cache-Control" content="max-age: 31536000, no-cache, no-store, must-revalidate"><meta http-equiv="Pragma" content="no-cache"><meta http-equiv="Expires" content="0"><link rel="manifest" href="/manifest.json"><title>EcoSonar</title><script defer="defer" src="/static/js/main.0b151232.js"></script><link href="/static/css/main.1a8cec83.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"><div><nav class="navbar"><div class="container-fluid"><div><a href="/"><img src="/static/media/EcoSonarLogo.d350d54b80ed2014c867c2342b6c9d6a.svg" class="home-logo" alt="go to home page"></a></div><ul class="nav navbar-nav navbar-right margin-right row"><li><a class="active-tab column" href="/">Home</a></li><li><a class="navigation column" href="/how-it-works">How it works</a></li><li><a class="navigation column" href="/best-practices">Best practices</a></li><li><a class="navigation column" href="/who-are-we">Who are we?</a></li></ul></div></nav></div><div><div class="banner-home"><div class="title-introduction"><p class="banner banner-title no-margin">EcoSonar</p><div class="banner-flex-title"><p class="banner banner-subtitle no-margin">Eco-design audit tool to<br><span class="banner"> minimize web carbon footprint</span> easily</p></div><form action="https://github.com/Accenture/EcoSonar" method="get" target="_blank"><button aria-label="redirection to the github of ecosonar" class="btn-access-tool" type="submit">Get started</button></form></div></div><div class="key-figures"><p class="title key-figures-text">Why promoting <b>eco-design?</b></p><div 
class="row"><div class="card-home card-panel"><img class="icon" alt="pie icon" src="/static/media/Co2.a6febba71564d8f092616c3515925fb3.svg"><p class="text-key">Web creation represents</p><p class="key-number">4%</p><p class="text-key">Share of digital in global carbon emission</p></div><div class="card-home card-panel"><img class="icon" alt="pie icon" src="/static/media/Pie.01d562122d2628fbe2cf001eae698d60.svg"><p class="text-key">Digital carbon footprint increases</p><p class="key-number">8%</p><p class="text-key">per year</p></div><div class="card-home card-panel"><img class="icon" alt="pie icon" src="/static/media/Bubble.fca120bc18a4dec1a29d7243872e6a43.svg"><p class="key-number">55%</p><p class="text-key">Comes from usages, driven by developers</p></div><div class="card-home card-panel"><img class="icon" alt="pie icon" src="/static/media/Server.9e31c57ed6355116b0776a5c2f445805.svg"><p class="key-number">45 million</p><p class="text-key">operative servers around the world</p></div></div></div><div class="objectives-ecosonar"><p class="title section-title"><b>EcoSonar, the eco-design audit tool</b></p><div class="row content"><div class="objectives"><div><p class="sub-title">Our <b>objectives</b></p><div class="objective"><img alt="" src="/static/media/FlowerIcon.5f66f28ce01101fee9f34283b3b033c3.svg"><p class="objective-text"><span>Raising awareness</span> to environmental issues and eco-design practices</p></div><div class="objective"><img alt="" src="/static/media/FlowerIcon.5f66f28ce01101fee9f34283b3b033c3.svg" class="img-rotate"><p class="objective-text">Measuring the <span>carbon impact</span> across digital services</p></div><div class="objective"><img alt="" src="/static/media/FlowerIcon.5f66f28ce01101fee9f34283b3b033c3.svg"><p class="objective-text">Proving that <span>web-based applications can have minimal carbon footprint</span></p></div><div class="objective"><img alt="" src="/static/media/FlowerIcon.5f66f28ce01101fee9f34283b3b033c3.svg" 
class="img-rotate"><p class="objective-text">Get an <span>environmental &amp; performance</span> monitoring solution</p></div></div><form action="https://github.com/Accenture/EcoSonar/blob/main/USER_GUIDE.md" method="get" target="_blank"><button aria-label="redirection to the EcoSonar user guide" type="submit">User guide</button></form></div><div class="youtube-video"><a href="https://www.youtube.com/watch?v=DoAoMxHIYAE" aria-label="EcoSonar Presentation Video" target="_blank" class="youlazy init" data-w="480" data-h="270" data-start="100" rel="noreferrer" data-id="DoAoMxHIYAE" data-title="EcoSonar Presentation Video"><div class="iframe-wrapper"></div></a></div></div></div><div class="solutions-differentiators"><img alt="" src="/static/media/FlowerWhite1.719dce6d64230f822d1407960a4d3450.svg" class="img-flower1"><div class="row"><div class="text-solutions-differentiators"><div class="column"><p class="title">About <b>EcoSonar</b></p><div class="check-solutions-differentiators"><p><img src="/static/media/Check.9297a3144c66b8c355bb92eb0dcadeed.svg" alt="">Open source</p><p><img src="/static/media/Check.9297a3144c66b8c355bb92eb0dcadeed.svg" alt="">Integrated into CI/CD pipelines: through Sonarqube or API Request</p><p><img src="/static/media/Check.9297a3144c66b8c355bb92eb0dcadeed.svg" alt="">Based on standard, recognized and open-source audits</p><p><img src="/static/media/Check.9297a3144c66b8c355bb92eb0dcadeed.svg" alt="">Decision support to help delivery teams reduce efficiently environmental impact</p></div></div></div></div></div><div class="contribute-project"><div class="contribute-block"><p class="title"><b>Contribute</b> to the project</p><div><p class="content">EcoSonar is a young tool that will be <span class="green-text">constantly evolving</span> and adapting the documentation to the latest technologies and solutions. 
Keeping this release pace is hard but <span> with your help we can build features faster</span> and make our team grow.</p><p class="content">Contributing to the project will help <span class="green-text">eco-design practices to be in the mainstream landscape</span> of website creation.</p></div><div class="contribute-buttons"><form action="https://github.com/Accenture/EcoSonar/blob/main/CONTRIBUTING.md" method="get" target="_blank"><button aria-label="redirection to contributing" class="contained" type="submit">Make a contribution</button></form><form action="https://github.com/Accenture/EcoSonar" method="get" target="_blank"><button class="outlined" aria-label="redirection to github repo" type="submit">Go to our Github</button></form></div></div><div class="img-flower-right"><img alt="" src="/static/media/FlowerWhite2.0a9250b57485868c0cc2d0b3c01ce458.svg" class="img-flower2"></div></div><div class="news"><p class="title">Latest<b> News</b></p><div class="row center"><div class="news-card-home card-panel"><p class="date-news-card">October 4th 2022</p><p class="title-news-card">EcoSonar V2.3</p><hr class="divider"><p class="desc-news-card">Integration of crawler to set up automatically your project configuration, filter your recommendations from a page or website perspective, login to your website and few bugs fixes</p></div><div class="news-card-home card-panel"><p class="date-news-card">September 26th 2022</p><p class="title-news-card">ecosonar.org</p><hr class="divider"><p class="desc-news-card">Release of EcoSonar Website : ecosonar.org</p></div><div class="news-card-home card-panel"><p class="date-news-card">July 13th 2022</p><p class="title-news-card">EcoSonar V2</p><hr class="divider"><p class="desc-news-card">Integration of Google Lighthouse audits with 24 new ecodesign and 44 accessibility audits</p></div><div class="news-card-home card-panel"><p class="date-news-card">April 12th 2022</p><p class="title-news-card">EcoSonar Release</p><hr class="divider"><p 
class="desc-news-card">First Release of EcoSonar with integration of 24 ecodesign best practices</p></div></div></div></div><footer class="footer"><div class="footer-display"><p class="no-margin">© 2022 EcoSonar | All rights reserved</p></div><div class="footer-display"><p class="no-margin small-font"><a class="footer-legal-issues" href="/legal-issues">Legal issues</a> | Powered by<img src="/static/media/AccentureLogo.629c0413841161923ef68240d1aad60d.svg" alt="Accenture logo" class="logo-accenture"></p></div></footer></div></body></html>

This obviously contains the DOCTYPE, meta tags, etc., but when running the analysis with WHATWG, it doesn't detect any issues.
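
One difference is visible in the two inputs themselves. The `extract` fields in the API result show void elements written with a trailing slash (`<meta charset="utf-8"/>`), while the HTML serialized by Puppeteer has none (`<meta charset="utf-8">`), since serializing the DOM does not emit them. So the five "Trailing slash on void elements" info messages cannot appear for the serialized string, whichever validator is used. The sketch below illustrates this with two snippets copied from above; `countSelfClosedVoidTags` is a hypothetical helper and the regular expression is deliberately naive.

```javascript
// Void elements as listed in the HTML standard.
const VOID_ELEMENTS = [
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr'
]

// Hypothetical helper: counts void-element tags written with a trailing slash.
// Naive sketch: it does not handle a ">" character inside an attribute value.
function countSelfClosedVoidTags (html) {
  const pattern = new RegExp(`<(?:${VOID_ELEMENTS.join('|')})\\b[^>]*/>`, 'gi')
  return (html.match(pattern) || []).length
}

// Markup as it appears in the API "extract" fields (served HTML).
const served = '<meta charset="utf-8"/><meta name="theme-color" content="#ffffff"/>'
// The same tags as they appear in the HTML extracted with Puppeteer.
const serialized = '<meta charset="utf-8"><meta name="theme-color" content="#ffffff">'

console.log(countSelfClosedVoidTags(served)) // 2
console.log(countSelfClosedVoidTags(serialized)) // 0
```

This only accounts for the info messages. The three `http-equiv` errors concern markup that is still present in the serialized HTML.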

Is WHATWG expecting a specific input type (currently it's a string), or are its rules different from the API's?

Thanks a lot!
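
To separate "different input" from "different rules", the same HTML string could be sent to both validators through the `data` option and the counts compared. The following is only a sketch under assumptions: `compareValidators` and `summarize` are hypothetical helpers, the validator function is passed in as a parameter so a stub can stand in for it, and the WHATWG result is assumed to expose `errors` and `warnings` arrays, which should be checked against the installed version of `html-validator`.

```javascript
// Hypothetical helper: reduces either result shape to simple counts.
function summarize (result) {
  // Shape seen in the API output above: { messages: [{ type, ... }] }
  if (Array.isArray(result.messages)) {
    const errors = result.messages.filter(m => m.type === 'error').length
    return { errors, others: result.messages.length - errors }
  }
  // ASSUMPTION: the WHATWG result looks like { errors: [...], warnings: [...] }
  return {
    errors: (result.errors || []).length,
    others: (result.warnings || []).length
  }
}

// Hypothetical helper: validates the SAME html string with both validators.
// With the real package: compareValidators(html, require('html-validator'))
async function compareValidators (html, validate) {
  const [api, whatwg] = await Promise.all([
    validate({ data: html }),
    validate({ data: html, validator: 'WHATWG' })
  ])
  return { api: summarize(api), whatwg: summarize(whatwg) }
}

// Stub standing in for html-validator so the sketch runs offline.
const stubValidate = async (options) =>
  options.validator === 'WHATWG'
    ? { errors: [], warnings: [] }
    : { messages: [{ type: 'error' }, { type: 'info' }] }

compareValidators('<!DOCTYPE html><title>t</title>', stubValidate)
  .then(summary => console.log(summary))
```

If both validators still disagree on identical input, the difference comes from their rule sets rather than from how the HTML was obtained.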
