Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't strip index.html from external links #11527

Open
mckabue opened this issue Nov 25, 2024 · 8 comments
Open

Don't strip index.html from external links #11527

mckabue opened this issue Nov 25, 2024 · 8 comments
Assignees
Labels
bug Something isn't working websites Issues creating websites
Milestone

Comments

@mckabue
Copy link

mckabue commented Nov 25, 2024

Bug description

This issue was previously reported at #8852 but was not actually resolved. A change was made at #8853

Simply, /index.html in the following link will be replaced, which is not okay.

https://api.visitorbadge.io/api/visitors?path=https://github.com/index.html

The issue is still the same:

links[i].href = links[i].href.replace(/\/index\.html/, "/");

const links = window.document.querySelectorAll("a");
    for (let i = 0; i < links.length; i++) {
      if (links[i].href) {
        links[i].dataset.originalHref = links[i].href;
        links[i].href = links[i].href.replace(/\/index\.html/, "/");
      }
    }

There are two issues:

  • .replace(/\/index\.html/, "/") replaces /index.html anywhere in the link, not just the end of the string
"https://github.com/index.html#diff".replace(/\/index\.html/, "/") => 'https://github.com/#diff'
"https://github.com/index.html/diff".replace(/\/index\.html/, "/") => 'https://github.com//diff'
  • const links = window.document.querySelectorAll("a"); selects all links. It should probably avoid links with certain class, or attributes, eg:
// Exclude links with data-exclude attribute
document.querySelectorAll('a:not([data-exclude])')

Steps to reproduce

No response

Expected behavior

No response

Actual behavior

No response

Your environment

No response

Quarto check output

Quarto 1.5.57
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.2.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.5.57
      Path: /opt/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: (external install)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Version: 2024

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.12
      Jupyter: 5.7.2
      Kernels: python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........(None)

      Unable to locate an installed version of R.
      Install R from https://cloud.r-project.org/
@mckabue mckabue added the bug Something isn't working label Nov 25, 2024
@cderv
Copy link
Collaborator

cderv commented Nov 25, 2024

The piece of code you shared needs to be understood in the more generic context of this feature from #8853. The important part is

links[i].dataset.originalHref = links[i].href

This is used to store original href as indeed the following replace for everything link. But later this happens

var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)');
for (var i=0; i<links.length; i++) {
const link = links[i];
if (!isInternal(link.href)) {
// undo the damage that might have been done by quarto-nav.js in the case of
// links that we want to consider external
if (link.dataset.originalHref !== undefined) {
link.href = link.dataset.originalHref;
}

Any external non-navigation links will be "fixed" by putting back the original href.

So for your example: https://api.visitorbadge.io/api/visitors?path=https://github.com/index.html
where does this link is used in your project ?

.replace(//index.html/, "/") replaces /index.html anywhere in the link, not just the end of the string

This seems definitely a problem but I don't think we expect index.html to be something else inside internal link in the project. For external link we are supposed to "fix the damage" as comments says

Can you help with more context on where you find the issue with external link being incorrectly modified ?

@mckabue
Copy link
Author

mckabue commented Nov 25, 2024

Also, attribution links for all other pages are also changed, for example: https://toknow.ai/posts/private-domain-checker/index.html has been changed to https://toknow.ai/posts/private-domain-checker/ . This particularly okay, but there should be a way to disable/override that behaviour.

@cscheid cscheid self-assigned this Nov 25, 2024
@cscheid cscheid added this to the v1.7 milestone Nov 25, 2024
@cscheid cscheid added the websites Issues creating websites label Nov 25, 2024
@cderv
Copy link
Collaborator

cderv commented Nov 25, 2024

Thanks for sharing your website. For what I can see, you may be using something like

link-external-filter: "https://toknow.ai"

This is translated to

var filterRegex = new RegExp("https:\/\/toknow\.ai");

and so when trying to decide if !isInternal(link.href) to only fix external link, any url having https://toknow.ai is seen as internal url and not as external.

You could use a regex in this config so that you match link starting with https://toknow.ai as internal and others as external.
See https://quarto.org/docs/output-formats/html-basics.html#external-links

If I am right, this could help

link-external-filter: '^(?:http:|https:)\/\/toknow\.ai\/'

This would not match I think your special urls api.visitorbadge.io

Hope this helps understand

Also, attribution links for all other pages are also changed, for example: toknow.ai/posts/private-domain-checker/index.html has been changed to toknow.ai/posts/private-domain-checker . This particularly okay, but there should be a way to disable/override that behaviour.

Regarding this, stripping index.html is expected for internal link. I don't recall the exact rationale but @cscheid could probably explain;

There was recently a related discussion on foo/ vs foo/index.html at #11365

@mckabue
Copy link
Author

mckabue commented Nov 25, 2024

I don't have a link-external-filter option in my settings, only link-external-newwindow: true.

Let me try link-external-filter: '^(?:http:|https:)\/\/toknow\.ai\/' and see if it works.

@cderv
Copy link
Collaborator

cderv commented Nov 25, 2024

I don't have a link-external-filter option in my settings, only link-external-newwindow: true.

Ok so this is probably the default applying using

site-url: https://toknow.ai

do you have that ?

Otherwise, our last default is using windows.host

Logic is in the EJS template we use

<% if (linkExternalFilter && linkExternalFilter.match(/\\\\/)) { %>
<%- `var filterRegex = new RegExp(/${linkExternalFilter}/);` %>
<% } else if (linkExternalFilter) { %>
<%= `var filterRegex = new RegExp("${linkExternalFilter}");` %>
<% } else { %>
var filterRegex = new RegExp('/' + window.location.host + '/');
<% } %>

and site-url is used as default if no option provided and it is present

options.linkExternalFilter = format.render[kLinkExternalFilter];
// If there is a site URL, we can use that as the default filter
const siteMetadata = format.metadata[kWebsite] as Metadata;
if (!options.linkExternalFilter && siteMetadata) {
const siteUrl = siteMetadata[kSiteUrl] as string;
if (siteUrl) {
options.linkExternalFilter = siteUrl.replaceAll(".", "\\.").replaceAll(
"/",
"\\/",
);
}
}

@mckabue
Copy link
Author

mckabue commented Nov 25, 2024

yes, I have site-url: https://toknow.ai

@mckabue
Copy link
Author

mckabue commented Nov 25, 2024

after adding link-external-filter: '^(?:http:|https:)\/\/toknow\.ai\/' it now works...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working websites Issues creating websites
Projects
None yet
Development

No branches or pull requests

3 participants