Fix navbar behaviour when `<body>` isn't on its own line; add tests for this #10

penelopeysm · 2025-02-01T18:07:35Z

The current navbar script searches for lines with <body> and inserts the navbar after those lines. This is great when <body> is on its own line, but fails when there's stuff that follows it on the same line (TuringLang/TuringBenchmarking.jl#39).

This PR also adds some tests to make sure that it behaves correctly when <body> is preceded / followed by other HTML on the same line.

penelopeysm · 2025-02-01T18:12:07Z

Honestly I couldn't see myself doing this a year ago, but gpt actually wrote almost this whole PR for me. 🙃
I guess I still had to know what to ask it to do, but gosh is it a lifesaver not having to look up the syntax for github workflows again.

penelopeysm · 2025-02-01T18:21:30Z

DocsNav/scripts/insert_navbar.sh

+    /<body[^>]*>/ {
+        sub(/(<body[^>]*>)/, "&\n" nav);


The new regex is more flexible in that it would picks up cases where the body tag has some attribute <body class="myclass">. I suppose it's still technically sort of fragile, in the sense that if the attribute contains > it will break. Happy to go either way on this although I think the new regex is a bit better in that the failure case is much less likely than the failure case for the old one.

penelopeysm · 2025-02-01T18:25:18Z

.github/workflows/test_navbar_script.yml

+          for file in ins_html/*; do
+            if diff <(tr -d '[:space:]' < "$file") <(tr -d '[:space:]' < $OUT); then


This is also not technically correct, because HTML is not totally whitespace-insignificant - for example tag names and attributes must be separated - but I suppose it's not really possible to test this without using a proper HTML parser, and for the limited test examples I reckon this should be fine.

shravanngoswamii · 2025-02-01T21:17:48Z

I don't think it will still work for us, I don't know why but our current script works fine locally when I run it on my WSL Ubuntu, but it breaks in GHA for many cases sometimes like the current scenario in DynamicPPL and TuringBenchmarking.jl. Just asking for suggestion, can we try something other than shell script? Because it's giving different results in different environments...

Documenter generated HTML usually gets wrapped up in very few lines and our script breaks when <body> tag have some particular thing after it, I don't know what's that!

This PR also adds some tests to make sure that it behaves correctly when <body> is preceded / followed by other HTML on the same line.

Our script works for this as of now as far as I know from my old tests, but it breaks for particular unknown edge cases!

See, it works for TuringGLM which had content on same line as tag, actually whole page was in single line I guess, Documenter do this for efficiency maybe: https://github.com/TuringLang/TuringGLM.jl/actions/runs/13092370899 (I just ran it):

And it still works for old DynamicPPL versions see:

https://turinglang.org/DynamicPPL.jl/v0.32/
HTML difference in old and new:
- New: https://github.com/TuringLang/DynamicPPL.jl/blob/gh-pages/v0.34.2/index.html
- Old: https://github.com/TuringLang/DynamicPPL.jl/blob/gh-pages/v0.32.2/index.html

Honestly I couldn't see myself doing this a year ago, but gpt actually wrote almost this whole PR for me. 🙃

Which model did you used btw, just for curiosity😅?

penelopeysm · 2025-02-01T21:31:22Z

Oh, that's so weird. :( Maybe it's an upgrade on ubuntu versions? actions/runner-images#10636 which might have come with different versions of bash/awk.

I'm happy to use something else more proper. For me, this PR was fairly easy to put together so I thought it could be a decent stopgap. But it's definitely not the 'correct' solution!

Gumbo.jl seems like a sensible option. I'm not convinced that Julia's the prettiest language to do this sort of ugly, html-wrangling stuff, but I guess it means that other people will be able to edit it. Did you want to work on it? Let me know either way:)

Which model

4o haha. The main reason why I used it was because I spent one day 4 years ago learning about awk but have since forgotten all of it 😄 I did have to prompt it quite carefully by telling it exactly what the problem was (with stuff happening after the tag). I don't think the GHA workflow it generated was perfect, I did actually edit quite a few things like the working-directory and the call to chmod, but it took care of all the boilerplate quite well which meant that I could focus on the more fun bits.

shravanngoswamii · 2025-02-01T21:41:51Z

4o haha.

4o usually don't give required code in one go😅, o1 and Deepseek-R1 are cool btw!

Did you want to work on it? Let me know either way:)

Sure, I will try some different approaches on this tomorrow, will come up with a solution for our best case, maybe GUMBO or else something Node.js based or maybe stay with bash! It's 3:30 AM, right now🦉!

shravanngoswamii · 2025-02-05T13:47:12Z

@penelopeysm I've tried several approaches for this, including Gumbo.jl, [pup](https://github.com/ericchiang/pup), Cheerio (Node.js), BeautifulSoup, and others. However, I felt it wasn't ideal to add unnecessary dependencies for something that can be done without them. Thats why I did it with bash at first, but it keeps breaking!

Finally, I've decided to implement this in Julia, with only HTTP for fetching URLs and regex for manipulation, without needing Gumbo. I’ve tested this locally across different scenarios, and it worked well for me.

https://github.com/shravanngoswamii/DynamicPPL.jl/actions/runs/13159797886/job/36725344673
https://shravangoswami.com/DynamicPPL.jl/dev/

If you have any alternative suggestions or approaches, please let me know. Here’s the script for your review—once you’ve had a look, we can commit it after cleaning a bit and add some comments if everything looks good.

using HTTP

function read_file(filename::String)
    open(filename, "r") do io
        read(io, String)
    end
end

function write_file(filename::String, contents::String)
    open(filename, "w") do io
        write(io, contents)
    end
end

function should_exclude(filename::String, patterns::Vector{String})
    for pat in patterns
        if occursin(pat, filename)
            return true
        end
    end
    return false
end

function remove_existing_navbar(html::String)
    start_marker = "<!-- NAVBAR START -->"
    end_marker   = "<!-- NAVBAR END -->"
    while occursin(start_marker, html) && occursin(end_marker, html)
        start_idx_range = findfirst(start_marker, html)
        end_idx_range   = findfirst(end_marker, html)
        start_idx = first(start_idx_range)
        end_idx = first(end_idx_range)
        prefix = start_idx > 1 ? html[1:start_idx-1] : ""
        suffix = lstrip(html[end_idx + length(end_marker) : end])
        html = string(prefix, suffix)
    end
    return html
end

function wrap_navbar(navbar_html::String)
    if !occursin("NAVBAR START", navbar_html) || !occursin("NAVBAR END", navbar_html)
        return "<!-- NAVBAR START -->\n" * navbar_html * "\n<!-- NAVBAR END -->"
    else
        return navbar_html
    end
end

function insert_navbar(html::String, navbar_html::String)
    html = remove_existing_navbar(html)
    m = match(r"(?i)(<body[^>]*>)", html)
    if m === nothing
        println("Warning: Could not find <body> tag in the file; skipping insertion.")
        return html
    end
    prefix = m.match
    inserted = string(prefix, "\n", navbar_html, "\n")
    html = replace(html, prefix => inserted; count = 1)
    return html
end

function process_file(filename::String, navbar_html::String)
    println("Processing: $filename")
    html = read_file(filename)
    html_new = insert_navbar(html, navbar_html)
    if html_new == html
        println("Skipped: No <body> tag found in $filename")
    else
        write_file(filename, html_new)
        println("Updated: $filename")
    end
end

function main()
    if length(ARGS) < 2
        println("Usage: julia insert_navbar.jl <html-file-or-directory> <navbar-file-or-url> [--exclude \"pat1,pat2,...\"]")
        return
    end
    target = ARGS[1]
    navbar_source = ARGS[2]
    
    exclude_patterns = String[]
    if length(ARGS) ≥ 4 && ARGS[3] == "--exclude"
        exclude_patterns = map(x -> string(strip(x)), split(ARGS[4], ','))
    end

    navbar_html = ""
    if startswith(lowercase(navbar_source), "http")
        resp = HTTP.get(navbar_source)
        if resp.status != 200
            error("Failed to download navbar from $navbar_source")
        end
        navbar_html = String(resp.body)
    else
        navbar_html = read_file(navbar_source)
    end
    navbar_html = string(navbar_html)
    navbar_html = wrap_navbar(navbar_html)

    if isfile(target)
        if !should_exclude(target, exclude_patterns)
            process_file(target, navbar_html)
        else
            println("Skipping excluded file: $target")
        end
    elseif isdir(target)
        for (root, _, files) in walkdir(target)
            for file in files
                if endswith(file, ".html")
                    fullpath = joinpath(root, file)
                    if !should_exclude(fullpath, exclude_patterns)
                        process_file(fullpath, navbar_html)
                    else
                        println("Skipping excluded file: $fullpath")
                    end
                end
            end
        end
    else
        error("Target $target is neither a file nor a directory.")
    end
end

main()

penelopeysm · 2025-02-05T17:11:39Z

@shravanngoswamii, yes, I'm happy with the overall approach! I do have some code comments. If you open a PR I can review :)

penelopeysm · 2025-02-05T21:24:32Z

Superseded by #11.

penelopeysm force-pushed the py/fix-body-tag branch 3 times, most recently from fb02664 to 54205c0 Compare February 1, 2025 18:10

Add tests for navbar

aecd373

penelopeysm force-pushed the py/fix-body-tag branch from 54205c0 to aecd373 Compare February 1, 2025 18:13

penelopeysm requested a review from shravanngoswamii February 1, 2025 18:14

Loosen regex

fff84d1

penelopeysm commented Feb 1, 2025

View reviewed changes

penelopeysm marked this pull request as draft February 3, 2025 15:20

shravanngoswamii mentioned this pull request Feb 5, 2025

Replacing bash script with Julia script for inserting navbar #11

Merged

penelopeysm closed this Feb 5, 2025

penelopeysm deleted the py/fix-body-tag branch February 5, 2025 21:24

penelopeysm restored the py/fix-body-tag branch February 5, 2025 21:24

shravanngoswamii deleted the py/fix-body-tag branch February 6, 2025 12:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix navbar behaviour when `<body>` isn't on its own line; add tests for this #10

Fix navbar behaviour when `<body>` isn't on its own line; add tests for this #10

penelopeysm commented Feb 1, 2025

penelopeysm commented Feb 1, 2025

penelopeysm Feb 1, 2025 •

edited

Loading

penelopeysm Feb 1, 2025 •

edited

Loading

shravanngoswamii commented Feb 1, 2025 •

edited

Loading

penelopeysm commented Feb 1, 2025

shravanngoswamii commented Feb 1, 2025

shravanngoswamii commented Feb 5, 2025 •

edited

Loading

penelopeysm commented Feb 5, 2025

penelopeysm commented Feb 5, 2025

		for file in ins_html/*; do
		if diff <(tr -d '[:space:]' < "$file") <(tr -d '[:space:]' < $OUT); then

Fix navbar behaviour when <body> isn't on its own line; add tests for this #10

Fix navbar behaviour when <body> isn't on its own line; add tests for this #10

Conversation

penelopeysm commented Feb 1, 2025

penelopeysm commented Feb 1, 2025

penelopeysm Feb 1, 2025 • edited Loading

Choose a reason for hiding this comment

penelopeysm Feb 1, 2025 • edited Loading

Choose a reason for hiding this comment

shravanngoswamii commented Feb 1, 2025 • edited Loading

penelopeysm commented Feb 1, 2025

shravanngoswamii commented Feb 1, 2025

shravanngoswamii commented Feb 5, 2025 • edited Loading

penelopeysm commented Feb 5, 2025

penelopeysm commented Feb 5, 2025

Fix navbar behaviour when `<body>` isn't on its own line; add tests for this #10

Fix navbar behaviour when `<body>` isn't on its own line; add tests for this #10

penelopeysm Feb 1, 2025 •

edited

Loading

penelopeysm Feb 1, 2025 •

edited

Loading

shravanngoswamii commented Feb 1, 2025 •

edited

Loading

shravanngoswamii commented Feb 5, 2025 •

edited

Loading