-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix navbar behaviour when <body>
isn't on its own line; add tests for this
#10
Conversation
fb02664
to
54205c0
Compare
Honestly I couldn't see myself doing this a year ago, but gpt actually wrote almost this whole PR for me. 🙃 |
54205c0
to
aecd373
Compare
/<body[^>]*>/ { | ||
sub(/(<body[^>]*>)/, "&\n" nav); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new regex is more flexible in that it would picks up cases where the body tag has some attribute <body class="myclass">
. I suppose it's still technically sort of fragile, in the sense that if the attribute contains >
it will break. Happy to go either way on this although I think the new regex is a bit better in that the failure case is much less likely than the failure case for the old one.
for file in ins_html/*; do | ||
if diff <(tr -d '[:space:]' < "$file") <(tr -d '[:space:]' < $OUT); then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also not technically correct, because HTML is not totally whitespace-insignificant - for example tag names and attributes must be separated - but I suppose it's not really possible to test this without using a proper HTML parser, and for the limited test examples I reckon this should be fine.
I don't think it will still work for us, I don't know why but our current script works fine locally when I run it on my WSL Ubuntu, but it breaks in GHA for many cases sometimes like the current scenario in Documenter generated HTML usually gets wrapped up in very few lines and our script breaks when
Our script works for this as of now as far as I know from my old tests, but it breaks for particular unknown edge cases! See, it works for
And it still works for old DynamicPPL versions see:
Which model did you used btw, just for curiosity😅? |
Oh, that's so weird. :( Maybe it's an upgrade on ubuntu versions? actions/runner-images#10636 which might have come with different versions of bash/awk. I'm happy to use something else more proper. For me, this PR was fairly easy to put together so I thought it could be a decent stopgap. But it's definitely not the 'correct' solution! Gumbo.jl seems like a sensible option. I'm not convinced that Julia's the prettiest language to do this sort of ugly, html-wrangling stuff, but I guess it means that other people will be able to edit it. Did you want to work on it? Let me know either way:)
4o haha. The main reason why I used it was because I spent one day 4 years ago learning about awk but have since forgotten all of it 😄 I did have to prompt it quite carefully by telling it exactly what the problem was (with stuff happening after the tag). I don't think the GHA workflow it generated was perfect, I did actually edit quite a few things like the working-directory and the call to chmod, but it took care of all the boilerplate quite well which meant that I could focus on the more fun bits. |
4o usually don't give required code in one go😅, o1 and Deepseek-R1 are cool btw!
Sure, I will try some different approaches on this tomorrow, will come up with a solution for our best case, maybe GUMBO or else something Node.js based or maybe stay with bash! It's 3:30 AM, right now🦉! |
@penelopeysm I've tried several approaches for this, including Finally, I've decided to implement this in Julia, with only https://github.com/shravanngoswamii/DynamicPPL.jl/actions/runs/13159797886/job/36725344673 If you have any alternative suggestions or approaches, please let me know. Here’s the script for your review—once you’ve had a look, we can commit it after cleaning a bit and add some comments if everything looks good. using HTTP
function read_file(filename::String)
open(filename, "r") do io
read(io, String)
end
end
function write_file(filename::String, contents::String)
open(filename, "w") do io
write(io, contents)
end
end
function should_exclude(filename::String, patterns::Vector{String})
for pat in patterns
if occursin(pat, filename)
return true
end
end
return false
end
function remove_existing_navbar(html::String)
start_marker = "<!-- NAVBAR START -->"
end_marker = "<!-- NAVBAR END -->"
while occursin(start_marker, html) && occursin(end_marker, html)
start_idx_range = findfirst(start_marker, html)
end_idx_range = findfirst(end_marker, html)
start_idx = first(start_idx_range)
end_idx = first(end_idx_range)
prefix = start_idx > 1 ? html[1:start_idx-1] : ""
suffix = lstrip(html[end_idx + length(end_marker) : end])
html = string(prefix, suffix)
end
return html
end
function wrap_navbar(navbar_html::String)
if !occursin("NAVBAR START", navbar_html) || !occursin("NAVBAR END", navbar_html)
return "<!-- NAVBAR START -->\n" * navbar_html * "\n<!-- NAVBAR END -->"
else
return navbar_html
end
end
function insert_navbar(html::String, navbar_html::String)
html = remove_existing_navbar(html)
m = match(r"(?i)(<body[^>]*>)", html)
if m === nothing
println("Warning: Could not find <body> tag in the file; skipping insertion.")
return html
end
prefix = m.match
inserted = string(prefix, "\n", navbar_html, "\n")
html = replace(html, prefix => inserted; count = 1)
return html
end
function process_file(filename::String, navbar_html::String)
println("Processing: $filename")
html = read_file(filename)
html_new = insert_navbar(html, navbar_html)
if html_new == html
println("Skipped: No <body> tag found in $filename")
else
write_file(filename, html_new)
println("Updated: $filename")
end
end
function main()
if length(ARGS) < 2
println("Usage: julia insert_navbar.jl <html-file-or-directory> <navbar-file-or-url> [--exclude \"pat1,pat2,...\"]")
return
end
target = ARGS[1]
navbar_source = ARGS[2]
exclude_patterns = String[]
if length(ARGS) ≥ 4 && ARGS[3] == "--exclude"
exclude_patterns = map(x -> string(strip(x)), split(ARGS[4], ','))
end
navbar_html = ""
if startswith(lowercase(navbar_source), "http")
resp = HTTP.get(navbar_source)
if resp.status != 200
error("Failed to download navbar from $navbar_source")
end
navbar_html = String(resp.body)
else
navbar_html = read_file(navbar_source)
end
navbar_html = string(navbar_html)
navbar_html = wrap_navbar(navbar_html)
if isfile(target)
if !should_exclude(target, exclude_patterns)
process_file(target, navbar_html)
else
println("Skipping excluded file: $target")
end
elseif isdir(target)
for (root, _, files) in walkdir(target)
for file in files
if endswith(file, ".html")
fullpath = joinpath(root, file)
if !should_exclude(fullpath, exclude_patterns)
process_file(fullpath, navbar_html)
else
println("Skipping excluded file: $fullpath")
end
end
end
end
else
error("Target $target is neither a file nor a directory.")
end
end
main() |
@shravanngoswamii, yes, I'm happy with the overall approach! I do have some code comments. If you open a PR I can review :) |
Superseded by #11. |
The current navbar script searches for lines with
<body>
and inserts the navbar after those lines. This is great when<body>
is on its own line, but fails when there's stuff that follows it on the same line (TuringLang/TuringBenchmarking.jl#39).This PR also adds some tests to make sure that it behaves correctly when
<body>
is preceded / followed by other HTML on the same line.