[#147] Update readme #216
README.md
Outdated
 It is able to check multiple repositores at once if they are gathered in one folder.
-Being written on JavaScript, it is fairly slow on large repositories.
+Being written in JavaScript, it is fairly slow on large repositories.
@Martoon-00 do we have any reference for this claim? Was this something you tried yourself (way back)?
Yep, that was my personal observation at that moment.
However, given that we are planning a public release, we may need to verify this again 🤔
I see. Would you mind if we delete this particular sentence? It's just that, even if we conclude that the application is slow, I don't think we have enough confidence to assert that it's slow because it's written in JS :/ Also, "slow" can be subjective, and I think this section should be more objective when describing other people's projects.
I agree that making such statements does not sound quite proper 🤔
On the other hand, if we only mention this alternative solution without pointing out any weak point, then it's not clear why another tool (xrefcheck) was necessary.
From my experience, at least one of those JavaScript-based tools was noticeably slow. I had the impression that it didn't do any parallelization (hence the claim about JS being the reason, though such a claim is really invalid), and it took a dozen seconds to check a small-to-middle-sized repository that contained only local (!) links.
Given that, maybe we could formulate more objective statements here instead?
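To make the parallelization point concrete, here is a minimal sketch of a naive local-link check that spreads Markdown files across CPU cores with `xargs -P`. It is illustrative only: `check_file_links` and `check_tree` are hypothetical helpers, not part of xrefcheck or of the JavaScript tools discussed here, and the link extraction is deliberately crude (inline `[text](target)` links only; no reference-style links, link titles, code spans or anchor validation). It assumes bash, GNU `find`/`xargs` and `nproc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report relative links in one Markdown file whose
# target file does not exist. Only inline links of the form [text](target)
# are considered; external URLs and pure in-page anchors are skipped.
check_file_links() {
  local file="$1" dir target status=0
  dir=$(dirname "$file")
  while IFS= read -r target; do
    case "$target" in
      http://*|https://*|mailto:*|'#'*) continue ;;
    esac
    target="${target%%#*}"   # "other.md#section" -> "other.md" (anchor not checked)
    if [ ! -e "$dir/$target" ]; then
      echo "$file: broken local link: $target"
      status=1
    fi
  done < <(grep -oE '[]][(][^)]+[)]' "$file" | sed -E 's/^[]][(](.*)[)]$/\1/')
  return "$status"
}
export -f check_file_links   # make it visible to the bash processes xargs spawns

# Check every *.md file under a directory, one file per job, with as many
# parallel jobs as there are CPU cores. Exits non-zero if any link is broken.
check_tree() {
  find "$1" -name '*.md' -print0 |
    xargs -0 -r -n 1 -P "$(nproc)" bash -c 'check_file_links "$0"'
}
```

Usage: `check_tree .` from the repository root. Splitting the work per file is the simplest choice, since files can be checked independently without any coordination between jobs.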
So, I tried using remark on the xrefcheck repo; here are the steps to reproduce my experiment:
$ npm init
$ npm install remark-cli remark-lint-no-dead-urls remark-validate-links --save-dev
Edit package.json and add this:
"remarkConfig": {
  "plugins": [
    "remark-lint-no-dead-urls",
    "remark-validate-links"
  ]
}
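Alternatively, instead of editing `package.json`, the same plugins should be accepted from a `.remarkrc.json` file, assuming remark-cli's usual config-file lookup (which reads `.remarkrc.json` as well as a `remarkConfig` key in `package.json`). `write_remarkrc` is a hypothetical helper name used only for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the remark config used in this experiment to
# DIR/.remarkrc.json (DIR defaults to the current directory) instead of
# adding a "remarkConfig" key to package.json.
write_remarkrc() {
  local dir="${1:-.}"
  cat > "$dir/.remarkrc.json" <<'EOF'
{
  "plugins": [
    "remark-lint-no-dead-urls",
    "remark-validate-links"
  ]
}
EOF
}
```

Run `write_remarkrc` in the directory where `npm init` was run; the timing command stays the same.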
Run it and time it:
$ time npm run env -- remark . --frail --ignore-pattern 'tests/markdowns/**/*' --ignore-pattern 'tests/golden/**/*'
It seems to take about 8s on average to verify the xrefcheck repo.
The xrefcheck tool takes about 6~8s, so it's not a huge difference :/
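Since the comparison rests on averages, a small hypothetical `bench` helper (not part of either tool) can do the repeated runs and the averaging. It is a sketch assuming GNU `date` (for nanosecond `%N`) and `awk`; it discards the tool's output and times failed runs too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command RUNS times and print the mean
# wall-clock time in seconds (3 decimals). The command's output is
# discarded and a failing run is still timed. Requires GNU date (%N).
bench() {
  local runs="$1" total_ns=0 start end i
  shift
  for i in $(seq 1 "$runs"); do
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1 || true
    end=$(date +%s%N)
    total_ns=$(( total_ns + end - start ))
  done
  awk -v t="$total_ns" -v n="$runs" 'BEGIN { printf "%.3f\n", t / n / 1e9 }'
}
```

For example: `bench 5 npm run env -- remark . --frail --ignore-pattern 'tests/markdowns/**/*' --ignore-pattern 'tests/golden/**/*'`.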
However, I did notice that it does not handle "429 Too Many Requests" responses, so eventually links to github.com start failing and the tool reports false positives.
I replaced our claim about it being slow with a new claim. And also added "Resilience" as one of xrefcheck's aims.
Please have a look at the latest fixup and let me know what you think ^^
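The 429 case is what the new "Resilience" point is about: a rate-limited response says nothing about whether the link is dead. Below is a minimal sketch of treating it that way, assuming `curl` is available. `classify_status` and `check_url` are hypothetical names, the attempt count is arbitrary, and the 30-second default delay merely echoes the waiting interval mentioned in this thread. This is not xrefcheck's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical: map an HTTP status code to a verdict for a link checker.
classify_status() {
  case "$1" in
    2??|3??) echo ok ;;      # reachable (redirects are not followed here)
    429)     echo retry ;;   # rate-limited: says nothing about the link itself
    *)       echo broken ;;  # includes 000, which curl prints on connection failure
  esac
}

# Hypothetical: check URL up to ATTEMPTS times, sleeping DELAY seconds after
# each 429 instead of reporting the link as broken straight away.
check_url() {
  local url="$1" attempts="${2:-3}" delay="${3:-30}" code verdict i
  for i in $(seq 1 "$attempts"); do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
    verdict=$(classify_status "$code")
    if [ "$verdict" != retry ]; then
      echo "$verdict"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "unknown (still rate-limited after $attempts attempts)"
}
```

For example, `check_url https://github.com/serokell/xrefcheck 3 30` prints `ok`, `broken` or an "unknown" verdict, rather than turning a 429 into a false positive.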
LGTM (learned new English words, e.g. resilience). As I understand it, there was no waiting on 429s during the speed comparison (since the default waiting interval is 30s), so it was a fair test. Maybe one could compare speed on a repository with many local links, since your test looks like a competition between network libraries.
Created a file with 240 local links, re-ran the experiment, and this time also ignored CHANGES.md (where most of our external links are). xrefcheck took 1.5-2s, remark took 2.5-3s. So not a huge difference here either :/
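A test file like this can also be generated rather than written by hand. The file used in the experiment is not reproduced here, so the following is only an assumption about its shape: `gen_local_links` is a hypothetical helper that writes N level-2 sections, each linking to the next section's GitHub-style anchor (the last wraps round to the first), so every link is local and valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write N sections to OUT, each containing one link to
# the next section's anchor (the last one wraps round to the first), giving
# exactly N valid in-file links.
gen_local_links() {
  local n="$1" out="$2" i
  : > "$out"   # create or truncate the output file
  for i in $(seq 1 "$n"); do
    printf '## Section %d\n\n[next](#section-%d)\n\n' "$i" $(( i % n + 1 )) >> "$out"
  done
}
```

For example: `gen_local_links 240 a.md && git add a.md`.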
Thanks for this exploration!
I also wanted to see a check for local links only, but it turns out not to be faster...
So yeah, the rewrite of the claim looks good 👍
xrefcheck took 1.5-2s
This is quite suspicious, though; to me it looks like there is nothing that should push the time above a fraction of a second.
Initially, I wanted to make xrefcheck instant when checking local links (so that the several seconds taken by the other tools would seem really long). Most users' mistakes come from local links (external links are usually copy-pasted), and a shorter check time would make writing documentation a lot easier.
Created #219 to investigate this later.
If, after optimizations, xrefcheck takes something like 0.4s on your example, what do you think: would it be correct to mention xrefcheck being faster? Not in vague and incorrect terms as before, but referring to our experiment showing that remark takes X seconds and xrefcheck takes Y?
This is quite suspicious, though; to me it looks like there is nothing that should push the time above a fraction of a second.
Oh, I just remembered I compiled xrefcheck with no optimizations (stack install --fast) ^^ Just tried again with stack install, but got very similar results (1.3~1.5secs) :/
Another interesting data point: I re-ran the experiment, this time with --mode local-only. Like before, I added this file with 242 local links to the xrefcheck repo (don't forget git add a.md), and then ran:
$ time xrefcheck --ignore 'tests/markdowns/**/*' --ignore 'tests/golden/**/*' --ignore CHANGES.md --mode local-only
1.05s user 0.28s system 370% cpu 0.358 total
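The `370% cpu` figure ties back to the parallelization question raised earlier. This output format looks like zsh's `time`, whose CPU percentage is 100 * (user + system) / elapsed, so a value well above 100% means more than one core was busy at once on average. A hypothetical helper to redo the arithmetic from the printed numbers:

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute the CPU percentage that zsh's `time`
# prints, i.e. 100 * (user + system) / elapsed, rounded to an integer.
cpu_pct() {
  awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.0f\n", 100 * (u + s) / e }'
}
```

`cpu_pct 1.05 0.28 0.358` gives 372, i.e. about 3.7 cores' worth of CPU time per second of wall time; the small gap from the reported 370% presumably comes from the rounding of the displayed timings.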
0.3 secs 😮
Even the 0.75 seconds you observed in morley in #219 is, in my opinion, really fast 😅
Regarding #219: I'm tempted to say that xrefcheck is pretty much as fast as it can be.
I suspect (but am not 100% sure) that the primary bottleneck is the network when checking external links (which would explain the 1.5-2 secs I saw earlier) and the secondary bottleneck is disk reads, and we can't do anything about either of those :/
Any performance gains we can squeeze out will (I think) be very marginal, and unnoticeable by a real world user.
If, after optimizations, xrefcheck takes something like 0.4s on your example, what do you think: would it be correct to mention xrefcheck being faster? Not in vague and incorrect terms as before, but referring to our experiment showing that remark takes X seconds and xrefcheck takes Y?
I'd be okay with that, but if we decide to do that, then I think it would be fair to construct a proper reproducible benchmark and compare xrefcheck against the other alternatives we mentioned in the readme (not just remark), and then publish a table with the results.
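A reproducible comparison could start from something like the following sketch, which runs each tool several times and prints a Markdown table ready for the readme. Everything in it is an assumption: `bench_table`, its input format (one `label<TAB>command` line per tool) and the min/mean/max columns are illustrative; a dedicated tool such as hyperfine additionally handles warm-up runs and statistics.

```shell
#!/usr/bin/env bash
# Hypothetical: read "label<TAB>shell command" lines from stdin, run each
# command RUNS times via `bash -c`, and print a Markdown table of wall-clock
# times in seconds. Requires GNU date (%N) and awk.
bench_table() {
  local runs="$1" label cmd start end i times
  printf '| Tool | Min (s) | Mean (s) | Max (s) |\n|---|---|---|---|\n'
  while IFS=$'\t' read -r label cmd; do
    times=""
    for i in $(seq 1 "$runs"); do
      start=$(date +%s%N)
      # </dev/null stops the benchmarked command from eating the tool list
      bash -c "$cmd" > /dev/null 2>&1 < /dev/null || true
      end=$(date +%s%N)
      times="$times $(( end - start ))"
    done
    echo "$times" | awk -v l="$label" '{
      min = max = $1; sum = 0
      for (j = 1; j <= NF; j++) { sum += $j; if ($j < min) min = $j; if ($j > max) max = $j }
      printf "| %s | %.3f | %.3f | %.3f |\n", l, min / 1e9, sum / NF / 1e9, max / 1e9
    }'
  done
}
```

For example, from the repository root with both tools installed: `printf 'remark\tnpm run env -- remark . --frail\nxrefcheck\txrefcheck --mode local-only\n' | bench_table 5`.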
I agree that I/O likely takes most of the time and that we won't be able to make any significant performance improvement, unless some place in the verification logic turns out to be very suboptimal.
If I fail to find such places, it would be reasonable to close the issue.
then I think it would be fair to construct a proper reproducible benchmark and compare xrefcheck against the other alternatives we mentioned in the readme
Really, agreed.
bec5bf3 to 0641f8f
Problem: We have a pipeline step to tag docker images on dockerhub whenever a new version is released:
https://github.com/serokell/xrefcheck/blob/7dd5c4c3c954a531b5cad89857f31b27245f0ef9/.buildkite/pipeline.yml#L51-L56

However, this doesn't seem to be working; dockerhub only contains the `latest` tag:
https://hub.docker.com/r/serokell/xrefcheck/tags

The problem *seems* to be that the CI step is only triggered when it builds a branch with a name matching the regex `/^v[0-9]+.*/`. But we never use that format for branch names, so it's never triggered.

Solution:
1. Change the CI step to trigger when it detects a tag with a version number
2. Enable the "Build tags" option in buildkite: https://buildkite.com/serokell/xrefcheck/settings/repository
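Step 1 of the solution is a change to `.buildkite/pipeline.yml`. As a shell-level illustration of the same condition (not the change made in this PR), a publishing script could gate on `BUILDKITE_TAG`, which Buildkite sets for builds triggered by a tag, using a glob equivalent to `/^v[0-9]+.*/`. `release_docker_tag` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version tag to publish on Docker Hub, or
# fail when the current Buildkite build is not for a version tag.
# The glob v[0-9]* matches the same names as the regex /^v[0-9]+.*/.
release_docker_tag() {
  local tag="${BUILDKITE_TAG:-}"
  case "$tag" in
    v[0-9]*) printf '%s\n' "$tag" ;;
    *) return 1 ;;   # branch build or non-version tag: nothing to publish
  esac
}
```

A publishing step could then run `t=$(release_docker_tag) && docker push "serokell/xrefcheck:$t"`, assuming an image with that local tag has already been built; the image name is taken from the Docker Hub link above.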
Improved the readme and fixed several problems:

* Mention support for GitLab - this is important and wasn't mentioned anywhere.
* Add a FAQ clarifying how xrefcheck behaves in some important situations.
* We don't need to get into a lot of detail about the syntax of the `xrefcheck: ignore` annotations, where they're allowed and where they're not. A general idea and a couple of examples are more than enough.
* Added the backlink `[↑](#xrefcheck)` where it was missing.
* Fixed inconsistent level headers: we were using `###` where we should be using `##`
* `nix run` should now be `nix shell`
* Add a link to `tests/configs/github-config.yaml` which contains a list of all supported config options.
* Instead of mentioning GitHub Actions in the "usage" section and nix in a separate section, mention everything in the "usage" section.
* Fixed link to `stack2cabal`
* Fixed typos and rephrased some bits.
0641f8f to 2fd11bf
Description
Improved the readme and fixed several problems:
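* Mention support for GitLab - this is important and wasn't mentioned anywhere.
* Add a FAQ clarifying how xrefcheck behaves in some important situations.
* We don't need to get into a lot of detail about the syntax of the `xrefcheck: ignore` annotations, where they're allowed and where they're not. A general idea and a couple of examples are more than enough.
* Added the backlink `[↑](#xrefcheck)` where it was missing.
* Fixed inconsistent level headers: we were using `###` where we should be using `##`
* `nix run` should now be `nix shell`
* Add a link to `tests/configs/github-config.yaml` which contains a list of all supported config options.
* Instead of mentioning GitHub Actions in the "usage" section and nix in a separate section, mention everything in the "usage" section.
* Fixed link to `stack2cabal`
* Fixed typos and rephrased some bits.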
Also noticed an issue with the dockerhub tags, so I fixed it while I'm here:
Related issue(s)
Fixes #147
✅ Checklist for your Pull Request
Ideally a PR has all of the checkmarks set.
If something in this list is irrelevant to your PR, you should still set this checkmark indicating that you are sure it is dealt with (be that by irrelevance).
Related changes (conditional)
Tests
Documentation
Public contracts
Stylistic guide (mandatory)
✓ Release Checklist