hash for each prefix of a file (for verifying input) #266
Idea communicated to me by https://codeforces.com/profile/enwask and https://codeforces.com/profile/Alg01
I implemented this for my lib: https://github.com/programming-team-code/programming_team_code/releases/download/ptc/ptc.pdf
After some of the North American ICPC regionals, I received some feedback that these hashes were useful (for example by using
I have a script which adds in these comments:
for header in ../library/**/*.hpp; do
  echo "adding hash codes for $header"
  # walk from the last line down to line 1 in steps of 5
  for i in $(seq "$(wc --lines <"$header")" -5 1); do
    # hash the first $i lines, with includes and comments stripped
    hash=$(head --lines "$i" "$header" | sed '/^#include/d' | cpp -dD -P -fpreprocessed | ./../library/contest/hash.sh)
    line_length=$(sed --quiet "${i}p" "$header" | wc --chars)
    # PDF wraps at 68 chars, and hash comment takes 8 chars total
    padding_length=$((68 - 8 - line_length))
    padding_length=$((padding_length > 0 ? padding_length : 0))
    padding=$(printf '%*s' "$padding_length" '')
    # append the padded hash comment to line $i
    sed --in-place "${i}s/$/$padding\/\/${hash}/" "$header"
  done
done
But there's one problem: the script passes prefixes of each file to the
The other thing this script assumes is that the code is formatted such that each line is at least 8 characters shorter than the line-wrap length in the PDF. As the script will append
In theory, one hash character per line would be enough (with a ~1/16 chance that the typo is actually on the line before the first mismatching one, or ~1/256 that it's two lines up). That requires a more complex hashing script, but maybe it's fine if you combine it with a single complete hash to cover the majority case of no mistakes? I don't know, there's definitely value in having the hashing operation be as easy to use as possible. I feel like adding
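The ~1/16 and ~1/256 figures above can be checked with a short calculation. This is only a sketch: it assumes each hash character after the typo matches the expected one by chance with probability 1/16, independently of the others, which is an idealisation of how md5 hex digits behave. The function name is made up for illustration.

```python
from fractions import Fraction


def first_mismatch_offset_probability(k, alphabet_size=16):
    """Probability that the first mismatching hash char is k lines after the typo.

    Assumption: after the typo, each per-line hash character agrees with the
    expected one by chance with probability 1/alphabet_size, independently.
    """
    match = Fraction(1, alphabet_size)
    # k accidental matches in a row, followed by one mismatch
    return match ** k * (1 - match)


for k in range(3):
    print(k, first_mismatch_offset_probability(k))
```

Under that assumption the values are 15/16 for the typo line itself, 15/256 (about 1/17) for one line later, and 15/4096 (about 1/273) for two lines later, which is in line with the rough estimates quoted above.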
I've been using a version of this in my implementation of kactl for a while; check out the preprocessor in my repo for more information link. It's still a bit ugly, but it was mostly intended to be a quick hack implementation to use until I have the time to fix it. Additionally, for the overflow issue, my plan was to make the preprocessor issue an error if any line is too long and to manually edit the input files to fix this. There are some other places in kactl that overflow even without the addition of the comment, so erroring on any wraparound should help catch those issues as well. (Normally wraparound is not an issue, but we want the exact line counts to line up so that the hash-every-five-lines is consistent.)
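A check like the one described could look roughly like this. It is a minimal sketch, not the preprocessor from the repo mentioned above: the function names are hypothetical, and the defaults (wrap at 68 characters, 8 reserved for the hash comment) are borrowed from the shell script earlier in the thread rather than from kactl.

```python
def find_overlong_lines(lines, wrap_width=68, reserved=8):
    """Return (line_number, length) for every line that would wrap in the PDF.

    wrap_width and reserved are assumptions taken from the shell script above;
    a tab counts as one character here, which may not match the PDF layout.
    """
    limit = wrap_width - reserved
    bad = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n"))
        if length > limit:
            bad.append((number, length))
    return bad


def check_line_widths(lines, wrap_width=68, reserved=8):
    """Raise ValueError if any line leaves no room for the hash comment."""
    bad = find_overlong_lines(lines, wrap_width, reserved)
    if bad:
        details = ", ".join(f"line {n} ({length} chars)" for n, length in bad)
        raise ValueError(
            f"lines longer than {wrap_width - reserved} chars: {details}"
        )
```

Calling `check_line_widths(open(path).readlines())` on each source file before adding hashes would stop the build on the first file that needs manual reformatting.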
Here's my one-hash-char-per-line idea:
# Hashes a file, ignoring all whitespace and comments. Use for
# verifying that code was correctly typed.
cpp -dD -P -fpreprocessed | python3 -c 'import sys, hashlib
y,z = b"",[b"b"]*5
x = [y.join(l.split()) for l in sys.stdin.buffer if l.strip()]
print("".join(hashlib.md5(y:=y+l).hexdigest()[0]for l in x+z))'
It's not ideal -- too long and the output hash is 89 chars long for geometry/FastDelaunay.h.
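For readability, the one-liner above can be unrolled as follows. This is a sketch of the same idea rather than a replacement for it: it expects input that has already been through `cpp -dD -P -fpreprocessed`, and the names `per_line_hash_chars` and `first_mismatch` are made up for illustration.

```python
import hashlib


def per_line_hash_chars(lines, padding=5):
    """One hex char per non-empty line: the first md5 digit of the prefix so far.

    lines: iterable of bytes, assumed to be comment-free already.
    padding: number of extra sentinel "lines" (b"b"), so that a typo on the
    last real line still gets several characters' worth of checking.
    """
    stripped = [b"".join(line.split()) for line in lines if line.strip()]
    prefix = b""
    out = []
    for chunk in stripped + [b"b"] * padding:
        prefix += chunk
        out.append(hashlib.md5(prefix).hexdigest()[0])
    return "".join(out)


def first_mismatch(expected, got):
    """Index of the first differing character, or None if the strings agree."""
    for i, (a, b) in enumerate(zip(expected, got)):
        if a != b:
            return i
    if len(expected) != len(got):
        return min(len(expected), len(got))
    return None
```

Comparing the printed string against the typed file's string with `first_mismatch` gives a line index at or shortly after the typo; an index of `None` means no difference was detected.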
Related to #72, but with a hash for each prefix of the file. Then you can binary search over lines to find your typo.
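The binary search works because prefix hashes are monotone: once a prefix contains the typo, every longer prefix mismatches too (barring collisions). A minimal sketch, assuming a simplified hash that only strips whitespace; the real scripts in this thread also strip comments via `cpp` and use their own `hash.sh`, and the function names and `digits` parameter here are hypothetical.

```python
import hashlib


def prefix_hash(lines, n, digits=6):
    """Hash of the first n lines (bytes) with all whitespace removed.

    Simplified stand-in for the thread's hashing pipeline (assumption).
    """
    data = b"".join(b"".join(line.split()) for line in lines[:n])
    return hashlib.md5(data).hexdigest()[:digits]


def first_bad_checkpoint(typed, checkpoints, digits=6):
    """Binary search for the first checkpoint whose prefix hash mismatches.

    checkpoints: list of (line_count, expected_hash), sorted by line_count.
    Returns the line_count of the first mismatching checkpoint, or None
    if every checkpoint matches.
    """
    lo, hi = 0, len(checkpoints)
    while lo < hi:
        mid = (lo + hi) // 2
        line_count, expected = checkpoints[mid]
        if prefix_hash(typed, line_count, digits) == expected:
            lo = mid + 1  # typo, if any, is after this checkpoint
        else:
            hi = mid  # typo is at or before this checkpoint
    return checkpoints[lo][0] if lo < len(checkpoints) else None
```

With a hash printed every five lines, the result narrows the typo down to the five lines ending at the returned line count.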