Add hashing for verifying correct input of code #72

Chillee · 2019-04-25T02:55:18Z

I don't think that hashing sections is worth it. MIT does section hashing in 8 snippets: LCT, LinearRecurrence, Simplex.h, Polynomial, CycleCounting, GraphDominator, and both suffix arrays.

I would split those into:
"Should be split into different sections": Polynomial, CycleCounting
"Trying to avoid hashing the typedef": LinearRecurrence
"Has parts that you don't always want": both suffix arrays (i.e., you don't always need LCP)
"Not sure": LCT, Simplex, and GraphDominator (I don't know enough about the algorithms to understand whether you pretty much always want all functions)

That's a maximum of 4 snippets where it might be advantageous to have section-wise hashing.

The other argument for hashing sections is that if the hash fails, then you need to look at less of your code. I haven't done many offline contests with a TCR, but from my experience, knowing that you have a mistype in 50 lines of code is only marginally better than knowing you have a mistype in 100 lines of code. Both of these are massively better than not knowing whether you have a mistype or a logic error.

If we were to hash by section, I would propose having some kind of lightweight syntax (like a //<-- ) to demarcate sections, and then putting the hashes (truncated to 5 characters) in the header.

Like so:

Another question with hashing is how we deal with things like typedefs, especially typedefs that are likely to be typed multiple times (for example, typedef vector<ll> Poly). I don't think it's too big of a deal; I would suggest just getting used to typing them in for the purpose of hashing.

My biggest problem with avoiding them automatically is ambiguity with what hashes represent. "We hash everything that's printed" is obvious. "We hash everything after the typedefs" is less obvious.
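As a rough illustration of the section-marker idea proposed above, here is a minimal shell sketch. It assumes that any line containing //<-- ends a section, that the marker line itself is not hashed, and that each hash is the first 5 hex characters of the MD5 of the whitespace-stripped section. The function names section_hash and hash_sections are hypothetical, and this is not the syntax or tooling that KACTL or the MIT notebook actually uses.

```shell
#!/bin/sh
# Hypothetical helper: whitespace-insensitive hash of stdin,
# truncated to 5 characters as suggested in the comment above.
section_hash() {
  tr -d '[:space:]' | md5sum | cut -c-5
}

# Hypothetical splitter: reads source on stdin and prints one hash per
# section, where a line containing "//<--" closes the current section.
hash_sections() {
  buf=""
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *'//<--'*)
        printf '%s' "$buf" | section_hash
        buf="" ;;
      *)
        # The separator does not matter: all whitespace is stripped
        # before hashing.
        buf="$buf $line" ;;
    esac
  done
  printf '%s' "$buf" | section_hash
}

# Demo: two sections, so two hashes are printed.
hash_sections <<'EOF'
int add(int a, int b) { return a + b; }
//<--
int mul(int a, int b) { return a * b; }
EOF
```

With a layout like this, a failed hash narrows the search to the one section whose hash does not match, and "what is hashed" stays easy to state: everything printed between two markers.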

simonlindholm · 2019-04-25T13:21:53Z

There are actually a fair number of cases where you might/will type in only parts of the code: Treap, FastSubsetTransform (that one's weird), euclid, chinese, 2sat, TreePower, HLD (on the chopping block), sideOf, Angle, KMP, SuffixTree, Hashing, AhoCorasick, IntervalContainer. And in several more I can imagine that the 100->50 line reduction is handy. So if we could come up with some slick UI for indicating sections I'd be all for it. I agree with your comment about ambiguity, though, and I think we can start simple.

ecnerwala · 2019-04-26T17:40:11Z

Just a note: I updated the hash script in our book to include the -dD flag, which preserves macro definitions. It's now cpp -dD -P -fpreprocessed | tr -d '[:space:]' | md5sum -

ecnerwala · 2019-04-26T17:43:49Z

> The other argument for hashing sections is that if the hash fails, then you need to look at less of your code. I haven't done many offline contests with a TCR, but from my experience, knowing that you have a mistype in 50 lines of code is only marginally better than knowing you have a mistype in 100 lines of code. Both of these are massively better than not knowing whether you have a mistype or a logic error.

I think knowing you have a mistype in 50 vs 100 lines of code is actually linearly (~2x) better for finding the bug, which amounts to maybe 5 minutes of time (and feeling a lot happier).

ecnerwala · 2019-04-26T17:45:54Z

Also, I'll note that we would've hashed sections in more files if we used them more/weren't too lazy to add the annotations. Honestly, we mostly used kactl for the stuff we added (which we broke into sections) and the geometry (which is short to begin with).

simonlindholm · 2019-04-26T18:09:50Z

Thanks for the note, I've made that change: dcdc34a (note also the golfed vimrc: ca Hash w !cpp -dD -P -fpreprocessed \| tr -d '[:space:]' \| md5sum \| cut -c-6)
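To make the pipeline quoted above concrete, here is a small sketch that wraps it in a shell function and feeds it two spellings of the same code. The name kactl_hash is hypothetical, and the sketch assumes GNU cpp, tr, md5sum, and cut are installed.

```shell
#!/bin/sh
# Sketch of the hash pipeline from the thread, wrapped in a function.
# "kactl_hash" is a hypothetical name chosen for this example.
kactl_hash() {
  # -fpreprocessed: suppress macro expansion and most directive
  #                 processing, while still removing comments
  # -P:             do not emit line markers
  # -dD:            per the thread, preserves macro definitions
  cpp -dD -P -fpreprocessed | tr -d '[:space:]' | md5sum | cut -c-6
}

# Two spellings of the same function: only whitespace and comments differ.
printf 'int f(int x){return x+1;}\n' | kactl_hash
printf '// add one\nint f( int x ) {\n\treturn x + 1; /* inc */\n}\n' | kactl_hash
```

Both invocations should print the same six characters, since comments are removed by cpp and all whitespace is removed by tr before hashing. A typo in any remaining character would change the hash.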

lrvideckis · 2024-03-07T19:29:46Z

Hi, I want to propose an idea for "partial hashes". The idea was communicated to me by https://codeforces.com/profile/camc

Let's say you want a struct:

struct LCA {
...
	LCA(vector<vi>& C) : time(sz(C)), rmq((dfs(C,0,-1), ret)) {}
	void dfs(vector<vi>& C, int v, int par) {
...
	}

	int lca(int a, int b) {
		if (a == b) return a;
		tie(a, b) = minmax(time[a], time[b]);
		return path[rmq.query(a, b)];
	}
	int dist(int a, int b) {return depth[a] + depth[b] - 2*depth[lca(a,b)];}
	int inSubtree(int a, int b) {return time[a] <= time[b] && time[b] < timeOut[a];}
	int nodeOnPath(int u, int v, int w) {...}
...
};

You can split it up like this:
LCA.h:

struct LCA {
...
	LCA(vector<vi>& C) : time(sz(C)), rmq((dfs(C,0,-1), ret)) {}
	void dfs(vector<vi>& C, int v, int par) {
...
	}
#include "lcaFunc.h"
#include "dist.h"
#include "inSubtree.h"
#include "nodeOnPath.h"
};

lcaFunc.h:

#pragma once
	int lca(int a, int b) {
		if (a == b) return a;
		tie(a, b) = minmax(time[a], time[b]);
		return path[rmq.query(a, b)];
	}

dist.h:

#pragma once
	int dist(int a, int b) {return depth[a] + depth[b] - 2*depth[lca(a,b)];}

inSubtree.h:

#pragma once
	int inSubtree(int a, int b) {return time[a] <= time[b] && time[b] < timeOut[a];}

... etc

Now each member function is in its own file, and thus has its own hash. Furthermore, you type exactly what you need: if you only need the lca function, you type only that, verify its hash, and then copy it into the struct.

If you need lca, dist, and inSubtree, you type all three, verify all their hashes, and then copy them into the struct.

Furthermore, the include statements tell you exactly where to put the member functions.

lrvideckis · 2024-03-07T19:35:20Z

Now, you don't want to force the user to type those include statements, so when I generate the .pdf, I have this in a script:

contest/hash.sh:

tr -d '[:space:]' | md5sum | cut -c-6

generate_pdf.sh:

shopt -s globstar
for header in ../content/**/*.h; do
	hash=$(sed '/^#include/d' "$header" | cpp -dD -P -fpreprocessed | ./../contest/hash.sh)
	sed --in-place "1i //hash: $hash" "$header"
done
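The effect of dropping the #include lines before hashing can be sketched as follows. This is a simplified illustration, not the script above: it omits the cpp step for brevity, and the function names hash6 and typed_hash are hypothetical.

```shell
#!/bin/sh
# Same whitespace-insensitive hash as contest/hash.sh in the thread.
hash6() {
  tr -d '[:space:]' | md5sum | cut -c-6
}

# Hypothetical helper: hash a header the way a contestant would type it,
# i.e. with every line that starts with #include removed first.
typed_hash() {
  sed '/^#include/d' | hash6
}

# The header as stored in the repository, with an include placeholder.
stored=$(typed_hash <<'EOF'
struct LCA {
	int lca(int a, int b);
#include "dist.h"
};
EOF
)

# The same header as typed during a contest, without the include line.
typed=$(hash6 <<'EOF'
struct LCA {
	int lca(int a, int b);
};
EOF
)

[ "$stored" = "$typed" ] && echo "hashes match"
```

Because the include lines never enter the hash, the printed hash stays valid whether the contestant types only the base struct or pastes additional member functions at the include positions after verifying each one separately.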

lrvideckis · 2024-03-07T19:37:46Z

Furthermore, if you use something like the expander script for Codeforces rounds, where you can copy-paste, this method should still work.

lrvideckis · 2024-03-07T19:53:00Z

For example, you can split out the Fenwick tree lower bound https://github.com/kth-competitive-programming/kactl/blob/main/content/data-structures/FenwickTree.h#L24 since you rarely need that function.

Another example is this:

https://github.com/kth-competitive-programming/kactl/blob/main/content/graph/CompressTree.h#L18

where you pass in LCA& lca as a parameter. Instead, you could make compressTree a member function of LCA, splitting up the files using this trick. Then there is no need to pass in lca as a parameter, and the lca.lca(a, b) syntax becomes lca(a, b).

simonlindholm added the enhancement label Apr 28, 2019

lrvideckis mentioned this issue Oct 19, 2023

now asserts are included in the hash code lrvideckis/programming_team_code#517

Merged

lrvideckis mentioned this issue Sep 16, 2024

hash for each prefix of a file (for verifying input) #266

Open
