Use proper normalization in Hamming #250
Conversation
Harrison pointed out in a comment on our paper that our Hamming implementation has an implicit $N^2$ instead of $N(N-1)$ normalization, so it's wrong for graphs without selfloops. This corrects that, similar to #242. A couple of notes:
1. I think this could be a little cleaner; the fact that `np.triu_indices()` et al. return 2-tuples cramped my style a bit.
2. The fact that this and #242 exist raises concern that this normalization issue may be present elsewhere. Perhaps we should open a checklist issue, like we have for #245?
3. I have not applied the same correction to `HammingIpsenMikhailov`, on the grounds that: (i) it's sufficiently different from regular `Hamming` to consider separately, and (ii) it probably deserves a more thorough cleanup.
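For context, here is a minimal sketch of what the corrected normalization amounts to. The function name and the boolean-mask approach are illustrative only, not the actual implementation in this repo:

```python
import numpy as np

def hamming_distance(A1, A2):
    """Illustrative sketch: fraction of off-diagonal entries on which
    two N x N adjacency matrices disagree.

    For graphs without selfloops the diagonal carries no information,
    so there are only N*(N-1) comparable entries, not N**2.
    """
    N = A1.shape[0]
    offdiag = ~np.eye(N, dtype=bool)  # mask out the diagonal
    return np.sum(A1[offdiag] != A2[offdiag]) / (N * (N - 1))
```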
I'm approving in spite of my comments; feel free to ignore them.
Also, I was going to say we could think of normalization in terms of #174, but since this was truly implicit I'm not sure how...
```python
# directed case: consider all but the diagonal
if nx.is_directed(G1) or nx.is_directed(G2):
    new_mask = np.tril_indices(N, k=-1)
    mask = (np.append(mask[0], new_mask[0]), np.append(mask[1], new_mask[1]))
```
This looks like it can be achieved in a single `append` or `hstack` call? No need to change, just thinking out loud here..
Can it be? It's a tuple of `np.array`s, so (i) they're immutable, and (ii) they're two separate objects. I think what I wrote is pretty ugly and I don't like it, but I wasn't sure about a better way to do it.
Ooooh, you're right, it returns a tuple because it's stupid. What about `mask = np.array(np.triu_indices(N, k=1))`?
That doesn't seem to work.
```python
In [12]: m = np.triu_indices(10, k=1)

In [13]: M = np.array(m)

In [14]: A = np.zeros((10, 10))

In [15]: len(A[M])  # a 2-D integer array indexes the first axis only
Out[15]: 2

In [16]: len(A[m])  # a tuple of index arrays does multi-axis fancy indexing
Out[16]: 45

In [17]: A[M].shape  # each row of M selects entire rows of A
Out[17]: (2, 45, 10)
```
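The difference comes down to NumPy's indexing rules: a tuple of index arrays triggers multi-axis fancy indexing, while a single 2-D integer array only indexes the first axis, so `A[M]` selects whole rows. A quick sketch of one way around it, converting back to a tuple before indexing:

```python
import numpy as np

m = np.triu_indices(10, k=1)  # tuple of two index arrays
M = np.array(m)               # 2-D array of shape (2, 45)
A = np.zeros((10, 10))

# tuple() turns the rows of M back into separate index arrays,
# restoring multi-axis fancy indexing.
assert A[tuple(M)].shape == A[m].shape == (45,)
```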
Yeah, ok. There must be a way to do it nicely, but I don't care enough right now...
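One candidate for the "nice way" (a sketch only, reusing `mask`, `new_mask`, and `N` from the diff above, untested against this PR): `np.hstack` treats each 2-tuple of index arrays as a `(2, k)` array, so both triangles can be combined in a single call and converted back to a tuple for indexing:

```python
import numpy as np

N = 10
mask = np.triu_indices(N, k=1)       # upper triangle
new_mask = np.tril_indices(N, k=-1)  # lower triangle (directed case)

# hstack concatenates the (2, k) arrays column-wise; tuple() restores
# the two-index-array form that NumPy's fancy indexing expects.
mask = tuple(np.hstack([mask, new_mask]))

A = np.zeros((N, N))
assert len(A[mask]) == N * (N - 1)  # all off-diagonal entries
```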
Agreed. There's some DRY here so I thought about wrapping it in a function, but the idea of needing a function to do that bothered me...
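If the helper route were taken anyway, it might look something like this (a hypothetical sketch; the name `offdiagonal_indices` is invented and not part of the codebase):

```python
import numpy as np

def offdiagonal_indices(N, directed=False):
    """Hypothetical helper: index arrays for the off-diagonal entries
    of an N x N adjacency matrix.

    Undirected graphs need only the upper triangle; directed graphs
    need both triangles, i.e. everything but the diagonal.
    """
    mask = np.triu_indices(N, k=1)
    if directed:
        lower = np.tril_indices(N, k=-1)
        mask = tuple(np.hstack([mask, lower]))
    return mask
```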