Use proper normalization in Hamming #250

Merged: sdmccabe merged 4 commits into netsiphd:master from hamming-tweak on Sep 23, 2019
Conversation

sdmccabe (Collaborator):

Harrison pointed out in a comment on our paper that our Hamming implementation
has an implicit $N^2$ instead of $N(N-1)$ normalization, so it's wrong for
graphs without self-loops. This corrects that, similar to #242.

A couple of notes:

1. I think this could be a little cleaner; the fact that `np.triu_indices()` et
al return 2-tuples cramped my style a bit.
2. The fact that this and #242 both exist raises the concern that this normalization issue
may be present elsewhere. Perhaps we should open a checklist issue, like the one we have
for #245?
3. I have not applied the same correction to `HammingIpsenMikhailov`, on the
grounds that: (i) it's sufficiently different from regular `Hamming` to consider
separately, and (ii) it probably deserves a more thorough cleanup.
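For readers skimming the diff, the gist of the fix can be sketched as follows. This is a hedged toy version operating directly on adjacency matrices, not the actual netrd code: the Hamming distance counts differing off-diagonal entries and divides by the $N(N-1)$ possible self-loop-free edges rather than by all $N^2$ matrix entries.

```python
import numpy as np

def hamming_distance(A1, A2):
    """Fraction of differing off-diagonal entries between two N x N
    adjacency matrices, normalized by N*(N-1) (self-loops excluded),
    not by N^2. Illustrative only; not the netrd implementation."""
    N = A1.shape[0]
    mask = ~np.eye(N, dtype=bool)  # every entry except the diagonal
    return np.sum(A1[mask] != A2[mask]) / (N * (N - 1))

# 3-node path graph vs. empty graph: 4 of the 6 off-diagonal entries differ
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
A_empty = np.zeros((3, 3))
print(hamming_distance(A_path, A_empty))  # 4/6 = 0.666...
```

With the old implicit $N^2$ normalization the same pair would score $4/9$, which is why the distance was systematically deflated for graphs without self-loops.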
@leotrs (Collaborator) left a comment:

I'm approving in spite of my comments, feel free to ignore them.

Also, I was going to say we could think of normalization in terms of #174, but since this was truly implicit I'm not sure how...

```python
# directed case: consider all but the diagonal
if nx.is_directed(G1) or nx.is_directed(G2):
    new_mask = np.tril_indices(N, k=-1)
    mask = (np.append(mask[0], new_mask[0]), np.append(mask[1], new_mask[1]))
```
leotrs (Collaborator):
This looks like it could be achieved in a single append or hstack call? No need to change, just thinking out loud here...

sdmccabe (Collaborator, author):
Can it be? It's a tuple of np.arrays, so (i) they're immutable, and (ii) they're two separate objects. I think what I wrote is pretty ugly and I don't like it, but I wasn't sure about a better way to do it.

leotrs (Collaborator):
Ooooh, you're right, it returns a tuple because it's stupid. What about

```python
mask = np.array(np.triu_indices(N, k=1))
```

?

sdmccabe (Collaborator, author):
That doesn't seem to work.

```python
In [12]: m = np.triu_indices(10, k=1)

In [13]: M = np.array(m)

In [14]: A = np.zeros((10, 10))

In [15]: len(A[M])
Out[15]: 2

In [16]: len(A[m])
Out[16]: 45

In [17]: A[M].shape
Out[17]: (2, 45, 10)
```

leotrs (Collaborator):
Yeah, ok. There must be a way to do it nicely, but I don't care enough right now...

sdmccabe (Collaborator, author):
Agreed. There's some repetition here, so I thought about wrapping it in a function, but the idea of needing a function just for that bothered me...
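For what it's worth, one way to sidestep the index-tuple juggling entirely is to build a boolean mask rather than arrays of indices. This is an untested sketch outside the PR, not the netrd implementation; `N` and the directed/undirected split are stand-ins for the variables in `hamming.py`:

```python
import numpy as np

N = 5  # stand-in for the number of nodes

# Upper-triangle mask: the N*(N-1)/2 entries used for undirected graphs...
undirected_mask = np.triu(np.ones((N, N), dtype=bool), k=1)

# ...and everything off the diagonal, N*(N-1) entries, for directed graphs.
directed_mask = undirected_mask | np.tril(np.ones((N, N), dtype=bool), k=-1)

A = np.arange(N * N).reshape(N, N)
print(A[undirected_mask].size)  # 10 = N*(N-1)/2
print(A[directed_mask].size)    # 20 = N*(N-1)
```

Because `A[mask]` with a boolean mask of the same shape returns a flat array of the selected entries, the directed case becomes a single `|=` update instead of two `np.append` calls on the tuple members.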

(An outdated review thread on netrd/distance/hamming.py was marked resolved.)
sdmccabe merged commit 0a2eeb0 into netsiphd:master on Sep 23, 2019
sdmccabe deleted the hamming-tweak branch on September 23, 2019 at 20:58