dk-series distance #243

sdmccabe · 2019-08-02T16:18:24Z

First pass at the dk-series (2k-series) distance. **Do not merge without
discussion.** This is essentially the `DegreeDivergence`, but instead of the
degree distribution it's the distribution of edges between degree-labelled
nodes. Some outstanding questions and concerns:

1. This is not memory-efficient because it uses NxN dense matrices.
2. Have I understood the dk-series correctly? That is, does the `dk2_series`
   function return something meaningful?
3. I'm not sure if the dk-series is defined for directed graphs. For simplicity
   I have coerced to undirected graphs.
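
To make the idea in the description concrete, here is a minimal sketch of a 2k-distribution and a Jensen-Shannon divergence between two such distributions. It is not the code in this PR: the names `dk2_distribution` and `jensen_shannon` are hypothetical, the graph is assumed to be given as a plain list of undirected edges with no duplicates and at least one edge, and the choice of base-2 logarithms is an assumption.

```python
from collections import Counter
import math


def dk2_distribution(edges):
    """Fraction of edges joining nodes of degree a and degree b, keyed by (a, b).

    `edges` is an iterable of (u, v) pairs, treated as undirected.
    Assumes a simple graph with at least one edge.
    """
    edges = [tuple(e) for e in edges]
    # Degree of each node, counted from the edge list itself.
    degree = Counter(node for edge in edges for node in edge)
    # Label each edge by the sorted degrees of its endpoints.
    counts = Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two dict-valued distributions."""

    def kl(a, m):
        # Kullback-Leibler divergence of a from m, skipping zero-probability terms.
        return sum(a[x] * math.log2(a[x] / m[x]) for x in a if a[x] > 0)

    keys = set(p) | set(q)
    mixture = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in keys}
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


path = dk2_distribution([(0, 1), (1, 2)])              # every edge joins degrees 1 and 2
triangle = dk2_distribution([(0, 1), (1, 2), (2, 0)])  # every edge joins degrees 2 and 2
print(jensen_shannon(path, triangle))
```

Keying the distribution by sorted degree pairs is what makes the undirected coercion mentioned in point 3 natural: an edge between degrees (1, 2) and one between (2, 1) land in the same bin.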

netrd/distance/__init__.py

leotrs · 2019-08-12T15:58:21Z

netrd/distance/dk2_distance.py

+
+        """
+
+        def dk2_series(G):


Being pedantic: can we put this outside the distance class? This is a prime candidate for refactoring if/when #174 ever becomes a thing.

netrd/distance/dk2_distance.py

leotrs · 2019-08-12T16:01:39Z

netrd/distance/dk2_distance.py

+        D1 = np.zeros((N, N))
+        D2 = np.zeros((N, N))
+
+        for (i, j), k in G1_dk.items():


Is it worth it to look into making all of these use sparse matrices? Pretty sure COO matrices would speed this up by a bunch. Even with small N, COO matrices would avoid this loop.

I knew you would call me out on this! I'll look into it. How would it avoid the loop, though?

Uuuuh I guess it wouldn't avoid it, but at least we would be off-loading it to scipy, and I trust them to profile their loops.

OK, that's fine. I think it should be pretty straightforward to change.

I wound up using DOK matrices instead because it seemed most convenient. I'm open to reconsidering this if people have strong opinions about sparse matrix formats.
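
For reference, a minimal sketch of the loop-free construction discussed in this thread, assuming the 2k counts are held in a dict keyed by `(i, j)` pairs (as the `G1_dk.items()` loop in the hunk suggests). The helper name `dict_to_coo` is hypothetical and is not part of netrd.

```python
import numpy as np
from scipy.sparse import coo_matrix


def dict_to_coo(counts, n):
    """Build an n x n sparse COO matrix from a {(i, j): value} mapping."""
    if not counts:
        # An empty mapping gives an all-zero sparse matrix of the right shape.
        return coo_matrix((n, n))
    # Keys and values of a dict iterate in the same order, so they stay aligned.
    rows, cols = zip(*counts.keys())
    data = np.fromiter(counts.values(), dtype=float, count=len(counts))
    return coo_matrix((data, (rows, cols)), shape=(n, n))


D1 = dict_to_coo({(0, 1): 2.0, (2, 2): 1.0}, 3)
print(D1.toarray())
```

The explicit Python loop over entries is replaced by a single constructor call, so the iteration happens inside scipy, and memory scales with the number of nonzero entries rather than with N squared.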

leotrs · 2019-08-12T16:03:17Z

Awesome! Just curious: did you find a paper that uses this? If so, please add the reference at the top. If not, please add some general reference to dk series at the top.

(Curious how you decided to use JSD?)

sdmccabe · 2019-08-12T16:04:07Z

There's no paper (besides the dk-series papers); it's part of the graphend project.

sdmccabe · 2019-08-12T16:09:33Z

> (Curious how you decided to use JSD?)

Gonna tag @jkbren in on this. He might also have thoughts on names.

sdmccabe · 2019-08-12T21:43:08Z

I'm cool with merging. The outstanding issue is the name. I see a couple possibilities here:

1. dk2
2. dk2Distance
3. dk2Series

The second is fine with me; the third might be better? I don't like the first.

sdmccabe · 2019-08-12T22:02:59Z

Other names raised:

- dk2Divergence
- JointDegreeDivergence

Another possibility is to make it a general dkSeries distance: if k==1, call DegreeDivergence; if k==2, run this code; for anything else, raise a NotImplementedError.

leotrs · 2019-08-12T22:05:03Z

Let's do the last thing!

sdmccabe · 2019-08-12T22:06:17Z

Should we merge DegreeDivergence into this, or just call it? My instinct is the latter, since people unfamiliar with the dk series might be confused.

leotrs · 2019-08-12T22:12:51Z

Yes just call it.
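
A minimal sketch of the dispatch agreed on here, under stated assumptions: the function name `dk_series_distance` and the parameters `dist_1k` and `dist_2k` are hypothetical stand-ins (for `DegreeDivergence` and the 2k code in this PR, respectively), passed in as callables so the example is self-contained.

```python
def dk_series_distance(G1, G2, k, dist_1k, dist_2k):
    """Dispatch on k: delegate to an existing distance rather than merging code.

    dist_1k and dist_2k are caller-supplied callables taking (G1, G2).
    """
    if k == 1:
        # Call the existing degree-based distance instead of reimplementing it.
        return dist_1k(G1, G2)
    if k == 2:
        return dist_2k(G1, G2)
    raise NotImplementedError(f"dk-series distance is not implemented for k={k}")
```

Delegating for k==1 keeps the degree-based distance available under its own name for readers who do not know the dk-series, which is the concern raised above.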

sdmccabe · 2019-08-12T22:32:05Z

Can someone double-check the docs to make sure it all still reads right under the newly expanded goal of the module? Also, I ran the tests with k==1 and k==2 as the default argument and both passed.

sdmccabe · 2019-08-13T01:50:23Z

If someone other than me ends up merging this, please squash and merge; this PR wound up being a ton of commits.

sdmccabe · 2019-08-13T13:52:42Z

Leaving for my own reference: I will try to add some doc tweaks to clarify that d==1 is explicitly calling DegreeDivergence under the hood, and write a test that covers both values of d (#202).

sdmccabe added 4 commits August 2, 2019 12:13

appease the autoformatter

8fd998c

store 2k-distribution objects in dk2-dist

7d087ba

add a few more clarifying comments about the dk2 logic

f9d345a

sdmccabe marked this pull request as ready for review August 12, 2019 15:52

sdmccabe requested review from leotrs and tlarock August 12, 2019 15:53

leotrs requested changes Aug 12, 2019

sdmccabe added 6 commits August 12, 2019 12:34

move dk2_series function out of class definition

7f2d5e3

use sparse matrices instead of dense matrices

ea29f04

more autoformatting

d10d3e5

correct doc error

6f074c2

allow arbitrary in dk2_series

ef4c55c

use coo instead of dok sparse matrix

9785e7d

leotrs approved these changes Aug 12, 2019

dk2 distance -> dk series

c7ca20e

sdmccabe changed the title ~~dk2-series distance~~ dk-series distance Aug 12, 2019

rename dk series in __all__

9066f14

leotrs merged commit 4d9c927 into netsiphd:master Aug 13, 2019

sdmccabe deleted the dk-dist branch August 13, 2019 13:43
