Meaning of "directed", "undirected", and "bipartite" keywords/concepts. #409

Open
krivit opened this issue Nov 3, 2021 · 11 comments

Comments

@krivit
Member

krivit commented Nov 3, 2021

Right now, we use "directed" if a term can work on directed networks and analogously for undirected. On the other hand, we use "bipartite" for terms that only work on bipartite networks. For example,

  • absdiff: works for everything, has keywords "directed" and "undirected"
  • b1factor: works for bipartite undirected only, has keywords "bipartite" and "undirected"

This is not very consistent conceptually, and it also produces inconsistent search results:

  • Searching with keyword "undirected" returns all terms (including bipartite-only) suitable for undirected networks (e.g., both absdiff and b1factor).
  • Searching with keyword "bipartite" returns terms that work only on bipartite networks (e.g., b1factor but not absdiff).
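The inconsistency can be reproduced with a toy lookup. This is a minimal sketch in Python rather than the package's actual search code: `TERM_KEYWORDS` and `search_by_keyword` are hypothetical names, and the table contains only the two terms mentioned above.

```python
# Status-quo keywords as described above (only the two terms from this thread).
TERM_KEYWORDS = {
    "absdiff": {"directed", "undirected"},
    "b1factor": {"bipartite", "undirected"},
}


def search_by_keyword(keyword):
    """Return the names of terms tagged with `keyword`, sorted alphabetically."""
    return sorted(name for name, kws in TERM_KEYWORDS.items() if keyword in kws)


# "undirected" is tagged with "can work on" semantics, so both terms match.
undirected_hits = search_by_keyword("undirected")
# "bipartite" is tagged with "only works on" semantics, so absdiff is missed
# even though it works on bipartite networks.
bipartite_hits = search_by_keyword("bipartite")
```

The same lookup function gives answers with two different meanings, depending only on how each keyword happened to be assigned.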

The question is: what should we do?

My sense is that the most common use cases would be something like:

  • List terms that work for a bipartite (undirected) network.
  • List terms that work for a unipartite undirected network.
  • List terms that work for a unipartite directed network.

Here are some ideas:

  1. We could accomplish this by declaring "bipartite" to be a third type of network. Then, absdiff would get all three keywords, whereas b1factor would get only "bipartite". The downside of this is that it's not technically correct and is not future-proof, if we ever decide to implement directed bipartite networks.
  2. We could keep the status quo but also implement some way of specifying logical expressions in the search. For example, ~undirected&!bipartite would include absdiff but not b1factor (i.e., terms that work for undirected unipartite networks), whereas ~undirected would include both (i.e., everything that works for bipartite undirected networks). However, this is cumbersome and counterintuitive.
  3. We could declare that "bipartite" should be used the way "directed" and "undirected" are (i.e., so that absdiff gets the keyword as well) and also implement some way of specifying logical expressions in the search. Then, ~bipartite would get absdiff and b1factor (i.e., terms that work for undirected bipartite networks), but so would ~undirected. We may also want to introduce a keyword "unipartite". (A term that supports both should have both keywords.)
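To make the logical-expression idea in options 2 and 3 concrete, here is a minimal sketch of a keyword-expression matcher, written in Python for illustration only. The grammar (`!`, `&`, `|`, parentheses), the function names, and the decision to ignore the leading `~` are all assumptions; nothing here reflects an existing search implementation. The keyword table uses the status-quo tagging from option 2.

```python
import re

# Status-quo keywords (option 2); only the two terms from this thread.
TERM_KEYWORDS = {
    "absdiff": {"directed", "undirected"},
    "b1factor": {"bipartite", "undirected"},
}


def matches(expr, keywords):
    """Evaluate a keyword expression against one term's keyword set."""
    # Characters outside the grammar (e.g., the leading "~") are skipped.
    tokens = re.findall(r"[A-Za-z_]\w*|[!&|()]", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            take()  # consume the closing ")"
            return value
        return take() in keywords  # a bare keyword is a membership test

    return parse_or()


def search(expr):
    """Return the sorted names of terms whose keywords satisfy `expr`."""
    return sorted(name for name, kws in TERM_KEYWORDS.items() if matches(expr, kws))
```

Under these assumptions, `search("~undirected&!bipartite")` selects absdiff only, and `search("~undirected")` selects both terms, matching the behavior described in option 2.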

Any thoughts?

@mbojan @CarterButts @martinamorris @drh20drh20 @sgoodreau @handcock

@CarterButts

CarterButts commented Nov 3, 2021 via email

@krivit
Member Author

krivit commented Nov 4, 2021

@CarterButts, in that case, do you have a preference between 2 and 3?

@krivit
Member Author

krivit commented Nov 4, 2021

@CarterButts, or some fourth option?

@CarterButts

Option two, I think. I admit that I was thinking about it not from a search angle, but an InitErgmTerm angle - how does a term decide if it is pleased with the network on which it has been called? If one can define a logical expression for the term that tests to TRUE in graphs for which the term is safe and FALSE otherwise, then that fixes it.

I've not kept up with the latest on what you and Joyce are doing vis a vis searching for terms, so I don't have as much of a sense of whether this would indeed be cumbersome. Depending on how you are handling this task, it might be possible to support simplified cases with special-case syntax. So (this isn't thought out, caveat emptor) you might have something like:

findErgmTerm(termstr, full=FALSE, bipartite=FALSE, directed=TRUE)

which does something like this:

  1. When full==TRUE, ignores all other arguments and applies the logical expression in termstr to all terms in the way we are describing.
  2. When full==FALSE, performs a simple search that uses the supplied flags (here directed and bipartite) in a simplified mode that embodies "logical defaults" that follow typical use cases. So, e.g., we assume that the user doesn't need bipartite support unless bipartite==TRUE, etc. And we can further let NULL or NA be wildcards (admitting anything). We have to spell out whatever that syntax does in the docs, but the point would be that it doesn't have to cover all possible use cases...it just has to be fast/simple for the most common one. Users can always use the full logical interface if they want that power. Also, from an implementation standpoint, the simple cases can be implemented recursively by constructing a proper formula and then calling the function again with full==TRUE (passing the formula in question). That greatly reduces maintenance costs, since there is only ever one real mechanism, and the rest of the interface is cosmetic.
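The "one real mechanism plus a cosmetic wrapper" structure can be sketched as follows. This is a Python illustration under several assumptions, not the proposed R function: `find_terms` and `TERM_SUPPORT` are hypothetical, the "logical expression" is stood in for by a predicate over a term's supported network types, and the flags are read as describing the user's network, with `None` as a wildcard.

```python
# Hypothetical support table: each term lists the (directed, bipartite)
# network types it can handle. Only the two terms from this thread are shown.
TERM_SUPPORT = {
    "absdiff": {(True, False), (False, False), (False, True)},
    "b1factor": {(False, True)},
}


def find_terms(expr=None, full=False, bipartite=False, directed=True):
    if full:
        # The single real mechanism: `expr` is a predicate over a support set;
        # the flag arguments are ignored in this mode.
        return sorted(name for name, types in TERM_SUPPORT.items() if expr(types))

    def simple_expr(types):
        # Simple mode: keep a term if it supports at least one network type
        # consistent with the flags; None acts as a wildcard for that flag.
        return any(
            (directed is None or d == directed)
            and (bipartite is None or b == bipartite)
            for d, b in types
        )

    # Delegate to full mode so that only one code path does the matching.
    return find_terms(simple_expr, full=True)
```

With the defaults, `find_terms()` asks for terms usable on a unipartite directed network; `find_terms(bipartite=True, directed=False)` asks for terms usable on a bipartite undirected one. Because simple mode only builds a predicate and re-enters full mode, any change to the matching logic has to be made in one place.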

Again, that is neither deeply thought out, nor based on a full understanding of the implementations you have been cooking up - am writing en passant between tasks. So ignore if this is not helpful.

@krivit
Member Author

krivit commented Jul 14, 2024

@martinamorris, @CarterButts, what if we introduced additional concepts, e.g., bipartite only, directed only, undirected only, etc.? Then, we could adjust our search functions to accomplish everything. Someone would have to go through the docs and populate them.

@CarterButts

Well, I think my comments above are probably still where I would fall. Seems that we need (1) a sensible language that lets us specify what conditions are needed for a term to function (and a term is allowed iff the conditions are satisfied), (2) a function that evaluates that against a network, and (3) a natural way to invoke it for both InitErgmTerm and ergmTerm? use cases. It might work by having elementary CONDITIONs, along with negators, and, and or. The obvious initial candidate CONDITIONs would be directed, bipartite, and valued. (One could then think about e.g. different kinds of edge values, measurement levels, or other exotica, if one wanted.) We could also have an any condition just to make it trivial to have a term that claims to be universal. If I want to then ask for terms from ergmTerm? that are only for directed bipartite networks, I'd pass directed & bipartite somewhere, and everything evaluating TRUE for that condition would be considered.
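One way to picture such a condition language is with composable predicates. The sketch below is a Python illustration only: the `Network` and `Condition` classes, the use of `&`, `|`, and `~` as the combinators, and the conditions assigned to the two example terms are all assumptions made for this example (absdiff is marked universal only because the thread describes it as working for everything).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Network:
    """Stand-in for the network properties a term might test."""
    directed: bool = False
    bipartite: bool = False
    valued: bool = False


class Condition:
    """A predicate on a network that composes with &, | and ~."""

    def __init__(self, test: Callable[[Network], bool]):
        self.test = test

    def __call__(self, net: Network) -> bool:
        return self.test(net)

    def __and__(self, other):
        return Condition(lambda net: self(net) and other(net))

    def __or__(self, other):
        return Condition(lambda net: self(net) or other(net))

    def __invert__(self):
        return Condition(lambda net: not self(net))


# Elementary conditions, plus ANY for terms that claim to be universal.
DIRECTED = Condition(lambda net: net.directed)
BIPARTITE = Condition(lambda net: net.bipartite)
VALUED = Condition(lambda net: net.valued)
ANY = Condition(lambda net: True)

# Hypothetical per-term requirements.
TERM_CONDITIONS = {
    "absdiff": ANY,
    "b1factor": BIPARTITE & ~DIRECTED,
}


def allowed_terms(net: Network):
    """Terms whose condition evaluates to True on `net`."""
    return sorted(name for name, cond in TERM_CONDITIONS.items() if cond(net))
```

The same `Condition` object could serve both use cases: a term initializer would call it on the actual network and refuse to proceed on `False`, while a search would call it on a description of the network the user has in mind.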

Is there a better way?

@martinamorris
Member

martinamorris commented Jul 16, 2024 via email

@krivit
Member Author

krivit commented Jul 16, 2024

On further thought, and in light of #571, I think the fundamental problem is that of data format. We are relying on help files for the term search (and that's a good thing), but we are not storing sufficient information to tell whether a term can work on a particular network, or at least not storing it consistently.

There are three types of networks we currently support: unipartite directed, unipartite undirected, and bipartite undirected. (Valued terms are a separate family, with a parallel classification.) Any given term can support any combination of these, so there are 3 bits of information we need to represent what the term does and doesn't support. (Actually, it's slightly less since a term that doesn't support any cases makes no sense, and, also, it is rare for a term to support directed and bipartite undirected but not unipartite undirected---but it does make sense for terms such as diff.)
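The "3 bits" view can be written down directly as a bit-flag type. This is a minimal Python sketch; the `Support` enum and its member names are invented for illustration, and the table again holds only the two terms whose behavior is stated in this thread.

```python
from enum import Flag, auto


class Support(Flag):
    """One bit per currently supported network type."""
    UNI_DIRECTED = auto()
    UNI_UNDIRECTED = auto()
    BI_UNDIRECTED = auto()


ALL = Support.UNI_DIRECTED | Support.UNI_UNDIRECTED | Support.BI_UNDIRECTED

# A term that supports nothing makes no sense, so of the 2**3 bit patterns
# only the non-empty ones are meaningful.
NONEMPTY_COMBINATIONS = 2 ** len(Support) - 1

TERM_SUPPORT = {
    "absdiff": ALL,                      # works for everything
    "b1factor": Support.BI_UNDIRECTED,   # bipartite undirected only
}


def terms_for(network_type):
    """Terms whose support mask includes the given network type."""
    return sorted(
        name for name, support in TERM_SUPPORT.items() if network_type in support
    )
```

With support stored this way, each of the three common queries is a single membership test, for example `terms_for(Support.BI_UNDIRECTED)`.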

We could, therefore, in principle, "encode" any term's support using only three keywords, but we have to use them in a very particular way: essentially, option 1 from what I described, so that both absdiff and b1cov have bipartite but only absdiff has undirected. I don't know how intuitive this is, though, hence my proposal for additional keywords, which could be processed internally by the search functions.

@martinamorris
Member

> There are three types of networks we currently support: unipartite directed, unipartite undirected, and bipartite undirected.

you're proposing 3 binary keywords: unipartite (Y/N), bipartite (Y/N), directed (Y/N)?

or is there a value to having something like: partite (uni/bi/both), directed (Y/N/both)?

> (Valued terms are a separate family, with a parallel classification.)

and we would use the same logic here?

@krivit
Member Author

krivit commented Jul 18, 2024

> > There are three types of networks we currently support: unipartite directed, unipartite undirected, and bipartite undirected.

> you're proposing 3 binary keywords: unipartite (Y/N), bipartite (Y/N), directed (Y/N)?

This could be one solution, but it has the downside that it's using our current keywords but not the way they are used right now.

> or is there a value to having something like: partite (uni/bi/both), directed (Y/N/both)?

Perhaps. We could have a pool of keywords, then have the search logic try to figure out what a given combination actually means.

> > (Valued terms are a separate family, with a parallel classification.)

> and we would use the same logic here?

Whatever we do would transfer automatically.
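One way to check what a two-field scheme such as partite (uni/bi/both) and directed (Y/N/both) can express is to enumerate it. The Python sketch below assumes a particular decoding that is not spelled out in the thread: the two fields are combined as a cross product and then restricted to the three currently supported network types.

```python
from itertools import product

# The three supported network types, written as (partite, directed).
SUPPORTED_TYPES = {("uni", True), ("uni", False), ("bi", False)}

PARTITE = {"uni": {"uni"}, "bi": {"bi"}, "both": {"uni", "bi"}}
DIRECTED = {"Y": {True}, "N": {False}, "both": {True, False}}


def decode(partite, directed):
    """Support set implied by one (partite, directed) pair of field values."""
    combos = set(product(PARTITE[partite], DIRECTED[directed]))
    return frozenset(combos & SUPPORTED_TYPES)


# Every non-empty support set that some pair of field values can describe.
reachable = {decode(p, d) for p in PARTITE for d in DIRECTED} - {frozenset()}

# The rarer case mentioned earlier in the thread: directed and bipartite
# undirected, but not unipartite undirected.
diff_like = frozenset({("uni", True), ("bi", False)})
```

Under this decoding, `reachable` contains six of the seven non-empty support sets, and `diff_like` is the one that is missing. A scheme of this shape would therefore need either a different decoding rule or an extra keyword to cover that case.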

@martinamorris
Member

martinamorris commented Jul 20, 2024 via email

3 participants