Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added permutations_array to Moran_Local class #345

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

martirenom
Copy link

This allows users to define a custom permutations array for the Moran_Local class to be used in crand. This is useful for custom applications. This feature was commented in issue #344.

We have decided to include a new argument to the function Moran_Local called "permutations_array", which is passed later to crand(). This was necessary instead of using "permutations" itself to avoid dealing with replacing entirely permutations as integer, which is used several other places in the code. If alternative, more elegant, solutions to our proposal exist, please feel free to propose them.

This allows users to define a custom permutations array for the Moran_Local class to be used in crand. This is useful for custom applications.
@sjsrey
Copy link
Member

sjsrey commented Aug 1, 2024

@martirenom thanks for this. Would you be able to add a test suite and documentation so that users have an understanding of the scope of this option and what the intended use is?

Added new tests for the permutations_array argument in Moran_Local and crand.
Fixed some test that were broken due to the new argument.
@martirenom
Copy link
Author

I have added a test for the new argument in Moran_Local and crand function, which allows now the use of a user-defirned permutations_array.

As for generating a documentation, I would appreciate if this is done by ESDA developers as have no clue how to do it. In a nutshell, what the new parameter allows is the input of a permutation array that is not totally random (as by default it is in Moran_Local). In our case, we needed to maintain certain linear correlations existing between points in the "map" as we work on genomics and point one is linked by polymer connection to point 2 and so on. Thus, we wanted to maintain this linearity when generating a null model to calculate the empirical p-values of the Moran_Local.

Thanks!

@sjsrey sjsrey requested a review from ljwolf August 4, 2024 23:22
@sjsrey sjsrey self-assigned this Aug 4, 2024
@ljwolf
Copy link
Member

ljwolf commented Aug 15, 2024

This is great functionality to have!

I would prefer overloading the permutations argument at the user level, since a permutation array specifies the number of permutations. So, we would just have a check the type of permutations in crand(), construct permutations_array (or assign), and set permutations to permuted_ids.shape[0].

I think this would be better because it allows any crand() powered class to use the functionality immediately, rather than having to add an explicit permutation_array argument to each statistic.

If I'm understanding the statistical theory right, I would add the following to document the method:

permutations should be specified as an array of indices if the user needs to specify additional structure onto the conditional permutation null hypothesis. Common reasons to do this include exchangeability violations, which might then require us to shuffle observations within (but not between) groups, or linearity constraints, which may require certain sequences of observation relationships to be preserved.

Am I understanding the theory correctly? I think the linearity constraint is a special type of an exchangeability violation... but an example of when this might be used would be super helpful. I can construct one for the group exchangeability case.

Copy link
Member

@ljwolf ljwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the overloading changes in #348, so other developers can see what that would look like.

If that is OK to other devs, we can push that onto martirenom:permutations_array, update this, and close #348.

@ljwolf
Copy link
Member

ljwolf commented Aug 15, 2024

Hmm... Because of the design of the simulator, we can't necessarily use this for block-wise shuffling, since the block constraint either (a) changes by observation or (b) remains the same for all observations, but refers to positions within each chunk. We could vary the y so that only one group is passed to each compute_chunk() call, but but that would be substantively different than this. So let's park that for now, and move forward only with allowing permutation arrays to be specified as here.

I'll keep pushing to clean this up for merging, but an example notebook where the linear constraints on the permutation array would be great!

@martirenom
Copy link
Author

The reason why we decided to pass an array with the shuffling was to make it user-defined. That is, the array is build by the user externally from esda... This makes it very flexible. We thought that was simpler and allowed versatile implementations.

@martirenom
Copy link
Author

@ljwolf Thanks for approving these changes. Let us know if there is anything necessary from our side in order for it to be merged into the main branch.

@martinfleis
Copy link
Member

The CI needs to pass before we can merge this. It seems that the changes here are failing at the moment, so it will need to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants