explicit split settings #487

Open
jreps opened this issue Oct 14, 2024 · 3 comments

@jreps
Collaborator

jreps commented Oct 14, 2024

At the moment, we can split the data into train/test sets and cross-validation folds by patientId, rowId, or time.

It would be nice to have an explicit splitter where you provide the rowIds for the test set, the train set, and each train fold. That way you can ensure the same split is used even when the features or other settings differ.
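One way to get such rowIds is to draw the split once from the rowIds alone, independent of any covariates, and then reuse it. A minimal sketch in base R, assuming a simple random (not outcome-stratified) assignment; `makeFixedSplit` and its arguments are hypothetical names, not part of the package:

```r
# Hypothetical helper (not part of the package): draw a random split from the
# rowIds alone, so it does not depend on which features were extracted.
makeFixedSplit <- function(rowIds, testFraction = 0.25, nfold = 3, seed = 42) {
  set.seed(seed)  # note: this resets the global RNG state
  # Random permutation; sample.int avoids sample()'s special case for length-1 input
  shuffled <- rowIds[sample.int(length(rowIds))]
  nTest <- round(length(shuffled) * testFraction)
  isTest <- seq_along(shuffled) <= nTest
  trainRowIds <- shuffled[!isTest]
  # Balanced fold labels 1..nfold, shuffled across the training rows
  folds <- rep(seq_len(nfold), length.out = length(trainRowIds))
  trainFolds <- folds[sample.int(length(folds))]
  list(
    testRowIds = shuffled[isTest],
    trainRowIds = trainRowIds,
    trainFolds = trainFolds
  )
}

s <- makeFixedSplit(1:20, testFraction = 0.25, nfold = 3, seed = 1)
```

The returned list could be saved (for example with `saveRDS`) and its three elements passed to the split-settings function in later runs, so every analysis sees the same test set and the same folds.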

@jreps
Collaborator Author

jreps commented Oct 15, 2024

Here is code that seems to work for me:

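```r
# Create split settings from explicitly supplied rowIds for the test set
# and for each training fold, so the same split can be reused across analyses.
createExplicitSplitSetting <- function(
  testRowIds,
  trainRowIds,
  trainFolds
) {
  # Without this check, data.frame() could silently recycle trainFolds
  if (length(trainRowIds) != length(trainFolds)) {
    stop("trainFolds must contain one fold index per element of trainRowIds")
  }
  if (length(intersect(testRowIds, trainRowIds)) > 0) {
    stop("A rowId cannot be in both the test set and the train set")
  }

  splitSettings <- list(
    testRowIds = testRowIds,
    trainRowIds = trainRowIds,
    trainFolds = trainFolds
  )

  # Name of the splitter function that applies these settings
  attr(splitSettings, "fun") <- "explicitSplitter"
  class(splitSettings) <- "splitSettings"
  return(splitSettings)
}

# Return a data.frame with one row per rowId: index -1 marks the test set,
# positive integers give the training fold. `population` is not used here;
# the split comes entirely from splitSettings.
explicitSplitter <- function(
  population,
  splitSettings
) {
  testRowIds <- splitSettings$testRowIds
  trainRowIds <- splitSettings$trainRowIds
  trainFolds <- splitSettings$trainFolds

  split <- data.frame(
    rowId = c(testRowIds, trainRowIds),
    index = c(rep(-1, length(testRowIds)), trainFolds)
  )

  return(split)
}
```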
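Because `explicitSplitter` never looks at `population`, a supplied split can silently omit rows or include rowIds that are not in the population (for example, after the cohort changes). A minimal sketch of a sanity check, assuming `population` is a data frame with a `rowId` column and the split uses the `rowId`/`index` format returned above; `checkExplicitSplit` is a hypothetical helper:

```r
# Hypothetical check (not part of the package): does a split data.frame
# (index -1 = test, 1..k = training fold) cover the population exactly once?
checkExplicitSplit <- function(split, population) {
  problems <- character(0)
  if (anyDuplicated(split$rowId) > 0) {
    problems <- c(problems, "some rowIds are assigned more than once")
  }
  missing <- setdiff(population$rowId, split$rowId)
  if (length(missing) > 0) {
    problems <- c(problems, sprintf("%d population rowIds have no split index", length(missing)))
  }
  extra <- setdiff(split$rowId, population$rowId)
  if (length(extra) > 0) {
    problems <- c(problems, sprintf("%d split rowIds are not in the population", length(extra)))
  }
  folds <- split$index[split$index != -1]
  if (any(folds < 1 | folds != round(folds))) {
    problems <- c(problems, "training indices must be positive integers")
  }
  problems  # an empty character vector means the split looks consistent
}

population <- data.frame(rowId = 1:6)
split <- data.frame(rowId = c(1, 2, 3, 4, 5, 6), index = c(-1, -1, 1, 2, 1, 2))
checkExplicitSplit(split, population)  # character(0)
```

Returning a vector of messages rather than stopping at the first failure makes it easier to see everything that is wrong with a stored split at once.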

@egillax
Collaborator

egillax commented Oct 17, 2024

Does the proposed code also allow controlling the training folds? For example, if you need exactly the same split not only into train/test but also for each fold within the training set.
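In the proposed code, `trainFolds` is paired position by position with `trainRowIds`, so the caller fixes each training row's fold. As a sketch, assuming a previous split is available as a data frame with `rowId` and `index` columns (the format `explicitSplitter` returns), the three arguments could be recovered from it so the folds are reproduced exactly; `splitToExplicitArgs` is a hypothetical helper:

```r
# Hypothetical helper (not part of the package): turn an existing split
# data.frame back into testRowIds / trainRowIds / trainFolds.
splitToExplicitArgs <- function(split) {
  isTest <- split$index == -1
  list(
    testRowIds = split$rowId[isTest],
    trainRowIds = split$rowId[!isTest],
    trainFolds = split$index[!isTest]  # keeps each training row's fold
  )
}

old <- data.frame(rowId = c(10, 11, 12, 13, 14), index = c(-1, 1, 2, 1, -1))
args <- splitToExplicitArgs(old)
# args$testRowIds is c(10, 14); args$trainRowIds is c(11, 12, 13);
# args$trainFolds is c(1, 2, 1)
```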

@egillax
Collaborator

egillax commented Nov 19, 2024

I added a suggested feature for this in #504

egillax added a commit that referenced this issue Nov 21, 2024
Add existing splitSettings and tests. Solves #487