
Potential issues with DataMask or dplyr_lazy_vec_chop_impl and R-CMD-check failures (but only on Mac OS)?? #6961

Closed
samuel-marsh opened this issue Nov 9, 2023 · 2 comments

@samuel-marsh

Hi dplyr Team,

I hope you might be able to help. I'll explain why I think dplyr (specifically DataMask or dplyr_lazy_vec_chop_impl) might be the source of these errors, since multiple errors trace back to DataMask/dplyr_lazy_vec_chop_impl, but first I need to give you some background. Weirdly, the issue only occurs on macOS checks, and I'm wondering whether this is a bug, an issue with how I'm using dplyr in my code, a GitHub Actions issue, or something else.

In short, I'm having an issue with GitHub Actions checks for the newest release of my CRAN package scCustomize (the PR with the GitHub Actions checks is samuel-marsh/scCustomize#141). The errors come from running function examples and occur only on the macOS check (Ubuntu and Windows pass with no issues). I originally made a Posit post about this (see here), but after some more debugging I think that something might be wrong either in my code or in dplyr (again, I have no idea why it would only occur on one platform, but maybe I'm missing something), or that something else is going on.

The error first occurred in a function that was present in the package before this version and previously passed checks. The full function can be found here, but I believe the relevant portion is this piece of code:

# Explanation of variables:
    # meta_df is a data.frame that the function creates earlier in the code, with one row per measurement; each measurement belongs to one of X groups.
    # sample_name is a user-supplied variable naming the column in meta_df that defines the groups.

sample_meta_df <- meta_df %>%
    grouped_df(vars = sample_name) %>%
    slice(1)
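One way to narrow down whether the crash lives in dplyr's data mask is to compute the same result without it. The following is a minimal sketch, assuming a small made-up data.frame in place of the real meta_df; the helper `first_row_per_group()` is a hypothetical name. It keeps the first row per group with base R and, if dplyr is installed, runs the original `grouped_df()` + `slice(1)` path for an order-insensitive comparison (using R's native pipe `|>`, R >= 4.1, so dplyr does not need to be attached). If the base-R version runs cleanly on the macOS runner while the dplyr path segfaults, that would point toward dplyr; if the toy data does not reproduce the crash at all, the difference may lie in the real data or in other packages loaded by the example.

```r
# Hypothetical toy stand-in for meta_df: one row per measurement, grouped by
# the column named in sample_name. Values are made up for illustration.
meta_df <- data.frame(
  orig.ident = c("s2", "s2", "s1", "s1", "s3"),
  nCount_RNA = c(100, 200, 300, 400, 500),
  stringsAsFactors = FALSE
)
sample_name <- "orig.ident"

# Base-R counterpart of grouped_df(vars = sample_name) %>% slice(1):
# keep the first row seen for each value of the grouping column.
# Differences: rows stay in their original order (dplyr orders by group)
# and the result is a plain data.frame, not a grouped tibble.
first_row_per_group <- function(df, group_col) {
  df[!duplicated(df[[group_col]]), , drop = FALSE]
}

sample_meta_base <- first_row_per_group(meta_df, sample_name)
print(sample_meta_base)

# If dplyr is installed, run the original pipeline and check that both
# approaches select the same rows (comparison ignores row order).
if (requireNamespace("dplyr", quietly = TRUE)) {
  sample_meta_dplyr <- meta_df |>
    dplyr::grouped_df(vars = sample_name) |>
    dplyr::slice(1)
  stopifnot(identical(sort(sample_meta_dplyr$nCount_RNA),
                      sort(sample_meta_base$nCount_RNA)))
}
```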

When the GitHub Actions check runs, it aborts with this error:

* checking examples with --run-donttest ...sh: line 1:  3803 Segmentation fault: 11  LANGUAGE=en _R_CHECK_INTERNALS2_=1 R_LIBS=/var/folders/3s/vfzpb5r51gs6y328rmlgzm7c0000gn/T//RtmpL8bovr/RLIBS_a0e38ba220b R_ENVIRON_USER='' R_LIBS_USER='NULL' R_LIBS_SITE='NULL' '/Library/Frameworks/R.framework/Resources/bin/R' --vanilla > 'scCustomize-Ex.Rout' 2>&1 < 'scCustomize-Ex.R'
 [39s/40s] ERROR
Running examples in ‘scCustomize-Ex.R’ failed
The error most likely occurred in:
> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: Extract_Sample_Meta
> ### Title: Extract sample level meta.data
> ### Aliases: Extract_Sample_Meta
> 
> ### ** Examples
> 
> library(Seurat)
> pbmc_small[["batch"]] <- sample(c("batch1", "batch2"), size = ncol(pbmc_small), replace = TRUE)
> 
> sample_meta <- Extract_Sample_Meta(object = pbmc_small, sample_name = "orig.ident")
 *** caught segfault ***
address 0x0, cause 'unknown'
Traceback:
 1: .Call(dplyr_lazy_vec_chop_impl, data, rows, private$env_current_group_info,     private$grouped, private$rowwise)
 2: initialize(...)
 3: DataMask$new(data, by, "slice", error_call = error_call)
 4: slice_rows(.data, dots, by)
 5: slice.data.frame(., 1)
 6: slice(., 1)
 7: meta_df %>% grouped_df(vars = sample_name) %>% slice(1)
 8: Extract_Sample_Meta(object = pbmc_small, sample_name = "orig.ident")
An irrecoverable exception occurred. R is aborting now ...

To try to solve the issue quickly while waiting to see whether anyone responded to the Posit post, I simply wrapped that example in \dontrun{} and pushed an update. However, the macOS check errored again, this time on the example for another function (see here for the full function; again, I think the relevant code section is this):

# Explanation of variables:
    # split.by is a user-supplied variable naming the data.frame column that specifies sample grouping.
    # cor_data is a three-column data.frame extracted from a larger data.frame, with columns: "nCount_RNA", "nFeature_RNA", split.by
    # meta_sample_list is a character vector of the unique values found in the data.frame column named split.by.

    cor_data <- FetchData(object = seurat_object, vars = c("nCount_RNA", "nFeature_RNA", split.by))

    cor_values <- lapply(1:length(x = meta_sample_list), function(i) {
      cor_data_filtered <- cor_data %>%
        filter(.data[[split.by]] == meta_sample_list[[i]])
      round(x = cor(x = cor_data_filtered[, "nCount_RNA"], y = cor_data_filtered[, "nFeature_RNA"]), digits = 2)
    })
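In the same spirit, here is a minimal sketch of the per-sample correlation loop with the `filter()` step replaced by base-R subsetting, assuming a small made-up `cor_data` and `split.by = "sample_id"` (both hypothetical). Wrapping the match in `which()` drops rows where the comparison is `NA`, which mirrors how `filter()` treats `NA` conditions; the loop also uses `seq_along()` rather than `1:length()`. Temporarily swapping something like this in on the macOS runner is one way to test whether the segfault follows `filter()`.

```r
# Hypothetical stand-in for the FetchData() output: two count columns plus
# the grouping column named by split.by. Values are made up.
cor_data <- data.frame(
  nCount_RNA   = c(10, 20, 30, 40, 15, 25, 35, 45),
  nFeature_RNA = c(5, 9, 16, 19, 8, 11, 17, 24),
  sample_id    = rep(c("sample1", "sample2"), each = 4),
  stringsAsFactors = FALSE
)
split.by <- "sample_id"
meta_sample_list <- unique(cor_data[[split.by]])

# Same per-sample correlation as the original loop, but without dplyr:
# which() keeps only rows where the comparison is TRUE (NA is dropped).
cor_values <- lapply(seq_along(meta_sample_list), function(i) {
  keep <- which(cor_data[[split.by]] == meta_sample_list[[i]])
  cor_data_filtered <- cor_data[keep, , drop = FALSE]
  round(x = cor(x = cor_data_filtered[, "nCount_RNA"],
                y = cor_data_filtered[, "nFeature_RNA"]), digits = 2)
})
print(cor_values)
```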

The error that is returned is:

* checking examples with --run-donttest ...sh: line 1:  3719 Segmentation fault: 11  LANGUAGE=en _R_CHECK_INTERNALS2_=1 R_LIBS=/var/folders/3s/vfzpb5r51gs6y328rmlgzm7c0000gn/T//RtmpYZnliT/RLIBS_98166930af0 R_ENVIRON_USER='' R_LIBS_USER='NULL' R_LIBS_SITE='NULL' '/Library/Frameworks/R.framework/Resources/bin/R' --vanilla > 'scCustomize-Ex.Rout' 2>&1 < 'scCustomize-Ex.R'
 [22s/23s] ERROR
Running examples in ‘scCustomize-Ex.R’ failed
The error most likely occurred in:
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: FeatureScatter_scCustom
  > ### Title: Modified version of FeatureScatter
  > ### Aliases: FeatureScatter_scCustom
  > 
  > ### ** Examples
  > 
  > ## No test: 
  > library(Seurat)
  > pbmc_small$sample_id <- sample(c("sample1", "sample2"), size = ncol(pbmc_small), replace = TRUE)
  > 
  > FeatureScatter_scCustom(seurat_object = pbmc_small, feature1 = "nCount_RNA",
  + feature2 = "nFeature_RNA", split.by = "sample_id")
  
  NOTE: FeatureScatter_scCustom returns split plots as layout of all plots each
  with their own axes as opposed to Seurat which returns with shared x or y axis.
  To return to Seurat behvaior set `split_seurat = TRUE`.
  
  -----This message will be shown once per session.-----
 

 *** caught segfault ***
address 0x0, cause 'unknown'
Traceback:
 1: .Call(dplyr_lazy_vec_chop_impl, data, rows, private$env_current_group_info,     private$grouped, private$rowwise)
 2: initialize(...)
 3: DataMask$new(data, by, "filter", error_call = error_call)
 4: filter_rows(.data, dots, by)
 5: filter.data.frame(., .data[[split.by]] == meta_sample_list[[i]])
 6: filter(., .data[[split.by]] == meta_sample_list[[i]])
 7: cor_data %>% filter(.data[[split.by]] == meta_sample_list[[i]])
 8: FUN(X[[i]], ...)
 9: lapply(1:length(x = meta_sample_list), function(i) {    cor_data_filtered <- cor_data %>% filter(.data[[split.by]] ==         meta_sample_list[[i]])    round(x = cor(x = cor_data_filtered[, "nCount_RNA"], y = cor_data_filtered[,         "nFeature_RNA"]), digits = 2)})
10: scCustomze_Split_FeatureScatter(seurat_object = seurat_object,     feature1 = feature1, feature2 = feature2, split.by = split.by,     group.by = group.by, colors_use = colors_use, pt.size = pt.size,     aspect_ratio = aspect_ratio, title_size = title_size, num_columns = num_columns,     raster = raster, raster.dpi = raster.dpi, ggplot_default_colors = ggplot_default_colors,     color_seed = color_seed, ...)
11: FeatureScatter_scCustom(seurat_object = pbmc_small, feature1 = "nCount_RNA",     feature2 = "nFeature_RNA", split.by = "sample_id")
An irrecoverable exception occurred. R is aborting now ...

So again, the error traces back to DataMask and dplyr_lazy_vec_chop_impl, just as it did with the other error (except that this time it's in filter() instead of slice()).

I hope you might be able to shed some light on the issue. A reproducible example is obviously tricky, but if it helps, I can provide a binary build of the development version of the package for testing on your end.

Thanks very much!!
Sam

@samuel-marsh
Author

As an addition: running devtools::check() locally on my Mac with R 4.3.1 passes with no issues.
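Because the local check passes while the GitHub Actions macOS runner segfaults, comparing the two environments directly may help. Note also that `devtools::check()` does not run `\donttest{}` examples unless `run_dont_test = TRUE`, whereas the CI log shows `--run-donttest`, so the local and CI checks are not exercising exactly the same examples. Below is a small sketch with a hypothetical helper `env_report()` that reports the R version, platform and architecture, and the installed versions of dplyr and two of its imports (vctrs, rlang); the package list is an assumption and can be extended (e.g. with Seurat). Running it both locally and in the CI job gives two reports to diff.

```r
# Hypothetical helper: collect environment details that commonly differ
# between a local machine and a CI runner. The default package list is an
# assumption; pass additional package names via `pkgs`.
env_report <- function(pkgs = c("dplyr", "vctrs", "rlang")) {
  versions <- vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # package not installed
    }
  }, character(1))
  c(
    R        = R.version.string,
    platform = R.version$platform,
    arch     = R.version$arch,
    versions  # named by package
  )
}

print(env_report())
```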

Also, I wasn’t immediately sure whether this was an r-lib bug (hence the Posit post). If you think this isn’t a dplyr issue and would be better posted in r-lib issues, please let me know and I can move it there.

Thanks!!
Sam

@samuel-marsh
Author

Hi,

Closing this issue, as further tracing/debugging shows it’s not dplyr but something else weird going on. Still hunting for the root cause.

Best,
Sam
