Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difference between corPlot() and cor() with missing values. #375

Open
CamiloMontesM opened this issue Jan 18, 2024 · 2 comments
Open

Difference between corPlot() and cor() with missing values. #375

CamiloMontesM opened this issue Jan 18, 2024 · 2 comments
Labels
question Questions about function use or interpretation

Comments

@CamiloMontesM
Copy link

Question

Hello, I am comparing 2 correlation methods in R: corPlot and cor (from stats package). The results of the correlation coefficients between corPlot and cor are identical when in the cor function I use use = "pairwise.complete.obs".

However, when using use = "na.or.complete" in the cor function, the results are completely different in some cases. In the example below, the pair C1 and E4 have no missing values and yet the correlation coefficient using use = "na.or.complete" in cor function, is different from that delivered by corPlot.

Why does corPlot use the "pairwise.complete.obs" method, instead of other methods to take care of missing values?

library(openair)

data <- tibble::tribble(
  ~Inicio.muestreo, ~Parámetro, ~E4, ~E3, ~E2, ~E1, ~E10, ~E9, ~E8, ~E7, ~E6, ~E5, ~O1, ~C1, ~C2, ~C3, ~GNL1, ~GNL2,
      "2023-05-25",  "Benceno",   1.87,    2.9,   4.18,   3.93,    1.82,   2.18,   1.43,   1.34,   1.06,   1.18,   2.79,   3.08,   4.46,   2.25,      1.3,     1.15,
      "2023-06-22",  "Benceno",    0.5,   0.81,   0.63,    0.6,    1.46,   1.26,   0.81,   0.78,   0.49,   0.61,     NA,   2.58,   2.53,   1.21,      0.4,       NA,
      "2023-04-13",  "Benceno",   0.61,   0.71,   1.41,    0.8,    1.14,   0.83,   0.88,   0.86,   0.56,   0.52,   0.68,   3.47,    2.2,   0.94,     0.43,     0.49,
      "2023-06-08",  "Benceno",   1.08,   1.04,   1.09,   1.11,    1.49,   1.42,   1.12,   1.19,   0.85,   0.89,   1.01,   2.86,   3.58,   1.66,     0.82,     0.92,
      "2023-05-10",  "Benceno",   1.91,   3.04,   3.03,   2.76,    1.59,   1.55,   1.43,   1.32,   1.12,   1.21,   2.34,   3.95,   3.38,   1.83,     1.71,     1.62,
      "2023-04-27",  "Benceno",   0.85,   0.99,   1.38,   1.25,    1.65,   1.52,   1.12,   1.07,    0.8,   0.76,   0.89,   4.01,   3.18,   2.01,     0.76,     0.69,
      "2023-07-06",  "Benceno",   1.16,   1.31,   1.51,   1.49,    1.74,   1.62,   1.37,   1.15,   0.95,   1.01,   1.17,   2.72,     NA,     NA,     1.05,     0.99,
      "2023-07-20",  "Benceno",    0.9,   1.01,   0.93,   1.11,    2.07,   1.45,   1.15,   0.97,   0.65,   0.66,   0.86,   2.78,   2.41,   1.33,     0.99,     0.86,
      "2023-08-17",  "Benceno",   0.57,   0.48,   0.62,   0.86,    0.77,   0.79,    0.7,   0.58,   0.47,   0.56,   0.73,   1.48,   1.29,   0.77,     0.56,     0.58,
      "2023-08-31",  "Benceno",    0.6,   0.59,   0.65,   1.06,    1.34,   1.42,   1.05,   0.66,   0.58,    0.6,   0.91,   2.41,   2.67,   1.33,     0.57,     0.74,
      "2023-09-14",  "Benceno",   0.58,   0.61,   0.74,   1.15,    0.87,   0.77,   0.72,   0.64,   0.49,   0.41,   0.81,   1.59,   1.16,   0.73,     0.56,     0.59,
      "2023-09-28",  "Benceno",   0.57,   0.66,   1.12,    0.5,    1.47,   0.87,   0.67,   0.72,   0.61,   0.63,   0.87,   2.16,   1.36,   0.68,     0.57,     0.67,
      "2023-10-12",  "Benceno",   0.67,   0.87,   1.78,   0.96,    1.12,   0.77,   0.82,   0.75,   0.55,   0.71,   0.92,   3.81,   1.23,   0.66,     0.51,     0.54,
      "2023-10-25",  "Benceno",   0.92,   0.47,   0.49,    0.5,    1.12,   1.02,    0.7,   0.54,   0.37,   0.38,   0.57,   3.59,   1.14,   0.59,     0.43,      0.3
  )

corPlot(data[,3:18],
        lower = TRUE,
        method = "spearman",
        dendrogram = TRUE,
        cluster = TRUE)

cor(data[,3:18],
    method = "spearman",
    use = "pairwise.complete.obs") # same to openair

cor_b <- cor(data[,3:18],
    method = "spearman",
    use = "na.or.complete") # different to openair

Thank you very much.

Camilo Montes.

@CamiloMontesM CamiloMontesM added the question Questions about function use or interpretation label Jan 18, 2024
@davidcarslaw
Copy link
Collaborator

Hi Camilo

Thanks for this question and the good example. The question is a good one and ultimately depends on the nature of the data being looked at. Originally it was thought that "pairwise.complete.obs" would be a sensible default. However, the function really ought to give the user the option of changing this important parameter. I can see there would be some situations where "pairwise.complete.obs" would not be appropriate and could be potentially misleading.

I will amend the function to explicitly pass on that argument and make it clearer what the default is. This would not take long and will appear on GitHub before a new release to CRAN...

Thanks again for your message.

David

@CamiloMontesM
Copy link
Author

Thank you very much for your quick reply, it is very clear to me and I will wait for a new version of the function.

Best regards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Questions about function use or interpretation
Projects
None yet
Development

No branches or pull requests

2 participants