Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with NA's on spacetime objects #53

Open
rafalopespx opened this issue Apr 21, 2024 · 2 comments
Open

Dealing with NA's on spacetime objects #53

rafalopespx opened this issue Apr 21, 2024 · 2 comments

Comments

@rafalopespx
Copy link

rafalopespx commented Apr 21, 2024

Hi Josiah, thank you for the amazing package! It helped me a lot!

I don't know if this is the right place for this kind of issue, but I think it worth asking.

So, I'm using a lot of function here to work an dataset that might have some time periods of a lot NA on the value variable that I'm interested to produce an Emerging hot spot analysis like here

I've been able to reproduce my error when running this piece of code with random NAs introduced at the value column of the bos spacetime cube.

gi_stars <- bos_nb |> 
  group_by(time_period) |> 
  mutate(gi_star = local_gstar_perm(value, nb, wt)) |> 
  tidyr::unnest(gi_star)


gi_stars |> 
  select(.region_id, time_period, gi_star, p_folded_sim)

and the error as follow:

Error in mutate():
ℹ In argument: gi_star = local_gstar_perm(value, nb, wt).
ℹ In group 1: time_period = 2010-01-01.
Caused by error in card():
! not a neighbours list
Run rlang::last_trace() to see where the error occurred.

I think the issue here is that as I have NA at the value column this might have introduced some neighbours that aren't any more neighbouring other places, but maybe I'm wrong. I tried alternatives and workaorunds like filtering the bos spacetime before giving to the Gi* pipe and using queen = T and/or allow_zero = T, at st_conguity and st_weights, respectively, But none of this haven't worked, the error returned is the same.

Thanks in advance for any hlep or insights Josiah!

Here I give you my whole pipeline adapated from the vignette:

### Create objects to store file paths 
df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

### read in data 
df <- readr::read_csv(df_fp, col_types = "ccidD")
df <- df |> 
  mutate(value = ifelse(rbinom(n(), 1, 0.1), NA, value))
geo <- sf::st_read(geo_fp)

bos_expanded <- expand.grid(time_period = unique(df$time_period), 
                            .region_id = unique(geo$.region_id))

bos <- bos_expanded |> 
  left_join(df)

### Create spacetime object called `bos`
bos <- spacetime(bos, geo, 
                 .loc_col = ".region_id",
                 .time_col = "time_period")

bos

bos_nb <- bos |> 
  activate("geometry") |> 
  mutate(
    nb = include_self(st_contiguity(geometry, queen = T)),
    wt = st_weights(nb, allow_zero = T, zero.policy = T)
  ) |> 
  set_nbs("nb") |> 
  set_wts("wt")

head(bos_nb)

gi_stars <- bos_nb |> 
  filter(!is.na(value)) |> 
  group_by(time_period) |> 
  mutate(gi_star = local_gstar_perm(value, nb, wt)) |> 
  tidyr::unnest(gi_star)

gi_stars |> 
  select(.region_id, time_period, gi_star, p_folded_sim)

### conduct EHSA
ehsa <- emerging_hotspot_analysis(
  x = bos, 
  .var = "value", 
  k = 1, 
  nsim = 99
)

### preview some values 
head(ehsa)

### evaluate the classifications of our hotspots
count(ehsa, classification)
@JosiahParry
Copy link
Owner

Thanks, @rafalopespx! In the future, would you mind wrapping the R code in backticks like this? It would make this a lot more easy to read!

```r
# R code here
```

First and foremost, you cannot have NA values when calculating the local Gi. If you want to use emerging_hotspot_analysis() you will need to impute those values. Or if you want to do something that is not exactly emerging hotspot analysis but quite close (like the vignette) you will need to filter those locations out, then calculate the neighbors based on each time period like so:

library(sfdep)
library(dplyr)

df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

df <- readr::read_csv(df_fp, col_types = "ccidD")
df <- df |>
mutate(value = ifelse(rbinom(n(), 1, 0.1), NA, value))
geo <- sf::st_read(geo_fp)

bos_expanded <- expand.grid(time_period = unique(df$time_period),
.region_id = unique(geo$.region_id))

bos <- bos_expanded |>
  left_join(df)

bos |> 
  filter(!is.na(value)) |> 
  left_join(geo) |> 
  group_by(time_period) |> 
  mutate(
    nb = list(st_contiguity(geometry)),
    wt = st_weights(nb[[1]], allow_zero = TRUE),
    g = local_g_perm(value, nb[[1]], wt)
  ) |> 
    arrange(time_period)-> tmp

tmp |> 
  tidyr::unnest(g)

@rafalopespx
Copy link
Author

rafalopespx commented Apr 22, 2024

Amazing Josiah! It worked, do you have any tips to speed up the process? My dataset is 7665 polygons by 628 time-periods

Edit: One more thing, do you think which way is better? Imputing values, I would do that imputing 0 to then, as I need to have a sense that the values are not significant or calculate the neighbors based on each for each period?

Edit2: I might realizing this now, I just can't produce an emerging_hotspot_analysis once I got NA on some of the slices of the spacetime cube, right? There is no workaround to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants