Dealing with NA's on spacetime objects #53

rafalopespx · 2024-04-21T23:53:57Z

Hi Josiah, thank you for the amazing package! It helped me a lot!

I don't know if this is the right place for this kind of issue, but I think it worth asking.

So, I'm using a lot of function here to work an dataset that might have some time periods of a lot NA on the value variable that I'm interested to produce an Emerging hot spot analysis like here

I've been able to reproduce my error when running this piece of code with random NAs introduced at the value column of the bos spacetime cube.

gi_stars <- bos_nb |> 
  group_by(time_period) |> 
  mutate(gi_star = local_gstar_perm(value, nb, wt)) |> 
  tidyr::unnest(gi_star)


gi_stars |> 
  select(.region_id, time_period, gi_star, p_folded_sim)

and the error as follow:

Error in mutate():
ℹ In argument: gi_star = local_gstar_perm(value, nb, wt).
ℹ In group 1: time_period = 2010-01-01.
Caused by error in card():
! not a neighbours list
Run rlang::last_trace() to see where the error occurred.

I think the issue here is that as I have NA at the value column this might have introduced some neighbours that aren't any more neighbouring other places, but maybe I'm wrong. I tried alternatives and workaorunds like filtering the bos spacetime before giving to the Gi* pipe and using queen = T and/or allow_zero = T, at st_conguity and st_weights, respectively, But none of this haven't worked, the error returned is the same.

Thanks in advance for any hlep or insights Josiah!

Here I give you my whole pipeline adapated from the vignette:

### Create objects to store file paths 
df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

### read in data 
df <- readr::read_csv(df_fp, col_types = "ccidD")
df <- df |> 
  mutate(value = ifelse(rbinom(n(), 1, 0.1), NA, value))
geo <- sf::st_read(geo_fp)

bos_expanded <- expand.grid(time_period = unique(df$time_period), 
                            .region_id = unique(geo$.region_id))

bos <- bos_expanded |> 
  left_join(df)

### Create spacetime object called `bos`
bos <- spacetime(bos, geo, 
                 .loc_col = ".region_id",
                 .time_col = "time_period")

bos

bos_nb <- bos |> 
  activate("geometry") |> 
  mutate(
    nb = include_self(st_contiguity(geometry, queen = T)),
    wt = st_weights(nb, allow_zero = T, zero.policy = T)
  ) |> 
  set_nbs("nb") |> 
  set_wts("wt")

head(bos_nb)

gi_stars <- bos_nb |> 
  filter(!is.na(value)) |> 
  group_by(time_period) |> 
  mutate(gi_star = local_gstar_perm(value, nb, wt)) |> 
  tidyr::unnest(gi_star)

gi_stars |> 
  select(.region_id, time_period, gi_star, p_folded_sim)

### conduct EHSA
ehsa <- emerging_hotspot_analysis(
  x = bos, 
  .var = "value", 
  k = 1, 
  nsim = 99
)

### preview some values 
head(ehsa)

### evaluate the classifications of our hotspots
count(ehsa, classification)

The text was updated successfully, but these errors were encountered:

JosiahParry · 2024-04-22T12:47:08Z

Thanks, @rafalopespx! In the future, would you mind wrapping the R code in backticks like this? It would make this a lot more easy to read!

```r
# R code here
```

First and foremost, you cannot have NA values when calculating the local Gi. If you want to use emerging_hotspot_analysis() you will need to impute those values. Or if you want to do something that is not exactly emerging hotspot analysis but quite close (like the vignette) you will need to filter those locations out, then calculate the neighbors based on each time period like so:

library(sfdep)
library(dplyr)

df_fp <- system.file("extdata", "bos-ecometric.csv", package = "sfdep")
geo_fp <- system.file("extdata", "bos-ecometric.geojson", package = "sfdep")

df <- readr::read_csv(df_fp, col_types = "ccidD")
df <- df |>
mutate(value = ifelse(rbinom(n(), 1, 0.1), NA, value))
geo <- sf::st_read(geo_fp)

bos_expanded <- expand.grid(time_period = unique(df$time_period),
.region_id = unique(geo$.region_id))

bos <- bos_expanded |>
  left_join(df)

bos |> 
  filter(!is.na(value)) |> 
  left_join(geo) |> 
  group_by(time_period) |> 
  mutate(
    nb = list(st_contiguity(geometry)),
    wt = st_weights(nb[[1]], allow_zero = TRUE),
    g = local_g_perm(value, nb[[1]], wt)
  ) |> 
    arrange(time_period)-> tmp

tmp |> 
  tidyr::unnest(g)

rafalopespx · 2024-04-22T14:10:08Z

Amazing Josiah! It worked, do you have any tips to speed up the process? My dataset is 7665 polygons by 628 time-periods

Edit: One more thing, do you think which way is better? Imputing values, I would do that imputing 0 to then, as I need to have a sense that the values are not significant or calculate the neighbors based on each for each period?

Edit2: I might realizing this now, I just can't produce an emerging_hotspot_analysis once I got NA on some of the slices of the spacetime cube, right? There is no workaround to this?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dealing with NA's on spacetime objects #53

Dealing with NA's on spacetime objects #53

rafalopespx commented Apr 21, 2024 •

edited

Loading

JosiahParry commented Apr 22, 2024

rafalopespx commented Apr 22, 2024 •

edited

Loading

Dealing with NA's on spacetime objects #53

Dealing with NA's on spacetime objects #53

Comments

rafalopespx commented Apr 21, 2024 • edited Loading

JosiahParry commented Apr 22, 2024

rafalopespx commented Apr 22, 2024 • edited Loading

rafalopespx commented Apr 21, 2024 •

edited

Loading

rafalopespx commented Apr 22, 2024 •

edited

Loading