fetchHenry() NA-padding for weekly / monthly granularity #265

Open
4 tasks
dylanbeaudette opened this issue Sep 13, 2022 · 6 comments

Comments

@dylanbeaudette
Member

TODO:

  • generalize .fill_missing_days() into a new function with a gran argument: days, weeks, months
  • adapt .formatDates() to use the new function
  • add a new argument to fetchHenry() for generic NA padding
  • update docs and tutorial

Further research: https://stackoverflow.com/questions/22439540/how-to-get-week-numbers-from-dates
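The linked thread turns on the fact that R's format() supports several week-numbering conventions. A small comparison on dates chosen here for illustration: with %W (the convention used in the draft below), 1 January 1998 falls in week 00 and 31 December 1998 in week 52, while ISO 8601 weeks (%V) put the same dates in weeks 01 and 53. A week-ID sequence of 1:52 can therefore miss week 00, and %W itself reaches 53 in some years (e.g. 31 December 2018).

```r
# R's three week-numbering conventions, applied to a few dates in 1998
# (1 January 1998 was a Thursday)
d <- as.Date(c('1998-01-01', '1998-01-05', '1998-12-31'))
weeks <- data.frame(
  date = d,
  U = format(d, '%U'), # weeks start on Sunday; days before the first Sunday are week 00
  W = format(d, '%W'), # weeks start on Monday; days before the first Monday are week 00
  V = format(d, '%V')  # ISO 8601: weeks start on Monday; week 01 contains the first Thursday
)
print(weeks)
```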

First approximation here.

.fillMissingGran <- function(x, gran) {
  
  ## TODO this doesn't account for leap-years
  # 366 days
  # 53 weeks
  
  # sequence of possible values
  g.vect <- switch(
    gran,
    'day' = 1:365,
    'week' = 1:52,
    'month' = 1:12
  )
  
  # column to use
  # week / month_numeric are not in the input; they are added below
  g.col <- switch(
    gran,
    'day' = 'doy',
    'week' = 'week',
    'month' = 'month_numeric'
  )
  
  # format string
  # on input, strptime() uses %W only together with a weekday (%u),
  # and %m needs a day of month (%d)
  g.fmt <- switch(
    gran,
    'day' = '%Y %j %H:%M',
    'week' = '%Y %W %u %H:%M',
    'month' = '%Y %m %d %H:%M'
  )
  
  
  # add time ID columns as-needed
  # doy is always present
  
  ## "week" not as simple as it seems
  # https://stackoverflow.com/questions/22439540/how-to-get-week-numbers-from-dates
  
  # week
  if(gran == 'week') {
    x$week <- as.integer(format(x$date_time, '%W'))
  }
  
  # month
  if(gran == 'month') {
    x$month_numeric <- as.integer(format(x$date_time, '%m'))
  }
  
  
  # ID missing time IDs
  missing <- which(is.na(match(g.vect, x[[g.col]])))
  
  # short-circuit
  if (length(missing) < 1) {
    return(x)
  }
  
  
  # make fake date-times for missing time IDs
  # weeks are anchored to Monday (%u = 1), months to day 1
  fake.datetimes <- switch(
    gran,
    'day' = paste0(x$year[1], ' ', missing, ' 00:00'),
    paste0(x$year[1], ' ', missing, ' 1 00:00')
  )
  
  # TODO: this will result in timezone specific to locale; 
  #  especially an issue when granularity is less than daily or for large extents
  fake.datetimes <- as.POSIXct(fake.datetimes, format = g.fmt)
  
  # generate DF with missing information
  fake.data <- data.frame(
    sid = x$sid[1],
    date_time = fake.datetimes, 
    year = x$year[1],
    doy = as.integer(format(fake.datetimes, '%j')), 
    month = format(fake.datetimes, "%b")
  )
  
  # record the missing time IDs in the granularity column
  fake.data[[g.col]] <- missing
  
  # remaining columns are filled with NA:
  # indexing rows of a zero-row data.frame yields NA rows of matching types
  fill.cols <- which(!colnames(x) %in% colnames(fake.data))
  if (length(fill.cols) > 0) {
    na.data <- as.data.frame(x)[, fill.cols, drop = FALSE][0, , drop = FALSE][seq_len(nrow(fake.data)), , drop = FALSE]
    fake.data <- cbind(fake.data, na.data)
  }
  
  # make datatypes for time match
  # only character input is parsed: as.POSIXct(format = ) fails on other types under R-devel
  if (is.character(x$date_time)) {
    x$date_time <- as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
  }
  
  # splice in missing data
  y <- rbind(x, fake.data)
  
  # re-order by DOY and return
  return(y[order(y$doy), ])
}




# generate example data
w <- fetchHenry(project = 'CA790', gran = 'week', soiltemp.summaries = FALSE, pad.missing.days = TRUE)

x <- w$soiltemp[w$soiltemp$sid == 392 & w$soiltemp$year == '1998', ]

plot(x$date_time, x$sensor_value, type = 'p')

.fillMissingGran(x, gran = 'week')
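The draft above maps missing week or month numbers back to dates by parsing them with strptime(). A different approach skips that parsing: build the complete sequence of period start dates with seq() and left-join the observations onto it with merge(). The pad_by_gran() helper below is hypothetical and not part of soilDB; it assumes a data.frame with a date_time column covering a single year, and it defines weeks as 7-day blocks counted from 1 January, which is not the %W convention used in the draft.

```r
# Hypothetical helper (not part of soilDB): pad a single year of data to a
# complete calendar at 'day', 'week' or 'month' granularity.
pad_by_gran <- function(x, gran = c('day', 'week', 'month'), year) {
  gran <- match.arg(gran)
  start <- as.Date(sprintf('%s-01-01', year))
  end <- as.Date(sprintf('%s-12-31', year))
  # every period start date in the year; seq() handles leap years
  full <- data.frame(period = seq(start, end, by = gran))
  # assign each observation to the start of its period
  d <- as.Date(x$date_time)
  x$period <- switch(
    gran,
    day = d,
    # assumption: weeks are 7-day blocks counted from 1 January
    week = start + 7 * (as.integer(d - start) %/% 7),
    month = as.Date(format(d, '%Y-%m-01'))
  )
  # left join: periods without observations get NA in all other columns
  y <- merge(full, x, by = 'period', all.x = TRUE)
  y[order(y$period), ]
}

# toy data: two observations in 1998
x <- data.frame(
  date_time = as.POSIXct(c('1998-01-02 06:00', '1998-03-15 12:00'), tz = 'UTC'),
  sensor_value = c(1.5, 4.2)
)
m <- pad_by_gran(x, 'month', 1998)
w <- pad_by_gran(x, 'week', 1998)
```

With this definition a year always has 53 "weeks", the last being a 1-day stub (2 days in leap years); day-level padding yields 366 rows in leap years without any special casing.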
@brownag
Member

brownag commented Oct 2, 2022

A note: where possible, extend these methods so that they also work with other data sources, e.g. SCAN and CDEC.

@brownag
Member

brownag commented Oct 4, 2022

It looks like we will also need to change how soilDB:::.fill_missing_days() passes the format argument to base::as.POSIXct(), as this breaks under R-devel:

══ Failed tests ════════════════════════════════════════════════════════════════
── Error (test-fetchHenry.R:122:3): summarizeSoilTemperature() works as expected ──
Error in `.POSIXct(x, tz, ...)`: unused argument (format = "%Y-%m-%d %H:%M:%S")
Backtrace:
    ▆
 1. ├─soilDB:::.formatDates(x, gran = "day", pad.missing.days = TRUE) at test-fetchHenry.R:122:2
 2. │ ├─...[]
 3. │ └─data.table:::`[.data.table`(...)
 4. └─soilDB:::.fill_missing_days(.SD)
 5.   ├─base::as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
 6.   └─base::as.POSIXct.default(x$date_time, format = "%Y-%m-%d %H:%M:%S")
── Error (test-fetchHenry.R:165:3): .fill_missing_days() works as expected ─────
Error in `.POSIXct(x, tz, ...)`: unused argument (format = "%Y-%m-%d %H:%M:%S")
Backtrace:
    ▆
 1. └─soilDB:::.fill_missing_days(x) at test-fetchHenry.R:165:2
 2.   ├─base::as.POSIXct(x$date_time, format = "%Y-%m-%d %H:%M:%S")
 3.   └─base::as.POSIXct.default(x$date_time, format = "%Y-%m-%d %H:%M:%S")

@dylanbeaudette
Member Author

I'll try to take a look next week sometime, unless you have time before then. Can you tackle the POSIX thing?

brownag added a commit that referenced this issue Oct 4, 2022
@brownag
Member

brownag commented Oct 4, 2022

> I'll try to take a look next week sometime, unless you have time before then.

Take a look at this issue as a whole? I can probably take a crack at it this week sometime.

> Can you tackle the POSIX thing?

This is sorted with 6d4c02b. as.Date() still takes a format argument, so I converted character -> Date explicitly with as.Date(..., format=) and then to POSIXct, and we are good.
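Going through as.Date() drops the time of day, which only matters for sub-daily data. A sketch of the difference, assuming character timestamps in the "%Y-%m-%d %H:%M:%S" layout seen in the test log: the hypothetical to_posixct() helper (not part of soilDB) parses with strptime(), which takes a format argument on every R version, and converts the resulting POSIXlt to POSIXct. The as.Date() format string is also an assumption; the exact call in 6d4c02b is not shown here.

```r
# Hypothetical helper: convert character timestamps to POSIXct without
# passing format = through as.POSIXct()
to_posixct <- function(s, tz = 'UTC') {
  # already a date-time: return unchanged
  if (inherits(s, 'POSIXct')) return(s)
  # strptime() returns POSIXlt and keeps the time of day
  as.POSIXct(strptime(as.character(s), format = '%Y-%m-%d %H:%M:%S', tz = tz))
}

s <- '1998-03-15 12:30:00'

# Date route: parse the date part with as.Date(format = ), then convert;
# the result is midnight UTC
d <- as.Date(s, format = '%Y-%m-%d')
format(as.POSIXct(d), '%Y-%m-%d %H:%M', tz = 'UTC')

# strptime() route: the time of day is preserved
format(to_posixct(s), '%Y-%m-%d %H:%M', tz = 'UTC')
```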

@dylanbeaudette
Copy link
Member Author

>> I'll try to take a look next week sometime, unless you have time before then.
>
> Take a look at this issue as a whole? I can probably take a crack at it this week sometime.

Go for it if you have some time. I'm not going to have enough time this week.

>> Can you tackle the POSIX thing?
>
> This is sorted with 6d4c02b. as.Date() still takes a format argument, so I converted character -> Date explicitly with as.Date(..., format=) and then to POSIXct, and we are good.

Thanks, the as.Date() fix was news to me.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants