replace MUR opendap with cloud enabled #2

Open
btupper opened this issue Aug 10, 2023 · 20 comments
Comments

@btupper
Member

btupper commented Aug 10, 2023

https://podaac.jpl.nasa.gov/dataset/MUR-JPL-L4-GLOB-v4.1?ids=&values=&search=MUR%20v4.1

@villesci

Any updates on how to access the GHRSST dataset? It is no longer available on opendap at the URL created by the mur_url function.

btupper changed the title from "replace MUR opendap with could enabled" to "replace MUR opendap with cloud enabled" on Nov 14, 2024
@btupper
Member Author

btupper commented Nov 14, 2024

Good question! We have been using the podaac-data-downloader app that PODAAC provides: https://github.com/podaac/data-subscriber

It works well for us.

@btupper
Member Author

btupper commented Nov 19, 2024

The data-subscriber test branch https://github.com/BigelowLab/ghrsst/tree/data-subscriber doesn't use opendap. Instead it uses the command line apps that PODAAC makes freely available. @villesci and @jevanilla might you give it a try? Installation instructions are in the data-subscriber branch README.

@btupper
Member Author

btupper commented Nov 21, 2024

I just added archiving and lightweight database utilities. I guess I'm done with it for now until I get some feedback.

@jevanilla

Registered, installed and gave it a spin. Everything seems to be working for me. @btupper

@villesci

villesci commented Nov 25, 2024

I'm having difficulty getting the podaac_downloader function to recognize my path; it works fine with the PODAAC bulk data downloader on the command line. Additionally, am I right that this currently downloads the global extent for these dates, with bbox functionality planned for future updates?

suppressPackageStartupMessages({
  library(rnaturalearth)
  library(dplyr)
  library(sf)
  library(ghrsst)
  library(stars)
})

start_date<-"2023-07-28"
end_date<-"2023-07-29"

ghrsst::set_root_path("C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data")

ok = ghrsst::podaac_downloader(start_date = as.Date(start_date), end_date = as.Date(end_date))

usage: PO.DAAC bulk-data downloader [-h] -c COLLECTION -d OUTPUTDIRECTORY
                                    [--cycle SEARCH_CYCLES] [-sd STARTDATE]
                                    [-ed ENDDATE] [-f] [-b BBOX] [-dc]
                                    [-dydoy] [-dymd] [-dy] [--offset OFFSET]
                                    [-e EXTENSIONS] [-gr GRANULENAME]
                                    [--process PROCESS_CMD] [--version]
                                    [--verbose] [-p PROVIDER] [--limit LIMIT]
                                    [--dry-run] [--subset]
PO.DAAC bulk-data downloader: error: unrecognized arguments: - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data/tmp >> C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data/downloader-log

@btupper
Member Author

btupper commented Nov 25, 2024

Ah, it's the space in that path C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data. I'll have to experiment with shell quoting the command string (well, at least the path portion). I don't have access to a Windows machine, so it would be great if you are willing to be the tester. I'll post a message here when I have updated the package.
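
To illustrate why the unquoted path breaks: a shell splits a command line on whitespace unless the text is quoted, so each piece of the path arrives as a separate argument. The sketch below mimics that splitting with scan(). It is only a rough approximation of real shell parsing, and the shortened path and the split_args helper are made up for illustration.

```r
# Approximate shell word-splitting with scan(): whitespace separates tokens
# unless the text is wrapped in quotes. This is an illustration only, not
# how cmd.exe or sh actually parse a command line.
split_args <- function(x) scan(text = x, what = "", quiet = TRUE)

# hypothetical, shortened path containing spaces
unquoted <- "-d C:/Users/me/OneDrive - USNH/ghrsst_data"
quoted   <- "-d \"C:/Users/me/OneDrive - USNH/ghrsst_data\""

split_args(unquoted)  # the path is broken into three separate tokens
split_args(quoted)    # the path survives as a single token
```

The stray tokens after the first piece of the path are what the downloader reports as "unrecognized arguments".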

You are correct that the underlying app, podaac-data-downloader, downloads the entire dataset. According to the app author, a --subset argument is in the works that will allow downloading just a region of interest. Until then, podaac-data-downloader downloads the entire dataset, and we add R functionality (see read_podaac()) to read subsets of that locally downloaded file.

@btupper
Member Author

btupper commented Nov 25, 2024

OK - now the download path and log filename are single-quoted before being passed to the command line; that seems to resolve it on macOS.

I noticed that your initial end date would have resulted in a full year's worth of data. If you have the space for it, that is fine, but if not, maybe a loop that downloads one day (or a few), reads and saves the subset you want, and then cleans up would better suit your needs?
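
A minimal sketch of such a loop, assuming the caller supplies the three steps as functions. The names process_by_day, download_day, subset_day and cleanup_day are hypothetical and are not part of the ghrsst package; in practice they might wrap podaac_downloader(), read_podaac() and a file removal.

```r
# Process a date range one day at a time so only one global file is on disk.
# `download_day`, `subset_day` and `cleanup_day` are hypothetical callbacks
# supplied by the caller; they are not functions from the ghrsst package.
process_by_day <- function(start_date, end_date,
                           download_day, subset_day, cleanup_day) {
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  results <- vector("list", length(days))
  names(results) <- format(days)
  for (i in seq_along(days)) {
    day <- days[i]
    download_day(day)                 # fetch the full file for this day
    results[[i]] <- subset_day(day)   # keep only the region of interest
    cleanup_day(day)                  # delete the large download
  }
  results
}

# Dry run with stand-in callbacks that only record what would happen
res <- process_by_day("2023-07-28", "2023-07-29",
                      download_day = function(d) invisible(NULL),
                      subset_day   = function(d) paste("subset for", format(d)),
                      cleanup_day  = function(d) invisible(NULL))
names(res)
```

Keeping the three steps as separate callbacks makes it easy to test the loop without touching the network.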

@villesci

villesci commented Nov 25, 2024

Hm, still having problems even when I change the path to a folder in my R working directory with no spaces:

> suppressPackageStartupMessages({
+   library(rnaturalearth)
+   library(dplyr)
+   library(sf)
+   library(ghrsst)
+   library(stars)
+ })
> start_date<-"2023-07-28"
> #end_date<-"2024-10-27"
> end_date<-"2023-07-29"
> path="tiles"
> ghrsst::make_path(path)
[1] TRUE
> ghrsst::set_root_path("tiles")
> ok = ghrsst::podaac_downloader(start_date = as.Date(start_date), end_date = as.Date(end_date))
usage: PO.DAAC bulk-data downloader [-h] -c COLLECTION -d OUTPUTDIRECTORY
                                    [--cycle SEARCH_CYCLES] [-sd STARTDATE]
                                    [-ed ENDDATE] [-f] [-b BBOX] [-dc]
                                    [-dydoy] [-dymd] [-dy] [--offset OFFSET]
                                    [-e EXTENSIONS] [-gr GRANULENAME]
                                    [--process PROCESS_CMD] [--version]
                                    [--verbose] [-p PROVIDER] [--limit LIMIT]
                                    [--dry-run] [--subset]
PO.DAAC bulk-data downloader: error: unrecognized arguments: >> ‘tiles/downloader-log’

And yes, subsetting each day's data in a loop is what I was intending to do once I realized the size of even the subsetted data from the PODAAC command line.

FYI, in the data-subscriber vignette, the github installation code is missing a double quote after BigelowLab/ghrsst. Should be:
remotes::install_github("BigelowLab/ghrsst", ref="data-subscriber")
Additionally, the make_path function is written as make_make in the same vignette.

@btupper
Member Author

btupper commented Nov 25, 2024

Thanks for the catches! I often tell people that I am a professional typo-ist.

Let me try to replicate your method with "tiles"; at first glance it seems like it should work. Very puzzling.

@btupper
Member Author

btupper commented Nov 25, 2024

This update forces the shell quoting to not use fancy quoting (R defaults to fancy quotes). Might you try again?
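
For reference, a small sketch of the fancy-quote behaviour. It assumes base R's sQuote() and shQuote() are the functions involved, which may not match the package internals exactly. The q argument of sQuote() needs R 3.6.0 or later.

```r
path <- "tiles/downloader-log"

# sQuote() follows getOption("useFancyQuotes"), which defaults to TRUE, so in
# a UTF-8 session it may return typographic quotes that the shell does not
# treat as quoting characters.
fancy <- sQuote(path)

# Two ways to get plain ASCII quotes instead:
plain_s  <- sQuote(path, q = FALSE)      # 'tiles/downloader-log'
plain_sh <- shQuote(path, type = "sh")   # 'tiles/downloader-log'

print(c(fancy = fancy, plain_s = plain_s, plain_sh = plain_sh))
```

The typographic quotes in the earlier error message (‘tiles/downloader-log’) are consistent with the fancy variant having reached the command line.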

@villesci

This must have to do with me working on a Windows machine/command line? For reference, the program only works on the command line when I double-quote my path, like so:

C:\Users\drewv>podaac-data-downloader -c MUR-JPL-L4-GLOB-v4.1 -d "C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data" -sd 2023-07-28T00:00:00Z -ed 2023-07-29T23:59:59Z --verbose

Digging into the podaac_downloader code, squote applies single quotes around the paths.

Thanks for all of your help! It would be good to see whether this issue is replicated on another Windows machine.

@btupper
Member Author

btupper commented Nov 26, 2024

Oh, so you tried single quoting ...

C:\Users\drewv>podaac-data-downloader -c MUR-JPL-L4-GLOB-v4.1 -d 'C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data' -sd 2023-07-28T00:00:00Z -ed 2023-07-29T23:59:59Z --verbose

and found no joy?

@btupper
Member Author

btupper commented Nov 26, 2024

@jevanilla wonders if it might be the forward- vs. backslash separating path segments. What does R print for the command in the log?

@villesci

villesci commented Nov 26, 2024

[2024-11-26 17:20:25] downloader: podaac-data-downloader -c MUR-JPL-L4-GLOB-v4.1 -d 'C:/Users/drewv/Documents/R/UNH/mada/tiles' -sd 2023-07-28T00:00:00Z -ed 2023-07-29T23:59:59Z --verbose >> 'C:/Users/drewv/Documents/R/UNH/mada/tiles/downloader-log'

Spaces don't seem to be an issue:

[2024-11-26 17:21:25] downloader: podaac-data-downloader -c MUR-JPL-L4-GLOB-v4.1 -d 'C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data' -sd 2023-07-28T00:00:00Z -ed 2023-07-29T23:59:59Z --verbose >> 'C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data'

When pasted into the Windows command line, neither option works, but once the single quotes are replaced with double quotes, downloads are successful.

@btupper
Member Author

btupper commented Nov 27, 2024

OK, double quotes it is! I'll fix that.
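
One way the quoting could be made platform-aware is sketched below. build_downloader_args is a hypothetical helper, not the package's actual code; the flags are the ones that appear in the logged commands earlier in the thread, and the path is a shortened stand-in. shQuote(type = "cmd") wraps in double quotes for cmd.exe, while type = "sh" uses single quotes.

```r
# Assemble the argument string for podaac-data-downloader, quoting the path
# in the style of the target shell. `build_downloader_args` is a hypothetical
# helper, not part of ghrsst.
build_downloader_args <- function(outdir, start_date, end_date,
                                  collection = "MUR-JPL-L4-GLOB-v4.1",
                                  os_type = .Platform$OS.type) {
  quote_type <- if (os_type == "windows") "cmd" else "sh"
  paste("-c", collection,
        "-d", shQuote(outdir, type = quote_type),
        "-sd", format(as.Date(start_date), "%Y-%m-%dT00:00:00Z"),
        "-ed", format(as.Date(end_date), "%Y-%m-%dT23:59:59Z"),
        "--verbose")
}

# Force the Windows style so the result can be inspected on any platform
args_win <- build_downloader_args("C:/Users/me/OneDrive - USNH/ghrsst_data",
                                  "2023-07-28", "2023-07-29",
                                  os_type = "windows")
cat(args_win, "\n")
```

Passing os_type explicitly is only there to make the Windows branch testable from macOS or Linux.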

I suspect that Johnathan is on to another issue. We are invoking a shell command using unix-y paths (with forward slashes) on a Windows-y platform (backslashes).

Might you try the following and copy the output to this thread?

.Platform$OS.type

pathW = "C:\Users\drewv\OneDrive - USNH\Documents\0_People\GradStudents\DrewVilleneuve\ghrsst_data"
normalizePath(pathW, mustWork = FALSE)

pathU = "C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data"
normalizePath(pathU, mustWork = FALSE)

@villesci

villesci commented Nov 27, 2024

Thanks for all the help so far! Here's the output in R console:

> .Platform$OS.type
[1] "windows"
> 
> pathW = "C:\Users\drewv\OneDrive - USNH\Documents\0_People\GradStudents\DrewVilleneuve\ghrsst_data"
Error: '\U' used without hex digits in character string (<input>:1:13)
> pathU = "C:/Users/drewv/OneDrive - USNH/Documents/0_People/GradStudents/DrewVilleneuve/ghrsst_data"
> normalizePath(pathU, mustWork = FALSE)
[1] "C:\\Users\\drewv\\OneDrive - USNH\\Documents\\0_People\\GradStudents\\DrewVilleneuve\\ghrsst_data"
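
The error on pathW arises because a backslash starts an escape sequence inside an R string literal, and \U expects hex digits. A short sketch of three ways to write such a path in R source follows; the path is shortened and hypothetical, and the raw-string form needs R 4.0.0 or later.

```r
# Three ways to write a Windows path in R source code. The path is shortened
# and hypothetical.
doubled <- "C:\\Users\\me\\ghrsst_data"     # escape each backslash
raw     <- r"(C:\Users\me\ghrsst_data)"     # raw string, needs R >= 4.0.0
forward <- "C:/Users/me/ghrsst_data"        # forward slashes

identical(doubled, raw)                          # TRUE: same characters
identical(chartr("\\", "/", doubled), forward)   # TRUE after swapping slashes

# print() shows the escaped form, cat() shows the actual characters
print(doubled)
cat(doubled, "\n")
```

This is also why normalizePath() output above shows doubled backslashes: print() displays each single backslash in escaped form.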

@btupper
Member Author

btupper commented Nov 27, 2024

I just pushed an update to the data-subscriber branch that adds double quotes (works on unix-y) and uses normalizePath() (works on unix-y). Fingers crossed with two stones we get one turkey.

@villesci

villesci commented Nov 27, 2024

Almost! I figured out a solution thanks to ChatGPT. The following still fails in the R console:

  library(rnaturalearth)
  library(dplyr)
  library(sf)
  library(ghrsst)
  library(stars)


start_date<-"2023-07-28"
end_date<-"2023-07-29"

path="C:/Users/drewv/Documents/R/UNH/mada/tiles"
ghrsst::make_path(path)
ghrsst::set_root_path("C:/Users/drewv/Documents/R/UNH/mada/tiles")

ok = ghrsst::podaac_downloader(start_date = as.Date(start_date), end_date = as.Date(end_date))
> ok = ghrsst::podaac_downloader(start_date = as.Date(start_date), end_date = as.Date(end_date))
usage: PO.DAAC bulk-data downloader [-h] -c COLLECTION -d OUTPUTDIRECTORY
                                    [--cycle SEARCH_CYCLES] [-sd STARTDATE]
                                    [-ed ENDDATE] [-f] [-b BBOX] [-dc]
                                    [-dydoy] [-dymd] [-dy] [--offset OFFSET]
                                    [-e EXTENSIONS] [-gr GRANULENAME]
                                    [--process PROCESS_CMD] [--version]
                                    [--verbose] [-p PROVIDER] [--limit LIMIT]
                                    [--dry-run] [--subset]
PO.DAAC bulk-data downloader: error: unrecognized arguments: >> C:\Users\drewv\Documents\R\UNH\mada\tiles\downloader-log

With the new update, the log entry results in:

[2024-11-27 09:26:23] downloader: podaac-data-downloader -c MUR-JPL-L4-GLOB-v4.1 -d "C:\Users\drewv\Documents\R\UNH\mada\tiles" -sd 2023-07-28T00:00:00Z -ed 2023-07-29T23:59:59Z --verbose >> "C:\Users\drewv\Documents\R\UNH\mada\tiles\downloader-log" 

When I copy and paste the above into the command line, it downloads OK into the correct directory. So there must be some issue with how the function sends the command to the command prompt on Windows.

Digging into the podaac_downloader function, the output of app is: [1] "podaac-data-downloader" and the output of args is: [1] "-c MUR-JPL-L4-GLOB-v4.1 -d \"C:\\Users\\drewv\\Documents\\R\\UNH\\mada\\tiles\" -sd 2023-07-28T00:00:00Z -ed 2023-07-29T23:59:59Z --verbose >> \"C:\\Users\\drewv\\Documents\\R\\UNH\\mada\\tiles\\downloader-log\"". Failure occurs at: ok = system2(app, args)
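
A possible explanation, as far as the base R documentation describes it: on Windows, system2() starts the program directly rather than through cmd.exe, so the >> redirect is handed to the program as an ordinary argument. That would match the "unrecognized arguments: >> ..." error. One way to sidestep shell redirection altogether is to capture the output in R and append it to the log from there. The sketch below assumes that approach; run_and_log is a hypothetical helper, not a ghrsst function.

```r
# Run a program with system2(), capture its output in R, and append it to a
# log file. `run_and_log` is a hypothetical helper, not a ghrsst function.
run_and_log <- function(app, args, logfile) {
  # stdout/stderr = TRUE return the output as a character vector
  out <- suppressWarnings(system2(app, args, stdout = TRUE, stderr = TRUE))
  cat(out, file = logfile, sep = "\n", append = TRUE)
  # system2() attaches a "status" attribute only for a non-zero exit
  status <- attr(out, "status")
  if (is.null(status)) 0L else as.integer(status)
}
```

With this, the args passed for podaac-data-downloader would stop at --verbose, and the log path would go in the logfile argument instead of after >>.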

SOLUTION: I asked ChatGPT and it came up with this solution using shell, which does work!

command <- paste(
  "podaac-data-downloader",
  "-c MUR-JPL-L4-GLOB-v4.1",
  "-d \"C:\\Users\\drewv\\Documents\\R\\UNH\\mada\\tiles\"",
  "-sd 2023-07-28T00:00:00Z",
  "-ed 2023-07-29T23:59:59Z",
  "--verbose",
  ">> \"C:\\Users\\drewv\\Documents\\R\\UNH\\mada\\tiles\\downloader-log\""
)

# Execute using shell
ok <- shell(command, intern = TRUE)

It also suggested wrapping system2 if you need to keep that function for Unix:

command <- paste(
  "podaac-data-downloader",
  "-c MUR-JPL-L4-GLOB-v4.1",
  "-d \"C:\\Users\\drewv\\Documents\\R\\UNH\\mada\\tiles\"",
  "-sd 2023-07-28T00:00:00Z",
  "-ed 2023-07-29T23:59:59Z",
  "--verbose",
  ">> \"C:\\Users\\drewv\\Documents\\R\\UNH\\mada\\tiles\\downloader-log\""
)

# Pass the full shell command to cmd.exe using system2
ok <- system2("cmd.exe", c("/c", command), stdout = TRUE, stderr = TRUE)

So, maybe there can be a detect-OS function that directs the command to be sent to the command line via system2 on Unix and shell on Windows?

@btupper
Member Author

btupper commented Nov 27, 2024

shell isn't built in to R, or at least ?shell brings me to the help for system. Perhaps shell is an internal-to-R function (or Windows OS function). I could switch to system, which would be ironic because system2 is documented with this nugget: "system2 is a more portable and flexible interface than system." I think system2 doesn't like the redirect >> of output. I'll switch to system.
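
A note for anyone following along: shell() is a base R function, but it exists only in R for Windows, which is why ?shell finds nothing useful on macOS. As far as the R documentation describes, system() on Windows also runs the command directly without a shell, so the >> redirect may fail there too. Below is a minimal sketch of the OS-dispatch idea suggested above; run_in_shell is a hypothetical name, not something in ghrsst.

```r
# Send a full command string through a shell on either platform.
# shell() exists only in R for Windows and runs the command via cmd.exe;
# on Unix-alikes system() already goes through sh.
run_in_shell <- function(command, intern = FALSE) {
  if (.Platform$OS.type == "windows") {
    shell(command, intern = intern)
  } else {
    system(command, intern = intern)
  }
}

# Redirection is interpreted by the shell, so appending to a log works
log_file <- tempfile(fileext = ".log")
run_in_shell(paste("echo first  >>", shQuote(log_file)))
run_in_shell(paste("echo second >>", shQuote(log_file)))
readLines(log_file)
```

The call to shell() is only evaluated on Windows, so the function can be defined and used on macOS or Linux without error.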

3 participants