Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: refresh F10.7 and ap data handling with partial data support #74

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mananapr
Copy link

@mananapr mananapr commented Jan 6, 2025

Refresh F10.7 and ap data handling with partial data support and automatic updates

Closes #69

Copy link
Member

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you @mananapr!

I've got a few comments/suggestions, but happy to get this in!

pymsis/utils.py Outdated
@@ -207,7 +280,7 @@ def get_f107_ap(dates: npt.ArrayLike) -> tuple[npt.NDArray, npt.NDArray, npt.NDA
| prior to current time
"""
dates = np.asarray(dates, dtype=np.datetime64)
data = _DATA or _load_f107_ap_data()
data = _DATA or _load_f107_ap_data(dates[-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a long running process with _DATA already loaded, then this won't update the data for us. I think the logic should probably be a new if block here instead after the original loading potentially?

Suggested change
data = _DATA or _load_f107_ap_data(dates[-1])
# Load/download data right away.
data = _DATA or _load_f107_ap_data()
if dates[-1] > data["dates"][-1]:
data = _new_function_to_download_recent_data()

pymsis/utils.py Outdated
fout.write(line.encode("utf-8"))
fout.seek(0)
arr = np.loadtxt(
fout, delimiter=",", dtype=dtype, usecols=usecols, skiprows=1
) # type: ignore

# Check if the file needs to be refreshed after parsing
if last_obs_date is not None:
file_mod_time = datetime.fromtimestamp(os.path.getmtime(_F107_AP_PATH))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need file modification time? I think we should just leave that out and only rely on the requested date being greater than the observations we have.

pymsis/utils.py Outdated
Comment on lines 158 to 160
if ",OBS," in line:
# Capture last observed date
last_obs_date = np.datetime64(line.split(",")[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get this from the _DATA numpy arrays? Maybe warn_data contains this information already and we could do something like (have not tried this) _DATA["dates"][~_DATA["warn_data"]][-1].

pymsis/utils.py Outdated
writer.writeheader()
writer.writerows(sorted_data)

os.remove(_PARTIAL_F107_AP_PATH)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be wary of this because if there is any download error above, this file will be left hanging around.

A few quick thoughts:

  • Do we need to save it to disk, or could you store it in a BytesIO object in memory instead?
  • Would it make sense to save _DATA arrays with np.savez() instead of the csv files for easier manipulation? (Then you wouldn't need to mess with the dictreader/writer.
  • Honestly, now that I'm thinking about it more, is this just more headache than it is worth and we should just use your logic below and re-download the full file for the simplest case. More bandwidth, but at least we are getting the full file built for us rather than building it from bits and pieces.

@mananapr
Copy link
Author

Hi @greglucas, I've pushed another commit based on your suggestions.

  • I have removed the partial download functionality because like you said, it is a lot of headache
  • I am getting the last observation date from the warnings data like you mentioned.
  • I am still keeping the check on file update date to prevent pymsis from re-downloading the SW file each time the program runs in case the end date is not present in the csv file

Let me know what you think.

Copy link
Member

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mananapr, this looks great to me. Just one minor suggestion.

pymsis/utils.py Outdated
@@ -208,6 +207,12 @@ def get_f107_ap(dates: npt.ArrayLike) -> tuple[npt.NDArray, npt.NDArray, npt.NDA
"""
dates = np.asarray(dates, dtype=np.datetime64)
data = _DATA or _load_f107_ap_data()
if dates[-1] > data["dates"][~np.repeat(data["warn_data"], 8)][-1]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone didn't sort their inputs/request, we should use a max here I think. Could you also add a quick comment about what this if statement is doing.

Suggested change
if dates[-1] > data["dates"][~np.repeat(data["warn_data"], 8)][-1]:
# If our requested data time is after the cached values we have, go and download a new file
# to refresh the local file cache
if dates.max() > data["dates"][~np.repeat(data["warn_data"], 8)][-1]:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @greglucas, I've commented the date comparison check

Manan and others added 2 commits January 24, 2025 18:16
Add tests to make sure the downloads are triggered appropriately.
This required updating the location to download to a separate temporary
directory for each test. Before we were overwriting the actual test file
and that caused issues in subsequent tests when going to load from that.

Change the condition to only refresh if the data being requested is
before the current time AND after the last date in the file.
@greglucas
Copy link
Member

@mananapr I pushed a commit up to your branch to try and fix up some of the failing tests. Unfortunately, it was more involved than I hoped, so there is a bit more work there than anticipated. It also apparently didn't fix the issue :-/ But it does work locally for me. Feel free to drop that commit if you don't like it and continue on in a different way to try and get the tests to pass too.

@mananapr
Copy link
Author

mananapr commented Feb 3, 2025

@mananapr I pushed a commit up to your branch to try and fix up some of the failing tests. Unfortunately, it was more involved than I hoped, so there is a bit more work there than anticipated. It also apparently didn't fix the issue :-/ But it does work locally for me. Feel free to drop that commit if you don't like it and continue on in a different way to try and get the tests to pass too.

Ah, I totally missed that. Will look into the changes, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Downloaded SW file is not refreshed if it is out of date
2 participants