Define "feature groups" for training #9

Open
tnigon opened this issue Apr 16, 2020 · 1 comment

@tnigon
Contributor

tnigon commented Apr 16, 2020

Background

When training a model to predict some y, we have to choose which features to include in X. This is time-consuming, and any given selection tends to be temporary (at least during model development and evaluation) because new features are constantly becoming available (e.g., DAE, N applied thus far, etc.).

To do

Determine which "feature groups" (for lack of a better term) should be used for our base training scenarios. Include a name for each group, as well as the specific features it includes.

For example, "base" might include:

  1. DAE
  2. N applied thus far
  3. wx 1 (AVG max temp 0-20 DAE)
  4. wx 2 (AVG max temp 21-40 DAE)
  5. wx 3 (AVG weekly water (rainfall + irr) since planting)
    ...etc.

"cropscan" might include all of "base" plus:

  1. 550 nm reflectance
  2. 680 nm reflectance
  3. 710 nm reflectance
    ...etc.

"sentinel" might include all of "base" plus the Sentinel bands
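
One way the groups above could be written down so that "all of base plus ..." is explicit is a small registry with a parent link. This is only a sketch: the `FeatureGroup` class, the `resolve_features` helper, and the column names (`dae`, `n_applied`, `wx_1`, `refl_550`, ...) are hypothetical placeholders, and the Sentinel band list is left empty because it has not been decided here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FeatureGroup:
    """A named set of features, optionally extending a parent group."""
    name: str
    features: Tuple[str, ...]
    parent: Optional[str] = None


# Hypothetical column names standing in for the features listed in this issue.
FEATURE_GROUPS: Dict[str, FeatureGroup] = {
    g.name: g
    for g in (
        FeatureGroup("base", ("dae", "n_applied", "wx_1", "wx_2", "wx_3")),
        FeatureGroup("cropscan", ("refl_550", "refl_680", "refl_710"), parent="base"),
        # Sentinel bands are not specified yet, so none are listed.
        FeatureGroup("sentinel", (), parent="base"),
    )
}


def resolve_features(group_name: str) -> List[str]:
    """Return the parent's features followed by the group's own, without duplicates."""
    group = FEATURE_GROUPS[group_name]
    inherited = resolve_features(group.parent) if group.parent else []
    return inherited + [f for f in group.features if f not in inherited]
```

With this layout, excluding a feature means registering a new group rather than editing an existing one, so earlier training runs stay reproducible.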

Keep in mind that we eventually have to be able to access any data we decide to use for each group. The data do not have to be available now, but we need a reasonably clear idea of how we are going to store them in a table/database. Similar to getting DAE for small plot studies, a function will be developed to pull all the relevant data for these "feature groups" into the appropriate X matrices that are input for model training.

This must be dynamic because we don't want to update every X matrix whenever new data become available. Instead, we just want to check the database and pull everything that is available. If any value is null, we probably want to exclude that observation rather than remove the entire feature (if we want to exclude a feature, we should create a new "feature group" that excludes that feature).
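
A minimal sketch of the "pull what is available, drop observations with nulls" behavior, assuming the data live in a single SQL table. The table name `obs`, the column names, and the `pull_complete_rows` function are made up for illustration, and an in-memory SQLite database stands in for whatever storage we end up using.

```python
import sqlite3
from typing import List, Sequence, Tuple


def pull_complete_rows(
    conn: sqlite3.Connection, table: str, features: Sequence[str]
) -> List[Tuple]:
    """Return one row per observation that has a non-null value for every feature."""
    # Look up the columns that currently exist, so unknown features fail loudly.
    available = {
        row[0] for row in conn.execute("SELECT name FROM pragma_table_info(?)", (table,))
    }
    missing = [f for f in features if f not in available]
    if missing:
        raise KeyError(f"features not in table {table!r}: {missing}")

    # Identifiers cannot be bound as parameters; they were validated above.
    cols = ", ".join(f'"{f}"' for f in features)
    not_null = " AND ".join(f'"{f}" IS NOT NULL' for f in features)
    return conn.execute(f'SELECT {cols} FROM "{table}" WHERE {not_null}').fetchall()


# Toy data: the second observation is missing wx_1 and should be excluded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (dae INTEGER, n_applied REAL, wx_1 REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [(20, 50.0, 24.1), (35, 50.0, None), (48, 90.0, 27.3)],
)
rows = pull_complete_rows(conn, "obs", ["dae", "n_applied", "wx_1"])
```

Because the query is built from the feature list each time it runs, new observations are picked up on the next call without touching any stored X matrix.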

@tnigon
Contributor Author

tnigon commented Apr 16, 2020

The "feature groups" determined as a result of this issue will be the basis for new issues covering the actual functions that make all the data usable (so don't worry about implementation details now).
