Background
When training a model to predict some y, we have to choose which features to include in X. Choosing them can be time-consuming, and any given choice tends to be temporary (at least during development/evaluation of models) because new features are constantly becoming available (e.g., DAE, N applied thus far).
To do
Determine which "feature groups" (for lack of a better term?) should be used for our base training scenarios. Include a name for each group, as well as the features it includes (must be specific).
For example, "base" might include:
DAE
N applied thus far
wx 1 (AVG max temp 0-20 DAE)
wx 2 (AVG max temp 21-40 DAE)
wx 3 (AVG weekly water (rainfall + irr) since planting)
...etc.
"cropscan" might include all of "base" plus:
550 nm reflectance
680 nm reflectance
710 nm reflectance
...etc.
"sentinel" might include all of "base" plus the Sentinel bands
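One possible way to write these groups down is sketched below: a plain dictionary in which a group can extend another, so that "cropscan" is literally "base" plus its own features. This is only an illustration. The key names (`extends`, `features`), the function `resolve_features`, and every column name are assumptions, since the real names depend on how the tables end up being stored. The "sentinel" entry is left empty because the specific bands have not been decided here.

```python
# Hypothetical feature-group definitions. All column names are placeholders.
FEATURE_GROUPS = {
    "base": {
        "extends": None,
        "features": [
            "dae",
            "n_applied_to_date",
            "wx_avg_tmax_0_20_dae",
            "wx_avg_tmax_21_40_dae",
            "wx_avg_weekly_water",
        ],
    },
    "cropscan": {
        "extends": "base",
        "features": ["refl_550nm", "refl_680nm", "refl_710nm"],
    },
    # Sentinel bands are not specified yet, so nothing is added to "base" here.
    "sentinel": {"extends": "base", "features": []},
}


def resolve_features(group_name, groups=FEATURE_GROUPS):
    """Return the full, ordered feature list for a group, parents first."""
    group = groups[group_name]
    parent = group["extends"]
    inherited = resolve_features(parent, groups) if parent else []
    # Skip names already inherited so no feature appears twice.
    own = [name for name in group["features"] if name not in inherited]
    return inherited + own
```

With this layout, excluding a feature means defining a new group rather than editing an existing one, which matches the intent described in this issue.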
Keep in mind that we eventually have to be able to access any data we decide to use for each group. The data do not have to be available now, but we need a reasonably clear idea of how we're going to store them in a table/database. Similar to getting DAE for small plot studies, a function will be developed to pull all the relevant data for a given "feature group" and include it in the appropriate X matrix used as input for model training.
This must be dynamic because we don't want to have to update every single X matrix any time new data are available. Instead, we just want to check the database and pull everything that is available. If anything is null, we probably want to exclude that observation rather than remove the entire feature (if we want to exclude a feature, we should create a new "feature group" that excludes it).
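The null-handling rule above could look roughly like the sketch below. It assumes the observations have already been pulled from the database as a list of dictionaries with `None` for nulls. The function name `build_x`, the `obs_id` key, the column names, and the sample values are all made up for illustration and are not real data.

```python
def build_x(rows, features):
    """Build X for one feature group, dropping observations that have nulls.

    rows: list of dicts, one per observation (hypothetical layout).
    features: ordered list of column names for the feature group.
    Returns (X, kept_ids), where X is a list of feature rows.
    """
    X, kept_ids = [], []
    for row in rows:
        values = [row.get(name) for name in features]
        if any(value is None for value in values):
            continue  # exclude the observation, keep the feature
        X.append(values)
        kept_ids.append(row["obs_id"])
    return X, kept_ids


# Illustrative values only.
rows = [
    {"obs_id": 1, "dae": 30, "n_applied_to_date": 50.0},
    {"obs_id": 2, "dae": 45, "n_applied_to_date": None},
    {"obs_id": 3, "dae": 60, "n_applied_to_date": 80.0},
]
X, kept_ids = build_x(rows, ["dae", "n_applied_to_date"])
```

Here observation 2 is dropped because one of its features is null, while both features stay in X. Returning the kept IDs alongside X makes it possible to line up y with the surviving observations.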
The "feature groups" determined as a result of this issue will be the basis for new issues covering the actual functions that make all the data usable (so don't worry about details now).