Fill out a few function docstrings #196
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# Contributing | ||
|
||
|
||
|
||
## Contributing Time-Series Features | ||
|
||
We gratefully accept contributions of new time-series features, be they | ||
domain-specific or general. Please follow the below guidelines in order that | ||
your features may be successfully incorporated into the Cesium feature base. | ||
|
||
*Coming soon* |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,7 +20,19 @@ | |
|
||
|
||
def rectangularize_featureset(featureset): | ||
"""Convert xarray.Dataset into (2d) Pandas.DataFrame for use with sklearn.""" | ||
"""Convert xarray.Dataset into (2d) Pandas.DataFrame for use with sklearn. | ||
|
||
Parameters | ||
---------- | ||
featureset : xarray.Dataset | ||
The xarray.Dataset object containing features. | ||
|
||
Returns | ||
------- | ||
pandas.DataFrame | ||
2-D, sklearn-compatible DataFrame containing features. | ||
|
||
""" | ||
featureset = featureset.drop([coord for coord in featureset.coords | ||
if coord not in ['name', 'channel']]) | ||
feature_df = featureset.to_dataframe() | ||
|
@@ -69,15 +81,53 @@ def fit_model_optimize_hyperparams(data, targets, model, params_to_optimize, | |
|
||
|
||
def build_model_from_featureset(featureset, model=None, model_type=None, | ||
model_options={}, params_to_optimize=None, | ||
model_parameters={}, params_to_optimize=None, | ||
cv=None): | ||
"""Build model from (non-rectangular) xarray.Dataset of features.""" | ||
"""Build model from (non-rectangular) xarray.Dataset of features. | ||
|
||
Parameters | ||
---------- | ||
featureset : xarray.Dataset of features | ||
Features for training model. | ||
model : scikit-learn model, optional | ||
Instantiated scikit-learn model. If None, `model_type` must not be. | ||
Defaults to None. | ||
model_type : str, optional | ||
String indicating model to be used, e.g. "RandomForestClassifier". | ||
If None, `model` must not be. Defaults to None. | ||
model_parameters : dict, optional | ||
@stefanv whatcha think of this?
Hrm, this makes me wonder why we have a params_to_optimize dictionary. Why not just check each model_parameters entry, if it is a list, and the number of entries > 1, then hyper optimize, otherwise just use as-is.
@stefanv
That doesn't matter if you just put the list inside of a list.
Not sure what you mean by that... Put the list inside of a list?
Yes, so you always have a list of parameters. If those parameters themselves are lists, then so be it.
Correction -
Thanks, Ari, this looks good.
||
Dictionary with hyperparameter values to be used in model building. | ||
Describe the structure of this dictionary.
||
Keys are parameter names, values are the associated parameter values. | ||
These hyperparameters will be passed to the model constructor as-is | ||
(for hyperparameter optimization, see `params_to_optimize`). | ||
If None, default values will be used (see scikit-learn documentation | ||
for specifics). | ||
params_to_optimize : dict or list of dict, optional | ||
During hyperparameter optimization, various model parameters | ||
are adjusted to give an optimal fit. This dictionary gives the | ||
different values that should be explored for each parameter. E.g., | ||
`{'alpha': [1, 2], 'beta': [4, 5, 6]}` would fit models on all | ||
6 combinations of alpha and beta and compare the resulting models' | ||
goodness-of-fit. If None, only those hyperparameters specified in | ||
`model_parameters` will be used (passed to model constructor as-is). | ||
Defaults to None. | ||
cv : int, cross-validation generator or an iterable, optional | ||
Number of folds (defaults to 3 if None) or an iterable yielding | ||
train/test splits. See documentation for `GridSearchCV` for details. | ||
Defaults to None (yielding 3 folds). | ||
|
||
Returns | ||
------- | ||
sklearn estimator object | ||
The fitted sklearn model. | ||
|
||
""" | ||
if featureset.get('target') is None: | ||
raise ValueError("Cannot build model for unlabeled feature set.") | ||
|
||
if model is None: | ||
if model_type: | ||
model = MODELS_TYPE_DICT[model_type](**model_options) | ||
model = MODELS_TYPE_DICT[model_type](**model_parameters) | ||
else: | ||
raise ValueError("If model is None, model_type must be specified") | ||
feature_df = rectangularize_featureset(featureset) | ||
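The lookup `MODELS_TYPE_DICT[model_type](**model_parameters)` in the hunk above can be illustrated with a small self-contained sketch. The two classes and the registry contents here are hypothetical stand-ins, not Cesium's actual `MODELS_TYPE_DICT`, and `instantiate_model` is an illustrative helper name. The sketch only shows how a string name and a parameter dictionary might turn into an instantiated model.

```python
class DummyForest:
    """Hypothetical stand-in for an estimator class such as RandomForestClassifier."""

    def __init__(self, n_estimators=10, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth


class DummyLinear:
    """Hypothetical stand-in for a linear estimator class."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


# Assumed shape of the registry: model name -> estimator class.
MODELS_TYPE_DICT = {"DummyForest": DummyForest, "DummyLinear": DummyLinear}


def instantiate_model(model=None, model_type=None, model_parameters=None):
    """Return `model` if given, otherwise build one from `model_type`."""
    if model is not None:
        return model
    if not model_type:
        raise ValueError("If model is None, model_type must be specified")
    # Treat None like an empty dict so the constructor defaults apply.
    return MODELS_TYPE_DICT[model_type](**(model_parameters or {}))


forest = instantiate_model(model_type="DummyForest",
                           model_parameters={"n_estimators": 50})
print(forest.n_estimators, forest.max_depth)  # 50 None
```

Accepting `None` for `model_parameters` and normalizing it inside the function is one way to avoid a mutable `{}` default in the signature; whether the project wants that is a design choice this sketch does not settle.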
|
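The `params_to_optimize` docstring and the review thread on `model_parameters` both turn on how a dictionary of candidate values is interpreted. The sketch below uses only the standard library. `expand_grid` enumerates the combinations the docstring describes (scikit-learn's `GridSearchCV` performs this enumeration itself from `param_grid`), and `split_parameters` is one possible reading of the reviewer's suggestion to infer what to optimize from a single dictionary. Both function names are hypothetical, and the rule "a list with more than one entry means search over it" is an assumption taken from the review comment, not merged behavior.

```python
from itertools import product


def expand_grid(params_to_optimize):
    """Yield one dict per combination of candidate parameter values."""
    names = sorted(params_to_optimize)
    for values in product(*(params_to_optimize[name] for name in names)):
        yield dict(zip(names, values))


def split_parameters(model_parameters):
    """Split one dict into (fixed, to_optimize).

    Assumption: a list with more than one entry means "search over these";
    anything else is a fixed value. A parameter whose value really is a list
    is wrapped in an outer list, e.g. [[10, 5]], and is unwrapped here.
    """
    fixed, to_optimize = {}, {}
    for name, value in model_parameters.items():
        if isinstance(value, list) and len(value) > 1:
            to_optimize[name] = value
        elif isinstance(value, list) and len(value) == 1:
            fixed[name] = value[0]
        else:
            fixed[name] = value
    return fixed, to_optimize


grid = list(expand_grid({"alpha": [1, 2], "beta": [4, 5, 6]}))
print(len(grid))  # 6

fixed, to_optimize = split_parameters(
    {"alpha": [1, 2], "gamma": 0.5, "layers": [[10, 5]]})
print(fixed)        # {'gamma': 0.5, 'layers': [10, 5]}
print(to_optimize)  # {'alpha': [1, 2]}
```

Keeping `model_parameters` and `params_to_optimize` as separate arguments, as the diff does, avoids the wrapping convention entirely, at the cost of a second dictionary.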
Where are the below guidelines?
@stefanv bnaul offered to fill this out - he's going to contribute to my branch.
OK, great!
@bnaul still up for adding this? If not, I'd like to sit down for a few minutes together to get a better sense of what this needs to be.