diff --git a/description.md b/description.md
index c24e813..3711201 100644
--- a/description.md
+++ b/description.md
@@ -31,7 +31,7 @@ In addition to the primary dataset, we also utilized the background dataset, whi
 ### Tree selection
 We decided to focus on the train set first and only afterwards on the background survey.
-Our initial aim was to craft a procedure which could test our initial hypothesis. The idea was to develop a quick and automatic way to assess with sustainable precision the marginal predictive power of every feature available. We decided to evaluate the predictive performance on the task of predicting ferility of a univariate model, using a stratified cross-validation with 5 folds and iterating over every feature available. Our model chosen for this procedure was the decision tree, for several reasons: it can gracefully handle missing data, it's better suited with categorical/ordinal features stored as numbers that a linear method like logistic regression, and, last but not least, the optimized implementations available (together with the modest number of rows) allowed us to iterate across tens of thousands of features fairly quickly. The results are summarized below.
+Our first aim was to design a procedure that could test our initial hypothesis. The idea was to develop a quick, automatic way to estimate, with reasonable precision, the marginal predictive power of every available feature. To do so, we iterated over every feature and evaluated a univariate model on the task of predicting fertility, using stratified 5-fold cross-validation.
+The chosen model for this procedure was a decision tree, for several reasons: it can gracefully handle missing data; it is better suited than a linear method such as logistic regression to categorical/ordinal features stored as numbers; and, last but not least, the optimized implementations available (together with the modest number of rows) allowed us to iterate across tens of thousands of features fairly quickly. The results are summarized below.
 ![gas](./saved/tree_selection_big_year.png)
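The screening loop described in the new paragraph (one univariate decision tree per feature, scored with stratified 5-fold cross-validation) can be sketched with scikit-learn as below. This is a minimal illustration, not the project's actual code: the function name `univariate_tree_scores`, the tree depth (`max_depth=3`), the F1 scoring metric, and the shuffled folds are all assumptions, since the description does not state them. Native handling of missing values (`NaN`) in `DecisionTreeClassifier` requires scikit-learn 1.3 or later.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def univariate_tree_scores(X, y, feature_names, n_splits=5, max_depth=3,
                           scoring="f1", random_state=0):
    """Hypothetical helper: score each feature on its own with a shallow tree.

    Returns (feature_name, mean CV score) pairs, best feature first.
    max_depth and scoring are illustrative assumptions, not the authors' settings.
    """
    # Stratified folds keep the class ratio of y roughly equal in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    results = []
    for j, name in enumerate(feature_names):
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
        # X[:, [j]] keeps a 2-D (n_samples, 1) array, as scikit-learn expects.
        fold_scores = cross_val_score(tree, X[:, [j]], y, cv=cv, scoring=scoring)
        results.append((name, float(fold_scores.mean())))
    # Rank features by their marginal (single-feature) predictive power.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Synthetic data: one feature correlated with the target, one pure noise.
    rng = np.random.default_rng(0)
    n = 500
    y = rng.integers(0, 2, size=n)
    informative = y + rng.normal(0, 0.5, size=n)
    noise = rng.normal(0, 1, size=n)
    X = np.column_stack([informative, noise])
    print(univariate_tree_scores(X, y, ["informative", "noise"]))
```

Because each model sees a single column, the loop is embarrassingly parallel; with tens of thousands of features, passing `n_jobs=-1` to `cross_val_score` (or parallelizing the outer loop) is one way to keep the run time manageable.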