Description
This pull request adds a random forest algorithm utilizing features from the Sine Coulomb Matrix and MagPie featurization algorithms. Here are the key details of the algorithm:
Sine Coulomb Matrix: Creates structural features from Coulombic interactions under periodic boundary conditions (suitable for crystalline materials with known structures).
MagPie Features: Weighted elemental features derived from elemental data such as electronegativity, melting point, and electron affinity.
Both algorithms were executed within the Automatminer v1.0.3.20191111 framework for convenience, although no auto-featurization or AutoML processes were applied.
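As a rough illustration of what a composition-weighted elemental feature looks like, the sketch below computes a fraction-weighted mean of Pauling electronegativity. It is not the MagPie or Automatminer implementation; the function name `weighted_mean_feature` and the small lookup table are hypothetical and exist only for this example.

```python
# Toy elemental data: Pauling electronegativity (illustrative subset only).
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16, "Mg": 1.31, "O": 3.44}


def weighted_mean_feature(composition, table):
    """Return the fraction-weighted mean of an elemental property.

    composition maps element symbol -> amount, e.g. {"Na": 1, "Cl": 1}.
    table maps element symbol -> property value.
    (Hypothetical helper, not part of any library.)
    """
    total = sum(composition.values())
    return sum(table[el] * amount / total for el, amount in composition.items())


nacl = {"Na": 1, "Cl": 1}
mean_en = weighted_mean_feature(nacl, ELECTRONEGATIVITY)
```

The actual MagPie feature set covers many more elemental properties and statistics than this single weighted mean.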
Data Processing
Data Cleaning: Features with NaN values in more than 1% of samples were dropped. Remaining missing values were imputed using the mean of the training data.
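The cleaning step could be sketched with pandas roughly as follows. This is a minimal sketch, not the Automatminer code path: the helper `clean_features` is hypothetical, and it assumes the NaN fraction is measured on the training split, which the description does not state.

```python
import numpy as np
import pandas as pd


def clean_features(train, test, max_nan_frac=0.01):
    """Drop sparse features, then mean-impute using training statistics.

    Assumption: the NaN fraction is computed on the training split only.
    (Hypothetical helper for illustration.)
    """
    nan_frac = train.isna().mean()                    # per-column NaN fraction
    keep = nan_frac[nan_frac <= max_nan_frac].index   # columns with <= 1% NaN
    train_means = train[keep].mean()                  # means from training data only
    return train[keep].fillna(train_means), test[keep].fillna(train_means)


# Toy data: "a" has 0.5% NaN (kept), "b" has 25% NaN (dropped).
train = pd.DataFrame({"a": np.arange(200, dtype=float),
                      "b": np.arange(200, dtype=float)})
train.loc[0, "a"] = np.nan
train.loc[:49, "b"] = np.nan
test = pd.DataFrame({"a": [np.nan, 1.0], "b": [1.0, 2.0]})

clean_train, clean_test = clean_features(train, test)
```

Using the training means for the test split avoids leaking test-set statistics into the imputation.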
Featurization:
For structure problems: Both Sine Coulomb Matrix and MagPie features were applied.
For problems without structure: Only MagPie features were applied.
Model Details
Random Forest: Uses 500 estimators.
Hyperparameter Tuning: None performed. A large, constant number of trees was used to construct each fold's model, with the entire training+validation set serving as training data for the random forest.
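A fixed 500-tree forest with no tuning might look like the following scikit-learn sketch. It assumes a regression task and uses synthetic data; the actual submission ran inside Automatminer, so the variable names, the random seed, and the use of `RandomForestRegressor` directly are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for one fold's featurized data (assumption for illustration).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))   # training+validation features, used together
y_train = X_train[:, 0] + 0.1 * rng.normal(size=100)
X_test = rng.normal(size=(10, 5))

# Fixed number of trees; no hyperparameter search is performed.
model = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

For classification problems, `RandomForestClassifier` with the same `n_estimators` would be the analogous choice.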
Additional Information
Raw Data and Example Notebook: Available on the matbench repository.
Included files