
Import public datasets from OpenML #3

Open

gfournier opened this issue May 28, 2019 · 0 comments

Comments

gfournier (Member) commented May 28, 2019

Use the scikit-learn function sklearn.datasets.fetch_openml to import public datasets:

Titanic dataset: https://www.openml.org/d/40945
Abalone: https://www.openml.org/t/9900
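
A minimal sketch (not part of the original issue) of how these could be loaded: the Titanic data_id is taken from the URL above, while the Abalone link points to an OpenML task rather than a dataset page, so it is fetched by name here on the assumption that the name "abalone" resolves on OpenML. The as_frame argument assumes a reasonably recent scikit-learn release (>= 0.22).

```python
from sklearn.datasets import fetch_openml

# Titanic dataset (https://www.openml.org/d/40945)
titanic = fetch_openml(data_id=40945, as_frame=True)
X_titanic, y_titanic = titanic.data, titanic.target

# Abalone: fetched by name, since the link above is an OpenML task, not a dataset id
abalone = fetch_openml(name="abalone", version=1, as_frame=True)
X_abalone, y_abalone = abalone.data, abalone.target
```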

gfournier added the 'good first issue' label May 28, 2019
gfournier changed the title from 'Import public datasets with openml' to 'Import public datasets from OpenML' May 28, 2019
gfournier removed the 'good first issue' label Jun 8, 2019
gfournier pushed a commit that referenced this issue Aug 7, 2019
* fix casting bug + test on filter/map functions on dicts

* add function to retrieve a 2-tuple list of edges from generic tuple edges

* fix bug in DebugPassThrough

* add 'get_subpipeline' method to create a sub GraphPipeline from a given GraphPipeline

* add docstring for get_subpipeline
gfournier added a commit that referenced this issue Aug 7, 2019
* Bump version to 0.1.0

* Change output type vectorizer (#1)

* change setup

* change default output type of CountVectorizer to be int32

* change dtype to numerical encoder as well + tests

* add output type test on NumImputer

* fix bug NumericalEncoder when new column (#4)

* Block Search + other (#2)

* add make_pipeline function (works like sklearn)

* fix typo "_if_fitted" -> "_already_fitted"

* add handling of columns_to_encode == "--object--" in target encoder
  * corresponding test

* add NumericalEncoder test for columns_to_encode == '--object--'

* expose command argument parser outside, to be able to add new arguments.

* change WordVectorizer char mode distributions

+ fix bug in HyperRangeBetaInt

* change default behavior: encode "columns_to_encode == '--object--'"

* remove 'bug' (double return)

* allow text preprocessors to concat their inputs

* add 'RandomTrainTestCv' and 'IndexTrainCv' cv-like objects
  * same API as a regular cv object ...
  * ... but only one split

* add 'use_for_block_search' attribute + filter models based on that

* add block search iterator

* automl config: models_to_keep_block_search

* fix typo in test

* ignore Warning in test

* Graph pipeline subgraph from dev (#3)

* fix casting bug + test on filter/map functions on dicts

* add function to retrieve a 2-tuple list of edges from generic tuple edges

* fix bug in DebugPassThrough

* add 'get_subpipeline' method to create a sub GraphPipeline from a given GraphPipeline

* add docstring for get_subpipeline

* Fix numerical encoder max_cum_proba (#6)

* Fix bug automl group (#5)

* allow reload of groups

* add average_precision default transformation
  * go back to default transformation if unknown

* return dataframe in command

* Fix dataset load from SG premises

* Fix dummy encoding type in NumericalEncoder