[BUG] umap dirty_cat on colab #607

Open
maksim-mihtech opened this issue Oct 27, 2024 · 10 comments

@maksim-mihtech

With the latest version of PyGraphistry, I get the following error when running g.umap():

[Image: graphistry_bug]

@lmeyerov
Contributor

Thanks @maksim-mihtech

Any chance you can share the file produced by df.sample(100).to_parquet('nodes.parquet')?

cc @silkspace

@lmeyerov
Contributor

Also, what versions of Python, dirty_cat, scikit-learn, and pandas are you on?
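One way to collect the versions asked about above is a short script using only the standard library. This is a minimal sketch: the helper name report_versions and the package list are assumptions for illustration, and the distribution names may need adjusting to match what is actually installed in the environment.

```python
from importlib import metadata
import platform

# Distribution names as they are assumed to appear on PyPI; adjust as needed.
PACKAGES = ["graphistry", "dirty_cat", "scikit-learn", "pandas", "umap-learn"]


def report_versions(packages=PACKAGES):
    """Return {name: version or 'not installed'}, plus the Python version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Pasting the printed lines into the issue gives maintainers the environment details in one place.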

@lmeyerov
Contributor

@maksim-mihtech it sounds like there is a repro in Colab. Can you share how you installed it for the repro? Was it pip install graphistry[umap_learn]?

@lmeyerov
Contributor

More info:

redteam-umap-gtc-gpu.ipynb, and I installed graphistry(CPU)/graphistryai

lmeyerov changed the title from "[BUG]" to "[BUG] umap dirty_cat on colab" on Oct 28, 2024
@maksim-mihtech
Author

@lmeyerov
It's installed with pip install graphistry[ai]

@maksim-mihtech
Author

[Image: Graphistry]

@lmeyerov
Contributor

lmeyerov commented Jan 1, 2025

Thanks!

We are working on an end-run around this by upgrading from dirty-cat to skrub. You can track progress here: #626

@maksim-mihtech
Author

@lmeyerov

As you mentioned, I tried version 0.35.4+18.g60177c52 from the dev/dev-skrub branch, and it solved the problem. Thank you.

@lmeyerov
Contributor

lmeyerov commented Jan 4, 2025

That is great to hear!

We are still working on making that branch pass all our tests. skrub is pickier about column names, so we are adding more automatic data cleaning to keep skrub from throwing exceptions on inputs that used to work. Outside of that, the branch seems good!
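Until that cleaning lands, column names can be normalized before calling g.umap(). The sketch below is not PyGraphistry's implementation, and the thread does not say which naming rules skrub enforces. It assumes, purely for illustration, that names should be non-empty, unique strings; clean_column_names is a hypothetical helper.

```python
import re


def clean_column_names(columns):
    """Return non-empty, unique string names, one per input column."""
    used = set()
    cleaned = []
    for i, col in enumerate(columns):
        # Coerce to str, trim, and collapse internal whitespace to underscores.
        base = re.sub(r"\s+", "_", str(col).strip()) or f"col_{i}"
        # Append a numeric suffix until the name no longer collides.
        name, suffix = base, 1
        while name in used:
            name = f"{base}_{suffix}"
            suffix += 1
        used.add(name)
        cleaned.append(name)
    return cleaned
```

With a pandas DataFrame, this could be applied as df.columns = clean_column_names(df.columns) before building the graph.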

@maksim-mihtech
Author

Thank you and the whole Graphistry team!
