It would be great to have a more verbose option for joins (`inner_join`, `left_join`, etc.). This would show not just the number of rows dropped from each input data frame, but also the key values (level combinations) that appeared in only one of the inputs. So if you joined two datasets about cars on `make`, it might report that the levels `Ferrari` and `Mazda` were only found in `df_left` and were filtered out, and that `Ford` was only found in `df_right` and was filtered out. The issue is that the output length grows without bound as the number of factor levels increases, so I think it would have to be optional and truncate the output past a certain number of lines. I am happy to help with development.
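To make the request concrete, here is a rough sketch of the kind of report I have in mind, written in base R. It is not tidylog code: `report_unmatched_keys` and its `max_values` argument are hypothetical names made up for illustration, and the sketch only handles a single join column.

```r
# Hypothetical helper (not part of tidylog): list the join-key values
# that occur in only one of the two inputs, truncating long lists.
report_unmatched_keys <- function(x, y, by, max_values = 10) {
  keys_x <- unique(as.character(x[[by]]))
  keys_y <- unique(as.character(y[[by]]))
  only_x <- setdiff(keys_x, keys_y)  # present in x, missing from y
  only_y <- setdiff(keys_y, keys_x)  # present in y, missing from x

  show <- function(values, side) {
    if (length(values) == 0) return(invisible(NULL))
    shown <- head(values, max_values)
    msg <- sprintf("%s only in %s: %s", by, side, paste(shown, collapse = ", "))
    if (length(values) > max_values) {
      # Truncate so high-cardinality keys do not flood the console
      msg <- paste0(msg, sprintf(" ... (%d more)", length(values) - max_values))
    }
    message(msg)
  }

  show(only_x, "x")
  show(only_y, "y")
  invisible(list(only_x = only_x, only_y = only_y))
}

# Made-up example data matching the cars example above
df_left  <- data.frame(make = c("Ferrari", "Mazda", "Toyota"), price = c(300, 25, 30))
df_right <- data.frame(make = c("Toyota", "Ford"), country = c("Japan", "USA"))

res <- report_unmatched_keys(df_left, df_right, by = "make")
```

Because the keys are compared as character vectors, the same approach would work for strings, logicals, and integers, not just factors. How a user would opt in, and what the default truncation limit should be, are left open here.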
Hi! At first glance I'm not sure whether this would be in scope for tidylog, as the join logic is already fairly complex, and this would add a lot of additional output. A few questions:
what's the specific use case?
if this is implemented, it should be opt-in. How would one opt in?
how to deal with high-cardinality values? As you mention, you might have hundreds of levels
why only factors? (could work as well for booleans, strings, even ints)