Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Vector conversion #294

Open
pdeffebach opened this issue Aug 18, 2020 · 13 comments
Open

Add Vector conversion #294

pdeffebach opened this issue Aug 18, 2020 · 13 comments
Labels

Comments

@pdeffebach
Copy link

Currently there is an inconsistency between Array and Vector conversions for categorical arrays

julia> x = CategoricalArray([1, 1, 1, 2, 2, 3]);

julia> Array(x)
6-element Array{Int64,1}:
 1
 1
 1
 2
 2
 3

julia> Vector(x)
6-element Array{CategoricalValue{Int64,UInt32},1}:
 1
 1
 1
 2
 2
 3
@bkamins bkamins added the bug label Aug 18, 2020
@pdeffebach
Copy link
Author

As mentioned on slack, a better solution overall would be to make Vector return a CategoricalArray and have an explicit uncategorize method.

@bkamins
Copy link
Member

bkamins commented Aug 18, 2020

Vector may not return CategoricalArray, as Vector has a very concrete meaning in Base (it should call a constructor of Vector that should allocate a fresh Vector as opposed to e.g. convert functions).

@nalimilan
Copy link
Member

nalimilan commented Aug 19, 2020

Good catch. These indeed have to return an Array, but we should decide whether to return an Array{Int} or an Array {<:CategoricalValue{Int}}. This is related to whether we keep the current similar and collect overloads, which ensure Array{<:CategoricalValue} is never produced: if we drop them, we could return Array{<:CategoricalValue{Int}}, which is somewhat more logical than Array{Int}.

EDIT: though reading the Slack thread it seems that Array{Int} is what was expected (in that case at least)

@bkamins
Copy link
Member

bkamins commented Aug 19, 2020

It was expected for consistency with Array. However, I think that in these cases we could allow some flexibility and change the behaviour if it helps in other areas.

@pdeffebach
Copy link
Author

if we drop them, we could return Array{<:CategoricalValue{Int}}, which is somewhat more logical than Array{Int}.

I think most people converting to a Array are just saying "I want out! give me numbers again!". I think Array{<:CategoricalValue{Int}} is fine as long as we give people an explicit decategorize command.

@bkamins
Copy link
Member

bkamins commented Aug 20, 2020

broadcasted get now almost does it (except that it fails on missing)

@pdeffebach
Copy link
Author

Do we have a plan for get to work with missing?

I just had a frustrating experience with factors in R and thought Julia would be nicer, but this is still annoying.

@nalimilan
Copy link
Member

passmissing(get)?

@pdeffebach
Copy link
Author

Ah that does work. Fair enough!

I don't really understand the choice of get to be honest, since a categorical value isn't a collection. If we made our own function we could define it for missing. But I appreciate the need to get to 1.0 and it's not that big a deal.

@bkamins
Copy link
Member

bkamins commented Feb 12, 2021

Actually get is also my long-standing gripe #142. I would change it if @nalimilan supported this (and then passmissing would not be needed).

@nalimilan
Copy link
Member

AFAICT we agreed on using unwrap at #142. We just need to move the definition from Tables.jl to DataAPI.jl.

@bkamins
Copy link
Member

bkamins commented Feb 13, 2021

@quinnj - would you have time to make this move? Then we can update CategoricalArrays.jl

@quinnj
Copy link
Member

quinnj commented Feb 14, 2021

PR up: JuliaData/DataAPI.jl#35. So I realized we don't really need to move anything from Tables.jl; the definitions there are...not really related and not really necessary. Like, they're not useful generically. So we just define the single unwrap(x) = x definition in DataAPI.jl that CategoricalArrays can overload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants