proportionmap accepts iterators #855

Open: wants to merge 3 commits into `master`
10 changes: 9 additions & 1 deletion src/counts.jl
@@ -450,5 +450,13 @@ Return a dictionary mapping each unique value in `x` to its proportion in `x`.
If a vector of weights `wv` is provided, the proportion of weights is computed rather
than the proportion of raw counts.
"""
proportionmap(x::AbstractArray) = _normalize_countmap(countmap(x), length(x))
function proportionmap(x)
    countm = Dict{eltype(x), Int}()
    n = 0
    for y in x
        countm[y] = get(countm, y, 0) + 1
        n += 1
    end
Comment on lines +454 to +459
Member:

This reinvents `countmap`. It would be better to make `countmap` accept iterators instead, so that both functions benefit.

Author:

`countmap` already accepts iterators; I wrote the loop by hand so I could count `n` while iterating.

Member:

OK. The problem is that `countmap` uses different algorithms under the hood for performance. By filling a `Dict` by hand here, you lose the benefit of its fast radix-sort and counting-sort algorithms.
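To make the performance point concrete, here is a minimal sketch of the sort-then-count idea: once values are sorted, equal values are adjacent, so counting reduces to measuring runs. `sorted_counts` is a hypothetical name, and it uses a plain comparison `sort!`, not a radix sort, so it is only an illustration and not StatsBase's implementation.

```julia
# Hypothetical illustration of sort-based counting (NOT StatsBase's radix-sort code).
function sorted_counts(x::AbstractVector)
    counts = Dict{eltype(x), Int}()
    isempty(x) && return counts
    xs = sort!(collect(x))            # 1-based sorted copy; equal values become adjacent
    current, runlen = xs[1], 1
    for i in 2:length(xs)
        if isequal(xs[i], current)
            runlen += 1               # still inside the same run
        else
            counts[current] = runlen  # run ended: record its length
            current, runlen = xs[i], 1
        end
    end
    counts[current] = runlen          # record the final run
    return counts
end

sorted_counts([3, 1, 3, 2, 3])  # == Dict(1 => 1, 2 => 1, 3 => 3)
```

A hand-written `Dict` loop like the one in this PR always does one hash lookup per element, whereas a sort-based path can pay off for large inputs of sortable element types.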

I see two solutions:

  • do `n = Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape} ? length(x) : sum(values(countm))`
  • adjust all `_addcounts!` methods to return the number of elements (this should be cheap, so it's not a big deal if `addcounts!` doesn't use the returned value)
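The second option can be sketched for the generic dictionary loop as follows. `addcounts_returning_n!` and `proportionmap_sketch` are hypothetical names standing in for StatsBase's internal `_addcounts!` methods and `proportionmap`; in the real package, the radix-sort and counting-sort methods would each need the same change to return `n`.

```julia
# Hypothetical stand-in for an internal `_addcounts!` method that also
# returns how many elements it consumed.
function addcounts_returning_n!(cm::AbstractDict, x)
    n = 0
    for v in x
        cm[v] = get(cm, v, 0) + 1
        n += 1
    end
    return n  # callers that don't need it can simply ignore the return value
end

# Hypothetical proportionmap built on top of it: works for any iterator,
# including ones without a known length.
function proportionmap_sketch(x)
    cm = Dict{eltype(x), Int}()
    n = addcounts_returning_n!(cm, x)
    return Dict(k => c / n for (k, c) in cm)
end

proportionmap_sketch(skipmissing([1, missing, 2, 2]))  # == Dict(1 => 1/3, 2 => 2/3)
```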

Another commenter:

I am looking to help get this across the line. Is this what you had in mind for your first proposed solution?

function proportionmap(x)
    countm = countmap(x)
    n = Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape} ? length(x) : sum(values(countm))
    _normalize_countmap(countm, n)
end
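Under the first option, which branch runs depends on the iterator's `Base.IteratorSize` trait. A quick check in plain Julia, using a hypothetical helper `has_known_length` that mirrors the condition above:

```julia
# Hypothetical helper mirroring the condition in the first proposed solution.
has_known_length(x) = Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape}

v = [1, 2, missing, 2]
has_known_length(v)                              # true: a Vector is HasShape{1}, so length(x) is used
has_known_length(skipmissing(v))                 # false: SkipMissing is SizeUnknown, so n = sum(values(countm))
has_known_length(Iterators.filter(isodd, 1:10))  # false: filtering makes the size unknown
```

Note that `skipmissing` reports `SizeUnknown` even when the wrapped array contains no `missing` values, so both tests added in `test/counts.jl` would take the `sum(values(countm))` branch under this option.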

    _normalize_countmap(countm, n)
end
proportionmap(x::AbstractArray, wv::AbstractWeights) = _normalize_countmap(countmap(x, wv), sum(wv))
7 changes: 7 additions & 0 deletions test/counts.jl
@@ -209,3 +209,10 @@ if VERSION >= v"1.9.0-DEV"
# countmap and proportionmap only support the :dict algorithm for weighted sums.
end
end

@testset "proportionmap with iterator" begin
    a = [1, 2, 3, 4]
    b = [true, true, false, false, true, false]
    @test proportionmap(skipmissing(a)) == Dict(1 => 0.25, 2 => 0.25, 3 => 0.25, 4 => 0.25)
    @test proportionmap(skipmissing(b)) == Dict(true => 0.5, false => 0.5)
end