HLL++ offer() always returns true when format SPARSE #91
Comments
Confirmed that it returns true. This behavior doesn't actually result in The issue is that for efficiency purposes we do not merge the temp set In order to correctly return 'false' on each entry we would have to merge What are people's thoughts on the optimal solution here? We should at least add Matt On Thu, Jul 2, 2015 at 4:32 AM Pavel Martynov [email protected]
|
IMHO you should measure how much supporting the contract would slow things down, and if it really is quite a bit - fix I want to explain why I need it: my business logic should perform some notification when the counter becomes larger than some predefined value. Pseudocode:
If And I think a contract violation with an explanation in the docs is in general a bad idea, leading to a poor API. Before stream-lib I used Redis (http://redis.io/commands/pfadd). |
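The use case described above (notify once the distinct count passes a threshold) can be sketched as follows. This is a hypothetical illustration, not the commenter's original pseudocode: DistinctCounter, ExactCounter and ThresholdNotifier are made-up names, and the exact HashSet-backed counter only stands in for a real estimator so the example runs on its own. The point is that a trustworthy boolean from offer() lets the caller skip recomputing the cardinality for duplicates.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical minimal contract, modeled on the offer()/cardinality() calls discussed in this thread. */
interface DistinctCounter {
    boolean offer(Object o); // assumed to return true only if the internal state changed
    long cardinality();
}

/** Exact stand-in for an estimator; used only to keep the sketch self-contained. */
final class ExactCounter implements DistinctCounter {
    private final Set<Object> seen = new HashSet<>();
    public boolean offer(Object o) { return seen.add(o); }
    public long cardinality() { return seen.size(); }
}

/** Fires once, when the distinct count first exceeds the threshold. */
final class ThresholdNotifier {
    private final DistinctCounter counter;
    private final long threshold;
    private boolean notified;
    int cardinalityCalls; // exposed to show how often the count is recomputed

    ThresholdNotifier(DistinctCounter counter, long threshold) {
        this.counter = counter;
        this.threshold = threshold;
    }

    /** Returns true exactly once: on the offer that pushes the count above the threshold. */
    boolean record(Object item) {
        boolean changed = counter.offer(item);
        if (!changed || notified) {
            return false; // duplicate, or already notified: no need to query the count
        }
        cardinalityCalls++;
        if (counter.cardinality() > threshold) {
            notified = true;
            return true;
        }
        return false;
    }
}

final class ThresholdNotifierDemo {
    public static void main(String[] args) {
        ThresholdNotifier notifier = new ThresholdNotifier(new ExactCounter(), 2);
        for (String s : new String[] {"a", "a", "b", "c", "d"}) {
            System.out.println(s + " -> " + notifier.record(s));
        }
    }
}
```

If offer() always returns true, as reported for the SPARSE format, the early return never triggers for duplicates and cardinality() is queried on every call, which is the cost this pattern is meant to avoid.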
I've created a branch that provides the correct behavior. Basic Are you interested in helping with performance analysis? Matt On Fri, Jul 3, 2015 at 12:36 AM Pavel Martynov [email protected]
|
@abramsm I will try to compare perf on my dataset next week. Thanks! |
@abramsm I run fixed |
OfferHashed() now returns correct value for Sparse format
Thanks. Can you share any more details and results of your performance Matt On Mon, Jul 6, 2015, 1:50 AM Pavel Martynov [email protected]
|
I have two issues with the proposed changes: (1) I don't see the purpose of keeping around the tmpSet and the sparseSet if we introduce the transientSparseSet. (2) Presumably the tmpSet and the sparseSet were introduced because of some combination of (a) lower memory footprint and/or (b) amortized cost of insertions into those data structures is cheaper than using a hash set. Maybe we could use a specialized hash table for int primitives for the transientSparseSet and eliminate the tmpSet and the sparseSet. I'm not certain. It's likely that @tea-dragon has some objections too. |
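To make the "specialized hash table for int primitives" idea concrete, here is a minimal sketch of an open-addressing set over int values whose add() reports whether the value was new. It is an assumption-laden illustration only: IntHashSet is a made-up class, not part of stream-lib, and it reserves Integer.MIN_VALUE as an empty-slot sentinel, which a real implementation would have to handle differently.

```java
import java.util.Arrays;

/** Minimal open-addressing hash set for int primitives (illustrative only). */
final class IntHashSet {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel; this value itself cannot be stored
    private int[] slots;
    private int size;

    IntHashSet(int expected) {
        int cap = 8;
        while (cap < expected * 2) {
            cap <<= 1; // power-of-two capacity keeps the load factor at or below 0.5
        }
        slots = new int[cap];
        Arrays.fill(slots, EMPTY);
    }

    /** Returns true only if the value was not already present. */
    boolean add(int value) {
        if (value == EMPTY) {
            throw new IllegalArgumentException("sentinel value not supported");
        }
        if ((size + 1) * 2 > slots.length) {
            grow();
        }
        if (!insert(slots, value)) {
            return false;
        }
        size++;
        return true;
    }

    int size() {
        return size;
    }

    /** Linear probing; terminates because the table always has empty slots. */
    private static boolean insert(int[] table, int value) {
        int mask = table.length - 1;
        int i = mix(value) & mask;
        while (table[i] != EMPTY) {
            if (table[i] == value) {
                return false;
            }
            i = (i + 1) & mask;
        }
        table[i] = value;
        return true;
    }

    private void grow() {
        int[] bigger = new int[slots.length * 2];
        Arrays.fill(bigger, EMPTY);
        for (int v : slots) {
            if (v != EMPTY) {
                insert(bigger, v);
            }
        }
        slots = bigger;
    }

    /** Cheap bit mixer so that clustered inputs spread across the table. */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }
}

final class IntHashSetDemo {
    public static void main(String[] args) {
        IntHashSet set = new IntHashSet(16);
        System.out.println(set.add(7)); // first insertion of 7
        System.out.println(set.add(7)); // repeated insertion of 7
    }
}
```

A structure like this avoids boxing each entry, but it does not by itself settle the memory-footprint and amortized-insertion concerns raised above; those would still need measuring.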
Yeah, those are good points and I think why we didn't do this initially. Pavel - can you try your performance tests again using the current Matt On Mon, Jul 6, 2015 at 8:37 AM mspiegel [email protected] wrote:
|
While I am experimenting with performance of fixed
@Test
public void testOfferReturn() {
    HyperLogLogPlus hll = new HyperLogLogPlus(5, 25);
    int uniqOffers = 10000;
    int hllUpdates = 0;
    // Offer distinct random values and count how often offer() reports a state change.
    for (int i = 0; i < uniqOffers; ++i) {
        if (hll.offer(UUID.randomUUID().toString()))
            ++hllUpdates;
    }
    assertEquals(uniqOffers, hllUpdates);
}
Test fails with:
Test run on f4c88af |
This has to do with the precision properties of the counter. In normal HyperLogLogPlus(5,25) to HyperLogLogPlus(25,25) You will get a result like: Expected :10000 Which is about as close as you can get. Given this fact I think the proper course of action is to update the "returns true if it alters the internal state of the cardinality Matt On Wed, Jul 8, 2015 at 2:23 AM Pavel Martynov [email protected]
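The precision argument can be illustrated with a toy dense register array. This is a sketch under stated assumptions, not stream-lib's implementation: ToyDenseHll, its countUpdates helper and the mix64 hash (the SplitMix64 finalizer) are chosen here for illustration. With precision p there are only 2^p registers, and a register changes only when a larger rank arrives, so at p = 5 the number of offers that alter state is bounded by 32 * 60 = 1920 regardless of how many distinct values are offered.

```java
/** Toy dense HyperLogLog registers: offerHashed() reports whether a register changed. */
final class ToyDenseHll {
    private final int p;
    private final byte[] registers;

    ToyDenseHll(int p) {
        if (p < 4 || p > 25) {
            throw new IllegalArgumentException("p must be in [4, 25]");
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** Returns true only if the offered hash raised the value of its register. */
    boolean offerHashed(long hash) {
        int index = (int) (hash >>> (64 - p)); // top p bits select the register
        long rest = hash << p;                 // remaining bits, left-aligned
        // Rank = position of the first 1 bit in the remaining 64 - p bits (capped).
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
            return true;
        }
        return false;
    }

    /** SplitMix64 finalizer, used here as a stand-in 64-bit hash. */
    static long mix64(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Offers n distinct values and counts how many offers changed a register. */
    static int countUpdates(int p, int n) {
        ToyDenseHll hll = new ToyDenseHll(p);
        int updates = 0;
        for (long i = 0; i < n; i++) {
            if (hll.offerHashed(mix64(i))) {
                updates++;
            }
        }
        return updates;
    }
}

final class ToyDenseHllDemo {
    public static void main(String[] args) {
        System.out.println("p=5  updates: " + ToyDenseHll.countUpdates(5, 10000));
        System.out.println("p=16 updates: " + ToyDenseHll.countUpdates(16, 10000));
    }
}
```

Seen this way, a false return for a never-before-seen value is not a bug in the register logic; it only means the value did not change the estimate's internal state at that precision.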
|
@abramsm ah, I see, you are right. I have some additional questions about your HLL++ implementation; I will ask them on the mailing list. Thanks! |
Hi,
HyperLogLogPlus.offerHashed() constantly returns true in SPARSE format, even if the hash has already been counted.
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java#L309
Test: