This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

Fix issue 355 added get frequency getter to iterable #357

Closed
13 changes: 13 additions & 0 deletions lib/src/iterable_extensions.dart
@@ -601,6 +601,19 @@ extension IterableExtension<T> on Iterable<T> {
yield slice;
}
}

/// Returns a map where the keys are the unique elements of the iterable
Contributor

Use a single-line first paragraph, and format the text as sentences.
Don't use "Returns" for a getter. (Don't start any DartDoc with it; there is always a better word.)

I'd say:

  /// The count of occurrences of each element.
  ///
  /// The map has an entry for each distinct element of this iterable,
  /// as determined by `==`, where the value is the number of elements
  /// in this iterable which are equal to the key.
  /// If there are elements that are equal, but not identical, it's unspecified
  /// which of the elements is used as the key.
  ///
  /// For example `['a', 'b', 'c', 'b', 'c', 'c'].frequencies`
  /// is a map with entries like `{'a': 1, 'b': 2, 'c': 3}`.

Author

corrected

/// and the values are the counts of those elements.
///
/// For example, `['a', 'b', 'b', 'c', 'c', 'c'].countFrequency()`
/// returns `{'a': 1, 'b': 2, 'c': 3}`.
Map<T, int> countFrequency() {
Contributor

I would probably make it a getter (get frequencies) instead of a function.

Author

thank you, I agree this way is better, please check my changes

var frequencyMap = <T, int>{};
for (var item in this) {
frequencyMap[item] = (frequencyMap[item] ?? 0) + 1;
Contributor

You can use update, which avoids doing two lookups per item.

   frequencyMap.update(item, _increment, ifAbsent: _one);

with top-level/static helpers:

  int _increment(int value) => value + 1;
  int _one() => 1;
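
Put together, the suggestion might look like the following minimal sketch. It is a stand-alone illustration, not the code that was merged: the extension name FrequencySketch is hypothetical, and the getter name frequencies is taken from the naming suggestion earlier in this review.

```dart
// Illustrative sketch only. FrequencySketch is a hypothetical name;
// _increment and _one are the helpers suggested in the review comment.
int _increment(int value) => value + 1;
int _one() => 1;

extension FrequencySketch<T> on Iterable<T> {
  /// The count of occurrences of each element.
  Map<T, int> get frequencies {
    final result = <T, int>{};
    for (final item in this) {
      // A single call per item: update either applies _increment to the
      // existing count or stores the result of _one for a new key.
      result.update(item, _increment, ifAbsent: _one);
    }
    return result;
  }
}

void main() {
  final counts = ['a', 'b', 'c', 'b', 'c', 'c'].frequencies;
  print(counts); // {a: 1, b: 2, c: 3}
}
```

Using top-level helpers instead of inline closures means no new function object has to be created for each element, which is presumably why the comment suggests them.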

Author

thank you, I agree this way is better, please check my changes

How expensive is a lookup? Shouldn't it be O(1)? So we are basically avoiding O(2)?

Contributor

Yes, we are avoiding O(2) and lowering the constant factor, I hope. That assumes the map is actually implemented efficiently, but it's at least possible.

Also, for degenerate cases, hash table lookup is linear. And quick-sort is quadratic. We always hope to not hit those cases, but when we do, it's better to have a low constant.

For library code that other people will use, especially low-level code like this, efficiency is important.
If all your helper libraries are 10% slower than necessary, then your entire application is at least 10% slower than necessary. (If helper libraries use helper libraries, it may compound.)
And there is nothing you can do about it.

So when you provide code for others to depend on, to build their algorithms and applications on, you should never say "a constant factor doesn't matter", because to end users, the difference between 15ms and 17ms may be a missed animation frame, and that matters.

(So, that's my pocket philosophy about writing code for others: The only person who can say whether performance is "good enough" is the application writer. A helper library doesn't know that. It may choose how it optimizes, for speed or size or a tradeoff, but it shouldn't say "it's good enough" and leave easy optimizations on the table. Every level below the application has to do their best and not be part of the problem.)

The one thing we should actually do is to measure. If operator[] and operator[]= are so heavily optimized that they beat update even if they do the same lookup twice, then we need to optimize update too. I want to choose the operation that has the best possible performance, rather than using workarounds that are faster right now, and never come back and check if they stay that way. If update isn't faster today, we should file an issue, so that it is tomorrow.

(I can see that I flip between pragmatic and ideological reasoning: be fast, because it matters, but don't be fast by doing it the wrong way, even if it is faster today. Tradeoffs. It's tradeoffs all the way down. So reasonable people can disagree on where to trade what.)
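
A rough way to do the measuring mentioned above is sketched below. The function names countWithIndex, countWithUpdate and time are hypothetical, and a Stopwatch loop like this is only a first approximation; a proper benchmark harness with realistic data would be needed before drawing conclusions or filing an issue.

```dart
// Rough timing sketch, not a rigorous benchmark. All names here are
// illustrative and not part of package:collection.
int _increment(int value) => value + 1;
int _one() => 1;

/// Counts using operator[] followed by operator[]= (two map operations).
Map<T, int> countWithIndex<T>(Iterable<T> items) {
  final result = <T, int>{};
  for (final item in items) {
    result[item] = (result[item] ?? 0) + 1;
  }
  return result;
}

/// Counts using a single update call per item.
Map<T, int> countWithUpdate<T>(Iterable<T> items) {
  final result = <T, int>{};
  for (final item in items) {
    result.update(item, _increment, ifAbsent: _one);
  }
  return result;
}

/// Runs [body] once as a warm-up, then times [runs] further executions.
Duration time(void Function() body, {int runs = 20}) {
  body();
  final watch = Stopwatch()..start();
  for (var i = 0; i < runs; i++) {
    body();
  }
  watch.stop();
  return watch.elapsed;
}

void main() {
  // 200,000 items spread over 1,000 distinct keys.
  final data = List<int>.generate(200000, (i) => i % 1000);
  final indexTime = time(() => countWithIndex(data));
  final updateTime = time(() => countWithUpdate(data));
  print('operator[] / operator[]=: ${indexTime.inMicroseconds} us');
  print('update:                   ${updateTime.inMicroseconds} us');
}
```

The results depend on the platform (VM, AOT, web), the key type and the distribution of the data, so any comparison should be repeated on the targets that matter.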

@mateusfccp, Oct 1, 2024

I'm not disagreeing, just wanted to understand the reasoning.

Thanks for the explanation.

(I actually didn't even know .update could be potentially faster, I never considered that it would avoid a lookup.)

}
return frequencyMap;
}
}

/// Extensions that apply to iterables with a nullable element type.
79 changes: 79 additions & 0 deletions test/extensions_test.dart
@@ -2,6 +2,7 @@
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.

import 'dart:collection';
import 'dart:math' show Random, pow;

import 'package:collection/collection.dart';
@@ -1352,6 +1353,84 @@ void main() {
expect(l3.toList(), [4, 5]);
});
});
group('FrequencyCounter tests', () {
test('should return correct frequency map for List of integers', () {
var list = [1, 2, 2, 3, 3, 3];
var frequencyMap = list.countFrequency();
expect(frequencyMap, {1: 1, 2: 2, 3: 3});
});

test('should return correct frequency map for List of strings', () {
var list = ['a', 'b', 'b', 'c', 'c', 'c'];
var frequencyMap = list.countFrequency();
expect(frequencyMap, {'a': 1, 'b': 2, 'c': 3});
});

test('should handle empty List', () {
var list = [];
var frequencyMap = list.countFrequency();
expect(frequencyMap, {});
Contributor

@lrhn, Oct 1, 2024

Consider isEmpty as the matcher. I think it works for maps too, and it should give a more precise error message, like "is not empty".

});

test('should handle single element List', () {
var list = [42];
var frequencyMap = list.countFrequency();
expect(frequencyMap, {42: 1});
});

test('should return correct frequency map for Set of integers', () {
// ignore: equal_elements_in_set
var set = {1, 2, 2, 3, 3, 3};
var frequencyMap = set.countFrequency();
expect(frequencyMap, {1: 1, 2: 1, 3: 1});
});

test('should return correct frequency map for Set of strings', () {
// ignore: equal_elements_in_set
var set = {'a', 'b', 'b', 'c', 'c', 'c'};
var frequencyMap = set.countFrequency();
expect(frequencyMap, {'a': 1, 'b': 1, 'c': 1});
});

test('should handle empty Set', () {
Contributor

I would drop the tests for sets and queues unless there is a reason to believe that the implementation treats them differently (which it doesn't).

We're not testing iteration, that is assumed to work, so all that matters is the sequence of values we get by iterating, and a List is fine for that.

(If we start trying to detect lists or sets, to do more efficient iteration or maybe assuming that sets cannot have duplicates so we don't need to count - which isn't true because sets can have non-standard equality - then there is reason to test with, and without, those types specifically.)
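
To illustrate the point about non-standard equality, here is a small sketch in which a set really does contain elements that are equal to each other. The class Name and the function countByEquality are hypothetical and exist only for this example.

```dart
// Illustrative sketch: Name and countByEquality are hypothetical names.
class Name {
  final String value;
  Name(this.value);

  @override
  bool operator ==(Object other) => other is Name && other.value == value;

  @override
  int get hashCode => value.hashCode;
}

/// Counts elements with a default map, which compares keys using ==.
Map<T, int> countByEquality<T>(Iterable<T> items) {
  final result = <T, int>{};
  for (final item in items) {
    result.update(item, (count) => count + 1, ifAbsent: () => 1);
  }
  return result;
}

void main() {
  // An identity set compares elements with identical(), not ==, so it can
  // hold two distinct objects that are equal to each other.
  final names = Set<Name>.identity()
    ..add(Name('a'))
    ..add(Name('a'));
  print(names.length); // 2

  // The counting map uses ==, so both elements land under a single key.
  final counts = countByEquality(names);
  print(counts.length); // 1
  print(counts.values.single); // 2
}
```

An optimization that assumed "a set has no duplicates, so every count is 1" would give the wrong answer for this input.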

var set = <int>{};
var frequencyMap = set.countFrequency();
expect(frequencyMap, {});
});

test('should handle single element Set', () {
var set = {42};
var frequencyMap = set.countFrequency();
expect(frequencyMap, {42: 1});
});

test('should return correct frequency map for Queue of integers', () {
Contributor

It's probably overkill to go through all kinds of iterables.
I'd rather have more diversity in the elements.

For example having elements that are equal, but not identical, or having elements that are records.
It's the building of the map that is tricky, iterating the iterable will probably work.
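
The kind of element diversity meant here could be exercised roughly as follows. This is a sketch with plain checks instead of package:test so that it stands alone; countFrequencies is a hypothetical stand-in for the getter under review, and the record syntax assumes Dart 3.

```dart
// Illustrative sketch: countFrequencies is a hypothetical stand-in for the
// getter under review. Requires Dart 3 for records.
Map<T, int> countFrequencies<T>(Iterable<T> items) {
  final result = <T, int>{};
  for (final item in items) {
    result.update(item, (count) => count + 1, ifAbsent: () => 1);
  }
  return result;
}

void main() {
  // Records have structural equality: records with equal fields are equal.
  final pairCounts = countFrequencies([(1, 'a'), (1, 'a'), (2, 'b')]);
  print(pairCounts[(1, 'a')] == 2); // true
  print(pairCounts[(2, 'b')] == 1); // true

  // Equal but not necessarily identical: 1 == 1.0 is true in Dart, so the
  // int and the double share one entry. Which of them becomes the key is
  // deliberately not checked here.
  final numCounts = countFrequencies(<num>[1, 1.0, 2]);
  print(numCounts.length == 2); // true
  print(numCounts[1] == 2); // true
}
```

In the real test file these checks would be expect calls inside the existing group.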

Author

Added tests for get frequencies and extended the test group.

var queue = Queue<int>();
queue.addAll([1, 2, 2, 3, 3, 3]);
var frequencyMap = queue.countFrequency();
expect(frequencyMap, {1: 1, 2: 2, 3: 3});
});

test('should return correct frequency map for Queue of strings', () {
var queue = Queue<String>();
queue.addAll(['a', 'b', 'b', 'c', 'c', 'c']);
var frequencyMap = queue.countFrequency();
expect(frequencyMap, {'a': 1, 'b': 2, 'c': 3});
});

test('should handle empty Queue', () {
var queue = Queue<int>();
var frequencyMap = queue.countFrequency();
expect(frequencyMap, {});
});

test('should handle single element Queue', () {
var queue = Queue<int>();
queue.add(42);
var frequencyMap = queue.countFrequency();
expect(frequencyMap, {42: 1});
});
});
});

group('Comparator', () {