Skip to content
This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

Fix issue 355 added get frequency getter to iterable #357

Closed

Conversation

pvlKryu
Copy link

@pvlKryu pvlKryu commented Aug 2, 2024

[yes] I’ve reviewed the contributor guide and applied the relevant portions to this PR.

Description:
This Pull Request introduces a new extension getter frequencies for the Iterable class. The countFrequency method returns a map where the keys are the unique elements of the iterable and the values are the counts of those elements. This addition provides an easy and efficient way to count the frequency of elements in any iterable.

Changes:

  • Added frequencies getter extension to lib/src/iterable_extensions.dart
  • Implemented comprehensive tests for the frequencies getter in test/extensions_test.dart. The tests cover various subclasses of Iterable including List, Set, and Queue.

Relevant Issues:
This PR addresses Issue dart-lang/core#679.

Thank you!

Co-authored-by: Mateus Felipe C. C. Pinto <[email protected]>
@pvlKryu
Copy link
Author

pvlKryu commented Sep 3, 2024

@mateusfccp Hey Mateus, can you approve?

@pvlKryu
Copy link
Author

pvlKryu commented Sep 3, 2024

@kevmoo can you please review this one?

@mateusfccp
Copy link

@mateusfccp Hey Mateus, can you approve?

I can't, I'm not a member of the team.

///
/// For example, `['a', 'b', 'b', 'c', 'c', 'c'].countFrequency()`
/// returns `{'a': 1, 'b': 2, 'c': 3}`.
Map<T, int> countFrequency() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably make it get frequencies instead of a function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I agree this way is better, please check my changes

Map<T, int> countFrequency() {
var frequencyMap = <T, int>{};
for (var item in this) {
frequencyMap[item] = (frequencyMap[item] ?? 0) + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use update, which avoids doing two lookups per item.

   frequencyMap.update(item, _increment, ifAbsent: _one);

with top-level/static helpers:

  int _increment(int value) => value + 1;
  int _one() => 1;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I agree this way is better, please check my changes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive is a lookup? Shouldn't it O(1)? So we are basically avoiding O(2)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are avoiding O(2) and lowering the constant factor. I hope, that assumes the map is actually implemented efficiently, but it's at least possible.

Also, for degenerate cases, hash table lookup is linear. And quick-sort is quadratic. We always hope to not hit those cases, but when we do, it's better to have a low constant.

For library code that other people will use, especially low-level code like this, efficiency is important.
If all your helper libraries are 10% slower than necessary, then your entire application is at least 10% slower than necessary. (If helper libraries use helper libraries, it may compound.)
And there is nothing you can do about it.

So when you provide code for others to depend on, to build their algorithms and applications on, you should never say "a constant factor doesn't matter", because to end users, the difference between 15ms and 17ms may be a missed animation frame, and that matters.

(So, that's my pocket philosophy about writing code for others: The only person who can say whether performance is "good enough" is the application writer. A helper library doesn't know that. It may choose how it optimizes, for speed or size or a tradeoff, but it shouldn't say "it's good enough" and leave easy optimizations on the table. Every level below the application has to do their best and not be part of the problem.)

The one thing we should actually do is to measure. If operator[] and operator[]= are so heavily optimizied that they beat update even if they do the same lookup twice, then we need to optimize update too. I want to choose the operation that has the best possible performance, rather than using workarounds that are faster right now, and never come back and check if they stay that way. If update isn't faster today, we should file an issue, so that it is tomorrow.

(I can see that I flip between pragmatic and ideological reasoning - be fast, because it matters, but don't be fast by doing it the wrong way, even if it is faster today. Tradeoffs. It's tradeoffs all the way down. So reasonable people can disagree on where to trade what. )

Copy link

@mateusfccp mateusfccp Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not disagreeing, just wanted to understand the reasoning.

Thanks for the explanation.

(I actually didn't even know .update could be potentially faster, I never considered that it would avoid a lookup.)

@pvlKryu pvlKryu changed the title Fix issue 355 count frequency method to iterable Fix issue 355 added get frequency getter to iterable Sep 29, 2024
@@ -601,6 +601,22 @@ extension IterableExtension<T> on Iterable<T> {
yield slice;
}
}

/// Returns a map where the keys are the unique elements of the iterable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single line first paragraph. Format as sentences.
Don't use Returns for a getter. (Don't start any DartDoc with it, there is always a better word).

I'd say:

  /// The count of occurrences of each element.
  ///
  /// The map has an entry for each distinct element of this iterable,
  /// as determined by `==`, where the value is the number of eelements
  /// in this iterable which are equal to the key. 
  /// If there are elements that are equal, but not identical, its unspecified
  /// which of the elements is used as the key.
  ///
  /// For example `['a', 'b', 'c', 'b', 'c', 'c'].frequencies` 
  /// is a map with entries like `{'a': , 'b': 2, 'c': 3}`.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

expect(frequencyMap, {42: 1});
});

test('should return correct frequency map for Queue of integers', () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably overkill to go throguh all kinds of iterables.
I'd rather have more diversity in the elements.

For example having elements that are equal, but not identical, or having elements that are records.
It's the building of the map that is tricky, iterating the iterable will probably work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added get frequencies tests extended group

@lrhn lrhn requested a review from natebosch September 29, 2024 21:50
@lrhn
Copy link
Contributor

lrhn commented Sep 29, 2024

@natebosch WDYT. Is this worth adding?
(it's pretty simple, so no big complexity add, and it's something that is often hard for people to write efficiently.)

@natebosch
Copy link
Contributor

@natebosch WDYT. Is this worth adding?

Yes, I think it is worth adding. It's not as general as groupFoldBy without the keyOf callback, but you can accomplish the same with a .map call. Getting frequencies by == is common enough to have the specialized implementation.

/// ```
///
/// Note: This method uses `==` to compare elements.
/// For collections # `List`, `Set`, or `Map`, deep equality is not checked.
Copy link
Contributor

@lrhn lrhn Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... where not even shallow equality is checked, two lists are equal only if they are the same instance.

If that is a use-case you want to support, maybe we should to add a parameter for the equality, to allow you to ask for const ListEquality() for the elements:

Map<T, int> frequencies([Equality<T>? equality]) {
  var frequenceyMap = equality == null ? <T, int>{} : 
     equality is IdentityEquality ? LinkedHashMap<T, int>.identity() : 
     LinkedHashMap<T, int>(equality.equals, equality.hash);
  // ...
}

or have a parallel ... frequenciesBy(Equality<T> equality) ....

Then you can do nested list equality as .frequenciesBy(const ListEquality(ListEquality())).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is "shallow" as it relates to this, not as it relates to the elements of this that are also collection typed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. In that case the wording is at least ambiguous.
Consider:

/// For elements that are collections, like `List`, `Set` or `Map`, the `==` comparison only
/// checks the identity of the collection, not its contents.

/// but not identical (`identical`),
/// it is unspecified which of the elements is used as the key in the map.
/// For example, if there are multiple lists with the same content,
/// the map will only keep one of them as a key.
Copy link
Contributor

@lrhn lrhn Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a good example, though, since list equality is identity. Same below ...


group('get frequencies tests extended', () {
test('list of equal but not identical strings', () {
var list = ['apple', String.fromCharCodes('apple'.codeUnits)];
Copy link
Contributor

@lrhn lrhn Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String is not a good choice for this test because those will be identical when compiled for web.

I'd drop this one, and just use the MyObject test below.
(Or keep this, but document that identity may differ between platforms.)

expect(frequencyMap, {const MyObject(1): 2, const MyObject(2): 1});
});

test('list with equal numbers but different types', () {
Copy link
Contributor

@lrhn lrhn Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test. There is no way around web considering them all identical here, but different classes being equal is an important test for native numbers.

(I was about to suggest testing with double.nan, but ... don't. You can't look it up in the map afterwards, and the default implementation of Map.entries actually looks up the value, so you can't assume to be able to even inspect the returned map.)

});

test('list with elements that are objects', () {
var list = [const MyObject(1), const MyObject(1), const MyObject(2)];
Copy link
Contributor

@lrhn lrhn Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use const here, because that makes the objects actually identical, so it doesn't check equality being different from identitiy.
Or have one const and one not.
Or add a second field to MyObject, where only one, say the first one is used for equality: [const MyObject(1, "a"), const MyObject(1, "b") ...
would be equal but not identical.

test('should handle empty List', () {
var list = [];
var frequencyMap = list.frequencies;
expect(frequencyMap, {});
Copy link
Contributor

@lrhn lrhn Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider isEmpty as matcher. I think it works for maps too, and it should a more precise an error message like "is not empty".

@@ -1352,6 +1353,122 @@ void main() {
expect(l3.toList(), [4, 5]);
});
});
group('get frequencies tests', () {
Copy link
Contributor

@lrhn lrhn Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 'frequenices', it's cleaner.

The names of groups and tests are concatenated with a space between them, so they can rely on context. Things like "should be correct" is a given for tests, so it doesn't add value. Consider:

    group('frequencies', () { 
      test('of integers', () { ... });
      test('of strings', () { ... });
      test('of empty iterable', () { ... });
      test('of empty single element iterable', () { ... });
      group('(when equality is not identity)', () {
        test('of records', () { ... });
        /// ...
      });
   });

These names are only important when a test fails.
At that point, they should give enough context to make the error message meaningful.

Take "frequencies of single element iterable" with an error (if the result was empty) of

  Expected: {42: 1}
    Actual: {}
     Which: has different length and is missing map key <42>

With that headline, it's fairly clear where and what the problem is.

expect(frequencyMap, {'a': 1, 'b': 1, 'c': 1});
});

test('should handle empty Set', () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would drop the tests for sets and queues unless there is a reason to believe that the implementation treats them differently (which it doesn't).

We're not testing iteration, that is assumed to work, so all that matters is the sequence of values we get by iterating, and a List is fine for that.

(If we start trying to detect lists or sets, to do more efficient iteration or maybe assuming that sets cannot have duplicates so we don't need to count - which isn't true because sets can have non-standard equality - then there is reason to test with, and without, those types specifically.)

expect(frequencyMap, {'apple': 2});
});

test('list of records', () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(So for naming, I'd drop the "list of" here, what is important is that it's records being compared, and it being a list isn't important to the test.
Maybe just test('of records', () { ... }); with the changes above.)

Copy link
Contributor

@lrhn lrhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, but if doing shallow comparison of lists is a goal, we need to parameterize with an equality.

@pvlKryu
Copy link
Author

pvlKryu commented Oct 1, 2024

@lrhn Thank you very much for the quick feedback and for the detailed and helpful comments. I need some time to review everything you mentioned and correct it. I will let you know soon!

@mosuem
Copy link
Contributor

mosuem commented Oct 21, 2024

Closing as the dart-lang/collection repository is merged into the dart-lang/core monorepo. Please re-open this PR there!

@mosuem mosuem closed this Oct 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants