Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterable<T>.windows() to apply the sliding window algorithm on Iterables. #724

Open
vxern opened this issue Nov 23, 2024 · 4 comments
Open

Comments

@vxern
Copy link

vxern commented Nov 23, 2024

Example signature:

extension IterableWindows<T> on Iterable<T> {
  Iterable<List<T>> windows(int size) sync* {
    ...
  }
}

Given the following call:

final list = [1, 2, 3, 4, 5];
final size = 2;

print(list.windows(size));

windows() produces a list of slices, where every slice is of size size, and where every consecutive slice begins one element later:

[[1, 2], [2, 3], [3, 4], [4, 5]]

Notes:

  • The number of produced windows is equal to list.length - (size - 1).
    • So, given a list of length 4 and window size 2, there would be 3 windows.

Constraints:

  • size must be a positive, non-zero integer.
  • size must not exceed the length of list.

If accepted, I'd be happy to implement this myself!

@vxern vxern changed the title Add an helper for getting sliding windows on collections. Add an extension for getting sliding windows on collections. Nov 23, 2024
@vxern
Copy link
Author

vxern commented Nov 23, 2024

A sample implementation could look something like this:

extension IterableExtension<T> on Iterable<T> {
  Iterable<List<T>> windows(int size) sync* {
    if (size <= 0) {
      throw StateError('The size must be a positive, non-zero integer.');
    }
    
    if (size > length) {
      throw StateError('The size must not exceed the number of elements.');
    }
  
    final iterator = this.iterator;
    final window = <T>[];
    while (iterator.moveNext()) {
      window.add(iterator.current);
      if (window.length != size) {
        continue;
      }
      
      yield [...window];
      
      window.removeAt(0);
    }
  }
}

@lrhn
Copy link
Member

lrhn commented Nov 23, 2024

Seems fairly inefficient.
If it cannot be optimized then I don't see much reason to have it as a library function.

So, can it be optimized?

If we assume that each element must be a persistent shareable list, that shouldn't keep any other values alive than the ones on that list, then we do need to create one new list per element.

Inside the iteration, it only needs to remember the n-1 last elements. Those can be stored in a cyclic buffer, rather than using .remove(0), so there is no extra copying other than into the new list.
Soemething like:

extension<T> on Iterable<T> {
  Iterable<List<T>> window(int size) sync* {
    if (size < 1) throw RangeError.range(size, 1, null, "size");
    List<T>? buffer; // Cyclic buffer.
    var cursor = -size;
    for (var element in this) {
      if (cursor >= 0) {
        buffer![cursor] = element;
        var cursorAfter = cursor + 1;
        yield List<T>.filled(size, element)
          ..setRange(0, size - cursorAfter, buffer, cursorAfter)
          ..setRange(size - cursorAfter, size - 1, buffer);
        cursor = cursorAfter;
        if (cursor >= size) cursor = 0;
      } else {
        (buffer ??= List<T>.filled(size, element))[size + cursor] = element;
        if (++cursor == 0) yield buffer.toList(growable: false);
      }
    }
  }
}

Possible. It's still somewhat overkill to create a new list for each element if it doesn't have to survive past a synchronous computation.
If every step instead provided the same Queue, and the moveNext removed one element and added another, then it would be more efficient.
The queue is picky guaranteed to be correct until the next call to moveNext
Og the user needs a list, they can call toList on the queue.
If we only expose the queue as an Iterable, they can't mess with it either.

I think that would be a better API. More error prone, if someone holds on to an element after calling moveNext, but less unnecessary overhead of you don't need it.

import "dart:collection" show ListQueue;
extension <T> on Iterable<T> {
  Iterable<Iterable<T>> windows(int size) sync* {
    if (size < 1) throw RangeError.value(size, 1, null, "size");
    var queue = ListQueue<T>(size + 1); 
    var count = 0;
    for(var e in this) {
      if (count == size) {
        queue..removeFirst()..add(e);
        yield IterableView(queue);
      } else {
        queue.add(e);
        count++;
        if (count == size) yield IterableView(queue);
      }
    }
  }
}

(Where IterableView is just something that hides the Queue API. Or it could be made so that the object can be disabled when you call moveNext again.)

That's what I would want from a function like this.

@vxern
Copy link
Author

vxern commented Nov 23, 2024

I only gave a sample implementation, so naturally it wouldn't be quite there when it comes to performance. I was actually hoping I'd get some feedback about how it could be implemented in reality. Though, ultimately, I was rather more curious about what the sentiment would be on adding such a function, since the actual implementation details could be worked out once decided that 'sure, this is useful enough, let's think to add it'.

Regarding your implementation, where does IterableView come from?

@lrhn
Copy link
Member

lrhn commented Nov 23, 2024

The IterableView is just some hypothetical wrapper that hides the Queue API and only exposes the Iterable API.
It could probably be DelegatingIterable from package:collection.

My point here is that it can be optimized (that's good, otherwise I'd be more worried), but also that even the optimized version that returns a new List for each element is more costly than what I'd prefer.
It's not doing allocation per element that worries me, as much as copying size elements. If that's not necessary, then it's a non-constant unnecessary overhead. If it is necessary, doing .map((i) => i.toList()) would make that list for you.

(I would probably make a wrapper that can also be disabled, something like:

class TemporaryIterable<T> extends Iterable<T> {
  Iterable<T>? _source;
  Iterable<T> get _checkSource => _source ?? (throw StateError("No longer valid"));
  TemporaryIterable(Iterable<T> this._source);
  void _dispose() { _source = null; }
  int get length => _checkSource.length;
  Iterator<T> get iterator => _TemporaryIterator<T>(this, _checkSource.iterator);
  // every iteratable member forwards to `_checkSource.member`.
}
class _TemporaryIterator<T> implements Iterator<T> {
  final _TemporaryIterable<T> _source;
  final Iterator<T> _iterator;
  bool moveNext() {
    _source._checkSource;
    return _iterator.moveNext();
  }
  T get current {
    _source._checkSource;
    return _iterator.current;
  }
}

Then I'd do: var iterable = TemporaryIterable<T>(queue); yield iterable; iterable._dispose(); to make sure you can't keep using the old view after calling moveNext.

@vxern vxern changed the title Add an extension for getting sliding windows on collections. Add Iterable<T>.windows() to apply the sliding window algorithm on Iterables. Nov 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants