Add `Iterable<T>.windows()` to apply the sliding window algorithm on `Iterable`s. #724

vxern · 2024-11-23T11:29:20Z

Example signature:

extension IterableWindows<T> on Iterable<T> {
  Iterable<List<T>> windows(int size) sync* {
    ...
  }
}

Given the following call:

final list = [1, 2, 3, 4, 5];
final size = 2;

print(list.windows(size));

`windows()` produces a sequence of slices, where every slice has length `size`, and every consecutive slice begins one element later:

[[1, 2], [2, 3], [3, 4], [4, 5]]

Notes:

The number of produced windows is equal to list.length - (size - 1).
- So, given a list of length 4 and window size 2, there would be 3 windows.
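To make the count in the note above concrete, here is a minimal sketch that cross-checks the formula against windows built naively with `sublist`. The helper name `windowCount` is hypothetical and only used for illustration; it is not part of the proposed API.

```dart
// Hypothetical helper (not part of the proposal): expected number of windows.
int windowCount(int length, int size) => length - (size - 1);

void main() {
  final list = [1, 2, 3, 4, 5];
  const size = 2;

  // Build the windows naively with sublist to cross-check the formula.
  final windows = [
    for (var start = 0; start + size <= list.length; start++)
      list.sublist(start, start + size),
  ];

  print(windows); // [[1, 2], [2, 3], [3, 4], [4, 5]]
  print(windows.length == windowCount(list.length, size)); // true
}
```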

Constraints:

size must be a positive, non-zero integer.
size must not exceed the length of list.

If accepted, I'd be happy to implement this myself!

vxern · 2024-11-23T11:35:30Z

A sample implementation could look something like this:

extension IterableExtension<T> on Iterable<T> {
  Iterable<List<T>> windows(int size) sync* {
    if (size <= 0) {
      throw StateError('The size must be a positive, non-zero integer.');
    }
    
    if (size > length) {
      throw StateError('The size must not exceed the number of elements.');
    }
  
    final iterator = this.iterator;
    final window = <T>[];
    while (iterator.moveNext()) {
      window.add(iterator.current);
      if (window.length != size) {
        continue;
      }
      
      yield [...window];
      
      window.removeAt(0);
    }
  }
}

lrhn · 2024-11-23T15:50:34Z

Seems fairly inefficient.
If it cannot be optimized then I don't see much reason to have it as a library function.

So, can it be optimized?

If we assume that each element must be a persistent shareable list, that shouldn't keep any other values alive than the ones on that list, then we do need to create one new list per element.

Inside the iteration, it only needs to remember the n-1 last elements. Those can be stored in a cyclic buffer, rather than using .removeAt(0), so there is no extra copying other than into the new list.
Something like:

extension<T> on Iterable<T> {
  Iterable<List<T>> window(int size) sync* {
    if (size < 1) throw RangeError.range(size, 1, null, "size");
    List<T>? buffer; // Cyclic buffer.
    var cursor = -size;
    for (var element in this) {
      if (cursor >= 0) {
        buffer![cursor] = element;
        var cursorAfter = cursor + 1;
        yield List<T>.filled(size, element)
          ..setRange(0, size - cursorAfter, buffer, cursorAfter)
          ..setRange(size - cursorAfter, size - 1, buffer);
        cursor = cursorAfter;
        if (cursor >= size) cursor = 0;
      } else {
        (buffer ??= List<T>.filled(size, element))[size + cursor] = element;
        if (++cursor == 0) yield buffer.toList(growable: false);
      }
    }
  }
}

Possible. It's still somewhat overkill to create a new list for each element if it doesn't have to survive past a synchronous computation.
If every step instead provided the same Queue, and the moveNext removed one element and added another, then it would be more efficient.
The queue is only guaranteed to be correct until the next call to moveNext.
If the user needs a list, they can call toList on the queue.
If we only expose the queue as an Iterable, they can't mess with it either.

I think that would be a better API. More error prone, if someone holds on to an element after calling moveNext, but less unnecessary overhead if you don't need it.

import "dart:collection" show ListQueue;
extension <T> on Iterable<T> {
  Iterable<Iterable<T>> windows(int size) sync* {
    if (size < 1) throw RangeError.range(size, 1, null, "size");
    var queue = ListQueue<T>(size + 1); 
    var count = 0;
    for(var e in this) {
      if (count == size) {
        queue..removeFirst()..add(e);
        yield IterableView(queue);
      } else {
        queue.add(e);
        count++;
        if (count == size) yield IterableView(queue);
      }
    }
  }
}

(Where IterableView is just something that hides the Queue API. Or it could be made so that the object can be disabled when you call moveNext again.)

That's what I would want from a function like this.

vxern · 2024-11-23T16:49:21Z

I only gave a sample implementation, so naturally it wouldn't be quite there when it comes to performance. I was actually hoping I'd get some feedback about how it could be implemented in reality. Ultimately, though, I was more curious about what the sentiment would be on adding such a function, since the actual implementation details could be worked out once it's decided that 'sure, this is useful enough, let's think about adding it'.

Regarding your implementation, where does IterableView come from?

lrhn · 2024-11-23T18:02:12Z

The IterableView is just some hypothetical wrapper that hides the Queue API and only exposes the Iterable API.
It could probably be DelegatingIterable from package:collection.

My point here is that it can be optimized (that's good, otherwise I'd be more worried), but also that even the optimized version that returns a new List for each element is more costly than what I'd prefer.
It's not doing allocation per element that worries me, as much as copying size elements. If that's not necessary, then it's a non-constant unnecessary overhead. If it is necessary, doing .map((i) => i.toList()) would make that list for you.
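The trade-off described here can be seen in a small sketch. It assumes a pared-down version of the shared-queue design above, with `IterableView` written out as a hypothetical minimal wrapper (the extension name `SharedWindows` is also just illustrative). Copying each window while it is current yields independent lists, whereas keeping the views themselves leaves every one of them aliasing the same queue.

```dart
import 'dart:collection' show ListQueue;

// Hypothetical stand-in for the `IterableView` mentioned above: it exposes
// only the Iterable API of the wrapped queue.
class IterableView<T> extends Iterable<T> {
  final Iterable<T> _source;
  IterableView(this._source);

  @override
  Iterator<T> get iterator => _source.iterator;
}

extension SharedWindows<T> on Iterable<T> {
  // Every yielded window is a view of the same queue, so its contents are
  // only valid until the next call to moveNext.
  Iterable<Iterable<T>> windows(int size) sync* {
    if (size < 1) throw RangeError.range(size, 1, null, 'size');
    final queue = ListQueue<T>(size + 1);
    for (final element in this) {
      if (queue.length == size) queue.removeFirst();
      queue.add(element);
      if (queue.length == size) yield IterableView(queue);
    }
  }
}

void main() {
  final numbers = [1, 2, 3, 4];

  // Copying each window while it is current gives independent snapshots.
  final copied = numbers.windows(2).map((w) => w.toList()).toList();
  print(copied); // [[1, 2], [2, 3], [3, 4]]

  // Keeping the views themselves: all of them alias the one queue,
  // which by now holds only the final window.
  final kept = numbers.windows(2).toList();
  print(kept.map((w) => w.toList()).toList()); // [[3, 4], [3, 4], [3, 4]]
}
```

The second result is the "more error prone" case: nothing fails, the retained windows just silently show the latest contents.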

(I would probably make a wrapper that can also be disabled, something like:

class TemporaryIterable<T> extends Iterable<T> {
  Iterable<T>? _source;
  Iterable<T> get _checkSource => _source ?? (throw StateError("No longer valid"));
  TemporaryIterable(Iterable<T> this._source);
  void _dispose() { _source = null; }
  int get length => _checkSource.length;
  Iterator<T> get iterator => _TemporaryIterator<T>(this, _checkSource.iterator);
  // Every Iterable member forwards to `_checkSource.member`.
}
class _TemporaryIterator<T> implements Iterator<T> {
  final TemporaryIterable<T> _source;
  final Iterator<T> _iterator;
  _TemporaryIterator(this._source, this._iterator);
  bool moveNext() {
    _source._checkSource; // Throws if the view has been disposed.
    return _iterator.moveNext();
  }
  T get current {
    _source._checkSource; // Throws if the view has been disposed.
    return _iterator.current;
  }
}

Then I'd do: var iterable = TemporaryIterable<T>(queue); yield iterable; iterable._dispose(); to make sure you can't keep using the old view after calling moveNext.)
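As a rough sketch of how that yield-then-dispose pattern might behave end to end, the following uses a simplified version of the wrapper (only creating an iterator is guarded, rather than every member) together with the shared-queue loop. The extension name `DisposingWindows` is an assumption made for illustration. In a `sync*` body, the statement after `yield` runs when the consumer next calls moveNext, which is what invalidates the previous view.

```dart
import 'dart:collection' show ListQueue;

// Simplified stand-in for the wrapper sketched above; names are illustrative,
// not an existing API. Only the creation of an iterator is guarded here.
class TemporaryIterable<T> extends Iterable<T> {
  Iterable<T>? _source;
  TemporaryIterable(Iterable<T> source) : _source = source;

  Iterable<T> get _checkSource =>
      _source ?? (throw StateError('No longer valid'));

  void _dispose() => _source = null;

  @override
  Iterator<T> get iterator => _checkSource.iterator;
}

extension DisposingWindows<T> on Iterable<T> {
  Iterable<Iterable<T>> windows(int size) sync* {
    if (size < 1) throw RangeError.range(size, 1, null, 'size');
    final queue = ListQueue<T>(size + 1);
    for (final element in this) {
      if (queue.length == size) queue.removeFirst();
      queue.add(element);
      if (queue.length < size) continue;
      final view = TemporaryIterable<T>(queue);
      yield view;
      // Runs when the consumer asks for the next window.
      view._dispose();
    }
  }
}

void main() {
  final kept = <Iterable<int>>[];
  for (final window in [1, 2, 3].windows(2)) {
    print(window.toList()); // Valid while the window is current.
    kept.add(window);
  }
  try {
    kept.first.toList();
  } on StateError catch (e) {
    print(e.message); // No longer valid
  }
}
```

Compared with a plain view, misuse now fails loudly instead of silently showing the contents of a later window.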

vxern added the package:collection label Nov 23, 2024

vxern changed the title ~~Add an helper for getting sliding windows on collections.~~ Add an extension for getting sliding windows on collections. Nov 23, 2024

vxern changed the title ~~Add an extension for getting sliding windows on collections.~~ Add Iterable<T>.windows() to apply the sliding window algorithm on Iterables. Nov 24, 2024
