diff --git a/Guides/Merge.md b/Guides/Merge.md
new file mode 100644
index 00000000..b11c9398
--- /dev/null
+++ b/Guides/Merge.md
@@ -0,0 +1,193 @@
+# Merge
+
+- Between Partitions:
+  [[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/MergePartitions.swift) |
+   [Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/MergePartitionsTests.swift)]
+- Between Arbitrary Sequences:
+  [[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/Merge.swift) |
+   [Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/MergeTests.swift)]
+
+Splice two sequences that use the same sorting criterion into a sequence that
+is also sorted by that criterion.
+
+If the sequences are sorted with something besides the less-than operator (`<`),
+then a predicate can be supplied:
+
+```swift
+let merged = lazilyMerge([10, 4, 0, 0, -3], [20, 6, 1, -1, -5], keeping: .sum, sortedBy: >)
+print(Array(merged))
+// [20, 10, 6, 4, 1, 0, 0, -1, -3, -5]
+```
+
+Sorted sequences can be treated as (multi-)sets.
+Because the sequences are sorted,
+elements that are shared between the sequences can be distinguished from
+elements that are exclusive to one sequence in a reasonable time frame.
+Set operations are defined in terms of these categories of sharing,
+so an operation can be applied in-line during merging:
+
+```swift
+let first = [0, 1, 1, 2, 5, 10], second = [-1, 0, 1, 2, 2, 7, 10, 20]
+print(merge(first, second, keeping: .union))
+print(merge(first, second, keeping: .intersection))
+print(merge(first, second, keeping: .secondWithoutFirst))
+print(merge(first, second, keeping: .sum))  // Standard merge!
+/*
+[-1, 0, 1, 1, 2, 2, 5, 7, 10, 20]
+[0, 1, 2, 10]
+[-1, 2, 7, 20]
+[-1, 0, 0, 1, 1, 1, 2, 2, 2, 5, 7, 10, 10, 20]
+*/
+```
+
+## Detailed Design
+
+The merging algorithm can be applied in two domains:
+
+- Free functions taking the source sequences.
+- Functions over a `MutableCollection & BidirectionalCollection`,
+  where the two sources are adjacent partitions of the collection.
+
+Besides the optional ordering predicate,
+the partition-merging methods' other parameter is the index to the
+first element of the second partition,
+or `endIndex` if that partition is empty.
+
+Besides the optional ordering predicate,
+the free functions take the two operand sequences and the desired set operation
+(intersection, union, symmetric difference, *etc.*).
+Use `.sum` for a conventional merge.
+Half of those functions, named `merge`,
+eagerly fill an array with the result of the merger.
+The other half, named `lazilyMerge`,
+return a special sequence that lazily generates the result of the merger.
+
+```swift
+// Merging two adjacent partitions.
+
+extension MutableCollection where Self : BidirectionalCollection {
+    /// Assuming that both this collection's slice before the given index and
+    /// the slice at and past that index are both sorted according to
+    /// the given predicate,
+    /// rearrange the slices' elements until the collection as
+    /// a whole is sorted according to the predicate.
+    public mutating func mergePartitions(
+        across pivot: Index,
+        sortedBy areInIncreasingOrder: (Element, Element) throws(Fault) -> Bool
+    ) throws(Fault) where Fault : Error
+}
+
+extension MutableCollection where Self : BidirectionalCollection, Self.Element : Comparable {
+    /// Assuming that both this collection's slice before the given index and
+    /// the slice at and past that index are both sorted,
+    /// rearrange the slices' elements until the collection as
+    /// a whole is sorted.
+    public mutating func mergePartitions(across pivot: Index)
+}
+
+// Merging two sequences with free functions, applying a set operation.
+// Has lazy and eager variants.
+ +/// Given two sequences treated as (multi)sets, both sorted according to +/// a given predicate, +/// return a sequence that lazily vends the also-sorted result of applying a +/// given set operation to the sequence operands. +public func lazilyMerge( + _ first: First, _ second: Second, keeping filter: MergerSubset, + sortedBy areInIncreasingOrder: @escaping (First.Element, Second.Element) -> Bool +) -> MergedSequence +where First : Sequence, Second : Sequence, First.Element == Second.Element + +/// Given two sorted sequences treated as (multi)sets, +/// return a sequence that lazily vends the also-sorted result of applying a +/// given set operation to the sequence operands. +public func lazilyMerge( + _ first: First, _ second: Second, keeping filter: MergerSubset +) -> MergedSequence +where First : Sequence, Second : Sequence, First.Element : Comparable, + First.Element == Second.Element + +/// Returns a sorted array containing the result of the given set operation +/// applied to the given sorted sequences, +/// where sorting is determined by the given predicate. +public func merge( + _ first: First, _ second: Second, keeping filter: MergerSubset, + sortedBy areInIncreasingOrder: (First.Element, Second.Element) throws(Fault) -> Bool +) throws(Fault) -> [Second.Element] +where First : Sequence, Second : Sequence, + Fault : Error, First.Element == Second.Element + +/// Returns a sorted array containing the result of the given set operation +/// applied to the given sorted sequences. +public func merge( + _ first: First, _ second: Second, keeping filter: MergerSubset +) -> [Second.Element] +where First : Sequence, Second : Sequence, + First.Element : Comparable, First.Element == Second.Element +``` + +Target subsets are described by a new type. + +```swift +/// Description of which elements of a merger will be retained. 
+public enum MergerSubset : UInt, CaseIterable +{ + case none, firstWithoutSecond, secondWithoutFirst, symmetricDifference, + intersection, first, second, union, + sum + + //... +} +``` + +Every set-operation combination is provided, although some are degenerate. + +The merging free-functions use these support types: + +```swift +/// A sequence that reads from two sequences treated as (multi)sets, +/// where both sequences' elements are sorted according to some predicate, +/// and emits a sorted merger, +/// excluding any elements barred by a set operation. +public struct MergedSequence + : Sequence, LazySequenceProtocol + where First : Sequence, Second : Sequence, Fault : Error, + First.Element == Second.Element +{ + //... +} + +/// An iterator that reads from two virtual sequences treated as (multi)sets, +/// where both sequences' elements are sorted according to some predicate, +/// and emits a sorted merger, +/// excluding any elements barred by a set operation. +public struct MergingIterator + : IteratorProtocol + where First : IteratorProtocol, Second : IteratorProtocol, Fault : Error, + First.Element == Second.Element +{ + //... +} +``` + +The partition merger operates **O(** 1 **)** in space; +for time it works at _???_ for random-access collections and +_???_ for bidirectional collections. + +The eager merging free functions operate at **O(** _n_ `+` _m_ **)** in +space and time, +where *n* and *m* are the lengths of the source sequences. +The lazy merging free functions operate at **O(** 1 **)** in space and time. +Actually generating the entire merged sequence will take +**O(** _n_ `+` _m_ **)** over distributed time. + +### Naming + +Many merging functions use the word "merge" in their name. + +**[C++]:** Provides the `merge` and `inplace_merge` functions. +Set operations are provided by +the `set_union`, `set_intersection`, `set_difference`, and +`set_symmetric_difference` functions. 
diff --git a/Guides/README.md b/Guides/README.md
index d4894882..fbb8b35d 100644
--- a/Guides/README.md
+++ b/Guides/README.md
@@ -12,6 +12,7 @@ These guides describe the design and intention behind the APIs included in the `
 
 #### Mutating algorithms
 
+- [`mergePartitions(across:)`, `mergePartitions(across:sortedBy:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Merge.md): In-place merger of sorted partitions.
 - [`rotate(toStartAt:)`, `rotate(subrange:toStartAt:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Rotate.md): In-place rotation of elements.
 - [`stablePartition(by:)`, `stablePartition(subrange:by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Partition.md): A partition that preserves the relative order of the resulting prefix and suffix.
 
@@ -20,6 +21,7 @@ These guides describe the design and intention behind the APIs included in the `
 - [`chain(_:_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chain.md): Concatenates two collections with the same element type.
 - [`cycled()`, `cycled(times:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Cycle.md): Repeats the elements of a collection forever or a set number of times.
 - [`joined(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Joined.md): Concatenate sequences of sequences, using an element or sequence as a separator, or using a closure to generate each separator.
+- [`lazilyMerge(_:_:keeping:sortedBy:)`, `lazilyMerge(_:_:keeping:)`, `merge(_:_:keeping:sortedBy:)`, `merge(_:_:keeping:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Merge.md): Merge two sorted sequences together.
 - [`product(_:_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Product.md): Iterates over all the pairs of two collections; equivalent to nested `for`-`in` loops.
 
 #### Subsetting operations
diff --git a/Sources/Algorithms/Documentation.docc/Algorithms.md b/Sources/Algorithms/Documentation.docc/Algorithms.md
index 3cfb2693..95a2aa70 100644
--- a/Sources/Algorithms/Documentation.docc/Algorithms.md
+++ b/Sources/Algorithms/Documentation.docc/Algorithms.md
@@ -40,3 +40,4 @@ Explore more chunking methods and the remainder of the Algorithms package, group
 -
 -
 -
+-
diff --git a/Sources/Algorithms/Documentation.docc/Merging.md b/Sources/Algorithms/Documentation.docc/Merging.md
new file mode 100644
index 00000000..ad72b7a0
--- /dev/null
+++ b/Sources/Algorithms/Documentation.docc/Merging.md
@@ -0,0 +1,22 @@
+# Merging
+
+Merge two sorted sequences as a new sorted sequence.
+Take two sorted sequences to be treated as sets,
+then generate the result of applying a set operation.
+
+## Topics
+
+### Merging Sorted Sequences
+
+- ``lazilyMerge(_:_:keeping:sortedBy:)``
+- ``lazilyMerge(_:_:keeping:)``
+- ``merge(_:_:keeping:sortedBy:)``
+- ``merge(_:_:keeping:)``
+- ``Swift/MutableCollection/mergePartitions(across:sortedBy:)``
+- ``Swift/MutableCollection/mergePartitions(across:)``
+
+### Supporting Types
+
+- ``MergerSubset``
+- ``MergedSequence``
+- ``MergingIterator``
diff --git a/Sources/Algorithms/Merge.swift b/Sources/Algorithms/Merge.swift
new file mode 100644
index 00000000..f32a897c
--- /dev/null
+++ b/Sources/Algorithms/Merge.swift
@@ -0,0 +1,462 @@
+//===----------------------------------------------------------------------===//
+//
+// This source file is part of the Swift Algorithms open source project
+//
+// Copyright (c) 2024 Apple Inc. and the Swift project authors
+// Licensed under Apache License v2.0 with Runtime Library Exception
+//
+// See https://swift.org/LICENSE.txt for license information
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+// MARK: MergerSubset
+//-------------------------------------------------------------------------===//
+
+/// Description of which elements of a merger will be retained.
+public enum MergerSubset: UInt, CaseIterable {
+  /// Keep no elements.
+  case none
+  /// Keep the elements of the first source that are not also in the second.
+  case firstWithoutSecond
+  /// Keep the elements of the second source that are not also in the first.
+  case secondWithoutFirst
+  /// Keep the elements of both sources that are not present in the other.
+  case symmetricDifference
+  /// Keep the elements that are present in both sources.
+  case intersection
+  /// Keep only the elements from the first source.
+  case first
+  /// Keep only the elements from the second source.
+  case second
+  /// Keep all of the elements from both sources, consolidating shared ones.
+  case union
+  /// Keep all elements from both sources, including duplicates.
+  case sum = 0b1111  // `union` with an extra bit to distinguish.
+}
+
+extension MergerSubset {
+  /// Whether the elements exclusive to the first source are emitted.
+  @inlinable
+  public var emitsExclusivesToFirst: Bool { rawValue & 0b001 != 0 }
+  /// Whether the elements exclusive to the second source are emitted.
+  @inlinable
+  public var emitsExclusivesToSecond: Bool { rawValue & 0b010 != 0 }
+  /// Whether the elements shared by both sources are emitted.
+  @inlinable
+  public var emitsSharedElements: Bool { rawValue & 0b100 != 0 }
+}
+
+extension MergerSubset {
+  /// Create a filter specifying a full merge (duplicating the shared elements).
+  @inlinable
+  public init() { self = .sum }
+  /// Create a filter specifying which categories of elements are included in
+  /// the merger, with shared elements consolidated.
+  public init(keepExclusivesToFirst: Bool, keepExclusivesToSecond: Bool,
+              keepSharedElements: Bool) {
+    self = switch (keepSharedElements, keepExclusivesToSecond,
+                   keepExclusivesToFirst) {
+    case (false, false, false): .none
+    case (false, false,  true): .firstWithoutSecond
+    case (false,  true, false): .secondWithoutFirst
+    case (false,  true,  true): .symmetricDifference
+    case ( true, false, false): .intersection
+    case ( true, false,  true): .first
+    case ( true,  true, false): .second
+    case ( true,  true,  true): .union
+    }
+  }
+}
+
+extension MergerSubset {
+  /// Return the worst-case bounds with the given source lengths.
+  ///
+  /// These conditions, which are not necessarily mutually exclusive,
+  /// can affect the result:
+  ///
+  /// - One or both of the sources is empty.
+  /// - The sources are identical.
+  /// - The sources have no elements in common.
+  /// - The shorter source is a subset of the longer one.
+  /// - The sources have just partial overlap.
+  ///
+  /// Both inputs must be nonnegative.
+  fileprivate
+  func expectedCountRange(given firstLength: Int, and secondLength: Int)
+  -> ClosedRange<Int> {
+    /// Generate a range for a single value without repeating its expression.
+    func singleValueRange(_ v: Int) -> ClosedRange<Int> { return v...v }
+
+    return switch self {
+    case .none:
+      singleValueRange(0)
+    case .firstWithoutSecond:
+      max(firstLength - secondLength, 0)...firstLength
+    case .secondWithoutFirst:
+      max(secondLength - firstLength, 0)...secondLength
+    case .symmetricDifference:
+      abs(firstLength - secondLength)...(firstLength + secondLength)
+    case .intersection:
+      0...min(firstLength, secondLength)
+    case .first:
+      singleValueRange(firstLength)
+    case .second:
+      singleValueRange(secondLength)
+    case .union:
+      max(firstLength, secondLength)...(firstLength + secondLength)
+    case .sum:
+      singleValueRange(firstLength + secondLength)
+    }
+  }
+}
+
+//===----------------------------------------------------------------------===//
+// MARK: - Merging functions
+//-------------------------------------------------------------------------===//
+
+/// Given two sequences treated as (multi)sets, both sorted according to
+/// a given predicate,
+/// return a sequence that lazily vends the also-sorted result of applying a
+/// given set operation to the sequence operands.
+///
+/// For simply merging the sequences, use `.sum` as the operation.
+///
+/// - Precondition: Both `first` and `second` must be sorted according to
+///   `areInIncreasingOrder`.
+///   Said predicate must model a strict weak ordering over its arguments.
+///
+/// - Parameters:
+///   - first: The first sequence to merge.
+///   - second: The second sequence to merge.
+///   - filter: The subset of the merged sequence to keep.
+///   - areInIncreasingOrder: The function expressing the sorting criterion.
+/// - Returns: A lazy sequence for the resulting merge.
+///
+/// - Complexity: O(1).
+public func lazilyMerge( + _ first: First, + _ second: Second, + keeping filter: MergerSubset, + sortedBy areInIncreasingOrder: @escaping (First.Element, Second.Element) + -> Bool +) -> MergedSequence +where First.Element == Second.Element { + return .init(first, second, keeping: filter, sortedBy: areInIncreasingOrder) +} + +/// Given two sorted sequences treated as (multi)sets, +/// return a sequence that lazily vends the also-sorted result of applying a +/// given set operation to the sequence operands. +/// +/// For simply merging the sequences, use `.sum` as the operation. +/// +/// - Precondition: Both `first` and `second` must be sorted. +/// +/// - Parameters: +/// - first: The first sequence to merge. +/// - second: The second sequence to merge. +/// - filter: The subset of the merged sequence to keep. +/// - Returns: A lazy sequence for the resulting merge. +/// +/// - Complexity: O(1). +@inlinable +public func lazilyMerge( + _ first: First, + _ second: Second, + keeping filter: MergerSubset +) -> MergedSequence +where First.Element == Second.Element, Second.Element: Comparable { + return lazilyMerge(first, second, keeping: filter, sortedBy: <) +} + +/// Given two sequences treated as (multi)sets, both sorted according to +/// a given predicate, +/// eagerly apply a given set operation to the sequences then copy the +/// also-sorted result into a collection of a given type. +/// +/// For simply merging the sequences, use `.sum` as the operation. +/// +/// - Precondition: Both `first` and `second` must be sorted according to +/// `areInIncreasingOrder`. +/// Said predicate must model a strict weak ordering over its arguments. +/// Both `first` and `second` must be finite. +/// +/// - Parameters: +/// - first: The first sequence to merge. +/// - second: The second sequence to merge. +/// - type: A marker specifying the type of collection for +/// storing the result. +/// - filter: The subset of the merged sequence to keep. 
+/// - areInIncreasingOrder: The function expressing the sorting criterion. +/// - Returns: The resulting merge stored in a collection of the given `type`. +/// +/// - Complexity:O(`n` + `m`), +/// where *n* and *m* are the lengths of `first` and `second`. +@usableFromInline +func merge( + _ first: First, + _ second: Second, + into type: Result.Type, + keeping filter: MergerSubset, + sortedBy areInIncreasingOrder: (First.Element, Second.Element) throws(Fault) + -> Bool +) throws(Fault) -> Result +where First.Element == Second.Element, Second.Element == Result.Element { + func makeResult( + compare: @escaping (First.Element, Second.Element) throws(Fault) -> Bool + ) throws(Fault) -> Result { + var result = Result() + let sequence = _MergedSequence(first, second, keeping: filter, + sortedBy: compare) + var iterator = sequence.makeIterator() + result.reserveCapacity(sequence.underestimatedCount) + while let element = try iterator.throwingNext() { + result.append(element) + } + return result + } + + return try withoutActuallyEscaping(areInIncreasingOrder, + do: makeResult(compare:)) +} + +/// Returns a sorted array containing the result of the given set operation +/// applied to the given sorted sequences, +/// where sorting is determined by the given predicate. +/// +/// For simply merging the sequences, use `.sum` as the operation. +/// +/// - Precondition: Both `first` and `second` must be sorted according to +/// `areInIncreasingOrder`. +/// Said predicate must model a strict weak ordering over its arguments. +/// Both `first` and `second` must be finite. +/// +/// - Parameters: +/// - first: The first sequence to merge. +/// - second: The second sequence to merge. +/// - filter: The subset of the merged sequence to keep. +/// - areInIncreasingOrder: The function expressing the sorting criterion. +/// - Returns: The resulting merge stored in an array. +/// +/// - Complexity:O(`n` + `m`), +/// where *n* and *m* are the lengths of `first` and `second`. 
+@inlinable +public func merge( + _ first: First, + _ second: Second, + keeping filter: MergerSubset, + sortedBy areInIncreasingOrder: (First.Element, Second.Element) throws(Fault) + -> Bool +) throws(Fault) -> [Second.Element] +where First.Element == Second.Element { + return try merge(first, second, into: Array.self, keeping: filter, + sortedBy: areInIncreasingOrder) +} + +/// Returns a sorted array containing the result of the given set operation +/// applied to the given sorted sequences. +/// +/// For simply merging the sequences, use `.sum` as the operation. +/// +/// - Precondition: Both `first` and `second` must be sorted. +/// Both `first` and `second` must be finite. +/// +/// - Parameters: +/// - first: The first sequence to merge. +/// - second: The second sequence to merge. +/// - filter: The subset of the merged sequence to keep. +/// - Returns: The resulting merge stored in an array. +/// +/// - Complexity:O(`n` + `m`), +/// where *n* and *m* are the lengths of `first` and `second`. +@inlinable +public func merge( + _ first: First, + _ second: Second, + keeping filter: MergerSubset +) -> [Second.Element] +where First.Element == Second.Element, First.Element: Comparable { + return merge(first, second, keeping: filter, sortedBy: <) +} + +//===----------------------------------------------------------------------===// +// MARK: - MergedSequence +//-------------------------------------------------------------------------===// + +/// A sequence that reads from two sequences treated as (multi)sets, +/// where both sequences' elements are sorted according to some predicate, +/// and emits a sorted merger, +/// excluding any elements barred by a set operation. +public typealias MergedSequence + = _MergedSequence + where First: Sequence, Second: Sequence, First.Element == Second.Element + +/// The implementation for `MergedSequence`. 
+/// The public face of that type does not need an otherwise +/// unused error type declared, +/// so this type is needed to provide a way to hide the (`Never`) error type. +public struct _MergedSequence< + First: Sequence, + Second: Sequence, + Fault: Error +> where First.Element == Second.Element { + /// The elements for the first operand. + let base1: First + /// The elements for the second operand. + let base2: Second + /// The set operation to apply to the operands. + let filter: MergerSubset + /// The predicate with the sorting criterion. + let areInIncreasingOrder: (Element, Element) throws(Fault) -> Bool + + /// Create a sequence that reads from the two given sequences, + /// which will vend their merger after applying the given set operation, + /// where both the base sequences and this sequence emit their + /// elements sorted according to the given predicate. + init( + _ base1: First, + _ base2: Second, + keeping filter: MergerSubset, + sortedBy areInIncreasingOrder: @escaping (Element, Element) + throws(Fault) -> Bool + ) { + self.base1 = base1 + self.base2 = base2 + self.filter = filter + self.areInIncreasingOrder = areInIncreasingOrder + } +} + +extension _MergedSequence: Sequence { + public func makeIterator( + ) -> MergingIterator { + return .init(base1.makeIterator(), base2.makeIterator(), + keeping: filter, sortedBy: areInIncreasingOrder) + } + + public var underestimatedCount: Int { + filter.expectedCountRange( + given: base1.underestimatedCount, + and: base2.underestimatedCount + ).lowerBound + } +} + +extension _MergedSequence: LazySequenceProtocol {} + +//===----------------------------------------------------------------------===// +// MARK: - MergingIterator +//-------------------------------------------------------------------------===// + +/// An iterator that reads from two virtual sequences treated as (multi)sets, +/// where both sequences' elements are sorted according to some predicate, +/// and emits a sorted merger, +/// excluding 
any elements barred by a set operation. +public struct MergingIterator< + First: IteratorProtocol, + Second: IteratorProtocol, + Fault: Error +> where First.Element == Second.Element { + /// The elements for the first operand. + var base1: First? + /// The elements for the second operand. + var base2: Second? + /// The set operation to apply to the operands. + let filter: MergerSubset + /// The predicate with the sorting criterion. + let areInIncreasingOrder: (Element, Element) throws(Fault) -> Bool + + /// The latest element read from `base1`. + fileprivate var latest1: First.Element? + /// The latest element read from `base2`. + fileprivate var latest2: Second.Element? + /// Whether to continue iterating. + fileprivate var isFinished = false + + /// Create an iterator that reads from the two given iterators, + /// which will vend their merger after applying the given set operation, + /// where both the base iterators and this iterator emit their + /// elements sorted according to the given predicate. + init( + _ base1: First, + _ base2: Second, + keeping filter: MergerSubset, + sortedBy areInIncreasingOrder: @escaping (Element, Element) + throws(Fault) -> Bool + ) { + // Don't keep operand iterators that aren't needed. + switch filter { + case .none: + break + case .first: + self.base1 = base1 + case .second: + self.base2 = base2 + default: + self.base1 = base1 + self.base2 = base2 + } + + // The other members. + self.filter = filter + self.areInIncreasingOrder = areInIncreasingOrder + } +} + +extension MergingIterator: IteratorProtocol { + /// Advance to the next element, if any. May throw. + fileprivate mutating func throwingNext() throws(Fault) -> First.Element? { + while !isFinished { + // Extract another element from a source if the previous one was purged. + latest1 = latest1 ?? base1?.next() + latest2 = latest2 ?? base2?.next() + + // Of the latest valid elements, purge the smaller (or both when they are + // equivalent). 
Return said element if the filter permits, search again + // otherwise. + switch (latest1, latest2) { + case let (latestFirst?, latestSecond?) + where try areInIncreasingOrder(latestFirst, latestSecond): + defer { latest1 = nil } + guard filter.emitsExclusivesToFirst else { continue } + + return latestFirst + case let (latestFirst?, latestSecond?) + where try areInIncreasingOrder(latestSecond, latestFirst): + defer { latest2 = nil } + guard filter.emitsExclusivesToSecond else { continue } + + return latestSecond + case (let latestFirst?, _?): + // Purge both of the equivalent elements... + defer { + latest1 = nil + + // ...except when the second source's element is only deferred. + if filter != .sum { latest2 = nil } + } + guard filter.emitsSharedElements else { continue } + + // This will not cause mixed-source emission when only the second + // source is being vended, because this case won't ever be reached. + return latestFirst + case (nil, let latestSecond?) where filter.emitsExclusivesToSecond: + latest2 = nil + return latestSecond + case (let latestFirst?, nil) where filter.emitsExclusivesToFirst: + latest1 = nil + return latestFirst + default: + // Either both sources are exhausted, or just one is while the remainder + // of the other won't be emitted. + isFinished = true + } + } + return nil + } + + public mutating func next() -> Second.Element? { + return try! throwingNext() + } +} diff --git a/Sources/Algorithms/MergePartitions.swift b/Sources/Algorithms/MergePartitions.swift new file mode 100644 index 00000000..b067c789 --- /dev/null +++ b/Sources/Algorithms/MergePartitions.swift @@ -0,0 +1,178 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Algorithms open source project +// +// Copyright (c) 2024 Apple Inc. 
and the Swift project authors +// Licensed under Apache License v2.0 with Runtime Library Exception +// +// See https://swift.org/LICENSE.txt for license information +// +//===----------------------------------------------------------------------===// + +extension MutableCollection where Self: BidirectionalCollection { + /// Assuming that both this collection's slice before the given index and + /// the slice at and past that index are both sorted according to + /// the given predicate, + /// rearrange the slices' elements until the collection as + /// a whole is sorted according to the predicate. + /// + /// Equivalent elements retain their relative order. + /// + /// It may be faster to use a global `merge` function with the partitions and + /// the sorting predicate as the arguments and then copy the + /// sorted result back. + /// + /// - Precondition: The `pivot` must be a valid index of this collection. + /// The partitions of `startIndex..( + across pivot: Index, + sortedBy areInIncreasingOrder: (Element, Element) throws(Fault) -> Bool + ) throws(Fault) { + // The pivot needs to be an interior element. + // (This therefore requires `self` to have a length of at least 2.) + guard pivot > startIndex, pivot < endIndex else { return } + + // Since each major partition is already sorted, we only need to swap the + // highest ranks of the leading partition with the lowest ranks of the + // trailing partition. + // + // - Zones: |--[1]--|--------[2]--------|------[3]------|---[4]---| + // - Before: ...[<=p], [x > p],... [>= x]; [p],... [<= x], [> x],... + // - After: ...[<=p], [p],... [<= x]; [x > p],... [>= x], [> x],... + // - Zones: |--[1]--|------[3]------|--------[2]--------|---[4]---| + // + // In other words: we're swapping the positions of zones [2] and [3]. + // + // Afterwards, the new leading partition of [1] and [3] ends up naturally + // sorted. 
However, the highest ranked element of [2] may outrank + // the lowest ranked element of [4], so the trailing partition ends up + // needing to call this function itself. + + // Find starting index of [2]. + let lowPivot: Index + do { + // Among the elements before the pivot, find the reverse-earliest that has + // at most an equivalent rank as the pivot element. + let pivotValue = self[pivot], searchSpace = self[.. Bool { + // e <= pivotValue → !(e > pivotValue) → !(pivotValue < e) + return try !areInIncreasingOrder(pivotValue, e) + } + if case let beforeLowPivot = try searchSpace.pi(where: atMostPivotValue), + beforeLowPivot < searchSpace.endIndex { + // In forward space, the element after the one just found will rank + // higher than the pivot element. + lowPivot = beforeLowPivot.base + + // There may be no prefix elements that outrank the pivot element. + // In other words, [2] is empty. + // (Therefore this collection is already globally sorted.) + guard lowPivot < pivot else { return } + } else { + // All the prefix elements rank higher than the pivot element. + // In other words, [1] is empty. + lowPivot = startIndex + } + } + + // Find the ending index of [3]. + let highPivot: Index + do { + // Find the earliest post-pivot element that ranks higher than the element + // from the previous step. If there isn't a match, i.e. [4] is empty, the + // entire post-pivot partition will be swapped. + let lowPivotValue = self[lowPivot] + func moreThanLowPivotValue(_ e: Element) throws(Fault) -> Bool { + return try areInIncreasingOrder(lowPivotValue, e) + } + highPivot = try self[pivot...].pi(where: moreThanLowPivotValue) + + // [3] starts with the pivot element, so it can never be empty. + } + + // Actually swap [2] and [3], then recur into [2] + [4]. 
+ let exLowPivot = rotate(subrange: lowPivot..( + where in2nd: (Element) throws(Fault) -> Bool + ) throws(Fault) -> Index { + var n = count + var l = startIndex + + while n > 0 { + let half = n / 2 + let mid = index(l, offsetBy: half) + if try in2nd(self[mid]) { + n = half + } else { + l = index(after: mid) + n -= half + 1 + } + } + return l + } +} diff --git a/Tests/SwiftAlgorithmsTests/MergePartitionsTests.swift b/Tests/SwiftAlgorithmsTests/MergePartitionsTests.swift new file mode 100644 index 00000000..1920396d --- /dev/null +++ b/Tests/SwiftAlgorithmsTests/MergePartitionsTests.swift @@ -0,0 +1,91 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Algorithms open source project +// +// Copyright (c) 2024 Apple Inc. and the Swift project authors +// Licensed under Apache License v2.0 with Runtime Library Exception +// +// See https://swift.org/LICENSE.txt for license information +// +//===----------------------------------------------------------------------===// + +import XCTest +import Algorithms + +final class MergePartitionsTests: XCTestCase { + /// Check mergers with collections shorter than 2 elements. + func testDegenerateCases() { + var empty = EmptyCollection() + XCTAssertEqualSequences(empty, []) + empty.mergePartitions(across: empty.startIndex) + XCTAssertEqualSequences(empty, []) + empty.mergePartitions(across: empty.endIndex) + XCTAssertEqualSequences(empty, []) + + var single = CollectionOfOne(2) + XCTAssertEqualSequences(single, [2]) + single.mergePartitions(across: single.startIndex) + XCTAssertEqualSequences(single, [2]) + single.mergePartitions(across: single.endIndex) + XCTAssertEqualSequences(single, [2]) + } + + /// Check the regular merging cases. + func testNonThrowingCases() { + // No sub-partitions empty. 
+ var sample1 = [0, 2, 4, 6, 8, 10, 1, 3, 5, 7, 9] + sample1.mergePartitions(across: 6) + XCTAssertEqualSequences(sample1, 0...10) + + // No pre-pivot elements less than or equal to the pivot element. + var sample2 = [4, 6, 8, 3, 5, 7] + sample2.mergePartitions(across: 3) + XCTAssertEqualSequences(sample2, 3...8) + + // No pre-pivot elements greater than the pivot element. + var sample3 = [3, 4, 5, 6, 7, 8] + sample3.mergePartitions(across: 3) + XCTAssertEqualSequences(sample3, 3...8) + + // The greatest elements are in the pre-pivot partition. + var sample4 = [3, 7, 8, 9, 4, 5, 6] + sample4.mergePartitions(across: 4) + XCTAssertEqualSequences(sample4, 3...9) + } + + /// Check what happens when the predicate throws. + func testThrowingCases() { + /// An error type. + enum MyError: Error { + /// An error state. + case anError + } + + // Test throwing. + var sample5 = [5, 3], counter = 0, limit = 1 + let compare: (Int, Int) throws -> Bool = { + guard counter < limit else { throw MyError.anError } + defer { counter += 1 } + + return $0 < $1 + } + XCTAssertThrowsError(try sample5.mergePartitions(across: 1, + sortedBy: compare)) + + // Interrupted comparisons. + sample5 = [2, 2, 4, 20, 3, 3, 5, 7] + counter = 0 ; limit = 6 + XCTAssertThrowsError(try sample5.mergePartitions(across: 4, + sortedBy: compare)) + XCTAssertEqualSequences(sample5, [2, 2, 4, 20, 3, 3, 5, 7]) + + // No interruptions. + counter = 0 ; limit = .max + XCTAssertNoThrow(try sample5.mergePartitions(across: 4, sortedBy: compare)) + XCTAssertEqualSequences(sample5, [2, 2, 3, 3, 4, 5, 7, 20]) + } + + // MARK: - Sample Code + + // To be determined... 
+} diff --git a/Tests/SwiftAlgorithmsTests/MergeTests.swift b/Tests/SwiftAlgorithmsTests/MergeTests.swift new file mode 100644 index 00000000..0dac0c6a --- /dev/null +++ b/Tests/SwiftAlgorithmsTests/MergeTests.swift @@ -0,0 +1,183 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Algorithms open source project +// +// Copyright (c) 2024 Apple Inc. and the Swift project authors +// Licensed under Apache License v2.0 with Runtime Library Exception +// +// See https://swift.org/LICENSE.txt for license information +// +//===----------------------------------------------------------------------===// + +import XCTest +import Algorithms + +final class MergeTests: XCTestCase { + // MARK: Support Types for Set-Operation Mergers + + /// Check the convenience initializers for `MergerSubset`. + func testMergerSubsetInitializers() { + XCTAssertEqual(MergerSubset(), .sum) + + XCTAssertEqual( + MergerSubset(keepExclusivesToFirst: false, keepExclusivesToSecond: false, + keepSharedElements: false), + .none + ) + XCTAssertEqual( + MergerSubset(keepExclusivesToFirst: true, keepExclusivesToSecond: false, + keepSharedElements: false), + .firstWithoutSecond + ) + XCTAssertEqual( + MergerSubset(keepExclusivesToFirst: false, keepExclusivesToSecond: true, + keepSharedElements: false), + .secondWithoutFirst + ) + XCTAssertEqual( + MergerSubset(keepExclusivesToFirst: false, keepExclusivesToSecond: false, + keepSharedElements: true), + .intersection + ) + XCTAssertEqual( + MergerSubset(keepExclusivesToFirst: true, keepExclusivesToSecond: true, + keepSharedElements: false), + .symmetricDifference + ) + XCTAssertEqual( + MergerSubset(keepExclusivesToFirst: true, keepExclusivesToSecond: false, + keepSharedElements: true), + .first + ) + XCTAssertEqual( + MergerSubset(keepExclusivesToFirst: false, keepExclusivesToSecond: true, + keepSharedElements: true), + .second + ) + XCTAssertEqual( + 
MergerSubset(keepExclusivesToFirst: true, keepExclusivesToSecond: true, + keepSharedElements: true), + .union + ) + } + + /// Check the subset emission flags for `MergerSubset`. + func testMergerSubsetFlags() { + XCTAssertEqualSequences( + MergerSubset.allCases, + [.none, .firstWithoutSecond, .secondWithoutFirst, .symmetricDifference, + .intersection, .first, .second, .union, .sum] + ) + + XCTAssertEqualSequences( + MergerSubset.allCases.map(\.emitsExclusivesToFirst), + [false, true, false, true, false, true, false, true, true] + ) + XCTAssertEqualSequences( + MergerSubset.allCases.map(\.emitsExclusivesToSecond), + [false, false, true, true, false, false, true, true, true] + ) + XCTAssertEqualSequences( + MergerSubset.allCases.map(\.emitsSharedElements), + [false, false, false, false, true, true, true, true, true] + ) + } + + // MARK: - Set-Operation Mergers + + /// Test subset sequence results, whether generated lazily or eagerly. + func mergerTests<U: Sequence>( + converter: (Range<Int>, Range<Int>, MergerSubset) -> U + ) where U.Element == Int { + let first = 0..<7, second = 3..<10 + let expectedNone = EmptyCollection<Int>(), expectedFirstOnly = 0..<3, + expectedSecondOnly = 7..<10, expectedDiff = [0, 1, 2, 7, 8, 9], + expectedIntersection = 3..<7, expectedFirst = first, + expectedSecond = second, expectedUnion = 0..<10, + expectedSum = [0, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9] + do { + let sequences = Dictionary(uniqueKeysWithValues: MergerSubset.allCases.map { + return ($0, converter(first, second, $0)) + }) + XCTAssertEqualSequences(sequences[.none]!, expectedNone) + XCTAssertEqualSequences(sequences[.firstWithoutSecond]!, expectedFirstOnly) + XCTAssertEqualSequences(sequences[.secondWithoutFirst]!, expectedSecondOnly) + XCTAssertEqualSequences(sequences[.symmetricDifference]!, expectedDiff) + XCTAssertEqualSequences(sequences[.intersection]!, expectedIntersection) + XCTAssertEqualSequences(sequences[.first]!, expectedFirst) + XCTAssertEqualSequences(sequences[.second]!, 
expectedSecond) + XCTAssertEqualSequences(sequences[.union]!, expectedUnion) + XCTAssertEqualSequences(sequences[.sum]!, expectedSum) + + XCTAssertLessThanOrEqual(sequences[.none]!.underestimatedCount, + expectedNone.count) + XCTAssertLessThanOrEqual(sequences[.firstWithoutSecond]!.underestimatedCount, + expectedFirstOnly.count) + XCTAssertLessThanOrEqual(sequences[.secondWithoutFirst]!.underestimatedCount, + expectedSecondOnly.count) + XCTAssertLessThanOrEqual(sequences[.symmetricDifference]!.underestimatedCount, + expectedDiff.count) + XCTAssertLessThanOrEqual(sequences[.intersection]!.underestimatedCount, + expectedIntersection.count) + XCTAssertLessThanOrEqual(sequences[.first]!.underestimatedCount, + expectedFirst.count) + XCTAssertLessThanOrEqual(sequences[.second]!.underestimatedCount, + expectedSecond.count) + XCTAssertLessThanOrEqual(sequences[.union]!.underestimatedCount, + expectedUnion.count) + XCTAssertLessThanOrEqual(sequences[.sum]!.underestimatedCount, + expectedSum.count) + } + + do { + // This exercises code missed by the `sequences` tests. + let flipped = Dictionary(uniqueKeysWithValues: MergerSubset.allCases.map { + return ($0, converter(second, first, $0)) + }) + XCTAssertEqualSequences(flipped[.none]!, expectedNone) + XCTAssertEqualSequences(flipped[.firstWithoutSecond]!, expectedSecondOnly) + XCTAssertEqualSequences(flipped[.secondWithoutFirst]!, expectedFirstOnly) + XCTAssertEqualSequences(flipped[.symmetricDifference]!, expectedDiff) + XCTAssertEqualSequences(flipped[.intersection]!, expectedIntersection) + XCTAssertEqualSequences(flipped[.first]!, expectedSecond) + XCTAssertEqualSequences(flipped[.second]!, expectedFirst) + XCTAssertEqualSequences(flipped[.union]!, expectedUnion) + XCTAssertEqualSequences(flipped[.sum]!, expectedSum) + } + + } + + /// Check the lazily-generated subset sequences. 
+ func testLazySetMergers() { + mergerTests(converter: { lazilyMerge($0, $1, keeping: $2) }) + } + + /// Check the eagerly-generated subset sequences. + func testEagerSetMergers() { + mergerTests(converter: { merge($0, $1, keeping: $2) }) + } + + // MARK: - Sample Code + + /// Check the code from documentation. + func testSampleCode() { + // From the guide. + do { + let merged = lazilyMerge([10, 4, 0, 0, -3], [20, 6, 1, -1, -5], + keeping: .sum, sortedBy: >) + XCTAssertEqualSequences(merged, [20, 10, 6, 4, 1, 0, 0, -1, -3, -5]) + } + + do { + let first = [0, 1, 1, 2, 5, 10], second = [-1, 0, 1, 2, 2, 7, 10, 20] + XCTAssertEqualSequences(merge(first, second, keeping: .union), + [-1, 0, 1, 1, 2, 2, 5, 7, 10, 20]) + XCTAssertEqualSequences(merge(first, second, keeping: .intersection), + [0, 1, 2, 10]) + XCTAssertEqualSequences(merge(first, second, keeping: .secondWithoutFirst), + [-1, 2, 7, 20]) + XCTAssertEqualSequences(merge(first, second, keeping: .sum), + [-1, 0, 0, 1, 1, 1, 2, 2, 2, 5, 7, 10, 10, 20]) + } + } +}
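The pivot searches in `mergePartitions` rest on a partition-point binary search (the `pi(where:)` helper in this diff). The following is a minimal standalone sketch of that loop, not the patch's own code: the name `partitionPointSketch` is hypothetical, and it uses `rethrows` instead of typed throws so it compiles on older toolchains. It then replays the first pivot search on the `sample4` data from `testNonThrowingCases`, assuming the reversed-prefix search described in the comments above.

```swift
extension Collection {
  /// Hypothetical helper mirroring the loop of `pi(where:)` in the diff.
  /// Assumes the predicate is false for a leading run of elements and true
  /// for every element after that run; returns the first index where it
  /// is true, or `endIndex` if it is never true.
  func partitionPointSketch(
    where belongsInSecondPart: (Element) throws -> Bool
  ) rethrows -> Index {
    var remaining = count
    var low = startIndex
    while remaining > 0 {
      let half = remaining / 2
      let mid = index(low, offsetBy: half)
      if try belongsInSecondPart(self[mid]) {
        // The answer is at or before `mid`; keep the lower half.
        remaining = half
      } else {
        // The answer is strictly after `mid`; skip past it.
        low = index(after: mid)
        remaining -= half + 1
      }
    }
    return low
  }
}

// Plain forward search: first element greater than 4.
let values = [1, 3, 5, 7, 9]
let firstAboveFour = values.partitionPointSketch { $0 > 4 }
print(firstAboveFour)  // 2

// Replaying the low-pivot search on `sample4` from the tests.
let sample = [3, 7, 8, 9, 4, 5, 6]
let pivot = 4
let pivotValue = sample[pivot]
// The explicit type keeps the lazy `ReversedCollection` overload of `reversed()`.
let reversedPrefix: ReversedCollection<ArraySlice<Int>> = sample[..<pivot].reversed()
let beforeLowPivot = reversedPrefix.partitionPointSketch { $0 <= pivotValue }
// `base` converts the reversed index into the forward index one past the match.
let lowPivot = beforeLowPivot.base
print(lowPivot)  // 1
```

Here `lowPivot == 1` marks the start of `[7, 8, 9]`, the run of pre-pivot elements that outrank the pivot element `4`. Counting the remaining span instead of comparing indices keeps the helper usable on any `Collection`, at the cost of an up-front `count`.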