From bfd6098eef57cce703991ba45c779ced89529204 Mon Sep 17 00:00:00 2001
From: SHILPEE GUPTA <78029920+shilpeegupta14@users.noreply.github.com>
Date: Mon, 17 Jan 2022 19:40:40 +0530
Subject: [PATCH 1/3] swift high performance guide added

---
 docs/High-Performance-systems-in-swift.md | 416 ++++++++++++++++++++++
 1 file changed, 416 insertions(+)
 create mode 100644 docs/High-Performance-systems-in-swift.md

diff --git a/docs/High-Performance-systems-in-swift.md b/docs/High-Performance-systems-in-swift.md
new file mode 100644
index 0000000..692bee7
--- /dev/null
+++ b/docs/High-Performance-systems-in-swift.md
@@ -0,0 +1,416 @@

# High-Performance Systems in Swift

This guide covers:

- Class vs. struct trade-offs
- Copy-on-write
- Concurrency and locking

## Class vs. Struct Trade-offs

To weigh the trade-offs, let's first consider the differences between classes and structs:

Classes have reference semantics and structs have value semantics. We use structs as values and
classes as objects with identity. We use structs when we need to compare instance data with ==,
and classes when we compare instance identity with ===, since multiple class variables can refer
to the same object in memory. This is why structs are safe under mutation while classes aren't.

Structs use stack memory allocation while classes rely on heap memory allocation. Stack operations
such as pushing and popping are faster than heap allocation, which must store reference types and
manage each object's reference count. Hence, structs allocate memory faster than classes.
But as value types grow large, copying them becomes expensive. To reduce the cost of copying
large value types, copy-on-write (CoW) is used in the Swift standard library.
With copy-on-write behaviour, we don't need to create a copy of a variable when it is passed
around; a copy is only created when the value is mutated. For example, if an array of 10 million
elements is passed around, a copy is only produced if the array changes.

Thus, CoW is performant with structs.
CoW is also performant when a struct is backed by a class, although the indirection
adds some overhead.
Consider this example:

```swift
public struct HTTPRequest {
    public var method: HTTPMethod
    public var target: String
    public var version: HTTPVersion
    public var headers: [(String, String)]
    public var body: [UInt8]?
    public var trailers: [(String, String)]?
}

public enum HTTPMethod: String {
    case GET
    case POST
    case DELETE
    case PUT
    // …
}

public struct HTTPVersion {
    var major: Int
    var minor: Int
}
```

```swift
let swift = HTTPRequest(method: .GET, target: "/performance/swift", …)
var lang = swift
lang.target = "/performance/Nio"
XCTAssertEqual(swift.target, "/performance/swift") // passes: `swift` keeps its own value
```

```swift
public func transform(_ httpRequest: HTTPRequest) -> HTTPRequest {
    return httpRequest
}

_ = transform(swift)
```

The `transform` function takes about 52 nanoseconds to run.
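Such numbers depend heavily on hardware and build settings. As a hedged illustration, here is a
minimal sketch of how a figure like this might be measured, assuming the `HTTPRequest` types
above are in scope; the request value and iteration count are illustrative stand-ins for the
elided initializer arguments:

```swift
import Dispatch

// Rough wall-clock timing of transform(_:). Results vary with hardware and
// optimization level, and the optimizer may fold a trivial loop like this
// in release builds — real benchmarks need more care than this sketch.
let request = HTTPRequest(method: .GET,
                          target: "/performance/swift",
                          version: HTTPVersion(major: 1, minor: 1),
                          headers: [("Host", "example.com")],
                          body: nil,
                          trailers: nil)

let iterations: UInt64 = 10_000_000
let start = DispatchTime.now().uptimeNanoseconds
for _ in 0..<iterations {
    _ = transform(request) // the function under test
}
let elapsed = DispatchTime.now().uptimeNanoseconds - start
print("≈ \(elapsed / iterations) ns per call")
```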
However, if we use a class `HTTPRequest` instead of the struct, the same call completes in about
15 nanoseconds, which looks ideal — but we lose value semantics. When we change the target of
`lang`, the target of `swift` changes too. Shared mutable state like this can lead to threading
problems and memory leaks, which is anything but performant.
In such cases we can use a struct backed by a class. This way we gain the value semantics of the
struct and the speed of the class.

```swift
public struct HTTPRequest {
    private class _Storage {
        var method: HTTPMethod
        var target: String
        var version: HTTPVersion
        var headers: [(String, String)]
        var body: [UInt8]?
        var trailers: [(String, String)]?

        init(method: HTTPMethod = .GET, target: String,
             version: HTTPVersion = HTTPVersion(major: 1, minor: 1),
             headers: [(String, String)], body: [UInt8]?, trailers: [(String, String)]?) {
            // […]
        }
    }

    private var _storage: _Storage
}
```

```swift
extension HTTPRequest {
    public var target: String {
        get {
            return self._storage.target
        }
        set {
            self._storage.target = newValue
        }
    }
}
```

```swift
// Setting a new target on the skeleton code written above.
var a = HTTPRequest( … )
a.target = "/helloowww"
```

This sets the new target and keeps a one-to-one relationship between a single struct and its
storage. But this isn't the only way a struct and its storage can relate. One can also do this:

```swift
var a = HTTPRequest( … )
let b = a
a.target = "bye"
```

This results in two structs pointing to the same backing storage, which is a real problem: we
want to mutate one struct, but the effect of the mutation shows up in both. This is resolved
with a copy-on-write approach using the **isKnownUniquelyReferenced** function, which returns
true if there is exactly one strong reference to the object, and false otherwise. If it returns
true — only one struct owns the backing storage — we simply set the new target. If it returns
false — another struct shares the same backing storage — we first create new storage by copying
the backing storage, so that each struct ends up with its own.

*Note that this copy is needed whenever the backing storage is shared, whether by two structs or
more; each mutating copy peels off its own storage.*

```swift
// Implementation: struct backed by a class, with copy-on-write.
extension HTTPRequest {
    public var target: String {
        get { return self._storage.target }
        set {
            if !isKnownUniquelyReferenced(&self._storage) {
                self._storage = self._storage.copy()
            }
            self._storage.target = newValue
        }
    }
}

extension HTTPRequest._Storage {
    func copy() -> HTTPRequest._Storage {
        return _Storage(method: self.method, target: self.target, version: self.version,
                        headers: self.headers, body: self.body, trailers: self.trailers)
    }
}
```

This way we achieve the result in just 15 nanoseconds, combining the speed of classes with the
value semantics of structs while covering the edge case of structs sharing one backing storage.
A struct backed by a class with CoW shows how performant copy-on-write is.
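As a quick sanity check, here is a hedged usage sketch of the copy-on-write behaviour above —
assuming `HTTPRequest` exposes an initializer mirroring `_Storage.init`, with illustrative
arguments in place of the elided ones:

```swift
var a = HTTPRequest(method: .GET,
                    target: "/performance/swift",
                    version: HTTPVersion(major: 1, minor: 1),
                    headers: [], body: nil, trailers: nil)
let b = a                      // a and b now share one backing _Storage; nothing is copied
a.target = "/performance/nio"  // storage is not uniquely referenced, so it is copied first

assert(a.target == "/performance/nio")   // a mutated its own fresh copy
assert(b.target == "/performance/swift") // b still sees the original value
```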
We could also CoW-box all of our structs for some extra performance, but repeated copy-on-write
can degrade to O(N^2) behaviour, which is actually slower. This is why the Swift compiler
doesn't automate it.

## Concurrency and Locking Performance

A concurrency model performs well when it needs fewer context switches and avoids concurrency
problems such as deadlocks, priority inversion, race conditions, and heavy lock contention.

In GCD, when we submit tasks to a concurrent queue, the system brings up threads until all CPU
cores are saturated. Once a thread blocks while there are still tasks left to execute, the
system provides more threads to maintain the concurrency level and to help release the resources
held by the blocked thread. In complex applications, many threads may block on work such as
accessing a database queue, leaving the system with far more threads than there are CPU cores
available. This is called thread explosion.
Each blocked thread holds valuable memory and resources, along with an associated kernel data
structure that tracks it. This overhead can lead to deadlocks and greater scheduling cost: as
more threads are brought up, the CPU must perform full context switches to move from old threads
to new ones so that blocked threads can eventually make forward progress while the scheduler
time-shares threads on the CPU. Time-sharing hundreds of threads means excessive context
switching, which lowers performance.

```swift
func deserializeArticles(from data: Data) throws -> [Article] { … }
func updateDatabase(with articles: [Article], for feed: Feed) { … }

let urlSession = URLSession(configuration: .default, delegate: self, delegateQueue: concurrentQueue)
for feed in feedsToUpdate {
    let dataTask = urlSession.dataTask(with: feed.url) { data, response, error in
        guard let data = data else { return }
        do {
            let articles = try deserializeArticles(from: data)
            databaseQueue.sync {
                updateDatabase(with: articles, for: feed)
            }
        } catch { /* … */ }
    }
    dataTask.resume()
}
```

For good performance, Swift concurrency provides the ability to switch between continuations
instead of performing full context switches. This behaviour is implemented through a runtime
contract that threads are always able to make forward progress, upheld by language features:

- **await semantics** — `await` is an asynchronous wait that doesn't block the current thread
  while waiting for an async function; instead, the function may be suspended and the thread
  freed up to execute other tasks.
- **Runtime tracking of dependencies between tasks** — a continuation depends on the completion
  of the async function it awaits. Similarly, within a task group a parent task may create
  several child tasks, and each of those child tasks must complete before the parent task can
  proceed. These dependencies are explicitly known to the Swift compiler and runtime, which
  preserve the contract through primitives like await, actors, and task groups.

```swift
// Swift concurrency using a task group
func deserializeArticles(from data: Data) throws -> [Article] { … }
func updateDatabase(with articles: [Article], for feed: Feed) async { … }

try await withThrowingTaskGroup(of: [Article].self) { group in
    for feed in feedsToUpdate {
        group.addTask {
            let (data, _) = try await URLSession.shared.data(from: feed.url)
            let articles = try deserializeArticles(from: data)
            await updateDatabase(with: articles, for: feed)
            return articles
        }
    }
    try await group.waitForAll() // propagate any child-task error
}
```

This runtime contract makes it possible to build a thread pool that spawns only as many threads
as there are CPU cores, thereby making sure not to overcommit and hinder performance. This way
we reduce context switches, eliminate deadlocks, and boost performance.

When we use concurrent queues, we can often run into thread explosion. Hence, in a WWDC talk
about concurrency with Grand Central Dispatch, it was advised to structure applications into
distinct subsystems and maintain one serial dispatch queue per subsystem to control the
concurrency of the application, as sketched below. Serial queues still run concurrently with
other queues, so there is a performance benefit in offloading work to a queue even if that queue
is serial. But it was difficult to get concurrency greater than one within a subsystem without
running the risk of thread explosion. Now, with the new semantics leveraged by the runtime, we
can obtain safe and controlled concurrency. Therefore, use a concurrent queue only when there is
a measurable performance benefit.
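A minimal sketch of that pre-Swift-concurrency pattern, with illustrative subsystem names and
labels:

```swift
import Dispatch

// One serial queue per subsystem. DispatchQueue(label:) is serial by
// default, so each subsystem executes at most one work item at a time,
// while the subsystems still run concurrently with one another.
let networkingQueue = DispatchQueue(label: "com.example.networking")
let databaseQueue   = DispatchQueue(label: "com.example.database")

networkingQueue.async { /* fetch feeds */ }
databaseQueue.async { /* persist articles */ }
```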
Multiple concurrent threads, each running sequentially, can produce the desired results as long
as they don't communicate via shared memory — but once concurrent threads do share memory, the
program can crash or corrupt its data. To avoid this, we use thread synchronisation, which Swift
concurrency offers through:

1) Mutual exclusion
2) Reentrancy and prioritisation
3) The main actor

Consider updating the database with some articles by sync-ing the work onto a serial queue.
If the queue is not running, the calling thread is reused to execute the work item on the queue
without any context switch. If the serial queue is already running, the calling thread is
blocked — and this blocking behaviour is what triggers thread explosion.

```swift
databaseQueue.sync { updateDatabase(with: articles, for: feed) }
```

Hence it is advised to use dispatch async, which is non-blocking, so even under contention it
will not lead to thread explosion. However, when there is no contention, dispatch must request a
new thread to do the async work while the calling thread continues with other tasks. Frequent
use of dispatch async can therefore lead to excess thread wake-ups and context switches, which
can significantly lower performance.

```swift
databaseQueue.async { /* …background work… */ }
```

This brings us to actors, which are efficient in both the contended and uncontended cases.
Actors guarantee mutual exclusion: at most one work item may be active at a given time, which
means the actor's state is never accessed concurrently, preventing data races. An actor can be
thought of as a class with some private state and an associated serial dispatch queue, as in the
sketch below. When we call a method on an actor that is not running, the calling thread can be
reused to execute the method call. When the called actor is already running, the calling thread
can suspend the function it is executing and pick up other work.

When there is a lot of asynchronous work, and especially a lot of contention, the system needs
to make trade-offs based on task priority. High-priority work such as responding to user
interaction should take precedence over background work such as saving backups. Actors are
designed to allow the system to prioritise work well, thanks to the notion of reentrancy.
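For illustration, here is a hedged sketch of the database subsystem written as an actor — the
`Article`/`Feed` details (such as an `id` property) are assumptions, not this guide's actual
types:

```swift
actor ArticleDatabase {
    // Private state: the actor guarantees this is never accessed concurrently.
    private var storage: [String: Article] = [:]

    // At most one of these calls is active at a time — mutual exclusion
    // comes from the actor itself, with no explicit queue or lock.
    func updateDatabase(with articles: [Article], for feed: Feed) {
        for article in articles {
            storage[article.id] = article
        }
    }

    func article(for id: String) -> Article? {
        storage[id]
    }
}

// Callers hop onto the actor with await instead of blocking:
//   await database.updateDatabase(with: articles, for: feed)
```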
![alt text](https://github.com/shilpeegupta14/images/blob/main/Screenshot%202022-01-17%20at%201.25.47%20AM.png?raw=true)

 A quick recap of the contended and uncontended cases.

Consider a news application with a serial database queue. Suppose the database receives
high-priority work — fetching the latest data to update the UI — and low-priority work — backing
the database up to iCloud.

```swift
databaseQueue.async { fetchLatestForDisplay() } // user initiated
databaseQueue.async { backUpToiCloud() }        // background
```

The database queue then looks like this:

![alt text](https://github.com/shilpeegupta14/images/blob/main/Screenshot%202022-01-17%20at%201.35.59%20AM.png?raw=true)

As the code runs, new work items are created in interleaved order, and a dispatch queue executes
its items in strict first-in, first-out order. Unfortunately, this means that after item A the
queue must execute five lower-priority items before it can execute the next high-priority item.
This is called priority inversion. Serial queues work around priority inversion by boosting the
priority of all the work in the queue that is ahead of the high-priority work. In practice this
means the work in the queue will be done sooner, but the high-priority item still waits for
everything ahead of it.

Solving this issue requires changing the semantic model away from first-in, first-out, which
brings us to actor reentrancy. Actor reentrancy means new work items on an actor can make
progress while one or more older work items on it are suspended.

```swift
await database.fetchLatestForDisplay() // user initiated
await database.backUpToiCloud()        // background
```

![alt text](https://github.com/shilpeegupta14/images/blob/main/Screenshot%202022-01-17%20at%201.40.02%20AM.png?raw=true)

 Fig. — Task A has high priority, so it is executed first.

![alt text](https://github.com/shilpeegupta14/images/blob/main/Screenshot%202022-01-17%20at%201.41.32%20AM.png?raw=true)

 Fig. — After task B executes, the lower-priority items run.

Since actors are designed for reentrancy, the runtime may choose to move a higher-priority item
to the front of the queue, ahead of lower-priority items. This way, higher-priority work is
executed first, with lower-priority work following later — directly addressing priority
inversion and allowing more effective scheduling and resource utilisation.

Consider the following code, where we have a function `updateArticles` on the main actor that
loads articles out of the database and updates the UI for each article. Each iteration of the
loop requires at least two context switches: one to hop from the main actor to the database
actor, and one to hop back.

```swift
// on the database actor
func loadArticle(with id: ID) async throws -> Article { … }

@MainActor func updateUI(for article: Article) async { … }

@MainActor func updateArticles(for ids: [ID]) async throws {
    for id in ids {
        let article = try await database.loadArticle(with: id)
        await updateUI(for: article)
    }
}
```

Since each loop iteration requires two context switches, a repeating pattern emerges where two
threads run one after another for short spans of time. If the number of loop iterations is low
and substantial work is done in each iteration, that is probably all right. However, if
execution hops on and off the main actor frequently, the overhead of switching threads starts
to add up.

The fix is to restructure the code so that work for the main actor is batched up. We can batch
the work by pushing the loop into the `loadArticles` and `updateUI` method calls, making sure
they process arrays instead of one value at a time. Batching reduces the number of context
switches and improves performance by cutting the scheduling overhead.

```swift
// on the database actor
func loadArticles(with ids: [ID]) async throws -> [Article]

@MainActor func updateUI(for articles: [Article]) async

@MainActor func updateArticles(for ids: [ID]) async throws {
    let articles = try await database.loadArticles(with: ids)
    await updateUI(for: articles)
}
```

Note that Swift makes no guarantee that the thread which executed the code before an `await` is
the same thread that picks up the continuation. In fact, at an `await` point the task may be
descheduled, so code must not hold a lock across an `await`. Locks ensure safety and forward
progress when used correctly, but persistent lock contention limits performance. Primitives
like os_unfair_lock and NSLock are safe in synchronous code, yet the compiler provides no
support for their correct usage — the programmer has to handle them correctly. Primitives like
semaphores and condition variables are unsafe to use with Swift concurrency because they
introduce a dependency in code execution while hiding that dependency from the Swift runtime,
as the sketch below illustrates.
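As a hedged illustration of that hidden dependency — `doWork()` is a stand-in for any async
function:

```swift
import Dispatch

func doWork() async -> Int { 42 } // stand-in for real asynchronous work

// Unsafe: the semaphore creates a dependency (this thread cannot proceed
// until the task signals) but hides it from the Swift runtime, violating
// the forward-progress contract; under load this can deadlock the
// cooperative thread pool.
func blockingWait() {
    let semaphore = DispatchSemaphore(value: 0)
    Task {
        _ = await doWork()
        semaphore.signal()
    }
    semaphore.wait() // blocks a thread the runtime assumed could make progress
}

// Safe: the same dependency expressed with await is visible to the compiler
// and runtime, so the thread is freed up while we wait.
func suspendingWait() async -> Int {
    await doWork()
}
```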
+ +``` swift +//on database actor +func loadArticles(with ids: [ID]) async throws -> [Article] +@MainActor func updateUI(for articles: [Article]) async +@MainActor func updateArticles(for ids: [ID]) async throws { + let articles=try await database.loadArticles(with: ids) + await updateUI(for: articles) +} +``` +Note that swift makes no guaranty that the thread that executed the code before the await is the +same thread which will pick up the continuation as well. At the await point in our code, several +tasks are descheduled which can hold lock across await. Though locks ensure code safety and ensure +forward progress but persistent lock contention limits performance. Primitives like os_unfair_locks +and NSLocks are safe but compiler doesn’t provide support in correct usage of locks and the +programmer needs to handle it correctly. Primitives like semaphores and condition variables are +unsafe to use with swift concurrency as they hide dependency information from the swift runtime but +introduce a dependency in code execution. + +Lock contention depends upon how often lock is requested and how long it is held once acquired. +If its higher, the processor can sit idle despite of plenty of task provided. Hence it is advised +to use concurrency primitives like await, actors, and task groups, that are made known at compile +time in order to preserve the runtime contract and achieve higher performance. + From afb62a91830a82ad2a80b6c1495028965f98b900 Mon Sep 17 00:00:00 2001 From: SHILPEE GUPTA <78029920+shilpeegupta14@users.noreply.github.com> Date: Mon, 17 Jan 2022 23:52:29 +0530 Subject: [PATCH 2/3] Update High-Performance-systems-in-swift.md --- docs/High-Performance-systems-in-swift.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/High-Performance-systems-in-swift.md b/docs/High-Performance-systems-in-swift.md index 692bee7..4399d3b 100644 --- a/docs/High-Performance-systems-in-swift.md +++ b/docs/High-Performance-systems-in-swift.md @@ -18,22 +18,22 @@ Concurrency and Locking For a better tradeoff, lets first consider the differences between classes and struct: -Classes are reference semantics and structs are value semantics. We use structs as values and +Classes have reference semantics and structs have value semantics. We use structs as values and classes as objects with identity. We use structs when comparing instance data with == is required -and classes when we compare instance identity with === as classes can hold values with same -reference in the memory. This is why structs are safe in mutation while classes aren’t. +and classes when we compare instance identity with ===. Classes can hold values with same +reference in the memory and can undergo mutation while structs dont. Structs use stack memory allocation while classes rely on heap memory allocation. Stack allocation -like insertion and deletion are faster than heap that stores different reference types and +like insertion, deletion etc are faster than heap that stores different reference types and manages reference count of that object. Hence, structs have faster memory allocation than classes. But gradually, as the number of instances increases in the stack, it becomes difficult to copy -large value types. To reduce the complexity of copying large value types, copy on write(CoW) is -used from the swift standard library. +large value types and structs becomes slower. To reduce the complexity of copying large value types, +copy on write(CoW) is used from the swift standard library. 
 With copy-on-write behaviour, we don't need to create a copy of a variable when it is passed
 around; a copy is only created when the value is mutated. For example, if an array of 10 million
 elements is passed around, a copy is only produced if the array changes.
 
-Thus, CoW is performant with structs.
+Thus, CoW is performant with structs here.
 CoW is also performant when a struct is backed by a class, although the indirection
 adds some overhead.
 Consider this example:

From c4b7e421603d63546e117a8008b09449b736c788 Mon Sep 17 00:00:00 2001
From: SHILPEE GUPTA <78029920+shilpeegupta14@users.noreply.github.com>
Date: Mon, 17 Jan 2022 23:53:45 +0530
Subject: [PATCH 3/3] Update High-Performance-systems-in-swift.md

---
 docs/High-Performance-systems-in-swift.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/High-Performance-systems-in-swift.md b/docs/High-Performance-systems-in-swift.md
index 4399d3b..677b1db 100644
--- a/docs/High-Performance-systems-in-swift.md
+++ b/docs/High-Performance-systems-in-swift.md
@@ -34,8 +34,6 @@ around; a copy is only created when the value is mutated. For example, if an
 elements is passed around, a copy is only produced if the array changes.
 
 Thus, CoW is performant with structs here.
-CoW is also performant when a struct is backed by a class, although the indirection
-adds some overhead.
 Consider this example: