diff --git a/docs/guides/performance/environment.md b/docs/guides/performance/environment.md
index dd06973220..621f4eb529 100644
--- a/docs/guides/performance/environment.md
+++ b/docs/guides/performance/environment.md
@@ -7,19 +7,36 @@ The environment where DuckDB is run has an obvious impact on performance. This p
 ## Hardware Configuration
 
-### CPU and Memory
+### CPU
 
-As a rule of thumb, DuckDB requires a **minimum** of 125 MB of memory per thread.
-For example, if you use 8 threads, you need at least 1 GB of memory.
-For ideal performance, aggregation-heavy workloads require approx. 5 GB memory per thread and join-heavy workloads require approximately 10 GB memory per thread.
+DuckDB works efficiently on both AMD64 (x86_64) and ARM64 (AArch64) CPU architectures.
+
+### Memory
 
 > Bestpractice Aim for 5-10 GB memory per thread.
 
-> Tip If you have a limited amount of memory, try to [limit the number of threads]({% link docs/configuration/pragmas.md %}#threads), e.g., by issuing `SET threads = 4;`.
+#### Minimum Required Memory
+
+As a rule of thumb, DuckDB requires a _minimum_ of 125 MB of memory per thread.
+For example, if you use 8 threads, you need at least 1 GB of memory.
+If you are working in a memory-constrained environment, consider [limiting the number of threads]({% link docs/configuration/pragmas.md %}#threads), e.g., by issuing:
+
+```sql
+SET threads = 4;
+```
+
+#### Memory for Ideal Performance
+
+The amount of memory required for ideal performance depends on several factors, including the data set size and the queries to execute.
+Perhaps surprisingly, the _queries_ have the larger effect on the memory requirement.
+Workloads containing large joins over many-to-many tables yield large intermediate data sets and thus require more memory for their evaluation to fit fully in memory.
+As an approximation, aggregation-heavy workloads require 5 GB memory per thread and join-heavy workloads require 10 GB memory per thread.
 
-### Disk
+#### Larger-than-Memory Workloads
 
-DuckDB is capable of operating both as an in-memory and as a disk-based database system. In both cases, it can spill to disk to process larger-than-memory workloads (a.k.a. out-of-core processing) for which a fast disk is highly beneficial. However, if the workload fits in memory, the disk speed only has a limited effect on performance.
+DuckDB can process larger-than-memory workloads by spilling to disk.
+This is possible thanks to _out-of-core_ support for the grouping, joining, sorting, and windowing operators.
+Note that larger-than-memory workloads can be processed in both persistent mode and in-memory mode, as DuckDB spills to disk in both cases.
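+
+If spilling is expected, you can cap DuckDB's memory usage and point the spill files at a fast disk (the limit and path below are illustrative placeholders, not recommendations):
+
+```sql
+-- Cap DuckDB's memory usage (illustrative value).
+SET memory_limit = '4GB';
+-- Direct spill files to a fast disk (illustrative path).
+SET temp_directory = '/fast_ssd/duckdb_tmp/';
+```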
 
 ### Local Disk
diff --git a/docs/guides/performance/how_to_tune_workloads.md b/docs/guides/performance/how_to_tune_workloads.md
index efd53332d7..bce1c6a301 100644
--- a/docs/guides/performance/how_to_tune_workloads.md
+++ b/docs/guides/performance/how_to_tune_workloads.md
@@ -39,10 +39,10 @@ These are called _blocking operators_ as they require their entire input to be b
 and are the most memory-intensive operators in relational database systems.
 The main blocking operators are the following:
 
-* _sorting:_ [`ORDER BY`]({% link docs/sql/query_syntax/orderby.md %})
 * _grouping:_ [`GROUP BY`]({% link docs/sql/query_syntax/groupby.md %})
-* _windowing:_ [`OVER ... (PARTITION BY ... ORDER BY ...)`]({% link docs/sql/functions/window_functions.md %})
 * _joining:_ [`JOIN`]({% link docs/sql/query_syntax/from.md %}#joins)
+* _sorting:_ [`ORDER BY`]({% link docs/sql/query_syntax/orderby.md %})
+* _windowing:_ [`OVER ... (PARTITION BY ... ORDER BY ...)`]({% link docs/sql/functions/window_functions.md %})
 
 DuckDB supports larger-than-memory processing for all of these operators.
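+
+For example, the following query (the `orders` table and its columns are hypothetical) combines grouping, windowing, and sorting, so each of these operators may spill to disk on sufficiently large inputs:
+
+```sql
+-- grouping (GROUP BY), windowing (rank() OVER ...), and sorting (ORDER BY)
+SELECT
+    customer_id,
+    sum(amount) AS total_amount,
+    rank() OVER (ORDER BY sum(amount) DESC) AS spending_rank
+FROM orders
+GROUP BY customer_id
+ORDER BY total_amount DESC;
+```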