diff --git a/docs/admin/sharding-partitioning.rst b/docs/admin/sharding-partitioning.rst
index 57d0855d..fc568861 100644
--- a/docs/admin/sharding-partitioning.rst
+++ b/docs/admin/sharding-partitioning.rst
@@ -111,7 +111,7 @@ cluster.
 Over-sharding and over-partitioning are common flaws leading to an overall
 poor performance.
 
-**As a rule of thumb, a single shard should hold somewhere between 5 - 100
+**As a rule of thumb, a single shard should hold somewhere between 5 and 50
 GB of data.**
 
 To avoid oversharding, CrateDB by default limits the number of shards per
@@ -129,7 +129,7 @@ benchmarks across various strategies. The following steps provide a general guid
 - Calculate the throughput
 
 Then, to calculate the number of shards, you should consider that the size of each
-shard should roughly be between 5 - 100 GB, and that each node can only manage
+shard should roughly be between 5 and 50 GB, and that each node can only manage
 up to 1000 shards.
 
 Time series example
@@ -146,12 +146,12 @@ time series data with the following assumptions:
 Given the daily throughput is around 10 GB/day, the monthly throughput is 30 times
 that (~ 300 GB). The partition column can be day, week, month, quarter, etc. So,
 assuming a monthly partition, the next step is to calculate the number of shards
-with the **shard size recommendation** (5 - 100 GB) and the **number of nodes** in
+with the **shard size recommendation** (5 - 50 GB) and the **number of nodes** in
 the cluster in mind.
 
-With three shards, each shard will hold 100 GB (300 GB / 3 shards), which is too
-close to the upper limit. With six shards, each shard will manage 50 GB
-(300 GB / 6 shards) of data, which is closer to the recommended size range (5 - 100 GB).
+With three shards, each shard would hold 100 GB (300 GB / 3 shards), which exceeds
+the upper limit. With six shards, each shard will manage 50 GB (300 GB / 6 shards)
+of data, which is within the recommended size range (5 - 50 GB).
 
 .. code-block:: psql
 
diff --git a/docs/feature/cluster/index.md b/docs/feature/cluster/index.md
index 1454dd8b..03ef3287 100644
--- a/docs/feature/cluster/index.md
+++ b/docs/feature/cluster/index.md
@@ -111,7 +111,7 @@ data loss, and to improve read performance.
 ## Synopsis
 With a monthly throughput of 300 GB, partitioning your table by
 month, and using six shards, each shard will manage 50 GB of data, which is
-within the recommended size range (5 - 100 GB).
+within the recommended size range (5 - 50 GB).
 
 Through replication, the table will store three copies of your data, in order
 to reduce the chance of permanent data loss.
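For reference, the six-shard, monthly-partitioned layout described in the time series
example could be expressed with DDL along these lines. This is only a sketch: the table
and column names (``sensor_readings``, ``ts``, ``sensor_id``, ``reading``, ``month``)
are placeholders and are not taken from the documentation being changed.

.. code-block:: psql

    CREATE TABLE sensor_readings (
        ts TIMESTAMP WITH TIME ZONE NOT NULL,
        sensor_id TEXT,
        reading DOUBLE PRECISION,
        -- Generated partition column: one partition per calendar month.
        month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
    )
    -- ~300 GB/month spread across 6 shards is ~50 GB per shard,
    -- at the upper end of the 5 - 50 GB guideline.
    CLUSTERED INTO 6 SHARDS
    PARTITIONED BY (month)
    -- 1 primary + 2 replicas = three copies of the data, as in the synopsis.
    WITH (number_of_replicas = 2);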