BRT: More optimizations after per-vdev splitting
- With both the pending and current AVL-trees being per-vdev and
having effectively identical comparison functions (the pending tree
also compared birth time, but I don't believe it can differ for the
same offset within one transaction group), it makes no sense to move
entries from one to another.  Instead, inline a dramatically
simplified brt_entry_addref() into brt_pending_apply().  It no longer
requires bv_lock, since there is nothing concurrent to it at that
time.  And it does not need to search the tree for previous entries:
it is the same tree, we already have the entry, and we know it is
unique (see the brt_pending_apply() sketch after this list).
 - Put brt_vdev_lookup() and brt_vdev_addref() into separate tree
traversals to avoid false positives in the former caused by the
latter's entcount modifications.  This saves a dramatic amount of
time when a file is cloned for the first time, by not looking up
non-existent ZAP entries (see the two-pass sketch after this list).
 - Remove the avl_is_empty(bv_tree) check from brt_maybe_exists().
I don't think it is needed, since by that time all added entries are
already accounted for in bv_entcount.  The extra check must have been
producing too many false positives for no reason.  Also, we don't
need bv_lock there, since the bv_entcount pointer must be stable at
this point, and we don't care about false-positive races here, while
a false negative should be impossible, since all brt_vdev_addref()
calls have already completed by this point.  This dramatically
reduces lock contention on massive deletes of cloned blocks.  The
only remaining contention is between multiple parallel free threads
calling brt_entry_decref() (see the brt_maybe_exists() sketch after
this list).
 - Do not update the ZAP if the net change for a block over the TXG
was 0.  In combination with the above, this makes moving a file
between datasets as cheap an operation as originally intended,
provided it fits into one TXG (see the ZAP sync sketch after this
list).
 - Do not allocate per-vdev BRT structures on pool creation or
import if the pool does not have active block cloning.  This allows
us to save a bit in a few cases.
 - While here, add proper error handling in brt_load() on pool
import instead of assertions.
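
The sketches that follow are editorial illustrations of the points
above, not code from this commit; helper names, signatures, and fields
that do not appear in the message or the diff (bv_pending_tree, the
*_sketch helpers, and so on) are assumptions.  First, the simplified
apply path: with the pending tree per-vdev and keyed the same way as
bv_tree, each pending entry can be accounted directly, without taking
bv_lock and without searching for a previous entry.

#include <sys/avl.h>
#include <sys/brt_impl.h>
#include <sys/txg.h>

/*
 * Sketch only: fold one vdev's pending entries into the BRT at the end
 * of a TXG.  bv_pending_tree and the helper name are assumed.
 */
static void
brt_pending_apply_vdev_sketch(brt_vdev_t *brtvd, uint64_t txg)
{
	avl_tree_t *pending = &brtvd->bv_pending_tree[txg & TXG_MASK];

	/*
	 * No bv_lock: this runs in syncing context, so nothing can add
	 * or remove pending entries for this TXG concurrently.
	 */
	for (brt_entry_t *bre = avl_first(pending); bre != NULL;
	    bre = AVL_NEXT(pending, bre)) {
		/*
		 * The pending and current trees share the comparison
		 * function, so the entry is already known to be unique;
		 * the old brt_entry_addref() lock-and-avl_find() dance
		 * reduces to plain accounting of the references taken
		 * during this TXG (kept in bre_pcount).
		 */
		bre->bre_count += bre->bre_pcount;
	}
}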
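
Next, the split traversals: every brt_vdev_lookup() runs before any
brt_vdev_addref(), so the lookups still see bv_entcount as it was
before this TXG and cannot be turned into false positives by our own
increments; an offset whose old count is zero cannot have a ZAP entry,
so no ZAP lookup is needed for it later.  The stand-in prototypes and
the caller-provided may_have_zap array below are illustrative
assumptions.

#include <sys/avl.h>
#include <sys/brt_impl.h>

/* Stand-ins for brt_vdev_lookup()/brt_vdev_addref(); real signatures differ. */
static boolean_t brt_vdev_lookup_sketch(brt_vdev_t *brtvd, uint64_t offset);
static void brt_vdev_addref_sketch(brt_vdev_t *brtvd, brt_entry_t *bre,
    boolean_t may_have_zap);

/*
 * Sketch only: two passes over the same per-vdev tree.  The caller
 * sizes may_have_zap[] to avl_numnodes(&brtvd->bv_tree) slots.
 */
static void
brt_pending_apply_two_pass_sketch(brt_vdev_t *brtvd, boolean_t *may_have_zap)
{
	brt_entry_t *bre;
	uint64_t i;

	/* Pass 1: record which offsets might already exist in the ZAP. */
	i = 0;
	for (bre = avl_first(&brtvd->bv_tree); bre != NULL;
	    bre = AVL_NEXT(&brtvd->bv_tree, bre))
		may_have_zap[i++] = brt_vdev_lookup_sketch(brtvd,
		    BRE_OFFSET(bre));

	/* Pass 2: only now modify bv_entcount and the space accounting. */
	i = 0;
	for (bre = avl_first(&brtvd->bv_tree); bre != NULL;
	    bre = AVL_NEXT(&brtvd->bv_tree, bre))
		brt_vdev_addref_sketch(brtvd, bre, may_have_zap[i++]);
}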
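
Then the lock-free existence check used on free: with every completed
brt_vdev_addref() already reflected in bv_entcount, neither bv_lock nor
the avl_is_empty(bv_tree) test is needed.  Only bv_entcount, bv_tree,
and the race reasoning come from the message; the range size and the
bv_size bound are illustrative.

#include <sys/brt_impl.h>

#define	BRT_RANGESIZE_SKETCH	(16 * 1024 * 1024)	/* assumed granularity */

/*
 * Sketch only: "might this offset be cloned?" check done on block free.
 */
static boolean_t
brt_maybe_exists_sketch(brt_vdev_t *brtvd, uint64_t offset)
{
	uint64_t idx = offset / BRT_RANGESIZE_SKETCH;

	if (brtvd->bv_entcount == NULL || idx >= brtvd->bv_size)
		return (B_FALSE);

	/*
	 * No bv_lock and no avl_is_empty(&brtvd->bv_tree) here: a racy
	 * false positive only costs the caller a later recheck, while a
	 * false negative is impossible once all brt_vdev_addref() calls
	 * have completed, because their counts are already in bv_entcount.
	 */
	return (brtvd->bv_entcount[idx] != 0);
}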
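
Finally, the zero-net-change case: if the clone and the matching free
land in the same TXG, the entry's on-disk count is unchanged and the
ZAP does not need to be read or written at all.  The helper below is a
sketch, assuming bre_pcount holds the net change accumulated over the
TXG; zap_update_uint64() and zap_remove_uint64() are the uint64-keyed
ZAP interface, with the BRE_OFFSET() of the entry as the
BRT_KEY_WORDS-long key.

#include <sys/brt_impl.h>
#include <sys/zap.h>

/*
 * Sketch only: sync one BRT entry to its per-vdev ZAP object.
 */
static void
brt_sync_entry_sketch(objset_t *mos, uint64_t zapobj, brt_entry_t *bre,
    dmu_tx_t *tx)
{
	uint64_t off = BRE_OFFSET(bre);

	if (bre->bre_pcount == 0) {
		/* Net change over the TXG was 0: skip the ZAP entirely. */
		return;
	} else if (bre->bre_count == 0) {
		/* Last reference dropped: remove the ZAP entry, if any. */
		int error = zap_remove_uint64(mos, zapobj, &off,
		    BRT_KEY_WORDS, tx);
		VERIFY(error == 0 || error == ENOENT);
	} else {
		/* Write the new reference count. */
		VERIFY0(zap_update_uint64(mos, zapobj, &off, BRT_KEY_WORDS,
		    sizeof (bre->bre_count), 1, &bre->bre_count, tx));
	}
}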

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes #16773
amotin authored Nov 20, 2024
1 parent 49a377a commit 457f8b7
Showing 2 changed files with 249 additions and 324 deletions.
17 changes: 7 additions & 10 deletions include/sys/brt_impl.h
@@ -168,25 +168,22 @@ struct brt_vdev {
 	avl_tree_t bv_tree;
 };
 
-/* Size of bre_offset / sizeof (uint64_t). */
+/* Size of offset / sizeof (uint64_t). */
 #define BRT_KEY_WORDS (1)
 
+#define BRE_OFFSET(bre) (DVA_GET_OFFSET(&(bre)->bre_bp.blk_dva[0]))
+
 /*
  * In-core brt entry.
- * On-disk we use bre_offset as the key and bre_refcount as the value.
+ * On-disk we use ZAP with offset as the key and count as the value.
  */
 typedef struct brt_entry {
-	uint64_t bre_offset;
-	uint64_t bre_refcount;
 	avl_node_t bre_node;
+	blkptr_t bre_bp;
+	uint64_t bre_count;
+	uint64_t bre_pcount;
 } brt_entry_t;
 
-typedef struct brt_pending_entry {
-	blkptr_t bpe_bp;
-	uint64_t bpe_count;
-	avl_node_t bpe_node;
-} brt_pending_entry_t;
-
 #ifdef __cplusplus
 }
 #endif
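
A note on the header change above: brt_entry_t no longer stores the
offset or refcount directly; the entry keeps the whole block pointer,
and BRE_OFFSET() derives the key from its first DVA for both the AVL
trees and the ZAP.  A comparator over the new layout could look like
the sketch below (illustrative, not the file's code).

#include <sys/avl.h>
#include <sys/brt_impl.h>

/*
 * Sketch only: AVL comparator keyed by the offset extracted from the
 * entry's retained block pointer via BRE_OFFSET().
 */
static int
brt_entry_compare_sketch(const void *x1, const void *x2)
{
	const brt_entry_t *bre1 = x1;
	const brt_entry_t *bre2 = x2;

	return (TREE_CMP(BRE_OFFSET(bre1), BRE_OFFSET(bre2)));
}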
