-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-17458. Remove unnecessary BP lock in ReplicaMap. #6717
Conversation
💔 -1 overall
This message was automatically generated. |
@Hexiaoqiao @zhangshuyan0 @tomscut Sir, could you please take a look at this PR when you have free time? Thanks a lot. |
💔 -1 overall
This message was automatically generated. |
@hfutatzhanghb Thanks for your works. We should be careful to remove BP lock here. List one of the changes as example, it will return one definite value before this PR because hold RW lock here, but uncertain after this PR, such as another thread invoke
I didn't traverse all invoker here, and not sure if it will involve some potential risk. FYI. |
💔 -1 overall
This message was automatically generated. |
Sir, Thanks for your replying. Yes, we need to be very careful to modify class ReplicaMap. In fact, i have check the methods one by one and I think we can push this PR forward after it runs stablely on our product for a long time. |
💔 -1 overall
This message was automatically generated. |
} | ||
return m.put(replicaInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not safe here. If there is somebody changing the map
after line125 but before line127, the replicaInfo
may not be able to added to map
.
💔 -1 overall
This message was automatically generated. |
@hfutatzhanghb As discussed above (which is also confirmed by @zhangshuyan0), It is unsafe to remove BP lock here, just close this PR. Please feel free to reopen it if there is one more graceful improvement. Thanks again. |
@Hexiaoqiao Ok, Sir. I have push another PR to optimize this. please check #6764 |
Description of PR
In HDFS-16429 we make LightWeightResizableGSet to be thread safe, and in HDFS-16511 we change some methods in ReplicaMap to acquire read lock instead of acquiring write lock.
This PR try to remove unnecessary Block_Pool read lock further.
Recently, I performed stress tests on datanodes to measure their read/write operations/second.
Before we removing some lock(createRbw、finalizeBlock etc.), it can only achieve ~2K write ops. After optimizing, it can achieve more than 5K write ops.
I have summarize all caller methods of ReplicaMap#get, ReplicaMap#remove, ReplicaMap#replicas, ReplicaMap#add.
It is safe to remove.
Below table is ReplicaMap#add's invocation point.