Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: Improve OutputBufferManager initialization #11350

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yingsu00
Copy link
Collaborator

Starting from C++11, the C++ standard guarantees that the initialization
of function-local static variables is thread-safe. This is better than
using a global mutex, especially for subsequent getInstance() calls. This
is because the overhead of using a static local variable only needs to do
a simple check to see if the variable has already been initialized, while
for the global mutex case, all getInstance() calls need to aquire this
lock exclusively.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 25, 2024
Copy link

netlify bot commented Oct 25, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 432dc90
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/675213e7f6e2970007ab12ff

@yingsu00 yingsu00 marked this pull request as ready for review October 26, 2024 02:17
@yingsu00 yingsu00 requested review from kgpai and Yuhta October 26, 2024 02:20
return instance_;
std::weak_ptr<OutputBufferManager> OutputBufferManager::getInstance(
const Options* optionsPtr) {
auto options = optionsPtr ? *optionsPtr : Options();
Copy link
Contributor

@Yuhta Yuhta Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We lose some check to make sure no new option is passed in if the instance is already created. On the other hand, it seems OutputBufferManager::initialize is not used anywhere. Maybe just drop the option parameter altogether?

Also I don't see why it needs to be a weak_ptr. Should just return const shared_ptr<OutputBufferManager>&.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Yuhta I have updated the code to

std::shared_ptr<OutputBufferManager> OutputBufferManager::getInstance() {
  static std::shared_ptr<OutputBufferManager> instance =
      std::make_shared<OutputBufferManager>(Options());
  return instance;
}

Appreciate your review again!

instance_ = std::make_shared<OutputBufferManager>(Options());
}
return instance_;
std::shared_ptr<OutputBufferManager> OutputBufferManager::getInstance() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return const std::shared_ptr<OutputBufferManager>& to avoid a copy and atomic add (thus cache invalidation).

@yingsu00 yingsu00 force-pushed the static branch 3 times, most recently from 189aea0 to 67d13ea Compare November 2, 2024 00:33
@Yuhta Yuhta added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 4, 2024
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@yingsu00 CI is red. Please, take a look. We need a green CI to merge.

@yingsu00
Copy link
Collaborator Author

yingsu00 commented Nov 6, 2024

@yingsu00 CI is red. Please, take a look. We need a green CI to merge.

CI is green now. Thanks for porting it.

@ManManson
Copy link
Contributor

Thanks for working on this!

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@yingsu00 Seeing failures in Prestissimo. If this change is not backwards compatible it needs to be fixed.

fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PeriodicTaskManager.cpp:239:55: error: no member named 'lock' in 'std::shared_ptr<facebook::velox::exec::OutputBufferManager>'
  239 |       velox::exec::OutputBufferManager::getInstance().lock()->numBuffers());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
buck-out/v2/gen/fbcode/4ac7a641bc650342/velox/common/base/__velox_stats_reporter__/buck-headers/velox/common/base/StatsReporter.h:208:43: note: expanded from macro 'RECORD_METRIC_VALUE'
  208 |         reporter->addMetricValue((key), ##__VA_ARGS__);        \
      |                                           ^~~~~~~~~~~
1 error generated.

@yingsu00
Copy link
Collaborator Author

yingsu00 commented Nov 7, 2024

@yingsu00 Seeing failures in Prestissimo. If this change is not backwards compatible it needs to be fixed.

fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PeriodicTaskManager.cpp:239:55: error: no member named 'lock' in 'std::shared_ptr<facebook::velox::exec::OutputBufferManager>'
  239 |       velox::exec::OutputBufferManager::getInstance().lock()->numBuffers());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
buck-out/v2/gen/fbcode/4ac7a641bc650342/velox/common/base/__velox_stats_reporter__/buck-headers/velox/common/base/StatsReporter.h:208:43: note: expanded from macro 'RECORD_METRIC_VALUE'
  208 |         reporter->addMetricValue((key), ##__VA_ARGS__);        \
      |                                           ^~~~~~~~~~~
1 error generated.

I'll send a PR to Presto repo to fix this.

@yingsu00
Copy link
Collaborator Author

yingsu00 commented Nov 7, 2024

@yingsu00 Seeing failures in Prestissimo. If this change is not backwards compatible it needs to be fixed.

fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PeriodicTaskManager.cpp:239:55: error: no member named 'lock' in 'std::shared_ptr<facebook::velox::exec::OutputBufferManager>'
  239 |       velox::exec::OutputBufferManager::getInstance().lock()->numBuffers());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
buck-out/v2/gen/fbcode/4ac7a641bc650342/velox/common/base/__velox_stats_reporter__/buck-headers/velox/common/base/StatsReporter.h:208:43: note: expanded from macro 'RECORD_METRIC_VALUE'
  208 |         reporter->addMetricValue((key), ##__VA_ARGS__);        \
      |                                           ^~~~~~~~~~~
1 error generated.

I'll send a PR to Presto repo to fix this.

Draft Presto PR: prestodb/presto#23970. How do you want to proceed?
Once this Velox PR is merged, I can add an advance Velox commit in prestodb/presto#23970. The "advance Velox" will need the hash of this commit.

@mbasmanova
Copy link
Contributor

@yingsu00 We cannot merge Velox PR that breaks Prestissimo. The changes need to be made in backward-compatible manner.

@mbasmanova
Copy link
Contributor

@yingsu00 Some examples: #5953 #6459

@mbasmanova mbasmanova removed the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 7, 2024
@yingsu00
Copy link
Collaborator Author

yingsu00 commented Nov 7, 2024

@yingsu00 Some examples: #5953 #6459

@mbasmanova Thanks for the examples, but could you elaborate a bit how to follow these examples? Since OutputbufferManager::getInstance() now returns shared_ptr, all callsites, including the ones in Presto, needs to remove the lock() from OutputbufferManager::getInstance().lock(). Did you mean you wanted to keep it returning weak_ptr? Returning weak_ptr is not quite appropriate here, as @Yuhta suggested. It makes sense to keep any class fields to this object a weak_ptr, e.g. in PartitionedOutput const std::weak_ptr<exec::OutputBufferManager> bufferManager_;, but it's better to return a shared_ptr for OutputbufferManager::getInstance().

Also I don't quite understand why merging this PR would break Presto given that Presto needs to advance Velox version to pick up this change. Presto won't break unless someone else advances Velox for Presto without merging prestodb/presto#23970 together. To avoid that situation, I will add "Advance Velox" change to prestodb/presto#23970 once I get the hash after this change is merged to Velox. Do you think if it works this way? Or do you have other suggestions? Appreciate it if you could elaborate more. Thanks!

@yingsu00
Copy link
Collaborator Author

@ManManson @mbasmanova @Yuhta Can anyone provide some advice how to proceed? Thank you!

@ManManson
Copy link
Contributor

The most straightforward way would be to split the work into several pieces as follows:

  1. Introduce std::shared_ptr<...> getInstanceStrong() and convert uses that are strictly confined to Velox internals, but leave public APIs (which are used by Prestissimo) as is.
  2. Upgrade Velox version in Presto C++ and convert all references to OutputBufferManager to use getInstanceStrong(). This ensures that we don't have references to the old method signature anymore.
  3. Make old getInstance() to also return std::shared_ptr<...> (so getInstance() and getInstanceStrong() become the same thing).
  4. Upgrade Velox version in Presto C++ once again and convert all uses of getInstanceStrong() to getInstance().
  5. Now we are free to finally drop the getInstanceStrong() method.

@yingsu00
Copy link
Collaborator Author

The most straightforward way would be to split the work into several pieces as follows:

  1. Introduce std::shared_ptr<...> getInstanceStrong() and convert uses that are strictly confined to Velox internals, but leave public APIs (which are used by Prestissimo) as is.
  2. Upgrade Velox version in Presto C++ and convert all references to OutputBufferManager to use getInstanceStrong(). This ensures that we don't have references to the old method signature anymore.
  3. Make old getInstance() to also return std::shared_ptr<...> (so getInstance() and getInstanceStrong() become the same thing).
  4. Upgrade Velox version in Presto C++ once again and convert all uses of getInstanceStrong() to getInstance().
  5. Now we are free to finally drop the getInstanceStrong() method.

Cool! Thanks for the idea @ManManson. I have updated this PR to the state of 1) as you suggested. Please re-review.

@yingsu00 yingsu00 requested a review from rui-mo November 15, 2024 03:33
@yingsu00
Copy link
Collaborator Author

@ManManson @Yuhta Will you folks be able to review this PR again?

Comment on lines 39 to 32
const std::shared_ptr<OutputBufferManager>&
OutputBufferManager::getInstanceStrong() {
static const std::shared_ptr<OutputBufferManager> instance =
std::make_shared<OutputBufferManager>(Options());
return instance;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OutputBufferManager::getInstance() should be implemented in terms of getInstanceStrong(). Something like this:

std::weak_ptr<OutputBufferManager> OutputBufferManager::getInstance() {
	return getInstanceStrong();
}

Otherwise it will break any existing consumer of Velox since getInstance() and getInstanceStrong() would return different instances.

@yingsu00 yingsu00 changed the title Improve OutputBufferManager initialization misc: Improve OutputBufferManager initialization Nov 16, 2024
Copy link
Contributor

@ManManson ManManson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yingsu00 yingsu00 added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Nov 19, 2024
@yingsu00
Copy link
Collaborator Author

Thank you @ManManson ! @Yuhta @mbasmanova Do you have other comments? If not I'm marking this PR to be "ready-to-merge" again. Once this one is merged I'll send follow up PRs as @ManManson suggested.

velox/exec/OutputBufferManager.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@yingsu00 Ying, would you rebase so we can try landing this PR?

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

@yingsu00 Thank you for rebasing. Seeing CI failures. Would you take a look?

@yingsu00
Copy link
Collaborator Author

yingsu00 commented Dec 5, 2024

Details

@mbasmanova Thanks for importing it! The test failures seems to be caused by #11737 which was exported and not reviewed. #11743 is supposed to fix it. We can wait it to be merged first then I'll rebase again.

Starting from C++11, the C++ standard guarantees that the initialization
of function-local static variables is thread-safe. This is better than
using a global mutex, especially for subsequent getInstance() calls. This
is because the overhead of using a static local variable only needs to do
a simple check to see if the variable has already been initialized, while
for the global mutex case, all getInstance() calls need to aquire this
lock exclusively.
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

CI is still red.

@yingsu00
Copy link
Collaborator Author

yingsu00 commented Dec 6, 2024

hi @mbasmanova All tests pass now except the Facebook internal ones:

Facebook Internal - Builds & Tests:
3 warning_suppressed, 3 warning, 6 warning_preexisting, 94 warning_nosignal, 893 advice_no_signal
3 warning_suppressed, 3 warning, 6 warning_preexisting, 94 warning_nosignal, 893 advice_no_signal internally at Meta.
warning:
infer-checkers-fbcode-diff-analysis-with-buck

I'm unable to view them though. Would it be possible to show me what the warnings are? I can fix them once I know what's wrong. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants