-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL] Add uniffle integration #3767
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
2 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
cpp/core/jni/JniWrapper.cc
Outdated
// rename CelebornClient RssClient | ||
std::shared_ptr<CelebornClient> celebornClient = | ||
std::make_shared<CelebornClient>(vm, partitionPusher, unifflePushPartitionDataMethod); | ||
partitionWriterCreator = std::make_shared<CelebornPartitionWriterCreator>(std::move(celebornClient)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you explain a bit why Celeborn's code is reused here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CelebornClient wraps in the partitionWriterCreator mainly do the work related to take out partition data, when evictPartitions
is called. That will be all the same with different rss framework
|
I'll look into the GHA action, add one in this patch later |
@summaryzb The org.apache.spark.shuffle.UniffleColumnarBatchSerializer class is missing in the commit? |
Is it possible to reuse |
Run Gluten Clickhouse CI |
Yes, it's almost the same. Change it to reuse |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
2 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
@PZD-CHINA @PHILO-HE |
...fle/velox/src/main/java/org/apache/spark/shuffle/gluten/uniffle/GlutenRssShuffleManager.java
Outdated
Show resolved
Hide resolved
@summaryzb, suppose uniffle-0.8.0 should be pre-installed in CI docker. Please let us know if there is any other dependency required. Thanks! |
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
Run Gluten Clickhouse CI |
a1a5e24
to
e5310aa
Compare
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
@PHILO-HE PTAL |
Uniffle run tpcds with only one worker may encounter some stability issues occasionally,so this integration test only run with tpch |
@jackylee-ch PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work! Please also respond to Hongze's comments. Thanks!
#3767 (comment)
// rename CelebornClient RssClient | ||
std::shared_ptr<CelebornClient> uniffleClient = | ||
std::make_shared<CelebornClient>(vm, partitionPusher, unifflePushPartitionDataMethod); | ||
partitionWriter = std::make_unique<CelebornPartitionWriter>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like the above comment, better to use a common name if they indeed can be shared by both celeborn & uniffle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a common name relate to all the logic of rss in jni layer, but not limit to this PartitionWriter
, maybe it's better to resolve this in a separate pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@summaryzb, ok to me. Please create an issue to track this. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow this
...fle/velox/src/main/java/org/apache/spark/shuffle/gluten/uniffle/GlutenRssShuffleManager.java
Outdated
Show resolved
Hide resolved
...fle/velox/src/main/java/org/apache/spark/shuffle/gluten/uniffle/GlutenRssShuffleManager.java
Outdated
Show resolved
Hide resolved
...fle/velox/src/main/java/org/apache/spark/shuffle/gluten/uniffle/GlutenRssShuffleManager.java
Outdated
Show resolved
Hide resolved
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
@PHILO-HE PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhztheplayer, do you have any other comment?
sparkConf.getSizeAsBytes( | ||
RssSparkConfig.RSS_WRITER_BUFFER_SIZE.key(), | ||
RssSparkConfig.RSS_WRITER_BUFFER_SIZE.defaultValue().get()); | ||
compressionCodec = GlutenShuffleUtils.getCompressionCodec(sparkConf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I note celeborn honors Spark's config for SHUFFLE_COMPRESS. Should uniffle do the same check? See e182d65.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix
Run Gluten Clickhouse CI |
I'm good if we have tests covering the feature. Please proceed if it's ready to merge. Thanks @summaryzb for this work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have a follow-up issue #5335 to solve the conflicting code, there is no problem on my side. Thanks for your work.@summaryzb
This pr is ready to be merged on my side. Would leave some time to see if there are any more suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@summaryzb Thank you for working on uniffle integration and I just left some late minor review comments.
run: | | ||
export MAVEN_HOME=/usr/lib/maven && \ | ||
export PATH=${PATH}:${MAVEN_HOME}/bin && \ | ||
export export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export export
-> export
tar xzf apache-uniffle-0.8.0-incubating-bin.tar.gz -C /opt/ && mv /opt/rss-0.8.0-hadoop2.8 /opt/uniffle && \ | ||
wget -nv https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz && \ | ||
tar xzf hadoop-2.8.5.tar.gz -C /opt/ | ||
rm -f /opt/uniffle/jars/server/shuffle-server-0.8.0-SNAPSHOT.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to remove shuffle-server-0.8.0.jar
instead of shuffle-server-0.8.0-SNAPSHOT.jar
, was this a typo or something?
What changes were proposed in this pull request?
Add uniffle integration
How was this patch tested?
Integration test with uniffle manually
When uniffle 0.8.0 release in maven, add test in gluten action flow