Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Add uniffle integration #3767

Merged
merged 4 commits into from
Apr 11, 2024
Merged

[VL] Add uniffle integration #3767

merged 4 commits into from
Apr 11, 2024

Conversation

summaryzb
Copy link
Contributor

What changes were proposed in this pull request?

Add uniffle integration

How was this patch tested?

Integration test with uniffle manually
When uniffle 0.8.0 release in maven, add test in gluten action flow

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

2 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

// rename CelebornClient RssClient
std::shared_ptr<CelebornClient> celebornClient =
std::make_shared<CelebornClient>(vm, partitionPusher, unifflePushPartitionDataMethod);
partitionWriterCreator = std::make_shared<CelebornPartitionWriterCreator>(std::move(celebornClient));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you explain a bit why Celeborn's code is reused here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CelebornClient wraps in the partitionWriterCreator mainly do the work related to take out partition data, when evictPartitions is called. That will be all the same with different rss framework

@zhztheplayer
Copy link
Member

Integration test with uniffle manually

gluten-celeborn has a GHA action job here. Is it possible for gluten-uniffle to add a similar one? Another option is to add some UTs, at least cover the newly added code.

@summaryzb
Copy link
Contributor Author

I'll look into the GHA action, add one in this patch later

@rhh777
Copy link

rhh777 commented Jan 10, 2024

@summaryzb The org.apache.spark.shuffle.UniffleColumnarBatchSerializer class is missing in the commit?

@zhztheplayer
Copy link
Member

UniffleColumnarBatchSerializer

Is it possible to reuse ColumnarBatchSerializer somehow ?

Copy link

Run Gluten Clickhouse CI

@summaryzb
Copy link
Contributor Author

summaryzb commented Jan 11, 2024

Is it possible to reuse ColumnarBatchSerializer somehow ?

Yes, it's almost the same. Change it to reuse ColumnarBatchSerializer

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

2 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@zhouyuan
Copy link
Contributor

@PZD-CHINA @PHILO-HE
we may need to also add Uniffle installation in docker, similar as Celeborn
https://github.com/oap-project/gluten/blob/main/tools/gluten-te/ubuntu/dockerfile-buildenv#L118

pom.xml Show resolved Hide resolved
@PHILO-HE
Copy link
Contributor

@PZD-CHINA @PHILO-HE we may need to also add Uniffle installation in docker, similar as Celeborn https://github.com/oap-project/gluten/blob/main/tools/gluten-te/ubuntu/dockerfile-buildenv#L118

@summaryzb, suppose uniffle-0.8.0 should be pre-installed in CI docker. Please let us know if there is any other dependency required. Thanks!

@weiting-chen weiting-chen mentioned this pull request Feb 21, 2024
21 tasks
Copy link

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

Copy link

github-actions bot commented Apr 3, 2024

Run Gluten Clickhouse CI

@summaryzb summaryzb force-pushed the main branch 2 times, most recently from a1a5e24 to e5310aa Compare April 3, 2024 09:20
Copy link

github-actions bot commented Apr 3, 2024

Run Gluten Clickhouse CI

1 similar comment
Copy link

github-actions bot commented Apr 3, 2024

Run Gluten Clickhouse CI

@summaryzb
Copy link
Contributor Author

@PHILO-HE PTAL

@summaryzb
Copy link
Contributor Author

Uniffle run tpcds with only one worker may encounter some stability issues occasionally,so this integration test only run with tpch

@summaryzb
Copy link
Contributor Author

@jackylee-ch PTAL

Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work! Please also respond to Hongze's comments. Thanks!
#3767 (comment)

.github/workflows/velox_docker.yml Outdated Show resolved Hide resolved
// rename CelebornClient RssClient
std::shared_ptr<CelebornClient> uniffleClient =
std::make_shared<CelebornClient>(vm, partitionPusher, unifflePushPartitionDataMethod);
partitionWriter = std::make_unique<CelebornPartitionWriter>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the above comment, better to use a common name if they indeed can be shared by both celeborn & uniffle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a common name relate to all the logic of rss in jni layer, but not limit to this PartitionWriter, maybe it's better to resolve this in a separate pr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@summaryzb, ok to me. Please create an issue to track this. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow this

Copy link

github-actions bot commented Apr 9, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Apr 9, 2024

Run Gluten Clickhouse CI

Copy link

github-actions bot commented Apr 9, 2024

Run Gluten Clickhouse CI

@summaryzb
Copy link
Contributor Author

@PHILO-HE PTAL

pom.xml Show resolved Hide resolved
Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhztheplayer, do you have any other comment?

sparkConf.getSizeAsBytes(
RssSparkConfig.RSS_WRITER_BUFFER_SIZE.key(),
RssSparkConfig.RSS_WRITER_BUFFER_SIZE.defaultValue().get());
compressionCodec = GlutenShuffleUtils.getCompressionCodec(sparkConf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note celeborn honors Spark's config for SHUFFLE_COMPRESS. Should uniffle do the same check? See e182d65.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix

Copy link

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Member

@zhztheplayer, do you have any other comment?

I'm good if we have tests covering the feature. Please proceed if it's ready to merge.

Thanks @summaryzb for this work!

Copy link
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have a follow-up issue #5335 to solve the conflicting code, there is no problem on my side. Thanks for your work.@summaryzb

@jackylee-ch
Copy link
Contributor

This pr is ready to be merged on my side. Would leave some time to see if there are any more suggestions.

@jackylee-ch jackylee-ch merged commit b9408f9 into apache:main Apr 11, 2024
38 checks passed
Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@summaryzb Thank you for working on uniffle integration and I just left some late minor review comments.

run: |
export MAVEN_HOME=/usr/lib/maven && \
export PATH=${PATH}:${MAVEN_HOME}/bin && \
export export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

export export -> export

tar xzf apache-uniffle-0.8.0-incubating-bin.tar.gz -C /opt/ && mv /opt/rss-0.8.0-hadoop2.8 /opt/uniffle && \
wget -nv https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz && \
tar xzf hadoop-2.8.5.tar.gz -C /opt/
rm -f /opt/uniffle/jars/server/shuffle-server-0.8.0-SNAPSHOT.jar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to remove shuffle-server-0.8.0.jar instead of shuffle-server-0.8.0-SNAPSHOT.jar, was this a typo or something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants