Core: Optimize MergingSnapshotProducer to use referenced manifests to determine if manifest needs to be rewritten #11131

Conversation

@amogh-jahagirdar (Contributor) commented Sep 13, 2024

This change optimizes the MergingSnapshotProducer logic that determines whether a file has been deleted in a manifest when deleteFile(ContentFile file) is called; as a result, determining whether a manifest needs to be rewritten is now much cheaper.

Previously, determining whether a manifest file needs to be rewritten had two high-level steps:

  1. Open the manifest and iterate through its entries until an entry matches a deletion criterion (either a partition expression or a path-based deletion); stop iterating once one of these criteria is hit.
  2. If step 1 yields manifests that need to be rewritten, a new manifest is written with the same contents as the old manifest minus any deleted files; for delete manifests, delete files older than the minimum sequence number are also dropped as part of the rewrite.

This PR optimizes step 1 for the case where deleteFile(ContentFile file) is called, by keeping track of each data/delete file's manifest and its position within that manifest. With that information, the first pass over the manifest is no longer necessary, since the presence of a referenced manifest already means that the manifest must be rewritten.
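
To make the bookkeeping concrete, here is a minimal, hypothetical sketch in plain Java (illustrative class and method names only, not the actual MergingSnapshotProducer implementation):

import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: when a removal also carries the manifest the
// file was read from, remember that manifest's location. Any manifest in this
// set is known to need a rewrite, so the per-entry matching pass can be skipped.
class ReferencedManifestTracker {
  private final Set<String> referencedManifests = new HashSet<>();
  private boolean allRemovalsHaveReferences = true;

  // filePath mirrors deleteFile(ContentFile); it is not used further in this sketch.
  void deleteFile(String filePath, String manifestLocation) {
    if (manifestLocation != null) {
      referencedManifests.add(manifestLocation);
    } else {
      // A path-only or expression-based removal means the referenced set alone
      // is no longer a complete source of truth; fall back to scanning manifests.
      allRemovalsHaveReferences = false;
    }
  }

  boolean canTrustReferences() {
    return allRemovalsHaveReferences;
  }

  boolean mustRewrite(String manifestLocation) {
    return canTrustReferences() && referencedManifests.contains(manifestLocation);
  }
}
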
Before


Benchmark                                       (numFiles)  (percentDeleteFilesReplaced)  Mode  Cnt  Score   Error  Units
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                             5    ss    5  0.234 ± 0.054   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                            25    ss    5  0.599 ± 1.330   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                            50    ss    5  0.814 ± 1.823   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                           100    ss    5  0.877 ± 1.837   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                             5    ss    5  0.426 ± 0.050   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                            25    ss    5  0.838 ± 1.920   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                            50    ss    5  1.051 ± 1.847   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                           100    ss    5  1.525 ± 0.108   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                             5    ss    5  0.666 ± 1.511   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                            25    ss    5  1.596 ± 0.289   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                            50    ss    5  1.991 ± 0.438   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                           100    ss    5  2.496 ± 0.566   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                             5    ss    5  1.110 ± 1.858   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                            25    ss    5  1.955 ± 0.317   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                            50    ss    5  2.499 ± 0.556   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                           100    ss    5  2.947 ± 1.181   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                             5    ss    5  1.709 ± 0.159   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                            25    ss    5  3.153 ± 1.251   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                            50    ss    5  3.093 ± 1.475   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                           100    ss    5  5.808 ± 3.145   s/op

After

Benchmark                                       (numFiles)  (percentDeleteFilesReplaced)  Mode  Cnt  Score   Error  Units
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                             5    ss    5  0.196 ± 0.038   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                            25    ss    5  0.468 ± 1.496   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                            50    ss    5  0.652 ± 1.914   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles       50000                           100    ss    5  0.741 ± 1.925   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                             5    ss    5  0.223 ± 0.019   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                            25    ss    5  0.641 ± 1.916   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                            50    ss    5  0.879 ± 1.883   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      100000                           100    ss    5  1.326 ± 0.122   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                             5    ss    5  0.470 ± 1.631   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                            25    ss    5  1.349 ± 0.140   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                            50    ss    5  1.655 ± 0.244   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles      500000                           100    ss    5  2.077 ± 0.258   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                             5    ss    5  0.866 ± 1.878   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                            25    ss    5  1.652 ± 0.113   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                            50    ss    5  2.038 ± 0.123   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     1000000                           100    ss    5  1.871 ± 0.278   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                             5    ss    5  1.319 ± 0.097   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                            25    ss    5  1.993 ± 0.113   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                            50    ss    5  1.820 ± 0.317   s/op
ReplaceDeleteFilesBenchmark.replaceDeleteFiles     2000000                           100    ss    5  3.363 ± 0.336   s/op

@github-actions bot added the core label Sep 13, 2024
@amogh-jahagirdar (Contributor, Author) commented Sep 13, 2024

Publishing a draft so I can test against the entire CI. I need to think more about a good way to benchmark this and whether there are even more reasonable optimizations I can do here.

Well another possible optimization to think through:

If we have both the manifestLocation and the ordinal position of the content file in the manifest, AND there are no delete expressions or pure path-based deletions, we can possibly just write the new manifest with the entries that are not at the referenced positions, without having to evaluate file paths or predicates against manifest entries.

We could just evaluate against the positions (every entry would be compared against the "deleted" pos set) as opposed to file paths/partition values, which should be a bit more performant.
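
As a rough, hypothetical illustration of that idea (made-up names; the real rewrite streams entries through a manifest writer rather than materializing lists):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative only: keep the entries whose ordinal position in the manifest is
// NOT in the set of deleted positions. Each entry costs one O(1) set lookup
// instead of a file-path comparison or partition-predicate evaluation.
final class PositionBasedFilter {
  static <T> List<T> liveEntries(List<T> entriesInManifestOrder, Set<Long> deletedPositions) {
    List<T> kept = new ArrayList<>();
    for (int pos = 0; pos < entriesInManifestOrder.size(); pos++) {
      if (!deletedPositions.contains((long) pos)) {
        kept.add(entriesInManifestOrder.get(pos));
      }
    }
    return kept;
  }

  private PositionBasedFilter() {}
}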

@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch 7 times, most recently from ad1dd92 to b230f1e on September 16, 2024 14:54
@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch 2 times, most recently from 47cda8a to 4099608 on September 16, 2024 15:48
@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch from 4099608 to bfed848 on September 16, 2024 16:23
@amogh-jahagirdar (Contributor, Author) left a comment

On testing: even though SparkRewriteDataFilesAction exercises this new path, I think it's worth adding a separate test in TestRewriteFiles, or, if https://github.com/apache/iceberg/pull/11166/files gets in first, a test can also be added to RowDelta. The test should explicitly read entries from manifests to delete.

@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch from bfed848 to d0f28a6 on September 20, 2024 02:21
@github-actions bot added the build label Sep 20, 2024
@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch 3 times, most recently from be432d6 to 5eece22 on September 20, 2024 02:53
@amogh-jahagirdar marked this pull request as ready for review on September 20, 2024 03:00
@amogh-jahagirdar (Contributor, Author) commented Sep 20, 2024

Seems like after my latest updates to not add to the delete paths when a manifest location is defined, some cherry-pick test cases are failing. Taking a look.

@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch 4 times, most recently from b37854d to 07a3b1b on September 20, 2024 18:05
@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch 2 times, most recently from 2e583a2 to e8a87bc on October 24, 2024 23:30
@github-actions bot added the spark label Oct 24, 2024
Comment on lines +297 to +303
DeleteFile deleteFile = TestHelpers.deleteFiles(table).iterator().next();
Path deleteFilePath = new Path(String.valueOf(deleteFile.path()));
@amogh-jahagirdar (Contributor, Author):

These tests changed for the same reason mentioned in #11131 (comment)

@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch 2 times, most recently from eb3e848 to 6e71f7f on October 25, 2024 02:57
// assuming that the manifest does not have any live entries or aged out deletes
Set<String> manifestLocations =
    manifests.stream().map(ManifestFile::path).collect(Collectors.toSet());
boolean trustReferencedManifests =
Contributor:

Spotless formats this in a weird way. I wonder if we can play around with the names to stay on 1 line and/or potentially add a helper method.

@amogh-jahagirdar (Contributor, Author):

I tried a bit here on naming, but unfortunately couldn't find a shorter name that accurately captures what this variable represents. I went ahead with a helper method instead, since someone reading the code in most cases just needs to know what it means to trust the referenced manifests rather than the "how". If they need to know the conditions for trust, it seems reasonable to read the helper method logic.
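
For context, a hedged sketch of what such a helper could look like, with approximate names and signature (the conditions mirror the code comment quoted further down in this review; this is not the merged code):

import java.util.Set;

// Approximate sketch: referenced manifests can only be used as the source of
// truth when (a) every removal carried a manifest reference and (b) every
// referenced manifest belongs to the current set of manifests being filtered.
final class ManifestTrustSketch {
  static boolean canTrustManifestReferences(
      Set<String> currentManifestLocations,
      Set<String> manifestsReferencedForDeletes,
      boolean allRemovalsReferenceManifests) {
    return allRemovalsReferenceManifests
        && currentManifestLocations.containsAll(manifestsReferencedForDeletes);
  }

  private ManifestTrustSketch() {}
}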

…o determine if a given manifest needs to be rewritten or not
@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch 5 times, most recently from eab6906 to fb5a573 on November 20, 2024 23:24
@aokolnychyi (Contributor) left a comment

LGTM. Awesome work, @amogh-jahagirdar!

}

rowDelta.commit();

this.deleteFiles = generatedDeleteFiles;
List<DeleteFile> deleteFilesReadFromManifests = Lists.newArrayList();
Contributor:

It is a bit hard to follow the way we generate these values (like why use a map if only checking the key?). What if we restructure this logic a bit given that we have to read the manifests anyway?

RowDelta rowDelta = table.newRowDelta();

for (int ordinal = 0; ordinal < numFiles; ordinal++) {
  ...
}

rowDelta.commit();

int replacedDeleteFilesCount = (int) Math.ceil(numFiles * (percentDeleteFilesReplaced / 100.0));
List<DeleteFile> oldDeleteFiles = Lists.newArrayListWithExpectedSize(replacedDeleteFilesCount);
List<DeleteFile> newDeleteFiles = Lists.newArrayListWithExpectedSize(replacedDeleteFilesCount);

try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
  for (FileScanTask task : Iterables.limit(tasks, replacedDeleteFilesCount)) {
    DeleteFile oldDeletes = Iterables.getOnlyElement(task.deletes());
    oldDeleteFiles.add(oldDeletes);
    DeleteFile newDeletes = FileGenerationUtil.generatePositionDeleteFile(table, task.file());
    newDeleteFiles.add(newDeletes);
  }
}

this.deleteFilesToReplace = oldDeleteFiles;
this.pendingDeleteFiles = newDeleteFiles;

@@ -69,6 +70,7 @@ public String partition() {
private final Map<Integer, PartitionSpec> specsById;
private final PartitionSet deleteFilePartitions;
private final Set<F> deleteFiles = newFileSet();
private final Set<String> manifestsReferencedForDeletes = Sets.newHashSet();
@aokolnychyi (Contributor) commented Nov 20, 2024

Any shorter names like manifestsWithDeletes or deleteFileManifests?
A few calls below split into multiple lines.

@@ -185,6 +198,14 @@ List<ManifestFile> filterManifests(Schema tableSchema, List<ManifestFile> manife
return ImmutableList.of();
}

// Use the current set of referenced manifests as a source of truth when it's a subset of all
// manifests and all removals which were performed reference manifests.
// If a manifest is not in the trusted referenced set and has no live files, this means that the
Contributor:

The first sentence is very clear. I am not sure I follow the bit about "and has no live files" in the second sentence, as the check for live files is done further down. Even if we want to mention live files, should "and" become "or"?

@amogh-jahagirdar (Contributor, Author):

Let me reword a bit more; I was trying to express the additional condition that even if a file is not in the referenced manifests, the presence of live files still means we need to rewrite the manifest.

// Use the current set of referenced manifests as a source of truth when it's a subset of all
// manifests and all removals which were performed reference manifests.
// If a manifest is not in the trusted referenced set and has no live files, this means that the
// manifest has no deleted entries and does not need to be rewritten
Contributor:

Missing . for consistency with the first sentence?

// manifests and all removals which were performed reference manifests.
// If a manifest is not in the trusted referenced set and has no live files, this means that the
// manifest has no deleted entries and does not need to be rewritten
Set<String> manifestLocations =
Contributor:

Question: Why not have this inside canTrustManifestReferences?

@@ -327,62 +354,71 @@ private ManifestFile filterManifest(Schema tableSchema, ManifestFile manifest) {
// this assumes that the manifest doesn't have files to remove and streams through the
Contributor:

I wonder if the empty line above this comment still has a purpose, given it is one block now.

}

@SuppressWarnings({"CollectionUndefinedEquality", "checkstyle:CyclomaticComplexity"})
private boolean manifestHasDeletedFiles(
PartitionAndMetricsEvaluator evaluator, ManifestReader<F> reader) {
if (manifestsReferencedForDeletes.contains(reader.file().location())) {
@aokolnychyi (Contributor) commented Nov 21, 2024

It was my bad to suggest reader.file().location(). It will be fragile as the location may undergo some validation or parsing in FileIO, which we don't control here. Probably better to pass ManifestFile to this method and simply use manifest.path().

@amogh-jahagirdar (Contributor, Author):

Ah not at all, my bad for missing this. Yeah, we shouldn't touch reader.file() and should just use manifest.path().
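
A small, simplified sketch of the direction agreed on here, assuming only the ManifestFile.path() accessor referenced above (the surrounding class and field names are illustrative, not the merged code):

import java.util.Set;
import org.apache.iceberg.ManifestFile;

// Simplified sketch: key the "was this manifest referenced by a delete?" check
// on ManifestFile.path() rather than reader.file().location(), since locations
// may be normalized or re-parsed by the FileIO layer, which we don't control.
final class ReferencedManifestCheck {
  private final Set<String> manifestsReferencedForDeletes;

  ReferencedManifestCheck(Set<String> manifestsReferencedForDeletes) {
    this.manifestsReferencedForDeletes = manifestsReferencedForDeletes;
  }

  boolean isReferenced(ManifestFile manifest) {
    return manifestsReferencedForDeletes.contains(manifest.path());
  }
}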

…o determine if a given manifest needs to be rewritten or not
@amogh-jahagirdar force-pushed the optimize-merging-snapshot-producer branch from fb5a573 to d5147a4 on November 21, 2024 14:15
@amogh-jahagirdar (Contributor, Author) commented:

Thanks @aokolnychyi @rdblue for reviewing, will go ahead and merge!

@amogh-jahagirdar merged commit 90be5d7 into apache:main Nov 21, 2024
49 checks passed
nastra pushed a commit to nastra/iceberg that referenced this pull request Nov 21, 2024
@pvary (Contributor) commented Dec 4, 2024

I have a compaction test (TestRewriteDataFiles.testV2Table) in an ongoing PR.
The PR: #11497
The test code: https://github.com/apache/iceberg/pull/11497/files#diff-39871b9e62b1e4e68c69f126035226176df902f364ea765b417898ad5952e496R328-R341

The test creates an Iceberg table with 2 snapshots, each containing delete files:

  • Snapshot 1:
    • Data file - DF1
    • Equality delete file - EQD1
    • Position delete file - PD1
  • Snapshot 2:
    • Data file - DF2
    • Equality delete file - EQD2

Then the test creates a compaction commit with RewriteDataFilesCommitManager.CommitService, which rewrites the 2 data files (DF1 + DF2) into a single compacted data file (DF3) and removes the deleted rows.

Before this change (#11131) the resulting snapshot contained a single data file and a single delete file. The table content is: DF3, EQD2.
After this change (#11131) the resulting snapshot contains a single data file, and no delete files are removed. The table content is: DF3, EQD1, PD1, EQD2.

Is this change intentional? Data-wise the result is correct in both cases, as no data files remain to which the delete files would need to be applied, but the new result is definitely suboptimal.

pvary pushed a commit to pvary/iceberg that referenced this pull request Dec 4, 2024
@amogh-jahagirdar (Contributor, Author) commented Dec 4, 2024

@pvary I'll double-check that test, but my suspicion is that you're seeing the behavior from this particular change: #11131 (comment) has the rationale.

My suspicion is that the test used to rely on the eager rewriting of manifests that only contain aged-out deletes, and that's no longer the case. If so, what's happening is that we no longer eagerly rewrite manifests that only have aged-out deletes, since it's not required for correctness, and it's best to avoid additional work and the risk of failure for things unrelated to metadata correctness (this does trade off keeping the extra metadata in storage for a period, until that manifest is touched by another write). An important note: if a manifest needs to be rewritten for other reasons and has aged-out deletes, the aged-out deletes will be removed from metadata. Let me take a deeper look, though, to confirm that's what is happening for this test case.

pvary pushed a commit to pvary/iceberg that referenced this pull request Dec 4, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024