Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to TileCodec to compress and skip redundant windows #539

Merged
merged 7 commits into from
Oct 10, 2023

Conversation

piyushn-stripe
Copy link
Collaborator

Summary

This is a follow up from our prior TileCodec PR - #523 (comment)
Made some changes to TileCodec to not compute and store the full cross product of buckets * windows for a given FG. The window are redundant within a tile and costs us on the aggregator side + on the payload ser / deser side. Now we'll only have one counter for given aggregation, bucket in a tile. We explode this out on the deser side so that this continues to work with @aarongreen-stripe's PR - #531.

Why / Goal

Improve aggregation efficiency and cut down tile payload sizes written out to KV store

Test Plan

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested

Checklist

  • Documentation update

Reviewers

@nikhilsimha || @ezvz

# Conflicts:
#	online/src/main/scala/ai/chronon/online/GroupByServingInfoParsed.scala
#	online/src/main/scala/ai/chronon/online/TileCodec.scala
#	spark/src/test/scala/ai/chronon/spark/test/InMemoryStream.scala
# Conflicts:
#	online/src/test/scala/ai/chronon/online/TileCodecTest.scala
bucketPos += 1
}
}
flattenedIr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be more efficient to directly implement and not pay for the intermediate collection.
updateFlatWithTile(historicalFlatIr, tileIr) vs. update(historicalFlatIr, flatten(tile))

Copy link
Contributor

@nikhilsimha nikhilsimha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One optimization suggestion. But we can revisit that if we find the latency to be an issue. I would still recommend adding that as a todo.

@piyushn-stripe piyushn-stripe merged commit face1b2 into master Oct 10, 2023
@piyushn-stripe piyushn-stripe deleted the piyushn--tile-codec-compression branch October 10, 2023 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants