Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

385 chained joins - possible solution #723

Closed

Conversation

chris-twiner
Copy link
Contributor

#385 - possible solution covering most uses I can imagine, seriously not an issue if it's not desired/fitting the aesthetic, the itch simply needed scratching

@codecov
Copy link

codecov bot commented Jun 6, 2023

Codecov Report

Merging #723 (a898e50) into master (e257c4c) will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   95.52%   95.59%   +0.07%     
==========================================
  Files          67       68       +1     
  Lines        1184     1204      +20     
  Branches       39       34       -5     
==========================================
+ Hits         1131     1151      +20     
  Misses         53       53              
Flag Coverage Δ
2.12.17 95.59% <100.00%> (+0.07%) ⬆️
2.13.10 96.19% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...set/src/main/scala/frameless/FramelessSyntax.scala 100.00% <100.00%> (ø)
.../src/main/scala/frameless/ops/ChainedJoinOps.scala 100.00% <100.00%> (ø)

* @tparam T the type of ds
* @tparam U the type of other
*/
case class ChainedJoinOps[T, U](ds: TypedDataset[T], other: TypedDataset[U]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non case class

@@ -15,4 +16,8 @@ trait FramelessSyntax {
implicit class DataframeSyntax(self: DataFrame){
def unsafeTyped[T: TypedEncoder]: TypedDataset[T] = TypedDataset.createUnsafe(self)
}

implicit class ChainedJoinSyntax[T](ds: TypedDataset[T]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to think about, not fond of adding automagic implicit for join use cases

@cchantep
Copy link
Collaborator

cchantep commented Jun 6, 2023

Hi, Would you mind to add as first commit (before any change) a failing test corresponding to #385 ?

@chris-twiner
Copy link
Contributor Author

chris-twiner commented Jun 6, 2023

hihi, there isn't a "failing" case - it's purely cosmetic line-saver sugar to remove the current requirement of introducing an extra "val" for each join and allowing a dataset to be defined in-line e.g. .join(load_data_here).jointype((left, right) => ... )

@cchantep
Copy link
Collaborator

cchantep commented Jun 6, 2023

hihi, there isn't a "failing" case - it's purely cosmetic line-saver sugar to remove the current requirement of introducing an extra "val" for each join and allowing a dataset to be defined in-line e.g. .join(load_data_here).jointype((left, right) => ... )

Then I'm not sure it worth the cost of maintaining each join function in double.

@chris-twiner
Copy link
Contributor Author

hihi, there isn't a "failing" case - it's purely cosmetic line-saver sugar to remove the current requirement of introducing an extra "val" for each join and allowing a dataset to be defined in-line e.g. .join(load_data_here).jointype((left, right) => ... )

Then I'm not sure it worth the cost of maintaining each join function in double.

Aside from you putting some new lines in and a one-line fix from me 3 years ago, the code hasn't been touched for 6 years. I don't suppose it needs much maintenance.

That said documenting to use a generic tap / thrush would allow the primary issue to be solved (that of not chaining) without duplication, it just wouldn't support drop in functions with both left and right datasets. You'd end up with a perhaps less pretty:

val leftDs = ...
val midDs = ...
val rightDs = ...

import mouse.any._

val joinedDs = leftDs.joinLeft(midDs)(...)
    .thrush( leftWithMid => leftWithMid.joinLeft(rightDs)(leftWithMid(`attrib == ...))
    .thrush( other joins )

it's still seems a niche issue though, I can't find anything on stack but I'll ask one of the people who original raised a comment if that's not solving the issue for them.

@cchantep
Copy link
Collaborator

cchantep commented Jun 7, 2023

The preferred resolution is to document for me.

@chris-twiner
Copy link
Contributor Author

The preferred resolution is to document for me.

ok, given I just got confirmation from one of the people on the issue that it was about the ugliness and added difficulty in reasoning about it with 5+ joins, advising to use a thrush s probably enough, if not as fluent.

Happy for you to close the PR / issue on that basis

@cchantep cchantep closed this Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants