Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added query library functions to overload Period and Duration arithmetic #5509

Merged
merged 13 commits into from
Jun 18, 2024

Conversation

chipkent
Copy link
Member

@chipkent chipkent commented May 20, 2024

Added query library functions to overload Period and Duration arithmetic.

from deephaven import empty_table
t2 = empty_table(10).update(["P = 'P2D'", "D = 'PT2h'"])

t3 = t2.update([
    "P2 = P * 2", "P3 = 2 * P", 
    "P4 = P * 2L", "P5 = 2L * P",
    # "P6 = P * 2.3", "P7 = 2.3 * P", # Not supported because it doesn't make sense
    # "P8 = P / 2", # Not supported because it doesn't make sense
    # "P9 = P / 2L", # Not supported because it doesn't make sense
    # "P10 = P * 2.3", # Not supported because it doesn't make sense
    "P11 = P + P",
    "P12 = P - P",
    ])

t4 = t2.update([
    "D2 = D * 2", "D3 = 2 * D", 
    "D4 = D * 2L", "D5 = 2L * D",
    # "D6 = D * 2.3", "D7 = 2.3 * D", # Floating point is not supported because it is not in the Java time lib
    "D8 = D / 2", 
    "D9 = D / 2L",
    # "D10 = D * 2.3", # Floating point is not supported because it is not in the Java time lib
    "D11 = D + D",
    "D12 = D - D",
    ])

t6 = t2.update("I = now() + P*2 + 2*P")
t7 = t2.update("I = now() + D*2 + D/2")

Resolves #5379

@chipkent chipkent added this to the 3. May 2024 milestone May 20, 2024
@chipkent chipkent self-assigned this May 20, 2024
Copy link
Contributor

@alexpeters1208 alexpeters1208 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test results:

from deephaven import empty_table

t = empty_table(100).update("BaseTime = '2020-01-01T00:00:00Z'")

# pass
t1 = t.update("Timestamp = BaseTime + i*'PT1s'")

# pass
t2 = t.update("Timestamp = BaseTime + ii*'PT1s'")

# pass
t3 = t.update("Timestamp = BaseTime + i*'P1D'")

# fail
t4 = t.update("Timestamp = BaseTime + ii*'P1D'")

# pass
t5 = t.update("Timestamp = BaseTime + 1*'PT1s'")

# pass
t6 = t.update("Timestamp = BaseTime + 1L*'PT1s'")

# pass
t7 = t.update("Timestamp = BaseTime + 1*'P1D'")

# fail
t8 = t.update("Timestamp = BaseTime + 1L*'P1D'")

# pass
t9 = t.update("Timestamp = BaseTime + ((int)1)*'PT1s'")

# pass
t10 = t.update("Timestamp = BaseTime + ((long)1)*'PT1s'")

# pass
t11 = t.update("Timestamp = BaseTime + ((int)1)*'P1D'")

# fail
t12 = t.update("Timestamp = BaseTime + ((long)1)*'P1D'")

Since multiply(long, Duration) is supported, I also expect multiply(long, Period)

@alexpeters1208 alexpeters1208 self-requested a review May 20, 2024 21:31
alexpeters1208
alexpeters1208 previously approved these changes May 20, 2024
Copy link
Contributor

@alexpeters1208 alexpeters1208 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, is there a good reason to implement a workaround for multiply(long, Period)?

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems okay in isolation, but it does seem a little bit weird to offer multiplication but not addition. Just making sure we are aware of this and are ok with it before merging.

@chipkent chipkent changed the title feat: Added query library functions to multiply durations and periods by integers feat: Added query library functions to overload Period and Duration arithmetic May 22, 2024
Add periods.
Subtract durations.
Subtract periods.
Multiply durations by doubles.
Multiply periods by longs.
Divide durations.
Copy link
Contributor

@alexpeters1208 alexpeters1208 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions / observations:

  1. Should Duration1 / Duration2 -> Duration when Duration1 >= Duration2 be supported?
  2. Should Duration1 / Duration2 -> Period when Duration1 < Duration2 be supported?
  3. Should Period1 / Period2 -> Period be supported?
  4. Should anything involving % be supported?

Next, I ran more tests and found unexpected or inconsistent behavior with Java's primitive null:

from deephaven import empty_table

t = empty_table(10).update(["P='P3D'", "D = 'PT3s'"])

# Pass
t1 = t.update("P1 = P + null")

# Pass
t2 = t.update("P2 = P - null")

# Fail
t3 = t.update("P3 = P * null")

# Fail
t4 = t.update("P3 = P / null")

# Pass
t5 = t.update("D1 = D + null")

# Pass
t6 = t.update("D2 = D - null")

# Fail
t7 = t.update("D3 = D * null")

# Fail
t8 = t.update("D3 = D / null")

I also ran all of these tests, and everything worked as expected:

from deephaven import empty_table

t = empty_table(10).update(["P='P3D'", "D = 'PT3s'"])

# Pass
t1 = t.update("P1 = P * NULL_INT")

# Pass
t2 = t.update("P2 = D * NULL_INT")

# Pass
t3 = t.update("P3 = D / NULL_INT")

# Pass
t4 = t.update("P4 = P * NULL_LONG")

# Pass
t5 = t.update("P5 = D * NULL_LONG")

# Pass
t6 = t.update("P6 = D / NULL_LONG")

# Pass
t7 = t.update("P7 = D * NULL_FLOAT")

# Pass
t8 = t.update("P8 = D / NULL_FLOAT")

# Pass
t9 = t.update("P9 = D * NULL_DOUBLE")

# Pass
t10 = t.update("P10 = D / NULL_DOUBLE")

Comment on lines 2049 to 2051
if (scalar > Integer.MAX_VALUE || scalar < Integer.MIN_VALUE) {
throw new DateTimeOverflowException("Scalar value is too large to be cast to an int");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why provide a long version if this is the case?

Copy link
Member Author

@chipkent chipkent May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lack of a long version was the first complaint when @alexpeters1208 did user testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(makes sense to me to have this — it's easy to wind up with long-typed columns in the query language even when you only need/expect int-or-smaller values)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this argument. The same could be said for any int-based method "why doesn't it accept a long?". Should we create a version of this that works with BigInteger? If the answer is "no", I would say we shouldn't provide a long version. I would much prefer to address the root of the issue "I've got a logical int-column, but it's technically typed as long; how can I make it an int column?". In which case, you could just cast or do a check + cast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with Devin, here. If we don't accept longs that are out of int range, it makes no sense to have a method that takes a long.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @rbasralian on this. Stuff like this happens frequently in real queries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is happening frequently in real queries, we should address the root cause of it: "Why are logical int columns typed as longs?".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't something we're doing elsewhere in our libraries. It isn't something users should expect from libraries generally.

@chipkent
Copy link
Member Author

A couple of questions / observations:

  1. Should Duration1 / Duration2 -> Duration when Duration1 >= Duration2 be supported?

I'm not sure what you are asking for. You want Duration divide(Duration, Duration)? If so, that doesn't make sense from a dimensional analysis perspective. At best it would be double divide(Duration, Duration).

Is there a real use case for that?

  1. Should Duration1 / Duration2 -> Period when Duration1 < Duration2 be supported?

Again, I'm not sure what you are asking for. You want Period divide(Duration, Duration)? That certainly shouldn't happen because it is crossing between time types and calendar types.

  1. Should Period1 / Period2 -> Period be supported?

Period divisions were intentionally not supported. It is too poorly defined too quickly. There is no way to properly represent 1 day divided by 3.

  1. Should anything involving % be supported?

No operations are obvious to me.

Next, I ran more tests and found unexpected or inconsistent behavior with Java's primitive null:

Note that in all of these tests the null is not typed. The success or failure depends on if there is a method where the untyped null can get converted to an acceptable type. That is all these tests are checking. It isn't obvious if that makes the tests fruitful.

from deephaven import empty_table

t = empty_table(10).update(["P='P3D'", "D = 'PT3s'"])

# Pass
t1 = t.update("P1 = P + null")

I'm guessing this is getting converted to plus(Period, Duration)

Pass

t2 = t.update("P2 = P - null")

I'm guessing this is getting converted to minus(Period, Duration)

Fail

t3 = t.update("P3 = P * null")

There are no object methods for multiply. All of the methods use primitive scalars.

Fail

t4 = t.update("P3 = P / null")

There are no object methods for divide. All of the methods use primitive scalars.

Pass

t5 = t.update("D1 = D + null")

I'm guessing this is getting converted to plus(Duration, Duration)

Pass

t6 = t.update("D2 = D - null")

I'm guessing this is getting converted to minus(Duration, Duration)

Fail

t7 = t.update("D3 = D * null")

There are no object methods for multiply. All of the methods use primitive scalars.

Fail

t8 = t.update("D3 = D / null")

There are no object methods for divide. All of the methods use primitive scalars.

What exactly did you expect to happen in these cases? How would it happen with a real query?

@alexpeters1208 alexpeters1208 self-requested a review May 24, 2024 22:15
alexpeters1208
alexpeters1208 previously approved these changes May 24, 2024
Comment on lines 1963 to 1965
} catch (ArithmeticException ex) {
throw new DateTimeOverflowException(ex);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent (although probably more correct) than the versions that catch (Exception ex). The date time operations typically document throwing:

     * @throws DateTimeException if the subtraction cannot be made
     * @throws ArithmeticException if numeric overflow occurs

So, when we catch Exception and re-wrap in DateTimeOverflowException, we might be "lying" b/c this could be caused by something other than overflow.

I see this pattern is used widely in DateTimeUtils, I think imprecisely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have adjusted the handling to be:

        } catch (DateTimeException | ArithmeticException ex) {
            throw new DateTimeOverflowException(ex);
        }

Overflows can come through either exception. java.time does use DateTimeException for other problems, but those problems should not be in the methods here.

More explicit overflow exception handling.
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would approve this if Period multiply(final Period period, final long scalar) is changed to an int. I think if you still feel strongly about the long version we can continue debating after this PR is merged?

@chipkent chipkent requested a review from devinrsmith June 18, 2024 15:14
devinrsmith
devinrsmith previously approved these changes Jun 18, 2024
@chipkent chipkent merged commit 4fe94db into deephaven:main Jun 18, 2024
15 checks passed
@chipkent chipkent deleted the multiply_duration branch June 18, 2024 17:31
@github-actions github-actions bot locked and limited conversation to collaborators Jun 18, 2024
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

Community: deephaven/deephaven-docs-community#237

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support multiply(long, java.time.Duration)
6 participants