Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto cast types #33

Open
caniko opened this issue Oct 16, 2023 · 13 comments · Fixed by #35
Open

Auto cast types #33

caniko opened this issue Oct 16, 2023 · 13 comments · Fixed by #35

Comments

@caniko
Copy link

caniko commented Oct 16, 2023

Types should be cast automatically this should be possible with prepared statements where the DB responds with the type of each column (does it?). For statements that are not prepared, maybe allow the user to define the data types by enum?

I am getting errors like float not being cast to columns that are double, and int failing to cast to smallint.

@caniko
Copy link
Author

caniko commented Oct 16, 2023

Errors
Writing python float to scylla double:

marshaling error: Expected 8 bytes for a floating type, got 4

Can write Python float to scylla Float, but can't read it:

Cannot parse value of column <column> as Float.

Types that are OK

  • int -> Integer
  • uuid, but have to prepare it with uuid.bytes raw python uuid object does not work.

I am open to do a quick zoom call to debug this together!

@s3rius
Copy link
Member

s3rius commented Oct 16, 2023

For such ambiguous types we have special wrappers in scyllapy.extra_types. And python UUIDs should work, since we test it here.

Actually, autocast is possible for prepared statements by using metadata. We will try to implement this.

@caniko
Copy link
Author

caniko commented Oct 16, 2023

Something you could also do is to define custom types towards fields/attributes on Pydantic.BaseModels for type casting. I don't have time for this, but it fits really nicely IMHO.

These models could be used to define tables, and even materialized views.

@s3rius
Copy link
Member

s3rius commented Oct 16, 2023

I'm currently working on a small ORM, but don't have much time to complete it.

@caniko
Copy link
Author

caniko commented Oct 16, 2023

I understand, thanks a lot! This is a great project. I think the most important are following features:

  • Have a dtype kwarg in execute and batch, for us to define dtypes with respect to index:
scylla.execute(statement, (5.0, 1000000, 255), (Double, Integer, TinyInt))
  • Auto cast for prepared statements, and give client side errors on fail.

@s3rius
Copy link
Member

s3rius commented Oct 17, 2023

The suggested implementation for non-prepared queries seems a little bit too complex for me.

Because currently you can do it like:

from scyllapy.extra_types import Double,  TinyInt

scylla.execute(statement, (Double(5.0), 100000, TinyInt(255))

Also, it allows you to easily use dicts as arguments.

from scyllapy.extra_types import Double,  TinyInt

scylla.execute(statement, {"a": Double(5.0), "b": 100000, "c": TinyInt(255})

@caniko
Copy link
Author

caniko commented Oct 18, 2023

I should've provided the motivation for implementing it for execute. It comes from the need to implement it for scylla.batch:

tuple_with_thousand_tuples = ((1, "hello", 5.0), ...)

my_batch = Batch()
for _ in len(tuple_with_thousand_tuples):
    batch.add_query(prepared)

await scylla.batch(batch, tuple_with_thousand_tuples, dtypes=(SmallInt, None, Double))

None means do nothing.

I think it makes a lot more sense to have this functionality here! It is just natural to also add it to execute. This is especially useful if you only know the types at runtime, meaning the prepared statement might change; it just makes the code overall more readable.

@s3rius
Copy link
Member

s3rius commented Nov 3, 2023

I don't like the idea of another parameter in execute. Since you can wrap your values in extra_types. I've made an implemetation that stores information about types of query inside the query. Now if you use prepared statement, then it is going to try to cast arguments to known types. For all other cases, please wrap your values in extra types. It makes it explicit and nice. Also, about batches. There's another type of batches, which can make your code more readable.

import scyllapy
from scyllapy import extra_types

batch = scyllapy.InlineBatch()

for i in range(100):
    batch.add_query("INSERT INTO table (id, val) VALUES (?, ?)", (i, extra_types.Double(1.0)))

scylla.batch(batch)

@caniko
Copy link
Author

caniko commented Nov 3, 2023

Will this work?

import scyllapy
from scyllapy import extra_types

batch = scyllapy.InlineBatch()

prepared = await scylla.prepare("INSERT INTO table (id, val) VALUES (?, ?)")

for i in range(100):
    # NOTE: I am expecting prepared to cast 1.0 to Double
    batch.add_query(prepared, (i, 1.0))

scylla.batch(batch)

@s3rius
Copy link
Member

s3rius commented Nov 3, 2023

No. Because in batches we don't have access to queries. I wanted to implement it, but couldn't do it now.

@s3rius s3rius closed this as completed in #35 Nov 3, 2023
@caniko
Copy link
Author

caniko commented Nov 7, 2023

No. Because in batches we don't have access to queries. I wanted to implement it, but couldn't do it now.

Maybe keep the issue open until you do that? @s3rius

@s3rius
Copy link
Member

s3rius commented Nov 9, 2023

Sure. But I'm not sure if it's possible. Anyway. Maybe we will be able to implement it later.

@s3rius s3rius reopened this Nov 9, 2023
@diviyank
Copy link

diviyank commented Jun 27, 2024

Errors Writing python float to scylla double:

marshaling error: Expected 8 bytes for a floating type, got 4

Can write Python float to scylla Float, but can't read it:

Cannot parse value of column <column> as Float.

Types that are OK

* `int` -> `Integer`

* uuid, but have to prepare it with `uuid.bytes` raw python `uuid` object does not work.

I am open to do a quick zoom call to debug this together!

One of the issues is that we "can" push Float data into the Scylla Table, but as mentioned, we cannot retrieve it ;

Do we absolutely need to convert all floats to doubles to make this work?

Edit: the issue arises from a typo:

https://github.com/Intreecom/scyllapy/blob/d8d01c1866985121073a83e44d8fb305baf6d42d/src/utils.rs#L371C1-L374C56

the cast should be .as_float()

Best regards,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants