feat: Implement option 'delete_rows' of argument 'if_exists' in 'DataFrame.to_sql' API. #60376
Conversation
@WillAyd I chose the name. @erfannariman, tagging you due to your help/interest during the lifecycle of this issue.
pandas/tests/io/test_sql.py (outdated)

```
@@ -2698,6 +2700,20 @@ def test_drop_table(conn, request):
    assert not insp.has_table("temp_frame")


@pytest.mark.parametrize("conn", sqlalchemy_connectable + ["sqlite_buildin"])
```
Can you use the `all_connectable` here? We should have it work for ADBC as well.
Hey o/. Thanks a ton for the fast review.

I'm unit testing the method `delete_rows` here. I didn't implement this method for the `ADBCDatabase`, to follow the existing convention.

Do you believe we should implement `delete_rows` for the `ADBCDatabase`?

Do you see value in unit testing `delete_rows` separately, given its functionality is already exercised in `test_to_sql_exist`?
Yea, generally we strive to have the same behavior across all drivers; we should only really pare back from using `all_connectable` for very specific use cases.

I think the fixture in `test_to_sql_exist` is a good start, but it is missing assertions around functionality specific to `delete_rows`; that test would pass if you re-used the same behavior as "replace", and that's not what we are after.
Perfect, we're 100% aligned then; that's why I created the unit test. I'm going to refactor it so that the ADBC backend implements this new functionality as well.
Note that ADBC doesn't use SQLAlchemy, so I think we will have to write the SQL statement ourselves. I think that's OK given `DELETE FROM` is ANSI SQL, but we will also want to be careful that we aren't at risk of SQL injection.
I was able to implement the method, but using parameters with the ADBC driver didn't work at all. While `cursor.execute("SELECT ?", parameters=(1,))` works like a charm, the query `DELETE FROM ?` didn't work. I couldn't figure out why; it is related to the internals of the driver. Maybe that's the reason no query in the ADBC backend uses parameters.

I believe we should create a new issue to investigate it.
I don't think a prepared statement is the right approach; I believe the syntax you are using is also specific to SQLite.

I think you can just use f-strings. Ultimately, the driver should be protecting against SQL injection, so as long as we test that that is the case I don't think we need any special handling on our end.
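For illustration, a minimal sketch of what such a raw-SQL helper could look like; the function name, signature, and quoting scheme here are assumptions for the example, not the pandas internals or the final implementation.

```python
# Illustrative sketch only, not the pandas implementation.
# Assumes `conn` follows the DBAPI-style interface exposed by the ADBC drivers.
def delete_rows(conn, table_name: str, schema: str | None = None) -> None:
    """Delete every row from an existing table, keeping the table itself."""
    # Identifiers cannot be bound as parameters, so the name is interpolated;
    # double quotes are the ANSI way to delimit identifiers.
    target = f'"{schema}"."{table_name}"' if schema else f'"{table_name}"'
    cur = conn.cursor()
    try:
        cur.execute(f"DELETE FROM {target}")  # DELETE FROM is ANSI SQL
    finally:
        cur.close()
    conn.commit()
```

Whether interpolating the identifier like this is safe enough is exactly the question discussed in the rest of the thread.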
Yeah, just out of curiosity: it is failing for the SQLite database.

> Ultimately, the driver should be protecting against SQL injection, so as long as we test that that is the case I don't think we need any special handling on our end.

Hmm... I'm not sure about this statement. AFAIK no driver offers out-of-the-box protection against SQL injection; that responsibility lies with whoever is using it, and we're not protected, as f-strings are used all over the ADBC backend. Might be worth creating a ticket to check this?
Prepared statements do not bind table names, which is the issue you are running into.

> Hmm... I'm not sure about this statement. AFAIK no driver offers out-of-the-box protection against SQL injection; that responsibility lies with whoever is using it, and we're not protected, as f-strings are used all over the ADBC backend. Might be worth creating a ticket to check this?

The presence of f-strings is a red herring, as the ADBC drivers are implemented in lower-level languages. Upstream, there are PRs that attempt to guard against injection. Are you able to circumvent those safeguards?
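To make the distinction concrete, a small demo, assuming `adbc_driver_sqlite` is installed: placeholders bind values, not identifiers such as table names, so `DELETE FROM ?` cannot even be prepared.

```python
import adbc_driver_sqlite.dbapi

conn = adbc_driver_sqlite.dbapi.connect()  # in-memory SQLite database by default
cur = conn.cursor()

# Binding a value through a placeholder works.
cur.execute("SELECT ?", parameters=(1,))
print(cur.fetchone())  # (1,)

cur.execute("CREATE TABLE t (a INTEGER)")

# Binding an identifier does not: the placeholder is a value slot, so the
# statement is invalid SQL and the driver raises an error at execute time.
try:
    cur.execute("DELETE FROM ?", parameters=("t",))
except Exception as exc:
    print(f"rejected: {exc!r}")

cur.close()
conn.close()
```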
Today I learned 🙇. Thanks a lot, for real. I'll submit a patch in a few.
```python
        pandasSQL.to_sql(test_frame1, table_name, if_exists="replace")
        == test_frame1.shape[0]
    )
    assert pandasSQL.delete_rows(table_name) is None
```
Can you add an assertion that the table does not actually get dropped?
For sure, good catch!
```python
    assert pandasSQL.delete_rows(table_name) is None
    assert count_rows(conn, table_name) == 0


@pytest.mark.parametrize("conn", all_connectable)
```
Can you add a test that ensures that this is atomic?
Sorry, the location of this comment got me confused. You mean testing that `delete_rows` is atomic?
Yea; I think you can add a new test (unless you see a good existing one that can be parametrized).
I'll need some help here. I failed to get hold of the transaction in which `delete_rows` is running.
I don't think you need to inspect the transaction; that's probably a few layers removed from where we are operating.

I think you can just write some data, change some integral columns to string, and try to rewrite the same data. That should fail, and afterwards you can inspect the table to make sure the data that was successfully written remains.
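A sketch of that idea, with the same caveat that the fixture and helper names are assumed from `pandas/tests/io/test_sql.py` (along with its module-level imports of `pytest` and `DataFrame`); whether the second write actually fails depends on the backend enforcing column types, which is where the SQLite discussion below comes in.

```python
@pytest.mark.parametrize("conn", all_connectable)
def test_delete_rows_is_atomic(conn, request):
    table_name = "temp_atomic_frame"
    conn = request.getfixturevalue(conn)

    original_df = DataFrame({"a": [1, 2], "b": [0.1, 0.2]})
    # Re-writing the integral column as strings should be rejected by
    # backends that enforce column types.
    broken_df = DataFrame({"a": ["x", "y"], "b": [0.1, 0.2]})

    with pandasSQL_builder(conn) as pandasSQL:
        pandasSQL.to_sql(original_df, table_name, if_exists="fail")

        with pytest.raises(Exception):
            pandasSQL.to_sql(broken_df, table_name, if_exists="delete_rows")

        # If the delete and the insert run in one transaction, the failed
        # write rolls back and the originally written rows are still there.
        assert count_rows(conn, table_name) == len(original_df)
```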
Yes, I was surprised by this behavior but learned it is a feature. It is possible to use STRICT mode when creating the table, but neither pandas nor SQLAlchemy supports that as of now.

Other than that, I've been scratching my head for the last couple of hours. I was able to assert the atomic behavior for the SQLAlchemy backend but failed miserably for the ADBC ones. Still trying to grok what's going on behind the scenes. The result I get running the exact same code is that ADBC is not rolling back on failure: we end up with an empty table. Any hunch?
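For context on the SQLite point, a standalone illustration using only the standard-library `sqlite3` module (STRICT tables require SQLite 3.37+): an ordinary table happily stores a string in an INTEGER column, while a STRICT table rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary SQLite table: dynamic typing lets a TEXT value land in an
# INTEGER column.
conn.execute("CREATE TABLE flexible (a INTEGER)")
conn.execute("INSERT INTO flexible VALUES ('not an int')")  # succeeds
print(conn.execute("SELECT typeof(a) FROM flexible").fetchone())  # ('text',)

# STRICT table: the same insert raises an error.
conn.execute("CREATE TABLE rigid (a INTEGER) STRICT")
try:
    conn.execute("INSERT INTO rigid VALUES ('not an int')")
except sqlite3.Error as exc:
    print(f"rejected: {exc}")

conn.close()
```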
Thanks for the clarification on SQLite. So if you were to inspect the type of the column after performing a DELETE operation, would it retain the data type from before the DELETE, or the type of what you append afterwards? I suppose there's some cause for concern there, as this DELETE / append might introduce different behaviors across different databases, which is something we need to avoid.

Not sure about the ADBC portion, but it sounds like the transaction management in the implementation is off.
> Thanks for the clarification on SQLite. So if you were to inspect the type of the column after performing a DELETE operation, would it retain the data type from before the DELETE, or the type of what you append afterwards? I suppose there's some cause for concern there, as this DELETE / append might introduce different behaviors across different databases, which is something we need to avoid.

Yes, it would retain the original type. This affects all databases with dynamic types (SQLite is the only one I know of). Are you thinking of adding safeguards for both delete and append to avoid that? Maybe we should document that behavior instead of blocking something a user might want, given it is provided by the database itself.

> Not sure about the ADBC portion, but it sounds like the transaction management in the implementation is off.

Yes, and this is what is bothering me. I'll push the latest changes; it would be great if you could take a look.
Hmm, if we can't get consistent behavior I'm not sure it's worth pursuing here. This is just a small convenience over the user calling DELETE themselves, but for us to take on and manage expectations around diverging database behavior would be pretty far out of scope for the pandas project.
Hello again o/,

Tests are working now.

> Hmm, if we can't get consistent behavior I'm not sure it's worth pursuing here.

pandas is consistent, whether using `append` (prone to the same problem) or `delete_rows`. The name `delete_rows` implies we're not going to change the schema of a table. If the database you're using supports such "non-standard" behavior (appending / deleting different types), that is up to you to know; pandas should not interfere in that matter.

> This is just a small convenience over the user calling DELETE themselves.

I do agree, and we're consistent on that matter: we delete all rows of the table in a transaction, no matter the database, and insert the new records.
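For reference, the user-facing behavior this PR aims for would look roughly like the following; this is a sketch assuming the option ships under the name in the PR title, not documentation of released behavior.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory database for the example

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
df.to_sql("my_table", engine, index=False)

# "delete_rows" empties the existing table inside a transaction and then
# inserts the new records, keeping the table and its schema intact
# (unlike "replace", which drops and recreates the table).
new_df = pd.DataFrame({"id": [3, 4], "value": ["c", "d"]})
new_df.to_sql("my_table", engine, if_exists="delete_rows", index=False)

print(pd.read_sql("SELECT * FROM my_table", engine))
#    id value
# 0   3     c
# 1   4     d
```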
Added an entry in the latest `doc/source/whatsnew/v3.0.0.rst` file if fixing a bug or adding a new feature.