Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On conflict clause / upserts #816

Merged
merged 32 commits into from
May 3, 2023
Merged

On conflict clause / upserts #816

merged 32 commits into from
May 3, 2023

Conversation

dantownsend
Copy link
Member

@dantownsend dantownsend commented Apr 28, 2023

Resolves #252

Based on #798 by @sinisaos with the following changes:

  • We're not supporting older SQLite versions, to keep things simpler.
  • Gives a bit more granular control over which constraints are used, and which values to update.

Remaining tasks:

  • Update docs
  • Add more tests
  • Raise an exception for older SQLite versions

@dantownsend dantownsend added the enhancement New feature or request label Apr 28, 2023
@codecov-commenter
Copy link

codecov-commenter commented Apr 28, 2023

Codecov Report

Merging #816 (8c4a99c) into master (9841730) will decrease coverage by 13.35%.
The diff coverage is 86.73%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##           master     #816       +/-   ##
===========================================
- Coverage   91.36%   78.02%   -13.35%     
===========================================
  Files         108      108               
  Lines        7552     7644       +92     
===========================================
- Hits         6900     5964      -936     
- Misses        652     1680     +1028     
Impacted Files Coverage Δ
piccolo/apps/asgi/commands/new.py 73.43% <ø> (ø)
piccolo/query/methods/objects.py 95.30% <0.00%> (-2.68%) ⬇️
piccolo/query/methods/select.py 91.83% <50.00%> (-6.13%) ⬇️
piccolo/query/mixins.py 87.65% <86.25%> (-5.61%) ⬇️
piccolo/query/methods/insert.py 97.87% <100.00%> (+0.65%) ⬆️

... and 36 files with indirect coverage changes

@sinisaos
Copy link
Member

@dantownsend What is the syntax for updating multiple on_conflict columns? I tried a list of tuples like this

values=[(Band.name, "Javas"),(Band.popularity, 1000)] 

but it doesn't work. How do we write test like this https://github.com/piccolo-orm/piccolo/pull/798/files#diff-fac5af1d6e85e27669c58e982989dfcf8327f312f8b29a272a1188f86ecb7f39R126-R127 or that's not possible at this time?

@dantownsend
Copy link
Member Author

@dantownsend What is the syntax for updating multiple on_conflict columns? I tried a list of tuples like this

@sinisaos My thinking was that the values list can either take a column, or a tuple. If it's a column, it uses the new value. If it's a tuple, it uses the value specified in the tuple.

# If there's a conflict, it sets the popularity to 2000:
await Band.insert(
    Band(name='Pythonistas', popularity=2000)
).on_conflict(
    action='DO UPDATE',
    target=[Band.name],
    values=[Band.popularity]
)

# If there's a conflict, it sets the popularity to something custom:
await Band.insert(
    Band(name='Pythonistas', popularity=2000)
).on_conflict(
    action='DO UPDATE',
    target=[Band.name],
    values=[(Band.popularity, 3000)]
)

I haven't had time to fully test this though - so there may be bugs.

@sinisaos
Copy link
Member

@dantownsend I understand it for one column but how to do it for multiple columns. With the old API we didn't have to set any columns or values ​​because Piccolo did everything for us. For example if we have rows like this

[
    {'id': 1, 'name': 'Band 1', 'popularity': 1}, 
    {'id': 2, 'name': 'Band 2', 'popularity': 2}, 
    {'id': 3, 'name': 'Band 3', 'popularity': 3}
]

we can set one or more columns to update on_conflict like this for one column (e.g Band.popularity change and Band.name stay the same)

await Band.insert(
    Band(id=1, name="Band 1", popularity=111),
    Band(id=2, name="Band 2", popularity=222),
    Band(id=3, name="Band 3", popularity=333),
    on_conflict=OnConflict.do_update,
).run()
# result
[
    {'id': 1, 'name': 'Band 1', 'popularity': 111}, 
    {'id': 2, 'name': 'Band 2', 'popularity': 222}, 
    {'id': 3, 'name': 'Band 3', 'popularity': 333}
]

or for multiple column (Band.popularity and Band.name both changes)

await Band.insert(
    Band(id=1, name="Band 1111", popularity=1111),
    Band(id=2, name="Band 2222", popularity=2222),
    Band(id=3, name="Band 3333", popularity=3333),
    on_conflict=OnConflict.do_update,
).run()
# result
[
    {'id': 1, 'name': 'Band 1111', 'popularity': 1111}, 
    {'id': 2, 'name': 'Band 2222', 'popularity': 2222}, 
    {'id': 3, 'name': 'Band 3333', 'popularity': 3333}
]

How can we do this with the new API, and if not implemented, please do. Sorry for long comment.

@dantownsend
Copy link
Member Author

Should be something like this:

await Band.insert(
    Band(id=1, name="Band 1111", popularity=1111),
    Band(id=2, name="Band 2222", popularity=2222),
    Band(id=3, name="Band 3333", popularity=3333)
).on_conflict(
    action='DO UPDATE',
    target=[Band.id],
    values=[Band.name, Band.popularity]
)

If there are loads of columns we pass Band.all_columns() to target, or make that the default behaviour if the user doesn't specify values.

@sinisaos
Copy link
Member

sinisaos commented Apr 28, 2023

I'm sorry, but that doesn't work. Both Sqlite and Postgres throw a syntax error (sqlite3.OperationalError: near ""popularity"": syntax error and asyncpg.exceptions.PostgresSyntaxError: syntax error at or near ""popularity""). If we set only one column like this values=[Band.name] everything works. I hope this helps you find what the problem is.

@dantownsend
Copy link
Member Author

I'm sorry, but that doesn't work. Both Sqlite and Postgres throw a syntax error (sqlite3.OperationalError: near ""popularity"": syntax error and asyncpg.exceptions.PostgresSyntaxError: syntax error at or near ""popularity""). If we set only one column like this values=[Band.name] everything works. I hope this helps you find what the problem is.

Thanks - sounds like a bug. I'll see what's going on.

@sinisaos
Copy link
Member

@dantownsend Yes. Missing comma between EXCLUDED columns after UPDATE SET in querystring. This is log
INSERT INTO "band" ("id","name","popularity") VALUES (1,'Band 1',1),(2,'Band 2',2),(3,'Band 3',3) ON CONFLICT ("id") DO UPDATE SET "name"=EXCLUDED."name" "popularity"=EXCLUDED."popularity"

@dantownsend
Copy link
Member Author

@sinisaos You're right - that's the problem. I'll refactor it later.

piccolo/query/mixins.py Outdated Show resolved Hide resolved
piccolo/query/mixins.py Outdated Show resolved Hide resolved
@sinisaos
Copy link
Member

@dantownsend I'm sorry if I caused problems because I never used Github suggested changes, but it seemed to me that it was easiest to show what changes I think should be made.

@dantownsend
Copy link
Member Author

@dantownsend I'm sorry if I caused problems because I never used Github suggested changes, but it seemed to me that it was easiest to show what changes I think should be made.

It's good - works well. Thanks for suggesting those changes 👍

@dantownsend
Copy link
Member Author

I've added some docs. Just need to decide on the names of the arguments.

It's currently:

>>> await Band.insert(
...     Band(name="Pythonistas", popularity=1200)
... ).on_conflict(
...     action="DO UPDATE",
...     target=[Band.name],
...     values=[Band.popularity],
... )

The names action, target and values were inspired by the Postgres docs, but I'm wondering if this is better?

>>> await Band.insert(
...     Band(name="Pythonistas", popularity=1200)
... ).on_conflict(
...     action="DO UPDATE",
...     constraints=[Band.name],
...     update=[Band.popularity],
... )

@dantownsend dantownsend marked this pull request as ready for review April 30, 2023 21:43
@dantownsend
Copy link
Member Author

@sinisaos I think it's pretty much done now 🤞

The only thing I know of missing is we're letting the user specify the target using a string. Postgres lets you specify the target using the name of a column, or an index expression:

https://www.postgresql.org/docs/current/sql-insert.html#index_expression

I need to read a bit more about how the index expression is meant to work.

@@ -1364,12 +1364,12 @@ def test_distinct_on_sqlite(self):
SQLite doesn't support ``DISTINCT ON``, so a ``ValueError`` should be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be NotImplementedError instead ValueError.

@sinisaos
Copy link
Member

sinisaos commented May 1, 2023

This works and looks great. Can I suggest two more tests that will cover tuple value as columns and tuple value as columns name in values ​​argument.

def test_do_update_tuple_values(self):
    """
    We can use tuple values in ``values``.
    """
    Band = self.Band

    new_popularity = self.band.popularity + 1000
    new_name = "Rustaceans"

    Band.insert(
        Band(
            id=self.band.id,
            name=new_name,
            popularity=new_popularity,
        )
    ).on_conflict(
        action="DO UPDATE",
        targets=[Band.id],
        values=[
            (Band.name, new_name),
            (Band.popularity, new_popularity + 2000),
        ],
    ).run_sync()

    self.assertListEqual(
        Band.select().run_sync(),
        [
            {
                "id": self.band.id,
                "name": new_name,
                "popularity": new_popularity + 2000,
            }
        ],
    )

def test_do_update_tuple_values_string_column_name(self):
    """
    We can use string column name in tuple values.
    """
    Band = self.Band

    new_popularity = self.band.popularity + 1000
    new_name = "Rustaceans"

    Band.insert(
        Band(
            id=self.band.id,
            name=new_name,
            popularity=new_popularity,
        )
    ).on_conflict(
        action="DO UPDATE",
        targets=[Band.id],
        values=[
            ("name", new_name),
            ("popularity", new_popularity + 2000),
        ],
    ).run_sync()

    self.assertListEqual(
        Band.select().run_sync(),
        [
            {
                "id": self.band.id,
                "name": new_name,
                "popularity": new_popularity + 2000,
            }
        ],
    )

@sinisaos sinisaos mentioned this pull request May 1, 2023
@dantownsend
Copy link
Member Author

Can I suggest two more tests that will cover tuple value as columns and tuple value as columns name in values ​​argument.

Good point - I'll add those.

I realised I made a small mistake. When specifying the target, you can specify multiple columns. If you specify column_a and column_b, it means it's targeting a composite unique constraint for column_a and column_b. I realised when I was writing a test - need to update the docs to reflect this.

@dantownsend
Copy link
Member Author

I added a where clause, but it's failing on cockroach.

I also need to move it. The first one is just useful for conditional constraints, but the second one is more likely to be used:

Screenshot 2023-05-02 at 23 17 41

@sinisaos
Copy link
Member

sinisaos commented May 3, 2023

I added a where clause, but it's failing on cockroach.

I also need to move it. The first one is just useful for conditional constraints, but the second one is more likely to be used:

I agree with you and think you should move self.where below self.actions to get correct querystring like this

if self.action:
    query += " {}"
    values.append(self.action_string)

if self.where:
    query += " WHERE {}"
    values.append(self.where.querystring)

because Cockroach ON CONFLICT with where clause syntax is like this (from docs)

You can also use a WHERE clause to apply the DO UPDATE SET expression conditionally:

INSERT INTO accounts (id, balance)
    VALUES (8, 700)
    ON CONFLICT (id)
    DO UPDATE SET balance = excluded.balance
    WHERE excluded.balance > accounts.balance;

After that you should also update the test for where

query = Band.insert(
    Band(name=self.band.name, popularity=new_popularity)
).on_conflict(
    target=Band.name,
    where=Band.popularity < new_popularity,
    action="DO UPDATE",
    values=[Band.popularity],
)

self.assertIn(
    f'WHERE "band"."popularity" < {new_popularity}',
    query.__str__(),
)

I hope that makes sense.

@dantownsend
Copy link
Member Author

@sinisaos Yes, you're right - thanks. I've made those changes.

Once the tests pass, I'll release this. There are some minor things missing, but I think this is fine for 99% of use cases.

@dantownsend dantownsend merged commit 63af9c4 into master May 3, 2023
@dantownsend
Copy link
Member Author

Follow up actions in the future:

  • Maybe expose some of this via save.
  • Allow conditional constraints (using where)
  • Use on_conflict='DO NOTHING' with Table.add_m2m

@sinisaos
Copy link
Member

sinisaos commented May 3, 2023

  • Use on_conflict='DO NOTHING' with Table.add_m2m

When you release the new Piccolo version, if you want, I can open a PR to do it as we discussed.

@dantownsend
Copy link
Member Author

When you release the new Piccolo version, if you want, I can open a PR to do it as we discussed.

@sinisaos Thanks. My only concern is we might need composite unique constraints to make it work. So the M2M table has two foreign keys, and they're unique together.

@sinisaos
Copy link
Member

sinisaos commented May 3, 2023

@dantownsend Ok. You're probably right and you know better about these things than I do, so I'll leave it up to you.

@dantownsend
Copy link
Member Author

dantownsend commented May 3, 2023

@sinisaos Composite unique constraints is definitely something we need to add soon. I just haven't figured out a great API for it, because most of Piccolo works with column references instead of strings.

Could be something like this (using strings):

class Band(Table, constraints=[UniqueTogether('name', 'manager')]):
    name = Varchar()
    manager = ForeignKey()

Or this:

class Band(Table):
    name = Varchar()
    manager = ForeignKey()
    constraints = [UniqueTogether('name', 'manager')])

This might work ... I'm not sure though:

class Band(Table):
    name = Varchar()
    manager = ForeignKey()
    constraints = [UniqueTogether(name, manager)])

I considered something like this, but it's hard to track for migrations:

class Band(Table):
    name = Varchar()
    manager = ForeignKey()

    @classmethod
    def constraints(cls):
        return [UniqueTogether(cls.name, cls.manager)]

If you have any ideas / preference let me know.

@sinisaos
Copy link
Member

sinisaos commented May 3, 2023

@dantownsend If I may ask, why not use this PR #582?

@dantownsend
Copy link
Member Author

@sinisaos I think that PR is good. Just considering all the options, as once it's in, the API is very hard to change.

Drapersniper added a commit to PyLav/PyLav that referenced this pull request May 3, 2023
@sinisaos
Copy link
Member

sinisaos commented May 3, 2023

@dantownsend These are all valid choices and I don't know how difficult it is to implement them, but I like this example the most.

class Band(Table):
    name = Varchar()
    manager = ForeignKey()
    constraints = [UniqueTogether('name', 'manager')])

@dantownsend
Copy link
Member Author

dantownsend commented May 3, 2023

@sinisaos Yeah, something like that probably makes more sense than passing the args in like class Band(Table, constraints=...).

Constraints and schema support are the main things to add next. Will try and go through that PR soon.

@sinisaos
Copy link
Member

sinisaos commented May 4, 2023

@dantownsend As usual you were right 😄. Unique constraints in joining_table will solve the problem of duplicates. I just tried adding a constraints with raw sql like this (just an example)

class TrackToPlaylist(Table, db=DB):
    tracks = ForeignKey(Track)
    playlists = ForeignKey(Playlist)

    async def constraints():
        constraint = "CREATE UNIQUE INDEX unique_tracks_playlists ON track_to_playlist(tracks, playlists)"
        await TrackToPlaylist.raw(constraint)

and then I just change the insert methods to use on_conflict in M2MAddRelated like this

async def _run(self):
    ...
    if unsaved:
            await rows[0].__class__.insert(*unsaved).on_conflict(
                action="DO NOTHING"
            ).run()
    ...
    return (
            await joining_table.insert(*joining_table_rows)
            .on_conflict(action="DO NOTHING")
            .run()
        )

and everything works and there are no duplicates in the joining_table. When you implement UniqueTogether it should also solve the problem of duplicates in m2m. This is just info and I hope this is useful for you.

@dantownsend
Copy link
Member Author

@sinisaos Thanks for checking that - we know it's the right approach then.

Do you want to create a PR, adding on_conflict to that insert query? It might be worth merging in straight away, as at least the user can manually add the composite unique constraint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an API for upserting
3 participants