Bubblegum Update Metadata Version 2 #135

Closed

Conversation

Contributor

@danenbm commented Oct 23, 2023

Due to some issues with this approach, we are now most likely going with Version 1.

Summary

This PR uses a global update_metadata_seq number for when metadata is updated by either mint_v1 or update_metadata, and uses a download_metadata_seq for the background task that prevents older-downloaded metadata from overwriting newer-downloaded metadata. I pulled out common code from mint_v1 and update_metadata and moved it to db.rs.

The slot_updated checks were removed from most of the Bubblegum program transformers. However, each Bubblegum program transformer makes a call to asset_should_be_updated, which checks whether the asset is being updated by Token Metadata account-based program transformers, or if the global update_metadata_seq is higher than the current sequence number, indicating that an update_metadata occurred later than the current instruction.
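For illustration, a minimal sketch of that gating check, assuming the Sea-ORM `asset` entity from this repo and the new `update_metadata_seq` column; the names and exact conditions are assumptions, not the PR's code:

```rust
use digital_asset_types::dao::asset; // Sea-ORM entity (path assumed)
use sea_orm::{ConnectionTrait, DbErr, EntityTrait};

// Hypothetical sketch of the gating check described above.
async fn asset_should_be_updated<T>(
    txn: &T,
    id: Vec<u8>,
    incoming_seq: i64,
) -> Result<bool, DbErr>
where
    T: ConnectionTrait,
{
    if let Some(asset) = asset::Entity::find_by_id(id).one(txn).await? {
        // A missing seq is read here as "asset was decompressed and is now
        // indexed by the account-based Token Metadata transformers" (assumption).
        if asset.seq.is_none() {
            return Ok(false);
        }
        // A higher global update_metadata_seq means a later update_metadata
        // already wrote these fields, so this instruction is stale.
        if asset.update_metadata_seq.unwrap_or(0) > incoming_seq {
            return Ok(false);
        }
    }
    Ok(true)
}
```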

Comparison of Version 1 and Version 2

With this Version 2 PR, both mint_v1 and update_metadata update everything, so it uses fewer sequence numbers than Version 1 to protect the fields that can be changed by either mint_v1 or update_metadata.

It is more future-proof than Version 1, because if we added more fields that could be changed by update_metadata, they would already be protected by the Version 2 strategy and thus would not require any indexing changes.

Numerical comparison of Version 1 and Version 2

| | Version 1 | Version 2 |
| --- | --- | --- |
| New seq numbers to check/update | 4 | 2 |
| New queries to asset to check before updating | 1-2 depending on ix | 1 |
| Total upserts in mintV1 (today main has 10) | 12 | 11 |
| Total upserts in update_metadata | 6 | 11 |

Note that #142 uses a transaction to combine upserts for the asset table.
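For reference, a minimal sketch of that pattern, assuming Sea-ORM's `TransactionTrait` (the individual upserts are elided):

```rust
use sea_orm::{DatabaseConnection, DbErr, TransactionTrait};

// Hypothetical shape of the #142 approach: run the related asset-table
// upserts inside a single database transaction so they land atomically.
async fn upsert_asset_atomically(db: &DatabaseConnection) -> Result<(), DbErr> {
    let txn = db.begin().await?;
    // ... the individual asset upserts, each executed against &txn ...
    txn.commit().await
}
```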

Sea-ORM autogeneration

As mentioned offline, there were some slight discrepancies between the existing generated Sea-ORM code and the migrations, so I updated the generated code to match the migrations.

Testing

New instruction

  • Ran a createTree, mintV1, verifyCreator, unverifyCreator, and updateMetadata, and observed after the updateMetadata that the name, symbol, and URI had been changed.
  • Deleted the database, then ran the exact same 5 transactions in reverse and observed the exact same final result in the database.
  • Again, ran a createTree, mintV1, verifyCreator, unverifyCreator, and updateMetadata, but this time one by one, and checked detailed database state after each instruction.
    • Verified the second creator was set to verified after verifyCreator, which matched the creator passed to the instruction.
    • Verified the correct creator was set to unverified after unverifyCreator, which matched the creator passed to the instruction.
    • Verified that the name, symbol, URI, and creator array changed after updateMetadata, to the values specified in the updateArgs sent into the instruction.

Regression testing

Note: this testing was done with the code in #142, as it stacks on top of this PR.

  • Tested following instruction sequences from Bubblegum and verified database was in correct state afterwards:
    • create_tree, mint_v1, transfer, burn
    • create_tree, mint_v1, redeem, decompress_v1
    • create_tree, mint_v1, redeem, cancel_redeem, redeem, decompress_v1
    • create_tree, mint_v1, transfer, transfer
    • create_tree, mint_v1, delegate, transfer
    • create_tree, mint_v1, verify_creator
    • create_tree, mint_v1, verify_collection
    • create_tree, mint_v1, verify_collection, unverify_collection
    • create_tree, mint_v1, set_and_verify_collection
    • create_tree, mint_to_collection_v1, unverify_collection
  • Deleted and rebuilt the database, ran the instruction sequences from Bubblegum in reverse order (i.e. burn, transfer, mint_v1, create_tree), and verified the database ended up in the same state as when they were run forwards.
  • Also ran one of the sequences in a random order with duplicated transactions, and still got the expected results.

@danenbm changed the title from Danenbm/update metadata parsing 2 to Bubblegum Update Metadata Version 2 on Oct 23, 2023
@danenbm marked this pull request as ready for review on October 23, 2023 11:09
@blockiosaurus left a comment

This looks good based on my very little understanding of DAS

@danenbm force-pushed the danenbm/update-metadata-parsing-2 branch from bc7c374 to d817795 on November 27, 2023 21:17
DatabaseBackend::Postgres,
"
ALTER TABLE asset_creators
RENAME COLUMN seq to verified_seq;
Collaborator

Have you considered what happens when you rename a column on a table with over 275M rows? That's the size of the mainnet creators table. I've never renamed a SQL column, but even adding indexes needs to be done carefully at this point. Renaming a column might not be feasible in production without spinning up a second table and copying the data – which is no fun.

Collaborator

I really don't think renaming a column has anything to do with the size of the table. In Postgres the names of columns are stored in the catalog, and this change would just be a single-row update in the catalog; it shouldn't require a whole table rewrite.

Collaborator

If you want to make sure to do this update without having to update all the readers at the same time, you could create a view that has the old name. But since this change only impacts ingesters, I think you can just shut down ingesters, run migrations, and start up again.
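For illustration, a hypothetical sketch of the view option in the same Sea-ORM migration style used above; the view name is invented, and readers would have to query the view rather than the table:

```rust
use sea_orm::{ConnectionTrait, DatabaseBackend, DbErr, Statement};
use sea_orm_migration::prelude::*;

// Hypothetical fragment for a migration's `up()`: expose the old column
// name through a view so readers could migrate at their own pace.
async fn create_compat_view(manager: &SchemaManager<'_>) -> Result<(), DbErr> {
    manager
        .get_connection()
        .execute(Statement::from_string(
            DatabaseBackend::Postgres,
            "
            CREATE VIEW asset_creators_compat AS
            SELECT ac.*, ac.verified_seq AS seq
            FROM asset_creators ac;
            "
            .to_string(),
        ))
        .await?;
    Ok(())
}
```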

Contributor Author

Is changing the name OK then, given the way you all run migrations?

Collaborator

@linuskendall is right, it should be fine

// Upsert into `asset` table.

// Set base mint info.
let tree_id = bundle.keys.get(5).unwrap().0.to_vec();
Collaborator

This value should be parsed from blockbuster instead of via an array index

Contributor Author

@danenbm commented Dec 4, 2023

Will incorporate in #134 after the above is merged.
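For illustration, a sketch of the suggested change, assuming blockbuster's `BubblegumInstruction` exposes the parsed `ChangeLogEventV1` via `tree_update` (treat the exact field names as assumptions):

```rust
use blockbuster::programs::bubblegum::BubblegumInstruction;

// Hypothetical: derive the tree id from the change log event that
// blockbuster already parsed, instead of hard-coding an account index
// like `bundle.keys.get(5)`.
fn tree_id_from_parsed(parsing_result: &BubblegumInstruction) -> Option<Vec<u8>> {
    parsing_result
        .tree_update
        .as_ref()
        .map(|cl| cl.id.to_bytes().to_vec())
}
```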

.await
.map_err(|db_err| IngesterError::AssetIndexError(db_err.to_string()))?;
// Set base mint info.
let tree_id = bundle.keys.get(3).unwrap().0.to_vec();
Collaborator

Can this come from blockbuster instead?

Contributor Author

Yes, I can do that for this and the other similar case you pointed out.

Contributor Author

Will incorporate in #134 after the above is merged.

.to_owned(),
)
.build(DbBackend::Postgres);
query.sql = format!(
Collaborator

Why is the method called "unprotected" if this case exists?

Contributor Author

Misnomer. I guess it should have been called "partially protected" or something. The protection on the verified flag was needed for verify/unverify, but the positions, share amounts, etc. would be blanket-protected by update_metadata_seq.

where
T: ConnectionTrait + TransactionTrait,
{
if let Some(asset) = asset::Entity::find_by_id(id).one(txn).await? {
Collaborator

I don't think this approach works. It can lead to race conditions.

Say we go to seq 1 -> 2 with our update

  1. Fetch asset, seq says 1
  2. Determine it can be updated because 2 > 1
  3. Concurrent process upserts new data with seq 3
  4. We then revert from seq 3 -> 2

This function can act as a filter, but not as a true safety mechanism, and it should be clearly labelled as such. Therefore no method it is called from should be named "unprotected".

To give an example as a filter:
If the current seq = 5 and we want to update to 4, then we know we can eagerly discard the txn.

Collaborator

So this is possible to do since Postgres transactions can be ACID, but it needs some changes to the handling. My initial thoughts would be that we have to:

  1. Use transactions (this is already there in a separate PR)
  2. Use SELECT FOR UPDATE
  3. Use the serializable transaction isolation level (I'd have to think; maybe we don't need one quite as strict?)

This would ensure that step 4 would fail, because a concurrent process has updated a row locked by the SELECT FOR UPDATE clause.
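A minimal sketch of those three points, assuming a Sea-ORM version that provides `begin_with_config` and `lock_exclusive` (the latter renders as `SELECT ... FOR UPDATE`); the entity path and seq fields are the same assumptions as in the earlier sketch:

```rust
use digital_asset_types::dao::asset; // Sea-ORM entity (path assumed)
use sea_orm::{
    DatabaseConnection, DbErr, EntityTrait, IsolationLevel, QuerySelect, TransactionTrait,
};

// Hypothetical guarded update: the row lock plus serializable isolation
// makes the check-then-write atomic, so step 4 of the race above fails
// instead of silently reverting newer data.
async fn guarded_update(
    db: &DatabaseConnection,
    id: Vec<u8>,
    incoming_seq: i64,
) -> Result<(), DbErr> {
    let txn = db
        .begin_with_config(Some(IsolationLevel::Serializable), None)
        .await?;
    // Renders as SELECT ... FOR UPDATE: a concurrent ingester blocks here
    // until this transaction commits or rolls back.
    let current = asset::Entity::find_by_id(id)
        .lock_exclusive()
        .one(&txn)
        .await?;
    if current.and_then(|a| a.seq).unwrap_or(0) < incoming_seq {
        // ... perform the upserts against &txn ...
    }
    txn.commit().await
}
```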

Contributor Author

I agree the current approach is unsound with concurrent ingester processes. I forgot about this, as my Docker testing only runs one ingester. I felt there was something too easy about Version 2, but couldn't quite see it as I tested and thought about it, which is why I had kept Version 1 around.

The choices are:

  1. Go back to something closer to the Version 1 approach, or
  2. Look into further use of Postgres transactions to resolve it.

After offline discussion with @linuskendall and @NicolasPennie we are planning on doing something more like Version 1, but trying to incorporate transactions to combine some of the upserts in the style of #142. I just have to make sure I understand/implement the error handling correctly based on the comments from @linuskendall.

Contributor Author

Went back to the Version 1 approach in #134 and incorporated all learnings from this Version 2 PR.

Collaborator

@linuskendall left a comment

In principle it looks OK, but Nick's concern about the serialisability of the changes requires addressing. This can be addressed via a DB transaction and setting the appropriate isolation levels.

pub struct Model {
pub id: i64,
pub asset_id: Vec<u8>,
pub group_key: String,
pub group_value: Option<String>,
pub seq: Option<i64>,
pub slot_updated: Option<i64>,
pub verified: Option<bool>,
pub verified: bool,
Collaborator

Do we need a migration to set the default value for this? Since this used to be optional, we might have null values in the db.

Contributor Author

@danenbm commented Dec 2, 2023

If you look at m20230720_120101_add_asset_grouping_verified.rs, it was not nullable from that point. All I did for this was regenerate the Sea-ORM objects based on the existing migrations. Do we need a special migration to check whether people have it as Optional in their databases? My guess is that it wasn't ever nullable and the Sea-ORM code was somehow not updated?

@@ -22,7 +22,7 @@ pub struct Model {
pub seq: i64,
pub level: i64,
pub hash: Vec<u8>,
pub created_at: Option<DateTime>,
pub created_at: DateTime,
Collaborator

Same as above: does this need a migration, since it has changed from an Option?

Contributor Author

Yeah, I have the same response. If you look at m20230919_072154_cl_audits.rs, it's not nullable in the original migration that added the table.


@@ -110,24 +111,29 @@ impl BgTask for DownloadMetadataTask {
id: Unchanged(download_metadata.asset_data_id.clone()),
metadata: Set(body),
reindex: Set(Some(false)),
download_metadata_seq: Set(Some(download_metadata.seq)),
Collaborator

I am not sure about this addition. I can see that the download_metadata URI should be protected by a seq since that is on chain. But the offchain metadata can change at any time, so to me the system should always download the latest offchain metadata irrespective of the seq?

So as long as we know that the URI in the asset_data table is only ever updated to the latest, we shouldn't need to protect the downloaded metadata itself?

If the URI is updated, then indeed we must redownload the metadata, but I also don't see harm in scheduling additional metadata downloads in between. Not having an additional sequence number to check might be quite good also.

Adding this sequence number has limited value and causes extra DB queries/complexity.

Collaborator

If we want to limit the number of downloads, we could either:

a) keep track of URI changes and only trigger a redownload if the URI hasn't been downloaded yet, or
b) make the download metadata task smarter so it understands whether the offchain data has changed (i.e. do some kind of HEAD check on the HTTP headers to see if the data has changed).
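Option (b) could look like this minimal sketch, assuming the reqwest crate and a host that sends validators such as ETag; many metadata hosts won't, in which case the safe fallback is to re-download:

```rust
use reqwest::{header::ETAG, Client};

// Hypothetical HEAD-based change check: compare the current ETag with the
// one stored from the last successful download.
async fn offchain_changed(
    client: &Client,
    uri: &str,
    last_etag: Option<&str>,
) -> reqwest::Result<bool> {
    let resp = client.head(uri).send().await?;
    let etag = resp.headers().get(ETAG).and_then(|v| v.to_str().ok());
    Ok(match (etag, last_etag) {
        // Same validator on both sides: content very likely unchanged.
        (Some(new), Some(old)) => new != old,
        // No validator available: assume changed and re-download.
        _ => true,
    })
}
```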

Contributor Author

@danenbm commented Dec 2, 2023

> I can see that the download_metadata URI should be protected by a seq since that is on chain. But the offchain metadata can change at any time, so to me the system should always download the latest offchain metadata irrespective of the seq?

Even though the URI in the asset_data table is protected there and thus only ever updated to the latest URI, there is not an easy way to cancel all existing tasks for an asset, so I think an older task could overwrite newer downloaded data.

Example:

  1. mint_v1, has uri1, seq 1, kicks off task(uri1, seq=1).
  2. update_metadata, has uri2, seq 2, kicks off task(uri2, seq=2).
  3. Task2 finishes and sets asset_data.metadata to the data from uri2.
  4. Task1 finishes and incorrectly sets asset_data.metadata to the data from uri1.

However, using a download_metadata_seq, step 4 would be different: Task1 would finish and NOT update asset_data.metadata to the data from uri1.
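For illustration, the guard that makes step 4 a no-op can ride on the upsert itself, in the same raw-SQL style this codebase already uses for seq checks; the column and helper names here are assumptions:

```rust
use digital_asset_types::dao::asset_data; // Sea-ORM entity (path assumed)
use sea_orm::{
    sea_query::OnConflict, ConnectionTrait, DbBackend, DbErr, EntityTrait, QueryTrait,
};

// Hypothetical: the background task's upsert only lands if no newer
// download (higher download_metadata_seq) has already been recorded.
async fn finish_download_task<T: ConnectionTrait>(
    txn: &T,
    model: asset_data::ActiveModel,
) -> Result<(), DbErr> {
    let mut query = asset_data::Entity::insert(model)
        .on_conflict(
            OnConflict::column(asset_data::Column::Id)
                .update_columns([
                    asset_data::Column::Metadata,
                    asset_data::Column::Reindex,
                    asset_data::Column::DownloadMetadataSeq,
                ])
                .to_owned(),
        )
        .build(DbBackend::Postgres);
    // Older tasks (lower seq) fall through without writing anything.
    query.sql = format!(
        "{} WHERE excluded.download_metadata_seq >= asset_data.download_metadata_seq",
        query.sql
    );
    txn.execute(query).await?;
    Ok(())
}
```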

Contributor Author

@danenbm commented Dec 4, 2023

Due to some issues with this approach, we are now most likely going with Version 1.

Contributor Author

@danenbm commented Dec 5, 2023

Version 2 asset table checks can cause a race condition with concurrent ingester processes. Closing this PR in favor of Version 1.

@danenbm closed this on Dec 5, 2023