Aura node architecture

The Aura node can be divided into four primary components: Ingester, Synchronizer, Backfiller, and API. These core workers operate continuously to ensure data consistency and availability.

  • Ingester and Backfiller: These components are responsible for populating the main database—RocksDB. The ingester handles real-time data ingestion, while the backfiller ensures that any missed data is retroactively inserted into RocksDB.

  • Synchronizer: This worker moves data from RocksDB to PostgreSQL, enabling more complex API queries that require relational data handling and advanced querying capabilities.

  • API: The API layer is responsible for serving data to external consumers, providing easy access to indexed information.

Additional Workers

In addition to the primary workers, there are six auxiliary workers available for specialized tasks:

  • Migrator

  • RawBackfiller

  • RawBackup

  • ColumnCopier

  • ColumnRemover

  • ForkDetector

Ingester

This module receives the latest updates from the Solana blockchain, indexes them, and stores them in the primary database—RocksDB.

The core function of this module is to retrieve updates from the data source (either Redis or TCP stream), parse the data, and save the relevant transaction and account information into RocksDB.

It can source the latest transactions and account updates via Redis or a direct TCP connection, depending on the configuration of the message_source parameter. While both options are supported, using Redis has a significant advantage: during Aura node updates, no data will be lost. Redis stores all updates during downtime, eliminating the need to re-parse the entire accounts snapshot to restore the current network state after an update.
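
A minimal sketch of what the message_source switch could look like in Rust; the document does not specify the actual config types, so the names below are assumptions:

```rust
// Hypothetical representation of the `message_source` parameter; the real
// Aura config types may differ.
enum MessageSource {
    Redis, // survives node downtime: updates are buffered in Redis
    Tcp,   // direct stream: updates during downtime are lost
}

fn parse_message_source(value: &str) -> Result<MessageSource, String> {
    match value.to_ascii_lowercase().as_str() {
        "redis" => Ok(MessageSource::Redis),
        "tcp" => Ok(MessageSource::Tcp),
        other => Err(format!("unknown message_source: {other}")),
    }
}
```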

Data processing

The Aura node processes both account updates and transactions. Account updates are essential for handling NFTs created with the MPL Metadata or MPL Core Solana programs. On the other hand, transactions are needed for processing compressed NFTs (cNFTs), as all cNFT data is stored within instruction arguments and events.

The system uses two key components for parsing and storing data into RocksDB:

  • AccountsProcessor: Responsible for parsing and saving account-related data.

  • BubblegumTxProcessor: Handles the processing of compressed NFT transactions, saving relevant data to RocksDB. Both of these workers exclusively save data to RocksDB.

Additionally, there is a BatchMintPersister, which processes batch minting operations through a separate queue from accounts and transactions. Its task is to download a JSON file containing the assets to be minted, reconstruct the local Merkle tree, and verify that the provided data in the JSON file is accurate. If everything checks out, it stores the assets from the file into RocksDB.
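
To make the verification step concrete, here is a hedged sketch of the root check. It uses a plain SHA-256 pairwise Merkle root for illustration; the real implementation works with Bubblegum's concurrent Merkle tree and its own hashing scheme, so treat the names and padding rule below as assumptions:

```rust
use sha2::{Digest, Sha256}; // requires the `sha2` crate

fn hash_pair(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut h = Sha256::new();
    h.update(a);
    h.update(b);
    h.finalize().into()
}

/// Fold leaf hashes into a Merkle root (illustrative padding rule: duplicate
/// the last node on odd-sized levels).
fn merkle_root(mut level: Vec<[u8; 32]>) -> [u8; 32] {
    assert!(!level.is_empty());
    while level.len() > 1 {
        if level.len() % 2 == 1 {
            level.push(*level.last().unwrap());
        }
        level = level.chunks(2).map(|p| hash_pair(&p[0], &p[1])).collect();
    }
    level[0]
}

/// The persister-style check: store the assets only if the tree rebuilt from
/// the downloaded JSON matches the expected root.
fn batch_mint_is_valid(leaves: Vec<[u8; 32]>, expected_root: [u8; 32]) -> bool {
    merkle_root(leaves) == expected_root
}
```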

Consistency checks

In addition to the main ingester task, which collects the latest NFT updates, there are several auxiliary workers responsible for identifying and filling gaps in the data. These workers include SignatureFetcher, SequenceConsistentGapFiller, and ForkCleaner.

  • SignatureFetcher: This worker fetches all transaction signatures for the Bubblegum program using the Solana RPC. As new transactions are processed (from sources like TCP or Redis), their signatures are stored in RocksDB. SignatureFetcher compares the signatures in RocksDB with those returned by the RPC. If discrepancies are found, it indicates a gap in the data. When a gap is detected, SignatureFetcher retrieves the missing transactions from the RPC and processes them to maintain data consistency.

  • SequenceConsistentGapFiller: This worker detects gaps in Merkle tree sequences. Each cNFT update action (e.g., transfer or update) increments the tree sequence, and these sequences are tracked in RocksDB for each tree. If a sequence is missing (e.g., seq n+1 is skipped), the worker signals the need to reprocess blocks in the range where the gap occurred (i.e., from seq n to seq n+2). It pushes the affected block numbers to the force_reingestable_slots column family. Another worker then iterates over this column family, downloading and parsing the required blocks to fill the sequence gap. A sketch of the gap check is shown after this list.

  • ForkCleaner: This worker periodically checks the LeafSignature column family in RocksDB, which stores signatures for all processed transactions. ForkCleaner looks for signatures that exist in forks, typically indicated by identical signatures with different slots and Merkle tree sequences. Upon detecting a fork, ForkCleaner removes entries from CLItems that correspond to forked sequences and deletes those sequences from the TreeSeqIdx column family. This allows SequenceConsistentGapFiller to identify the missing sequences and trigger the reprocessing of affected transactions.
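
The gap check itself reduces to scanning consecutive sequence numbers. A minimal sketch, assuming the sequences for one tree are read from TreeSeqIdx in ascending order (function names are illustrative, not the Aura API):

```rust
/// Returns the (from, to) ranges that bracket each gap, mirroring the
/// "reprocess from seq n to seq n+2" logic described above.
fn find_seq_gaps(sorted_seqs: &[u64]) -> Vec<(u64, u64)> {
    sorted_seqs
        .windows(2)
        .filter(|w| w[1] > w[0] + 1)
        .map(|w| (w[0], w[1]))
        .collect()
}

fn main() {
    // seq 4 is missing: blocks between seq 3 and seq 5 must be reprocessed
    assert_eq!(find_seq_gaps(&[1, 2, 3, 5]), vec![(3, 5)]);
}
```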

Backfiller

The Backfiller is a background job that performs different tasks depending on its mode. Some of these are one-time jobs, while others run continuously.

It has four modes of operation. Below is a brief description of each of them.
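
The four modes, expressed as an illustrative Rust enum (the actual config representation in Aura may differ):

```rust
enum BackfillerMode {
    IngestDirectly,   // one-time: collect slots, fetch blocks, ingest directly
    Persist,          // one-time: collect slots and persist raw blocks only
    IngestPersisted,  // one-time: parse raw blocks already stored in RocksDB
    PersistAndIngest, // continuous: collect, persist, and ingest new blocks
}
```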

Ingest Directly

This is a one-time job.

The consumer is the DirectBlockParser, which is a struct with a Bubblegum transactions parser.

The producer is the BackfillSource; the inner object can be either a BigTable client or an RPC client.

It launches the SlotsCollector with start-from and parse-until parameters to collect slots (u64 numbers) for a pubkey.

The SlotsCollector saves slots to the BubblegumSlots Rocks CF.

Then the TransactionsParser is launched. It uses the BubblegumSlotGetter to get slots to process from the BubblegumSlots CF.

The block producer here is the BackfillSource.

It processes blocks, saves transaction results, and then drops the processed slot numbers from the Rocks BubblegumSlots CF, adding them to the IngestableSlots CF.

It doesn’t save any parameters and doesn’t save raw blocks.

Persist

This is a one-time job.

The consumer is RocksDB.

The producer is the BackfillSource (either BigTable or RPC).

The slots to start from and parse until are taken from the config.

The SlotsCollector collects slots and saves them to the BubblegumSlots Rocks CF.

The TransactionsParser is launched to fetch the block by slot number and persist it to the Rocks RawBlock CF.

Once a block is persisted, its number is dropped from BubblegumSlots and added to the IngestableSlots Rocks CF.

It doesn’t save any parameters.

Ingest Persisted

This is a one-time job.

The consumer is the DirectBlockParser, which is a struct with a Bubblegum transactions parser.

The producer is RocksDB.

At startup, the TransactionsParser determines the slot to start from: it either takes this value from the config or starts iterating from the beginning of the raw_blocks_cbor Rocks CF.

For the DirectBlockParser, the already_processed_slot function always returns false, so it will parse everything.

The block is extracted from the producer, RocksDB.

The consumer receives the block and processes it. More specifically, it parses transactions, calls get_ingest_transaction_results() to get TransactionResult, and saves it to the Rocks.

Once it has parsed all the blocks, it saves the maximum slot number to the LastFetchedSlot RocksDB parameter. This allows the backfiller to be restarted in PersistAndIngest mode, where it will start collecting new slots and blocks that are not yet in the DB.

Once it finishes, it does not perform any post-backfill jobs.

Persist And Ingest

This is a continuous job.

Three workers run in this mode: perpetual slot collection, perpetual slot processing (block fetching and saving), and perpetual block ingestion.

Perpetual Slot Collection

The consumer is RocksDB.

The producer for slots is the BackfillSource (BigTable or RPC).

It takes the parse_until slot from the RocksDB LastFetchedSlot parameter. If there is no value, it takes it from the config.

Slot numbers are saved to the BubblegumSlots Rocks CF.
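
A minimal sketch of this resume rule, assuming a helper that has already read the persisted parameter (names are illustrative):

```rust
/// Prefer the LastFetchedSlot value persisted in RocksDB; fall back to the
/// configured slot if the parameter has never been written.
fn resolve_parse_until(last_fetched_slot: Option<u64>, config_parse_until: u64) -> u64 {
    last_fetched_slot.unwrap_or(config_parse_until)
}
```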

Perpetual Slot Processing

The consumer is RocksDB.

The producer is the BackfillSource (BigTable or RPC).

From the BubblegumSlots Rocks CF, slots are extracted, and then the block is downloaded with the help of the BackfillSource.

Once the block is downloaded and saved, the slot is dropped from the BubblegumSlots CF and added to the IngestableSlots CF so the next worker can parse it.

Perpetual Block Ingestion

The consumer is the DirectBlockParser, which is a struct with a Bubblegum transactions parser.

The producer is RocksDB.

The IngestableSlotGetter returns slots from the IngestableSlots CF, then blocks are extracted from the Rocks.

Once a block is received, it’s parsed, and the slot is dropped from the IngestableSlots CF.
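
Put together, the ingestion worker is a queue-drain loop. A hedged sketch with stand-in traits (the real Aura interfaces differ):

```rust
trait SlotQueue {
    fn next_slot(&self) -> Option<u64>;
    fn remove(&self, slot: u64);
}

trait BlockStore {
    fn raw_block(&self, slot: u64) -> Option<Vec<u8>>;
}

/// Pull a slot from the IngestableSlots queue, load the persisted raw block,
/// parse it, then drop the slot from the queue.
fn ingest_loop(queue: &impl SlotQueue, store: &impl BlockStore, parse: impl Fn(&[u8])) {
    while let Some(slot) = queue.next_slot() {
        if let Some(block) = store.raw_block(slot) {
            parse(&block);
        }
        queue.remove(slot);
    }
}
```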

Json processor

This worker is responsible for downloading JSON files. During database synchronization, the Synchronizer assigns tasks to download any missing JSONs. The JSON Processor handles these tasks by retrieving them from PostgreSQL and then storing the downloaded JSON files into RocksDB.
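
A hedged sketch of that loop; the task schema, locking, and storage calls below are assumptions based on the description, not the actual Aura code (uses the reqwest crate with the blocking feature):

```rust
struct Task {
    metadata_url: String,
}

trait TaskSource {
    fn lock_pending(&self, limit: usize) -> Vec<Task>; // tasks from PostgreSQL
    fn mark_done(&self, task: &Task, success: bool);
}

trait OffchainStore {
    fn put_metadata(&self, url: &str, json: &str); // OffChainData CF in RocksDB
}

fn process_json_tasks(tasks: &impl TaskSource, store: &impl OffchainStore) {
    for task in tasks.lock_pending(100) {
        // blocking download for brevity; the real worker is likely async
        match reqwest::blocking::get(task.metadata_url.as_str()).and_then(|r| r.text()) {
            Ok(body) => {
                store.put_metadata(&task.metadata_url, &body);
                tasks.mark_done(&task, true);
            }
            Err(_) => tasks.mark_done(&task, false),
        }
    }
}
```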

Backup worker

Coming soon...

GRPC client

Coming soon...

Synchronizer

Since the Aura node uses two databases—RocksDB and PostgreSQL—a tool is needed to ensure data consistency between them. For this, the AssetsUpdateIdx column family in RocksDB stores all asset update indexes. These indexes consist of the sequence number, slot, and account public key. The sequence is an internal counter tracking updates and is unrelated to the Merkle tree sequence. Every update that the ingester processes is saved to this column family.

On the PostgreSQL side, the same index is stored to track the latest synchronized update.
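
A sketch of the index key layout, assuming big-endian encoding so keys sort chronologically (the exact field widths and order in Aura are not specified here):

```rust
/// Build an AssetsUpdateIdx-style key: sequence, then slot, then pubkey.
/// Big-endian integers make lexicographic key order match numeric order.
fn asset_update_idx_key(seq: u64, slot: u64, pubkey: &[u8; 32]) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + 8 + 32);
    key.extend_from_slice(&seq.to_be_bytes());
    key.extend_from_slice(&slot.to_be_bytes());
    key.extend_from_slice(pubkey);
    key
}
```

With a layout like this, the Synchronizer can resume by iterating RocksDB keys strictly greater than the last key recorded in PostgreSQL.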

API

Coming soon...

Migrator

A legacy utility for migrating JSONs and tasks from one database to another.

Raw backfiller

This is a separate tool that performs all the functions of the Backfiller from the Ingester, except for direct ingestion. It shares the same codebase as the Ingester.

The tool is primarily used for downloading large amounts of raw blocks or parsing blocks that have already been downloaded. A separate binary was created to allow these processes to run independently. It’s typically necessary to use this tool when setting up a new node.

In the scripts/ directory, you’ll find two bash scripts to execute these processes:

  • run-ingest-persisted (for parsing already downloaded blocks)

  • run-slots-persisting (for downloading and saving raw blocks)

Raw backup

This tool creates backups of raw blocks and JSONs. It works by iterating through all blocks and JSONs in the source RocksDB and copying them to the target RocksDB. This is particularly useful when you want to create a backup of raw, non-indexed data. The backup can later be used by the Aura node itself or by other indexers that utilize different data structures.
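
A minimal copy loop using the rust-rocksdb crate (CF names and error handling are simplified; treat this as a sketch, not the tool's actual code):

```rust
use rocksdb::{IteratorMode, DB};

/// Copy every key/value pair of one column family from a source DB to a
/// target DB.
fn backup_cf(source: &DB, target: &DB, cf_name: &str) -> Result<(), rocksdb::Error> {
    let src_cf = source.cf_handle(cf_name).expect("cf missing in source");
    let dst_cf = target.cf_handle(cf_name).expect("cf missing in target");
    for item in source.iterator_cf(src_cf, IteratorMode::Start) {
        let (key, value) = item?;
        target.put_cf(dst_cf, key, value)?;
    }
    Ok(())
}
```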

Column copier

This tool copies column data from one RocksDB to another. The source database is opened in secondary mode. It's primarily a development tool, useful for debugging purposes.
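
Opening the source in secondary mode with rust-rocksdb could look like this (paths and CF names are placeholders):

```rust
use rocksdb::{Options, DB};

/// Open the source DB as a read-only secondary instance so the copier can
/// read while the primary process keeps writing.
fn open_source_secondary() -> Result<DB, rocksdb::Error> {
    let opts = Options::default();
    let db = DB::open_cf_as_secondary(
        &opts,
        "/path/to/primary-db",
        "/tmp/secondary-copy",
        ["raw_blocks_cbor"],
    )?;
    db.try_catch_up_with_primary()?; // replay the primary's latest WAL entries
    Ok(db)
}
```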

Column remover

As the name suggests, this tool is used to drop specific columns from RocksDB. Like the Column Copier, it's mainly a development tool, helpful for bug fixing and debugging various cases.

Fork detector

This binary is designed to detect transactions that were part of a fork, particularly identifying cNFTs that were updated in these forked transactions.

The script was necessary because the previous fork cleaner could incorrectly handle data removal when a fork occurred. Specifically, if the same asset is updated in multiple blocks (one of which is forked) and those blocks have different sequences, the cleaner doesn't properly resolve the discrepancy. It may remove one sequence but leave the other, which can lead to problems. If the sequence from the forked block (which may be higher) is dropped, the tool won’t backfill the lower sequence that was accepted by the majority of validators.

It's important to run this binary with the indexer turned off.

Once a fork is detected, the binary removes the corresponding sequences. Afterward, when the indexer is relaunched, the SequenceConsistentGapFiller identifies any gaps in the sequences and fills them appropriately.

The current version of the fork cleaner handles forks efficiently, so this tool doesn't need to be run continuously.
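
The detection criterion reduces to the same transaction signature being recorded under more than one slot. An illustrative check over LeafSignature-style entries (simplified types, not the actual schema):

```rust
use std::collections::HashMap;

/// Return signatures that appear with more than one distinct slot, i.e.
/// signatures touched by a fork.
fn forked_signatures<'a>(entries: &'a [(String, u64)]) -> Vec<&'a str> {
    let mut slots_by_sig: HashMap<&str, Vec<u64>> = HashMap::new();
    for (sig, slot) in entries {
        slots_by_sig.entry(sig.as_str()).or_default().push(*slot);
    }
    slots_by_sig
        .into_iter()
        .filter(|(_, slots)| slots.iter().any(|&s| s != slots[0]))
        .map(|(sig, _)| sig)
        .collect()
}
```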

Aura storage

As mentioned earlier, the Aura node utilizes two types of storage: RocksDB and PostgreSQL. RocksDB serves as the primary storage for all processed data, while PostgreSQL functions as an index storage solution for complex API queries, such as searchAsset.

Below is a description of the data stored in each database.

RocksDB

AssetStaticDetails

Stores static information about assets, such as immutable properties.

Key

  • asset pubkey

Fields

  • pubkey
  • specification_asset_class
  • royalty_target_type
  • created_at
  • edition_address

AssetDynamicDetails

Holds dynamic details of assets.

Key

  • asset pubkey

Fields

  • pubkey
  • is_compressible
  • is_compressed
  • is_frozen
  • supply
  • seq
  • is_burnt
  • was_decompressed
  • onchain_data
  • creators
  • royalty_amount
  • url
  • chain_mutability
  • lamports
  • executable
  • metadata_owner
  • raw_name
  • mpl_core_plugins
  • mpl_core_unknown_plugins
  • rent_epoch
  • num_minted
  • current_size
  • plugins_json_version
  • mpl_core_external_plugins
  • mpl_core_unknown_external_plugins

MetadataMintMap

Stores a mapping between metadata accounts and mint accounts so the metadata key does not have to be derived each time (see the sketch after this entry's fields).

Key

  • metadata pubkey

Fields

  • pubkey
  • mint_key
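
Why this map helps: the metadata account is a PDA derived via find_program_address, a bump-seed loop that is cheap but wasteful to repeat on every update. A sketch of the derivation, assuming the solana_sdk crate (the program id below is the well-known Token Metadata program):

```rust
use solana_sdk::pubkey::Pubkey;
use std::str::FromStr;

/// Derive the Token Metadata PDA for a mint.
fn metadata_pda(mint: &Pubkey) -> Pubkey {
    let program_id =
        Pubkey::from_str("metaqbxxUerdq28cj1RbAWkYQm3ybzjb6a8bt518x1s").unwrap();
    let (pda, _bump) = Pubkey::find_program_address(
        &[b"metadata", program_id.as_ref(), mint.as_ref()],
        &program_id,
    );
    pda
}
```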

AssetAuthority

Stores data related to the authority or control over the asset.

Key

  • asset pubkey

Fields

  • pubkey
  • authority
  • slot_updated
  • write_version

AssetOwner

Contains data about the current owner of each asset.

Key

  • asset pubkey

Fields

  • pubkey
  • owner
  • delegate
  • owner_type
  • owner_delegate_seq

AssetLeaf

Stores leaf data related to assets as part of a Merkle tree structure.

Key

  • asset pubkey

Fields

  • pubkey
  • tree_id
  • leaf
  • nonce
  • data_hash
  • creator_hash
  • leaf_seq
  • slot_updated

AssetCollection

Contains collection-level data for assets that belong to specific collections.

Key

  • asset pubkey

Fields

  • pubkey
  • collection
  • is_collection_verified
  • authority

OffChainData

Stores off-chain data associated with the assets, such as metadata (JSON files).

Key

  • url

Fields

  • url
  • metadata

ClItem

Holds CLItem data emitted by the Account Compression program during instruction execution.

Key

  • Merkle tree node id + tree pubkey

Fields

  • cli_node_idx
  • cli_tree_key
  • cli_leaf_idx
  • cli_seq
  • cli_level
  • cli_hash
  • slot_updated

ClLeaf

Stores leaf nodes from the Merkle tree.

Key

  • Merkle tree node id + tree pubkey

Fields

  • cli_leaf_idx
  • cli_tree_key
  • cli_node_idx

BubblegumSlots

Stores slot numbers with transactions related to the Bubblegum program.

Key

  • slot number

IngestableSlots

Stores slot numbers that need to be processed. This column family is populated by the backfiller when it runs in either IngestPersisted or PersistAndIngest mode.

Key

  • slot number

ForceReingestableSlots

Stores slot numbers that must be re-parsed because of a gap in the data. This column family is used only if the sequence_consistent_checker is active. If it finds a gap in a tree sequence, it writes the slots that have to be re-parsed to this column family. The slot_force_persister then iterates over these slots, downloads the blocks, and parses them.

Key

  • slot number

RawBlock

Stores raw block data in CBOR format.

Key

  • block number

Fields

  • data

AssetsUpdateIdx

Stores the index of updated assets. This column family is used by the synchronizer to keep RocksDB and PostgreSQL in sync.

Key

  • sequence + slot + pubkey

SlotAssetIdx

Maps slots to asset updates.

Key

  • slot + pubkey

TreeSeqIdx

Stores the sequence index for Merkle trees. Every sequence update is saved here.

Key

  • tree pubkey + sequence

Fields

  • slot

TreesGaps

Stores pubkeys of trees that have gaps in their sequences.

Key

  • tree pubkey

TokenMetadataEdition

Stores either Edition or MasterEdition information.

Key

  • asset pubkey

Fields

Edition

  • key
  • parent
  • edition
  • write_version

MasterEdition

  • key
  • supply
  • max_supply
  • write_version

TokenAccount

Contains data about token accounts.

Key

  • asset pubkey

Fields

  • pubkey
  • mint
  • delegate
  • owner
  • frozen
  • delegated_amount
  • slot_updated
  • amount
  • write_version

TokenAccountOwnerIdx

Stores a boolean flag indicating whether a wallet's token balance is zero.

Key

  • owner wallet + token account pubkey

Fields

  • is_zero_balance
  • write_version

TokenAccountMintOwnerIdx

Stores a boolean flag indicating whether a wallet's token balance is zero. Unlike TokenAccountOwnerIdx, the data is also sorted by mint.

Key

  • mint + owner + token account

Fields

  • is_zero_balance
  • write_version

AssetSignature

Stores transaction signatures for compressed assets.

Key

  • key + leaf id + sequence

Fields

  • transaction signature
  • instruction name
  • slot

BatchMintToVerify

A queue for batch mint operations to process.

Key

  • file hash

Fields

  • file_hash
  • url
  • created_at_slot
  • signature
  • download_attempts
  • persisting_state
  • staker
  • collection_mint

FailedBatchMint

Stores batch mints that did not pass verification.

Key

  • status + file hash

Fields

  • status
  • file_hash
  • url
  • created_at_slot
  • signature
  • download_attempts
  • staker

BatchMintWithStaker

Stores downloaded batch mint information.

Key

  • file hash

Fields

  • batch_mint
    • tree_id
    • batch_mints
    • raw_metadata_map
    • max_depth
    • max_buffer_size
  • staker

MigrationVersions

Stores RocksDB migration version.

Key

  • version number

TokenPrice

Stores token prices in USD, keyed by token symbol.

Key

  • token symbol

Fields

  • price

AssetPreviews

Represents information about asset previews stored on the Storage service.

Key

  • asset's url hash

Fields

  • size
  • failed

UrlToDownload

RocksDB column family used as a queue of asset URLs to be sent to the Storage service, where they are downloaded and saved as previews.

Key

  • url

Fields

  • timestamp
  • download_attempts

ScheduledJob

Represents information about a background job, which can be either a one-time job or a scheduled job launched recurrently at a given interval.

Key

  • job id

Fields

  • job_id
  • run_interval_sec
  • last_run_epoch_time
  • last_run_status
  • state

Inscription

Stores information about token inscriptions.

Key

  • asset pubkey

Fields

  • authority
  • root
  • content_type
  • encoding
  • inscription_data_account
  • order
  • size
  • validation_hash
  • write_version

InscriptionData

Stores raw inscription data.

Key

  • asset pubkey

Fields

  • pubkey
  • data
  • write_version

LeafSignature

This column family contains sequence updates for each leaf in the tree.

Key

  • set of Signature+TreeId+leafId

Fields

  • data: hash map with slots and sequences

PostgreSQL

Below is a short description of the tables the Aura node has in PostgreSQL.

Asset creators

Stores asset creators.

  • pubkey
  • creator
  • verified
  • slot_updated

Asset authorities

Stores asset authorities.

  • pubkey
  • authority
  • slot_updated

Asset

Stores all the asset info.

  • pubkey
  • specification_version
  • specification_asset_class
  • royalty_target_type
  • royalty_amount
  • slot_created
  • owner
  • owner_type
  • delegate
  • collection
  • is_collection_verified
  • is_burnt
  • is_compressible
  • is_compressed
  • is_frozen
  • supply
  • metadata_url_id
  • slot_updated
  • authority_fk

Batch mints

Batch mints queue.

  • file_name
  • state
  • error
  • url
  • tx_reward
  • created_at

Last synced key

Stores the last synced asset update key. Used by the synchronizer for database synchronization.

  • id
  • last_synced_asset_update_key

Tasks

Stores tasks for the JSON downloader to process NFT metadata.

  • metadata_url
  • status
  • locked_until
  • attempts
  • max_attempts
  • error
  • id