MetadataHandler

As part of forming a query execution plan that includes a federated data source, Athena needs a way to obtain key metadata from your source. More precisely, Athena needs a way to obtain:

List of schemas (aka databases).
List of tables in a given schema.
Table definitions (e.g. column names, column types).
Partitions that should be queried for a given schema, table, and predicate.
How to split up and parallelize reads of a partition.

The Athena Query Federation SDK provides MetadataHandler, an abstract class that you can extend to implement the above functionality via the following functions:

doListSchemas(...) - lists available schemas.
doListTables(...) - lists available tables in a schema.
doGetTable(...) - gets the definition of a table.
doGetTableLayout(...) - provides partition information and optionally performs partition pruning.
doGetSplits(...) - tells Athena how it can split up and parallelize reads of a partition.
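To make the division of labor between these five functions concrete, here is a minimal, self-contained sketch backed by an in-memory catalog. It does not extend the SDK's MetadataHandler or use its request and response types: the class name ToyMetadataSketch, the addTable helper, and every parameter and return type below are assumptions chosen for illustration. Only the five function names come from the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in, NOT the SDK's MetadataHandler. Signatures are
// simplified assumptions; the real functions take SDK request objects.
final class ToyMetadataSketch {
    // schema -> table -> (column name -> column type)
    private final Map<String, Map<String, Map<String, String>>> catalog = new LinkedHashMap<>();
    // "schema.table" -> partition identifiers
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();

    // Illustration-only helper that registers a table and its partitions.
    public void addTable(String schema, String table,
                         Map<String, String> columns, List<String> parts) {
        catalog.computeIfAbsent(schema, s -> new LinkedHashMap<>())
               .put(table, new LinkedHashMap<>(columns));
        partitions.put(schema + "." + table, new ArrayList<>(parts));
    }

    // Lists available schemas.
    public List<String> doListSchemas() {
        return new ArrayList<>(catalog.keySet());
    }

    // Lists available tables in a schema (empty if the schema is unknown).
    public List<String> doListTables(String schema) {
        Map<String, Map<String, String>> tables =
                catalog.getOrDefault(schema, Collections.emptyMap());
        return new ArrayList<>(tables.keySet());
    }

    // Gets the definition of a table as column name -> column type.
    public Map<String, String> doGetTable(String schema, String table) {
        Map<String, Map<String, String>> tables =
                catalog.getOrDefault(schema, Collections.emptyMap());
        Map<String, String> columns = tables.get(table);
        if (columns == null) {
            throw new IllegalArgumentException("Unknown table: " + schema + "." + table);
        }
        return columns;
    }

    // Returns the partitions to query; the filter models partition pruning.
    public List<String> doGetTableLayout(String schema, String table,
                                         Predicate<String> partitionFilter) {
        List<String> kept = new ArrayList<>();
        for (String p : partitions.getOrDefault(schema + "." + table,
                                                Collections.emptyList())) {
            if (partitionFilter.test(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    // Breaks one partition into independently readable splits.
    public List<String> doGetSplits(String partition, int splitsPerPartition) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < splitsPerPartition; i++) {
            splits.add(partition + "#" + i);
        }
        return splits;
    }

    public static void main(String[] args) {
        ToyMetadataSketch handler = new ToyMetadataSketch();
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("order_id", "int");
        columns.put("month", "string");
        handler.addTable("sales", "orders", columns,
                Arrays.asList("2024-01", "2024-02", "2024-03"));

        System.out.println(handler.doListSchemas());
        System.out.println(handler.doListTables("sales"));
        System.out.println(handler.doGetTable("sales", "orders"));
        // Prune to the partitions a predicate such as month >= '2024-02' would need.
        List<String> layout = handler.doGetTableLayout("sales", "orders",
                p -> p.compareTo("2024-02") >= 0);
        for (String partition : layout) {
            System.out.println(handler.doGetSplits(partition, 2));
        }
    }
}
```

The calls in main follow the order of the list above: schemas, then tables, then a table definition, then the pruned partitions, then the splits for each partition that survived pruning.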

Also provided is GlueMetadataHandler, a partial implementation of these methods that uses the AWS Glue Data Catalog for metadata. It can jump-start your MetadataHandler if your source lacks its own metadata store. The athena-redis connector is an example: it uses the AWS Glue Data Catalog because Redis lacks a traditional metastore that could tell Athena how to interpret your Redis keys, prefixes, and zsets as tables and columns.

Advanced Usage

In most cases you will deploy a MetadataHandler and RecordHandler together in the same Lambda function by using a CompositeHandler. There are, however, some cases where you may want to deploy them independently. Athena supports this, and it is most often done for one of the following reasons:

You have a centralized source of metadata for all your data sources (e.g. a single source of truth) which is in its own VPC.
Your data sources themselves are in separate VPCs that do not contain the metadata source.
Your metadata operations and data reads require different scale or languages in their Lambda functions.
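The difference between the combined and independent deployments can be pictured as a routing decision. The sketch below is a toy model under stated assumptions, not the SDK's CompositeHandler: the class ToyCompositeSketch, its Request type, the Kind enum, and the use of plain functions as handlers are all hypothetical. It only illustrates the idea that a single function can dispatch both kinds of request, while a function deployed with one handler serves one kind only.

```java
import java.util.function.Function;

// Hypothetical illustration only; this is NOT the SDK's CompositeHandler.
final class ToyCompositeSketch {
    // The two broad kinds of work: metadata operations and data reads.
    enum Kind { METADATA, RECORDS }

    // Simplified request: a kind plus an opaque payload (an assumption).
    static final class Request {
        final Kind kind;
        final String payload;

        public Request(Kind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    private final Function<String, String> metadataHandler;
    private final Function<String, String> recordHandler;

    // Pass null for a handler that is deployed in a different function.
    public ToyCompositeSketch(Function<String, String> metadataHandler,
                              Function<String, String> recordHandler) {
        this.metadataHandler = metadataHandler;
        this.recordHandler = recordHandler;
    }

    // Routes each request to the handler responsible for its kind.
    public String handle(Request request) {
        Function<String, String> target;
        switch (request.kind) {
            case METADATA:
                target = metadataHandler;
                break;
            case RECORDS:
                target = recordHandler;
                break;
            default:
                throw new IllegalArgumentException("Unknown kind: " + request.kind);
        }
        if (target == null) {
            throw new UnsupportedOperationException(
                    request.kind + " requests are served by a separate function");
        }
        return target.apply(request.payload);
    }

    public static void main(String[] args) {
        // Combined deployment: one function serves both kinds of request.
        ToyCompositeSketch combined = new ToyCompositeSketch(
                payload -> "metadata for " + payload,
                payload -> "rows for " + payload);
        System.out.println(combined.handle(new Request(Kind.METADATA, "sales.orders")));
        System.out.println(combined.handle(new Request(Kind.RECORDS, "sales.orders")));

        // Independent deployment: this function only answers metadata requests.
        ToyCompositeSketch metadataOnly = new ToyCompositeSketch(
                payload -> "metadata for " + payload, null);
        System.out.println(metadataOnly.handle(new Request(Kind.METADATA, "sales.orders")));
    }
}
```

In the independent case each function can live in its own VPC and be scaled separately, which is what the reasons listed above rely on.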
