Doc: Add missing fields to metadata tables in Spark page #11897

Open
ebyhr wants to merge 1 commit into main
Conversation

@ebyhr (Contributor) commented Jan 1, 2025

Output of DESC against each metadata table in Spark, showing the fields that the Spark documentation page should list:

spark-sql (default)> DESC default.test.manifests;
content             	int
path                	string
length              	bigint
partition_spec_id   	int
added_snapshot_id   	bigint
added_data_files_count	int
existing_data_files_count	int
deleted_data_files_count	int
added_delete_files_count	int
existing_delete_files_count	int
deleted_delete_files_count	int
partition_summaries 	array<struct<contains_null:boolean,contains_nan:boolean,lower_bound:string,upper_bound:string>>
Time taken: 0.107 seconds, Fetched 12 row(s)
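
As a usage sketch (not from the PR, assuming the same default.test table as above), the counter fields shown in the manifests schema, including the delete-file counters, can be queried directly:

-- Illustrative query: per-manifest data and delete file counters
SELECT path,
       added_data_files_count,
       added_delete_files_count,
       existing_delete_files_count,
       deleted_delete_files_count
FROM default.test.manifests;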
spark-sql (default)> DESC default.test.position_deletes;
file_path           	string              	Path of a file in which a deleted row is stored
pos                 	bigint              	Ordinal position of a deleted row in the data file
row                 	struct<a:int,b:int> 	Deleted row values
partition           	struct<b:int>       	Partition that position delete row belongs to
spec_id             	int                 	Spec ID used to track the file containing a row
delete_file_path    	string              	Path of the file in which a row is stored
# Partition Information
# col_name          	data_type           	comment
partition.b         	int
Time taken: 0.031 seconds, Fetched 9 row(s)
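
A minimal sketch of reading this table (hypothetical query against the same default.test table): each row pairs a data file path with the ordinal of a deleted row, and `row` carries the deleted values when they are preserved.

-- Illustrative query: which row ordinals are deleted in which data files
SELECT file_path, pos, `row`
FROM default.test.position_deletes;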
spark-sql (default)> DESC default.test.all_data_files;
content             	int                 	Contents of the file: 0=data, 1=position deletes, 2=equality deletes
file_path           	string              	Location URI with FS scheme
file_format         	string              	File format name: avro, orc, or parquet
spec_id             	int                 	Partition spec ID
partition           	struct<b:int>       	Partition data tuple, schema based on the partition spec
record_count        	bigint              	Number of records in the file
file_size_in_bytes  	bigint              	Total file size in bytes
column_sizes        	map<int,bigint>     	Map of column id to total size on disk
value_counts        	map<int,bigint>     	Map of column id to total count, including null and NaN
null_value_counts   	map<int,bigint>     	Map of column id to null value count
nan_value_counts    	map<int,bigint>     	Map of column id to number of NaN values in the column
lower_bounds        	map<int,binary>     	Map of column id to lower bound
upper_bounds        	map<int,binary>     	Map of column id to upper bound
key_metadata        	binary              	Encryption key metadata blob
split_offsets       	array<bigint>       	Splittable offsets
equality_ids        	array<int>          	Equality comparison field IDs
sort_order_id       	int                 	Sort order ID
readable_metrics    	struct<a:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>,b:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>>	Column metrics in readable form
Time taken: 0.029 seconds, Fetched 18 row(s)
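
As an illustrative example (not part of the PR), the per-file statistics above support simple aggregations across all snapshots, e.g. file counts, row counts, and bytes per partition:

-- Illustrative aggregation over all_data_files
SELECT `partition`,
       COUNT(*)                AS file_count,
       SUM(record_count)       AS row_count,
       SUM(file_size_in_bytes) AS total_bytes
FROM default.test.all_data_files
GROUP BY `partition`;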
spark-sql (default)> DESC default.test.all_delete_files;
content             	int                 	Contents of the file: 0=data, 1=position deletes, 2=equality deletes
file_path           	string              	Location URI with FS scheme
file_format         	string              	File format name: avro, orc, or parquet
spec_id             	int                 	Partition spec ID
partition           	struct<b:int>       	Partition data tuple, schema based on the partition spec
record_count        	bigint              	Number of records in the file
file_size_in_bytes  	bigint              	Total file size in bytes
column_sizes        	map<int,bigint>     	Map of column id to total size on disk
value_counts        	map<int,bigint>     	Map of column id to total count, including null and NaN
null_value_counts   	map<int,bigint>     	Map of column id to null value count
nan_value_counts    	map<int,bigint>     	Map of column id to number of NaN values in the column
lower_bounds        	map<int,binary>     	Map of column id to lower bound
upper_bounds        	map<int,binary>     	Map of column id to upper bound
key_metadata        	binary              	Encryption key metadata blob
split_offsets       	array<bigint>       	Splittable offsets
equality_ids        	array<int>          	Equality comparison field IDs
sort_order_id       	int                 	Sort order ID
readable_metrics    	struct<a:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>,b:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>>	Column metrics in readable form
Time taken: 0.03 seconds, Fetched 18 row(s)
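
Since all_delete_files shares this schema, the content field (1=position deletes, 2=equality deletes) distinguishes the kinds of delete files; a sketch:

-- Illustrative breakdown of delete files by kind
SELECT content, COUNT(*) AS delete_file_count
FROM default.test.all_delete_files
GROUP BY content;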
spark-sql (default)> DESC default.test.all_manifests;
content             	int
path                	string
length              	bigint
partition_spec_id   	int
added_snapshot_id   	bigint
added_data_files_count	int
existing_data_files_count	int
deleted_data_files_count	int
added_delete_files_count	int
existing_delete_files_count	int
deleted_delete_files_count	int
partition_summaries 	array<struct<contains_null:boolean,contains_nan:boolean,lower_bound:string,upper_bound:string>>
reference_snapshot_id	bigint
Time taken: 0.029 seconds, Fetched 13 row(s)
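
A sketch (hypothetical query) using reference_snapshot_id, the field that distinguishes all_manifests from the manifests table above by recording which snapshot references each manifest:

-- Illustrative query: manifests grouped by the referencing snapshot
SELECT reference_snapshot_id, COUNT(*) AS manifest_count
FROM default.test.all_manifests
GROUP BY reference_snapshot_id;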

@github-actions bot added the docs label on Jan 1, 2025
@ebyhr changed the title from "Doc: Add missing fields to metadata tables" to "Doc: Add missing fields to metadata tables in Spark page" on Jan 1, 2025