Doc: Add missing fields to metadata tables in Spark page #11897

Open
ebyhr wants to merge 1 commit into main
Conversation

@ebyhr (Contributor) commented Jan 1, 2025

Output of DESC against each metadata table in Spark, showing the fields that the Spark documentation page should list:

spark-sql (default)> DESC default.test.manifests;
content             	int
path                	string
length              	bigint
partition_spec_id   	int
added_snapshot_id   	bigint
added_data_files_count	int
existing_data_files_count	int
deleted_data_files_count	int
added_delete_files_count	int
existing_delete_files_count	int
deleted_delete_files_count	int
partition_summaries 	array<struct<contains_null:boolean,contains_nan:boolean,lower_bound:string,upper_bound:string>>
Time taken: 0.107 seconds, Fetched 12 row(s)
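
As a usage sketch (not from the PR, assuming the same default.test table as above), the counter fields shown in the manifests schema, including the delete-file counters, can be queried directly:

-- Illustrative query: per-manifest data and delete file counters
SELECT path,
       added_data_files_count,
       added_delete_files_count,
       existing_delete_files_count,
       deleted_delete_files_count
FROM default.test.manifests;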
spark-sql (default)> DESC default.test.position_deletes;
file_path           	string              	Path of a file in which a deleted row is stored
pos                 	bigint              	Ordinal position of a deleted row in the data file
row                 	struct<a:int,b:int> 	Deleted row values
partition           	struct<b:int>       	Partition that position delete row belongs to
spec_id             	int                 	Spec ID used to track the file containing a row
delete_file_path    	string              	Path of the file in which a row is stored
# Partition Information
# col_name          	data_type           	comment
partition.b         	int
Time taken: 0.031 seconds, Fetched 9 row(s)
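
A minimal sketch of reading this table (hypothetical query against the same default.test table): each row pairs a data file path with the ordinal of a deleted row, and `row` carries the deleted values when they are preserved.

-- Illustrative query: which row ordinals are deleted in which data files
SELECT file_path, pos, `row`
FROM default.test.position_deletes;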
spark-sql (default)> DESC default.test.all_data_files;
content             	int                 	Contents of the file: 0=data, 1=position deletes, 2=equality deletes
file_path           	string              	Location URI with FS scheme
file_format         	string              	File format name: avro, orc, or parquet
spec_id             	int                 	Partition spec ID
partition           	struct<b:int>       	Partition data tuple, schema based on the partition spec
record_count        	bigint              	Number of records in the file
file_size_in_bytes  	bigint              	Total file size in bytes
column_sizes        	map<int,bigint>     	Map of column id to total size on disk
value_counts        	map<int,bigint>     	Map of column id to total count, including null and NaN
null_value_counts   	map<int,bigint>     	Map of column id to null value count
nan_value_counts    	map<int,bigint>     	Map of column id to number of NaN values in the column
lower_bounds        	map<int,binary>     	Map of column id to lower bound
upper_bounds        	map<int,binary>     	Map of column id to upper bound
key_metadata        	binary              	Encryption key metadata blob
split_offsets       	array<bigint>       	Splittable offsets
equality_ids        	array<int>          	Equality comparison field IDs
sort_order_id       	int                 	Sort order ID
readable_metrics    	struct<a:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>,b:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>>	Column metrics in readable form
Time taken: 0.029 seconds, Fetched 18 row(s)
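
As an illustrative example (not part of the PR), the per-file statistics above support simple aggregations across all snapshots, e.g. file counts, row counts, and bytes per partition:

-- Illustrative aggregation over all_data_files
SELECT `partition`,
       COUNT(*)                AS file_count,
       SUM(record_count)       AS row_count,
       SUM(file_size_in_bytes) AS total_bytes
FROM default.test.all_data_files
GROUP BY `partition`;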
spark-sql (default)> DESC default.test.all_delete_files;
content             	int                 	Contents of the file: 0=data, 1=position deletes, 2=equality deletes
file_path           	string              	Location URI with FS scheme
file_format         	string              	File format name: avro, orc, or parquet
spec_id             	int                 	Partition spec ID
partition           	struct<b:int>       	Partition data tuple, schema based on the partition spec
record_count        	bigint              	Number of records in the file
file_size_in_bytes  	bigint              	Total file size in bytes
column_sizes        	map<int,bigint>     	Map of column id to total size on disk
value_counts        	map<int,bigint>     	Map of column id to total count, including null and NaN
null_value_counts   	map<int,bigint>     	Map of column id to null value count
nan_value_counts    	map<int,bigint>     	Map of column id to number of NaN values in the column
lower_bounds        	map<int,binary>     	Map of column id to lower bound
upper_bounds        	map<int,binary>     	Map of column id to upper bound
key_metadata        	binary              	Encryption key metadata blob
split_offsets       	array<bigint>       	Splittable offsets
equality_ids        	array<int>          	Equality comparison field IDs
sort_order_id       	int                 	Sort order ID
readable_metrics    	struct<a:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>,b:struct<column_size:bigint,value_count:bigint,null_value_count:bigint,nan_value_count:bigint,lower_bound:int,upper_bound:int>>	Column metrics in readable form
Time taken: 0.03 seconds, Fetched 18 row(s)
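
Since all_delete_files shares this schema, the content field (1=position deletes, 2=equality deletes) distinguishes the kinds of delete files; a sketch:

-- Illustrative breakdown of delete files by kind
SELECT content, COUNT(*) AS delete_file_count
FROM default.test.all_delete_files
GROUP BY content;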
spark-sql (default)> DESC default.test.all_manifests;
content             	int
path                	string
length              	bigint
partition_spec_id   	int
added_snapshot_id   	bigint
added_data_files_count	int
existing_data_files_count	int
deleted_data_files_count	int
added_delete_files_count	int
existing_delete_files_count	int
deleted_delete_files_count	int
partition_summaries 	array<struct<contains_null:boolean,contains_nan:boolean,lower_bound:string,upper_bound:string>>
reference_snapshot_id	bigint
Time taken: 0.029 seconds, Fetched 13 row(s)
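
A sketch (hypothetical query) using reference_snapshot_id, the field that distinguishes all_manifests from the manifests table above by recording which snapshot references each manifest:

-- Illustrative query: manifests grouped by the referencing snapshot
SELECT reference_snapshot_id, COUNT(*) AS manifest_count
FROM default.test.all_manifests
GROUP BY reference_snapshot_id;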

@github-actions bot added the docs label on Jan 1, 2025
@ebyhr changed the title from "Doc: Add missing fields to metadata tables" to "Doc: Add missing fields to metadata tables in Spark page" on Jan 1, 2025