Using PARTITION BY in SQL generates 'Error: External(NotImplemented("it is not yet supported to write to hive partitions with datatype Float64"))'
#13602
Closed
ajazam opened this issue on Nov 29, 2024 · 4 comments
Describe the bug

I am trying to create a parquet file with hive partitioning from CSV data and get the error

Error: External(NotImplemented("it is not yet supported to write to hive partitions with datatype Float64"))

To Reproduce

main.rs

use std::fs::File;
use std::io::Write;
use arrow::datatypes::{DataType, Field, Schema};
use datafusion::prelude::*;
use tempfile::tempdir;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Write a small CSV file into a temporary directory
    let dir = tempdir()?;
    let file_path = dir.path().join("example.csv");
    let mut file = File::create(&file_path)?;
    file.write_all(
        r#"dte,ot
2016-07-01 00:00:00,2
2016-07-01 06:45:00,3"#
            .as_bytes(),
    )?;
    let file_path = file_path.to_str().unwrap();

    // Read the CSV with the default (inferred) options and print it
    let ctx = SessionContext::new();
    let csv_df = ctx.read_csv(file_path, CsvReadOptions::default()).await?;
    csv_df.show().await?;

    // Register the same file with an explicit schema
    let schema = Schema::new(vec![
        Field::new("dte", DataType::Timestamp(arrow::datatypes::TimeUnit::Second, None), false),
        Field::new("ot", DataType::UInt16, false),
    ]);
    ctx.register_csv("data", file_path, CsvReadOptions::new().schema(&schema).has_header(true)).await?;

    // COPY with PARTITIONED BY (year) fails because EXTRACT(YEAR ...) yields Float64 here
    let df = ctx.sql("copy (SELECT dte, ot, EXTRACT(YEAR FROM dte) AS year from data) to './partitioned_output' stored as parquet PARTITIONED BY (year)").await?;
    df.count().await?;
    Ok(())
}
cargo.toml
[package]
name = "datafusion_csv"
version = "0.1.0"
edition = "2021"
[dependencies]
tokio = { version = "1", features = ["full"] }
datafusion = "43.0.0"
arrow = "53.3.0"
tempfile = "3.14.0"
Expected behavior
I am expecting a folder year=2016 containing a parquet file
Additional context

I was originally trying to have folders for month and day, couldn't get the application to work, and then created this simpler example.
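The error boils down to the type of the EXTRACT(YEAR ...) column. A quick check, a minimal sketch assuming the same ctx and registered data table as in the repro above (arrow_typeof is DataFusion's built-in for showing an expression's Arrow type):

// On DataFusion 43 this should print Float64, which is the type the hive-partition writer rejects.
ctx.sql("SELECT arrow_typeof(EXTRACT(YEAR FROM dte)) AS year_type FROM data LIMIT 1")
    .await?
    .show()
    .await?;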
Looks like date_part was updated to return Int32 instead of Float64 in PR #13466, which should fix this issue. As a workaround you could try casting it, e.g. arrow_cast(EXTRACT(..), 'Int64')
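Applied to the query in the repro, that suggestion would look roughly like this (a sketch using the 'Int64' cast from the comment; any integer type should sidestep the Float64 restriction):

// Cast the partition column to an integer type before writing hive partitions
let df = ctx.sql("copy (SELECT dte, ot, arrow_cast(EXTRACT(YEAR FROM dte), 'Int64') AS year from data) to './partitioned_output' stored as parquet PARTITIONED BY (year)").await?;
df.count().await?;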
Thanks, gents, I got it working. For anybody else who comes up against this issue, I made the following alteration:
let df = ctx.sql("copy (SELECT dte, ot, arrow_cast(EXTRACT(YEAR FROM dte), 'Int32') AS year from data) to './partitioned_output' stored as parquet PARTITIONED BY (year)").await?;
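For the month/day layout mentioned in the additional context, the same cast can be applied to each extracted part. A sketch (not taken from the thread), which should produce hive-style directories such as year=2016/month=7/day=1:

// Partition by year, month and day, casting each EXTRACT result to Int32
let df = ctx.sql("copy (SELECT dte, ot, arrow_cast(EXTRACT(YEAR FROM dte), 'Int32') AS year, arrow_cast(EXTRACT(MONTH FROM dte), 'Int32') AS month, arrow_cast(EXTRACT(DAY FROM dte), 'Int32') AS day from data) to './partitioned_output' stored as parquet PARTITIONED BY (year, month, day)").await?;
df.count().await?;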