logical type from avro input is broken. #78

Open
ryota717 opened this issue Jan 28, 2022 · 0 comments

Hi, I'm using columnify with Avro input records and found that values of the date/time logical types (date, timemillis, timemicros, timestampmillis, timestampmicros) come out broken.

For example, the sample data produces the results below.

# jsonl input(OK)
$ ./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType jsonl columnifier/testdata/record/logicals.jsonl > jsonl.parquet
$ parquet-tools cat -json jsonl.parquet
{"date":1,"timemillis":1000,"timemicros":1000000,"timestampmillis":1000,"timestampmicros":1000000}
{"date":2,"timemillis":2000,"timemicros":2000000,"timestampmillis":2000,"timestampmicros":2000000}
{"date":3,"timemillis":3000,"timemicros":3000000,"timestampmillis":3000,"timestampmicros":3000000}
{"date":4,"timemillis":4000,"timemicros":4000000,"timestampmillis":4000,"timestampmicros":4000000}
{"date":5,"timemillis":5000,"timemicros":5000000,"timestampmillis":5000,"timestampmicros":5000000}
{"date":6,"timemillis":6000,"timemicros":6000000,"timestampmillis":6000,"timestampmicros":6000000}
{"date":7,"timemillis":7000,"timemicros":7000000,"timestampmillis":7000,"timestampmicros":7000000}
{"date":8,"timemillis":8000,"timemicros":8000000,"timestampmillis":8000,"timestampmicros":8000000}
{"date":9,"timemillis":9000,"timemicros":9000000,"timestampmillis":9000,"timestampmicros":9000000}
{"date":10,"timemillis":10000,"timemicros":10000000,"timestampmillis":10000,"timestampmicros":10000000}

# avro input(NG)
$ ./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType avro columnifier/testdata/record/logicals.avro > avro.parquet
$ parquet-tools cat -json avro.parquet
{"date":1970,"timemillis":1000000000,"timemicros":1000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":2000000000,"timemicros":2000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":3000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":4000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":5000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":6000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":7000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":8000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":9000000000,"timestampmillis":1970,"timestampmicros":1970}
{"date":1970,"timemillis":0,"timemicros":10000000000,"timestampmillis":1970,"timestampmicros":1970}

This behavior seems to come from goavro, which converts logical types to Go native types (from the time package).
However, I don't have a good idea for how to convert those Go native types back to Parquet primitive types before writing :(
