Hive fails to insert optional struct into avro backed table. #20
Comments
@IllyaYalovyy I'm playing with this to see what's up. Stay tuned. |
As a workaround I replaced optional records with mandatory records that are populated with NULLs. This approach leads to higher space consumption and performance degradation, but it works. |
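A rough sketch of that workaround at the schema level, assuming "populated with NULLs" means the record itself becomes mandatory while each of its own fields becomes a nullable union. The helper name `make_record_mandatory` is hypothetical and works on the parsed JSON schema only; it is not part of haivvreo or Avro.

```python
import copy


def make_record_mandatory(field):
    """Rewrite an optional-record field ([record, "null"]) as a mandatory
    record whose own fields are nullable. Hypothetical helper."""
    branches = field["type"]
    if not isinstance(branches, list):
        # Not a union, so the record is already mandatory.
        return copy.deepcopy(field)

    records = [b for b in branches
               if isinstance(b, dict) and b.get("type") == "record"]
    if len(records) != 1:
        raise ValueError("expected exactly one record branch in the union")

    record = copy.deepcopy(records[0])
    for sub in record["fields"]:
        t = sub["type"]
        if isinstance(t, list):
            # Already a union: make sure "null" is the first branch.
            sub["type"] = ["null"] + [b for b in t if b != "null"]
        else:
            sub["type"] = ["null", t]
        # Avro requires a default to match the first union branch.
        sub["default"] = None
    return {"name": field["name"], "type": record}
```

Every row then carries a full record of nulls instead of a single null, which is consistent with the extra space and processing cost mentioned above.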
Hey... I am running into this (or a similar) issue as well... and have posted an issue on the Cloudera customer support portal. Was there any resolution to this?? Thanks!!! |
Sure. Send me more details on your problem, and I hope I can help you. |
Thanks... I have tried to create an example as close to this initial issue as possible. My schema looks like this: {

I create two tables (mytable1 and mytable2) using this schema as follows.... CREATE external TABLE mytable1

Describing one of the tables shows.... a string from deserializer

I assume here that since Hive generally allows nullable columns, and my Avro schema is a union with null, Hive more or less ignores the union. I am given a file using the Avro schema provided (I can also create this file from a Java app). Then I run 'insert overwrite table mytable2 select * from mytable1;' and get:

2012-08-13 10:11:56,223 WARN org.apache.hadoop.mapred.Child: Error running child

I can write an MR job that does this without error. Like you found out, if I change the subrecord to not be a union, but just a record, it works fine with haivvreo. That may be an option longer term, but not for now.

I am going to download the haivvreo code tonight and see if I can figure this out (more for my own edification than anything). I was looking at com.linkedin.haivvreo.AvroSerializer (specifically, the serialize() method). I noticed that it uses the TypeInfo instance to determine the Hive type, and it resolves to a "struct". The Avro schema type, however, is a union for this field. So, I thought I would redirect it to the serializeUnion() method if the Schema is of type union. Not sure if I am on the right path... but thought it would be fun to try. I appreciate any thoughts you might have. Thanks!!!!! |
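The mismatch described above can be modeled in a few lines. This is a toy model only, not haivvreo's code: the schema is plain parsed JSON, and `serialize`, `serialize_struct` and `NotARecordError` are hypothetical stand-ins. It shows how a dispatch that looks only at the Hive type ("struct") ends up handing a union schema to record-only code.

```python
class NotARecordError(Exception):
    """Stand-in for Avro's AvroRuntimeException("Not a record: ...")."""


def schema_kind(schema):
    """Classify a parsed Avro JSON schema: a JSON array is a union."""
    if isinstance(schema, list):
        return "union"
    if isinstance(schema, dict):
        return schema["type"]
    return schema  # primitive name such as "string"


def serialize_struct(value, schema):
    """Record-only serializer: rejects anything that is not a record."""
    if schema_kind(schema) != "record":
        raise NotARecordError("Not a record: %r" % (schema,))
    return {f["name"]: value[f["name"]] for f in schema["fields"]}


def serialize(value, hive_type, schema):
    """Dispatch on the Hive type alone, ignoring the Avro schema type."""
    if hive_type == "struct":
        return serialize_struct(value, schema)
    return value
```

With a schema of `[record, "null"]` the Hive side still reports "struct", so `serialize` calls `serialize_struct` with a union and the stand-in exception is raised. A plain record schema passes through the same path without error.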
Hi, I am facing the same problem with unions and records.
CREATE TABLE my_table
ROW FORMAT SERDE
'com.linkedin.haivvreo.AvroSerDe'
WITH SERDEPROPERTIES (
'schema.literal'='{
"type": "record",
"name": "record_test",
"namespace": "com.test.haivvreo.record",
"fields": [
{
"name": "test_record_1",
"type": [{
"type": "record",
"name": "record_1",
"fields": [
{
"name": "double_value",
"type": "double"
},
{
"name": "string_value",
"type": "string"
},
{
"name": "array_values",
"type": {
"type": "array",
"items": "string"
}
}
]
}, "null" ]
},
{
"name": "test_record_2",
"type": [{
"type": "record",
"name": "record_2",
"fields": [
{
"name": "double_value",
"type": "double"
},
{
"name": "string_value",
"type": "string"
},
{
"name": "array_values",
"type": {
"type": "array",
"items": "string"
}
}
]
}]
}
]
}'
)
STORED AS INPUTFORMAT
'com.linkedin.haivvreo.AvroContainerInputFormat'
OUTPUTFORMAT
'com.linkedin.haivvreo.AvroContainerOutputFormat';
As you can see, I have defined test_record_1 as a union type in the schema, as it should be ([record, null]). If I insert data into the test_record_1 column, I get the error "AvroRuntimeException: Not a record:..." as @IllyaYalovyy mentioned above. If I try to insert data into test_record_2, I receive the following error in the map phase:
What's wrong here? |
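One way to see how the two fields differ is to list the union branches of each top-level field. The sketch below uses only the standard `json` module; `union_branches` is a hypothetical helper, and `TRIMMED` is a shortened copy of the schema above with the inner record fields cut down to one.

```python
import json


def union_branches(schema_literal):
    """Map each top-level field name to the type names of its branches."""
    schema = json.loads(schema_literal)
    result = {}
    for field in schema["fields"]:
        t = field["type"]
        # A JSON array is an Avro union; anything else is a single type.
        branches = t if isinstance(t, list) else [t]
        result[field["name"]] = [
            b["type"] if isinstance(b, dict) else b for b in branches
        ]
    return result


TRIMMED = """
{"type": "record", "name": "record_test",
 "namespace": "com.test.haivvreo.record",
 "fields": [
  {"name": "test_record_1",
   "type": [{"type": "record", "name": "record_1",
             "fields": [{"name": "double_value", "type": "double"}]},
            "null"]},
  {"name": "test_record_2",
   "type": [{"type": "record", "name": "record_2",
             "fields": [{"name": "double_value", "type": "double"}]}]}
 ]}
"""

print(union_branches(TRIMMED))
# {'test_record_1': ['record', 'null'], 'test_record_2': ['record']}
```

Note that test_record_2 is written as a JSON array with a single element, so in Avro terms it is still a union (with one branch), not a bare record.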
I am not sure this is the same issue I ran into... but the one was having
I believe this to be a bug in the haivvreo/avro code. When I was trying to
The actual AvroRuntimeException occurs on line 227 of
So, I started reviewing the code that occurs before this, and ran into a
if (schema.getType() == Schema.Type.UNION) { ...
and I added the method getBaseType(Schema), which I copied from some
private Schema getBaseType(Schema schema) {
This all worked for me. I contacted Cloudera and explained the issue to
On Mon, Sep 17, 2012 at 5:21 AM, mischuh wrote:
Doug Houck "I swear by my life and my love of it that I will never live for the sake |
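The body of getBaseType is cut off in the comment above, so the following is only a guess at the idea, written in Python over parsed JSON schemas instead of the Avro Java API. It assumes the union has exactly one non-null branch; `get_base_type` and `serialize_optional_struct` are hypothetical names.

```python
def get_base_type(schema):
    """Return the single non-null branch of a union, or the schema itself
    if it is not a union. Assumes at most one non-null branch."""
    if not isinstance(schema, list):
        return schema
    non_null = [b for b in schema if b != "null"]
    if len(non_null) != 1:
        raise ValueError("expected exactly one non-null branch in the union")
    return non_null[0]


def serialize_optional_struct(value, schema):
    """Serialize a struct against a record or a [record, "null"] union."""
    if value is None:
        if isinstance(schema, list) and "null" in schema:
            return None
        raise ValueError("null value for a non-nullable schema")

    record = get_base_type(schema)  # unwrap the union before record access
    if not (isinstance(record, dict) and record.get("type") == "record"):
        raise ValueError("Not a record: %r" % (record,))
    return {f["name"]: value[f["name"]] for f in record["fields"]}
```

The point of the unwrap step is that the record-handling code never sees the union itself, only the record branch inside it, while a null value is accepted only when the union actually contains "null".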
FYI, this was fixed in the integrated-into-hive Avro support in HIVE-3528. |
Thanks! It was really helpful! Unfortunately, we are stuck with CDH4 (Hive 0.9) |
@busbey Any chance this got merged back to CDH's fork of Haivvreo? I'd be happy to merge it here. |
@jghoman I don't think it's been merged into any part of CDH yet. I'll likely merge it into CDH4's Hive, but that'll still be the built-in Avro SerDe and not the Haivvreo support from CDH3. @IllyaYalovyy are you likely to follow further CDH4 releases, or is there some particular reason you're staying with Haivvreo on CDH4 instead of the built-in SerDe? I believe CDH4 with Hive 0.9 means you're at least on CDH 4.1.0, and the built-in version has been available since then -- see the "New Feature" section. |
Environment:
CDH3U3, avro-1.5.4 and avro-mapred-1.5.4, haivvreo-1.0.7
Avro schema:
Query:
Exception: