
Problems with loading Data to Avro Backed table #26

Open
ImMilind opened this issue Nov 7, 2012 · 2 comments

ImMilind commented Nov 7, 2012

ISSUE 1:

  1. I have data serialized to a file using the Java Avro APIs (a minimal sketch of this step follows the Hive output below).
  2. Created a partitioned Hive table with the same schema, using Haivvreo.
  3. Copied the file from step 1 to HDFS.
  4. Registered the partition with the table.
  5. Tried loading data into the table using:

hive> use serdetestdb; load data inpath '/user/immilind/Employee3.ser' into table employee_table partition (schema_def='Employee3',gen_time='2012110684533',arr_time='20121106090422');
OK
Time taken: 0.763 seconds
Loading data to table serdetestdb.employee_table partition (schema_def=Employee3, gen_time=2012110684533, arr_time=20121106090422)
OK
Time taken: 1.31 seconds
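
A minimal sketch of step 1, assuming the generic Avro 1.5 API; the schema file name (employee3.avsc), class name, and field values here are illustrative assumptions, not details from this report. Haivvreo's AvroContainerInputFormat expects an Avro container file written this way by DataFileWriter.

// Sketch only: write one "employee3" record to an Avro container file
// using the generic Avro API.
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WriteEmployee {
    public static void main(String[] args) throws Exception {
        // Parse the same schema that is later given to the Hive table.
        Schema schema = new Schema.Parser().parse(new File("employee3.avsc"));

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("name", "Alice");
        rec.put("age", 30);
        rec.put("dept", "Engineering");

        // DataFileWriter produces the container format that
        // AvroContainerInputFormat reads.
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File("Employee3.ser"));
        writer.append(rec);
        writer.close();
    }
}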

However, a SELECT query does not find any data:

hive> use serdetestdb; select * from employee_table;
OK
Time taken: 0.016 seconds
OK
Time taken: 0.444 seconds

The file Employee3.ser was copied into the registered partition's directory.

What am I missing?

ISSUE 2:

In addition, I am using Pig to load data from the table:

register /homes/immilind/haivvreo-1.0.12-avro15-hive81-SNAPSHOT.jar;
eventData = load 'serdetestdb.employee_table' using org.apache.hcatalog.pig.HCatLoader();
actualData = filter eventData by schema_def == 'Employee3' and gen_time=='2012110684533' and arr_time=='20121106090422';
dump actualData;

Although the jar file contains com.linkedin.haivvreo.AvroContainerInputFormat, this throws a ClassNotFoundException for that class.

jghoman (Owner) commented Nov 8, 2012

Haivvreo + HCat isn't supported. HCat has some problems. I'm planning on adding support for this via the Avro Serde I moved to Hive, not necessarily through Haivvreo.

ImMilind (Author) commented Nov 8, 2012

Well, what about Issue 1?

Table creation script:

CREATE EXTERNAL TABLE employee
PARTITIONED BY (schema_def string, gen_time string, arr_time string)
ROW FORMAT SERDE 'com.linkedin.haivvreo.AvroSerDe'
WITH SERDEPROPERTIES (
  'schema-literal' = '{
    "type" : "record",
    "name" : "employee3",
    "fields" : [
      {"name" : "name", "type" : "string", "default" : "NU"},
      {"name" : "age", "type" : "int", "default" : 0},
      {"name" : "dept", "type" : "string", "default" : "DU"}
    ]
  }'
)
STORED AS INPUTFORMAT 'com.linkedin.haivvreo.AvroContainerInputFormat'
OUTPUTFORMAT 'com.linkedin.haivvreo.AvroContainerOutputFormat';

Schema used to serialize the data:

{
  "type" : "record",
  "name" : "employee3",
  "fields" : [
    {"name" : "name", "type" : "string", "default" : "NU"},
    {"name" : "age", "type" : "int", "default" : 0},
    {"name" : "dept", "type" : "string", "default" : "DU"}
  ]
}
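
As a sanity check for Issue 1, a minimal sketch along these lines (the class name and local file path are illustrative assumptions) reads Employee3.ser back with the generic Avro API and prints the writer schema embedded in the file, which should match the 'schema-literal' declared on the table, along with the records it contains.

// Sketch only: verify that Employee3.ser is a readable Avro container
// file and inspect its embedded writer schema and records.
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadEmployee {
    public static void main(String[] args) throws Exception {
        DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
            new File("Employee3.ser"), new GenericDatumReader<GenericRecord>());

        // The writer schema stored in the file should match the schema
        // declared on the Hive table.
        System.out.println(reader.getSchema().toString(true));

        while (reader.hasNext()) {
            GenericRecord rec = reader.next();
            System.out.println(rec);
        }
        reader.close();
    }
}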
