Bug with the YARN streaming interface #82

bcornec · 2014-02-28T18:59:31Z

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x and GlusterFS 3.4 environment, we get errors with the YARN streaming interface.

Software used:
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64
glusterfs-hadoop-2.1.6.jar

Run results:

-bash-4.1$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -input /ls-gfs.txt -output /process/
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Write buffer size : 131072
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.4-Intel.jar] /tmp/streamjob2645998574693064427.jar tmpDir=null
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/28 15:05:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/yarn/.staging/job_1393593232248_0003
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:508)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1234)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1231)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1231)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:589)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:584)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:584)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:575)
at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1014)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

wattsteve · 2014-03-25T13:27:50Z

Your Hadoop processes should be started as user yarn, not root. Because this issue involves the Intel Hadoop Distribution, you need to follow specific instructions to get it running properly. The instructions are available here: https://access.redhat.com/site/articles/730763/
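
As a quick way to check the point above about which user the Hadoop processes run as, here is a minimal sketch. The helper name `daemon_owners` and the list of daemon class names it matches (ResourceManager, NodeManager, JobHistoryServer) are assumptions for illustration, not something taken from the linked instructions; adjust the pattern to whatever daemons your distribution actually starts.

```shell
# Hypothetical helper: reads "USER COMMAND..." lines on stdin (the format
# produced by `ps -eo user,args`) and prints the distinct users that own
# YARN/MapReduce daemon processes.
# The daemon names below are an assumption; extend the pattern as needed.
# The [R]/[N]/[J] brackets keep this awk process from matching itself
# when it appears in the ps listing.
daemon_owners() {
  awk '/[R]esourceManager|[N]odeManager|[J]obHistoryServer/ { print $1 }' | sort -u
}
```

Running `ps -eo user,args | daemon_owners` on each node should then list only `yarn` if the daemons were started as suggested; seeing `root` in the output would point at the setup problem described in the reply.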
