Bug with the YARN streaming interface #82

Open
bcornec opened this issue Feb 28, 2014 · 1 comment

Comments

bcornec commented Feb 28, 2014

Using version 2.1.6 of the glusterfs-hadoop plugin in a Hadoop 2.x and GlusterFS 3.4 environment, we get errors with the YARN streaming interface: job submission fails with an ArrayIndexOutOfBoundsException.

Software used:
glusterfs-libs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-3.4.0.59rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.59rhs-1.el6rhs.x86_64

RHEL 6.4 with kernel 2.6.32-358.32.3.el6.x86_64
glusterfs-hadoop-2.1.6.jar

Run results:

-bash-4.1$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -input /ls-gfs.txt -output /process/
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Initializing GlusterFS, CRC disabled.
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT INFO={git.commit.id.abbrev=7b04317, git.commit.user.email=[email protected],
git.commit.message.full=Merge pull request #80 from jayunit100/2.1.6_release_fix_sudoers

include the sudoers file in the srpm, git.commit.id=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.message.short=Merge pull request #80
from jayunit100/2.1.6_release_fix_sudoers, git.commit.user.name=jay vyas, git.build.user.name=Unknown, git.commit.id.describe=2.1.6,
git.build.user.email=Unknown, git.branch=7b04317ff5c13af8de192626fb40c4a0a5c37000, git.commit.time=07.02.2014 @ 12:06:31 EST,
git.build.time=10.02.2014 @ 13:31:20 EST}
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: GIT_TAG=2.1.6
14/02/28 15:05:09 INFO glusterfs.GlusterFileSystem: Configuring GlusterFS
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:09 INFO glusterfs.GlusterVolume: Write buffer size : 131072
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.4-Intel.jar] /tmp/streamjob2645998574693064427.jar tmpDir=null
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/28 15:05:10 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Initializing gluster volume..
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Root of Gluster file system is /mnt/hpbigdata
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: mapreduce/superuser daemon : root
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Working directory is : glusterfs:/user/yarn
14/02/28 15:05:10 INFO glusterfs.GlusterVolume: Write buffer size : 131072
14/02/28 15:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/28 15:05:10 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/yarn/.staging/job_1393593232248_0003
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:508)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:298)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1234)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1231)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1231)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:589)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:584)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:584)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:575)
at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1014)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

@wattsteve

Your Hadoop processes should be started as user yarn, not root. Because this issue involves the Intel Hadoop Distribution, you need to follow specific instructions to get it to run properly. The instructions are available here: https://access.redhat.com/site/articles/730763/
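
To check which user the YARN daemons run as, something like the following may help. It is only a sketch: the function name `yarn_daemons_as_root` is made up for illustration, and it assumes the daemons are launched with the stock Apache main classes (`...resourcemanager.ResourceManager` and `...nodemanager.NodeManager`), which may not hold for every distribution.

```shell
# Hypothetical helper: report YARN daemons that run as root.
# Reads lines of the form "<user> <command line>" on stdin.
# Assumption: the daemons use the stock Apache YARN main class names.
yarn_daemons_as_root() {
    awk '$1 == "root" {
        if ($0 ~ /resourcemanager\.ResourceManager/) print "ResourceManager runs as root"
        if ($0 ~ /nodemanager\.NodeManager/)         print "NodeManager runs as root"
    }'
}

# Typical use on a cluster node (empty output means neither daemon was found under root):
#   ps -e -o user= -o args= | yarn_daemons_as_root
```

If either daemon is reported, it was started as root rather than yarn, which is the situation described above.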
