I've come across a similar issue: child-JVM options specified via with-job-conf don't "stick". I hit GC problems in a reducer of one of my Cascalog jobs for the first time last week. I found the with-job-conf macro and wrapped the query-execution form in it, to no avail:
(let [snk-qry-by-chan (for [chan channels]
                        (channel-query chan))
      all-snk-qry-seq (apply concat snk-qry-by-chan)]
  ;; configure the MapReduce child-JVM options to avoid the GC overhead limit error
  (with-job-conf {"mapred.child.java.opts" "-XX:-UseGCOverheadLimit -Xmx4g"}
    ;; execute all of the queries in parallel
    (apply ?- all-snk-qry-seq)))
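As a sanity check (not Cascalog's actual code path), the same key sets and reads back fine on a raw Hadoop JobConf, which suggests the setting is getting lost somewhere in the Cascalog/Cascading planning step rather than being an invalid key:

(import 'org.apache.hadoop.mapred.JobConf)

;; set and read back the child-JVM opts on a bare JobConf, no Cascalog involved
(let [conf (doto (JobConf.)
             (.set "mapred.child.java.opts" "-XX:-UseGCOverheadLimit -Xmx4g"))]
  (.get conf "mapred.child.java.opts"))
;; => "-XX:-UseGCOverheadLimit -Xmx4g"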
But in the logging output from the reducer in question, regardless of what I specified in with-job-conf, I always saw this (-Xmx1073741824 is exactly 1 GiB, presumably the cluster-wide default rather than my 4g setting):
2013-07-12 17:25:55,216 INFO cascading.flow.hadoop.FlowMapper: child jvm opts: -Xmx1073741824
Further details:
We're running Cloudera's distribution of Hadoop (CDH 4.1.4), which is based on Hadoop 2.0.0.
I'm running Cascalog in cluster mode (I uberjar the code whenever I deploy).
The exception thrown by the JVM is a "GC overhead limit exceeded" error (as opposed to something like a plain OutOfMemoryError).
(New detail as of 7/18/13.) I've noticed that with-job-conf does pass through at least some other jobconf settings. The one example I've seen clearly: my with-job-conf map had the key "io.compression.codecs" with a value containing "com.hadoop.compression.lzo.LzopCodec", which does not exist on our installation, and the job failed with an error, so that key evidently did reach the configuration (see the sketch after this list).
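A minimal sketch of that observation (some-query is a hypothetical placeholder; (stdout) is the standard Cascalog tap): both keys sit in the same map, yet only one demonstrably arrives:

(with-job-conf {"io.compression.codecs" "com.hadoop.compression.lzo.LzopCodec"
                "mapred.child.java.opts" "-XX:-UseGCOverheadLimit -Xmx4g"}
  ;; the bogus codec class makes the job fail at startup, so this key was applied;
  ;; the child-JVM opts never appear in the task logs, so apparently they were not
  (?- (stdout) some-query))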
I saw Robin's workaround, which seems to just modify hadoop-site.xml. It would be great if the with-job-conf settings "stuck", so we wouldn't have to tweak site-wide settings for per-job needs (especially since I don't manage the Hadoop cluster). If I understand it correctly, the workaround amounts to something like the snippet below.
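(A sketch of my reading of that workaround, not Robin's exact change: bake the option into the cluster-side site file, e.g. mapred-site.xml on CDH4.)

<property>
  <name>mapred.child.java.opts</name>
  <value>-XX:-UseGCOverheadLimit -Xmx4g</value>
</property>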
I've noticed (perhaps?) related issues in pure Cascading: configuration properties supplied to the FlowConnector don't always get passed into the JobConf, and the behaviour seems inconsistent and unpredictable. It would be good to have visibility into, and explicit, guaranteed control over, the JobConf. For reference, the handoff in question looks like the sketch below.
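(A minimal Clojure-interop sketch of that handoff, reusing the option value from above; in principle everything in this Properties map should land in the JobConf of any flow the connector builds.)

(import '[java.util Properties]
        '[cascading.flow.hadoop HadoopFlowConnector])

;; properties handed to the connector at construction time
(def props
  (doto (Properties.)
    (.setProperty "mapred.child.java.opts" "-XX:-UseGCOverheadLimit -Xmx4g")))

;; flows built by this connector should inherit the properties above
(def flow-connector (HadoopFlowConnector. props))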
(Cross-posted from the mailing list: https://groups.google.com/forum/#!topic/cascalog-user/Rq_O33VsDyc)