Clean up DAGScheduler datastructures after job completes #414

jhartlaub · 2013-01-25T05:41:33Z

I added some fixes to reduce heap usage and prevent a memory leak we found after running a load test (thousands of jobs). There may be more leaks, but this fixes a big one.
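A minimal sketch of the per-job cleanup idea, assuming the scheduler remembers which stages each job registered and drops them when the job finishes. The class and field names (`NaiveStageRegistry`, `stageIdToRdd`, `jobIdToStageIds`) are hypothetical stand-ins, not the real DAGScheduler fields.

```scala
import scala.collection.mutable

// Hypothetical, simplified bookkeeping; NOT the real DAGScheduler fields.
class NaiveStageRegistry {
  // stage id -> name of the RDD the stage computes
  val stageIdToRdd = mutable.HashMap[Int, String]()
  // job id -> ids of every stage registered while running that job
  val jobIdToStageIds = mutable.HashMap[Int, mutable.Set[Int]]()

  def register(jobId: Int, stageId: Int, rdd: String): Unit = {
    stageIdToRdd(stageId) = rdd
    jobIdToStageIds.getOrElseUpdate(jobId, mutable.Set[Int]()) += stageId
  }

  // When a job finishes, forget every stage it registered so the maps
  // stop growing across thousands of jobs.
  def jobCompleted(jobId: Int): Unit =
    for (stageId <- jobIdToStageIds.remove(jobId).getOrElse(mutable.Set.empty[Int]))
      stageIdToRdd.remove(stageId)
}
```

This removes stages unconditionally: if a second, still-active job had registered the same stage id, completing the first job would delete that stage out from under it. That is the case mateiz asks about in the next comment.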

mateiz · 2013-01-25T06:57:15Z

Good catch; the only thing I'm not 100% sure of is whether this will work when two active jobs depend on the same stage. (This can only happen if two threads submit a job at the same time, but that may occur in a multi-user system or in Spark Streaming.) I'll look into it more carefully, but if you've thought about this, let me know.

jhartlaub · 2013-01-25T07:24:02Z

This will remove all stages submitted under a single job; if another job comes along and depends on the same stages, it will not work. In our case, we have not resubmitted the same RDDs multiple times (using Shark, we create new RDDs every time).

I can see now where getShuffleMapStage looks up a stage by an ID from the RDD. I think a per-job reference count for the stage-related maps might work. Does that seem plausible?
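A minimal sketch of the per-job reference-count idea, assuming each stage keeps a count of the active jobs that depend on it and is only forgotten when that count drops to zero. All names here (`RefCountedStageRegistry`, `stageRefCount`, etc.) are hypothetical and do not reflect the real DAGScheduler API.

```scala
import scala.collection.mutable

// Hypothetical sketch of reference-counted stage bookkeeping.
class RefCountedStageRegistry {
  val stageIdToRdd = mutable.HashMap[Int, String]()
  // Number of active jobs that currently depend on each stage.
  val stageRefCount = mutable.HashMap[Int, Int]()
  val jobIdToStageIds = mutable.HashMap[Int, mutable.Set[Int]]()

  def register(jobId: Int, stageId: Int, rdd: String): Unit = {
    val stages = jobIdToStageIds.getOrElseUpdate(jobId, mutable.Set[Int]())
    // Count each (job, stage) pair once, even if the job reaches the stage twice.
    if (stages.add(stageId)) {
      stageIdToRdd(stageId) = rdd
      stageRefCount(stageId) = stageRefCount.getOrElse(stageId, 0) + 1
    }
  }

  def jobCompleted(jobId: Int): Unit =
    for (stageId <- jobIdToStageIds.remove(jobId).getOrElse(mutable.Set.empty[Int])) {
      val remaining = stageRefCount(stageId) - 1
      if (remaining == 0) {
        // The last job using this stage is done, so it is safe to forget it.
        stageRefCount.remove(stageId)
        stageIdToRdd.remove(stageId)
      } else {
        stageRefCount(stageId) = remaining
      }
    }
}
```

With two jobs sharing stage 10, completing the first job only decrements the count; the stage is removed when the second job completes.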

Thank you for looking at this, btw. You seem very busy if the mailing list is any indication.

-Jon

mateiz · 2013-01-26T21:14:53Z

Yeah, I think a reference count would be better. We should also make a test where multiple jobs depend on the same stage. For example:

val pairs = sc.parallelize(...)
val grouped = pairs.groupByKey()
grouped.filter(f1).count()
grouped.filter(f2).count()

Here, both actions should depend on the same shuffle map stage: the one that precedes the groupByKey.
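To make the sharing concrete without a running cluster, here is a hypothetical, Spark-free model of the lineage in the example above. `Node`, `Source`, `Narrow`, `Shuffled`, and `shuffleDeps` are illustrative names only, not Spark's RDD or Dependency classes; the model just walks parents and collects the shuffle ids each job would need a shuffle map stage for.

```scala
// Hypothetical, Spark-free model of lineage; not Spark's RDD/Dependency classes.
sealed trait Node
case class Source(name: String) extends Node
case class Narrow(name: String, parent: Node) extends Node                   // e.g. filter
case class Shuffled(name: String, shuffleId: Int, parent: Node) extends Node // e.g. groupByKey

// Shuffle ids reachable from a job's final RDD: each corresponds to a
// shuffle map stage that the job has to wait for.
def shuffleDeps(n: Node): Set[Int] = n match {
  case Source(_)           => Set.empty
  case Narrow(_, p)        => shuffleDeps(p)
  case Shuffled(_, sid, p) => shuffleDeps(p) + sid
}

// Mirrors the example: both filter(...).count() jobs sit on one groupByKey.
val pairs   = Source("pairs")
val grouped = Shuffled("grouped", 0, pairs)
val job1    = Narrow("filter(f1)", grouped)
val job2    = Narrow("filter(f2)", grouped)
val shared  = shuffleDeps(job1).intersect(shuffleDeps(job2)) // Set(0)
```

A real test would run the two `count()` calls against a `SparkContext` and then check that the shared stage's bookkeeping survives the first job and is cleaned up after the second.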

The other thing I've been thinking about is whether we should just use WeakHashMap for most of the data structures. This way stages will only be kept for RDDs that the user program references somehow. I think this will be a fair amount of work but it might be the right thing in the long term.
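A rough sketch of the WeakHashMap direction, assuming stage metadata is keyed by the RDD object itself so an entry becomes eligible for removal once the user program no longer references that RDD. `FakeRdd`, `StageEntry`, and `WeakStageCache` are hypothetical stand-ins. Two caveats from the `java.util.WeakHashMap` contract shape the sketch: values are held strongly, so a value that references its own key keeps the key (and the entry) alive; and the map is not thread-safe, which matters when several threads submit jobs.

```scala
import java.util.{Collections, WeakHashMap}

// Hypothetical stand-in for an RDD; uses identity equality, like a plain object.
final class FakeRdd(val name: String)
// Stores only the RDD's name, not the RDD: a value that strongly references
// its own key would prevent the key from ever being discarded.
final case class StageEntry(stageId: Int, rddName: String)

object WeakStageCache {
  // Entries become eligible for removal once the FakeRdd key is no longer
  // strongly reachable. Wrapped because WeakHashMap is not thread-safe.
  val rddToStage: java.util.Map[FakeRdd, StageEntry] =
    Collections.synchronizedMap(new WeakHashMap[FakeRdd, StageEntry]())

  // Get-or-create is a compound operation, so hold the map's own lock.
  def stageFor(rdd: FakeRdd, nextId: () => Int): StageEntry = rddToStage.synchronized {
    val existing = rddToStage.get(rdd)
    if (existing != null) existing
    else {
      val entry = StageEntry(nextId(), rdd.name)
      rddToStage.put(rdd, entry)
      entry
    }
  }
}
```

When exactly an unreachable key's entry disappears depends on the garbage collector, so a scheduler built this way could not rely on cleanup happening at any particular time.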

AmplabJenkins · 2013-04-04T21:13:14Z

Can one of the admins verify this patch?

AmplabJenkins · 2013-04-10T20:49:19Z

I'm the Jenkins test bot for the UC, Berkeley AMPLab. I've noticed your pull request and will test it once an admin authorizes me to. Thanks for your submission!

AmplabJenkins · 2013-04-18T22:05:45Z

I'm the Jenkins test bot for the UC, Berkeley AMPLab. I've noticed your pull request and will test it once an admin authorizes me to. Thanks for your submission!

velvia · 2013-07-25T16:44:47Z

Bump -- any progress on this one?

mateiz · 2013-07-25T17:53:17Z

I think this will have to be done after Spark 0.8, because of the reference-counting issue above. However, @markhamstra has also been looking at the reference counting for some of his own work, I believe.

AmplabJenkins · 2013-08-05T21:34:00Z

Thank you for your pull request. An admin will review this request soon.

Clean up DAGScheduler datastructures when a job has completed.

a6981d8

markhamstra mentioned this pull request Apr 19, 2013

add JobLogger to Spark #573

Closed

markhamstra mentioned this pull request Jul 19, 2013

Support Cancellation of Spark Jobs #665

Closed

markhamstra mentioned this pull request Mar 30, 2014

SPARK-1202 - Add a "cancel" button in the UI for stages apache/spark#246

Closed
