SparkContext.runJob

SparkContext.runJob(rdd, partitionFunc, partitions=None, allowLocal=False)
Executes the given partitionFunc on the specified set of partitions, returning the result as an array of elements.
If `partitions` is not specified, this will run over all partitions.
New in version 1.1.0.
Parameters
rdd : RDD
    target RDD to run tasks on
partitionFunc : function
    a function to run on each partition of the RDD
partitions : list of int, optional
    set of partitions to run on; some jobs may not want to compute on all
    partitions of the target RDD, e.g. for operations like first() (see the
    sketch below)
allowLocal : bool, default False
    this parameter has no effect
Returns
list
    results of the specified partitions
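The `partitions` argument makes it possible to evaluate only a subset of the RDD. A minimal sketch (assuming an active SparkContext `sc`), emulating a first()-style access that computes only partition 0:

>>> myRDD = sc.parallelize(range(100), 10)
>>> sc.runJob(myRDD, lambda part: list(part), partitions=[0])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]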
See also
SparkContext.cancelAllJobs()
Examples
>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part])
[0, 1, 4, 9, 16, 25]

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part], [0, 2], True)
[0, 1, 16, 25]
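The examples above assume an interactive session where `sc` already exists. A self-contained sketch of the same calls (the local master and app name are illustrative choices, not part of the API):

from pyspark import SparkContext

sc = SparkContext("local[2]", "runJobExample")  # hypothetical local setup

myRDD = sc.parallelize(range(6), 3)  # partitions: [0, 1], [2, 3], [4, 5]

# All partitions: squares of every element
print(sc.runJob(myRDD, lambda part: [x * x for x in part]))
# [0, 1, 4, 9, 16, 25]

# Partitions 0 and 2 only; partition 1 ([2, 3]) is never computed
print(sc.runJob(myRDD, lambda part: [x * x for x in part], [0, 2]))
# [0, 1, 16, 25]

sc.stop()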