pyspark.RDD.reduce
RDD.reduce(f)
Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently reduces partitions locally.

New in version 0.7.0.

Parameters
- f : function
  the reduce function
 
Returns
- T
  the aggregated result
 
Examples

>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
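Because reduce first folds each partition locally and then combines the per-partition results, a function that is not commutative and associative can return different answers under different partitionings. The sketch below is illustrative only, not part of the official doctests; it assumes a live SparkContext bound to sc as in the examples above and parallelize's default even split, so the exact outputs shown depend on how elements land in partitions.

>>> # one partition: subtraction folds left to right, ((1 - 2) - 3) - 4
>>> sc.parallelize([1, 2, 3, 4], 1).reduce(lambda a, b: a - b)
-8
>>> # two partitions: [1, 2] and [3, 4] each fold to -1, then -1 - (-1)
>>> sc.parallelize([1, 2, 3, 4], 2).reduce(lambda a, b: a - b)
0

In practice, prefer operators such as add, min, or max, whose results do not depend on partitioning.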