pyspark.sql.DataFrame.sample#
- DataFrame.sample(withReplacement=None, fraction=None, seed=None)[source]#
- Returns a sampled subset of this - DataFrame.- New in version 1.3.0. - Changed in version 3.4.0: Supports Spark Connect. - Parameters
- withReplacementbool, optional
- Sample with replacement or not (default - False).
- fractionfloat, optional
- Fraction of rows to generate, range [0.0, 1.0]. 
- seedint, optional
- Seed for sampling (default a random seed). 
 
- Returns
- DataFrame
- Sampled rows from given DataFrame. 
 
 - Notes - This is not guaranteed to provide exactly the fraction specified of the total count of the given - DataFrame.- fraction is required and, withReplacement and seed are optional. - Examples - >>> df = spark.range(10) >>> df.sample(0.5, 3).count() 7 >>> df.sample(fraction=0.5, seed=3).count() 7 >>> df.sample(withReplacement=True, fraction=0.5, seed=3).count() 1 >>> df.sample(1.0).count() 10 >>> df.sample(fraction=1.0).count() 10 >>> df.sample(False, fraction=1.0).count() 10