pyspark.pandas.DataFrame.hist

DataFrame.hist(bins=10, **kwds)

Draw one histogram of the DataFrame’s columns. A histogram is a representation of the distribution of data. This function calls plotting.backend.plot() on each series in the DataFrame, resulting in one histogram per column.

Parameters

bins (integer or sequence, default 10): Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, it gives the bin edges, including the left edge of the first bin and the right edge of the last bin. In this case, bins are returned unmodified.
**kwds: All other plotting keyword arguments to be passed to plotting backend.
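
The two forms of bins described above can be illustrated with a small pure-Python sketch. It assumes equal-width bins spanning the minimum and maximum of the data, which is a common convention and not necessarily how the plotting backend computes them. The helpers equal_width_edges and resolve_edges are hypothetical names used only for illustration; they are not part of pyspark.pandas.

```python
def equal_width_edges(values, bins=10):
    """Hypothetical helper: bins + 1 equally spaced edges over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def resolve_edges(values, bins=10):
    """Hypothetical helper: an integer yields computed edges,
    a sequence is used as the bin edges unmodified."""
    if isinstance(bins, int):
        return equal_width_edges(values, bins)
    return list(bins)


data = [1, 3, 2]

# Integer bins: 4 bins produce 4 + 1 = 5 edges.
int_edges = resolve_edges(data, bins=4)
print(int_edges)  # [1.0, 1.5, 2.0, 2.5, 3.0]

# Sequence bins: the edges are returned as given.
seq_edges = resolve_edges(data, bins=[0, 1, 2, 4])
print(seq_edges)  # [0, 1, 2, 4]
```

With an integer, the number of edges is always one more than the number of bins, because each bin needs both a left and a right edge and adjacent bins share one.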

Returns

plotly.graph_objs.Figure: Returns a custom object when the backend is not plotly. Returns an ndarray when subplots=True (matplotlib only).

Examples

Basic plot.

For Series:

>>> import pyspark.pandas as ps
>>> s = ps.Series([1, 3, 2])
>>> s.plot.hist()

For DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     np.random.randint(1, 7, 6000),
...     columns=['one'])
>>> df['two'] = df['one'] + np.random.randint(1, 7, 6000)
>>> df = ps.from_pandas(df)
>>> df.plot.hist(bins=12, alpha=0.5)