pyspark.pandas.DataFrame.hist

DataFrame.hist(bins=10, **kwds)

Draw one histogram of the DataFrame’s columns. A histogram is a representation of the distribution of data. This function calls plotting.backend.plot() on each series in the DataFrame, resulting in one histogram per column.

Parameters
bins : integer or sequence, default 10

Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, it gives the bin edges, including the left edge of the first bin and the right edge of the last bin. In this case, bins are returned unmodified.

**kwds

All other plotting keyword arguments to be passed to plotting backend.
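
The rule for bins described above can be illustrated without a plotting backend. The following is a minimal plain-Python sketch, not the pyspark implementation: the helper name bin_edges is hypothetical, and equal-width edges spanning the minimum to the maximum of the data are an assumption made for illustration.

```python
def bin_edges(values, bins=10):
    """Illustrative only: mimic the documented `bins` rule (hypothetical helper)."""
    if isinstance(bins, int):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins
        # An integer yields bins + 1 equally spaced edges (assumed min-to-max range).
        return [lo + i * width for i in range(bins)] + [hi]
    # A sequence is taken as the edges themselves and returned unmodified.
    return list(bins)


data = [1, 3, 2, 6, 4, 5]
int_edges = bin_edges(data, bins=5)             # 5 bins -> 6 edges
seq_edges = bin_edges(data, bins=[0, 2, 4, 6])  # explicit edges kept as given
print(len(int_edges))
print(seq_edges)
```

With an integer, the number of edges is always one more than the number of bins; with a sequence, the caller controls the edges directly, which is useful when several columns should share the same bins.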

Returns
plotly.graph_objs.Figure

Returns a custom object when the backend is not plotly. Returns an ndarray when subplots=True (matplotlib only).

Examples

Basic plot.

For Series:

>>> import pyspark.pandas as ps
>>> s = ps.Series([1, 3, 2])
>>> s.plot.hist()

For DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     np.random.randint(1, 7, 6000),
...     columns=['one'])
>>> df['two'] = df['one'] + np.random.randint(1, 7, 6000)
>>> df = ps.from_pandas(df)
>>> df.plot.hist(bins=12, alpha=0.5)