public class GroupedDataset<K,V>
extends Object
implements scala.Serializable
Dataset has been logically grouped by a user specified grouping key. Users should not
construct a GroupedDataset directly, but should instead call groupBy on an existing
Dataset.
COMPATIBILITY NOTE: Long term we plan to make GroupedDataset) extend GroupedData. However,
making this change to the class hierarchy would break some function signatures. As such, this
class should be considered a preview of the final API. Changes will be made to the interface
after Spark 1.6.
| Modifier and Type | Method and Description |
|---|---|
<U1> Dataset<scala.Tuple2<K,U1>> |
agg(TypedColumn<V,U1> col1)
Computes the given aggregation, returning a
Dataset of tuples for each unique key
and the result of computing this aggregation over all elements in the group. |
<U1,U2> Dataset<scala.Tuple3<K,U1,U2>> |
agg(TypedColumn<V,U1> col1,
TypedColumn<V,U2> col2)
Computes the given aggregations, returning a
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group. |
<U1,U2,U3> Dataset<scala.Tuple4<K,U1,U2,U3>> |
agg(TypedColumn<V,U1> col1,
TypedColumn<V,U2> col2,
TypedColumn<V,U3> col3)
Computes the given aggregations, returning a
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group. |
<U1,U2,U3,U4> |
agg(TypedColumn<V,U1> col1,
TypedColumn<V,U2> col2,
TypedColumn<V,U3> col3,
TypedColumn<V,U4> col4)
Computes the given aggregations, returning a
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group. |
<U,R> Dataset<R> |
cogroup(GroupedDataset<K,U> other,
CoGroupFunction<K,V,U,R> f,
Encoder<R> encoder)
Applies the given function to each cogrouped data.
|
<U,R> Dataset<R> |
cogroup(GroupedDataset<K,U> other,
scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.TraversableOnce<R>> f,
Encoder<R> evidence$4)
Applies the given function to each cogrouped data.
|
Dataset<scala.Tuple2<K,Object>> |
count()
Returns a
Dataset that contains a tuple with each key and the number of items present
for that key. |
<U> Dataset<U> |
flatMapGroups(FlatMapGroupsFunction<K,V,U> f,
Encoder<U> encoder)
Applies the given function to each group of data.
|
<U> Dataset<U> |
flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.TraversableOnce<U>> f,
Encoder<U> evidence$2)
Applies the given function to each group of data.
|
<L> GroupedDataset<L,V> |
keyAs(Encoder<L> evidence$1)
Returns a new
GroupedDataset where the type of the key has been mapped to the specified
type. |
Dataset<K> |
keys()
Returns a
Dataset that contains each unique key. |
<U> Dataset<U> |
mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f,
Encoder<U> evidence$3)
Applies the given function to each group of data.
|
<U> Dataset<U> |
mapGroups(MapGroupsFunction<K,V,U> f,
Encoder<U> encoder)
Applies the given function to each group of data.
|
org.apache.spark.sql.execution.QueryExecution |
queryExecution() |
Dataset<scala.Tuple2<K,V>> |
reduce(scala.Function2<V,V,V> f)
Reduces the elements of each group of data using the specified binary function.
|
Dataset<scala.Tuple2<K,V>> |
reduce(ReduceFunction<V> f)
Reduces the elements of each group of data using the specified binary function.
|
public org.apache.spark.sql.execution.QueryExecution queryExecution()
public <L> GroupedDataset<L,V> keyAs(Encoder<L> evidence$1)
GroupedDataset where the type of the key has been mapped to the specified
type. The mapping of key columns to the type follows the same rules as as on Dataset.
evidence$1 - (undocumented)public Dataset<K> keys()
Dataset that contains each unique key.
public <U> Dataset<U> flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.TraversableOnce<U>> f, Encoder<U> evidence$2)
Dataset.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList) unless they are sure that this is possible given the memory
constraints of their cluster.
f - (undocumented)evidence$2 - (undocumented)public <U> Dataset<U> flatMapGroups(FlatMapGroupsFunction<K,V,U> f, Encoder<U> encoder)
Dataset.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList) unless they are sure that this is possible given the memory
constraints of their cluster.
f - (undocumented)encoder - (undocumented)public <U> Dataset<U> mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f, Encoder<U> evidence$3)
Dataset.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList) unless they are sure that this is possible given the memory
constraints of their cluster.
f - (undocumented)evidence$3 - (undocumented)public <U> Dataset<U> mapGroups(MapGroupsFunction<K,V,U> f, Encoder<U> encoder)
Dataset.
This function does not support partial aggregation, and as a result requires shuffling all
the data in the Dataset. If an application intends to perform an aggregation over each
key, it is best to use the reduce function or an
Aggregator.
Internally, the implementation will spill to disk if any given group is too large to fit into
memory. However, users must take care to avoid materializing the whole iterator for a group
(for example, by calling toList) unless they are sure that this is possible given the memory
constraints of their cluster.
f - (undocumented)encoder - (undocumented)public Dataset<scala.Tuple2<K,V>> reduce(scala.Function2<V,V,V> f)
f - (undocumented)public Dataset<scala.Tuple2<K,V>> reduce(ReduceFunction<V> f)
f - (undocumented)public <U1> Dataset<scala.Tuple2<K,U1>> agg(TypedColumn<V,U1> col1)
Dataset of tuples for each unique key
and the result of computing this aggregation over all elements in the group.
col1 - (undocumented)public <U1,U2> Dataset<scala.Tuple3<K,U1,U2>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2)
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
col1 - (undocumented)col2 - (undocumented)public <U1,U2,U3> Dataset<scala.Tuple4<K,U1,U2,U3>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3)
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
col1 - (undocumented)col2 - (undocumented)col3 - (undocumented)public <U1,U2,U3,U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>> agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4)
Dataset of tuples for each unique key
and the result of computing these aggregations over all elements in the group.
col1 - (undocumented)col2 - (undocumented)col3 - (undocumented)col4 - (undocumented)public Dataset<scala.Tuple2<K,Object>> count()
Dataset that contains a tuple with each key and the number of items present
for that key.
public <U,R> Dataset<R> cogroup(GroupedDataset<K,U> other, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.TraversableOnce<R>> f, Encoder<R> evidence$4)
Dataset this and other. The function can return an iterator containing elements of an
arbitrary type which will be returned as a new Dataset.
other - (undocumented)f - (undocumented)evidence$4 - (undocumented)public <U,R> Dataset<R> cogroup(GroupedDataset<K,U> other, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)
Dataset this and other. The function can return an iterator containing elements of an
arbitrary type which will be returned as a new Dataset.
other - (undocumented)f - (undocumented)encoder - (undocumented)