pyspark.sql.functions.tuple_union_agg_double
- pyspark.sql.functions.tuple_union_agg_double(col, lgNomEntries=None, mode=None)
Aggregate function: returns the compact binary representation of the Datasketches TupleSketch that is the union of the double TupleSketch objects in the input column.
New in version 4.2.0.
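- Parameters
col
    The column containing the binary representations of the double TupleSketch objects to union.
lgNomEntries, optional
    The log-base-2 of the nominal number of entries for the union. When omitted, the example output shows a value of 12.
mode, optional
    The mode used to combine the double summary values. When omitted, the example output shows sum.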
- Returns
Column
    The binary representation of the merged TupleSketch.
Examples
>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([(1, 10.0), (2, 20.0)], ["key", "value"])
>>> df1 = df1.agg(sf.tuple_sketch_agg_double("key", "value").alias("sketch"))
>>> df2 = spark.createDataFrame([(3, 30.0), (4, 40.0)], ["key", "value"])
>>> df2 = df2.agg(sf.tuple_sketch_agg_double("key", "value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.tuple_sketch_estimate_double(sf.tuple_union_agg_double("sketch"))).show()
+---------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_union_agg_double(sketch, 12, sum))|
+---------------------------------------------------------------------+
|                                                                  4.0|
+---------------------------------------------------------------------+
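As a rough mental model of what the union computes, the plain-Python sketch below uses exact dictionaries in place of sketches. It is not the Datasketches algorithm and does not call PySpark: a real TupleSketch samples keys and returns an approximate estimate. The names `COMBINERS`, `union_two` and `union_all` are hypothetical, and the `min` and `max` modes are assumptions; only `sum` appears in the example output.

```python
# Conceptual model only: exact dicts stand in for sketches, so there is no
# sampling or approximation as in a real Datasketches TupleSketch.
from functools import reduce

# Hypothetical combiner table. Only "sum" is confirmed by the example output;
# "min" and "max" are assumptions added for illustration.
COMBINERS = {
    "sum": lambda a, b: a + b,
    "min": min,
    "max": max,
}


def union_two(left, right, mode="sum"):
    """Merge two key -> summary mappings, combining summaries of shared keys."""
    combine = COMBINERS[mode]
    merged = dict(left)  # copy so the inputs are left unchanged
    for key, summary in right.items():
        merged[key] = combine(merged[key], summary) if key in merged else summary
    return merged


def union_all(sketches, mode="sum"):
    """Fold union_two over many mappings, the way an aggregate folds over rows."""
    return reduce(lambda acc, s: union_two(acc, s, mode), sketches, {})


sketch_a = {1: 10.0, 2: 20.0}
sketch_b = {2: 5.0, 3: 30.0}

merged = union_all([sketch_a, sketch_b])
distinct_keys = len(merged)  # exact counterpart of the sketch's estimate
```

In this model, key 2 occurs in both inputs, so its summaries are added, and the distinct-key count plays the role that `tuple_sketch_estimate_double` plays for the merged sketch.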