pyspark.sql.functions.kll_merge_agg_bigint

pyspark.sql.functions.kll_merge_agg_bigint(col, k=None)

Aggregate function: merges binary KllLongsSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

New in version 4.1.0.

Parameters
col : Column or column name

The column containing binary KllLongsSketch representations.

k : Column or int, optional

The k parameter that controls size and accuracy (range 8-65535).

Returns
Column

The merged binary representation of the KllLongsSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([1, 2, 3], "INT")
>>> df2 = spark.createDataFrame([4, 5, 6], "INT")
>>> sketch1 = df1.agg(sf.kll_sketch_agg_bigint("value").alias("sketch"))
>>> sketch2 = df2.agg(sf.kll_sketch_agg_bigint("value").alias("sketch"))
>>> merged = sketch1.union(sketch2).agg(sf.kll_merge_agg_bigint("sketch").alias("merged"))
>>> n = merged.select(sf.kll_sketch_get_n_bigint("merged")).first()[0]
>>> n
6