Package org.apache.datasketches.hll

The hll package contains a high-performance implementation of Philippe Flajolet's HLL sketch with significantly improved error behavior.

If the ONLY use case for sketching is counting uniques and merging, the HLL sketch gives the best accuracy for the space consumed. For large counts, this HLL version will be 2 to 16 times smaller than a Theta Sketch of the same accuracy.

HLL sketches do not retain any of the hash values of the associated unique identifiers. If you anticipate a future need to work with retained hash values, Theta Sketches would be a better choice.

HLL sketches cannot be intermixed or merged in any way with Theta Sketches.
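
The paragraphs above describe two properties that a small example may make more concrete: an HLL sketch stores only compact per-register ranks rather than hash values, and merging is a register-wise operation. The following is a minimal, self-contained sketch of the classic HyperLogLog idea under stated assumptions. It is NOT the implementation in this package and does not use this package's API. The class name ToyHll, its methods, the SplitMix64-style mixing function, and the classic raw estimator with a linear-counting correction are all hypothetical choices made for illustration. The real HllSketch and Union classes use a different internal design and have different (improved) error behavior.

```java
/** Toy HyperLogLog for illustration only; hypothetical, not the DataSketches implementation. */
final class ToyHll {
    private final int lgK;          // log2 of the number of registers
    private final byte[] registers; // each register holds a small rank, never a hash value

    ToyHll(int lgK) {
        if (lgK < 4 || lgK > 16) {
            throw new IllegalArgumentException("lgK must be in [4, 16]");
        }
        this.lgK = lgK;
        this.registers = new byte[1 << lgK];
    }

    /** SplitMix64-style finalizer, used here as a stand-in 64-bit hash (an assumption). */
    private static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Offers one item. Only the maximum rank seen per register is kept. */
    void update(long item) {
        long h = mix(item);
        int idx = (int) (h >>> (64 - lgK));   // top lgK bits select a register
        long rest = h << lgK;                 // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - lgK) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** Merges another sketch of the same size by taking the register-wise maximum. */
    void merge(ToyHll other) {
        if (other.lgK != lgK) {
            throw new IllegalArgumentException("lgK mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    /** Classic raw HLL estimate, with linear counting while many registers are still empty. */
    double estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);       // adds 2^(-r)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = (m == 16) ? 0.673
                     : (m == 32) ? 0.697
                     : (m == 64) ? 0.709
                     : 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * (double) m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }
}
```

In this toy version, feeding the same item twice leaves the registers unchanged, and merging two sketches yields exactly the registers that a single sketch would hold had it seen both streams. Because only ranks are stored, the original identifiers and their hashes cannot be recovered from the registers, which is the reason given above for preferring Theta Sketches when retained hash values may be needed later.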

Author:
Lee Rhodes, Kevin Lang
  • Class Summary
    Class                  Description
    HllSketch              This is a high-performance implementation of Philippe Flajolet’s HLL sketch but with significantly improved error behavior.
    IntMemoryPairIterator  Iterates within a given Memory extracting integer pairs.
    Union                  This performs union operations for all HllSketches.
  • Enum Summary
    Enum                   Description
    TgtHllType             Specifies the target type of HLL sketch to be created.