Package org.apache.datasketches.hll
The hll package contains a high-performance implementation of Philippe
Flajolet's HyperLogLog (HLL) sketch with significantly improved error behavior.
If the ONLY use case for sketching is counting uniques and merging, the HLL sketch is the highest performing in terms of accuracy for space consumed. For large counts, this HLL version will be 2 to 16 times smaller than a Theta Sketch of the same accuracy.
HLL sketches do not retain any of the hash values of the associated unique identifiers. If you anticipate a future need to leverage associations with retained hash values, Theta Sketches would be a better choice.
HLL sketches cannot be intermixed or merged in any way with Theta Sketches.
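The two properties above (no retained hash values, and merging only with other HLL sketches) can be illustrated with a deliberately simplified, textbook-style HyperLogLog. This is a sketch under stated assumptions, not the implementation in this package: the class name ToyHll, the SplitMix64-style stand-in hash, and the classic raw estimator with a linear-counting fallback are all choices made for this illustration. It shows that each register keeps only a small rank derived from the hash, and that a union is a register-wise maximum.

```java
/** Toy HyperLogLog for illustration only; NOT the org.apache.datasketches.hll implementation. */
final class ToyHll {
    private final int lgK;          // log2 of the number of registers
    private final byte[] registers; // one small rank per register; no hash values are stored

    ToyHll(int lgK) {
        if (lgK < 7 || lgK > 20) {
            throw new IllegalArgumentException("lgK must be in [7, 20]");
        }
        this.lgK = lgK;
        this.registers = new byte[1 << lgK];
    }

    // Stand-in 64-bit hash (SplitMix64 mixing step); an assumption of this illustration.
    private static long hash(long x) {
        long z = x + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void update(long item) {
        long h = hash(item);
        int index = (int) (h >>> (64 - lgK)); // top lgK bits select a register
        long rest = h << lgK;                 // remaining bits, left-aligned
        // Rank = position of the first 1-bit in the remaining bits, capped at their width + 1.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - lgK + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;   // keep only the maximum rank seen
        }
    }

    // Union: register-wise maximum. Both sketches must use the same number of registers.
    void merge(ToyHll other) {
        if (other.lgK != lgK) {
            throw new IllegalArgumentException("lgK mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    // Classic raw HLL estimate, with linear counting when the sketch is sparsely filled.
    double estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);       // 2^(-r)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // standard constant for m >= 128
        double raw = alpha * m * m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }
}
```

Because only per-register maxima survive, re-adding an item or re-merging a sketch leaves the state unchanged, and there is nothing left from which the original identifiers or their hashes could be recovered. That is why set operations that need retained hashes call for Theta Sketches instead.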
- Author:
- Lee Rhodes, Kevin Lang
Class Summary
- HllSketch: A high-performance implementation of Philippe Flajolet's HLL sketch with significantly improved error behavior.
- IntMemoryPairIterator: Iterates within a given Memory, extracting integer pairs.
- Union: Performs union operations for all HllSketches.
Enum Summary
- TgtHllType: Specifies the target type of HLL sketch to be created.