Package org.apache.datasketches.kll
Class KllSketch
- java.lang.Object
-
- org.apache.datasketches.kll.KllSketch
-
- Direct Known Subclasses:
KllDoublesSketch, KllFloatsSketch
public abstract class KllSketch extends Object
This class is the root of the KLL sketch class hierarchy. It includes the public API that is independent of the sketch type (float or double) and of whether the sketch is targeted for use on the heap or Direct (off-heap).
Please refer to the documentation in the package-info: org.apache.datasketches.kll
- Author:
- Lee Rhodes, Kevin Lang
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | KllSketch.SketchType | Used to define the variable type of the current instance of this class.
-
Method Summary
Modifier and Type | Method | Description
int | getCurrentCompactSerializedSizeBytes() | Returns the current number of bytes this sketch would require to store in the compact Memory Format.
int | getCurrentUpdatableSerializedSizeBytes() | Returns the current number of bytes this sketch would require to store in the updatable Memory Format.
abstract int | getK() | Returns the user configured parameter k.
static int | getKFromEpsilon(double epsilon, boolean pmf) | Gets the approximate value of k to use given epsilon, the normalized rank error.
static int | getMaxSerializedSizeBytes(int k, long n) | Deprecated. Instead use getMaxSerializedSizeBytes(int, long, boolean) from the descendants of this class, or getMaxSerializedSizeBytes(int, long, SketchType, boolean) from this class.
static int | getMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableMemFormat) | Returns upper bound on the serialized size of a KllSketch given the following parameters.
abstract long | getN() | Returns the length of the input stream in items.
double | getNormalizedRankError(boolean pmf) | Gets the approximate rank error of this sketch normalized as a fraction between zero and one.
static double | getNormalizedRankError(int k, boolean pmf) | Gets the normalized rank error given k and pmf.
int | getNumRetained() | Returns the number of retained items (samples) in the sketch.
int | getSerializedSizeBytes() | Returns the current number of bytes this Sketch would require if serialized.
boolean | hasMemory() | Returns true if this sketch's data structure is backed by Memory or WritableMemory.
boolean | isDirect() | Returns true if the backing resource is direct, i.e., actually allocated in off-heap memory.
boolean | isEmpty() | Returns true if this sketch is empty.
boolean | isEstimationMode() | Returns true if this sketch is in estimation mode.
boolean | isMemoryUpdatableFormat() | Returns true if the backing WritableMemory is in updatable format.
boolean | isReadOnly() | Returns true if this sketch is read only.
boolean | isSameResource(org.apache.datasketches.memory.Memory that) | Returns true if the backing resource of this is identical with the backing resource of that.
void | merge(KllSketch other) | Merges another sketch into this one.
void | reset() | This resets the current sketch back to zero entries.
byte[] | toByteArray() | Returns serialized sketch in a compact byte array form.
String | toString() |
String | toString(boolean withLevels, boolean withData) | Returns a summary of the sketch as a string.
-
-
-
Field Detail
-
DEFAULT_K
public static final int DEFAULT_K
The default value of K.
- See Also:
- Constant Field Values
-
MAX_K
public static final int MAX_K
The maximum value of K.
- See Also:
- Constant Field Values
-
-
Method Detail
-
getKFromEpsilon
public static int getKFromEpsilon(double epsilon, boolean pmf)
Gets the approximate value of k to use given epsilon, the normalized rank error.
- Parameters:
- epsilon - the normalized rank error between zero and one.
- pmf - if true, this function returns the value of k assuming the input epsilon is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function returns the value of k assuming the input epsilon is the desired "single-sided" epsilon for all the other queries.
Please refer to the documentation in the package-info: org.apache.datasketches.kll
- Returns:
- the value of k given a value of epsilon.
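The relationship between k and epsilon can be illustrated with a small self-contained model. The sketch below assumes the error follows a power-law fit of the form eps = c / k^e and inverts it to recover k. The class name, the fit constants, and the k bounds are assumptions made for illustration and are not stated on this page; the library's actual rounding and clamping rules may differ.

```java
public final class KllEpsilonModel {
    // Assumed fit constants (not given on this page): eps is modeled as COEF / k^EXP.
    static final double CDF_COEF = 2.296, CDF_EXP = 0.9723; // "single-sided" queries
    static final double PMF_COEF = 2.446, PMF_EXP = 0.9433; // "double-sided" getPMF()
    // Assumed bounds on k, used only to keep the result in a sensible range.
    static final int MIN_K = 8;
    static final int MAX_K = 65535;

    /** Modeled normalized rank error for a given k. */
    static double normalizedRankError(int k, boolean pmf) {
        return pmf ? PMF_COEF / Math.pow(k, PMF_EXP)
                   : CDF_COEF / Math.pow(k, CDF_EXP);
    }

    /** Inverts the model: the smallest modeled k whose error does not exceed epsilon. */
    static int kFromEpsilon(double epsilon, boolean pmf) {
        double coef = pmf ? PMF_COEF : CDF_COEF;
        double exp = pmf ? PMF_EXP : CDF_EXP;
        double k = Math.pow(coef / epsilon, 1.0 / exp);
        int kInt = (int) Math.ceil(k); // saturates at Integer.MAX_VALUE for huge values
        return Math.max(MIN_K, Math.min(MAX_K, kInt));
    }

    public static void main(String[] args) {
        int kCdf = kFromEpsilon(0.01, false);
        int kPmf = kFromEpsilon(0.01, true);
        System.out.println("k for 1% single-sided error: " + kCdf);
        System.out.println("k for 1% double-sided error: " + kPmf);
        System.out.println("modeled error at that k: " + normalizedRankError(kCdf, false));
    }
}
```

Under this model the double-sided case needs a larger k than the single-sided case for the same epsilon, which is why the pmf flag matters when sizing a sketch.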
-
getMaxSerializedSizeBytes
@Deprecated public static int getMaxSerializedSizeBytes(int k, long n)
Deprecated. Instead use getMaxSerializedSizeBytes(int, long, boolean) from the descendants of this class, or getMaxSerializedSizeBytes(int, long, SketchType, boolean) from this class. Version 3.2.0.
Returns an upper bound on the compact serialized size of a KllFloatsSketch given a parameter k and stream length. This method can be used if storage must be allocated beforehand.
- Parameters:
- k - parameter that controls size of the sketch and accuracy of estimates
- n - stream length
- Returns:
- upper bound on the compact serialized size
-
getMaxSerializedSizeBytes
public static int getMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableMemFormat)
Returns upper bound on the serialized size of a KllSketch given the following parameters.
- Parameters:
- k - parameter that controls size of the sketch and accuracy of estimates
- n - stream length
- sketchType - either DOUBLES_SKETCH or FLOATS_SKETCH
- updatableMemFormat - true if updatable Memory format, otherwise the standard compact format.
- Returns:
- upper bound on the serialized size of a KllSketch.
-
getNormalizedRankError
public static double getNormalizedRankError(int k, boolean pmf)
Gets the normalized rank error given k and pmf. Static method version of getNormalizedRankError(boolean).
- Parameters:
- k - the configuration parameter
- pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
- Returns:
- if pmf is true, the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
-
getCurrentCompactSerializedSizeBytes
public final int getCurrentCompactSerializedSizeBytes()
Returns the current number of bytes this sketch would require to store in the compact Memory Format.
- Returns:
- the current number of bytes this sketch would require to store in the compact Memory Format.
-
getCurrentUpdatableSerializedSizeBytes
public final int getCurrentUpdatableSerializedSizeBytes()
Returns the current number of bytes this sketch would require to store in the updatable Memory Format.
- Returns:
- the current number of bytes this sketch would require to store in the updatable Memory Format.
-
getK
public abstract int getK()
Returns the user configured parameter k.
- Returns:
- the user configured parameter k
-
getN
public abstract long getN()
Returns the length of the input stream in items.
- Returns:
- stream length
-
getNormalizedRankError
public final double getNormalizedRankError(boolean pmf)
Gets the approximate rank error of this sketch normalized as a fraction between zero and one.
- Parameters:
- pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries. The epsilon value returned is a best fit to the 99th percentile of the empirically measured maximum error over thousands of trials.
- Returns:
- if pmf is true, the normalized rank error for the getPMF() function. Otherwise, the "single-sided" normalized rank error for all the other queries.
Please refer to the documentation in the package-info: org.apache.datasketches.kll
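Because the error is normalized, converting it to an absolute rank error only requires multiplying by the stream length. The following is a minimal sketch of that arithmetic. The class and method names are hypothetical, and eps and n stand in for values that would come from getNormalizedRankError(boolean) and getN(). The resulting bounds hold only at the confidence implied by the empirical fit, not with certainty.

```java
public final class RankErrorBounds {
    /** Absolute rank error in items: a normalized error eps over n items is about eps * n. */
    static long absoluteRankError(double eps, long n) {
        return Math.round(eps * n);
    }

    /** Bounds on the true normalized rank around an estimate, clamped to [0, 1]. */
    static double[] normalizedRankBounds(double estimatedRank, double eps) {
        double lower = Math.max(0.0, estimatedRank - eps);
        double upper = Math.min(1.0, estimatedRank + eps);
        return new double[] {lower, upper};
    }

    public static void main(String[] args) {
        double eps = 0.01;    // placeholder value, not taken from a real sketch
        long n = 1_000_000L;  // placeholder stream length
        System.out.println("absolute rank error ~ " + absoluteRankError(eps, n) + " items");
        double[] b = normalizedRankBounds(0.5, eps);
        System.out.println("estimated rank 0.5 lies in [" + b[0] + ", " + b[1] + "]");
    }
}
```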
-
getNumRetained
public final int getNumRetained()
Returns the number of retained items (samples) in the sketch.
- Returns:
- the number of retained items (samples) in the sketch
-
getSerializedSizeBytes
public int getSerializedSizeBytes()
Returns the current number of bytes this sketch would require if serialized.
- Returns:
- the number of bytes this sketch would require if serialized.
-
hasMemory
public boolean hasMemory()
Returns true if this sketch's data structure is backed by Memory or WritableMemory.
- Returns:
- true if this sketch's data structure is backed by Memory or WritableMemory.
-
isDirect
public boolean isDirect()
Returns true if the backing resource is direct, i.e., actually allocated in off-heap memory. This is the case for off-heap memory and memory mapped files. This backing resource could be either Memory (read-only) or WritableMemory. However, if the backing Memory or WritableMemory resource is allocated on-heap, this will return false.
- Returns:
- true if the backing resource is off-heap memory.
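The JDK's java.nio.ByteBuffer draws the same distinction between heap and direct storage, and between writable and read-only access. The sketch below uses it purely as an analogy: it does not use the datasketches Memory API, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public final class HeapVsDirect {
    /** Describes where a buffer's storage lives and whether it can be written. */
    static String describe(ByteBuffer buf) {
        String where = buf.isDirect() ? "direct (off-heap)" : "heap";
        String access = buf.isReadOnly() ? "read-only" : "writable";
        return where + ", " + access;
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(64);              // backed by a Java array
        ByteBuffer direct = ByteBuffer.allocateDirect(64);      // allocated outside the Java heap
        ByteBuffer directReadOnly = direct.asReadOnlyBuffer();  // read-only view of the same storage
        System.out.println(describe(heap));
        System.out.println(describe(direct));
        System.out.println(describe(directReadOnly));
    }
}
```

In the analogy, a read-only view of a direct buffer is still direct, which mirrors the statement that the backing resource may be either read-only or writable while still being off-heap.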
-
isEmpty
public final boolean isEmpty()
Returns true if this sketch is empty.
- Returns:
- empty flag
-
isEstimationMode
public final boolean isEstimationMode()
Returns true if this sketch is in estimation mode.
- Returns:
- estimation mode flag
-
isMemoryUpdatableFormat
public final boolean isMemoryUpdatableFormat()
Returns true if the backing WritableMemory is in updatable format.
- Returns:
- true if the backing WritableMemory is in updatable format.
-
isReadOnly
public final boolean isReadOnly()
Returns true if this sketch is read only.
- Returns:
- true if this sketch is read only.
-
isSameResource
public final boolean isSameResource(org.apache.datasketches.memory.Memory that)
Returns true if the backing resource of this is identical with the backing resource of that. The capacities must be the same. If this is a region, the region offset must also be the same.
- Parameters:
- that - A different non-null object
- Returns:
- true if the backing resource of this is the same as the backing resource of that.
-
merge
public final void merge(KllSketch other)
Merges another sketch into this one. Attempting to merge a KllDoublesSketch with a KllFloatsSketch will throw an exception.
- Parameters:
- other - sketch to merge into this one
-
reset
public final void reset()
This resets the current sketch back to zero entries. It retains key parameters such as k and SketchType (double or float).
-
toByteArray
public byte[] toByteArray()
Returns serialized sketch in a compact byte array form.
- Returns:
- serialized sketch in a compact byte array form.
-
toString
public String toString(boolean withLevels, boolean withData)
Returns a summary of the sketch as a string.
- Parameters:
- withLevels - if true include information about levels
- withData - if true include sketch data
- Returns:
- string representation of sketch summary
-
-