Class KllSketch

  • Direct Known Subclasses:
    KllDoublesSketch, KllFloatsSketch

    public abstract class KllSketch
    extends Object
    This class is the root of the KLL sketch class hierarchy. It contains the public API that is independent of the sketch type (float or double) and of whether the sketch resides on the heap or in direct (off-heap) memory.

    Please refer to the documentation in the package-info:
    org.apache.datasketches.kll

    Author:
    Lee Rhodes, Kevin Lang
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  KllSketch.SketchType
      Used to define the variable type of the current instance of this class.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULT_K
      The default value of K
      static int MAX_K
      The maximum value of K
    • Method Summary

      All Methods Static Methods Instance Methods Abstract Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      int getCurrentCompactSerializedSizeBytes()
      Returns the current number of bytes this sketch would require if stored in the compact Memory format.
      int getCurrentUpdatableSerializedSizeBytes()
      Returns the current number of bytes this sketch would require if stored in the updatable Memory format.
      abstract int getK()
      Returns the user-configured parameter k.
      static int getKFromEpsilon​(double epsilon, boolean pmf)
      Gets the approximate value of k to use given epsilon, the normalized rank error.
      static int getMaxSerializedSizeBytes​(int k, long n)
      Deprecated.
      Instead use getMaxSerializedSizeBytes(int, long, boolean) from the descendants of this class, or getMaxSerializedSizeBytes(int, long, SketchType, boolean) from this class.
      static int getMaxSerializedSizeBytes​(int k, long n, KllSketch.SketchType sketchType, boolean updatableMemFormat)
      Returns an upper bound on the serialized size of a KllSketch given the following parameters.
      abstract long getN()
      Returns the length of the input stream in items.
      double getNormalizedRankError​(boolean pmf)
      Gets the approximate rank error of this sketch normalized as a fraction between zero and one.
      static double getNormalizedRankError​(int k, boolean pmf)
      Gets the normalized rank error given k and pmf.
      int getNumRetained()
      Returns the number of retained items (samples) in the sketch.
      int getSerializedSizeBytes()
      Returns the current number of bytes this Sketch would require if serialized.
      boolean hasMemory()
      Returns true if this sketch's data structure is backed by Memory or WritableMemory.
      boolean isDirect()
      Returns true if the backing resource is direct, i.e., actually allocated in off-heap memory.
      boolean isEmpty()
      Returns true if this sketch is empty.
      boolean isEstimationMode()
      Returns true if this sketch is in estimation mode.
      boolean isMemoryUpdatableFormat()
      Returns true if the backing WritableMemory is in updatable format.
      boolean isReadOnly()
      Returns true if this sketch is read only.
      boolean isSameResource​(org.apache.datasketches.memory.Memory that)
      Returns true if the backing resource of this is identical with the backing resource of that.
      void merge​(KllSketch other)
      Merges another sketch into this one.
      void reset()
      This resets the current sketch back to zero entries.
      byte[] toByteArray()
      Returns serialized sketch in a compact byte array form.
      String toString()  
      String toString​(boolean withLevels, boolean withData)
      Returns a summary of the sketch as a string.
    • Method Detail

      • getKFromEpsilon

        public static int getKFromEpsilon​(double epsilon,
                                          boolean pmf)
        Gets the approximate value of k to use given epsilon, the normalized rank error.
        Parameters:
        epsilon - the normalized rank error between zero and one.
        pmf - if true, this function returns the value of k assuming the input epsilon is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function returns the value of k assuming the input epsilon is the desired "single-sided" epsilon for all the other queries.

        Please refer to the documentation in the package-info:
        org.apache.datasketches.kll

        Returns:
        the value of k given a value of epsilon.
      • getMaxSerializedSizeBytes

        @Deprecated
        public static int getMaxSerializedSizeBytes​(int k,
                                                    long n)
        Deprecated.
        Instead use getMaxSerializedSizeBytes(int, long, boolean) from the descendants of this class, or getMaxSerializedSizeBytes(int, long, SketchType, boolean) from this class. Deprecated as of version 3.2.0.
        Returns an upper bound on the compact serialized size of a KllFloatsSketch given the parameter k and the stream length n. Use this method when storage must be allocated in advance.
        Parameters:
        k - parameter that controls size of the sketch and accuracy of estimates
        n - stream length
        Returns:
        upper bound on the compact serialized size
      • getMaxSerializedSizeBytes

        public static int getMaxSerializedSizeBytes​(int k,
                                                    long n,
                                                    KllSketch.SketchType sketchType,
                                                    boolean updatableMemFormat)
        Returns an upper bound on the serialized size of a KllSketch given the following parameters.
        Parameters:
        k - parameter that controls size of the sketch and accuracy of estimates
        n - stream length
        sketchType - either DOUBLES_SKETCH or FLOATS_SKETCH
        updatableMemFormat - true if updatable Memory format, otherwise the standard compact format.
        Returns:
        an upper bound on the serialized size of a KllSketch.
      • getNormalizedRankError

        public static double getNormalizedRankError​(int k,
                                                    boolean pmf)
        Gets the normalized rank error given k and pmf. This is the static version of getNormalizedRankError(boolean).
        Parameters:
        k - the configuration parameter
        pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
        Returns:
        if pmf is true, the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
      • getCurrentCompactSerializedSizeBytes

        public final int getCurrentCompactSerializedSizeBytes()
        Returns the current number of bytes this sketch would require if stored in the compact Memory format.
        Returns:
        the current number of bytes this sketch would require if stored in the compact Memory format.
      • getCurrentUpdatableSerializedSizeBytes

        public final int getCurrentUpdatableSerializedSizeBytes()
        Returns the current number of bytes this sketch would require if stored in the updatable Memory format.
        Returns:
        the current number of bytes this sketch would require if stored in the updatable Memory format.
      • getK

        public abstract int getK()
        Returns the user-configured parameter k.
        Returns:
        the user-configured parameter k
      • getN

        public abstract long getN()
        Returns the length of the input stream in items.
        Returns:
        stream length
      • getNormalizedRankError

        public final double getNormalizedRankError​(boolean pmf)
        Gets the approximate rank error of this sketch normalized as a fraction between zero and one.
        Parameters:
        pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries. The epsilon value returned is a best fit to the 99th percentile of the maximum error measured empirically over thousands of trials.
        Returns:
        if pmf is true, returns the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.

        Please refer to the documentation in the package-info:
        org.apache.datasketches.kll
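
        The following is a minimal sketch of how a normalized rank error might be interpreted, assuming the usual reading that an estimated normalized rank lies within plus or minus epsilon of the true rank at roughly the 99th-percentile confidence described above. It does not call the library; the class RankErrorBounds, its methods, and the sample values of epsilon and n are hypothetical and chosen only for illustration.

```java
// Illustrative helper only: these names are hypothetical and not part of the library.
final class RankErrorBounds {

  // Interval of normalized ranks consistent with an estimate, clamped to [0, 1].
  static double[] rankInterval(double estimatedRank, double epsilon) {
    if (epsilon < 0.0 || epsilon > 1.0) {
      throw new IllegalArgumentException("epsilon must be between zero and one");
    }
    double lo = Math.max(0.0, estimatedRank - epsilon);
    double hi = Math.min(1.0, estimatedRank + epsilon);
    return new double[] {lo, hi};
  }

  // The same error expressed as a number of items for a stream of length n.
  static double absoluteRankError(double epsilon, long n) {
    return epsilon * n;
  }

  public static void main(String[] args) {
    double epsilon = 0.01;   // assumed value, standing in for getNormalizedRankError(false)
    long n = 1_000_000L;     // assumed stream length, standing in for getN()
    double[] interval = rankInterval(0.5, epsilon);
    System.out.printf("rank interval: [%f, %f]%n", interval[0], interval[1]);
    System.out.printf("error in items: %f%n", absoluteRankError(epsilon, n));
  }
}
```

        In this reading, a smaller epsilon (obtained with a larger k) narrows the interval around every rank estimate, at the cost of a larger sketch.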

      • getNumRetained

        public final int getNumRetained()
        Returns the number of retained items (samples) in the sketch.
        Returns:
        the number of retained items (samples) in the sketch
      • getSerializedSizeBytes

        public int getSerializedSizeBytes()
        Returns the current number of bytes this Sketch would require if serialized.
        Returns:
        the number of bytes this sketch would require if serialized.
      • hasMemory

        public boolean hasMemory()
        Returns true if this sketch's data structure is backed by Memory or WritableMemory.
        Returns:
        true if this sketch's data structure is backed by Memory or WritableMemory.
      • isDirect

        public boolean isDirect()
        Returns true if the backing resource is direct, i.e., actually allocated in off-heap memory. This is the case for off-heap memory and memory-mapped files. The backing resource can be either Memory (read-only) or WritableMemory. However, if the backing Memory or WritableMemory resource is allocated on-heap, this returns false.
        Returns:
        true if the backing resource is off-heap memory.
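
        The distinction between on-heap and direct backing resources can be illustrated by analogy with the JDK's java.nio.ByteBuffer, which makes the same distinction. This is only an analogy under that assumption: it uses no sketch or Memory classes, and the class DirectVersusHeap is hypothetical.

```java
import java.nio.ByteBuffer;

// Analogy only: ByteBuffer stands in for a backing resource; no sketch classes are used.
final class DirectVersusHeap {

  // True only when the buffer's storage lives outside the Java heap.
  static boolean isOffHeap(ByteBuffer buffer) {
    return buffer.isDirect();
  }

  public static void main(String[] args) {
    ByteBuffer onHeap = ByteBuffer.allocate(64);         // backed by a byte[] on the heap
    ByteBuffer offHeap = ByteBuffer.allocateDirect(64);  // allocated off-heap
    ByteBuffer readOnlyView = offHeap.asReadOnlyBuffer(); // read-only, still off-heap

    System.out.println("onHeap direct?       " + isOffHeap(onHeap));
    System.out.println("offHeap direct?      " + isOffHeap(offHeap));
    System.out.println("readOnlyView direct? " + isOffHeap(readOnlyView));
  }
}
```

        As in the description above, being read-only and being direct are independent properties: a read-only view of off-heap storage is still direct.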
      • isEmpty

        public final boolean isEmpty()
        Returns true if this sketch is empty.
        Returns:
        empty flag
      • isEstimationMode

        public final boolean isEstimationMode()
        Returns true if this sketch is in estimation mode.
        Returns:
        estimation mode flag
      • isMemoryUpdatableFormat

        public final boolean isMemoryUpdatableFormat()
        Returns true if the backing WritableMemory is in updatable format.
        Returns:
        true if the backing WritableMemory is in updatable format.
      • isReadOnly

        public final boolean isReadOnly()
        Returns true if this sketch is read only.
        Returns:
        true if this sketch is read only.
      • isSameResource

        public final boolean isSameResource​(org.apache.datasketches.memory.Memory that)
        Returns true if the backing resource of this is identical with the backing resource of that. The capacities must be the same. If this is a region, the region offset must also be the same.
        Parameters:
        that - A different non-null object
        Returns:
        true if the backing resource of this is the same as the backing resource of that.
      • merge

        public final void merge​(KllSketch other)
        Merges another sketch into this one. Attempting to merge a KllDoublesSketch with a KllFloatsSketch will throw an exception.
        Parameters:
        other - sketch to merge into this one
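
        The type restriction on merge can be pictured with a toy class hierarchy. This is a sketch under stated assumptions, not the library's implementation: ToySketch, ToyDoubles, ToyFloats, and ToyMergeDemo are hypothetical stand-ins, only the item count is tracked, and the use of IllegalArgumentException is an assumption because the exception type is not specified here.

```java
// Toy stand-ins only: none of these classes exist in the library.
abstract class ToySketch {
  enum ToyType { DOUBLES, FLOATS }

  private long n; // number of items seen; the only state this toy tracks

  ToySketch(long n) { this.n = n; }

  abstract ToyType type();

  long getN() { return n; }

  // Merges another toy sketch into this one, rejecting a mismatched type.
  final void merge(ToySketch other) {
    if (other.type() != type()) {
      // Assumed exception type, for illustration only.
      throw new IllegalArgumentException(
          "Cannot merge " + other.type() + " into " + type());
    }
    n += other.n;
  }
}

final class ToyDoubles extends ToySketch {
  ToyDoubles(long n) { super(n); }
  @Override ToyType type() { return ToyType.DOUBLES; }
}

final class ToyFloats extends ToySketch {
  ToyFloats(long n) { super(n); }
  @Override ToyType type() { return ToyType.FLOATS; }
}

final class ToyMergeDemo {
  public static void main(String[] args) {
    ToySketch target = new ToyDoubles(3);
    target.merge(new ToyDoubles(4));       // same type: counts are combined
    System.out.println("n after merge: " + target.getN());
    try {
      target.merge(new ToyFloats(1));      // different type: rejected
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

        The check runs before any state changes, so a rejected merge leaves the target untouched in this toy version.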
      • reset

        public final void reset()
        This resets the current sketch back to zero entries. It retains key parameters such as k and SketchType (double or float).
      • toByteArray

        public byte[] toByteArray()
        Returns serialized sketch in a compact byte array form.
        Returns:
        serialized sketch in a compact byte array form.
      • toString

        public String toString​(boolean withLevels,
                               boolean withData)
        Returns a summary of the sketch as a string.
        Parameters:
        withLevels - if true include information about levels
        withData - if true include sketch data
        Returns:
        string representation of sketch summary