Class Sketch<S extends Summary>

java.lang.Object
org.apache.datasketches.tuple.Sketch<S>
Type Parameters:
S - Type of Summary
Direct Known Subclasses:
CompactSketch, UpdatableSketch

public abstract class Sketch<S extends Summary>
extends Object
This is the equivalent of org.apache.datasketches.theta.Sketch, with the addition of a user-defined Summary object associated with every unique entry in the sketch.
  • Field Summary

    Fields 
    Modifier and Type Field Description
    protected static byte PREAMBLE_LONGS  
  • Method Summary

    Modifier and Type Method Description
    abstract CompactSketch<S> compact()
    Converts this sketch to a CompactSketch on the Java heap.
    abstract int getCountLessThanThetaLong(long thetaLong)
    Gets the number of hash values less than the given theta expressed as a long.
    double getEstimate()
    Estimates the cardinality of the set (the number of unique values presented to the sketch).
    double getEstimate(int numSubsetEntries)
    Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
    double getLowerBound(int numStdDev)
    Gets the approximate lower error bound given the specified number of Standard Deviations.
    double getLowerBound(int numStdDev, int numSubsetEntries)
    Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    abstract int getRetainedEntries()  
    double getTheta()
    Gets the value of theta as a double between zero and one.
    long getThetaLong()
    Returns theta as a long.
    double getUpperBound(int numStdDev)
    Gets the approximate upper error bound given the specified number of Standard Deviations.
    double getUpperBound(int numStdDev, int numSubsetEntries)
    Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    boolean isEmpty()
    boolean isEstimationMode()
    Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode).
    abstract SketchIterator<S> iterator()
    Returns a SketchIterator.
    abstract byte[] toByteArray()
    Serializes this sketch to a byte array.
    String toString()  

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

  • Method Details

    • compact

      public abstract CompactSketch<S> compact()
      Converts this sketch to a CompactSketch on the Java heap.

      If this sketch is already in compact form, this operation returns this.

      Returns:
      this sketch as a CompactSketch on the Java heap.
    • getEstimate

      public double getEstimate()
      Estimates the cardinality of the set (the number of unique values presented to the sketch).
      Returns:
      best estimate of the number of unique values
    • getUpperBound

      public double getUpperBound(int numStdDev)
      Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
      Parameters:
      numStdDev - See Number of Standard Deviations
      Returns:
      the upper bound.
    • getLowerBound

      public double getLowerBound(int numStdDev)
      Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
      Parameters:
      numStdDev - See Number of Standard Deviations
      Returns:
      the lower bound.
    • getEstimate

      public double getEstimate(int numSubsetEntries)
      Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
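
As an illustration of the subset estimate, here is a minimal sketch. It assumes the usual theta-sketch scaling, in which each retained entry stands for 1/theta distinct items when the sketch is in estimation mode, and that the subset count is returned unchanged otherwise. The class SubsetEstimate and its method are hypothetical and are not part of the DataSketches API.

```java
// Hypothetical helper for illustration; not part of the DataSketches API.
final class SubsetEstimate {
    // Assumption: in estimation mode the subset count is scaled by 1/theta;
    // in exact mode (or for an empty sketch) the count is already exact.
    static double estimate(int numSubsetEntries, double theta, boolean empty) {
        boolean estimationMode = theta < 1.0 && !empty;
        return estimationMode ? numSubsetEntries / theta : numSubsetEntries;
    }

    public static void main(String[] args) {
        // 100 subset entries retained at theta = 0.25
        System.out.println(estimate(100, 0.25, false)); // prints 400.0
    }
}
```

Under this assumption, a subset of 100 retained entries at theta = 0.25 represents roughly 400 distinct items in the original stream.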
    • getLowerBound

      public double getLowerBound(int numStdDev, int numSubsetEntries)
      Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numStdDev - See Number of Standard Deviations
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    • getUpperBound

      public double getUpperBound(int numStdDev, int numSubsetEntries)
      Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numStdDev - See Number of Standard Deviations
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    • isEmpty

      public boolean isEmpty()
      Returns:
      true if empty.
    • isEstimationMode

      public boolean isEstimationMode()
      Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). This is true if theta < 1.0 AND isEmpty() is false.
      Returns:
      true if the sketch is in estimation mode.
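
The condition stated above can be written as a one-line predicate. The following is a minimal sketch that takes theta and the empty flag as plain arguments; the class EstimationModeCheck is hypothetical and is not part of the DataSketches API.

```java
// Hypothetical helper for illustration; not part of the DataSketches API.
final class EstimationModeCheck {
    // Estimation mode holds when theta < 1.0 AND the sketch is not empty.
    static boolean isEstimationMode(double theta, boolean empty) {
        return theta < 1.0 && !empty;
    }

    public static void main(String[] args) {
        System.out.println(isEstimationMode(0.5, false)); // prints true
        System.out.println(isEstimationMode(1.0, false)); // prints false
        System.out.println(isEstimationMode(0.5, true));  // prints false
    }
}
```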
    • getRetainedEntries

      public abstract int getRetainedEntries()
      Returns:
      number of retained entries
    • getCountLessThanThetaLong

      public abstract int getCountLessThanThetaLong(long thetaLong)
      Gets the number of hash values less than the given theta expressed as a long.
      Parameters:
      thetaLong - the given theta as a long between zero and Long.MAX_VALUE.
      Returns:
      the number of hash values less than the given thetaLong.
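
Because this method is abstract, each subclass supplies its own counting logic. A minimal sketch of the idea follows, assuming the retained hash values are available as a plain long[] with no empty slots. The class CountBelowTheta is hypothetical and does not reflect how CompactSketch or UpdatableSketch actually store their hashes.

```java
// Hypothetical helper for illustration; not part of the DataSketches API.
final class CountBelowTheta {
    // Counts the hash values strictly less than thetaLong.
    // Assumption: retained hashes are held in a plain long[] with no empty slots.
    static int countLessThanThetaLong(long[] hashes, long thetaLong) {
        int count = 0;
        for (long hash : hashes) {
            if (hash < thetaLong) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] hashes = {10L, 20L, 30L, 40L};
        System.out.println(countLessThanThetaLong(hashes, 30L)); // prints 2
    }
}
```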
    • getTheta

      public double getTheta()
      Gets the value of theta as a double between zero and one.
      Returns:
      the value of theta as a double
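
To relate the double form of theta to its long form (see getThetaLong), here is a minimal sketch. It assumes the long form is a fraction of Long.MAX_VALUE, so that Long.MAX_VALUE corresponds to theta = 1.0. The class ThetaConversion is hypothetical and is not part of the DataSketches API.

```java
// Hypothetical helper for illustration; not part of the DataSketches API.
final class ThetaConversion {
    // Assumption: theta as a double is thetaLong scaled by Long.MAX_VALUE.
    static double thetaFromLong(long thetaLong) {
        return thetaLong / (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(thetaFromLong(Long.MAX_VALUE)); // prints 1.0
        System.out.println(thetaFromLong(0L));             // prints 0.0
    }
}
```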
    • toByteArray

      public abstract byte[] toByteArray()
      Serializes this sketch to a byte array.
      Returns:
      serialized representation of the sketch
    • iterator

      public abstract SketchIterator<S> iterator()
      Returns a SketchIterator.
      Returns:
      a SketchIterator
    • getThetaLong

      public long getThetaLong()
      Returns theta as a long.
      Returns:
      theta as a long
    • toString

      public String toString()
      Overrides:
      toString in class Object