Class KllDoublesSketch
java.lang.Object
    org.apache.datasketches.kll.KllSketch
        org.apache.datasketches.kll.KllDoublesSketch

public abstract class KllDoublesSketch extends KllSketch
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.datasketches.kll.KllSketch
KllSketch.SketchType
Method Summary
double[] getCDF(double[] splitPoints)
    Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of splitPoints (values).
static int getMaxSerializedSizeBytes(int k, long n, boolean updatableMemoryFormat)
    Returns an upper bound on the serialized size of a KllDoublesSketch given the following parameters.
double getMaxValue()
    Returns the max value of the stream.
double getMinValue()
    Returns the min value of the stream.
double[] getPMF(double[] splitPoints)
    Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of splitPoints (values).
double getQuantile(double fraction)
    Returns an approximation to the value of the data item that would be preceded by the given fraction of a hypothetical sorted version of the input stream so far.
double getQuantileLowerBound(double fraction)
    Gets the lower bound of the value interval in which the true quantile of the given rank exists with a confidence of at least 99%.
double[] getQuantiles(double[] fractions)
    This is a more efficient multiple-query version of getQuantile().
double[] getQuantiles(int numEvenlySpaced)
    This is also a more efficient multiple-query version of getQuantile() and allows the caller to specify the number of evenly spaced fractional ranks.
double getQuantileUpperBound(double fraction)
    Gets the upper bound of the value interval in which the true quantile of the given rank exists with a confidence of at least 99%.
double getRank(double value)
    Returns an approximation to the normalized (fractional) rank of the given value from 0 to 1, inclusive.
static KllDoublesSketch heapify(org.apache.datasketches.memory.Memory srcMem)
    Factory heapify takes the sketch image in Memory and instantiates an on-heap sketch.
KllDoublesSketchIterator iterator()
    Returns the iterator for this class.
static KllDoublesSketch newDirectInstance(int k, org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
    Create a new direct instance of this sketch with a given k.
static KllDoublesSketch newDirectInstance(org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
    Create a new direct instance of this sketch with the default k.
static KllDoublesSketch newHeapInstance()
    Create a new heap instance of this sketch with the default k = 200.
static KllDoublesSketch newHeapInstance(int k)
    Create a new heap instance of this sketch with a given parameter k.
void update(double value)
    Updates this sketch with the given data item.
static KllDoublesSketch wrap(org.apache.datasketches.memory.Memory srcMem)
    Wrap a sketch around the given read-only source Memory containing sketch data that originated from this sketch.
static KllDoublesSketch writableWrap(org.apache.datasketches.memory.WritableMemory srcMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
    Wrap a sketch around the given source Memory containing sketch data that originated from this sketch.
Methods inherited from class org.apache.datasketches.kll.KllSketch
getCurrentCompactSerializedSizeBytes, getCurrentUpdatableSerializedSizeBytes, getK, getKFromEpsilon, getMaxSerializedSizeBytes, getMaxSerializedSizeBytes, getN, getNormalizedRankError, getNormalizedRankError, getNumRetained, getSerializedSizeBytes, hasMemory, isDirect, isEmpty, isEstimationMode, isMemoryUpdatableFormat, isReadOnly, isSameResource, merge, reset, toByteArray, toString, toString
Method Detail
getMaxSerializedSizeBytes
public static int getMaxSerializedSizeBytes(int k, long n, boolean updatableMemoryFormat)
Returns an upper bound on the serialized size of a KllDoublesSketch given the following parameters.
Parameters:
k - parameter that controls the size of the sketch and the accuracy of estimates
n - stream length
updatableMemoryFormat - true for the updatable Memory format, false for the standard compact format
Returns:
an upper bound on the serialized size of a KllDoublesSketch
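The following is a minimal sketch of how the two bounds might be compared with the size of a real sketch. It assumes datasketches-java (the version documented here) is on the classpath. The class name KllMaxSizeExample and the chosen k and n are illustrative only. The actual size comes from the inherited getCurrentCompactSerializedSizeBytes().

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllMaxSizeExample {
  // Upper bound for the standard compact format.
  public static int compactBound(int k, long n) {
    return KllDoublesSketch.getMaxSerializedSizeBytes(k, n, false);
  }

  // Upper bound for the updatable Memory format used by direct sketches.
  public static int updatableBound(int k, long n) {
    return KllDoublesSketch.getMaxSerializedSizeBytes(k, n, true);
  }

  // Current compact size of a heap sketch after n updates.
  public static int actualCompactSize(int k, long n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance(k);
    for (long i = 1; i <= n; i++) {
      sk.update(i);
    }
    return sk.getCurrentCompactSerializedSizeBytes();
  }

  public static void main(String[] args) {
    int k = 200;
    long n = 100_000L;
    System.out.println("compact bound:   " + compactBound(k, n));
    System.out.println("updatable bound: " + updatableBound(k, n));
    System.out.println("actual compact:  " + actualCompactSize(k, n));
  }
}
```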
heapify
public static KllDoublesSketch heapify(org.apache.datasketches.memory.Memory srcMem)
Factory heapify takes the sketch image in Memory and instantiates an on-heap sketch. The resulting sketch will not retain any link to the source Memory.
Parameters:
srcMem - a Memory image of a sketch serialized by this sketch. See Memory.
Returns:
a heap-based sketch based on the given Memory
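A minimal round-trip sketch follows. It assumes datasketches-memory provides Memory.wrap(byte[]) to view the bytes produced by the inherited toByteArray(). The class and method names are hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.memory.Memory;

public class KllHeapifyExample {
  // Builds a sketch, serializes it, and rebuilds an independent on-heap copy.
  public static KllDoublesSketch roundTrip(int n) {
    KllDoublesSketch original = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      original.update(i);
    }
    byte[] bytes = original.toByteArray();    // serialized image
    Memory srcMem = Memory.wrap(bytes);       // read-only view of those bytes
    return KllDoublesSketch.heapify(srcMem);  // copies the data onto the heap
  }

  public static void main(String[] args) {
    KllDoublesSketch copy = roundTrip(10_000);
    System.out.println("n=" + copy.getN() + " min=" + copy.getMinValue() + " max=" + copy.getMaxValue());
  }
}
```

Because heapify copies the data, the byte array can be discarded or reused afterwards without affecting the copy.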
newDirectInstance
public static KllDoublesSketch newDirectInstance(int k, org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
Create a new direct instance of this sketch with a given k.
Parameters:
k - parameter that controls the size of the sketch and the accuracy of estimates
dstMem - the given destination WritableMemory object for use by the sketch
memReqSvr - the given MemoryRequestServer to request a larger WritableMemory
Returns:
a new direct instance of this sketch
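Here is a sketch of creating a direct instance whose backing memory is sized with getMaxSerializedSizeBytes(k, n, true). It assumes datasketches-memory 2.x, which supplies WritableMemory.allocate(int) and a DefaultMemoryRequestServer class. Both are assumptions about the memory library version, and the example class is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.memory.DefaultMemoryRequestServer;
import org.apache.datasketches.memory.MemoryRequestServer;
import org.apache.datasketches.memory.WritableMemory;

public class KllDirectExample {
  public static KllDoublesSketch buildDirect(int k, int n) {
    // Size the backing memory with the updatable-format upper bound for n items.
    int bytes = KllDoublesSketch.getMaxSerializedSizeBytes(k, n, true);
    WritableMemory dstMem = WritableMemory.allocate(bytes);
    // Assumed memory 2.x class; only consulted if the sketch outgrows dstMem.
    MemoryRequestServer memReqSvr = new DefaultMemoryRequestServer();
    KllDoublesSketch sk = KllDoublesSketch.newDirectInstance(k, dstMem, memReqSvr);
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    return sk;
  }

  public static void main(String[] args) {
    KllDoublesSketch sk = buildDirect(200, 10_000);
    System.out.println("n=" + sk.getN() + " min=" + sk.getMinValue() + " max=" + sk.getMaxValue());
  }
}
```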
newDirectInstance
public static KllDoublesSketch newDirectInstance(org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
Create a new direct instance of this sketch with the default k. The default k = 200 results in a normalized rank error of about 1.65%. Higher values of k will have smaller error but the sketch will be larger (and slower).
Parameters:
dstMem - the given destination WritableMemory object for use by the sketch
memReqSvr - the given MemoryRequestServer to request a larger WritableMemory
Returns:
a new direct instance of this sketch
newHeapInstance
public static KllDoublesSketch newHeapInstance()
Create a new heap instance of this sketch with the default k = 200. The default k = 200 results in a normalized rank error of about 1.65%. Higher values of k will have smaller error but the sketch will be larger (and slower).
Returns:
a new KllDoublesSketch on the heap
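The basic heap workflow looks roughly like this minimal sketch. The helper name build and the input values are illustrative.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllHeapExample {
  // Feeds n evenly spaced values into a default-k heap sketch.
  public static KllDoublesSketch build(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance(); // k = 200
    for (int i = 0; i < n; i++) {
      sk.update(i * 0.5);
    }
    return sk;
  }

  public static void main(String[] args) {
    KllDoublesSketch sk = build(1_000);
    System.out.println("k=" + sk.getK() + " n=" + sk.getN() + " empty=" + sk.isEmpty());
  }
}
```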
newHeapInstance
public static KllDoublesSketch newHeapInstance(int k)
Create a new heap instance of this sketch with a given parameter k. k can be any value between DEFAULT_M and 65535, inclusive. The default k = 200 results in a normalized rank error of about 1.65%. Higher values of k will have smaller error but the sketch will be larger (and slower).
Parameters:
k - parameter that controls the size of the sketch and the accuracy of estimates
Returns:
a new KllDoublesSketch on the heap
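A minimal sketch for weighing accuracy against size before picking k follows. It assumes that one of the two inherited getNormalizedRankError overloads is the static KllSketch.getNormalizedRankError(int k, boolean pmf), with pmf = true selecting the PMF error. The class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.kll.KllSketch;

public class KllChooseKExample {
  // Error bound for PMF queries at a given k, computed without building a sketch.
  public static double pmfError(int k) {
    return KllSketch.getNormalizedRankError(k, true);
  }

  public static void main(String[] args) {
    for (int k : new int[] {100, 200, 400, 800}) {
      System.out.printf("k=%d  pmf error=%.4f  max compact bytes (n=1e6)=%d%n",
          k, pmfError(k), KllDoublesSketch.getMaxSerializedSizeBytes(k, 1_000_000L, false));
    }
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance(800);
    System.out.println("chosen k=" + sk.getK());
  }
}
```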
wrap
public static KllDoublesSketch wrap(org.apache.datasketches.memory.Memory srcMem)
Wrap a sketch around the given read-only source Memory containing sketch data that originated from this sketch.
Parameters:
srcMem - the read-only source Memory
Returns:
an instance of this sketch
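The next sketch wraps serialized bytes in place instead of copying them onto the heap as heapify does. It assumes Memory.wrap(byte[]) from datasketches-memory and that wrapping a toByteArray() image is supported by this version. The class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.memory.Memory;

public class KllWrapExample {
  // Serializes a sketch and wraps the bytes without copying them onto the heap.
  public static KllDoublesSketch wrapped(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    Memory srcMem = Memory.wrap(sk.toByteArray());
    return KllDoublesSketch.wrap(srcMem); // queries read from srcMem
  }

  public static void main(String[] args) {
    KllDoublesSketch ro = wrapped(5_000);
    System.out.println("n=" + ro.getN() + " max=" + ro.getMaxValue() + " median~" + ro.getQuantile(0.5));
  }
}
```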
writableWrap
public static KllDoublesSketch writableWrap(org.apache.datasketches.memory.WritableMemory srcMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
Wrap a sketch around the given source Memory containing sketch data that originated from this sketch.
Parameters:
srcMem - a WritableMemory that contains data
memReqSvr - the given MemoryRequestServer to request a larger WritableMemory
Returns:
an instance of this sketch
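The following sketch re-attaches to memory that a direct sketch has already written, then keeps updating through the new object. It assumes datasketches-memory 2.x (WritableMemory.allocate, DefaultMemoryRequestServer). It also assumes the memory is sized generously enough that no reallocation moves the data elsewhere. All names other than the KllDoublesSketch methods are hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.memory.DefaultMemoryRequestServer;
import org.apache.datasketches.memory.MemoryRequestServer;
import org.apache.datasketches.memory.WritableMemory;

public class KllWritableWrapExample {
  // Writes items through one direct sketch, then re-attaches to the same memory.
  public static long reattach(int first, int second) {
    int k = 200;
    // Sized for up to 10,000 items so these few updates never force a reallocation.
    WritableMemory mem = WritableMemory.allocate(
        KllDoublesSketch.getMaxSerializedSizeBytes(k, 10_000L, true));
    MemoryRequestServer svr = new DefaultMemoryRequestServer(); // assumed memory 2.x class
    KllDoublesSketch writer = KllDoublesSketch.newDirectInstance(k, mem, svr);
    for (int i = 0; i < first; i++) {
      writer.update(i);
    }
    // A new sketch object over the same bytes sees the earlier updates and can add more.
    KllDoublesSketch again = KllDoublesSketch.writableWrap(mem, svr);
    for (int i = 0; i < second; i++) {
      again.update(first + i);
    }
    return again.getN();
  }

  public static void main(String[] args) {
    System.out.println("n after re-attach: " + reattach(50, 50));
  }
}
```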
getCDF
public double[] getCDF(double[] splitPoints)
Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of splitPoints (values). The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(false) function.
If the sketch is empty this returns null.
Parameters:
splitPoints - an array of m unique, monotonically increasing double values that divide the real number line into m+1 consecutive disjoint intervals. Each interval is inclusive of its left splitPoint (or the minimum value) and exclusive of its right splitPoint, except that the last interval also includes the maximum value. It is not necessary to include either the min or max values in these split points.
Returns:
an array of m+1 double values on the interval [0.0, 1.0], which are a consecutive approximation to the CDF of the input stream given the splitPoints. The value at array position j of the returned CDF array is the sum of the returned values in positions 0 through j of the returned PMF array.
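A small worked sketch follows. It uses only 100 items with the default k = 200, assuming the sketch has not compacted at that size so the CDF is exact. Under the interval rule above, values 1 through 24 fall below 25, so the first entry should be 0.24 and the last entry 1.0. The class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllCdfExample {
  // 100 items with k = 200: assumed exact because no compaction has occurred yet.
  public static double[] cdf() {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= 100; i++) {
      sk.update(i);
    }
    // Intervals: [min, 25), [25, 50), [50, 75), [75, max]
    return sk.getCDF(new double[] {25.0, 50.0, 75.0});
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(cdf()));
  }
}
```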
getMaxValue
public double getMaxValue()
Returns the max value of the stream. If the sketch is empty this returns NaN.
Returns:
the max value of the stream
getMinValue
public double getMinValue()
Returns the min value of the stream. If the sketch is empty this returns NaN.
Returns:
the min value of the stream
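The documented empty-sketch results (NaN for getMinValue() and getMaxValue(), null for getCDF) can be checked with a minimal sketch like this. It also shows min and max on a tiny stream. The class and helper names are hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllMinMaxExample {
  public static KllDoublesSketch empty() {
    return KllDoublesSketch.newHeapInstance();
  }

  public static KllDoublesSketch of(double... values) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (double v : values) {
      sk.update(v);
    }
    return sk;
  }

  public static void main(String[] args) {
    KllDoublesSketch e = empty();
    System.out.println(e.getMinValue() + " " + e.getMaxValue() + " " + e.getCDF(new double[] {1.0}));
    KllDoublesSketch sk = of(3.0, -1.5, 7.25);
    System.out.println(sk.getMinValue() + " " + sk.getMaxValue());
  }
}
```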
getPMF
public double[] getPMF(double[] splitPoints)
Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of splitPoints (values). The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(true) function.
If the sketch is empty this returns null.
Parameters:
splitPoints - an array of m unique, monotonically increasing double values that divide the real number line into m+1 consecutive disjoint intervals. Each interval is inclusive of its left splitPoint (or the minimum value) and exclusive of its right splitPoint, except that the last interval also includes the maximum value. It is not necessary to include either the min or max values in these split points.
Returns:
an array of m+1 doubles on the interval [0.0, 1.0], each of which is an approximation to the fraction of the total input stream values (the mass) that fall into the corresponding interval, with intervals defined as for splitPoints.
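The same small, assumed-exact setup (100 items, k = 200) illustrates the PMF. Its buckets should hold 24, 25, 25 and 26 of the 100 values, and the entries sum to 1.0. The class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllPmfExample {
  public static double[] pmf() {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= 100; i++) {
      sk.update(i);
    }
    // Buckets: [min, 25), [25, 50), [50, 75), [75, max]
    return sk.getPMF(new double[] {25.0, 50.0, 75.0});
  }

  public static double sum(double[] values) {
    double total = 0.0;
    for (double v : values) {
      total += v;
    }
    return total;
  }

  public static void main(String[] args) {
    double[] p = pmf();
    System.out.println(java.util.Arrays.toString(p) + " sum=" + sum(p));
  }
}
```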
getQuantile
public double getQuantile(double fraction)
Returns an approximation to the value of the data item that would be preceded by the given fraction of a hypothetical sorted version of the input stream so far.
This method has a fairly large overhead (microseconds instead of nanoseconds), so it should not be called multiple times to get different quantiles from the same sketch. Instead use getQuantiles(), which pays the overhead only once.
If the sketch is empty this returns NaN.
Parameters:
fraction - the specified fractional position in the hypothetical sorted stream. These are also called normalized ranks or fractional ranks. If fraction = 0.0, the true minimum value of the stream is returned. If fraction = 1.0, the true maximum value of the stream is returned.
Returns:
the approximation to the value at the given fraction
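Here is a minimal single-query sketch. It relies on the documented guarantee that fractions 0.0 and 1.0 return the true min and max, while the median is only approximate. The class and helper names are hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllQuantileExample {
  // Stream of the integers 1..n.
  public static KllDoublesSketch uniform(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    return sk;
  }

  public static void main(String[] args) {
    KllDoublesSketch sk = uniform(100_000);
    System.out.println("min=" + sk.getQuantile(0.0)
        + " median~" + sk.getQuantile(0.5)
        + " max=" + sk.getQuantile(1.0));
  }
}
```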
getQuantileLowerBound
public double getQuantileLowerBound(double fraction)
Gets the lower bound of the value interval in which the true quantile of the given rank exists with a confidence of at least 99%.
Parameters:
fraction - the given normalized rank as a fraction
Returns:
the lower bound of the value interval in which the true quantile of the given rank exists with a confidence of at least 99%. Returns NaN if the sketch is empty.
getQuantiles
public double[] getQuantiles(double[] fractions)
This is a more efficient multiple-query version of getQuantile().
It returns an array that could have been generated by calling getQuantile() once per fractional rank, but doing so would be very inefficient. This method incurs the internal set-up overhead once and obtains multiple quantile values in a single query. It is strongly recommended that this method be used instead of multiple calls to getQuantile().
If the sketch is empty this returns null.
Parameters:
fractions - the given array of fractional positions in the hypothetical sorted stream. These are also called normalized ranks or fractional ranks. These fractions must be in the interval [0.0, 1.0], inclusive.
Returns:
an array of approximations to the quantiles at the given fractions, in the same order as the given fractions array
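The next minimal sketch computes all eleven deciles in one batched call, as recommended above. Because quantiles of increasing fractions cannot decrease, the result should be non-decreasing. The class and helper names are hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllDecilesExample {
  public static KllDoublesSketch uniform(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    return sk;
  }

  // One call pays the set-up overhead once for all 11 ranks 0.0, 0.1, ..., 1.0.
  public static double[] deciles(KllDoublesSketch sk) {
    double[] fractions = new double[11];
    for (int i = 0; i <= 10; i++) {
      fractions[i] = i / 10.0;
    }
    return sk.getQuantiles(fractions);
  }

  public static boolean isNonDecreasing(double[] a) {
    for (int i = 1; i < a.length; i++) {
      if (a[i] < a[i - 1]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(deciles(uniform(100_000))));
  }
}
```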
getQuantiles
public double[] getQuantiles(int numEvenlySpaced)
This is also a more efficient multiple-query version of getQuantile() and allows the caller to specify the number of evenly spaced fractional ranks.
If the sketch is empty this returns null.
Parameters:
numEvenlySpaced - the number of evenly spaced fractional ranks; this must be a positive integer. A value of 1 returns the min value. A value of 2 returns the min and the max value. A value of 3 returns the min, the median and the max value, etc.
Returns:
an array of approximations to the quantiles at the evenly spaced fractional ranks, in increasing order of rank
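Under the rule above, numEvenlySpaced = 5 should yield the min, the approximate quartiles and the max, as in this minimal sketch. The class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllEvenlySpacedExample {
  // Ranks 0, 0.25, 0.5, 0.75 and 1 over the stream 1..n.
  public static double[] quartiles(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    return sk.getQuantiles(5);
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(quartiles(100_000)));
  }
}
```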
getQuantileUpperBound
public double getQuantileUpperBound(double fraction)
Gets the upper bound of the value interval in which the true quantile of the given rank exists with a confidence of at least 99%.
Parameters:
fraction - the given normalized rank as a fraction
Returns:
the upper bound of the value interval in which the true quantile of the given rank exists with a confidence of at least 99%. Returns NaN if the sketch is empty.
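A minimal sketch pairing the median estimate with its lower and upper bounds follows. It assumes the bounds are quantiles at ranks shifted down and up by the normalized rank error, which would place the estimate between them. The check runs on one sketch instance because KLL compaction is randomized. The class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllBoundsExample {
  // Returns {lower bound, estimate, upper bound} for the median of 1..n.
  public static double[] medianWithBounds(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    return new double[] {
        sk.getQuantileLowerBound(0.5), sk.getQuantile(0.5), sk.getQuantileUpperBound(0.5)};
  }

  public static boolean boundsAreOrdered(int n) {
    double[] b = medianWithBounds(n);
    return b[0] <= b[1] && b[1] <= b[2];
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(medianWithBounds(100_000)));
  }
}
```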
getRank
public double getRank(double value)
Returns an approximation to the normalized (fractional) rank of the given value from 0 to 1, inclusive. The resulting approximation has a probabilistic guarantee that can be obtained from the getNormalizedRankError(false) function.
If the sketch is empty this returns NaN.
Parameters:
value - the value to be ranked
Returns:
an approximate rank of the given value
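The following is a minimal rank-query sketch on the stream 1..n. A value in the middle should rank near 0.5, a value below the minimum at 0.0, and a value above the maximum at 1.0. The class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllRankExample {
  public static KllDoublesSketch uniform(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    return sk;
  }

  public static void main(String[] args) {
    KllDoublesSketch sk = uniform(100_000);
    System.out.println("rank(50000)~" + sk.getRank(50_000.0)
        + " rank(0.5)=" + sk.getRank(0.5)
        + " rank(1e9)=" + sk.getRank(1e9));
  }
}
```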
iterator
public KllDoublesSketchIterator iterator()
Returns:
the iterator for this class
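The next sketch walks the retained items and totals their weights, which should equal getN(). It assumes KllDoublesSketchIterator exposes next(), getValue() and getWeight(), with next() advancing to the first item. Those method names are an assumption not stated on this page, and the class name is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;
import org.apache.datasketches.kll.KllDoublesSketchIterator;

public class KllIteratorExample {
  public static KllDoublesSketch uniform(int n) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (int i = 1; i <= n; i++) {
      sk.update(i);
    }
    return sk;
  }

  // Each retained item stands in for getWeight() original stream items.
  public static long totalWeight(KllDoublesSketch sk) {
    long total = 0;
    KllDoublesSketchIterator it = sk.iterator();
    while (it.next()) {
      total += it.getWeight();
    }
    return total;
  }

  public static boolean weightsMatchN(int n) {
    KllDoublesSketch sk = uniform(n);
    return totalWeight(sk) == sk.getN();
  }

  public static void main(String[] args) {
    KllDoublesSketch sk = uniform(100_000);
    System.out.println("total weight=" + totalWeight(sk) + " n=" + sk.getN());
  }
}
```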
update
public void update(double value)
Updates this sketch with the given data item.
Parameters:
value - an item from a stream of items. NaNs are ignored.
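Because NaNs are ignored, they should leave getN() and the min and max untouched, as this minimal sketch shows. The helper name feed is hypothetical.

```java
import org.apache.datasketches.kll.KllDoublesSketch;

public class KllUpdateExample {
  public static KllDoublesSketch feed(double... values) {
    KllDoublesSketch sk = KllDoublesSketch.newHeapInstance();
    for (double v : values) {
      sk.update(v); // NaN inputs are skipped by the sketch
    }
    return sk;
  }

  public static void main(String[] args) {
    KllDoublesSketch sk = feed(1.0, Double.NaN, 2.0);
    System.out.println("n=" + sk.getN() + " min=" + sk.getMinValue() + " max=" + sk.getMaxValue());
  }
}
```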