Class Sketch
- Direct Known Subclasses:
CompactSketch, UpdateSketch
public abstract class Sketch extends Object
- Author:
- Lee Rhodes
-
Method Summary
Modifier and Type Method: Description
- CompactSketch compact(): Converts this sketch to an ordered CompactSketch on the Java heap.
- abstract CompactSketch compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem): Converts this sketch to a new CompactSketch of the chosen order, either direct or on the heap.
- abstract int getCompactBytes(): Returns the number of storage bytes required for this Sketch if its current state were compacted.
- int getCountLessThanThetaLong(long thetaLong): Gets the number of hash values less than the given theta expressed as a long.
- abstract int getCurrentBytes(): Returns the number of storage bytes required for this sketch in its current state.
- abstract double getEstimate(): Gets the unique count estimate.
- abstract Family getFamily(): Returns the Family that this sketch belongs to.
- double getLowerBound(int numStdDev): Gets the approximate lower error bound given the specified number of Standard Deviations.
- static int getMaxCompactSketchBytes(int numberOfEntries): Returns the maximum number of storage bytes required for a CompactSketch with the given number of actual entries.
- static int getMaxUpdateSketchBytes(int nomEntries): Returns the maximum number of storage bytes required for an UpdateSketch with the given number of nominal entries (power of 2).
- int getRetainedEntries(): Returns the number of valid entries that have been retained by the sketch.
- abstract int getRetainedEntries(boolean valid): Returns the number of entries that have been retained by the sketch.
- static int getSerializationVersion(org.apache.datasketches.memory.Memory mem): Returns the serialization version from the given Memory.
- double getTheta(): Gets the value of theta as a double with a value between zero and one.
- abstract long getThetaLong(): Gets the value of theta as a long.
- double getUpperBound(int numStdDev): Gets the approximate upper error bound given the specified number of Standard Deviations.
- abstract boolean hasMemory(): Returns true if this sketch's data structure is backed by Memory or WritableMemory.
- static Sketch heapify(org.apache.datasketches.memory.Memory srcMem): Heapify takes the sketch image in Memory and instantiates an on-heap Sketch using the Default Update Seed.
- static Sketch heapify(org.apache.datasketches.memory.Memory srcMem, long seed): Heapify takes the sketch image in Memory and instantiates an on-heap Sketch using the given seed.
- abstract boolean isCompact(): Returns true if this sketch is in compact form.
- abstract boolean isDirect(): Returns true if this sketch's internal data structure is backed by direct (off-heap) Memory.
- abstract boolean isEmpty()
- boolean isEstimationMode(): Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode).
- abstract boolean isOrdered(): Returns true if the internal cache is ordered.
- boolean isSameResource(org.apache.datasketches.memory.Memory that): Returns true if the backing resource of this is identical with the backing resource of that.
- abstract HashIterator iterator(): Returns a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
- abstract byte[] toByteArray(): Serialize this sketch to a byte array form.
- String toString(): Returns a human-readable summary of the sketch.
- String toString(boolean sketchSummary, boolean dataDetail, int width, boolean hexMode): Gets a human-readable listing of contents and summary of the given sketch.
- static String toString(byte[] byteArr): Returns a human-readable string of the preamble of a byte array image of a Theta Sketch.
- static String toString(org.apache.datasketches.memory.Memory mem): Returns a human-readable string of the preamble of a Memory image of a Theta Sketch.
- static Sketch wrap(org.apache.datasketches.memory.Memory srcMem): Wrap takes the sketch image in Memory and refers to it directly.
- static Sketch wrap(org.apache.datasketches.memory.Memory srcMem, long seed): Wrap takes the sketch image in Memory and refers to it directly with just a reference.
-
Method Details
-
heapify
Heapify takes the sketch image in Memory and instantiates an on-heap Sketch using the Default Update Seed. The resulting sketch will not retain any link to the source Memory.
- Parameters:
srcMem - an image of a Sketch where the image seed hash matches the default seed hash. See Memory
- Returns:
- a Heap-based Sketch from the given Memory
-
heapify
Heapify takes the sketch image in Memory and instantiates an on-heap Sketch using the given seed. The resulting sketch will not retain any link to the source Memory.
- Parameters:
srcMem - an image of a Sketch where the image seed hash matches the given seed hash. See Memory
seed - See Update Hash Seed. Compact sketches store a 16-bit hash of the seed, but not the seed itself.
- Returns:
- a Heap-based Sketch from the given Memory
-
wrap
Wrap takes the sketch image in Memory and refers to it directly. There is no data copying onto the Java heap. Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct objects can be wrapped. This method assumes the Util.DEFAULT_UPDATE_SEED. See Default Update Seed.
- Parameters:
srcMem - an image of a Sketch where the image seed hash matches the default seed hash. See Memory
- Returns:
- a Sketch backed by the given Memory
-
wrap
Wrap takes the sketch image in Memory and refers to it directly with just a reference. There is no data copying onto the Java heap. Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct objects can be wrapped. The wrap operation enables fast read-only merging and access to all the public read-only API.
Note: wrapping sketches of earlier serialization versions will result in an on-heap form of the sketch where all data is copied to the heap. These early versions were never designed to "wrap".
Wrapping any subclass of this class that is empty or contains only a single item will result in the on-heap equivalent form of an empty or single-item sketch, respectively. This is actually faster and consumes less overall memory.
- Parameters:
srcMem - an image of a Sketch where the image seed hash matches the given seed hash. See Memory
seed - See Update Hash Seed. Compact sketches store a 16-bit hash of the seed, but not the seed itself.
- Returns:
- an UpdateSketch backed by the given Memory except as above.
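The difference between heapify (copy, no link to the source) and wrap (no copy, refers to the source) can be illustrated by analogy. The following is a minimal sketch that uses java.nio.ByteBuffer as a stand-in for Memory; it does not call the DataSketches API, and the names HeapifyVsWrapDemo, heapifyLike and wrapLike are hypothetical.

```java
import java.nio.ByteBuffer;

// Analogy only: ByteBuffer stands in for the library's Memory type.
final class HeapifyVsWrapDemo {

    // "Heapify"-style: copy every long out of the image, so the result
    // keeps no link to the source buffer.
    static long[] heapifyLike(ByteBuffer src) {
        long[] copy = new long[src.capacity() / Long.BYTES];
        for (int i = 0; i < copy.length; i++) {
            copy[i] = src.getLong(i * Long.BYTES); // absolute read, position untouched
        }
        return copy;
    }

    // "Wrap"-style: return a read-only view that shares the source bytes.
    static ByteBuffer wrapLike(ByteBuffer src) {
        return src.asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        ByteBuffer image = ByteBuffer.allocate(2 * Long.BYTES);
        image.putLong(0, 11L).putLong(Long.BYTES, 22L);

        long[] onHeap = heapifyLike(image);
        ByteBuffer wrapped = wrapLike(image);

        image.putLong(0, 99L); // mutate the source afterwards

        System.out.println(onHeap[0]);          // the copy still holds the old value
        System.out.println(wrapped.getLong(0)); // the view observes the change
    }
}
```

The copy costs memory and time up front but is independent of the source; the view is cheap but is only valid while the underlying bytes are.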
-
compact
Converts this sketch to an ordered CompactSketch on the Java heap. If this sketch is already in the proper form, this method returns this; otherwise, it returns a new CompactSketch of the proper form.
A CompactSketch is always immutable.
- Returns:
- this sketch as an ordered CompactSketch on the Java heap.
-
compact
public abstract CompactSketch compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem)
Converts this sketch to a new CompactSketch of the chosen order, either direct or on the heap.
If this sketch is already in the proper form, this operation returns this; otherwise, it returns a new CompactSketch of the proper form.
If this sketch is a type of UpdateSketch, the compacting process converts the hash table of the UpdateSketch to a simple list of the valid hash values. Any hash value that is zero, or greater than or equal to theta, is discarded. The number of valid values remaining in the CompactSketch depends on several factors and may be larger or smaller than Nominal Entries (k), but it will never exceed 2k. If it is critical to limit the size to no more than k, call rebuild() on the UpdateSketch before calling this method.
A CompactSketch is always immutable.
- Parameters:
dstOrdered - See Destination Ordered
dstMem - See Destination Memory.
- Returns:
- this sketch as a CompactSketch in the chosen form
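The compacting rule described above (drop zeros and anything at or above theta, optionally order the result) can be sketched in plain Java. This is an illustration of the rule only, not the library's implementation; the class and method names are hypothetical, and the hash table is modeled as a bare long[] of positive hash values in which zero marks an unused slot.

```java
import java.util.Arrays;

// Illustration of the compaction rule; not the DataSketches implementation.
final class CompactionDemo {

    // Keeps only valid hashes: non-zero and strictly less than thetaLong.
    // When dstOrdered is true the surviving hashes are sorted ascending.
    static long[] compactHashes(long[] hashTable, long thetaLong, boolean dstOrdered) {
        long[] valid = Arrays.stream(hashTable)
                .filter(h -> h != 0L && h < thetaLong)
                .toArray();
        if (dstOrdered) {
            Arrays.sort(valid);
        }
        return valid;
    }

    public static void main(String[] args) {
        long[] table = {0L, 50L, 10L, 0L, 90L, 30L};
        // With theta = 60, the zeros and the value 90 are discarded.
        System.out.println(Arrays.toString(compactHashes(table, 60L, true)));
    }
}
```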
-
getCompactBytes
public abstract int getCompactBytes()
Returns the number of storage bytes required for this Sketch if its current state were compacted. If this sketch is already in the compact form, this is equivalent to calling getCurrentBytes().
- Returns:
- number of compact bytes
-
getCountLessThanThetaLong
public int getCountLessThanThetaLong(long thetaLong)
Gets the number of hash values less than the given theta expressed as a long.
- Parameters:
thetaLong - the given theta as a long between zero and Long.MAX_VALUE.
- Returns:
- the number of hash values less than the given thetaLong.
-
getCurrentBytes
public abstract int getCurrentBytes()
Returns the number of storage bytes required for this sketch in its current state.
- Returns:
- the number of storage bytes required for this sketch
-
getEstimate
public abstract double getEstimate()
Gets the unique count estimate.
- Returns:
- the sketch's best estimate of the cardinality of the input stream.
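This page does not state the formula behind getEstimate(). A minimal sketch, assuming the usual theta-sketch estimator (the number of valid retained entries divided by theta, with theta expressed as a fraction in (0, 1]), might look as follows. The class and method names are hypothetical and the code is not taken from the library.

```java
// Assumed estimator: validRetained / theta. Not the library's source code.
final class ThetaEstimateDemo {

    static double estimate(int validRetained, double theta) {
        if (validRetained < 0) {
            throw new IllegalArgumentException("validRetained must be non-negative");
        }
        if (!(theta > 0.0 && theta <= 1.0)) {
            throw new IllegalArgumentException("theta must be in (0, 1]");
        }
        return validRetained / theta;
    }

    public static void main(String[] args) {
        // Exact mode: theta == 1.0, so the estimate equals the retained count.
        System.out.println(estimate(100, 1.0));
        // Estimation mode: half of the hash space was sampled.
        System.out.println(estimate(4096, 0.5));
    }
}
```

Under this assumption a sketch with theta = 1.0 reports its retained count unchanged, which matches the notion of Exact Mode used elsewhere on this page.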
-
getFamily
Returns the Family that this sketch belongs to.
- Returns:
- the Family that this sketch belongs to
-
getLowerBound
public double getLowerBound(int numStdDev)
Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
- Returns:
- the lower bound.
-
getMaxCompactSketchBytes
public static int getMaxCompactSketchBytes(int numberOfEntries)
Returns the maximum number of storage bytes required for a CompactSketch with the given number of actual entries. Note that this assumes the worst case of the sketch in estimation mode, which requires storing theta and count.
- Parameters:
numberOfEntries - the actual number of entries stored with the CompactSketch.
- Returns:
- the maximum number of storage bytes required for a CompactSketch with the given number of entries.
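As a rough illustration of the worst case described above, the following sketch assumes each entry is one 8-byte hash and that the estimation-mode preamble occupies three 8-byte longs (one of them holding the count and one holding theta). The preamble size is an assumption, not something stated on this page, and the library may use a smaller preamble for empty or single-item sketches. The class and method names are hypothetical.

```java
// Back-of-the-envelope sizing under stated assumptions; not the library's code.
final class CompactSizeDemo {

    // Assumption: three 8-byte preamble longs in estimation mode.
    static final int ASSUMED_PREAMBLE_BYTES = 3 * Long.BYTES;

    static int maxCompactBytes(int numberOfEntries) {
        if (numberOfEntries < 0) {
            throw new IllegalArgumentException("numberOfEntries must be non-negative");
        }
        // multiplyExact/addExact throw ArithmeticException on int overflow.
        return Math.addExact(ASSUMED_PREAMBLE_BYTES,
                Math.multiplyExact(numberOfEntries, Long.BYTES));
    }

    public static void main(String[] args) {
        System.out.println(maxCompactBytes(1000)); // 24 + 1000 * 8
    }
}
```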
-
getMaxUpdateSketchBytes
public static int getMaxUpdateSketchBytes(int nomEntries)
Returns the maximum number of storage bytes required for an UpdateSketch with the given number of nominal entries (power of 2).
- Parameters:
nomEntries - See Nominal Entries. If this value is not a power of 2, it is rounded up to the next power of 2.
- Returns:
- the maximum number of storage bytes required for an UpdateSketch with the given nomEntries
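The rounding of nomEntries to the ceiling power of 2 can be sketched as below. The sizing part additionally assumes a hash table of 2k 8-byte slots (consistent with the 2k limit mentioned under compact) plus a 24-byte preamble, and an upper limit of 2^26 nominal entries chosen here only to keep the result within an int; these figures are assumptions, not values taken from this page. All names are hypothetical.

```java
// Illustration under stated assumptions; not the library's implementation.
final class UpdateSizeDemo {

    static final int ASSUMED_PREAMBLE_BYTES = 3 * Long.BYTES; // assumption
    static final int ASSUMED_MAX_NOM_ENTRIES = 1 << 26;       // assumption

    // Smallest power of 2 that is greater than or equal to n (n >= 1).
    static int ceilingPowerOf2(int n) {
        if (n < 1 || n > ASSUMED_MAX_NOM_ENTRIES) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int highest = Integer.highestOneBit(n);
        return (highest == n) ? n : highest << 1;
    }

    // Assumed layout: preamble + 2k slots of 8 bytes each.
    static int maxUpdateBytes(int nomEntries) {
        int k = ceilingPowerOf2(nomEntries);
        return ASSUMED_PREAMBLE_BYTES + 2 * k * Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(ceilingPowerOf2(1000)); // rounds up to 1024
        System.out.println(maxUpdateBytes(1000));  // 24 + 2 * 1024 * 8
    }
}
```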
-
getRetainedEntries
public int getRetainedEntries()
Returns the number of valid entries that have been retained by the sketch.
- Returns:
- the number of valid retained entries
-
getRetainedEntries
public abstract int getRetainedEntries(boolean valid)
Returns the number of entries that have been retained by the sketch.
- Parameters:
valid - if true, returns the number of valid entries, which are less than theta and used for estimation. Otherwise, returns the number of all entries, valid or not, that are currently in the internal sketch cache.
- Returns:
- the number of retained entries
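The distinction between valid entries and all retained entries can be sketched as follows. The internal cache is modeled as a bare long[] in which zero marks an unused slot; that layout, and the class and method names, are assumptions made for illustration rather than the library's internals.

```java
// Models the valid / all distinction; not the DataSketches implementation.
final class RetainedEntriesDemo {

    // valid == true : count non-zero hashes strictly less than thetaLong.
    // valid == false: count every non-zero hash currently in the cache.
    static int retainedEntries(long[] cache, long thetaLong, boolean valid) {
        int count = 0;
        for (long h : cache) {
            if (h == 0L) {
                continue; // assumption: zero marks an empty slot
            }
            if (!valid || h < thetaLong) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] cache = {0L, 10L, 70L, 30L, 0L, 95L};
        System.out.println(retainedEntries(cache, 60L, true));  // 10 and 30 qualify
        System.out.println(retainedEntries(cache, 60L, false)); // 10, 70, 30 and 95
    }
}
```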
-
getSerializationVersion
public static int getSerializationVersion(org.apache.datasketches.memory.Memory mem)
Returns the serialization version from the given Memory.
- Parameters:
mem - the sketch Memory
- Returns:
- the serialization version from the Memory
-
getTheta
public double getTheta()
Gets the value of theta as a double with a value between zero and one.
- Returns:
- the value of theta as a double
-
getThetaLong
public abstract long getThetaLong()
Gets the value of theta as a long.
- Returns:
- the value of theta as a long
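The two representations of theta (a long between zero and Long.MAX_VALUE, and a double between zero and one) are related by a simple scaling. A minimal sketch, assuming the double form is the long form divided by Long.MAX_VALUE; the class and method names are hypothetical.

```java
// Assumed scaling between the two theta representations.
final class ThetaScaleDemo {

    // Maps [0, Long.MAX_VALUE] onto [0.0, 1.0].
    static double thetaAsDouble(long thetaLong) {
        if (thetaLong < 0) {
            throw new IllegalArgumentException("thetaLong must be non-negative");
        }
        return thetaLong / (double) Long.MAX_VALUE;
    }

    // Inverse mapping; the cast saturates at Long.MAX_VALUE for theta == 1.0.
    static long thetaAsLong(double theta) {
        if (!(theta >= 0.0 && theta <= 1.0)) {
            throw new IllegalArgumentException("theta must be in [0, 1]");
        }
        return (long) (theta * Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(thetaAsDouble(Long.MAX_VALUE)); // upper end of the range
        System.out.println(thetaAsDouble(1L << 62));       // half of the hash space
        System.out.println(thetaAsLong(1.0) == Long.MAX_VALUE);
    }
}
```

The long form allows retained hashes to be compared against theta directly with integer comparisons, while the double form is convenient for reporting and for the estimate arithmetic.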
-
getUpperBound
public double getUpperBound(int numStdDev)
Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
- Returns:
- the upper bound.
-
hasMemory
public abstract boolean hasMemory()
Returns true if this sketch's data structure is backed by Memory or WritableMemory.
- Returns:
- true if this sketch's data structure is backed by Memory or WritableMemory.
-
isCompact
public abstract boolean isCompact()
Returns true if this sketch is in compact form.
- Returns:
- true if this sketch is in compact form.
-
isDirect
public abstract boolean isDirect()
Returns true if this sketch's internal data structure is backed by direct (off-heap) Memory.
- Returns:
- true if this sketch's internal data structure is backed by direct (off-heap) Memory.
-
isEmpty
public abstract boolean isEmpty()
- Returns:
- true if empty.
-
isEstimationMode
public boolean isEstimationMode()
Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). This is true if theta < 1.0 AND isEmpty() is false.
- Returns:
- true if the sketch is in estimation mode.
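The rule stated above translates directly into a boolean expression. The following minimal sketch takes theta and the empty flag as plain parameters rather than reading them from a sketch; the class name is hypothetical.

```java
// Direct transcription of the stated rule: theta < 1.0 AND not empty.
final class EstimationModeDemo {

    static boolean isEstimationMode(double theta, boolean empty) {
        return theta < 1.0 && !empty;
    }

    public static void main(String[] args) {
        System.out.println(isEstimationMode(1.0, false)); // exact mode
        System.out.println(isEstimationMode(0.5, false)); // estimation mode
        System.out.println(isEstimationMode(0.5, true));  // empty sketches are never in estimation mode
    }
}
```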
-
isOrdered
public abstract boolean isOrdered()
Returns true if the internal cache is ordered.
- Returns:
- true if the internal cache is ordered
-
isSameResource
public boolean isSameResource(org.apache.datasketches.memory.Memory that)
Returns true if the backing resource of this is identical with the backing resource of that. The capacities must be the same. If this is a region, the region offset must also be the same.
- Parameters:
that - A different non-null object
- Returns:
- true if the backing resource of this is the same as the backing resource of that.
-
iterator
Returns a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
- Returns:
- a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
-
toByteArray
public abstract byte[] toByteArray()
Serialize this sketch to a byte array form.
- Returns:
- byte array of this sketch
-
toString
Returns a human-readable summary of the sketch. This method is equivalent to the parameterized call:
Sketch.toString(sketch, true, false, 8, true);
-
toString
Gets a human-readable listing of contents and summary of the given sketch. This can be a very long string. If this sketch is in a "dirty" state, there may be values in the dataDetail view that are ≥ theta.
- Parameters:
sketchSummary - If true, the sketch summary will be output at the end.
dataDetail - If true, includes all valid hash values in the sketch.
width - The number of columns of hash values. Default is 8.
hexMode - If true, hashes will be output in hex.
- Returns:
- The result string, which can be very long.
-
toString
Returns a human-readable string of the preamble of a byte array image of a Theta Sketch.
- Parameters:
byteArr - the given byte array
- Returns:
- a human-readable string of the preamble of a byte array image of a Theta Sketch.
-
toString
Returns a human-readable string of the preamble of a Memory image of a Theta Sketch.
- Parameters:
mem - the given Memory object
- Returns:
- a human-readable string of the preamble of a Memory image of a Theta Sketch.
-