Package org.apache.datasketches.frequencies
This package is dedicated to streaming algorithms that estimate the frequency of occurrence of items in a weighted multiset stream of items. If the frequency distribution of items is sufficiently skewed, these algorithms are very useful for identifying the "Heavy Hitters" that occurred most frequently in the stream. The frequency estimate for an item has well-understood error bounds that the sketch can return.
These sketches are mergeable and can be serialized to, and deserialized from, a compact form.
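To make the ideas of weighted updates, error bounds, and merging concrete, here is a minimal, self-contained sketch of a simplified weighted Misra-Gries counter. It is an illustration under stated assumptions, not the implementation in this package: the class name `FrequencySketchDemo`, its method names, and the subtract-the-minimum purge rule are all hypothetical choices made for this example.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative only: a simplified weighted Misra-Gries counter.
// This is NOT the DataSketches implementation; all names are hypothetical.
final class FrequencySketchDemo {
    private final int maxCounters;                         // max items tracked at once
    private final Map<String, Long> counters = new HashMap<>();
    private long offset = 0;                               // total weight subtracted by purges

    FrequencySketchDemo(int maxCounters) {
        this.maxCounters = maxCounters;
    }

    // Adds `count` occurrences of `item` to the stream.
    void update(String item, long count) {
        if (count < 0) throw new IllegalArgumentException("count must be non-negative");
        if (count == 0) return;
        counters.merge(item, count, Long::sum);
        while (counters.size() > maxCounters) purge();
    }

    // Subtracts the smallest counter from every counter and drops those that reach zero.
    private void purge() {
        long min = Long.MAX_VALUE;
        for (long c : counters.values()) min = Math.min(min, c);
        offset += min;
        Iterator<Map.Entry<String, Long>> it = counters.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            long reduced = e.getValue() - min;
            if (reduced <= 0) it.remove(); else e.setValue(reduced);
        }
    }

    // Combines another sketch into this one; the error terms add up.
    void merge(FrequencySketchDemo other) {
        offset += other.offset;
        for (Map.Entry<String, Long> e : other.counters.entrySet()) {
            counters.merge(e.getKey(), e.getValue(), Long::sum);
        }
        while (counters.size() > maxCounters) purge();
    }

    // The true frequency is never below the retained counter (0 if untracked).
    long getLowerBound(String item) {
        return counters.getOrDefault(item, 0L);
    }

    // The true frequency is never above the retained counter plus the total purged weight.
    long getUpperBound(String item) {
        return getLowerBound(item) + offset;
    }

    // Width of the interval between lower and upper bound for any item.
    long getMaximumError() {
        return offset;
    }
}
```

In this sketch, each item's true frequency always lies between `getLowerBound` and `getUpperBound`, and the interval has the same width for every item. Merging keeps that guarantee because the purged weight of both inputs is added together before any further purge.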
- Author: Lee Rhodes
Class Summary
- ItemsSketch<T>: This sketch is useful for tracking approximate frequencies of items of type <T> with optional associated counts (<T> item, long count) that are members of a multiset of such items.
- ItemsSketch.Row<T>: Row class that defines the return values from a getFrequentItems query.
- LongsSketch: This sketch is useful for tracking approximate frequencies of long items with optional associated counts (long item, long count) that are members of a multiset of such items.
- LongsSketch.Row: Row class that defines the return values from a getFrequentItems query.
Enum Summary
- ErrorType: Specifies one of two types of error regions of the statistical classification Confusion Matrix that can be excluded from a returned sample of Frequent Items.
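The choice between the two error regions can be illustrated with a short, self-contained sketch. It assumes that each candidate item carries a lower and an upper bound on its true frequency, and that items are selected against a threshold. The `Region` enum here is a local stand-in, and its constant names, the `Candidate` class, and the `select` method are hypothetical names introduced for this example rather than this package's API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows how excluding one error region changes the result set.
// All names here are hypothetical stand-ins, not the package's API.
final class ErrorRegionDemo {
    enum Region { NO_FALSE_POSITIVES, NO_FALSE_NEGATIVES }

    // An item together with bounds on its true frequency.
    static final class Candidate {
        final String item;
        final long lowerBound;
        final long upperBound;

        Candidate(String item, long lowerBound, long upperBound) {
            this.item = item;
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
        }
    }

    // Returns the items considered frequent relative to `threshold`.
    static List<String> select(List<Candidate> candidates, long threshold, Region region) {
        List<String> selected = new ArrayList<>();
        for (Candidate c : candidates) {
            // Excluding false positives: require even the lower bound to exceed the threshold.
            // Excluding false negatives: accept any item whose upper bound exceeds it.
            long bound = (region == Region.NO_FALSE_POSITIVES) ? c.lowerBound : c.upperBound;
            if (bound > threshold) {
                selected.add(c.item);
            }
        }
        return selected;
    }
}
```

Under these assumptions, excluding false positives yields a smaller result that contains only items certain to exceed the threshold, while excluding false negatives yields a larger result that is certain to contain every such item, possibly along with some that fall short.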