Package org.apache.datasketches.frequencies

This package is dedicated to streaming algorithms that enable estimation of the frequency of occurrence of items in a weighted multiset stream of items. If the frequency distribution of items is sufficiently skewed, these algorithms are very useful in identifying the "Heavy Hitters" that occurred most frequently in the stream. The accuracy of the estimation of the frequency of an item has well-understood error bounds that can be returned by the sketch.
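To make the idea of bounded-error frequency estimation concrete, the following is a minimal sketch of a weighted Misra-Gries style counter summary. It is an illustration only and is not the implementation used by this package; the class name HeavyHitterSummary, its methods, and the maxCounters parameter are all hypothetical. It shows how a fixed number of counters can yield a lower and an upper bound on each item's true frequency.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical, simplified weighted Misra-Gries summary (not the DataSketches code). */
final class HeavyHitterSummary {
    private final int maxCounters;                       // assumed size limit of the summary
    private final Map<String, Long> counters = new HashMap<>();
    private long offset = 0;                             // total weight subtracted so far

    HeavyHitterSummary(int maxCounters) {
        this.maxCounters = maxCounters;
    }

    /** Adds an (item, weight) pair from the stream. */
    void update(String item, long weight) {
        counters.merge(item, weight, Long::sum);
        if (counters.size() > maxCounters) {
            // Subtract the smallest counter from every counter and drop the
            // ones that reach zero. Each item loses at most `min` per purge.
            long min = Collections.min(counters.values());
            offset += min;
            Iterator<Map.Entry<String, Long>> it = counters.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> e = it.next();
                long reduced = e.getValue() - min;
                if (reduced <= 0) {
                    it.remove();
                } else {
                    e.setValue(reduced);
                }
            }
        }
    }

    /** The true frequency is never below the retained counter. */
    long lowerBound(String item) {
        return counters.getOrDefault(item, 0L);
    }

    /** The true frequency is never above the counter plus everything subtracted. */
    long upperBound(String item) {
        return lowerBound(item) + offset;
    }

    /** Width of the interval between the two bounds. */
    long maximumError() {
        return offset;
    }
}
```

With maxCounters set to 2, feeding the pairs ("a", 10), ("b", 1), ("c", 1) triggers one purge: "a" ends with a lower bound of 9 and an upper bound of 10, while the dropped items report bounds of 0 and 1. A skewed stream keeps the offset small relative to the heavy items, which is why such summaries identify Heavy Hitters well.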

These sketches are mergeable and can be serialized and deserialized to/from a compact form.
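As a rough illustration of what merging and compact serialization can look like for a counter-based summary, here is a self-contained sketch. It does not reproduce this package's merge logic or its binary format; the class MergeableSummary, its byte layout, and all method names are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mergeable summary of long items (illustrative, not the DataSketches code). */
final class MergeableSummary {
    final int maxCounters;                        // assumed size limit
    long offset;                                  // accumulated error from purges
    final Map<Long, Long> counters = new HashMap<>();

    MergeableSummary(int maxCounters) {
        this.maxCounters = maxCounters;
    }

    void update(long item, long weight) {
        counters.merge(item, weight, Long::sum);
        shrink();
    }

    /** Adds the other summary's counters and error into this one, then purges. */
    void merge(MergeableSummary other) {
        other.counters.forEach((item, count) -> counters.merge(item, count, Long::sum));
        offset += other.offset;
        shrink();
    }

    /** Subtracts the (maxCounters + 1)-th largest counter so at most maxCounters remain. */
    private void shrink() {
        if (counters.size() <= maxCounters) {
            return;
        }
        List<Long> sorted = new ArrayList<>(counters.values());
        sorted.sort(Collections.reverseOrder());
        long cut = sorted.get(maxCounters);
        offset += cut;
        counters.replaceAll((item, count) -> count - cut);
        counters.values().removeIf(count -> count <= 0);
    }

    long lowerBound(long item) {
        return counters.getOrDefault(item, 0L);
    }

    long upperBound(long item) {
        return lowerBound(item) + offset;
    }

    /** Assumed layout: int maxCounters, long offset, int n, then n (item, count) pairs. */
    byte[] toByteArray() {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + 16 * counters.size());
        buf.putInt(maxCounters).putLong(offset).putInt(counters.size());
        counters.forEach((item, count) -> buf.putLong(item).putLong(count));
        return buf.array();
    }

    static MergeableSummary fromByteArray(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        MergeableSummary s = new MergeableSummary(buf.getInt());
        s.offset = buf.getLong();
        int n = buf.getInt();
        for (int i = 0; i < n; i++) {
            long item = buf.getLong();            // item is written before its count
            s.counters.put(item, buf.getLong());
        }
        return s;
    }
}
```

The point of the design is that a merged summary keeps the same kind of guarantee as a summary built from the concatenated streams: the error terms add, and the bounds still bracket the true frequency. The serialized form stores only the retained counters, so its size depends on the summary's capacity rather than on the length of the stream.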

Author:
Lee Rhodes
  • Class Summary 
    Class Description
    ItemsSketch<T>
    This sketch is useful for tracking approximate frequencies of items of type <T> with optional associated counts (<T> item, long count) that are members of a multiset of such items.
    ItemsSketch.Row<T>
    Row class that defines the return values from a getFrequentItems query.
    LongsSketch
    This sketch is useful for tracking approximate frequencies of long items with optional associated counts (long item, long count) that are members of a multiset of such items.
    LongsSketch.Row
    Row class that defines the return values from a getFrequentItems query.
  • Enum Summary 
    Enum Description
    ErrorType
    Specifies one of two types of error regions of the statistical classification Confusion Matrix that can be excluded from a returned sample of Frequent Items.
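The choice between the two error regions can be pictured as choosing which bound is compared against a frequency threshold. The sketch below is a hedged illustration, not this package's code: the nested ErrorType enum is a local stand-in for the package's enum, the Row class is a hypothetical simplification of the sketch Row classes, and the use of >= (rather than >) in the comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative filter showing how the two error types differ (hypothetical names). */
final class ErrorTypeDemo {
    // Local stand-in for the package's ErrorType enum.
    enum ErrorType { NO_FALSE_POSITIVES, NO_FALSE_NEGATIVES }

    // Hypothetical simplified row: an item with bounds on its true frequency.
    static final class Row {
        final String item;
        final long lowerBound;
        final long upperBound;

        Row(String item, long lowerBound, long upperBound) {
            this.item = item;
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
        }
    }

    /** Returns the items that qualify as frequent under the given error type. */
    static List<String> frequentItems(List<Row> rows, long threshold, ErrorType type) {
        List<String> result = new ArrayList<>();
        for (Row r : rows) {
            // NO_FALSE_POSITIVES: require that even the lower bound clears the
            // threshold, so every returned item is truly frequent.
            // NO_FALSE_NEGATIVES: accept whenever the upper bound clears it, so
            // no truly frequent item is missed, at the cost of extra candidates.
            long bound = (type == ErrorType.NO_FALSE_POSITIVES) ? r.lowerBound : r.upperBound;
            if (bound >= threshold) {            // assumed comparison
                result.add(r.item);
            }
        }
        return result;
    }
}
```

For rows a [90, 100], b [45, 55], and c [5, 15] with a threshold of 50, the NO_FALSE_POSITIVES selection contains only a, whereas NO_FALSE_NEGATIVES contains a and b, because b's true frequency might lie on either side of the threshold.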