datasketches-java 3.0.0 API
Sketching Core Library
Overview
The Sketching Core Library provides a range of stochastic streaming algorithms and closely related Java technologies that are particularly useful when integrating sketching into systems that must process massive data. Click on the package links below for the package introduction and APIs.
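The following is a minimal sketch of what a "stochastic streaming algorithm" looks like in practice. It shows the K-Minimum-Values (KMV) distinct-count estimator, the idea that the Theta Sketch Framework generalizes. It does not use this library's API: the class `KmvSketch`, the parameter `k`, and the SplitMix64-style hash (the library itself uses MurmurHash3) are illustrative assumptions. The sketch keeps only the `k` smallest normalized hash values. When it holds fewer than `k` values it counts exactly. Otherwise it estimates the distinct count as `(k - 1) / theta`, where `theta` is the k-th smallest hash.

```java
import java.util.TreeSet;

/** Illustrative K-Minimum-Values (KMV) sketch; hypothetical class, not the datasketches API. */
public class KmvSketch {
  private final int k;
  // The k smallest distinct hash values seen so far, normalized to [0, 1).
  private final TreeSet<Double> minHashes = new TreeSet<>();

  public KmvSketch(int k) { this.k = k; }

  // SplitMix64 finalizer used as a stand-in hash (assumption; the library uses MurmurHash3).
  static long mix(long z) {
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  public void update(long item) {
    // Map the 64-bit hash to a double in [0, 1) using its top 53 bits.
    double h = (mix(item) >>> 11) * 0x1.0p-53;
    if (minHashes.size() < k) {
      minHashes.add(h); // duplicates are ignored by the set
    } else if (h < minHashes.last() && minHashes.add(h)) {
      minHashes.pollLast(); // keep only the k smallest values
    }
  }

  public double estimate() {
    if (minHashes.size() < k) {
      return minHashes.size(); // exact mode: fewer than k distinct hashes seen
    }
    return (k - 1) / minHashes.last(); // theta = k-th smallest normalized hash
  }

  public static void main(String[] args) {
    KmvSketch sketch = new KmvSketch(1024);
    for (long i = 0; i < 100_000; i++) {
      sketch.update(i % 50_000); // 50,000 distinct items, each seen twice
    }
    System.out.printf("estimate = %.0f (true = 50000)%n", sketch.estimate());
  }
}
```

With `k = 1024` the relative standard error of this estimator is roughly `1 / sqrt(k - 2)`, about 3%. The sketch stays a fixed size no matter how long the stream is.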
This library is divided into packages that constitute distinct groups of functionality:
- CPC - Compressed Probabilistic Counting
- FDT - Frequent Distinct Tuples
- Frequencies - Frequent Items
- Hash - Common Hash Functions
- HLL - HyperLogLog
- HLLMap - HLL Map
- KLL - High Performance Karnin, Lang, Liberty Quantiles
- Quantiles - Quantiles
- Sampling - Weighted and Unweighted Reservoirs
- Theta - The Theta Family
- Tuple - The Base Tuple Family
- Tuple/adouble - Example Implementation with a single double
- Tuple/aninteger - Example Implementation with a single integer
- Tuple/strings - Example Implementation with an array of Strings
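The Sampling entry above refers to reservoir sampling. As an illustration of the unweighted case, here is a minimal sketch of Vitter's classic Algorithm R, which maintains a fixed-size uniform sample of a stream of unknown length. The class `Reservoir` and its methods are hypothetical and are not the sampling package's API. The package's own sketches may differ in their details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative fixed-size uniform reservoir (Algorithm R); hypothetical class, not the datasketches API. */
public class Reservoir<T> {
  private final int k;            // maximum sample size
  private final List<T> samples;
  private final Random rng;
  private long n = 0;             // number of items seen so far

  public Reservoir(int k, long seed) {
    this.k = k;
    this.samples = new ArrayList<>(k);
    this.rng = new Random(seed);
  }

  public void update(T item) {
    n++;
    if (samples.size() < k) {
      samples.add(item); // fill phase: keep the first k items
    } else {
      // Keep the n-th item with probability k/n, replacing a uniformly chosen slot.
      long j = (long) (rng.nextDouble() * n);
      if (j < k) {
        samples.set((int) j, item);
      }
    }
  }

  public List<T> getSamples() { return new ArrayList<>(samples); }

  public long getN() { return n; }

  public static void main(String[] args) {
    Reservoir<Integer> r = new Reservoir<>(10, 42L);
    for (int i = 0; i < 1_000; i++) {
      r.update(i);
    }
    System.out.println("n=" + r.getN() + " sample=" + r.getSamples());
  }
}
```

After `n` items, every item seen has the same probability `k / n` of being in the sample, while memory stays bounded by `k`.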
| Package | Description |
|---|---|
| org.apache.datasketches | This package is the parent package for all sketch algorithms. |
| org.apache.datasketches.cpc | Compressed Probabilistic Counting |
| org.apache.datasketches.fdt | |
| org.apache.datasketches.frequencies | This package is dedicated to streaming algorithms that enable estimation of the frequency of occurrence of items in a weighted multiset stream of items. |
| org.apache.datasketches.hash | The hash package contains a high-performing and extended Java implementation of Austin Appleby's 128-bit MurmurHash3 hash function originally coded in C. |
| org.apache.datasketches.hll | The hll package contains a high-performance implementation of Philippe Flajolet's HLL sketch with significantly improved error behavior. |
| org.apache.datasketches.hllmap | The hllmap package contains a space-efficient HLL mapping sketch of keys to approximate unique counts of identifiers. |
| org.apache.datasketches.kll | |
| org.apache.datasketches.quantiles | The quantiles package contains stochastic streaming algorithms that enable single-pass analysis of the distribution of a stream of real (double) values or generic items. |
| org.apache.datasketches.req | |
| org.apache.datasketches.sampling | This package is dedicated to streaming algorithms that enable fixed-size, uniform sampling of unweighted items from a stream. |
| org.apache.datasketches.theta | The theta package contains all the sketch classes that are members of the Theta Sketch Framework. |
| org.apache.datasketches.tuple | The tuple package contains implementations of sketches based on the idea of theta sketches with the addition of values associated with unique keys. |
| org.apache.datasketches.tuple.adouble | |
| org.apache.datasketches.tuple.aninteger | |
| org.apache.datasketches.tuple.arrayofdoubles | |
| org.apache.datasketches.tuple.strings | |
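To illustrate the kind of estimate the frequencies package produces, here is a minimal sketch of a simplified weighted Misra-Gries summary. It tracks at most `capacity` counters. When that limit is exceeded, it subtracts the smallest counter from all counters and drops the ones that reach zero. The library's own frequent-items algorithm differs in its details. The class `MisraGries` and its methods are hypothetical, not the package's API. Each purge removes `(capacity + 1) * min` of stored weight, so the total amount subtracted, `offset`, is at most `W / (capacity + 1)`, where `W` is the total stream weight. That gives every item a guaranteed `[lowerBound, upperBound]` interval.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Illustrative weighted Misra-Gries summary; hypothetical class, not the datasketches API. */
public class MisraGries<T> {
  private final int capacity;                  // maximum number of counters kept
  private final Map<T, Long> counters = new HashMap<>();
  private long totalWeight = 0;                // W: sum of all update weights
  private long offset = 0;                     // total amount subtracted from every counter

  public MisraGries(int capacity) { this.capacity = capacity; }

  /** Adds a positive weight for the item. */
  public void update(T item, long weight) {
    totalWeight += weight;
    counters.merge(item, weight, Long::sum);
    if (counters.size() > capacity) {
      // Purge: subtract the smallest counter from every counter; drop counters that reach zero.
      long min = Long.MAX_VALUE;
      for (long c : counters.values()) {
        min = Math.min(min, c);
      }
      offset += min;
      Iterator<Map.Entry<T, Long>> it = counters.entrySet().iterator();
      while (it.hasNext()) {
        Map.Entry<T, Long> e = it.next();
        long c = e.getValue() - min;
        if (c <= 0) {
          it.remove();
        } else {
          e.setValue(c);
        }
      }
    }
  }

  /** Never overestimates the item's true total weight. */
  public long lowerBound(T item) { return counters.getOrDefault(item, 0L); }

  /** Never underestimates: the lower bound plus everything ever subtracted. */
  public long upperBound(T item) { return lowerBound(item) + offset; }

  public long getTotalWeight() { return totalWeight; }

  public static void main(String[] args) {
    MisraGries<String> mg = new MisraGries<>(8);
    for (int i = 0; i < 1_000; i++) {
      mg.update("hot", 5);           // one heavy item, total weight 5,000
      mg.update("item-" + i, 1);     // 1,000 distinct light items
    }
    System.out.println("hot in [" + mg.lowerBound("hot") + ", " + mg.upperBound("hot") + "]");
  }
}
```

In the usage example `W = 6000` and `capacity = 8`, so the heavy item's lower bound is guaranteed to be at least `5000 - 6000 / 9`, and the summary never holds more than eight counters.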