Class TopHeavyHitters<T>


  • public final class TopHeavyHitters<T>
    extends java.lang.Object
    INTERNAL API

    Mutable, open-addressing with linear-probing (though naive one which in theory could get pathological) heavily optimised "top N heavy hitters" data-structure.

    Keeps a number of specific heavy hitters around in memory.

    See also Section 5.2 of http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf for a discussion about the assumptions made and guarantees about the Heavy Hitters made in this model. We assume the Cash Register model in which there are only additions, which simplifies HH detection significantly.

    This class is a hybrid data structure containing a hashmap and a heap pointing to slots in the hashmap. The capacity of the hashmap is twice that of the heap to reduce clumping of entries on collisions.

    • Constructor Summary

      Constructors 
      Constructor Description
      TopHeavyHitters​(int max, scala.reflect.ClassTag<T> classTag)  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int capacity()  
      scala.collection.Iterator<T> iterator()
      Iterates over the current heavy hitters, order is not of significance.
      long lowestHitterWeight()
      Weight of lowest heavy hitter, if a new inserted item has a weight greater than this, it is a heavy hitter.
      int mask()  
      int max()  
      java.lang.String toDebugString()  
      java.lang.String toString()  
      boolean update​(T item, long count)
      Attempt adding item to heavy hitters set, if it does not fit in the top yet, it will be dropped and the method will return false.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • TopHeavyHitters

        public TopHeavyHitters​(int max,
                               scala.reflect.ClassTag<T> classTag)
    • Method Detail

      • max

        public int max()
      • capacity

        public int capacity()
      • mask

        public int mask()
      • iterator

        public scala.collection.Iterator<T> iterator()
        Iterates over the current heavy hitters, order is not of significance. Not thread safe, accesses internal heap directly (to avoid copying of data). Access must be synchronised externally.
      • toDebugString

        public java.lang.String toDebugString()
      • update

        public boolean update​(T item,
                              long count)
        Attempt adding item to heavy hitters set, if it does not fit in the top yet, it will be dropped and the method will return false.

        WARNING: It is only allowed to *increase* the weight of existing elements, decreasing is disallowed.

        Returns:
        true if the added item has become a heavy hitter.
      • lowestHitterWeight

        public long lowestHitterWeight()
        Weight of lowest heavy hitter, if a new inserted item has a weight greater than this, it is a heavy hitter. This gets the index of the lowest heavy hitter from the top of the heap (lowestHitterIndex) if there is any and looks up its weight. If there is no entry in the table (lowestHitterIndex returns a negative index) this returns a zero weight.
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object