public class TrieIntSeqCounter extends Object implements IntSeqCounter
TrieIntSeqCounter
implements an integer sequence
counter with a trie structure of counts.
Implementation Note: This trie-based integer sequence
counter is not as tight in memory as the character tries, but is
much more efficient for nodes with many daughters. It unfolds
1-daughter and 2-daughter nodes, and beyond that uses
balanced binary trees (via java.util.TreeMap
)
Constructor and Description |
---|
TrieIntSeqCounter(int maxLength)
Construct an integer sequence counter for subsequences
up to the specified maximum length.
|
Modifier and Type | Method and Description |
---|---|
int |
count(int[] is,
int start,
int end)
Returns the count of the specified sequence of integers.
|
long |
extensionCount(int[] is,
int start,
int end)
Returns the sum of the count of all sequences that extend
the specified sequence by one integer.
|
void |
handleNGrams(int nGram,
int minCount,
ObjectHandler<int[]> handler)
Supplies each n-gram of the specified length and with greater
than or equal to the specified minimum count to the specified
handler.
|
void |
incrementSequence(int[] is,
int start,
int end,
int count)
Increments the count for the specified slice by the specified
amount.
|
void |
incrementSubsequences(int[] is,
int start,
int end)
Increments the count for all subsequences of the specified
integer sequence up to the specified maximum length.
|
void |
incrementSubsequences(int[] is,
int start,
int end,
int count)
Increments the count for all subsequences of the specified
integer sequence up to the specified maximum length with the
specified count.
|
int[] |
integersFollowing(int[] is,
int start,
int end)
Returns an array of the integers that follow the specified
integer array slice.
|
int |
maxLength()
Returns the maximum length of subsequence of integers being
counted.
|
ObjectToCounterMap<int[]> |
nGramCounts(int nGram,
int minCount)
Returns a histogram of counts for n-grams of integers of
the specified length, with a count of at least the specified
minimum count.
|
int |
numExtensions(int[] is,
int start,
int end)
Returns the number of one integer extensions of the specified
with non-zero counts.
|
int[] |
observedIntegers()
Returns an array of all integers that have non-zero counts in
the model.
|
void |
prune(int minCount)
Removes all counts for sequences that are less than the minimum
count.
|
void |
rescale(double countMultiplier)
Rescales all counts by multiplying them by the specified
factor.
|
String |
toString()
Return a string-based representation of this integer sequence
counter.
|
int |
trieSize()
Returns the size of this graph, measured in number of nodes
in the trie structure.
|
public TrieIntSeqCounter(int maxLength)
maxLength
- Maximum length of subsequences counted.IllegalArgumentException
- If the maximum length is
less than zero.public void prune(int minCount)
minCount
- Minimum count to maintain a node.public void rescale(double countMultiplier)
int
after being multipled by the scaling factor:
count=(int)(count*countMultiplier)
Unlike pruning, scaling has a cumulative effect and is not
idempotent. For instance, a count of four scaled by half once
will be two, and scaled by half twice will be one. Because of
rounding, it's not even guaranteed that rescaling twice,
rescale(0.5); rescale(0.5);
, returns the same
result as rescaling with the combined factor,
rescale(0.25);
.
Also unlike pruning, scaling, because of the integer rounding, may change the ratios between surviving counts. For instance, under scaling by 0.5, both 3 and 2 rescale to 1.
countMultiplier
- Amount by which counts are scaled.public int maxLength()
public void incrementSubsequences(int[] is, int start, int end)
incrementSubsequences({1,3,17,8,122},1,4)
with a
maximum length of 2
increments the bigram sequence
counts {3,17}
, {17,8}
and the unigram
sequence counts {3}
, {17}
and
{8}
.is
- Underlying array of integers.start
- Index of first integer in the slice.end
- Index of one past the last integer in the slice.IndexOutOfBoundsException
- If the start and end minus one
indices do not fall within the range of the integer array.public void incrementSubsequences(int[] is, int start, int end, int count)
incrementSubsequences(is,start,end,n)
is
equivalent to calling
incrementSubsequences(is,start,end)
a total of
n
times.is
- Underlying array of integers.start
- Index of first integer in the slice.end
- Index of one past the last integer in the slice.count
- IndexOutOfBoundsException
- If the start and end minus one
indices do not fall within the range of the integer array.IllegalArgumentException
- If the count is less than zero.public void incrementSequence(int[] is, int start, int end, int count)
incrementSequence({1,2,3,4},1,3,15)
results in the
sequence 2,3
having its count incremented by 15.
If the sequence provided is longer than the maximum sequence
counted, only its final counts are used. For example, if the
maximum length is 3, then calling
incrementSequence({1,2,3,4,5},0,5,12)
is
equivalent to calling
incrementSequence({3,4,5},0,3,12)
.
is
- Underlying array of integers.start
- Index of first integer in the slice.end
- Index of one past the last integer in the slice.count
- IndexOutOfBoundsException
- If the start and end minus one
indices do not fall within the range of the integer array.IllegalArgumentException
- If the count is less than zero.public ObjectToCounterMap<int[]> nGramCounts(int nGram, int minCount)
nGram
- Length of n-gram whose histrogram is returned.minCount
- Minimum count of element in histogram.IllegalArgumentException
- If the n-gram length is less
than 1.public int trieSize()
public void handleNGrams(int nGram, int minCount, ObjectHandler<int[]> handler)
nGram
- Length of n-grams to visit.minCount
- Minimum count of visited n-gram.handler
- Handler for visited n-grams.public int count(int[] is, int start, int end)
IntSeqCounter
count
in interface IntSeqCounter
is
- Underlying array of integers.start
- Index of first integer in the slice.end
- Index of one past the last integer in the slice.public long extensionCount(int[] is, int start, int end)
IntSeqCounter
extensionCount
in interface IntSeqCounter
is
- Underlying array of integers.start
- Index of first integer in the slice.end
- Index of one past the last integer in the slice.public int numExtensions(int[] is, int start, int end)
IntSeqCounter
numExtensions
in interface IntSeqCounter
is
- Underlying array of integers.start
- Index of first integer in the slice.end
- Index of one past the last integer in the slice.public int[] observedIntegers()
IntSeqCounter
observedIntegers
in interface IntSeqCounter
public int[] integersFollowing(int[] is, int start, int end)
IntSeqCounter
integersFollowing
in interface IntSeqCounter
is
- Underlying array of integers.start
- Index of first integer in the slice.end
- Index of one past the last integer in the slice.