E
 the type of objects being clusteredpublic class CompleteLinkClusterer<E> extends AbstractHierarchicalClusterer<E>
CompleteLinkClusterer
implements complete link
agglomerative clustering. Complete link clustering is a greedy
algorithm in which the two closest clusters are always merged up to
a specified distance threshold. Distance between clusters for
complete link clustering is defined be the maximum of the distances
between the members of the clusters. See SingleLinkClusterer
for a clusterer that takes the minimum rather
than the maximum in making clustering decisions.
For example, consider the following distance matrix (which is
also analyzed by way of example in SingleLinkClusterer
):
The result of completelink clustering is the following dendrogram:
A B C D E A 0 1 2 7 5 B 0 3 8 6 C 0 5 9 D 0 4 E 0
A B C D E      1     2     3    4   5   6   7   8   9 
First, the objects A
and B
are merged at
distance one. At distance 3, the object C
is merged
with the cluster {A,B}
because
max(distance(C,A),distance(C,B))=3
, and that is
smaller than any other pair of distances. Next, D
and E
are merged at distance 4, because that is the
distance between them, and they are both further than 4 away from
the cluster {A,B,C}
. This continues with one more
step, which happens at distance 9, the distance between C
and E
, which is the maximum distance between pairs drawn from
{A,B,C}
and {D,E}
.
The various clusters at each distance bound threshold are:
The intervals show the clusters returned for thresholds within the specified interval. As usual, square brackets denote inclusive range bounds and round brackets exclusive bounds. Although this example has the same singlelink clustering, see the class documentation in
Threshold Range Clusters [Double.NEGATIVE_INFINITY,1) {A}, {B}, {C}, {D}, {E} [1,3) {A,B}, {C}, {D}, {E} [3,4) {A,B,C}, {D}, {E} [4,9) {A,B,C}, {D,E} [9,Double.POSITIVE_INFINITY] {A,B,C,D,E}
SingleLinkClusterer
for an example that does not.
Note that results may not be welldefined in the case of ties between proximities. If there are ties, there is no guarantee as to how this implementation will break the tie.
Implementation Note: This algorithm requires worstcase
O(n^{3})
running time to cluster
n
elements. The initialization loop considers all pairs
of elements, thus requiring n^{2}
running time. The main loop then walks over these pairs, requiring
up to n
steps each to compute new distances. If the
initial array of pairs is heavily pruned, the outer loop is smaller
and the updates are actually smaller, too.
Constructor and Description 

CompleteLinkClusterer(Distance<? super E> distance)
Construct a complete link clusterer with no distance bound.

CompleteLinkClusterer(double maxDistance,
Distance<? super E> distance)
Construct a complete link clusterer with the specified distance
bound.

Modifier and Type  Method and Description 

Dendrogram<E> 
hierarchicalCluster(Set<? extends E> elementSet)
Returns the array of clusters derived from performing
clustering with this class's specified maximum distance.

cluster, distance, getMaxDistance, setMaxDistance
public CompleteLinkClusterer(double maxDistance, Distance<? super E> distance)
maxDistance
 Maximum distance between pairs of clusters
to allow clustering.distance
 Distance measure to use among instances.public CompleteLinkClusterer(Distance<? super E> distance)
Double.POSITIVE_INFINITY
as the
bound value, which effectively makes clustering unbounded.public Dendrogram<E> hierarchicalCluster(Set<? extends E> elementSet)
AbstractHierarchicalClusterer
Double.POSITIVE_INFINITY
should result in a complete
clustering.hierarchicalCluster
in interface HierarchicalClusterer<E>
hierarchicalCluster
in class AbstractHierarchicalClusterer<E>
elementSet
 Set of objects to cluster.