E
 the type of objects being clusteredpublic abstract class Dendrogram<E> extends Object implements Scored
Dendrogram
represents the result of a weighted
hierarchical clustering of a set of objects. A dendrogram's cost
is interpreted as the cost of forming the cluster of its members;
this may be a join cost for a bottomup (agglomerative) clusterer
or a split cost for a topdown (divisive) clusterer. The minimum
cost is 0.0, and each dendrogram's subdendrogram's must have a
lower cost than it.
There are two subclasses of dendrogram, leafs and links. A
LeafDendrogram
is a single element cluster with a cost of
0.0. A LinkDendrogram
is constructed by merging two
subdendrograms with a given cost. The resulting link dendrogram then
becomes the parent of the subdendrograms.
Dendrograms that are linked point to their parent dendrogram
through the method parent()
. The parent is always a link
dendrogram. The method dereference()
returns the result
of following the chain of parent links to the final, toplevel
dendrogram. All of the members of a given dendrogram will dereference
to the same parent.
Dendrograms return their set of elements through memberSet()
. They may also be partitioned into clusterings in
one of two ways. These clusterings are sets of sets that cover the
members of the dendrogram. The method partitionDistance(double)
returns all toplevel dendrograms
where all links are within the specified score. The method partitionK(int)
returns a partition of the specified size. Both
operate by breaking links in the dendrogram. The thresholded
method breaks all links whose score are above a threshold. The
sizebased method breaks links in order of decreasing distance
until the specified number of partitions is generated.
The method copheneticCorrelation(Distance)
may be used
to evaluate the distances between objects implied by the dendrogram
and the distances in a specified distance function. The distances
implied by the dendrogram are socalled cophenetic distances. The
cophenetic distance between a pair of elements in the member set of
a dendrogram is the distance of the subdendrogram linking them.
Given a distance measure and number of clusters, the method
withinClusterScatter(int,Distance)
evaluates the withincluster
scatter, as defined in the class documentation for ClusterScore
.
Modifier and Type  Method and Description 

boolean 
contains(E elt)
Returns
true if this dendrogram contains the
specified object as a leaf. 
double 
copheneticCorrelation(Distance<? super E> distance)
Returns the Pearson correlation coefficient between the
interobject distances and the cophenetic distances defined by
this dendrogram.

Dendrogram<E> 
dereference()
Returns the toplevel cluster of which this dendrogram
is a member.

abstract Set<E> 
memberSet()
Returns the members of this dendrogram.

LinkDendrogram<E> 
parent()
Returns the dendrogram that immediately contains
this dendrogram or
null if this is a toplevel
dendrogram. 
Set<Set<E>> 
partitionDistance(double maxDistance)
Returns the partition produced by cutting this dendrogram at
the specified maximum distance.

Set<Set<E>> 
partitionK(int numClusters)
Return the partition of this dendrogram with the specified
number of clusters.

String 
prettyPrint()
Returns a prettyprinted version of this dendrogram with
entries on a single line and indenting with scores to show the
merges.

abstract double 
score()
Returns the proximity between the components of this
dendrogram.

int 
size()
Returns the size of this dendrogram measured by the number
of objects.

static <E> boolean 
structurallyEquivalent(Dendrogram<E> dendrogram1,
Dendrogram<E> dendrogram2)
Return
true if the specified dendrograms are
structurally equivalent. 
String 
toString()
Returns a stringbased representation of this dendrogram.

double 
withinClusterScatter(int numClusters,
Distance<? super E> distance)
Returns the withincluster scatter relative to the specified
distance of the clustering determined by splitting this
dendrogram into the specified number of clusters.

public LinkDendrogram<E> parent()
null
if this is a toplevel
dendrogram.null
if this is a root dendrogram.public Dendrogram<E> dereference()
null
parent
dereferences to itself.public int size()
public boolean contains(E elt)
true
if this dendrogram contains the
specified object as a leaf.elt
 Object to test.true
if this dendrogram contains the
specified object as a leaf.public Set<Set<E>> partitionK(int numClusters)
numClusters
 Number of clusters in returned partition.IllegalArgumentException
 If the specified number of clusters is
less than one or larger than the number of elements.public Set<Set<E>> partitionDistance(double maxDistance)
This operation may be understood in terms of the isomorphism between equivalence relations and partitions. An equivalence relation is a transitive, symmetric, reflexive relation between objects. An equivalence relation determines a partition with clusters corresponding to sets of mutually equivalent objects. For this method, objects are equivalent if the distance between them is less than or equal to the specified maximum distance in the dendrogram.
maxDistance
 Maximum dendrogram distance at which
elements are considered to be part of the same cluster.public String prettyPrint()
public String toString()
public abstract Set<E> memberSet()
public abstract double score()
public double withinClusterScatter(int numClusters, Distance<? super E> distance)
ClusterScore.withinClusterScatter(partitionK(numClusters),distance)
.
See
ClusterScore.withinClusterScatter(Set,Distance)
for more information on the scatter evaluation metric.
IllegalArgumentException
 If the number of clusters is
not between 1 and the size of this dendrogram inclusive.public double copheneticCorrelation(Distance<? super E> distance)
The cophenetic distance between a pair of elements is defined to be the score of the dendrogram which joins the two elements together; that is, they are in distinct subdendrograms of the dendrogram.
A cophenetic distance forms what is known as an ultrametric. It forms a usual metric distance satisfying the following requirements:
In addition, an ultrametric satisfies the following:d(x,y) >= 0 d(x,x) = 0 d(x,y) = d(y,x) d(x,y) + d(y,z) >= d(x,z)
d(x,z) << max(d(x,y), d(y,z))
Geometrically, this means that for all triples x,y,z
at least two of d(x,y),d(x,z),d(y,z)
are
the same.
Given the cophenetic distance ultrametric from a dendrogram, both singlelink and completelink clusterers will produce the same dendrogram when clustering based on the cophenetic distance.
distance
 Underlying distance measure.public static <E> boolean structurallyEquivalent(Dendrogram<E> dendrogram1, Dendrogram<E> dendrogram2)
true
if the specified dendrograms are
structurally equivalent. Two dendrograms are structurally
equivalent if and only if they have the same score, and they
are either both leaf dendrograms with the same object, or they
are both link dendrograms with equivalent daughters (in either
order).
Note that this is not the same relation as Object.equals(Object)
, which is defined by reference, not
by structural equivalence. Nor is it the same relation as
defined by the score comparators.
dendrogram1
 First dendrogram to compare.dendrogram2
 Second dendrogram to compare.true
if the two dendrograms are equivalent.