SciPy provides a hierarchical clustering implementation in scipy.cluster.hierarchy that makes clustering data relatively straightforward. The linkage function produces a linkage matrix that defines the dendrogram. The method parameter determines how clusters at each level of the hierarchy are linked, and the metric parameter determines which distance measure is used. These parameters should be tuned to your problem. More information on the linkage function and its valid parameters can be found in the SciPy documentation for scipy.cluster.hierarchy.linkage.
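As a minimal sketch of this step, the snippet below builds a linkage matrix for a small made-up array. The array called data, and the choice of method="average" and metric="euclidean", are assumptions for illustration only, not the values used in the original listing.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: two well-separated groups of 2-D points.
data = np.array([
    [0.0, 0.0],
    [0.1, 0.2],
    [0.2, 0.1],
    [5.0, 5.0],
    [5.1, 5.2],
    [5.2, 5.1],
])

# method controls how the distance between clusters is computed;
# metric controls the pairwise distance between observations.
# Both values here are illustrative choices, not recommendations.
Z = linkage(data, method="average", metric="euclidean")

# Z has one row per merge (n - 1 rows) and four columns: the two
# merged cluster ids, the merge distance, and the new cluster size.
print(Z.shape)
```

Trying a different method, such as "single" or "ward", on the same data is a quick way to see how much the resulting hierarchy depends on this choice.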
Once we have our linkage matrix, we can extract clusters from it using fcluster, as in line #22. Here we must specify a threshold, t, at which to cut the dendrogram. fcluster returns an array of cluster assignments in which each index corresponds to a row index of the data array and each value is the cluster assigned to that row. More information on fcluster can be found in the SciPy documentation for scipy.cluster.hierarchy.fcluster.
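The cut can be sketched as follows, again on a made-up data array. Note that fcluster interprets t according to its criterion parameter, which defaults to "inconsistent"; this sketch assumes criterion="distance" so that t acts as a height on the dendrogram. The threshold of 1.0 is an arbitrary value chosen to fit the toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: two well-separated groups of 2-D points.
data = np.array([
    [0.0, 0.0],
    [0.1, 0.2],
    [0.2, 0.1],
    [5.0, 5.0],
    [5.1, 5.2],
    [5.2, 5.1],
])

Z = linkage(data, method="average", metric="euclidean")

# Cut the dendrogram at distance 1.0 (an assumed threshold).
# criterion="distance" makes t a cophenetic distance cutoff.
labels = fcluster(Z, t=1.0, criterion="distance")

# labels[i] is the cluster assigned to data[i]; labels start at 1.
for row, label in zip(data, labels):
    print(row, "-> cluster", label)
```

Because the two groups in the toy data are far apart relative to the threshold, the first three rows share one label and the last three share another.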
If we already know the threshold value t ahead of time and don't need a dendrogram plot, we can use the fclusterdata function to build the linkage matrix and extract the clusters in a single call, as in line #34. More information on fclusterdata can be found in the SciPy documentation for scipy.cluster.hierarchy.fclusterdata.
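A rough sketch of the one-call form is shown below, compared against the two-step version on the same made-up data. The data array, threshold, method, and metric are all assumptions for illustration. fclusterdata defaults to method="single" and criterion="inconsistent", so both are passed explicitly here to match the two-step call.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, linkage

# Hypothetical toy data: two well-separated groups of 2-D points.
data = np.array([
    [0.0, 0.0],
    [0.1, 0.2],
    [0.2, 0.1],
    [5.0, 5.0],
    [5.1, 5.2],
    [5.2, 5.1],
])

# Two-step version: build the linkage matrix, then cut it.
Z = linkage(data, method="average", metric="euclidean")
two_step = fcluster(Z, t=1.0, criterion="distance")

# One-call version with the same (assumed) settings.
one_call = fclusterdata(
    data, t=1.0, criterion="distance",
    metric="euclidean", method="average",
)

# Check whether both routes give the same assignments.
same = np.array_equal(two_step, one_call)
print(same)
```

The one-call form never exposes the linkage matrix, so the two-step version remains the better fit whenever the dendrogram itself is needed.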