Patternite Logo

Hierarchical clustering in Python with SciPy


Using SciPy, we can perform hierarchical clustering on our dataset and efficiently traverse the resulting dendrogram to generate clusters at different levels.


SciPy's scipy.cluster.hierarchy module provides a hierarchical clustering implementation that makes clustering data relatively straightforward.


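The linkage function produces a linkage matrix that encodes the dendrogram. The method parameter determines how the distance between clusters is computed when they are merged at each level of the hierarchy (for example single, complete, average, or ward), and the metric parameter determines which distance measure is used between observations (for example euclidean or cityblock). These parameters should be tuned to your problem. More information on linkage and its valid parameters can be found in the SciPy documentation for scipy.cluster.hierarchy.linkage.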
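As a minimal sketch of this step, the snippet below builds a linkage matrix for a small hand-made dataset. The data array and the choice of method="average" with metric="euclidean" are assumptions made for illustration only, not the settings used in the original example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: two well-separated groups of 2-D points.
data = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# Each row of Z records one merge: the indices of the two clusters
# joined, the distance between them, and the size of the new cluster.
Z = linkage(data, method="average", metric="euclidean")

print(Z.shape)  # (n - 1, 4) for n observations
```

For n observations the matrix always has n - 1 rows, one per merge, so the last row describes the root of the dendrogram.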


Once we have our linkage matrix, we can extract flat clusters from it using fcluster. Here we must specify a threshold t at which to cut the dendrogram; how t is interpreted depends on the criterion parameter (for example, with criterion='distance' it is a cophenetic distance). fcluster returns an array of cluster assignments in which each index corresponds to a row index of the data array and each value is that row's cluster label. More information on fcluster can be found in the SciPy documentation for scipy.cluster.hierarchy.fcluster.
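The cut can be sketched as follows. The toy data, the threshold t=1.0, and criterion="distance" are illustrative assumptions; a sensible threshold depends on the scale of your own data and on the linkage method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: two well-separated groups of 2-D points.
data = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

Z = linkage(data, method="average", metric="euclidean")

# Cut the dendrogram at a cophenetic distance of 1.0 (assumed value).
# labels[i] is the cluster assigned to data[i]; labels start at 1.
labels = fcluster(Z, t=1.0, criterion="distance")

print(labels)
```

Because the linkage matrix is computed once, fcluster can be called repeatedly with different values of t to obtain clusters at different levels of the hierarchy without re-clustering.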


If we already know the threshold value t ahead of time and don't require a dendrogram plot, we can use the fclusterdata function to build the linkage matrix and extract the clusters in a single function call. More information on fclusterdata can be found in the SciPy documentation for scipy.cluster.hierarchy.fclusterdata.
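A minimal sketch of the single-call variant is shown below. As before, the toy data and the parameter values are assumptions for illustration; note that fclusterdata has its own defaults for method and criterion, so they are passed explicitly here.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Hypothetical toy data: two well-separated groups of 2-D points.
data = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# Cluster and cut in one call; no linkage matrix is returned.
labels = fclusterdata(
    data,
    t=1.0,                 # assumed threshold
    criterion="distance",
    metric="euclidean",
    method="average",
)

print(labels)
```

The trade-off is that the intermediate linkage matrix is discarded, so trying a different threshold means clustering the data again.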

Profile picture for duncster


Patternite © 2021
