LingPy

lingpy.algorithm.cluster.flat_upgma

lingpy.algorithm.cluster.flat_upgma(matrix, threshold, taxa=None)

Carry out a flat cluster analysis based on the UPGMA algorithm (Sokal1958).

Parameters :

matrix : list or numpy.array

A symmetric two-dimensional list or array containing the pairwise distances between the taxa.

threshold : float

The distance threshold that terminates the algorithm: clusters whose average distance lies above this value are not merged.

taxa : list or None (default=None)

A list containing the names of the taxa. If set to None, the indices of the taxa will be returned instead of their names.

Returns :

clusters : dict

A dictionary with cluster-IDs as keys and a list of the taxa corresponding to the respective ID as values.

See also

lingpy.algorithm.cluster.upgma, lingpy.algorithm.cluster.neighbor

Examples

The function is automatically imported along with LingPy.

>>> from lingpy import *

Create a list of arbitrary taxa.

>>> taxa = ['German','Swedish','Icelandic','English','Dutch']

Create an arbitrary distance matrix.

>>> matrix = squareform([0.5,0.67,0.8,0.2,0.4,0.7,0.6,0.8,0.8,0.3])
>>> matrix
array([[ 0.  ,  0.5 ,  0.67,  0.8 ,  0.2 ],
       [ 0.5 ,  0.  ,  0.4 ,  0.7 ,  0.6 ],
       [ 0.67,  0.4 ,  0.  ,  0.8 ,  0.8 ],
       [ 0.8 ,  0.7 ,  0.8 ,  0.  ,  0.3 ],
       [ 0.2 ,  0.6 ,  0.8 ,  0.3 ,  0.  ]])
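
The ten values passed to squareform are the upper triangle of the matrix, read row by row. The following is a minimal pure-Python sketch of that expansion, not LingPy's own code; the helper name condensed_to_square is hypothetical and used only for illustration.

```python
def condensed_to_square(condensed, n):
    """Expand a condensed (upper-triangle) distance list into an n x n matrix."""
    # Hypothetical helper for illustration; not part of LingPy.
    square = [[0.0] * n for _ in range(n)]
    values = iter(condensed)
    for i in range(n):
        for j in range(i + 1, n):
            # Each value fills one cell above the diagonal and its mirror below.
            square[i][j] = square[j][i] = next(values)
    return square


square = condensed_to_square(
    [0.5, 0.67, 0.8, 0.2, 0.4, 0.7, 0.6, 0.8, 0.8, 0.3], 5
)
print(square[0])  # distances from the first taxon to all five taxa
```

A condensed list for n taxa must hold n * (n - 1) / 2 values, which is 10 for the five taxa used here.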

Carry out the flat cluster analysis.

>>> flat_upgma(matrix, 0.6, taxa)
{0: ['German', 'Dutch', 'English'], 1: ['Swedish', 'Icelandic']}
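
To illustrate how a threshold turns UPGMA into a flat clustering, the following is a minimal pure-Python sketch. It is not LingPy's implementation: the function name flat_upgma_sketch is hypothetical, and the stopping rule (stop as soon as the closest pair of clusters is farther apart than the threshold) is an assumption about the intended behaviour.

```python
from itertools import combinations


def flat_upgma_sketch(matrix, threshold, taxa=None):
    """Average-linkage clustering that stops at a distance threshold."""
    # Hypothetical illustration; not the LingPy implementation.
    # Start with every taxon in its own cluster (clusters hold matrix indices).
    clusters = [[i] for i in range(len(matrix))]

    def average_distance(a, b):
        # UPGMA: mean of all pairwise distances between members of a and b.
        return sum(matrix[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, x, y = min(
            (average_distance(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        # Assumption: stop once the closest pair exceeds the threshold.
        if dist > threshold:
            break
        # Merge cluster y into cluster x (x < y, so index x stays valid).
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Return names if given, otherwise the indices of the taxa.
    names = taxa if taxa is not None else list(range(len(matrix)))
    return {
        idx: [names[i] for i in members]
        for idx, members in enumerate(clusters)
    }


taxa = ['German', 'Swedish', 'Icelandic', 'English', 'Dutch']
matrix = [
    [0.0, 0.5, 0.67, 0.8, 0.2],
    [0.5, 0.0, 0.4, 0.7, 0.6],
    [0.67, 0.4, 0.0, 0.8, 0.8],
    [0.8, 0.7, 0.8, 0.0, 0.3],
    [0.2, 0.6, 0.8, 0.3, 0.0],
]
result = flat_upgma_sketch(matrix, 0.6, taxa)
print(result)
# {0: ['German', 'Dutch', 'English'], 1: ['Swedish', 'Icelandic']}
```

In this sketch German and Dutch merge first (0.2), then Swedish and Icelandic (0.4), then English joins German and Dutch at an average distance of 0.55. The two remaining clusters lie about 0.68 apart on average, which is above 0.6, so merging stops there.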