LingPy

This documentation is for version 2.0.dev, which is not released yet.

lingpy.compare.lexstat.LexStat.cluster

LexStat.cluster(method='sca', cluster_method='upgma', threshold=0.55, scale=0.5, factor=0.3, restricted_chars='_T', mode='overlap', verbose=False, gop=-2, **keywords)

Function for flat clustering of words into cognate sets.

Parameters :

method : {‘sca’, ‘lexstat’, ‘edit-dist’, ‘turchin’} (default=‘sca’)

Select the method used to calculate the distances between words.
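To give a feel for what a distance-based method such as ‘edit-dist’ might compute, the following is a minimal sketch of a normalized edit distance between two words. It does not use LingPy. The function names and the normalization by the length of the longer sequence are assumptions made for illustration, not a description of LingPy's internals.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of segments)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return edit_distance(a, b) / longest


print(normalized_edit_distance("hand", "hant"))
```

Normalized values fall between 0 and 1, which makes them comparable with a threshold such as the one described below.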

cluster_method : {‘upgma’, ‘single’, ‘complete’} (default=‘upgma’)

Select the cluster method. ‘upgma’ (Sokal1958) refers to average linkage clustering.
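The three cluster methods differ only in how the distance between two clusters is derived from the pairwise distances of their members. The sketch below is a plain-Python illustration of threshold-based flat clustering, not LingPy's implementation. The function name `flat_cluster`, the toy distance matrix, and the choice to stop as soon as the closest pair is at or above the threshold are all assumptions.

```python
def flat_cluster(dist, threshold, linkage="upgma"):
    """Agglomerative flat clustering on a symmetric distance matrix.

    dist is a list of lists; returns a sorted list of clusters,
    each cluster being a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(c1, c2):
        pairwise = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":      # closest pair of members
            return min(pairwise)
        if linkage == "complete":    # most distant pair of members
            return max(pairwise)
        if linkage == "upgma":       # average over all pairs of members
            return sum(pairwise) / len(pairwise)
        raise ValueError("unknown linkage: %r" % linkage)

    while len(clusters) > 1:
        # Find the two closest clusters under the chosen linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= threshold:  # assumed stop condition for this sketch
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Toy distances between four words (hypothetical values).
dist = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.7, 0.9],
    [0.5, 0.7, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.0],
]
for linkage in ("upgma", "single", "complete"):
    print(linkage, flat_cluster(dist, 0.55, linkage))
```

With these toy values, single linkage joins the third word to the first two because one pairwise distance (0.5) is below the threshold, while average and complete linkage keep it separate.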

threshold : float (default=0.55)

Select the threshold for the cluster approach. If set to False, an automatic threshold is computed as the average distance of unrelated sequences (use with care).
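The idea of an automatic threshold can be sketched as follows. This is an illustration only: how unrelated sequences are identified is not stated here, so the sketch assumes that words expressing different concepts count as unrelated. The names `automatic_threshold` and `toy_distance` and the sample word list are hypothetical.

```python
from itertools import combinations


def toy_distance(a, b):
    """Share of mismatching positions, padded to the longer form (illustrative only)."""
    longest = max(len(a), len(b))
    mismatches = sum(
        1 for i in range(longest)
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    )
    return mismatches / longest


def automatic_threshold(words, distance):
    """Mean distance over word pairs assumed to be unrelated.

    words is a list of (concept, form) tuples; pairs with different
    concepts are treated as unrelated (an assumption of this sketch).
    """
    distances = [
        distance(form_a, form_b)
        for (concept_a, form_a), (concept_b, form_b) in combinations(words, 2)
        if concept_a != concept_b
    ]
    if not distances:
        raise ValueError("no unrelated pairs available")
    return sum(distances) / len(distances)


words = [("hand", "hand"), ("hand", "hant"), ("foot", "hut")]
print(automatic_threshold(words, toy_distance))
```

Because the result depends entirely on which pairs are taken to be unrelated, a threshold obtained this way should be checked against the data before it is relied on.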

scale : float (default=0.5)

Select the scale for the gap extension penalty.

factor : float (default=0.3)

Select the factor for extra scores for identical prosodic segments.

restricted_chars : str (default=‘_T’)

Select the restricted chars (boundary markers) in the prosodic strings in order to enable secondary alignment.

mode : {‘global’, ‘local’, ‘overlap’, ‘dialign’} (default=‘overlap’)

Select the mode for the alignment analysis.

verbose : bool (default=False)

Define whether verbose output should be used or not.

gop : int (default=-2)

If ‘sca’ is selected as a method, define the gap opening penalty.
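One way to picture how gop and scale could interact is the sketch below. It assumes that the first position of a gap costs gop and that every further position of the same gap costs gop multiplied by scale. This is an assumption made for illustration, and the helper `gap_cost` is hypothetical rather than part of LingPy.

```python
def gap_cost(length, gop=-2, scale=0.5):
    """Score contribution of one contiguous gap of the given length.

    Assumes the first gap position costs gop and each further
    position costs gop * scale.
    """
    if length <= 0:
        return 0.0
    return gop + (length - 1) * gop * scale


for n in (1, 2, 3):
    print(n, gap_cost(n))
```

Under this assumption, a scale below 1 makes one long gap cheaper than several short ones of the same total length, and a scale of 1 makes every gap position cost the same.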
