LingPy

This documentation is for version 2.0.dev, which is not released yet.

lingpy.align.multiple.Multiple.iterate_clusters

Multiple.iterate_clusters(threshold, check='final', mode='global', gop=-3, scale=0.5, factor=0, gap_weight=1, restricted_chars='T_')

Iterative refinement based on a flat cluster analysis of the data.

This method uses the lingpy.algorithm.cluster.flat_upgma() function to obtain a flat clustering of the data.
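To illustrate what a threshold-based flat cluster analysis does, here is a minimal sketch of average-linkage (UPGMA-style) clustering that stops merging once the closest pair of clusters is farther apart than the threshold. The function name flat_upgma_sketch, the distance-matrix input, and the exact stopping rule are assumptions made for illustration. This is not the implementation of lingpy.algorithm.cluster.flat_upgma().

```python
from itertools import combinations


def flat_upgma_sketch(threshold, matrix):
    """Hypothetical helper: merge clusters by average linkage while the
    closest pair is at most `threshold` apart (assumed stopping rule)."""
    clusters = [[i] for i in range(len(matrix))]

    def avg_dist(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(matrix[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (x, y), best = min(
            (((x, y), avg_dist(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


distances = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
groups = flat_upgma_sketch(0.5, distances)
```

With a threshold of 0.5, the four items fall into two groups. A lower threshold yields more, smaller clusters, and a higher one yields fewer, larger clusters.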

Parameters :

threshold : float

The threshold for the flat cluster analysis.

check : string (default="final")

Specify when to check for improved sum-of-pairs scores: after each iteration ("immediate") or after all iterations have been carried out ("final").
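The difference between the two settings can be sketched with a generic refinement loop. Everything in the block below (the refine function, the steps list, the score callback) is hypothetical and only mirrors the description above. It does not reproduce LingPy's internal code.

```python
def refine(state, steps, score, check="final"):
    """Hypothetical refinement loop illustrating the two check modes."""
    current = state
    for step in steps:
        candidate = step(current)
        if check == "immediate":
            # Keep a candidate only if it improves the score right away.
            if score(candidate) > score(current):
                current = candidate
        else:
            # Accept every step for now and judge the result at the end.
            current = candidate
    if check == "final" and score(current) <= score(state):
        return state  # no overall improvement: keep the original
    return current


# Toy example: the "alignment" is a number and its score is the number itself.
steps = [lambda x: x - 2, lambda x: x + 3]
immediate_result = refine(0, steps, score=lambda x: x, check="immediate")
final_result = refine(0, steps, score=lambda x: x, check="final")
```

In this toy run, the "immediate" setting rejects the first, worsening step and ends at 3, while the "final" setting applies both steps and ends at 1.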

mode : { 'global', 'overlap', 'dialign' } (default='global')

A string indicating which kind of alignment analysis should be carried out during the progressive phase. Select between:

  • 'global' – traditional global alignment analysis based on the Needleman-Wunsch algorithm [Needleman1970],
  • 'dialign' – global alignment analysis which seeks to maximize local similarities [Morgenstern1996],
  • 'overlap' – semi-global alignment, where gaps introduced at the beginning and the end of a sequence do not score.
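The difference between global and semi-global (overlap) scoring can be sketched with a small dynamic-programming routine. The function align_score, its match and mismatch scores, and its linear gap penalty are illustrative assumptions. The block covers only the 'global' and 'overlap' cases, and it is not LingPy's alignment code.

```python
def align_score(a, b, mode="global", gop=-1, match=1, mismatch=-1):
    """Hypothetical scorer: Needleman-Wunsch with a linear gap penalty.
    In "overlap" mode, leading and trailing gaps are free."""
    n, m = len(a), len(b)
    free_ends = mode == "overlap"
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not free_ends:
        # Global mode: leading gaps are penalized in the first row/column.
        for i in range(1, n + 1):
            score[i][0] = i * gop
        for j in range(1, m + 1):
            score[0][j] = j * gop
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gop,      # gap in b
                score[i][j - 1] + gop,      # gap in a
            )
    if free_ends:
        # Trailing gaps are free: take the best cell in the last row/column.
        return max(max(score[n]), max(row[m] for row in score))
    return score[n][m]


global_score = align_score("abcd", "cdef", mode="global")
overlap_score = align_score("abcd", "cdef", mode="overlap")
```

For the strings "abcd" and "cdef", the global score has to pay for the four unmatched end positions, whereas the overlap score only counts the shared "cd".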

gop : int (default=-3)

The gap opening penalty (GOP) used in the analysis.

scale : float (default=0.5)

The factor by which the penalty for the extension of gaps (gap extension penalty, GEP) shall be decreased. This approach is inspired by the extension of the basic alignment algorithm for affine gap penalties [Goto81].
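One way to read the interaction between the gap opening penalty and the scaling factor (`scale` in the signature above) is sketched below. The formula, in which each extension position costs the opening penalty multiplied by the scale, is an assumption made for illustration, and gap_cost is a hypothetical helper rather than a LingPy function.

```python
def gap_cost(length, gop=-3, scale=0.5):
    """Hypothetical cost of a gap spanning `length` positions, assuming
    the extension penalty equals gop * scale."""
    if length <= 0:
        return 0.0
    # The first position pays the full opening penalty; every further
    # position pays the scaled-down extension penalty.
    return gop + (length - 1) * gop * scale


single = gap_cost(1)
triple = gap_cost(3)
```

Under this assumption, a gap of three positions costs less than three separate gaps of one position each, which is the effect that affine gap penalties are meant to achieve.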

factor : float (default=0)

The factor by which the initial and the descending position shall be modified.

gap_weight : float (default=1)

The factor by which gaps in aligned columns contribute to the calculation of the column score. When set to 0, gaps will be ignored in the calculation. When set to 0.5, gaps will count half as much as other characters.
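The effect of the gap weight can be illustrated with a simple weighted column score. The scoring scheme below (1 for identical characters, 0 otherwise, with pairs that involve a gap weighted by gap_weight) is one possible reading of the description, not LingPy's actual formula, and column_score is a hypothetical helper.

```python
from itertools import combinations


def column_score(column, gap_weight=1.0, gap="-"):
    """Hypothetical weighted share of matching pairs in one column."""
    total, weight_sum = 0.0, 0.0
    for x, y in combinations(column, 2):
        if x == gap and y == gap:
            continue  # gap-gap pairs carry no information here
        if x == gap or y == gap:
            # A pair with one gap never matches; its weight is gap_weight.
            weight, score = gap_weight, 0.0
        else:
            weight, score = 1.0, (1.0 if x == y else 0.0)
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


column = ["a", "a", "-"]
ignored = column_score(column, gap_weight=0)
halved = column_score(column, gap_weight=0.5)
```

For the column a, a, - the score is 1.0 when gaps are ignored and 0.5 when gaps count half as much as other characters.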
