LingPy

lingpy.compare.Multiple.iterate_orphans

Multiple.iterate_orphans(check='final', mode='global', gop=-3, gep_scale=0.5, scale=(1, 1, 1), factor=0, gap_weight=1, restricted_chars='T')

Iterate over the most divergent sequences in the sample.

Parameters :

check : string (default="final")

Specify when to check for improved sum-of-pairs scores: after each iteration ("immediate") or after all iterations have been carried out ("final").
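The difference between the two settings can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not LingPy's implementation: `iterate_with_check`, `realign`, and `sp_score` are hypothetical names standing in for the iteration, the realignment of one divergent sequence, and the sum-of-pairs score, and higher scores are assumed to be better.

```python
def iterate_with_check(alignment, orphans, realign, sp_score, check="final"):
    """Realign each orphan and keep changes only if the score improves.

    `realign` and `sp_score` are hypothetical callables supplied by the
    caller; they are not part of the LingPy API.
    """
    if check not in ("immediate", "final"):
        raise ValueError("check must be 'immediate' or 'final'")

    start_score = sp_score(alignment)
    current = alignment
    for orphan in orphans:
        candidate = realign(current, orphan)
        if check == "immediate":
            # Accept or reject every single realignment on its own merits.
            if sp_score(candidate) > sp_score(current):
                current = candidate
        else:
            # Defer the decision: apply every realignment unconditionally.
            current = candidate

    # With "final", the whole series of changes is kept or discarded at once.
    if check == "final" and sp_score(current) <= start_score:
        return alignment
    return current
```

With "immediate", a harmful realignment is discarded before the next sequence is processed; with "final", intermediate losses are tolerated as long as the end result scores higher than the starting alignment.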

mode : { ‘global’, ‘overlap’, ‘dialign’ }

A string indicating which kind of alignment analysis should be carried out during the progressive phase. Select between:

  • ‘global’ – traditional global alignment analysis based on the Needleman-Wunsch algorithm [Needleman1970],
  • ‘dialign’ – global alignment analysis which seeks to maximize local similarities [Morgenstern1996],
  • ‘overlap’ – semi-global alignment, where gaps introduced at the beginning and the end of a sequence do not score.
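The contrast between ‘global’ and ‘overlap’ can be illustrated with a small dynamic-programming scorer. This is a simplified sketch and not LingPy's scoring: the function name `align_score` and the match, mismatch, and linear gap scores are assumptions chosen for illustration, and the ‘dialign’ mode is not covered.

```python
def align_score(a, b, mode="global", match=1, mismatch=-1, gap=-1):
    """Return the best alignment score of sequences a and b.

    mode="global":  every gap is penalized (Needleman-Wunsch).
    mode="overlap": gaps at the beginning and end of either sequence are free.
    """
    if mode not in ("global", "overlap"):
        raise ValueError("mode must be 'global' or 'overlap'")
    n, m = len(a), len(b)
    free_ends = mode == "overlap"

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not free_ends:
        # Leading gaps cost something only in the global mode.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    if free_ends:
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(score[n]), max(row[m] for row in score))
    return score[n][m]
```

For example, `align_score("xxabc", "abc", mode="global")` pays for the two unmatched leading segments, while `mode="overlap"` scores only the three matching positions.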

gop : int (default=-3)

The gap opening penalty (GOP) used in the analysis.

gep_scale : float (default=0.5)

The factor by which the penalty for the extension of gaps (gap extension penalty, GEP) is decreased. This approach is inspired by the extension of the basic alignment algorithm to affine gap penalties [Goto81].
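As a worked illustration of an affine gap penalty, the sketch below assumes that the extension penalty is obtained by multiplying the opening penalty by `gep_scale`. That relation and the function name `affine_gap_cost` are assumptions made for this example; the default values are the ones shown in the method signature above.

```python
def affine_gap_cost(length, gop=-3, gep_scale=0.5):
    """Cost of one contiguous gap of the given length.

    Assumes GEP = GOP * gep_scale: the first gap position pays the
    opening penalty, every further position pays the scaled-down penalty.
    """
    if length < 1:
        return 0.0
    gep = gop * gep_scale
    return gop + (length - 1) * gep
```

Under these assumptions a gap of length 3 costs -3 + 2 * (-1.5) = -6, whereas three separate gaps of length 1 would cost -9, so the model favors few long gaps over many short ones.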

scale : tuple or list (default=(1, 1, 1))

The scaling factors for the modification of gap weights. The first value corresponds to sites of ascending sonority, the second value to sites of maximum sonority, and the third value to sites of decreasing sonority.

factor : float (default=0)

The factor by which the initial and the descending position shall be modified.

gap_weight : float (default=1)

The factor by which gaps in aligned columns contribute to the calculation of the column score. When set to 0, gaps will be ignored in the calculation. When set to 0.5, gaps will count half as much as other characters.
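One way to picture the effect of the gap weight is a simple column identity score in which gaps enter the denominator with a reduced weight. This is a hypothetical formula chosen for illustration, not the column score LingPy computes; the function name `column_identity` and the gap symbol "-" are assumptions.

```python
from collections import Counter


def column_identity(column, gap_weight=0.0, gap="-"):
    """Share of the most frequent character in an alignment column.

    Each character counts 1 in the denominator and each gap counts
    `gap_weight`, so gap_weight=0 ignores gaps entirely.
    """
    chars = [c for c in column if c != gap]
    if not chars:
        return 0.0
    gaps = len(column) - len(chars)
    most_common = Counter(chars).most_common(1)[0][1]
    return most_common / (len(chars) + gap_weight * gaps)
```

For the column `["a", "a", "-", "-"]`, the score is 1.0 with `gap_weight=0`, 2/3 with `gap_weight=0.5`, and 0.5 with `gap_weight=1`.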

Notes

The most divergent sequences are those whose average distance to all other sequences is above the average distance of all sequence pairs.
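The selection criterion can be sketched directly from this definition. The function below is a minimal illustration that takes a symmetric distance matrix as a list of lists; the name `find_orphans` is hypothetical and not part of the LingPy API.

```python
from itertools import combinations


def find_orphans(matrix):
    """Return indices of sequences whose mean distance to all other
    sequences exceeds the mean distance over all sequence pairs."""
    n = len(matrix)
    if n < 2:
        return []

    # Average distance over all unordered sequence pairs.
    pairs = list(combinations(range(n), 2))
    overall = sum(matrix[i][j] for i, j in pairs) / len(pairs)

    orphans = []
    for i in range(n):
        # Average distance of sequence i to every other sequence.
        mean_i = sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
        if mean_i > overall:
            orphans.append(i)
    return orphans
```

In a sample of three closely related sequences and one outlier, only the outlier's average distance exceeds the overall average, so it is the only sequence selected for iteration.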