LingPy

lingpy.lexstat.LexStat.analyze

LexStat.analyze(threshold, score_mode='library', model='sca', merge_vowels=True, gop=-2, gep_scale=0.6, scale=(1.2, 1.0, 1.1), factor=0.3, restricted_chars='T', pairwise_threshold=0.7, runs=100, modes=('global', 'local'), ratio=(1, 1), mode='overlap')

Conduct automatic cognate judgments following the method of List2012b.

Parameters :

threshold : float

The threshold which is used for the flat cluster analysis.
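As an illustration of how a threshold cuts a flat cluster analysis, the sketch below groups items by average linkage and stops merging once the closest pair of clusters is at or above the threshold. The choice of average linkage, the function name `flat_cluster`, and the toy distances are assumptions made for this example; the source does not state which clustering procedure LingPy applies.

```python
from itertools import combinations


def flat_cluster(distances, n, threshold):
    """Group items 0..n-1 into flat clusters by average linkage.

    `distances` maps frozenset({i, j}) to a pairwise distance.
    Merging stops once the closest pair of clusters is at or above
    `threshold` (an assumed stopping rule for this sketch).
    """
    clusters = [[i] for i in range(n)]

    def average_distance(a, b):
        total = sum(distances[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]], clusters[p[1]]),
        )
        if average_distance(clusters[x], clusters[y]) >= threshold:
            break
        # x < y always holds here, so deleting index y keeps x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Four hypothetical words: 0 and 1 are close, 2 and 3 are close.
toy = {
    frozenset((0, 1)): 0.2, frozenset((0, 2)): 0.9, frozenset((0, 3)): 0.8,
    frozenset((1, 2)): 0.85, frozenset((1, 3)): 0.9, frozenset((2, 3)): 0.3,
}
clusters = flat_cluster(toy, 4, threshold=0.6)
print(clusters)  # [[0, 1], [2, 3]]
```

A lower threshold yields more, smaller clusters; a higher one merges more words into the same set.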

score_mode : { ‘library’, ‘sca’, ‘turchin’, ‘edit-dist’, ‘edit-tokens’ }

Define the score_mode on which the calculation of pairwise distances is based. Select between:

  • ‘library’ – the distance scores are based on the language-specific scoring schemes as described in List2012b (this is the default),
  • ‘sca’ – the distance scores are based on the language-independent SCA distance (see List2012b),
  • ‘turchin’ – the distance scores are based on the approach described in Turchin2010,
  • ‘edit-dist’ – the distance scores are based on the normalized edit distance (Levenshtein1966), and
  • ‘edit-tokens’ – the distance scores are based on the normalized edit distance, yet the scores are derived from the tokenized representation of the sequences and not from their raw, untokenized form.
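The difference between the two edit-distance options can be sketched as follows. This is a minimal illustration, not LingPy's implementation: the helper names are hypothetical, and dividing by the length of the longer sequence is an assumed normalization.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Raw form: every character counts as one unit.
raw = normalized_edit_distance("tsam", "tam")
# Tokenized form: the affricate "ts" is a single segment.
tokens = normalized_edit_distance(["ts", "a", "m"], ["t", "a", "m"])
print(raw, tokens)
```

On the raw strings the difference is one deletion out of four characters, while on the tokenized forms it is one substitution out of three segments, so the two modes can assign different distances to the same word pair.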

model : string (default=“sca”)

A string indicating the name of the Model object that shall be used for the analysis. Currently, three models are supported:

  • “dolgo” – a sound-class model based on Dolgopolsky1986,
  • “sca” – an extension of the “dolgo” sound-class model based on List2012a, and
  • “asjp” – an independent sound-class model which is based on the sound-class model of Brown2008 and the empirical data of Brown2011.

merge_vowels : bool (default=True)

Indicate whether neighboring vowels should be merged into diphthongs or kept separate during the analysis.
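The effect of merging can be pictured with a toy tokenizer step. The vowel inventory and the function `merge_adjacent_vowels` are hypothetical and only serve this example; LingPy's own segmentation rules are not described in this section.

```python
VOWELS = set("aeiou")  # hypothetical toy inventory, not LingPy's


def merge_adjacent_vowels(tokens):
    """Join each run of neighboring vowel tokens into a single token."""
    merged = []
    previous_was_vowel = False
    for token in tokens:
        is_vowel = all(ch in VOWELS for ch in token)
        if is_vowel and previous_was_vowel:
            merged[-1] += token  # extend the vowel token already collected
        else:
            merged.append(token)
        previous_was_vowel = is_vowel
    return merged


print(merge_adjacent_vowels(["h", "a", "u", "s"]))  # ['h', 'au', 's']
```

With merging switched off, the same word would stay as four separate segments, which changes how many positions are compared during alignment.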

gop : int (default=-2)

The gap opening penalty (gop) on which the analysis shall be based.

gep_scale : float (default=0.6)

The factor by which the penalty for the extension of gaps (gap extension penalty, GEP) shall be decreased. This approach is essentially inspired by the extension of the basic alignment algorithm for affine gap penalties by Gotoh1982.
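A minimal sketch of an affine gap cost may help to see how the two parameters interact. It assumes that the extension penalty is obtained by multiplying the opening penalty by `gep_scale`; the function `gap_cost` is hypothetical and the exact formula used inside LingPy is not given in this section.

```python
def gap_cost(length, gop=-2, gep_scale=0.6):
    """Total penalty for one gap of `length` positions (affine scheme).

    Assumption: the first gapped position costs `gop`, and every further
    position costs the reduced penalty `gop * gep_scale`.
    """
    if length <= 0:
        return 0.0
    gep = gop * gep_scale
    return gop + (length - 1) * gep


for length in (1, 2, 3):
    print(length, gap_cost(length))
```

Because each extension is cheaper than a new opening, one long gap is penalized less than several short gaps of the same total length.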

scale : tuple or list (default=(1.2, 1.0, 1.1))

The scaling factors for the modification of gap weights. The first value corresponds to sites of ascending sonority, the second value to sites of maximum sonority, and the third value corresponds to sites of decreasing sonority.

factor : float (default=0.3)

The factor by which the initial and the descending position shall be modified.

restricted_chars : string (default=“T”)

Define which characters of the prosodic string of a sequence reflect its secondary structure (cf. List2012a) and should therefore be aligned specifically. This defaults to “T”, since this is the character that represents tones in the prosodic strings of sequences.

pairwise_threshold : float (default=0.7)

Only those sequence pairs whose distance is beyond this threshold will be considered when determining the distribution of attested segment pairs.

runs : int (default=100)

Define how many times the perturbation method shall be carried out in order to retrieve the expected distribution of segment pairs.

modes : tuple or list (default=(“global”, “local”))

Define the alignment modes of the pairwise analyses which are carried out in order to create the language-specific scoring scheme.

ratio : tuple (default=(1,1))

Define the ratio by which the traditional scoring scheme and the correspondence-based scoring scheme contribute to the actual library-based scoring scheme.
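One way to read the ratio is as the weights of a weighted average of two scores for the same segment pair. The sketch below assumes this reading, and assumes that the first value weights the traditional score and the second the correspondence-based score; `combine_scores` is a hypothetical helper, not a LingPy function.

```python
def combine_scores(traditional, correspondence, ratio=(1, 1)):
    """Weighted average of two scores for one segment pair.

    Assumption: ratio[0] weights the traditional score and ratio[1]
    weights the correspondence-based score.
    """
    weight_traditional, weight_correspondence = ratio
    total = weight_traditional + weight_correspondence
    return (
        weight_traditional * traditional
        + weight_correspondence * correspondence
    ) / total


print(combine_scores(2.0, 4.0))                # 3.0
print(combine_scores(2.0, 4.0, ratio=(3, 1)))  # 2.5
```

Under this reading, the default of (1, 1) gives both schemes equal influence, and a ratio such as (3, 1) pulls the combined score toward the traditional scheme.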

mode : string (default = “overlap”)

Define the alignment mode which is used in order to calculate pairwise distance scores from the language-specific scoring schemes.