mad
- mad: Median absolute deviation test, either on raw values or on 1st or 2nd derivatives.
This module was written by Matthias Cuntz while at Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued while at Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.
Copyright (c) 2011-2020 Matthias Cuntz - mc (at) macu (dot) de
Released under the MIT License; see LICENSE file for details.
Written Nov 2011 by Matthias Cuntz - mc (at) macu (dot) de
ND-array, act on axis=0, May 2012, Matthias Cuntz
Removed bug in broadcasting, Jun 2012, Matthias Cuntz
Better usage of numpy possibilities, e.g. using np.diff, Jun 2012, Matthias Cuntz
Ported to Python 3, Feb 2013, Matthias Cuntz
Use bottleneck for medians, otherwise loop over axis=1, Jul 2013, Matthias Cuntz and Juliane Mai
Re-allow masked arrays and arrays with NaNs, Jul 2013, Matthias Cuntz
Removed bug in NaN treatment, Oct 2013, Matthias Cuntz
Keyword nonzero, Oct 2013, Matthias Cuntz
Using numpy docstring format, May 2020, Matthias Cuntz
Code refactoring, Sep 2021, Matthias Cuntz
The following functions are provided:
mad: Median absolute deviation test, either on raw values, or on 1st or 2nd derivatives.
- mad(datin, z=7, deriv=0, nozero=False)
Median absolute deviation test, either on raw values, or on 1st or 2nd derivatives.
Returns mask with False everywhere except where the input is < (median - MAD*z/0.6745) or > (median + MAD*z/0.6745).
- Parameters
datin (array or masked array) – mad acts on axis=0.
z (float, optional) – Input is allowed to deviate at most z standard deviations from the median (default: 7).
deriv (int, optional) –
0: Act on raw input (default).
1: Use first derivatives.
2: Use 2nd derivatives.
nozero (bool, optional) – True: exclude zeros (0.) from input datin.
- Returns
False everywhere except where input deviates more than z standard deviations from the median.
- Return type
array of bool
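The mask described above can be sketched in a few lines of NumPy for the raw-value case (deriv=0). This is a minimal illustration, not the module's implementation: the helper name mad_mask_sketch is hypothetical, and it ignores masked arrays, NaNs, derivatives and the nozero option. The constant 0.6745 is approximately the 0.75 quantile of the standard normal distribution, so MAD/0.6745 estimates the standard deviation of normally distributed data, which is why z can be read as a number of standard deviations.

```python
import numpy as np


def mad_mask_sketch(x, z=7.0):
    """Hypothetical helper: flag values further than z robust standard
    deviations from the median, acting along axis=0."""
    x = np.asarray(x, dtype=float)
    med = np.median(x, axis=0)                 # median along axis 0
    mad_ = np.median(np.abs(x - med), axis=0)  # median absolute deviation
    thresh = mad_ * z / 0.6745                 # allowed deviation from the median
    return (x < med - thresh) | (x > med + thresh)


# Same sample data as in the Examples section of this page
y = np.array([-0.25, 0.68, 0.94, 1.15, 2.26, 2.35, 2.37, 2.40, 2.47, 2.54,
              2.62, 2.64, 2.90, 2.92, 2.92, 2.93, 3.21, 3.26, 3.30, 3.59,
              3.68, 4.30, 4.64, 5.34, 5.42, 8.01], dtype=float)

print(np.where(mad_mask_sketch(y, z=4))[0].tolist())  # [25]
print(np.where(mad_mask_sketch(y, z=3))[0].tolist())  # [0, 24, 25]
```

For this data the median is 2.91 and the MAD is 0.55, so with z=4 the accepted range is roughly -0.35 to 6.17 and only the last value (8.01) is flagged.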
Notes
If input is an array then mad is checked along the zeroth axis for outliers.
1st derivative is calculated as d = datin[1:n]-datin[0:n-1] because the mean of left and right would give 0 for spikes.
If all(d.mask==True) then return d.mask, which is all True.
Examples
>>> import numpy as np
>>> y = np.array([-0.25,0.68,0.94,1.15,2.26,2.35,2.37,2.40,2.47,2.54,2.62,
...               2.64,2.90,2.92,2.92,2.93,3.21,3.26,3.30,3.59,3.68,4.30,
...               4.64,5.34,5.42,8.01],dtype=float)

>>> # Normal MAD
>>> print(mad(y))
[False False False False False False False False False False False False False False False False False False False False False False False False False False]

>>> print(mad(y,z=4))
[False False False False False False False False False False False False False False False False False False False False False False False False False True]

>>> print(mad(y,z=3))
[ True False False False False False False False False False False False False False False False False False False False False False False False True True]

>>> # MAD on 2nd derivatives
>>> print(mad(y,z=4,deriv=2))
[False False False False False False False False False False False False False False False False False False False False False False False True]

>>> # direct usage
>>> my = np.ma.array(y, mask=mad(y,z=4))
>>> print(my)
[-0.25 0.68 0.94 1.15 2.26 2.35 2.37 2.4 2.47 2.54 2.62 2.64 2.9 2.92 2.92 2.93 3.21 3.26 3.3 3.59 3.68 4.3 4.64 5.34 5.42 --]

>>> # MAD on several dimensions
>>> yy = np.transpose(np.array([y,y]))
>>> print(np.transpose(mad(yy,z=4)))
[[False False False False False False False False False False False False False False False False False False False False False False False False False True]
 [False False False False False False False False False False False False False False False False False False False False False False False False False True]]

>>> yyy = np.transpose(np.array([y,y,y]))
>>> print(np.transpose(mad(yyy,z=3)))
[[ True False False False False False False False False False False False False False False False False False False False False False False False True True]
 [ True False False False False False False False False False False False False False False False False False False False False False False False True True]
 [ True False False False False False False False False False False False False False False False False False False False False False False False True True]]

>>> # Masked arrays
>>> my = np.ma.array(y, mask=np.zeros(y.shape))
>>> my.mask[-1] = True
>>> print(mad(my,z=4))
[True False False False False False False False False False False False False False False False False False False False False False False False False --]

>>> print(mad(my,z=3))
[True False False False False False False False False False False False False False False False False False False False False False False True True --]

>>> # Arrays with NaNs
>>> ny = y.copy()
>>> ny[-1] = np.nan
>>> print(mad(ny,z=4))
[ True False False False False False False False False False False False False False False False False False False False False False False False False False]

>>> print(mad(ny,z=3))
[ True False False False False False False False False False False False False False False False False False False False False False False True True False]

>>> # Exclude zeros
>>> zy = y.copy()
>>> zy[1] = 0.
>>> print(mad(zy,z=3))
[ True True False False False False False False False False False False False False False False False False False False False False False False True True]

>>> print(mad(zy,z=3,nozero=True))
[ True False False False False False False False False False False False False False False False False False False False False False False False True True]
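
The remark in the Notes that the first derivative is taken as the difference of neighbouring points, because the mean of left and right would give 0 for spikes, can be illustrated with a single spike. The following is a sketch on made-up data and does not call mad itself; the variable names are illustrative only.

```python
import numpy as np

# Made-up series with a single spike at index 3
x = np.array([1., 1., 1., 10., 1., 1., 1.])

# Difference of neighbouring points, as in the Notes: d = x[1:n] - x[0:n-1]
forward = np.diff(x)

# Mean of left and right differences (central difference) at interior points;
# central[i] belongs to x[i+1]
central = (x[2:] - x[:-2]) / 2.

print(forward.tolist())  # [0.0, 0.0, 9.0, -9.0, 0.0, 0.0]
print(central.tolist())  # [0.0, 4.5, 0.0, -4.5, 0.0]
```

The central difference is exactly 0 at the spike itself (central[2], which belongs to x[3]), whereas the neighbour difference shows the spike as a +9/-9 pair. Note that np.diff returns one element fewer than its input per differencing step, which is consistent with the deriv=2 example above returning 24 values for 26 inputs.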