Checks for fitting signals

This file contains functions that check whether two signals fit. They can be used to check gluing regions or molecular fit regions.

fit_checks.check_correlation(first_signal, second_signal, threshold=None)

Returns the correlation coefficient between the two signals.

The signals can be either 1D arrays or 2D arrays containing the rolling slices of the input signals. In the 2D case, the function returns the sliding correlation between the original signals.

If a threshold is provided, returns True if the correlation is above the specified threshold.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

threshold: float or None

Threshold for the correlation coefficient.

Returns
correlation: float or boolean

If threshold is None, the function returns the correlation coefficient. If a threshold is provided, the function returns True if the correlation value is above the threshold.
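As an illustration of the threshold behaviour described above, the following is a minimal sketch for the 1D case only. The name `correlation_sketch` is hypothetical, and the use of `np.corrcoef` (Pearson's r) is an assumption; the library's own implementation may differ.

```python
import numpy as np


def correlation_sketch(first_signal, second_signal, threshold=None):
    """Hypothetical stand-in for the 1D case; not the library's code."""
    x = np.asarray(first_signal, dtype=float)
    y = np.asarray(second_signal, dtype=float)
    # The off-diagonal element of the 2x2 correlation matrix is Pearson's r.
    correlation = np.corrcoef(x, y)[0, 1]
    if threshold is None:
        return correlation
    return bool(correlation > threshold)


x = np.linspace(1.0, 10.0, 50)
r_value = correlation_sketch(x, 2.0 * x)                  # proportional signals
passes = correlation_sketch(x, 2.0 * x, threshold=0.95)   # above the threshold
fails = correlation_sketch(x, -2.0 * x, threshold=0.95)   # anti-correlated signals
```

With no threshold the sketch returns a float; with a threshold it returns a boolean, mirroring the two return types listed above.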

fit_checks.check_linear_fit_intercept_and_correlation(first_signal, second_signal)

Check whether the intercept of a linear fit is near zero, and compute the correlation coefficient of the two signals.

Performs a linear fit to the data, assuming y = ax + b, with x the first_signal and y the second_signal. It will return the value np.abs(b / np.mean(y) * 100) together with the correlation coefficient.

If the intercept is far from zero, the two signals do not differ only by a multiplicative constant.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

Returns
intercept_percent: float or boolean

The value of the intercept b, relative to the mean value of the second_signal.

correlation: float

Correlation coefficient between the two samples
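The two returned quantities can be reproduced with a short sketch. It assumes an ordinary least-squares fit via `np.polyfit`; the function name `intercept_and_correlation_sketch` is hypothetical and the library's fitting routine may differ.

```python
import numpy as np


def intercept_and_correlation_sketch(first_signal, second_signal):
    """Hypothetical illustration of the y = ax + b check; not the library's code."""
    x = np.asarray(first_signal, dtype=float)
    y = np.asarray(second_signal, dtype=float)
    # For degree 1, np.polyfit returns the slope first and the intercept second.
    a, b = np.polyfit(x, y, 1)
    # Intercept expressed as a percentage of the mean of the second signal.
    intercept_percent = np.abs(b / np.mean(y) * 100)
    correlation = np.corrcoef(x, y)[0, 1]
    return intercept_percent, correlation


x = np.linspace(1.0, 10.0, 50)
pct_proportional, r_proportional = intercept_and_correlation_sketch(x, 3.0 * x)
pct_offset, r_offset = intercept_and_correlation_sketch(x, 3.0 * x + 5.0)
```

Both test signals are perfectly correlated with x, but only the second has a large relative intercept, which is why the intercept is reported alongside the correlation.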

fit_checks.check_min_max_ratio(first_signal, second_signal, threshold=None)

Returns the ratio between the minimum and maximum values (i.e. min / max).

The ratio is computed for each signal and the smaller of the two values is returned. The aim is to detect regions of large variation, e.g. edges of clouds. Similar values will also be returned when the signals are near 0, because the relative difference is then large. Consequently, this test should be used together with other checks, e.g. on the signal-to-noise ratio.

If a threshold is provided, returns True if the ratio is above the specified threshold.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

threshold: float or None

Threshold for the min/max ratio.

Returns
minmax: float or boolean

If threshold is None, then the function returns the min/max ratio. If a threshold is provided, the function returns True if the min/max ratio is above the threshold.
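A minimal sketch of the min/max check, following the description above (ratio per signal, then the smaller of the two). The name `min_max_ratio_sketch` and the sample values are hypothetical.

```python
import numpy as np


def min_max_ratio_sketch(first_signal, second_signal, threshold=None):
    """Hypothetical illustration of the min/max check; not the library's code."""
    x = np.asarray(first_signal, dtype=float)
    y = np.asarray(second_signal, dtype=float)
    # Ratio for each signal; the smaller one flags the larger variation.
    minmax = min(np.min(x) / np.max(x), np.min(y) / np.max(y))
    if threshold is None:
        return minmax
    return bool(minmax > threshold)


flat = np.array([10.0, 11.0, 12.0, 11.0, 10.0])   # smooth region
edge = np.array([10.0, 11.0, 50.0, 12.0, 10.0])   # sharp peak, e.g. a cloud edge
ratio = min_max_ratio_sketch(flat, edge)
edge_ok = min_max_ratio_sketch(flat, edge, threshold=0.5)
flat_ok = min_max_ratio_sketch(flat, flat, threshold=0.5)
```

The sharp peak pulls the ratio down to 10 / 50, so the pair fails a threshold of 0.5, while two smooth signals pass it.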

fit_checks.check_residuals_not_gaussian(first_signal, second_signal, threshold=None)

Check if the residuals of the linear fit are not from a normal distribution.

The function uses a Shapiro-Wilk test on the residuals of a linear fit. Specifically, the function performs a linear fit to the data, assuming y = ax, and then calculates the residuals r = y - ax. It will return the p value of the Shapiro-Wilk test on the residuals.

If a threshold is provided, returns True if the p value is below the specified threshold, i.e. if the residuals are probably not gaussian.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

threshold: float or None

Threshold for the Shapiro-Wilk p-value.

Returns
p_value: float or boolean

If threshold is None, then the function returns the p-value of the Shapiro-Wilk test on the residuals. If a threshold is provided, the function returns True if p-value is below the threshold.
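The procedure above (fit y = ax, compute residuals, test them) can be sketched as follows. The sketch assumes a least-squares fit through the origin and uses `scipy.stats.shapiro`; the function name `residuals_not_gaussian_sketch` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def residuals_not_gaussian_sketch(first_signal, second_signal, threshold=None):
    """Hypothetical illustration using Shapiro-Wilk; not the library's code."""
    x = np.asarray(first_signal, dtype=float)
    y = np.asarray(second_signal, dtype=float)
    # Least-squares slope for a fit through the origin, y = a x.
    a = np.sum(x * y) / np.sum(x * x)
    residuals = y - a * x
    # stats.shapiro returns (statistic, p-value).
    p_value = stats.shapiro(residuals)[1]
    if threshold is None:
        return p_value
    return bool(p_value < threshold)


x = np.linspace(1.0, 10.0, 100)
rng = np.random.default_rng(0)
noisy = 2.0 * x + rng.normal(0.0, 0.1, x.size)
# Residuals that jump between two levels are clearly not Gaussian.
two_level = 2.0 * x + np.where(np.arange(x.size) % 2 == 0, 1.0, -1.0)

p_noisy = residuals_not_gaussian_sketch(x, noisy)
flagged = residuals_not_gaussian_sketch(x, two_level, threshold=0.05)
```

Note the direction of the test: True means the residuals are probably not Gaussian, i.e. the fit is suspicious.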

fit_checks.check_residuals_not_gaussian_dagostino(first_signal, second_signal, threshold=None)

Check if the residuals of the linear fit are not from a normal distribution.

The function uses the D’Agostino-Pearson test on the residuals of a linear fit. Specifically, the function performs a linear fit to the data, assuming y = ax, and then calculates the residuals r = y - ax. It will return the p value of the D’Agostino-Pearson omnibus test on the residuals.

If a threshold is provided, returns True if the p value is below the specified threshold, i.e. if the residuals are probably not gaussian.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

threshold: float or None

Threshold for the D’Agostino-Pearson p-value.

Returns
p_value: float or boolean

If threshold is None, then the function returns the p-value of the D’Agostino-Pearson test on the residuals. If a threshold is provided, the function returns True if the p-value is below the threshold.
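For comparison, the same residual check can be sketched with the D’Agostino-Pearson omnibus test, which SciPy exposes as `scipy.stats.normaltest`. Whether the library calls this exact routine is an assumption, and the name `residuals_not_gaussian_dagostino_sketch` is hypothetical.

```python
import numpy as np
from scipy import stats


def residuals_not_gaussian_dagostino_sketch(first_signal, second_signal, threshold=None):
    """Hypothetical illustration using D'Agostino-Pearson; not the library's code."""
    x = np.asarray(first_signal, dtype=float)
    y = np.asarray(second_signal, dtype=float)
    a = np.sum(x * y) / np.sum(x * x)   # slope of the fit y = a x
    residuals = y - a * x
    # stats.normaltest combines skewness and kurtosis; it returns (statistic, p-value).
    p_value = stats.normaltest(residuals)[1]
    if threshold is None:
        return p_value
    return bool(p_value < threshold)


x = np.linspace(1.0, 10.0, 100)
# Residuals alternating between +1 and -1 have far too little kurtosis to be Gaussian.
two_level = 2.0 * x + np.where(np.arange(x.size) % 2 == 0, 1.0, -1.0)
p_two_level = residuals_not_gaussian_dagostino_sketch(x, two_level)
flagged = residuals_not_gaussian_dagostino_sketch(x, two_level, threshold=0.05)
```

The omnibus test relies on sample skewness and kurtosis, so it is best suited to samples of at least a few tens of points.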

fit_checks.sliding_check_correlation(first_signal, second_signal, window_length=11, threshold=None)

Returns the sliding correlation coefficient between the two signals.

If a threshold is provided, returns True if the correlation is above the specified threshold.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

window_length: int

The length of the window. It should be an odd number.

threshold: float or None

Threshold for the correlation coefficient.

Returns
correlation: float or boolean

If threshold is None, the function returns the correlation coefficient. If a threshold is provided, the function returns True if the correlation value is above the threshold.
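One way to picture the sliding correlation is to stack the rolling slices of each signal into a 2D array and correlate them row by row. The sketch below does this with NumPy's `sliding_window_view`; the name `sliding_correlation_sketch` is hypothetical, and the output length (one value per full window, no edge padding) is an assumption that may not match the library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_correlation_sketch(first_signal, second_signal, window_length=11, threshold=None):
    """Hypothetical row-wise Pearson correlation over rolling windows."""
    # Each row of these 2D arrays is one window of the original signal.
    x = sliding_window_view(np.asarray(first_signal, dtype=float), window_length)
    y = sliding_window_view(np.asarray(second_signal, dtype=float), window_length)
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    # Pearson's r per window; a constant window would give a zero denominator.
    correlation = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    if threshold is None:
        return correlation
    return correlation > threshold


x = np.linspace(1.0, 10.0, 50)
y = 2.0 * x + 1.0
sliding_r = sliding_correlation_sketch(x, y, window_length=11)
sliding_ok = sliding_correlation_sketch(x, y, window_length=11, threshold=0.95)
```

An odd window length gives each window a well-defined central point, which is convenient when mapping the result back onto the original signal.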

fit_checks.sliding_check_linear_fit_intercept_and_correlation(first_signal, second_signal, window_length=11)

Check if the intercept of a linear fit is near zero.

Performs a linear fit to the data, assuming y = ax + b, with x the first_signal and y the second_signal.

It will return the value np.abs(b / np.mean(y) * 100) and the correlation of the two signals.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

window_length: int

The length of the window. It should be an odd number.

Returns
intercepts: float or boolean

The value of the intercept b, relative to the mean value of the second_signal.

correlations: float

Correlation coefficient between the two samples
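A sliding version of the intercept check can be sketched by repeating the fit in every window. The sketch assumes that both the fit and the mean of y are taken per window and that only full windows are evaluated; the name `sliding_intercept_and_correlation_sketch` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_intercept_and_correlation_sketch(first_signal, second_signal, window_length=11):
    """Hypothetical per-window y = ax + b fit; not the library's code."""
    x_windows = sliding_window_view(np.asarray(first_signal, dtype=float), window_length)
    y_windows = sliding_window_view(np.asarray(second_signal, dtype=float), window_length)
    intercepts = np.empty(len(x_windows))
    correlations = np.empty(len(x_windows))
    for i, (xw, yw) in enumerate(zip(x_windows, y_windows)):
        _, b = np.polyfit(xw, yw, 1)                     # slope is not needed here
        intercepts[i] = np.abs(b / np.mean(yw) * 100)    # relative intercept in percent
        correlations[i] = np.corrcoef(xw, yw)[0, 1]
    return intercepts, correlations


x = np.linspace(1.0, 10.0, 50)
y = 3.0 * x + 5.0
intercepts, correlations = sliding_intercept_and_correlation_sketch(x, y)
```

In this example the intercept is the same in every window, but its relative size shrinks along the signal because the window mean of y grows.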

fit_checks.sliding_check_min_max_ratio(first_signal, second_signal, window_length=11, threshold=None)

Returns the sliding min/max ratio for both signals.

If a threshold is provided, returns True if the min/max ratio is above the specified threshold.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

window_length: int

The length of the window. It should be an odd number.

threshold: float or None

Threshold for the min/max ratio.

Returns
correlation: float or boolean

If threshold is None, the function returns the min/max ratio. If a threshold is provided, the function returns True if the min/max ratio is above the threshold.
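The sliding min/max ratio can be sketched with the same rolling-window approach. The sketch assumes that, as in the non-sliding check, the smaller of the two per-signal ratios is kept for each window; the name `sliding_min_max_ratio_sketch` and the test signal are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_min_max_ratio_sketch(first_signal, second_signal, window_length=11, threshold=None):
    """Hypothetical per-window min/max ratio; not the library's code."""
    x = sliding_window_view(np.asarray(first_signal, dtype=float), window_length)
    y = sliding_window_view(np.asarray(second_signal, dtype=float), window_length)
    ratio_first = x.min(axis=1) / x.max(axis=1)
    ratio_second = y.min(axis=1) / y.max(axis=1)
    # Keep the smaller ratio of the two signals in every window.
    minmax = np.minimum(ratio_first, ratio_second)
    if threshold is None:
        return minmax
    return minmax > threshold


first = np.full(30, 10.0)
first[15] = 50.0                 # isolated spike, e.g. a thin cloud layer
second = np.full(30, 10.0)
ratios = sliding_min_max_ratio_sketch(first, second, window_length=11)
flags = sliding_min_max_ratio_sketch(first, second, window_length=11, threshold=0.5)
```

Only the windows that contain the spike drop to a ratio of 0.2, so the threshold test localises the region of large variation.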

fit_checks.sliding_check_residuals_not_gaussian(first_signal, second_signal, window_length, threshold=None)

Check if the residuals of the linear fit are not from a normal distribution.

The function uses a Shapiro-Wilk test on the residuals of a linear fit. Specifically, the function performs a linear fit to the data, assuming y = ax, and then calculates the residuals r = y - ax. It will return the p value of the Shapiro-Wilk test on the residuals.

If a threshold is provided, returns True if the p value is below the specified threshold, i.e. if the residuals are probably not gaussian.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

window_length: int

The length of the window. It should be an odd number.

threshold: float or None

Threshold for the Shapiro-Wilk p-value.

Returns
p_value: array

If threshold is None, then the function returns the p-value of the Shapiro-Wilk test on the residuals. If a threshold is provided, the function returns True if p-value is below the threshold.

fit_checks.sliding_check_residuals_not_gaussian_dagostino(first_signal, second_signal, window_length, threshold=None)

Check if the residuals of the linear fit are not from a normal distribution.

The function uses the D’Agostino-Pearson test on the residuals of a linear fit. Specifically, the function performs a linear fit to the data, assuming y = ax, and then calculates the residuals r = y - ax. It will return the p value of the D’Agostino-Pearson omnibus test on the residuals.

If a threshold is provided, returns True if the p value is below the specified threshold, i.e. if the residuals are probably not gaussian.

Parameters
first_signal: array

The first signal array

second_signal: array

The second signal array

window_length: int

The length of the window. It should be an odd number.

threshold: float or None

Threshold for the D’Agostino-Pearson p-value.

Returns
p_value: array

If threshold is None, then the function returns the p-value of the D’Agostino-Pearson test on the residuals. If a threshold is provided, the function returns True if the p-value is below the threshold.
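The sliding residual checks differ only in the normality test applied to each window, so both can be illustrated with one sketch that takes the test as a parameter. It assumes the fit y = ax is redone in every window and that only full windows are evaluated; the name `sliding_residual_p_values_sketch` and its `test` parameter are hypothetical and not part of fit_checks.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy import stats


def sliding_residual_p_values_sketch(first_signal, second_signal, window_length,
                                     test=stats.normaltest, threshold=None):
    """Hypothetical per-window normality test on the residuals of y = a x."""
    x_windows = sliding_window_view(np.asarray(first_signal, dtype=float), window_length)
    y_windows = sliding_window_view(np.asarray(second_signal, dtype=float), window_length)
    p_values = np.empty(len(x_windows))
    for i, (xw, yw) in enumerate(zip(x_windows, y_windows)):
        a = np.sum(xw * yw) / np.sum(xw * xw)   # fit through the origin in this window
        p_values[i] = test(yw - a * xw)[1]      # both tests return (statistic, p-value)
    if threshold is None:
        return p_values
    return p_values < threshold


x = np.linspace(1.0, 10.0, 100)
rng = np.random.default_rng(0)
y = 2.0 * x + rng.normal(0.0, 0.1, x.size)

p_dagostino = sliding_residual_p_values_sketch(x, y, window_length=21)
p_shapiro = sliding_residual_p_values_sketch(x, y, window_length=21, test=stats.shapiro)
flags = sliding_residual_p_values_sketch(x, y, window_length=21, threshold=0.05)
```

A window of 21 points is used here because SciPy's kurtosis test, which is part of `normaltest`, is only considered valid for 20 or more samples.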