statstools¶

This module implements statistical functionalities frequently used in hydrological modelling.

Module statstools implements the following members:

SimObs A named tuple containing one array of simulated and one array of observed values.

filter_series() Filter time series for the given date ranges or months.

prepare_arrays() Prepare and return two numpy arrays based on the given arguments.

Criterion Callback protocol for efficiency criteria like nse().

rmse() Calculate the root-mean-square error.

nse() Calculate the efficiency criteria after Nash & Sutcliffe.

nse_log() Calculate the efficiency criteria after Nash & Sutcliffe for logarithmic values.

corr2() Calculate the coefficient of determination via the square of the coefficient of correlation according to Bravais-Pearson.

kge() Calculate the Kling-Gupta efficiency [21].

bias_abs() Calculate the absolute difference between the means of the simulated and the observed values.

bias_rel() Calculate the relative difference between the means of the simulated and the observed values.

std_ratio() Calculate the ratio between the standard deviation of the simulated and the observed values.

corr() Calculate the product-moment correlation coefficient after Pearson.

hsepd_pdf() Calculate the probability densities based on the heteroskedastic skewed exponential power distribution.

hsepd_manual() Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.

hsepd() Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.

calc_mean_time() Return the weighted mean of the given time points.

calc_mean_time_deviation() Return the weighted deviation of the given timepoints from their mean time.

print_evaluationtable() Print a table containing the results of the given evaluation criteria for the given Node objects.

class hydpy.auxs.statstools.SimObs(sim: hydpy.core.typingtools.Vector[float], obs: hydpy.core.typingtools.Vector[float])[source]¶

Bases: tuple

A named tuple containing one array of simulated and one array of observed values.

property sim¶: Alias for field number 0

property obs¶: Alias for field number 1

hydpy.auxs.statstools.filter_series(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, date_ranges: Optional[Iterable[Tuple[Union[Date, datetime.datetime, str], Union[Date, datetime.datetime, str]]]] = None, months: Optional[Iterable[int]] = None) → hydpy.auxs.statstools.SimObs [source]¶

Filter time series for the given date ranges or months.

Often, we want to apply objective functions like nse() on a subset of the available simulated and observed values. The function filter_series() helps to extract the relevant data either by data ranges or by months. Common examples are to pass a single date range to ignore the first non-optimal values of a warm-up period, to pass a set of date ranges to focus on certain events or to pass a set of months to perform a seasonal analysis.

To show how filter_series() works, we prepare a daily initialisation time grid spanning two hydrological years:

>>> from hydpy import filter_series, pub, Node
>>> pub.timegrids = "2001-11-01", "2003-11-01", "1d"

Next, we prepare a Node object and assign some constantly increasing and decreasing values to its simulation and the observation series, respectively:

>>> import numpy
>>> node = Node("test")
>>> node.prepare_allseries()
>>> node.sequences.sim.series = numpy.arange(1, 2*365+1)
>>> node.sequences.obs.series = numpy.arange(2*365, 0, -1)

First, we select data of arbitrary sub-periods via the data_ranges argument. Each data range consists of the start-point and the end-point of a sub-period. Here, we choose all values that belong to 31 October or 1 November (note that unsorted data ranges are acceptable):

>>> date_ranges = [("2001-11-01", "2001-11-02"),
...                ("2002-10-31", "2002-11-02"),
...                ("2003-10-31", "2003-11-01")]
>>> results = filter_series(node=node, date_ranges=date_ranges)

filter_series() returns the data within index-sorted Series objects (note that the index addresses the left boundary of each time step):

>>> results.sim   
2001-11-01      1.0
2002-10-31    365.0
2002-11-01    366.0
2003-10-31    730.0
Name: sim...
>>> results.obs   
2001-11-01    730.0
2002-10-31    366.0
2002-11-01    365.0
2003-10-31      1.0
Name: obs...

To help avoiding possible hard-to-find errors, filter_series() performs the following checks:

>>> date_ranges = [("2001-10-31", "2003-11-01")]
>>> filter_series(node=node, date_ranges=date_ranges)
Traceback (most recent call last):
...
ValueError: While trying to filter the given series, the following error occurred: The given date (2001-10-31 00:00:00) is before the first date of the initialisation period (2001-11-01 00:00:00).

>>> date_ranges = [("2001-11-01", "2003-11-02")]
>>> filter_series(node=node, date_ranges=date_ranges)
Traceback (most recent call last):
...
ValueError: While trying to filter the given series, the following error occurred: The given date (2003-11-02 00:00:00) is behind the last date of the initialisation period (2003-11-01 00:00:00).

>>> date_ranges = [("2001-11-02", "2001-11-02")]
>>> filter_series(node=node, date_ranges=date_ranges)
Traceback (most recent call last):
...
ValueError: While trying to filter the given series, the following error occurred: The given first date `2001-11-02 00:00:00` is not before than the given last date `2001-11-02 00:00:00`.

Note that function filter_series() does not remove any duplicates:

>>> date_ranges = [("2001-11-01", "2001-11-05"),
...                ("2001-11-01", "2001-11-02"),
...                ("2001-11-04", "2001-11-06")]
>>> sim = filter_series(node=node, date_ranges=date_ranges).sim
>>> sim   
2001-11-01    1.0
2001-11-01    1.0
2001-11-02    2.0
2001-11-03    3.0
2001-11-04    4.0
2001-11-04    4.0
2001-11-05    5.0
Name: sim...

Instead of date ranges, one can specify months via integer numbers. We begin with selecting October (10) and November (11) individually:

>>> sim = filter_series(node=node, months=[11]).sim
>>> len(sim)
60
>>> sim   
2001-11-01      1.0
2001-11-02      2.0
...
2002-11-29    394.0
2002-11-30    395.0
Name: sim...

>>> sim = filter_series(node=node, months=[10]).sim
>>> len(sim)
62
>>> sim   
2002-10-01    335.0
2002-10-02    336.0
...
2003-10-30    729.0
2003-10-31    730.0
Name: sim...

One can select multiple months, which neither need to be sorted nor consecutive:

>>> sim = filter_series(node=node, months=[4, 1]).sim
>>> len(sim)
122
>>> sim   
2002-01-01     62.0
2002-01-02     63.0
...
2003-04-29    545.0
2003-04-30    546.0
Name: sim...

Note that you are also free to either pass the sim and obs series directly instead of a node (see function prepare_arrays() for further information):

>>> xs = node.sequences.sim.series
>>> ys = node.sequences.obs.series
>>> filter_series(sim=xs, obs=ys, months=[4, 1]).sim   
2002-01-01     62.0
2002-01-02     63.0
...
2003-04-29    545.0
2003-04-30    546.0
Name: sim...

Missing or double information for arguments date_ranges and months results in the following error messages:

>>> filter_series(node=node)
Traceback (most recent call last):
...
ValueError: While trying to filter the given series, the following error occurred: You need to define either the `date_ranges` or `months` argument, but none of them is given.

>>> filter_series(node=node, date_ranges=[], months=[])
Traceback (most recent call last):
...
ValueError: While trying to filter the given series, the following error occurred: You need to define either the `date_ranges` or `months` argument, but both of them are given.

hydpy.auxs.statstools.prepare_arrays(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → hydpy.auxs.statstools.SimObs [source]¶

Prepare and return two numpy arrays based on the given arguments.

Note that many functions provided by module statstools apply function prepare_arrays() internally (e.g. nse()). But you can also use it manually, as shown in the following examples.

Function prepare_arrays() can extract time-series data from Node objects. To set up an example for this, we define an initialisation period and prepare a Node object:

>>> from hydpy import pub, Node, round_, nan
>>> pub.timegrids = "01.01.2000", "07.01.2000", "1d"
>>> node = Node("test")

Next, we assign some values to the simulation and the observation sequences of the node:

>>> node.prepare_simseries()
>>> with pub.options.checkseries(False):
...     node.sequences.sim.series = 1.0, nan, nan, nan, 2.0, 3.0
...     node.sequences.obs.activate_ram()
...     node.sequences.obs.series = 4.0, 5.0, nan, nan, nan, 6.0

Now we can pass the node object to function prepare_arrays() and get the (unmodified) time-series data:

>>> from hydpy import prepare_arrays
>>> arrays = prepare_arrays(node=node)
>>> round_(arrays.sim)
1.0, nan, nan, nan, 2.0, 3.0
>>> round_(arrays.obs)
4.0, 5.0, nan, nan, nan, 6.0

Alternatively, we can pass directly any iterable (e.g. list and tuple objects) containing the simulated and observed data:

>>> arrays = prepare_arrays(sim=list(node.sequences.sim.series),
...                         obs=tuple(node.sequences.obs.series))
>>> round_(arrays.sim)
1.0, nan, nan, nan, 2.0, 3.0
>>> round_(arrays.obs)
4.0, 5.0, nan, nan, nan, 6.0

The optional skip_nan flag allows skipping all values, which are no numbers. Note that prepare_arrays() returns only those pairs of simulated and observed values which do not contain any nan value:

>>> arrays = prepare_arrays(node=node, skip_nan=True)
>>> round_(arrays.sim)
1.0, 3.0
>>> round_(arrays.obs)
4.0, 6.0

If you are interested in analysing a sub-period, adapt the global eval_ Timegrid beforehand. When passing a Node object, function prepare_arrays() then returns the data of the current evaluation sub-period only:

>>> pub.timegrids.eval_.dates = "02.01.2000", "06.01.2000"
>>> arrays = prepare_arrays(node=node)
>>> round_(arrays.sim)
nan, nan, nan, 2.0
>>> round_(arrays.obs)
5.0, nan, nan, nan

Suppose one instead passes the simulation and observation time-series directly (which possibly fit the evaluation period already). In that case, function prepare_arrays() ignores the current eval_ Timegrid by default:

>>> arrays = prepare_arrays(sim=arrays.sim, obs=arrays.obs)
>>> round_(arrays.sim)
nan, nan, nan, 2.0
>>> round_(arrays.obs)
5.0, nan, nan, nan

Use the subperiod argument to deviate from the default behaviour:

>>> arrays = prepare_arrays(node=node, subperiod=False)
>>> round_(arrays.sim)
1.0, nan, nan, nan, 2.0, 3.0
>>> round_(arrays.obs)
4.0, 5.0, nan, nan, nan, 6.0

>>> arrays = prepare_arrays(sim=arrays.sim, obs=arrays.obs, subperiod=True)
>>> round_(arrays.sim)
nan, nan, nan, 2.0
>>> round_(arrays.obs)
5.0, nan, nan, nan

The final examples show the error messages returned in case of invalid combinations of input arguments:

>>> prepare_arrays()
Traceback (most recent call last):
...
ValueError: Neither a `Node` object is passed to argument `node` nor are arrays passed to arguments `sim` and `obs`.

>>> prepare_arrays(sim=node.sequences.sim.series, node=node)
Traceback (most recent call last):
...
ValueError: Values are passed to both arguments `sim` and `node`, which is not allowed.

>>> prepare_arrays(obs=node.sequences.obs.series, node=node)
Traceback (most recent call last):
...
ValueError: Values are passed to both arguments `obs` and `node`, which is not allowed.

>>> prepare_arrays(sim=node.sequences.sim.series)
Traceback (most recent call last):
...
ValueError: A value is passed to argument `sim` but no value is passed to argument `obs`.

>>> prepare_arrays(obs=node.sequences.obs.series)
Traceback (most recent call last):
...
ValueError: A value is passed to argument `obs` but no value is passed to argument `sim`.

class hydpy.auxs.statstools.Criterion(*args, **kwds)[source]¶

Bases: typing_extensions.Protocol

Callback protocol for efficiency criteria like nse().

hydpy.auxs.statstools.rmse(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the root-mean-square error.

>>> from hydpy import rmse, round_
>>> rmse(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0])
0.0
>>> round_(rmse(sim=[1.0, 2.0, 3.0], obs=[0.5, 2.0, 4.5]))
0.912871

See the documentation on function prepare_arrays() for some additional instructions for using rmse().

hydpy.auxs.statstools.nse(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the efficiency criteria after Nash & Sutcliffe.

If the simulated values predict the observed values as well as the average observed value (regarding the mean square error), the NSE value is zero:

>>> from hydpy import nse
>>> nse(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])
0.0
>>> nse(sim=[0.0, 2.0, 4.0], obs=[1.0, 2.0, 3.0])
0.0

For worse and better agreement, the NSE is negative or positive, respectively:

>>> nse(sim=[3.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0])
-3.0
>>> nse(sim=[1.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])
0.5

The highest possible value is one:

>>> nse(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0])
1.0

See the documentation on function prepare_arrays() for some additional instructions for using nse().

hydpy.auxs.statstools.nse_log(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the efficiency criteria after Nash & Sutcliffe for logarithmic values.

The following calculations repeat the ones of the documentation on function nse() but with exponentiated values. Hence, the results are similar or, as in the first and the last example, even identical:

>>> from hydpy import nse_log, round_
>>> from numpy import exp
>>> nse_log(sim=exp([2.0, 2.0, 2.0]), obs=exp([1.0, 2.0, 3.0]))
0.0
>>> nse_log(sim=exp([0.0, 2.0, 4.0]), obs=exp([1.0, 2.0, 3.0]))
0.0

>>> round_(nse(sim=exp([3.0, 2.0, 1.0]), obs=exp([1.0, 2.0, 3.0])))
-2.734185
>>> round_(nse(sim=exp([1.0, 2.0, 2.0]), obs=exp([1.0, 2.0, 3.0])))
0.002139

>>> nse(sim=exp([1.0, 2.0, 3.0]), obs=exp([1.0, 2.0, 3.0]))
1.0

See the documentation on function prepare_arrays() for some additional instructions for using nse_log().

hydpy.auxs.statstools.corr2(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the coefficient of determination via the square of the coefficient of correlation according to Bravais-Pearson.

For perfect positive or negative correlation, corr2() returns 1:

>>> from hydpy import corr2, round_
>>> corr2(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0])
1.0
>>> corr2(sim=[3.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0])
1.0

If there is no correlation at all, corr2() returns 0:

>>> corr2(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 1.0])
0.0

An intermediate example:

>>> round_(corr2(sim=[2.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0]))
0.75

Take care if there is no variation in one of the data series. Then the correlation coefficient is not defined, and corr2() returns nan:

>>> corr2(sim=[2.0, 2.0, 2.0], obs=[2.0, 2.0, 3.0])
nan

See the documentation on function prepare_arrays() for some additional instructions for using corr2().

hydpy.auxs.statstools.kge(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the Kling-Gupta efficiency [21].

>>> from hydpy import  kge, round_
>>> kge(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0])
1.0
>>> kge(sim=[3.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0])
-3.0
>>> round_(kge(sim=[2.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0]))
-2.688461

See the documentation on function prepare_arrays() for some additional instructions for using kge().

hydpy.auxs.statstools.bias_abs(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the absolute difference between the means of the simulated and the observed values.

>>> from hydpy import bias_abs, round_
>>> round_(bias_abs(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0]))
0.0
>>> round_(bias_abs(sim=[5.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0]))
1.0
>>> round_(bias_abs(sim=[1.0, 1.0, 1.0], obs=[1.0, 2.0, 3.0]))
-1.0

See the documentation on function prepare_arrays() for some additional instructions for using bias_abs().

hydpy.auxs.statstools.bias_rel(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the relative difference between the means of the simulated and the observed values.

>>> from hydpy import bias_rel, round_
>>> round_(bias_rel(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0]))
0.0
>>> round_(bias_rel(sim=[5.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0]))
0.5
>>> round_(bias_rel(sim=[1.0, 1.0, 1.0], obs=[1.0, 2.0, 3.0]))
-0.5

See the documentation on function prepare_arrays() for some additional instructions for using bias_rel().

hydpy.auxs.statstools.std_ratio(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the ratio between the standard deviation of the simulated and the observed values.

>>> from hydpy import round_, std_ratio
>>> round_(std_ratio(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0]))
0.0
>>> round_(std_ratio(sim=[1.0, 1.0, 1.0], obs=[1.0, 2.0, 3.0]))
-1.0
>>> round_(std_ratio(sim=[0.0, 3.0, 6.0], obs=[1.0, 2.0, 3.0]))
2.0

See the documentation on function prepare_arrays() for some additional instructions for using std_ratio().

hydpy.auxs.statstools.corr(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the product-moment correlation coefficient after Pearson.

>>> from hydpy import corr, round_
>>> round_(corr(sim=[0.5, 1.0, 1.5], obs=[1.0, 2.0, 3.0]))
1.0
>>> round_(corr(sim=[4.0, 2.0, 0.0], obs=[1.0, 2.0, 3.0]))
-1.0
>>> round_(corr(sim=[1.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0]))
0.0

Take care if there is no variation in one of the data series. Then the correlation coefficient is not defined, and corr() returns nan:

>>> round_(corr(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0]))
nan

See the documentation on function prepare_arrays() for some additional instructions for use of function corr().

hydpy.auxs.statstools.hsepd_pdf(*, sigma1: float, sigma2: float, xi: float, beta: float, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → hydpy.core.typingtools.Vector[float][source]¶

Calculate the probability densities based on the heteroskedastic skewed exponential power distribution.

For convenience, we store the required parameters of the probability density function as well as the simulated and observed values in a dictionary:

>>> import numpy
>>> from hydpy import hsepd_pdf, round_
>>> general = {"sigma1": 0.2,
...            "sigma2": 0.0,
...            "xi": 1.0,
...            "beta": 0.0,
...            "sim": numpy.arange(10.0, 41.0),
...            "obs": numpy.full(31, 25.0)}

The following test function allows for varying one parameter and prints some and plots all the probability density values corresponding to different simulated values:

>>> def test(**kwargs):
...     from matplotlib import pyplot
...     special = general.copy()
...     name, values = list(kwargs.items())[0]
...     results = numpy.zeros((len(general["sim"]), len(values)+1))
...     results[:, 0] = general["sim"]
...     for jdx, value in enumerate(values):
...         special[name] = value
...         results[:, jdx+1] = hsepd_pdf(**special)
...         pyplot.plot(results[:, 0], results[:, jdx+1],
...                     label="%s=%.1f" % (name, value))
...     pyplot.legend()
...     for idx, result in enumerate(results):
...         if not (idx % 5):
...             round_(result)

When varying beta, the resulting probabilities correspond to the Laplace distribution (1.0), normal distribution (0.0), and the uniform distribution (-1.0), respectively. Note that we use -0.99 instead of -1.0 for approximating the uniform distribution to prevent from running into numerical problems, which are not solved yet:

>>> test(beta=[1.0, 0.0, -0.99])
0, 0.002032, 0.000886, 0.0
0, 0.008359, 0.010798, 0.0
0, 0.034382, 0.048394, 0.057739
0, 0.141421, 0.079788, 0.057739
0, 0.034382, 0.048394, 0.057739
0, 0.008359, 0.010798, 0.0
0, 0.002032, 0.000886, 0.0

When varying xi, the resulting density is negatively skewed (0.2), symmetric (1.0), and positively skewed (5.0), respectively:

>>> test(xi=[0.2, 1.0, 5.0])
0, 0.0, 0.000886, 0.003175
0, 0.0, 0.010798, 0.012957
0, 0.092845, 0.048394, 0.036341
0, 0.070063, 0.079788, 0.070063
0, 0.036341, 0.048394, 0.092845
0, 0.012957, 0.010798, 0.0
0, 0.003175, 0.000886, 0.0

In the above examples, the actual sigma (5.0) is calculated by multiplying sigma1 (0.2) with the mean simulated value (25.0), internally. This can be done for modelling homoscedastic errors. Instead, sigma2 is multiplied with the individual simulated values to account for heteroscedastic errors. With increasing values of sigma2, the resulting densities are modified as follows:

>>> test(sigma2=[0.0, 0.1, 0.2])
0, 0.000886, 0.002921, 0.005737
0, 0.010798, 0.018795, 0.022831
0, 0.048394, 0.044159, 0.037988
0, 0.079788, 0.053192, 0.039894
0, 0.048394, 0.04102, 0.032708
0, 0.010798, 0.023493, 0.023493
0, 0.000886, 0.011053, 0.015771

See the documentation on function prepare_arrays() for some additional instructions for using hsepd_pdf().

hydpy.auxs.statstools.hsepd_manual(*, sigma1: float, sigma2: float, xi: float, beta: float, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float [source]¶

Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.

The following examples stem from the documentation of function hsepd_pdf(), which is used by function hsepd_manual(). The first one deals with a heteroscedastic normal distribution:

>>> from hydpy import hsepd_manual, round_
>>> round_(hsepd_manual(sigma1=0.2, sigma2=0.2,
...                     xi=1.0, beta=0.0,
...                     sim=numpy.arange(10.0, 41.0),
...                     obs=numpy.full(31, 25.0)))
-3.682842

Too small probability density values are set to 1e-200 before calculating their logarithm (which means that the lowest possible value returned by function hsepd_manual() is approximately -460):

>>> round_(hsepd_manual(sigma1=0.2, sigma2=0.0,
...                     xi=1.0, beta=-0.99,
...                     sim=numpy.arange(10.0, 41.0),
...                     obs=numpy.full(31, 25.0)))
-209.539335

See the documentation on function prepare_arrays() for some additional instructions for using hsepd_manual().

hydpy.auxs.statstools.hsepd(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None, inits: Optional[Iterable[float]] = None, return_pars: bool = False, silent: bool = True) → Union[float, Tuple[float, Tuple[float, float, float, float]]][source]¶

Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.

Function hsepd() serves the same purpose as function hsepd_manual() but tries to estimate the parameters of the heteroscedastic skewed exponential distribution via an optimisation algorithm. This is shown by generating a random sample. One thousant simulated values are scattered around the observed (true) value of 10.0 with a standard deviation of 2.0:

>>> import numpy
>>> numpy.random.seed(0)
>>> sim = numpy.random.normal(10.0, 2.0, 1000)
>>> obs = numpy.full(1000, 10.0)

First, as a reference, we calculate the “true” value based on function hsepd_manual() and the correct distribution parameters:

>>> from hydpy import Period, hsepd, hsepd_manual, pub, round_
>>> round_(hsepd_manual(
...     sigma1=0.2, sigma2=0.0, xi=1.0, beta=0.0, sim=sim, obs=obs))
-2.100093

When using function hsepd(), the returned value is even a little “better”:

>>> round_(hsepd(sim=sim, obs=obs))
-2.09983

This is due to the deviation from the random sample to its theoretical distribution. This is reflected by small differences between the estimated values and the theoretical values of sigma1 (0.2), sigma2 (0.0), xi (1.0), and beta (0.0). The estimated values are returned in the mentioned order through enabling the return_pars option:

>>> value, pars = hsepd(sim=sim, obs=obs, return_pars=True)
>>> round_(pars, decimals=5)
0.19966, 0.0, 0.96836, 0.0188

There is no guarantee that the optimisation numerical optimisation algorithm underlying function hsepd() will always find the parameters resulting in the largest value returned by function hsepd_manual(). You can increase its robustness (and decrease computation time) by supplying close initial parameter values:

>>> value, pars = hsepd(sim=sim, obs=obs, return_pars=True,
...                     inits=(0.2, 0.0, 1.0, 0.0))
>>> round_(pars, decimals=5)
0.19966, 0.0, 0.96836, 0.0188

However, the following example shows a case when this strategy results in worse results:

>>> value, pars = hsepd(sim=sim, obs=obs, return_pars=True,
...                     inits=(0.0, 0.2, 1.0, 0.0))
>>> round_(value)
-2.174492
>>> round_(pars)
0.0, 0.213179, 1.705485, 0.505112

See the documentation on function prepare_arrays() for some additional instructions for using hsepd().

hydpy.auxs.statstools.calc_mean_time(timepoints: hydpy.core.typingtools.VectorInput[float], weights: hydpy.core.typingtools.VectorInput[float]) → float [source]¶

Return the weighted mean of the given time points.

With equal given weights, the result is simply the mean of the given time points:

>>> from hydpy import calc_mean_time
>>> calc_mean_time(timepoints=[3., 7.],
...                weights=[2., 2.])
5.0

With different weights, the resulting time is shifted to the larger ones:

>>> calc_mean_time(timepoints=[3., 7.],
...                weights=[1., 3.])
6.0

Or, in the most extreme case:

>>> calc_mean_time(timepoints=[3., 7.],
...                weights=[0., 4.])
7.0

There are some checks for input plausibility, e.g.:

>>> calc_mean_time(timepoints=[3., 7.],
...                weights=[-2., 2.])
Traceback (most recent call last):
...
ValueError: While trying to calculate the weighted mean time, the following error occurred: For the following objects, at least one value is negative: weights.

hydpy.auxs.statstools.calc_mean_time_deviation(timepoints: hydpy.core.typingtools.VectorInput[float], weights: hydpy.core.typingtools.VectorInput[float], mean_time: Optional[float] = None) → float [source]¶

Return the weighted deviation of the given timepoints from their mean time.

With equal given weights, the is simply the standard deviation of the given time points:

>>> from hydpy import calc_mean_time_deviation
>>> calc_mean_time_deviation(timepoints=[3., 7.],
...                          weights=[2., 2.])
2.0

One can pass a precalculated mean time:

>>> from hydpy import round_
>>> round_(calc_mean_time_deviation(timepoints=[3., 7.],
...                                 weights=[2., 2.],
...                                 mean_time=4.))
2.236068

>>> round_(calc_mean_time_deviation(timepoints=[3., 7.],
...                                 weights=[1., 3.]))
1.732051

Or, in the most extreme case:

>>> calc_mean_time_deviation(timepoints=[3., 7.],
...                          weights=[0., 4.])
0.0

There are some checks for input plausibility, e.g.:

>>> calc_mean_time_deviation(timepoints=[3., 7.],
...                          weights=[-2., 2.])
Traceback (most recent call last):
...
ValueError: While trying to calculate the weighted time deviation from mean time, the following error occurred: For the following objects, at least one value is negative: weights.

hydpy.auxs.statstools.print_evaluationtable(*, nodes: Sequence[hydpy.core.devicetools.Node], criteria: Sequence[hydpy.auxs.statstools.Criterion], nodenames: Optional[Sequence[str]] = None, critnames: Optional[Sequence[str]] = None, critfactors: Union[float, Sequence[float]] = 1.0, critdigits: Union[int, Sequence[int]] = 2, subperiod: bool = True, average: bool = True, averagename: str = 'mean', filter_: float = 1.0, stepsize: Optional[typing_extensions.Literal[daily, d, monthly, m]] = None, aggregator: Union[str, Callable[[hydpy.core.typingtools.VectorInput[float]], float]] = 'mean', missingvalue: str = '-', decimalseperator: str = '.', file_: Optional[Union[str, TextIO]] = None) → None [source]¶

Print a table containing the results of the given evaluation criteria for the given Node objects.

First, we define two nodes with different simulation and observation data (see function prepare_arrays() for some explanations):

>>> from hydpy import pub, Node, nan
>>> pub.timegrids = "01.01.2000", "04.01.2000", "1d"
>>> nodes = Node("test1"), Node("test2")
>>> for node in nodes:
...     node.prepare_simseries()
...     node.sequences.sim.series = 1.0, 2.0, 3.0
...     node.sequences.obs.activate_ram()
...     node.sequences.obs.series = 4.0, 5.0, 6.0
>>> nodes[0].sequences.sim.series = 1.0, 2.0, 3.0
>>> nodes[0].sequences.obs.series = 4.0, 5.0, 6.0
>>> nodes[1].sequences.sim.series = 1.0, 2.0, 3.0
>>> with pub.options.checkseries(False):
...     nodes[1].sequences.obs.series = 3.0, nan, 1.0

Selecting functions corr() and bias_abs() as evaluation criteria, function print_evaluationtable() prints the following table:

>>> from hydpy import bias_abs, corr, print_evaluationtable
>>> print_evaluationtable(nodes=nodes,  
...                       criteria=(corr, bias_abs))
       corr  bias_abs
test1  1.00     -3.00
test2     -         -
mean   1.00     -3.00

One can pass alternative names for the node objects, the criteria functions, and the row containing the average values. Also, one can use the filter_ argument to force printing statistics in case of incomplete observation data. In the following example, we accept a maximum fraction of missing data of 50 %:

>>> print_evaluationtable(nodes=nodes,
...                       criteria=(corr, bias_abs),
...                       nodenames=("first node", "second node"),
...                       critnames=("corrcoef", "bias"),
...                       critdigits=1,
...                       averagename="average",
...                       filter_=0.5)   
             corrcoef  bias
first node        1.0  -3.0
second node      -1.0   0.0
average           0.0  -1.5

The number of assigned node objects and criteria functions must match the number of given alternative names:

>>> print_evaluationtable(nodes=nodes,
...                       criteria=(corr, bias_abs),
...                       nodenames=("first node",))
Traceback (most recent call last):
...
ValueError: While trying to evaluate the simulation results of some node objects, the following error occurred: 2 node objects are given which does not match with number of given alternative names being 1.

>>> print_evaluationtable(nodes=nodes,
...                       criteria=(corr, bias_abs),
...                       critnames=("corrcoef",))
Traceback (most recent call last):
...
ValueError: While trying to evaluate the simulation results of some node objects, the following error occurred: 2 criteria functions are given which does not match with number of given alternative names being 1.

Set the average argument to False to omit the row containing the average values:

>>> print_evaluationtable(nodes=nodes,  
...                       criteria=(corr, bias_abs),
...                       average=False)
       corr  bias_abs
test1  1.00     -3.00
test2     -         -

You can use the arguments critfactors and critdigits by passing either a single number or a sequence of criteria-specific numbers to modify the printed values:

>>> print_evaluationtable(nodes=nodes,  
...                       criteria=(corr, bias_abs),
...                       critfactors=(10.0, 0.1),
...                       critdigits=1)
       corr  bias_abs
test1  10.0      -0.3
test2     -         -
mean   10.0      -0.3

By default, function print_evaluationtable() prints the statics relevant for the actual evaluation period only:

>>> pub.timegrids.eval_.dates = "01.01.2000", "02.01.2000"
>>> print_evaluationtable(nodes=nodes,  
...                       criteria=(corr, bias_abs))
       corr  bias_abs
test1     -     -3.00
test2     -     -2.00
mean      -     -2.50

You can deviate from this default behaviour by setting the subperiod argument to False:

>>> print_evaluationtable(nodes=nodes,  
...                       criteria=(corr, bias_abs),
...                       subperiod=False)
       corr  bias_abs
test1  1.00     -3.00
test2     -         -
mean   1.00     -3.00

Use the stepsize argument (eventually in combination with argument aggregator) to print the statistics of previously aggregated time series. See function aggregate_series() for further information.

Here, the daily aggregation step size results in identical results as the original step size is also one day:

>>> pub.timegrids.eval_ = pub.timegrids.init
>>> print_evaluationtable(nodes=nodes,  
...                       criteria=(corr, bias_abs),
...                       stepsize="daily",
...                       aggregator="mean")
       corr  bias_abs
test1  1.00     -3.00
test2     -         -
mean   1.00     -3.00

For the monthly step size, the result table is empty, due to the too short initialisation period covering less than a month:

>>> pub.timegrids.eval_.dates = pub.timegrids.init.dates
>>> print_evaluationtable(nodes=nodes,  
...                       criteria=(corr, bias_abs),
...                       stepsize="monthly",
...                       aggregator="mean")
       corr  bias_abs
test1     -         -
test2     -         -
mean      -         -

statstools¶

Table of Contents

Previous topic

Next topic

This Page

Versions