statstools¶
This module implements statistical functionalities frequently used in hydrological modelling.
Module statstools
implements the following members:
SimObs
A named tuple containing one array of simulated and one array of observed values.
filter_series()
Filter time series for the given date ranges or months.
prepare_arrays()
Prepare and return twonumpy
arrays based on the given arguments.
Criterion
Callback protocol for efficiency criteria likense()
.
rmse()
Calculate the root-mean-square error.
nse()
Calculate the efficiency criteria after Nash & Sutcliffe.
nse_log()
Calculate the efficiency criteria after Nash & Sutcliffe for logarithmic values.
corr2()
Calculate the coefficient of determination via the square of the coefficient of correlation according to Bravais-Pearson.
bias_abs()
Calculate the absolute difference between the means of the simulated and the observed values.
bias_rel()
Calculate the relative difference between the means of the simulated and the observed values.
std_ratio()
Calculate the ratio between the standard deviation of the simulated and the observed values.
corr()
Calculate the product-moment correlation coefficient after Pearson.
hsepd_pdf()
Calculate the probability densities based on the heteroskedastic skewed exponential power distribution.
hsepd_manual()
Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.
hsepd()
Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.
calc_mean_time()
Return the weighted mean of the given time points.
calc_mean_time_deviation()
Return the weighted deviation of the given timepoints from their mean time.
print_evaluationtable()
Print a table containing the results of the given evaluation criteria for the givenNode
objects.
-
class
hydpy.auxs.statstools.
SimObs
(sim: hydpy.core.typingtools.Vector[float], obs: hydpy.core.typingtools.Vector[float])[source]¶ Bases:
tuple
A named tuple containing one array of simulated and one array of observed values.
-
property
sim
¶ Alias for field number 0
-
property
obs
¶ Alias for field number 1
-
property
-
hydpy.auxs.statstools.
filter_series
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, date_ranges: Optional[Iterable[Tuple[Union[Date, datetime.datetime, str], Union[Date, datetime.datetime, str]]]] = None, months: Optional[Iterable[int]] = None) → hydpy.auxs.statstools.SimObs[source]¶ Filter time series for the given date ranges or months.
Often, we want to apply objective functions like
nse()
on a subset of the available simulated and observed values. The functionfilter_series()
helps to extract the relevant data either by data ranges or by months. Common examples are to pass a single date range to ignore the first non-optimal values of a warm-up period, to pass a set of date ranges to focus on certain events or to pass a set of months to perform a seasonal analysis.To show how
filter_series()
works, we prepare a daily initialisation time grid spanning two hydrological years:>>> from hydpy import filter_series, pub, Node >>> pub.timegrids = "2001-11-01", "2003-11-01", "1d"
Next, we prepare a
Node
object and assign some constantly increasing and decreasing values to its simulation and the observation series, respectively:>>> import numpy >>> node = Node("test") >>> node.prepare_allseries() >>> node.sequences.sim.series = numpy.arange(1, 2*365+1) >>> node.sequences.obs.series = numpy.arange(2*365, 0, -1)
First, we select data of arbitrary sub-periods via the data_ranges argument. Each data range consists of the start-point and the end-point of a sub-period. Here, we choose all values that belong to 31 October or 1 November (note that unsorted data ranges are acceptable):
>>> date_ranges = [("2001-11-01", "2001-11-02"), ... ("2002-10-31", "2002-11-02"), ... ("2003-10-31", "2003-11-01")] >>> results = filter_series(node=node, date_ranges=date_ranges)
filter_series()
returns the data within index-sortedSeries
objects (note that the index addresses the left boundary of each time step):>>> results.sim 2001-11-01 1.0 2002-10-31 365.0 2002-11-01 366.0 2003-10-31 730.0 Name: sim... >>> results.obs 2001-11-01 730.0 2002-10-31 366.0 2002-11-01 365.0 2003-10-31 1.0 Name: obs...
To help avoiding possible hard-to-find errors,
filter_series()
performs the following checks:>>> date_ranges = [("2001-10-31", "2003-11-01")] >>> filter_series(node=node, date_ranges=date_ranges) Traceback (most recent call last): ... ValueError: While trying to filter the given series, the following error occurred: The given date (2001-10-31 00:00:00) is before the first date of the initialisation period (2001-11-01 00:00:00).
>>> date_ranges = [("2001-11-01", "2003-11-02")] >>> filter_series(node=node, date_ranges=date_ranges) Traceback (most recent call last): ... ValueError: While trying to filter the given series, the following error occurred: The given date (2003-11-02 00:00:00) is behind the last date of the initialisation period (2003-11-01 00:00:00).
>>> date_ranges = [("2001-11-02", "2001-11-02")] >>> filter_series(node=node, date_ranges=date_ranges) Traceback (most recent call last): ... ValueError: While trying to filter the given series, the following error occurred: The given first date `2001-11-02 00:00:00` is not before than the given last date `2001-11-02 00:00:00`.
Note that function
filter_series()
does not remove any duplicates:>>> date_ranges = [("2001-11-01", "2001-11-05"), ... ("2001-11-01", "2001-11-02"), ... ("2001-11-04", "2001-11-06")] >>> sim = filter_series(node=node, date_ranges=date_ranges).sim >>> sim 2001-11-01 1.0 2001-11-01 1.0 2001-11-02 2.0 2001-11-03 3.0 2001-11-04 4.0 2001-11-04 4.0 2001-11-05 5.0 Name: sim...
Instead of date ranges, one can specify months via integer numbers. We begin with selecting October (10) and November (11) individually:
>>> sim = filter_series(node=node, months=[11]).sim >>> len(sim) 60 >>> sim 2001-11-01 1.0 2001-11-02 2.0 ... 2002-11-29 394.0 2002-11-30 395.0 Name: sim...
>>> sim = filter_series(node=node, months=[10]).sim >>> len(sim) 62 >>> sim 2002-10-01 335.0 2002-10-02 336.0 ... 2003-10-30 729.0 2003-10-31 730.0 Name: sim...
One can select multiple months, which neither need to be sorted nor consecutive:
>>> sim = filter_series(node=node, months=[4, 1]).sim >>> len(sim) 122 >>> sim 2002-01-01 62.0 2002-01-02 63.0 ... 2003-04-29 545.0 2003-04-30 546.0 Name: sim...
Note that you are also free to either pass the sim and obs series directly instead of a node (see function
prepare_arrays()
for further information):>>> xs = node.sequences.sim.series >>> ys = node.sequences.obs.series >>> filter_series(sim=xs, obs=ys, months=[4, 1]).sim 2002-01-01 62.0 2002-01-02 63.0 ... 2003-04-29 545.0 2003-04-30 546.0 Name: sim...
Missing or double information for arguments date_ranges and months results in the following error messages:
>>> filter_series(node=node) Traceback (most recent call last): ... ValueError: While trying to filter the given series, the following error occurred: You need to define either the `date_ranges` or `months` argument, but none of them is given.
>>> filter_series(node=node, date_ranges=[], months=[]) Traceback (most recent call last): ... ValueError: While trying to filter the given series, the following error occurred: You need to define either the `date_ranges` or `months` argument, but both of them are given.
-
hydpy.auxs.statstools.
prepare_arrays
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → hydpy.auxs.statstools.SimObs[source]¶ Prepare and return two
numpy
arrays based on the given arguments.Note that many functions provided by module
statstools
apply functionprepare_arrays()
internally (e.g.nse()
). But you can also use it manually, as shown in the following examples.Function
prepare_arrays()
can extract time-series data fromNode
objects. To set up an example for this, we define an initialisation period and prepare aNode
object:>>> from hydpy import pub, Node, round_, nan >>> pub.timegrids = "01.01.2000", "07.01.2000", "1d" >>> node = Node("test")
Next, we assign some values to the simulation and the observation sequences of the node:
>>> node.prepare_simseries() >>> with pub.options.checkseries(False): ... node.sequences.sim.series = 1.0, nan, nan, nan, 2.0, 3.0 ... node.sequences.obs.activate_ram() ... node.sequences.obs.series = 4.0, 5.0, nan, nan, nan, 6.0
Now we can pass the node object to function
prepare_arrays()
and get the (unmodified) time-series data:>>> from hydpy import prepare_arrays >>> arrays = prepare_arrays(node=node) >>> round_(arrays.sim) 1.0, nan, nan, nan, 2.0, 3.0 >>> round_(arrays.obs) 4.0, 5.0, nan, nan, nan, 6.0
Alternatively, we can pass directly any iterable (e.g.
list
andtuple
objects) containing the simulated and observed data:>>> arrays = prepare_arrays(sim=list(node.sequences.sim.series), ... obs=tuple(node.sequences.obs.series)) >>> round_(arrays.sim) 1.0, nan, nan, nan, 2.0, 3.0 >>> round_(arrays.obs) 4.0, 5.0, nan, nan, nan, 6.0
The optional skip_nan flag allows skipping all values, which are no numbers. Note that
prepare_arrays()
returns only those pairs of simulated and observed values which do not contain any nan value:>>> arrays = prepare_arrays(node=node, skip_nan=True) >>> round_(arrays.sim) 1.0, 3.0 >>> round_(arrays.obs) 4.0, 6.0
If you are interested in analysing a sub-period, adapt the global
eval_
Timegrid
beforehand. When passing aNode
object, functionprepare_arrays()
then returns the data of the current evaluation sub-period only:>>> pub.timegrids.eval_.dates = "02.01.2000", "06.01.2000" >>> arrays = prepare_arrays(node=node) >>> round_(arrays.sim) nan, nan, nan, 2.0 >>> round_(arrays.obs) 5.0, nan, nan, nan
Suppose one instead passes the simulation and observation time-series directly (which possibly fit the evaluation period already). In that case, function
prepare_arrays()
ignores the currenteval_
Timegrid
by default:>>> arrays = prepare_arrays(sim=arrays.sim, obs=arrays.obs) >>> round_(arrays.sim) nan, nan, nan, 2.0 >>> round_(arrays.obs) 5.0, nan, nan, nan
Use the subperiod argument to deviate from the default behaviour:
>>> arrays = prepare_arrays(node=node, subperiod=False) >>> round_(arrays.sim) 1.0, nan, nan, nan, 2.0, 3.0 >>> round_(arrays.obs) 4.0, 5.0, nan, nan, nan, 6.0
>>> arrays = prepare_arrays(sim=arrays.sim, obs=arrays.obs, subperiod=True) >>> round_(arrays.sim) nan, nan, nan, 2.0 >>> round_(arrays.obs) 5.0, nan, nan, nan
The final examples show the error messages returned in case of invalid combinations of input arguments:
>>> prepare_arrays() Traceback (most recent call last): ... ValueError: Neither a `Node` object is passed to argument `node` nor are arrays passed to arguments `sim` and `obs`.
>>> prepare_arrays(sim=node.sequences.sim.series, node=node) Traceback (most recent call last): ... ValueError: Values are passed to both arguments `sim` and `node`, which is not allowed.
>>> prepare_arrays(obs=node.sequences.obs.series, node=node) Traceback (most recent call last): ... ValueError: Values are passed to both arguments `obs` and `node`, which is not allowed.
>>> prepare_arrays(sim=node.sequences.sim.series) Traceback (most recent call last): ... ValueError: A value is passed to argument `sim` but no value is passed to argument `obs`.
>>> prepare_arrays(obs=node.sequences.obs.series) Traceback (most recent call last): ... ValueError: A value is passed to argument `obs` but no value is passed to argument `sim`.
-
class
hydpy.auxs.statstools.
Criterion
(*args, **kwds)[source]¶ Bases:
typing_extensions.Protocol
Callback protocol for efficiency criteria like
nse()
.
-
hydpy.auxs.statstools.
rmse
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the root-mean-square error.
>>> from hydpy import rmse, round_ >>> rmse(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0]) 0.0 >>> round_(rmse(sim=[1.0, 2.0, 3.0], obs=[0.5, 2.0, 4.5])) 0.912871
See the documentation on function
prepare_arrays()
for some additional instructions for usingrmse()
.
-
hydpy.auxs.statstools.
nse
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the efficiency criteria after Nash & Sutcliffe.
If the simulated values predict the observed values as well as the average observed value (regarding the mean square error), the NSE value is zero:
>>> from hydpy import nse >>> nse(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0]) 0.0 >>> nse(sim=[0.0, 2.0, 4.0], obs=[1.0, 2.0, 3.0]) 0.0
For worse and better agreement, the NSE is negative or positive, respectively:
>>> nse(sim=[3.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0]) -3.0 >>> nse(sim=[1.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0]) 0.5
The highest possible value is one:
>>> nse(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0]) 1.0
See the documentation on function
prepare_arrays()
for some additional instructions for usingnse()
.
-
hydpy.auxs.statstools.
nse_log
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the efficiency criteria after Nash & Sutcliffe for logarithmic values.
The following calculations repeat the ones of the documentation on function
nse()
but with exponentiated values. Hence, the results are similar or, as in the first and the last example, even identical:>>> from hydpy import nse_log, round_ >>> from numpy import exp >>> nse_log(sim=exp([2.0, 2.0, 2.0]), obs=exp([1.0, 2.0, 3.0])) 0.0 >>> nse_log(sim=exp([0.0, 2.0, 4.0]), obs=exp([1.0, 2.0, 3.0])) 0.0
>>> round_(nse(sim=exp([3.0, 2.0, 1.0]), obs=exp([1.0, 2.0, 3.0]))) -2.734185 >>> round_(nse(sim=exp([1.0, 2.0, 2.0]), obs=exp([1.0, 2.0, 3.0]))) 0.002139
>>> nse(sim=exp([1.0, 2.0, 3.0]), obs=exp([1.0, 2.0, 3.0])) 1.0
See the documentation on function
prepare_arrays()
for some additional instructions for usingnse_log()
.
-
hydpy.auxs.statstools.
corr2
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the coefficient of determination via the square of the coefficient of correlation according to Bravais-Pearson.
For perfect positive or negative correlation,
corr2()
returns 1:>>> from hydpy import corr2, round_ >>> corr2(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0]) 1.0 >>> corr2(sim=[3.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0]) 1.0
If there is no correlation at all,
corr2()
returns 0:>>> corr2(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 1.0]) 0.0
An intermediate example:
>>> round_(corr2(sim=[2.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0])) 0.75
Take care if there is no variation in one of the data series. Then the correlation coefficient is not defined, and
corr2()
returnsnan
:>>> corr2(sim=[2.0, 2.0, 2.0], obs=[2.0, 2.0, 3.0]) nan
See the documentation on function
prepare_arrays()
for some additional instructions for usingcorr2()
.
-
hydpy.auxs.statstools.
kge
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the Kling-Gupta efficiency [21].
>>> from hydpy import kge, round_ >>> kge(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0]) 1.0 >>> kge(sim=[3.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0]) -3.0 >>> round_(kge(sim=[2.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0])) -2.688461
See the documentation on function
prepare_arrays()
for some additional instructions for usingkge()
.
-
hydpy.auxs.statstools.
bias_abs
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the absolute difference between the means of the simulated and the observed values.
>>> from hydpy import bias_abs, round_ >>> round_(bias_abs(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])) 0.0 >>> round_(bias_abs(sim=[5.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])) 1.0 >>> round_(bias_abs(sim=[1.0, 1.0, 1.0], obs=[1.0, 2.0, 3.0])) -1.0
See the documentation on function
prepare_arrays()
for some additional instructions for usingbias_abs()
.
-
hydpy.auxs.statstools.
bias_rel
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the relative difference between the means of the simulated and the observed values.
>>> from hydpy import bias_rel, round_ >>> round_(bias_rel(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])) 0.0 >>> round_(bias_rel(sim=[5.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])) 0.5 >>> round_(bias_rel(sim=[1.0, 1.0, 1.0], obs=[1.0, 2.0, 3.0])) -0.5
See the documentation on function
prepare_arrays()
for some additional instructions for usingbias_rel()
.
-
hydpy.auxs.statstools.
std_ratio
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the ratio between the standard deviation of the simulated and the observed values.
>>> from hydpy import round_, std_ratio >>> round_(std_ratio(sim=[1.0, 2.0, 3.0], obs=[1.0, 2.0, 3.0])) 0.0 >>> round_(std_ratio(sim=[1.0, 1.0, 1.0], obs=[1.0, 2.0, 3.0])) -1.0 >>> round_(std_ratio(sim=[0.0, 3.0, 6.0], obs=[1.0, 2.0, 3.0])) 2.0
See the documentation on function
prepare_arrays()
for some additional instructions for usingstd_ratio()
.
-
hydpy.auxs.statstools.
corr
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the product-moment correlation coefficient after Pearson.
>>> from hydpy import corr, round_ >>> round_(corr(sim=[0.5, 1.0, 1.5], obs=[1.0, 2.0, 3.0])) 1.0 >>> round_(corr(sim=[4.0, 2.0, 0.0], obs=[1.0, 2.0, 3.0])) -1.0 >>> round_(corr(sim=[1.0, 2.0, 1.0], obs=[1.0, 2.0, 3.0])) 0.0
Take care if there is no variation in one of the data series. Then the correlation coefficient is not defined, and
corr()
returnsnan
:>>> round_(corr(sim=[2.0, 2.0, 2.0], obs=[1.0, 2.0, 3.0])) nan
See the documentation on function
prepare_arrays()
for some additional instructions for use of functioncorr()
.
-
hydpy.auxs.statstools.
hsepd_pdf
(*, sigma1: float, sigma2: float, xi: float, beta: float, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → hydpy.core.typingtools.Vector[float][source]¶ Calculate the probability densities based on the heteroskedastic skewed exponential power distribution.
For convenience, we store the required parameters of the probability density function as well as the simulated and observed values in a dictionary:
>>> import numpy >>> from hydpy import hsepd_pdf, round_ >>> general = {"sigma1": 0.2, ... "sigma2": 0.0, ... "xi": 1.0, ... "beta": 0.0, ... "sim": numpy.arange(10.0, 41.0), ... "obs": numpy.full(31, 25.0)}
The following test function allows for varying one parameter and prints some and plots all the probability density values corresponding to different simulated values:
>>> def test(**kwargs): ... from matplotlib import pyplot ... special = general.copy() ... name, values = list(kwargs.items())[0] ... results = numpy.zeros((len(general["sim"]), len(values)+1)) ... results[:, 0] = general["sim"] ... for jdx, value in enumerate(values): ... special[name] = value ... results[:, jdx+1] = hsepd_pdf(**special) ... pyplot.plot(results[:, 0], results[:, jdx+1], ... label="%s=%.1f" % (name, value)) ... pyplot.legend() ... for idx, result in enumerate(results): ... if not (idx % 5): ... round_(result)
When varying beta, the resulting probabilities correspond to the Laplace distribution (1.0), normal distribution (0.0), and the uniform distribution (-1.0), respectively. Note that we use -0.99 instead of -1.0 for approximating the uniform distribution to prevent from running into numerical problems, which are not solved yet:
>>> test(beta=[1.0, 0.0, -0.99]) 10.0, 0.002032, 0.000886, 0.0 15.0, 0.008359, 0.010798, 0.0 20.0, 0.034382, 0.048394, 0.057739 25.0, 0.141421, 0.079788, 0.057739 30.0, 0.034382, 0.048394, 0.057739 35.0, 0.008359, 0.010798, 0.0 40.0, 0.002032, 0.000886, 0.0
When varying xi, the resulting density is negatively skewed (0.2), symmetric (1.0), and positively skewed (5.0), respectively:
>>> test(xi=[0.2, 1.0, 5.0]) 10.0, 0.0, 0.000886, 0.003175 15.0, 0.0, 0.010798, 0.012957 20.0, 0.092845, 0.048394, 0.036341 25.0, 0.070063, 0.079788, 0.070063 30.0, 0.036341, 0.048394, 0.092845 35.0, 0.012957, 0.010798, 0.0 40.0, 0.003175, 0.000886, 0.0
In the above examples, the actual sigma (5.0) is calculated by multiplying sigma1 (0.2) with the mean simulated value (25.0), internally. This can be done for modelling homoscedastic errors. Instead, sigma2 is multiplied with the individual simulated values to account for heteroscedastic errors. With increasing values of sigma2, the resulting densities are modified as follows:
>>> test(sigma2=[0.0, 0.1, 0.2]) 10.0, 0.000886, 0.002921, 0.005737 15.0, 0.010798, 0.018795, 0.022831 20.0, 0.048394, 0.044159, 0.037988 25.0, 0.079788, 0.053192, 0.039894 30.0, 0.048394, 0.04102, 0.032708 35.0, 0.010798, 0.023493, 0.023493 40.0, 0.000886, 0.011053, 0.015771
See the documentation on function
prepare_arrays()
for some additional instructions for usinghsepd_pdf()
.
-
hydpy.auxs.statstools.
hsepd_manual
(*, sigma1: float, sigma2: float, xi: float, beta: float, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None) → float[source]¶ Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.
The following examples stem from the documentation of function
hsepd_pdf()
, which is used by functionhsepd_manual()
. The first one deals with a heteroscedastic normal distribution:>>> from hydpy import hsepd_manual, round_ >>> round_(hsepd_manual(sigma1=0.2, sigma2=0.2, ... xi=1.0, beta=0.0, ... sim=numpy.arange(10.0, 41.0), ... obs=numpy.full(31, 25.0))) -3.682842
Too small probability density values are set to 1e-200 before calculating their logarithm (which means that the lowest possible value returned by function
hsepd_manual()
is approximately -460):>>> round_(hsepd_manual(sigma1=0.2, sigma2=0.0, ... xi=1.0, beta=-0.99, ... sim=numpy.arange(10.0, 41.0), ... obs=numpy.full(31, 25.0))) -209.539335
See the documentation on function
prepare_arrays()
for some additional instructions for usinghsepd_manual()
.
-
hydpy.auxs.statstools.
hsepd
(*, sim: Optional[hydpy.core.typingtools.VectorInput[float]] = None, obs: Optional[hydpy.core.typingtools.VectorInput[float]] = None, node: Optional[hydpy.core.devicetools.Node] = None, skip_nan: bool = False, subperiod: Optional[bool] = None, inits: Optional[Iterable[float]] = None, return_pars: bool = False, silent: bool = True) → Union[float, Tuple[float, Tuple[float, float, float, float]]][source]¶ Calculate the mean of the logarithmic probability densities of the heteroskedastic skewed exponential power distribution.
Function
hsepd()
serves the same purpose as functionhsepd_manual()
but tries to estimate the parameters of the heteroscedastic skewed exponential distribution via an optimisation algorithm. This is shown by generating a random sample. One thousant simulated values are scattered around the observed (true) value of 10.0 with a standard deviation of 2.0:>>> import numpy >>> numpy.random.seed(0) >>> sim = numpy.random.normal(10.0, 2.0, 1000) >>> obs = numpy.full(1000, 10.0)
First, as a reference, we calculate the “true” value based on function
hsepd_manual()
and the correct distribution parameters:>>> from hydpy import Period, hsepd, hsepd_manual, pub, round_ >>> round_(hsepd_manual( ... sigma1=0.2, sigma2=0.0, xi=1.0, beta=0.0, sim=sim, obs=obs)) -2.100093
When using function
hsepd()
, the returned value is even a little “better”:>>> round_(hsepd(sim=sim, obs=obs)) -2.09983
This is due to the deviation from the random sample to its theoretical distribution. This is reflected by small differences between the estimated values and the theoretical values of sigma1 (0.2), sigma2 (0.0), xi (1.0), and beta (0.0). The estimated values are returned in the mentioned order through enabling the return_pars option:
>>> value, pars = hsepd(sim=sim, obs=obs, return_pars=True) >>> round_(pars, decimals=5) 0.19966, 0.0, 0.96836, 0.0188
There is no guarantee that the optimisation numerical optimisation algorithm underlying function
hsepd()
will always find the parameters resulting in the largest value returned by functionhsepd_manual()
. You can increase its robustness (and decrease computation time) by supplying close initial parameter values:>>> value, pars = hsepd(sim=sim, obs=obs, return_pars=True, ... inits=(0.2, 0.0, 1.0, 0.0)) >>> round_(pars, decimals=5) 0.19966, 0.0, 0.96836, 0.0188
However, the following example shows a case when this strategy results in worse results:
>>> value, pars = hsepd(sim=sim, obs=obs, return_pars=True, ... inits=(0.0, 0.2, 1.0, 0.0)) >>> round_(value) -2.174492 >>> round_(pars) 0.0, 0.213179, 1.705485, 0.505112
See the documentation on function
prepare_arrays()
for some additional instructions for usinghsepd()
.
-
hydpy.auxs.statstools.
calc_mean_time
(timepoints: hydpy.core.typingtools.VectorInput[float], weights: hydpy.core.typingtools.VectorInput[float]) → float[source]¶ Return the weighted mean of the given time points.
With equal given weights, the result is simply the mean of the given time points:
>>> from hydpy import calc_mean_time >>> calc_mean_time(timepoints=[3., 7.], ... weights=[2., 2.]) 5.0
With different weights, the resulting time is shifted to the larger ones:
>>> calc_mean_time(timepoints=[3., 7.], ... weights=[1., 3.]) 6.0
Or, in the most extreme case:
>>> calc_mean_time(timepoints=[3., 7.], ... weights=[0., 4.]) 7.0
There are some checks for input plausibility, e.g.:
>>> calc_mean_time(timepoints=[3., 7.], ... weights=[-2., 2.]) Traceback (most recent call last): ... ValueError: While trying to calculate the weighted mean time, the following error occurred: For the following objects, at least one value is negative: weights.
-
hydpy.auxs.statstools.
calc_mean_time_deviation
(timepoints: hydpy.core.typingtools.VectorInput[float], weights: hydpy.core.typingtools.VectorInput[float], mean_time: Optional[float] = None) → float[source]¶ Return the weighted deviation of the given timepoints from their mean time.
With equal given weights, the is simply the standard deviation of the given time points:
>>> from hydpy import calc_mean_time_deviation >>> calc_mean_time_deviation(timepoints=[3., 7.], ... weights=[2., 2.]) 2.0
One can pass a precalculated mean time:
>>> from hydpy import round_ >>> round_(calc_mean_time_deviation(timepoints=[3., 7.], ... weights=[2., 2.], ... mean_time=4.)) 2.236068
>>> round_(calc_mean_time_deviation(timepoints=[3., 7.], ... weights=[1., 3.])) 1.732051
Or, in the most extreme case:
>>> calc_mean_time_deviation(timepoints=[3., 7.], ... weights=[0., 4.]) 0.0
There are some checks for input plausibility, e.g.:
>>> calc_mean_time_deviation(timepoints=[3., 7.], ... weights=[-2., 2.]) Traceback (most recent call last): ... ValueError: While trying to calculate the weighted time deviation from mean time, the following error occurred: For the following objects, at least one value is negative: weights.
-
hydpy.auxs.statstools.
print_evaluationtable
(*, nodes: Sequence[hydpy.core.devicetools.Node], criteria: Sequence[hydpy.auxs.statstools.Criterion], nodenames: Optional[Sequence[str]] = None, critnames: Optional[Sequence[str]] = None, critfactors: Union[float, Sequence[float]] = 1.0, critdigits: Union[int, Sequence[int]] = 2, subperiod: bool = True, average: bool = True, averagename: str = 'mean', filter_: float = 1.0, stepsize: Optional[typing_extensions.Literal[daily, d, monthly, m]] = None, aggregator: Union[str, Callable[[hydpy.core.typingtools.VectorInput[float]], float]] = 'mean', missingvalue: str = '-', decimalseperator: str = '.', file_: Optional[Union[str, TextIO]] = None) → None[source]¶ Print a table containing the results of the given evaluation criteria for the given
Node
objects.First, we define two nodes with different simulation and observation data (see function
prepare_arrays()
for some explanations):>>> from hydpy import pub, Node, nan >>> pub.timegrids = "01.01.2000", "04.01.2000", "1d" >>> nodes = Node("test1"), Node("test2") >>> for node in nodes: ... node.prepare_simseries() ... node.sequences.sim.series = 1.0, 2.0, 3.0 ... node.sequences.obs.activate_ram() ... node.sequences.obs.series = 4.0, 5.0, 6.0 >>> nodes[0].sequences.sim.series = 1.0, 2.0, 3.0 >>> nodes[0].sequences.obs.series = 4.0, 5.0, 6.0 >>> nodes[1].sequences.sim.series = 1.0, 2.0, 3.0 >>> with pub.options.checkseries(False): ... nodes[1].sequences.obs.series = 3.0, nan, 1.0
Selecting functions
corr()
andbias_abs()
as evaluation criteria, functionprint_evaluationtable()
prints the following table:>>> from hydpy import bias_abs, corr, print_evaluationtable >>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs)) corr bias_abs test1 1.00 -3.00 test2 - - mean 1.00 -3.00
One can pass alternative names for the node objects, the criteria functions, and the row containing the average values. Also, one can use the filter_ argument to force printing statistics in case of incomplete observation data. In the following example, we accept a maximum fraction of missing data of 50 %:
>>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... nodenames=("first node", "second node"), ... critnames=("corrcoef", "bias"), ... critdigits=1, ... averagename="average", ... filter_=0.5) corrcoef bias first node 1.0 -3.0 second node -1.0 0.0 average 0.0 -1.5
The number of assigned node objects and criteria functions must match the number of given alternative names:
>>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... nodenames=("first node",)) Traceback (most recent call last): ... ValueError: While trying to evaluate the simulation results of some node objects, the following error occurred: 2 node objects are given which does not match with number of given alternative names being 1.
>>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... critnames=("corrcoef",)) Traceback (most recent call last): ... ValueError: While trying to evaluate the simulation results of some node objects, the following error occurred: 2 criteria functions are given which does not match with number of given alternative names being 1.
Set the average argument to
False
to omit the row containing the average values:>>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... average=False) corr bias_abs test1 1.00 -3.00 test2 - -
You can use the arguments critfactors and critdigits by passing either a single number or a sequence of criteria-specific numbers to modify the printed values:
>>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... critfactors=(10.0, 0.1), ... critdigits=1) corr bias_abs test1 10.0 -0.3 test2 - - mean 10.0 -0.3
By default, function
print_evaluationtable()
prints the statics relevant for the actual evaluation period only:>>> pub.timegrids.eval_.dates = "01.01.2000", "02.01.2000" >>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs)) corr bias_abs test1 - -3.00 test2 - -2.00 mean - -2.50
You can deviate from this default behaviour by setting the subperiod argument to
False
:>>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... subperiod=False) corr bias_abs test1 1.00 -3.00 test2 - - mean 1.00 -3.00
Use the stepsize argument (eventually in combination with argument aggregator) to print the statistics of previously aggregated time series. See function
aggregate_series()
for further information.Here, the daily aggregation step size results in identical results as the original step size is also one day:
>>> pub.timegrids.eval_ = pub.timegrids.init >>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... stepsize="daily", ... aggregator="mean") corr bias_abs test1 1.00 -3.00 test2 - - mean 1.00 -3.00
For the monthly step size, the result table is empty, due to the too short initialisation period covering less than a month:
>>> pub.timegrids.eval_.dates = pub.timegrids.init.dates >>> print_evaluationtable(nodes=nodes, ... criteria=(corr, bias_abs), ... stepsize="monthly", ... aggregator="mean") corr bias_abs test1 - - test2 - - mean - -