seriestools

This module provides features for working with time series.

Module seriestools implements the following members:


hydpy.core.seriestools.aggregate_series(series: hydpy.core.typingtools.VectorInput[float], stepsize: typing_extensions.Literal[daily, d, monthly, m] = 'monthly', aggregator: Union[str, Callable[[hydpy.core.typingtools.VectorInput[float]], float]] = 'mean', subperiod: bool = True, basetime: str = '00:00') → pandas.core.frame.DataFrame

Aggregate the time series on a monthly or daily basis.

Often, we need some kind of aggregation before analysing deviations between simulation results and observations. Function aggregate_series() performs such aggregation on a monthly or daily basis. You are free to specify arbitrary aggregation functions.

We first show the default behaviour of function aggregate_series(), which is to calculate monthly averages. To do so, we define the hydrological winter half-year 2001 (1 November 2000 to 30 April 2001) as our simulation period and set a daily simulation step size:

>>> from hydpy import aggregate_series, pub, Node
>>> pub.timegrids = "01.11.2000", "01.05.2001", "1d"

Next, we prepare a Node object and assign some constantly increasing values to its simulation series:

>>> import numpy
>>> node = Node("test")
>>> node.prepare_simseries()
>>> sim = node.sequences.sim
>>> sim.series = numpy.arange(1, 181+1)

aggregate_series() returns the data within an index-sorted DataFrame object (note that the index addresses the left boundary of each time step):

>>> aggregate_series(series=sim.series)
            series
2000-11-01    15.5
2000-12-01    46.0
2001-01-01    77.0
2001-02-01   106.5
2001-03-01   136.0
2001-04-01   166.5

The following example shows how to restrict the considered period via the eval_ Timegrid of the Timegrids object available in the pub module and how to pass a different aggregation function:

>>> pub.timegrids.eval_.dates = "2001-01-01", "2001-03-01"
>>> aggregate_series(series=sim.series, aggregator=numpy.sum)
            series
2001-01-01  2387.0
2001-02-01  2982.0

Function aggregate_series() raises errors like the following for unsuitable functions:

>>> def wrong():
...     return None
>>> aggregate_series(series=sim.series, aggregator=wrong)
Traceback (most recent call last):
...
TypeError: While trying to aggregate the given series, the following error occurred: While trying to perform the aggregation based on method `wrong`, the following error occurred: wrong() takes 0 positional arguments but 1 was given
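
For comparison, a suitable aggregator accepts one vector of values and returns a single float, as the type hint of the aggregator parameter indicates. The following minimal sketch defines such a function; the name value_range is purely illustrative and not part of HydPy:

```python
from typing import Sequence


def value_range(values: Sequence[float]) -> float:
    """Return the spread between the largest and the smallest value.

    Hypothetical helper for illustration only, not part of HydPy.
    """
    return float(max(values) - min(values))


# Plain-Python check that does not require HydPy:
print(value_range([62.0, 63.0, 92.0]))
```

Passing this function via aggregator=value_range should then work in the same way as passing numpy.sum in the earlier example.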

When passing a string, aggregate_series() looks up the function of that name in numpy:

>>> pub.timegrids.eval_.dates = "2001-01-01", "2001-02-01"
>>> aggregate_series(series=sim.series, aggregator="sum")
            series
2001-01-01  2387.0

aggregate_series() raises the following error when the requested function does not exist:

>>> aggregate_series(series=sim.series, aggregator="Sum")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Module `numpy` does not provide a function named `Sum`.
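
A name-based lookup of this kind can be sketched with getattr. The helper lookup_function below is a hypothetical illustration and not HydPy's actual code. It uses the standard-library module statistics in place of numpy so that it runs without third-party packages:

```python
import statistics
from types import ModuleType
from typing import Callable


def lookup_function(module: ModuleType, name: str) -> Callable:
    """Return the callable `name` from `module` or raise a ValueError.

    Hypothetical helper for illustration only, not part of HydPy.
    """
    function = getattr(module, name, None)
    if not callable(function):
        raise ValueError(
            f"Module `{module.__name__}` does not provide "
            f"a function named `{name}`."
        )
    return function


mean = lookup_function(statistics, "mean")
print(mean([1.0, 2.0, 3.0]))
```

Attribute lookup is case-sensitive, so a request for "Mean" fails in this sketch just as "Sum" does in the example above.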

To prevent wrong conclusions, aggregate_series() generally ignores all data of incomplete intervals:

>>> pub.timegrids = "2000-11-30", "2001-04-02", "1d"
>>> node.prepare_simseries()
>>> sim = node.sequences.sim
>>> sim.series = numpy.arange(30, 152+1)
>>> aggregate_series(series=sim.series, aggregator="sum")
            series
2000-12-01  1426.0
2001-01-01  2387.0
2001-02-01  2982.0
2001-03-01  4216.0
>>> pub.timegrids.eval_.dates = "2001-01-02", "2001-02-28"
>>> aggregate_series(series=sim.series)
Empty DataFrame
Columns: [series]
Index: []

If you want to analyse the data of the complete initialisation period independently of the state of eval_, set argument subperiod to False:

>>> aggregate_series(series=sim.series, aggregator="sum", subperiod=False)
            series
2000-12-01  1426.0
2001-01-01  2387.0
2001-02-01  2982.0
2001-03-01  4216.0

The following example shows that even with only one missing value at the respective ends of the simulation period, aggregate_series() does not return any result for the first (November 2000) and the last aggregation interval (April 2001):

>>> pub.timegrids = "02.11.2000", "30.04.2001", "1d"
>>> node.prepare_simseries()
>>> sim.series = numpy.arange(2, 180+1)
>>> aggregate_series(series=node.sequences.sim.series)
            series
2000-12-01    46.0
2001-01-01    77.0
2001-02-01   106.5
2001-03-01   136.0
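
The rule for incomplete intervals can be reproduced in plain Python. The following sketch is not HydPy's implementation; the function monthly_means is a hypothetical stand-in that groups daily values by month, indexes each month by its first day, and keeps only months for which every day is available:

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def monthly_means(first_day, values):
    """Average daily values per month, skipping incompletely covered months.

    Hypothetical helper for illustration only, not part of HydPy.
    """
    groups = defaultdict(list)
    for offset, value in enumerate(values):
        day = first_day + timedelta(days=offset)
        # Index each month by its first day (the left interval boundary).
        groups[date(day.year, day.month, 1)].append(value)
    result = {}
    for month_start, month_values in sorted(groups.items()):
        days_in_month = calendar.monthrange(month_start.year, month_start.month)[1]
        # Keep a month only if there is one value for each of its days.
        if len(month_values) == days_in_month:
            result[month_start] = sum(month_values) / len(month_values)
    return result


# Same setup as above: daily values 2, 3, ..., 180 starting on 2 November 2000.
for month_start, mean in monthly_means(date(2000, 11, 2), range(2, 181)).items():
    print(month_start, mean)
```

November 2000 (29 of 30 days) and April 2001 (29 of 30 days) are dropped, which leaves the same four monthly means as in the example above.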

Now we prepare a time grid with an hourly simulation step size to show some examples of daily aggregation:

>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "1h"
>>> node.prepare_simseries()
>>> sim = node.sequences.sim
>>> sim.series = numpy.arange(1, 1+4*24)

By default, function aggregate_series() aggregates daily from midnight (00:00) to midnight, which here results in a loss of the first two and the last 22 values of the entire period:

>>> aggregate_series(series=sim.series, stepsize="daily")
            series
2000-01-02    14.5
2000-01-03    38.5
2000-01-04    62.5

If you want the aggregation to start at a different time of the day, use the basetime argument. In our example, starting at 22:00 fits the defined initialisation time grid and ensures the usage of all available data:

>>> aggregate_series(series=sim.series, stepsize="daily", basetime="22:00")
                     series
2000-01-01 22:00:00    12.5
2000-01-02 22:00:00    36.5
2000-01-03 22:00:00    60.5
2000-01-04 22:00:00    84.5
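
The effect of a shifted base time can be sketched without HydPy or pandas. The function daily_means below is a hypothetical illustration, not the library's code: it assigns each time step to the 24-hour window that starts at the given base time and discards windows that are not completely filled:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(first_step, values, step=timedelta(hours=1), basetime=timedelta(0)):
    """Average values over 24-hour windows that start at `basetime`.

    Hypothetical helper for illustration only, not part of HydPy.
    """
    groups = defaultdict(list)
    for index, value in enumerate(values):
        moment = first_step + index * step
        # Shift back by the base time so that each window maps to one calendar day.
        shifted = moment - basetime
        window_start = datetime(shifted.year, shifted.month, shifted.day) + basetime
        groups[window_start].append(value)
    steps_per_day = timedelta(days=1) // step
    # Keep only windows that contain a value for every step.
    return {
        start: sum(window) / len(window)
        for start, window in sorted(groups.items())
        if len(window) == steps_per_day
    }


start = datetime(2000, 1, 1, 22)
print(daily_means(start, range(1, 97)))
print(daily_means(start, range(1, 97), basetime=timedelta(hours=22)))
```

With the default base time, the two values before the first midnight and the 22 values after the last midnight fall into incomplete windows; with a base time of 22:00, all 96 values are used.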

So far, the basetime argument works for daily aggregation only:

>>> aggregate_series(series=sim.series, stepsize="monthly", basetime="22:00")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Use the `basetime` argument in combination with a `daily` aggregation step size only.

aggregate_series() does not support aggregation for simulation step sizes larger than one day:

>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "1d"
>>> node.prepare_simseries()
>>> sim = node.sequences.sim
>>> sim.series = numpy.arange(1, 1+4)
>>> aggregate_series(series=sim.series, stepsize="daily")
            series
2000-01-02     2.0
2000-01-03     3.0
2000-01-04     4.0
>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "2d"
>>> node.prepare_simseries()
>>> aggregate_series(series=node.sequences.sim.series, stepsize="daily")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Data aggregation is not supported for simulation step sizes greater one day.

We look forward to supporting other useful aggregation step sizes later:

>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "1d"
>>> node.prepare_simseries()
>>> aggregate_series(series=node.sequences.sim.series, stepsize="yearly")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Argument `stepsize` received value `yearly`, but only the following ones are supported: `monthly` (default) and `daily`.