seriestools
This module provides features for working with time series.
Module seriestools implements the following members:
aggregate_series(): Aggregate the time series on a monthly or daily basis.
hydpy.core.seriestools.aggregate_series(series: hydpy.core.typingtools.VectorInput[float], stepsize: typing_extensions.Literal[daily, d, monthly, m] = 'monthly', aggregator: Union[str, Callable[[hydpy.core.typingtools.VectorInput[float]], float]] = 'mean', subperiod: bool = True, basetime: str = '00:00') → pandas.core.frame.DataFrame
Aggregate the time series on a monthly or daily basis.
Often, we need some kind of aggregation before analysing deviations between simulation results and observations. Function aggregate_series() performs such aggregation on a monthly or daily basis. You are free to specify arbitrary aggregation functions.
We first show the default behaviour of function aggregate_series(), which is to calculate monthly averages. To do so, we define the hydrological winter half-year 2001 as our simulation period and select a daily simulation step size:
>>> from hydpy import aggregate_series, pub, Node
>>> pub.timegrids = "01.11.2000", "01.05.2001", "1d"
Next, we prepare a Node object and assign some constantly increasing values to its simulation series:
>>> import numpy
>>> node = Node("test")
>>> node.prepare_simseries()
>>> sim = node.sequences.sim
>>> sim.series = numpy.arange(1, 181+1)
aggregate_series() returns the data within an index-sorted DataFrame object (note that the index addresses the left boundary of each time step):
>>> aggregate_series(series=sim.series)
            series
2000-11-01    15.5
2000-12-01    46.0
2001-01-01    77.0
2001-02-01   106.5
2001-03-01   136.0
2001-04-01   166.5
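For comparison, the monthly averages above can be reproduced with plain pandas. This is only a sketch of the idea, assuming a daily series that covers whole months; it is not how aggregate_series() is implemented internally, and the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Daily time stamps from 2000-11-01 (inclusive) to 2001-05-01 (exclusive),
# mirroring the simulation period defined above.
index = pd.date_range("2000-11-01", periods=181, freq="D")
values = pd.Series(np.arange(1, 181 + 1, dtype=float), index=index)

# "MS" labels each month by its first day, i.e. the left interval boundary.
monthly_means = values.resample("MS").mean()
print(monthly_means)
```

The resulting index holds the first day of each month, which corresponds to the left-boundary convention noted above.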
The following example shows how to restrict the considered period via the eval_ Timegrid of the Timegrids object available in the pub module and how to pass a different aggregation function:
>>> pub.timegrids.eval_.dates = "2001-01-01", "2001-03-01"
>>> aggregate_series(series=sim.series, aggregator=numpy.sum)
            series
2001-01-01  2387.0
2001-02-01  2982.0
Function aggregate_series() raises errors like the following for unsuitable functions:
>>> def wrong():
...     return None
>>> aggregate_series(series=sim.series, aggregator=wrong)
Traceback (most recent call last):
...
TypeError: While trying to aggregate the given series, the following error occurred: While trying to perform the aggregation based on method `wrong`, the following error occurred: wrong() takes 0 positional arguments but 1 was given
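According to the signature above, a suitable aggregator takes one vector of values and returns a single float. The following sketch shows such a function; the name `amplitude` is hypothetical and only serves as an illustration.

```python
import numpy


def amplitude(values):
    """Return the range of the given values (hypothetical example aggregator).

    The function accepts exactly one positional argument, the vector of
    values of one aggregation interval, and returns a single float.
    """
    return float(numpy.max(values) - numpy.min(values))


# Standalone check on the values of one 30-day interval:
print(amplitude(numpy.arange(1, 30 + 1)))
```

Assuming the call pattern shown in the examples above, such a function could be passed as `aggregator=amplitude`.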
When passing a string, aggregate_series() looks up the function of that name in numpy:
>>> pub.timegrids.eval_.dates = "2001-01-01", "2001-02-01"
>>> aggregate_series(series=sim.series, aggregator="sum")
            series
2001-01-01  2387.0
aggregate_series() raises the following error when the requested function does not exist:
>>> aggregate_series(series=sim.series, aggregator="Sum")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Module `numpy` does not provide a function named `Sum`.
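The name lookup described above is case-sensitive, which is why "sum" works and "Sum" fails. A minimal sketch of such a lookup might look as follows; the helper `resolve_aggregator` is hypothetical and not part of HydPy, and the actual implementation may differ.

```python
import numpy


def resolve_aggregator(aggregator):
    """Return a callable for the given aggregator (hypothetical helper).

    Callables are returned unchanged; strings are looked up as attribute
    names of the numpy module.
    """
    if callable(aggregator):
        return aggregator
    function = getattr(numpy, aggregator, None)
    if not callable(function):
        raise ValueError(
            f"Module `numpy` does not provide a function named `{aggregator}`."
        )
    return function


print(resolve_aggregator("sum")([1.0, 2.0, 3.0]))
```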
To prevent wrong conclusions, aggregate_series() generally ignores all data of incomplete intervals:
>>> pub.timegrids = "2000-11-30", "2001-04-02", "1d"
>>> node.prepare_simseries()
>>> sim.series = numpy.arange(30, 152+1)
>>> sim = node.sequences.sim
>>> aggregate_series(series=sim.series, aggregator="sum")
            series
2000-12-01  1426.0
2001-01-01  2387.0
2001-02-01  2982.0
2001-03-01  4216.0

>>> pub.timegrids.eval_.dates = "2001-01-02", "2001-02-28"
>>> aggregate_series(series=sim.series)
Empty DataFrame
Columns: [series]
Index: []
If you want to analyse the data of the complete initialisation period independently of the state of eval_, set argument subperiod to False:
>>> aggregate_series(series=sim.series, aggregator="sum", subperiod=False)
            series
2000-12-01  1426.0
2001-01-01  2387.0
2001-02-01  2982.0
2001-03-01  4216.0
The following example shows that even with only one missing value at the respective ends of the simulation period, aggregate_series() does not return any result for the first (November 2000) and the last aggregation interval (April 2001):
>>> pub.timegrids = "02.11.2000", "30.04.2001", "1d"
>>> node.prepare_simseries()
>>> sim.series = numpy.arange(2, 180+1)
>>> aggregate_series(series=node.sequences.sim.series)
            series
2000-12-01    46.0
2001-01-01    77.0
2001-02-01   106.5
2001-03-01   136.0
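One way to picture the rule for incomplete intervals is to compare the number of available values per month with the number of days of that month. The following pandas sketch does so for daily data; the function `monthly_means_complete_only` is a hypothetical helper, not the HydPy implementation.

```python
import numpy as np
import pandas as pd


def monthly_means_complete_only(values, start):
    """Average daily values per month, skipping months with missing days.

    Hypothetical helper: `start` is the date of the first daily value.
    """
    index = pd.date_range(start, periods=len(values), freq="D")
    series = pd.Series(np.asarray(values, dtype=float), index=index)
    grouped = series.resample("MS")
    means = grouped.mean()
    # A month is complete if it contains as many values as it has days.
    complete = grouped.count().to_numpy() == means.index.days_in_month.to_numpy()
    return means[complete]


result = monthly_means_complete_only(np.arange(2, 180 + 1), "2000-11-02")
print(result)
```

With the same period and values as in the example above, November 2000 and April 2001 each lack one day and therefore drop out.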
Now we prepare a time grid with an hourly simulation step size to show some examples of daily aggregation:
>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "1h"
>>> node.prepare_simseries()
>>> sim = node.sequences.sim
>>> sim.series = numpy.arange(1, 1+4*24)
By default, function aggregate_series() aggregates daily from 0 o’clock to 0 o’clock, which here results in a loss of the first two and the last 22 values of the entire period:
>>> aggregate_series(series=sim.series, stepsize="daily")
            series
2000-01-02    14.5
2000-01-03    38.5
2000-01-04    62.5
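The loss of the first two and the last 22 values can be retraced with plain pandas by keeping only calendar days that contain a full set of hourly values. This is a sketch under the assumption of a constant hourly step size, with illustrative variable names, and not the HydPy implementation.

```python
import numpy as np
import pandas as pd

# Hourly time stamps starting at 22:00, mirroring the time grid defined above.
step = pd.Timedelta(hours=1)
index = pd.date_range("2000-01-01 22:00", periods=4 * 24, freq=step)
series = pd.Series(np.arange(1, 1 + 4 * 24, dtype=float), index=index)

# Group by calendar days (00:00 to 00:00) and keep only complete days.
grouped = series.resample("D")
steps_per_day = pd.Timedelta(days=1) // step  # 24 for an hourly step size
daily_means = grouped.mean()[grouped.count() == steps_per_day]
print(daily_means)
```

The first of January contributes only two values and the fifth only 22, so both days are dropped.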
If you want the aggregation to start at a different time of the day, use the basetime argument. In our example, starting at 22 o’clock fits the defined initialisation time grid and ensures the usage of all available data:
>>> aggregate_series(series=sim.series, stepsize="daily", basetime="22:00")
                     series
2000-01-01 22:00:00    12.5
2000-01-02 22:00:00    36.5
2000-01-03 22:00:00    60.5
2000-01-04 22:00:00    84.5
So far, the basetime argument works for daily aggregation only:
>>> aggregate_series(series=sim.series, stepsize="monthly", basetime="22:00")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Use the `basetime` argument in combination with a `daily` aggregation step size only.
aggregate_series() does not support aggregation for simulation step sizes larger than one day:
>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "1d"
>>> node.prepare_simseries()
>>> sim = node.sequences.sim
>>> sim.series = numpy.arange(1, 1+4)
>>> aggregate_series(series=sim.series, stepsize="daily")
            series
2000-01-02     2.0
2000-01-03     3.0
2000-01-04     4.0

>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "2d"
>>> node.prepare_simseries()
>>> aggregate_series(series=node.sequences.sim.series, stepsize="daily")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Data aggregation is not supported for simulation step sizes greater one day.
We look forward to supporting other useful aggregation step sizes later:
>>> pub.timegrids = "01.01.2000 22:00", "05.01.2000 22:00", "1d"
>>> node.prepare_simseries()
>>> aggregate_series(series=node.sequences.sim.series, stepsize="yearly")
Traceback (most recent call last):
...
ValueError: While trying to aggregate the given series, the following error occurred: Argument `stepsize` received value `yearly`, but only the following ones are supported: `monthly` (default) and `daily`.