Project structure¶
This section describes the typical file structure of HydPy projects, comprising network files, control files, condition files, and series files. We refer to the HydPy-H-Lahn example project to illustrate our explanations, which has the following file structure:
HydPy-H-Lahn
conditions
control
network
series
default
All project files are in the project’s sub-subdirectories. Except for conditions, these sub-subdirectories are usually named default. You can add directories with different names to, for example, hold the parameter values of multiple calibrations in one project.
HydPy offers functionalities for reading and writing project files. Besides time series files, all project files are usually Python files (ending with “.py”). HydPy writes them so that they appear like “normal” configuration files. From the programmer’s perspective, this requires some tricks, which we mention in the respective subsections. One main advantage is that you can copy individual configuration fragments into a Python console to check precisely how they work. We frequently use this feature throughout the documentation.
HydPy allows users to deviate from the default file structure and, due to its design as a Python library, even to set up projects directly via scripts or define alternative file formats, but these are more advanced topics left for the reference manual.
Network files¶
Network files define a HydPy project’s network by introducing and coupling elements and nodes. Consider the following minimal example of a network file’s content:
>>> from hydpy import Node, Element
>>> _ = Node("n", variable="Q")
>>> _ = Element("e1", outlets="n")
>>> _ = Element("e2", inlets="n")
Node n connects element e1 with element e2, so we have a network of three devices. From the perspective of e1, n is an outlet node, and from the perspective of e2, an inlet node:
>>> assert Element("e1").outlets.n is Element("e2").inlets.n
Given the curt names, we cannot safely guess the purposes of e1 and e2 because network files are model-agnostic. The only thing for sure is that the model of e1 must be able to route discharge to a downstream node, and the model of e2 must be able to receive discharge from an upstream node. Hence, we could use this network file for various model-type combinations.
The names of elements and nodes serve as their identifiers, which means you never make two node or two element instances with the same name. If it looks like you create a new instance, you actually just get a reference to an already existing one (possibly with already set up node connections):
>>> Element("e1")
Element("e1",
outlets="n")
Each network file corresponds to one selection. The HydPy-H-Lahn project
defines three selections: one for all stream models and two for
the land models in the headwater and non-headwater subbasins. The
combination of all individual selections gives a selection named “complete”, which is
automatically creatable by property complete
of class Selections
.
The described “name as identifier” mechanism allows us to define the same device in multiple network files of the same project. So, one can create an arbitrary number of selections to structure the same network after different criteria. The only (self-evident) requisite is the consistency of all individual definitions. You cannot, for example, add an inlet node to an element if it is already the same element’s outlet node:
>>> Element("e1", inlets="n")
Traceback (most recent call last):
...
ValueError: For element `e1`, the given inlet node `n` is already defined as a(n) outlet node, which is not allowed.
Besides these standards, the reference manual covers many
features which help to organise HydPy projects (see, for example, the keyword
features of class Device
and its collection type Devices
) or to build more complex
networks, for example, those that pass on different types of data (configurable by the
variable
attribute of class Node
).
Control files¶
Control files select model types, prepare model instances, and set parameter values. Each element defined in the network files requires one control file, which sets up its main model, including all submodels.
The HydPy-H-Lahn project relies on two main model types: the land model hland_96
and the stream model musk_classic
.
The control file “stream_dill_assl_lahn2.py”, for example, selects the latter for routing
the outflow of the subbasin Dill to a location in the river Lahn. The control file is
short because musk_classic
is relatively simple. The first (Python-code) line
selects the model type by a so-called “wildcard import”, making all relevant
information directly available:
>>> from hydpy.models.musk_classic import *
The following line defines a simulation time step size of one hour:
>>> simulationstep("1h")
Note that the simulationstep()
line is optional. It allows for adjusting parameter
values that depend on the simulation time step size, so one can set up a model for
testing purposes without embedding it in a complete project. However, when executing
the file within the context of a project, the project’s simulation step counts (HydPy
then ignores the control file’s specification) so that the same control file works for
different simulation time step sizes.
The parameterstep()
line is similar but mandatory. It defines the time unit of the
subsequently specified values of time-dependent parameters. The given example
selects a parameter time step size of one day:
>>> parameterstep("1d")
Note
A note for programmers: Function parameterstep()
prepares a suitable model instance
and makes it and its main components directly available in the local namespace.
This trick allows for the simple further model preparation steps.
As in nearly all cases, the discussed control file only sets the required values of control parameters and does not modify the predefined values of other parameter groups. The parameter value specifications are not conducted via “assignment expressions” but “bracket expressions”, like when calling a regular function:
>>> nmbsegments(lag=0.417)
>>> coefficients(damp=0.0)
Here, the parameter values are not set directly via positional arguments but by
parameter-specific keyword arguments unique to the classes NmbSegments
and Coefficients
. Note that the lag argument is time-dependent and
so, according to the specified parameter step size, is given in days, while the “true”
value of the NmbSegments
instance refers to the simulation step size of
one hour:
>>> nmbsegments.value == round(24.0 * 0.417)
True
Due to the higher complexity of hland_96
, the control file “land_dill_assl.py” is much
longer. We focus on a few aspects not relevant to musk_mct
. Therefore, we must
first clear the local namespace (one could also just start a fresh Python process):
>>> from hydpy import reverse_model_wildcard_import
>>> reverse_model_wildcard_import()
hland_96
requires submodels and the control file must select them. It does so by
importing the main model (hland_96
) by a wildcard import but all submodels
(evap_aet_hbv96
, evap_pet_hbv96
, and rconc_uh
) by a module import:
>>> from hydpy.models.hland_96 import *
>>> from hydpy.models import evap_aet_hbv96
>>> from hydpy.models import evap_pet_hbv96
>>> from hydpy.models import rconc_uh
The time step-related lines work as described above:
>>> simulationstep("1h")
>>> parameterstep("1d")
The subbasin’s area is set via a positional argument:
>>> area(692.3)
The parameter NmbZones
is notable, as it requires integer values and,
more importantly, modifies the shape of other parameters. After setting its value, you
can prepare parameters with zone-specific values like ZoneArea
:
>>> zonearea.shape
Traceback (most recent call last):
...
hydpy.core.exceptiontools.AttributeNotReady: Shape information for variable `zonearea` can only be retrieved after it has been defined.
>>> nmbzones(12)
>>> assert nmbzones == zonearea.shape[0]
>>> zonearea(14.41, 7.06, 70.83, 84.36, 70.97, 198.0, 27.75, 130.0, 27.28,
... 56.94, 1.09, 3.61)
Strictly speaking, NmbZones
is specific to the
HydPy-H model family. Still, there are many models which rely on
hydrological response units, stream segments, or different forms of (spatial)
subdivisions and use the same logic of a control parameter defining the number of
subdivisions and many parameters or sequences shaped as vectors or matrixes to handle
different values for individual (spatial) units.
Another example of a HydPy-H-speciality, which also follows a general HydPy design principle, is the definition of “spatial types” (mostly land use types) via constants. HydPy-H provides such constants for defining the types of the individual zones:
>>> zonetype(FIELD, FOREST, FIELD, FOREST, FIELD, FOREST, FIELD, FOREST, FIELD,
... FOREST, FIELD, FOREST)
When preparing zone-specific parameters, you can decide between defining individual, land type-specific, and subbasin-wide values:
>>> zonez(2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0)
>>> cfmax(field=4.55853, forest=2.735118)
>>> fc(278.0)
Often, one does not wish to define individual values for each control file but more
general ones. HydPy supports this via “auxiliary files”. In the discussed control
file, the parameter PCorr
instance takes its value from the auxiliary
file “land.py” (to get this working in a doctest requires changing the working
directory):
>>> import os
>>> from hydpy import data
>>> os.chdir(os.path.join(data.__path__[0], "HydPy-H-Lahn", "control", "default"))
>>> pcorr(auxfile="land")
>>> pcorr
pcorr(1.0)
All submodels are generally added at a control file’s end because they might expect
some main model parameters to be already prepared. Each main model provides a suitable
method for adding specific submodel types. Such methods should be applied after a
with statement. Within the subsequent with block, one can directly set the
submodel’s parameters as explained above. The discussed control file uses the
add_rconcmodel_v1()
method to add a rconc_uh
instance (and configures its Unit Hydrograph ordinates in a triangle shape):
>>> with model.add_rconcmodel_v1(rconc_uh):
... uh("triangle", tb=0.36728)
>>> from hydpy import print_vector
>>> print_vector(model.rconcmodel.parameters.control.uh.values)
0.02574, 0.077221, 0.128701, 0.180182, 0.213581, 0.170644, 0.119163,
0.067682, 0.017086
Adding a sub-submodel to a submodel works via nested with blocks:
>>> with model.add_aetmodel_v1(evap_aet_hbv96):
... temperaturethresholdice(nan)
... soilmoisturelimit(0.9)
... excessreduction(0.0)
... with model.add_petmodel_v1(evap_pet_hbv96):
... airtemperaturefactor(0.1)
... altitudefactor(0.0)
... precipitationfactor(0.02)
... evapotranspirationfactor(1.0)
The last example covers two new cases. First, nan
serves to mark “missing” or
“not required” values. Parameter TemperatureThresholdIce
requires no
values because it only applies to ILAKE
zones, while the Dill
subbasin only consists of FIELD
and FOREST
zones.
Second, main models often transmit some parameter values to their submodels, which
helps to avoid duplicate and potentially inconsistent definitions. In the discussed
control file, this applies, for example, to the parameter pairs
NmbZones
and NmbHRU
and FC
and
MaxSoilWater
:
>>> assert nmbzones == model.aetmodel.parameters.control.nmbhru
>>> assert fc == model.aetmodel.parameters.control.maxsoilwater
Before writing a control file, one should read the documentation of the relevant application models in the reference manual, which provides complete lists of the control parameters that need configuration, detailed application examples, and much more.
Condition files¶
Condition files represent model states and logged data at a particular time point. They are usually written at the end of a simulation run and later read before simulating another period that starts where the old one has ended. Instead, their names usually include the prefix init (for initial conditions) and a suffix indicating the relevant date, using underscores as separators. Each element defined in the network files requires one condition file, and so each condition file corresponds to one main model and one control file.
Condition files are similar to control files but almost always shorter and simpler. We take the condition file of the Dill subbasin for 1 January 1996 as an example, which, like the discussed control file, starts with a wildcard import that selects the relevant main model:
>>> from hydpy.models.hland_96 import *
Opposed to the control file, importing the relevant submodels is unnecessary, as they must already be available before reading the condition file.
The following call of function controlcheck()
is optional when working with a complete
HydPy project but required when executing a condition file independently for testing
(for the following doctests to work, we must not only remove the old wildcard import
artifacts but also fake to be “inside” a condition file by taking its name on):
>>> reverse_model_wildcard_import()
>>> temp = __file__
>>> __file__ = "land_dill_assl.py"
>>> controlcheck(projectdir=r"HydPy-H-Lahn", controldir="default", firstdate="1996-01-01", stepsize="1d")
>>> __file__ = temp
This step builds a connection to the corresponding control file. We need this
connection for interactive testing because, for example, the shape of some condition
sequences depends on the control parameter NmbZones
:
>>> assert model.parameters.control.nmbzones == ic.shape[0]
The name controlcheck()
reflects that the function enables checking whether a condition
file is consistent with the corresponding control file.
Note
A note for programmers: Behind the scenes, controlcheck()
operates like the control
file function parameterstep()
to simplify the appearance of condition files.
Setting their values works like for control parameters with “bracket expressions” but without land type-specific options because condition sequences usually contain calculated values that tend to be dissimilar for all zones:
>>> sm(185.13164, 181.18755, 199.80432, 196.55888, 212.04018, 209.48859,
... 222.12115, 220.12671, 230.30756, 228.70779, 236.91943, 235.64427)
>>> uz(7.25228)
Setting the conditions of submodels requires writing the complete paths to the respective sequences (we might add a more convenient syntax based on the with statement later):
>>> model.rconcmodel.sequences.logs.quh(0.0)
Series files¶
HydPy currently supports three different time series file formats, of which the ASCII and the NetCDF-CF format should be the right choice in most applications.
HydPy’s ASCII format (file ending “.asc”) is simpler but less efficient. Each file
stores the time series of one sequence type for one element or node. By default, the
filename follows a strict pattern, defined by the filename
property of
class IOSequence
. land_dill_assl_hland_96_input_p.asc is an example filename of a
time series managed by an element (or, more precisely, an element’s model) and
dill_ass_obs_q.asc is an example filename of a time series managed by a node. Both
start with the respective device names (land_dill_assl and dill_assl), defined by
the properties descr_device
of class ModelSequence
and
descr_device
of class NodeSequence
. The rest of the files’ basenames
are defined by property descr_sequence
of class ModelSequence
, which
combines the model type (hland_96), the sequence group (input) and the sequence
type (p), and property descr_sequence
of class NodeSequence
, which
combines the sequence type (obs) and the handled variable (q).
Internally, each ASCII file starts with information about the covered data period and
the temporal resolution, described via a Timegrid
instance. Consider the following
example:
>>> from hydpy import Timegrid
>>> timegrid = Timegrid("1996-01-01", "1996-01-05", "1d")
The two dates define the start of the first and the end of the last data interval.
Hence, the example Timegrid
instance would be suitable for a time series file
containing, for example, the precipitation sums of four days:
>>> assert len(timegrid) == 4
The data section after the Timegrid
header contains no time stamps. So, temporal
equidistance is strictly required, with missing values marked as nan
. The
individual time series of non-scalar sequences are placed in tab-separated columns.
HydPy’s NetCDF-CF file format (file ending “.nc”) is much more compact, usually times faster, and supports reading and writing data “just in time” during simulation runs. On the downside, it is more opaque and hard to handle because it stores all data in binary form. It follows the NetCDF Climate and Forecast (CF) Metadata Conventions and is, for example, compatible with Delft-FEWS.
You can use function summarise_ncfile()
to gain insights into HydPy-compatible NetCDF
files. Here, we let it show the structure of the NetCDF precipitation input file of
the HydPy-H-Lahn example project:
>>> filepath = os.path.join(
... data.__path__[0], "HydPy-H-Lahn", "series", "default", "hland_96_input_p.nc"
... )
>>> from hydpy import repr_, summarise_ncfile
>>> print(repr_(summarise_ncfile(filepath)))
GENERAL
file path = .../hydpy/data/HydPy-H-Lahn/series/default/hland_96_input_p.nc
file format = NETCDF4
disk format = HDF5
Attributes
hydts_timeRef = begin
title = Daily total precipitation sum HydPy-H-HBV96 model river Lahn
project = Open Source Project HydPy - A Python framework for the development and application of hydrological models
version = v5.0
institution = HydPy Developers
author = Bastian Klein (klein@bafg.de)
contact = Bastian Klein (klein@bafg.de), Dennis Meissner (meissner@bafg.de)
source = Deutscher Wetterdienst, gridded_precipitation_dataset_(HYRAS-DE PRE) v5.0 spatially averaged over HydPy-H-HBV96 subbasins
conditions_of_use = The use of the data is free of charge. The data is licensed under Attribution-NonCommercial-ShareAlike International 4.0 (CC BY-NC-SA 4.0).
citation = Klein, B. & D. Meissner (2024): Daily total precipitation sum HydPy-H-HBV96 model river Lahn. HydPy Developers [Data set]. Database Deutscher Wetterdienst gridded_precipitation_dataset_(HYRAS-DE PRE) v5.0 spatially averaged over HydPy-H-HBV96 subbasins
url = https://github.com/hydpy-dev
Conventions = CF-1.8
timereference = left interval boundary
history = 2024-09-12 10:08:29 UTC: created by R-package hydts using ncdf4 package
date_created = 2024-09-12 10:08:29 UTC
created_by = R version 4.4.0 Patched (2024-05-13 r86547 ucrt), packages hydts (version 1.15.1), ncdf4 (version 1.22)
DIMENSIONS
stations = 4
time = 11384
str_len = 40
VARIABLES
time
dimensions = time
shape = 11384
data type = float64
Attributes
units = days since 1900-01-01 00:00:00 +0100
long_name = time
axis = T
calendar = standard
hland_96_input_p
dimensions = time, stations
shape = 11384, 4
data type = float64
Attributes
units = mm
_FillValue = -9999.0
long_name = Daily Precipitation Sum
station_id
dimensions = stations, str_len
shape = 4, 40
data type = |S1
Attributes
long_name = station or node identification code
station_names
dimensions = stations, str_len
shape = 4, 40
data type = |S1
Attributes
long_name = station or node name
river_names
dimensions = stations, str_len
shape = 4, 40
data type = |S1
Attributes
long_name = river name
TIME GRID
first date = 1989-11-01 00:00:00+01:00
last date = 2021-01-01 00:00:00+01:00
step size = 1d
The time series of all sequences of the same type are stored in one file. So, by default, a NetCDF filename is shorter than an ASCII filename as it does not need a device-specific prefix (for example, hland_96_input_p.nc instead of land_dill_assl_hland_96_input_p.asc and obs_q.asc instead of dill_assl_obs_q.asc). The device names are instead managed by a file-internal NetCDF variable named station_id, whose shape is determined by the NetCDF dimensions stations (usually the number of devices, but see below) and char_leng_name (usually the longest device name, but see below). The name of the data variable agrees with the respective file basename (in the given examples, hland_96_input_p and obs_q).
The second NetCDF variable used for describing the data layout is named time, whose
shape is determined by a NetCDF dimension also named time. This variable contains
floating point numbers representing, for example, the elapsed days between a reference
date and the actual date (see method to_cfunits()
of class Date
for some
examples).
As far as we know, the NetCDF-CF convention does not clarify if these time points define the start or the end time points of data measurement intervals (left timestep vs right timestamp). As a surrogate, HydPy inserts an attribute named timereference when writing a NetCDF file, with the possible values left interval boundary and right interval boundary for “interval data” and current time for “time point data”. We advise also adding this attribute when using other tools for writing NetCDF files to be read by HydPy.
The time series are aligned in a (2-dimensional) matrix, with the first axis reflecting the time and the second axis reflecting the location. There are additional columns for multi-dimensional sequences that address sublocations (for example, hydrological response units). The station_id variable distinguishes them by suffixing their indexes to the device name.
HydPy writes all floating-point data with a precision of 64 bits. It is not strictly required but, in some situations, might prove beneficial to design externally prepared NetCDF files with the same accuracy.
See the documentation on module netcdftools
, which uses many examples to explain the
NetCDF-CF format in more detail.
The third supported time series file format relies on the Numpy format (file ending “.npy”). It resembles the ASCII format but saves data in binary form. We only recommend if if one requires a more efficient alternative to the ASCII format and a less complex alternative to the NetCDF format.
All time series files can specify dates with or without time zone information. Without
time zone information, HydPy usually assumes the currently selected
utcoffset
, which defaults to +60 minutes. The only exception is for NetCDF
files, where it always assumes UTC+00 in compliance with the NetCDF-CF conventions.
A new HydPy feature, applicable for all file formats but only realised for the group of
input sequences so far, is the alternative usage of standard names. Class
StandardInputNames
lists these standard names, and the input sequences of all model
types reference one of them. When switching the time series naming
convention
from model-specific to HydPy, the filename
“land_dill_assl_hland_96_input_p.asc” becomes “land_dill_assl_precipitation.asc” and
“hland_96_input_p.nc” becomes “precipitation.nc”. Such a standardisation often means a
relevant simplification when dealing with multiple model types.