netcdftools¶
This module extends the features of module filetools for loading data from and storing data to netCDF4 files, consistent with the NetCDF Climate and Forecast (CF) Metadata Conventions.
Usually, we apply the features implemented in this module only indirectly by using the context managers netcdfreading() and netcdfwriting(). However, here we try to be a little more explicit by using their underlying methods. Therefore, we need to follow three steps:
1. Call either method open_netcdfreader() or method open_netcdfwriter() of the SequenceManager object available in module pub to prepare a NetCDFInterface object for reading or writing.
2. Call either the usual reading or writing methods of other HydPy classes, like method load_fluxseries() of class HydPy or method save_stateseries() of class Elements. The prepared NetCDFInterface object collects all requests of those sequences one wants to read from or write to NetCDF files.
3. Finalise reading or writing by calling either method close_netcdfreader() or close_netcdfwriter().
Step 2 is a logging process only, telling the NetCDFInterface
object which data needs
to be read or written. The actual reading from or writing to NetCDF files is triggered
by step 3.
During step 2, the NetCDFInterface
object and its subobjects are accessible, allowing you to inspect their current state or to modify their behaviour.
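For orientation, the following sketch condenses these three steps into the context-manager form mentioned above. It is not executed here and assumes that suitable nodes and elements objects are already prepared (for example, via prepare_io_example_1()):
>>> from hydpy import pub  # doctest: +SKIP
>>> pub.sequencemanager.filetype = "nc"  # doctest: +SKIP
>>> with pub.sequencemanager.netcdfwriting():  # doctest: +SKIP
...     # Entering the context manager prepares a NetCDFInterface object (step 1).
...     # The following calls only log which time series shall be written (step 2).
...     nodes.save_allseries()
...     elements.save_allseries()
Leaving the with block triggers the actual writing and closes the writer (step 3); netcdfreading() works analogously for reading data.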
The following real code examples show how to perform these three steps both for reading
and writing data, based on the example configuration defined by function
prepare_io_example_1()
:
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
(1) We prepare a NetCDFInterface
object for writing data by calling the method
open_netcdfwriter()
:
>>> from hydpy import pub
>>> pub.sequencemanager.open_netcdfwriter()
(2) We tell the
SequenceManager
to write all the time-series data to NetCDF files:
>>> pub.sequencemanager.filetype = "nc"
(3) We store all the time-series handled by the Node
and Element
objects of the
example dataset by calling save_allseries()
of class Nodes
and
save_allseries()
of class Elements
. (In real cases, you would not write the
with TestIO(): line. This code block makes sure we pollute the IO testing directory
instead of our current working directory):
>>> from hydpy import TestIO
>>> with TestIO():
... nodes.save_allseries()
... elements.save_allseries()
(4) We again log all sequences, but after telling the SequenceManager
to average each
time series spatially:
>>> with TestIO(), pub.sequencemanager.aggregation("mean"):
... nodes.save_allseries()
... elements.save_allseries()
(5) We can now navigate into the details of the logged time series data via the
NetCDFInterface
object and its subobjects. For example, we can query the logged flux
sequence objects of type NKor
belonging to application model lland_v1
(those of elements element1 and element2; the trailing numbers are the indices of
the relevant hydrological response units):
>>> writer = pub.sequencemanager.netcdfwriter
>>> writer.lland_v1_flux_nkor.subdevicenames
('element1_0', 'element2_0', 'element2_1')
(6) In the example discussed here, all sequences belong to the same folder (default). Storing sequences in separate folders goes hand in hand with storing them in separate NetCDF files. In such cases, you must include the folder in the attribute name:
>>> writer.foldernames
('default',)
>>> writer.default_lland_v1_flux_nkor.subdevicenames
('element1_0', 'element2_0', 'element2_1')
(7) We close the NetCDFInterface
object, which is when the actual writing to the NetCDF files takes place. After that, the interface object is no longer available:
>>> from hydpy import TestIO
>>> with TestIO():
... pub.sequencemanager.close_netcdfwriter()
>>> pub.sequencemanager.netcdfwriter
Traceback (most recent call last):
...
hydpy.core.exceptiontools.AttributeNotReady: The sequence file manager does currently handle no NetCDF writer object.
(8) We set the time series values of two test sequences to zero to demonstrate that reading the data back in actually works:
>>> nodes.node2.sequences.sim.series = 0.0
>>> elements.element2.model.sequences.fluxes.nkor.series = 0.0
(9) We move up a gear and prepare a NetCDFInterface
object for reading data, log
all NodeSequence
and ModelSequence
objects, and read their time series data from
the created NetCDF files. We temporarily disable the checkseries
option to
prevent raising an exception when reading incomplete data from the files:
>>> with TestIO(), pub.options.checkseries(False):
... pub.sequencemanager.open_netcdfreader()
... nodes.load_simseries()
... elements.load_allseries()
... pub.sequencemanager.close_netcdfreader()
(10) We check that the data is again available via the test sequences:
>>> nodes.node2.sequences.sim.series
InfoArray([64., 65., 66., 67.])
>>> elements.element2.model.sequences.fluxes.nkor.series
InfoArray([[16., 17.],
           [18., 19.],
           [20., 21.],
           [22., 23.]])
>>> pub.sequencemanager.netcdfreader
Traceback (most recent call last):
...
RuntimeError: The sequence file manager does currently handle no NetCDF reader object.
(11) We cannot invert spatial aggregation. Hence, reading averaged time series is left to postprocessing tools. To show that writing the averaged series worked, we access both relevant NetCDF files more directly using the underlying NetCDF4 library (note that averaging 1-dimensional time series, such as those of node sequence Sim, is allowed for the sake of consistency):
>>> from hydpy.core.netcdftools import netcdf4
>>> from numpy import array
>>> filepath = "project/series/default/node_sim_q_mean.nc"
>>> with TestIO(), netcdf4.Dataset(filepath) as ncfile:
... array(ncfile["sim_q"][:])
array([[60.],
       [61.],
       [62.],
       [63.]])
>>> filepath = "project/series/default/lland_v1_flux_nkor_mean.nc"
>>> with TestIO(), netcdf4.Dataset(filepath) as ncfile:
... array(ncfile["flux_nkor"][:])[:, 1]
array([16.5, 18.5, 20.5, 22.5])
Besides the testing-related specialities, the described workflow is more or less standard but allows for different modifications. We illustrate them in the documentation of the other features implemented in module netcdftools, as well as in the documentation on class SequenceManager of module filetools and on class IOSequence of module sequencetools.
Using the NetCDF format allows reading or writing data “just in time” during simulation
runs. The documentation of class HydPy explains how to select and set the relevant
IOSequence
objects for this option. See the documentation on method
provide_jitaccess()
of class NetCDFInterface
for more in-depth
information.
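For a first orientation on this “just in time” mode, the following sketch shows the typical preparation calls. It is not executed here and assumes a fully prepared HydPy instance named hp, as used in the examples of method provide_jitaccess():
>>> # Hedged sketch: read input data and write factor series "just in time".
>>> # Disabling RAM storage and enabling read_jit lets the input sequences read
>>> # their values from the corresponding NetCDF files at each simulation step;
>>> # write_jit writes the calculated factor series step by step.
>>> hp.prepare_inputseries(allocate_ram=False, read_jit=True)  # doctest: +SKIP
>>> hp.prepare_factorseries(allocate_ram=True, write_jit=True)  # doctest: +SKIP
>>> hp.simulate()  # doctest: +SKIP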
Module netcdftools
implements the following members:
str2chars()
Return a ndarray object containing the byte characters (second axis) of all given strings (first axis).
chars2str()
Inversion function of str2chars().
create_dimension()
Add a new dimension with the given name and length to the given NetCDF file.
create_variable()
Add a new variable with the given name, datatype, and dimensions to the given NetCDF file.
query_variable()
Return the variable with the given name from the given NetCDF file.
query_timegrid()
Return the Timegrid defined by the given NetCDF file.
query_array()
Return the data of the variable with the given name from the given NetCDF file.
get_filepath()
Return the path of the given NetCDF file.
JITAccessInfo
Helper class for structuring reading from or writing to a NetCDF file “just in time” during a simulation run for a specific NetCDFVariableFlat object.
JITAccessHandler
Handler used by the SequenceManager object available in module pub for reading data from and/or writing data to NetCDF files at each step of a simulation run.
NetCDFInterface
Interface between SequenceManager and multiple NetCDF files.
Subdevice2Index
Return type of method query_subdevice2index().
NetCDFVariableBase
Base class for NetCDFVariableAgg and NetCDFVariableFlat.
NetCDFVariableAgg
Relates objects of a specific IOSequence subclass with a single NetCDF variable for writing aggregated time series data.
NetCDFVariableFlat
Relates objects of a specific IOSequence subclass with a single NetCDF variable for reading or writing their complete time-series data.
- hydpy.core.netcdftools.dimmapping = {'nmb_characters': 'char_leng_name', 'nmb_subdevices': 'stations', 'nmb_timepoints': 'time'}¶
Dimension-related terms within NetCDF files.
You can change this mapping if it does not suit your requirements. For example, change the value of the keyword “nmb_subdevices” if you prefer to call this dimension “location” instead of “stations” within NetCDF files:
>>> from hydpy.core.netcdftools import dimmapping
>>> dimmapping["nmb_subdevices"] = "location"
- hydpy.core.netcdftools.varmapping = {'subdevices': 'station_id', 'timepoints': 'time'}¶
Variable-related terms within NetCDF files.
You can change this mapping if it does not suit your requirements. For example, change the value of the keyword “timepoints” if you prefer to call this variable “period” instead of “time” within NetCDF files:
>>> from hydpy.core.netcdftools import varmapping
>>> varmapping["timepoints"] = "period"
- hydpy.core.netcdftools.fillvalue = nan¶
Default fill value for writing NetCDF files.
You can set another
float
value before writing a NetCDF file:
>>> from hydpy.core import netcdftools
>>> netcdftools.fillvalue = -777.0
- hydpy.core.netcdftools.str2chars(strings: Sequence[str]) ndarray[Any, dtype[bytes]] [source]¶
Return a
ndarray
object containing the byte characters (second axis) of all given strings (first axis).
>>> from hydpy.core.netcdftools import str2chars
>>> str2chars(['street', 'St.', 'Straße', 'Str.'])
array([[b's', b't', b'r', b'e', b'e', b't', b''],
       [b'S', b't', b'.', b'', b'', b'', b''],
       [b'S', b't', b'r', b'a', b'\xc3', b'\x9f', b'e'],
       [b'S', b't', b'r', b'.', b'', b'', b'']], dtype='|S1')
>>> str2chars([])
array([], shape=(0, 0), dtype='|S1')
- hydpy.core.netcdftools.chars2str(chars: ndarray[Any, dtype[bytes]]) List[str] [source]¶
Inversion function of str2chars().
>>> from hydpy.core.netcdftools import chars2str
>>> chars2str([[b"s", b"t", b"r", b"e", b"e", b"t", b""], ... [b"S", b"t", b".", b"", b"", b"", b""], ... [b"S", b"t", b"r", b"a", b"\xc3", b"\x9f", b"e"], ... [b"S", b"t", b"r", b".", b"", b"", b""]]) ['street', 'St.', 'Straße', 'Str.']
>>> chars2str([])
[]
>>> chars2str([[b"s", b"t", b"r", b"e", b"e", b"t"], ... [b"S", b"t", b".", b"", b"", b""], ... [b"S", b"t", b"r", b"a", b"\xc3", b"e"], ... [b"S", b"t", b"r", b".", b"", b""]]) Traceback (most recent call last): ... ValueError: Cannot decode `b'Stra\xc3e'` (not UTF-8 compliant).
- hydpy.core.netcdftools.create_dimension(ncfile: Dataset, name: str, length: int) None [source]¶
Add a new dimension with the given name and length to the given NetCDF file.
Essentially,
create_dimension()
only calls the equally named method of the NetCDF library but adds information to possible error messages:
>>> from hydpy import TestIO
>>> from hydpy.core.netcdftools import netcdf4
>>> with TestIO():
...     ncfile = netcdf4.Dataset("test.nc", "w")
>>> from hydpy.core.netcdftools import create_dimension
>>> create_dimension(ncfile, "dim1", 5)
>>> dim = ncfile.dimensions["dim1"]
>>> dim.size if hasattr(dim, "size") else dim
5
>>> try:
...     create_dimension(ncfile, "dim1", 5)
... except BaseException as exc:
...     print(exc)
While trying to add dimension `dim1` with length `5` to the NetCDF file `test.nc`, the following error occurred: ...
>>> ncfile.close()
- hydpy.core.netcdftools.create_variable(ncfile: Dataset, name: str, datatype: str, dimensions: Sequence[str]) None [source]¶
Add a new variable with the given name, datatype, and dimensions to the given NetCDF file.
Essentially,
create_variable()
only calls the equally named method of the NetCDF library but adds information to possible error messages:
>>> from hydpy import TestIO
>>> from hydpy.core.netcdftools import netcdf4
>>> with TestIO():
...     ncfile = netcdf4.Dataset("test.nc", "w")
>>> from hydpy.core.netcdftools import create_variable
>>> try:
...     create_variable(ncfile, "var1", "f8", ("dim1",))
... except BaseException as exc:
...     print(str(exc).strip('"'))
While trying to add variable `var1` with datatype `f8` and dimensions `('dim1',)` to the NetCDF file `test.nc`, the following error occurred: ...
>>> from hydpy.core.netcdftools import create_dimension
>>> create_dimension(ncfile, "dim1", 5)
>>> create_variable(ncfile, "var1", "f8", ("dim1",))
>>> import numpy
>>> numpy.array(ncfile["var1"][:])
array([nan, nan, nan, nan, nan])
>>> ncfile.close()
- hydpy.core.netcdftools.query_variable(ncfile: Dataset, name: str) Variable [source]¶
Return the variable with the given name from the given NetCDF file.
Essentially,
query_variable()
only queries the variable via keyword access using the NetCDF library but adds information to possible error messages:
>>> from hydpy.core.netcdftools import query_variable
>>> from hydpy import TestIO
>>> from hydpy.core.netcdftools import netcdf4
>>> with TestIO():
...     file_ = netcdf4.Dataset("model.nc", "w")
>>> query_variable(file_, "flux_prec")
Traceback (most recent call last):
...
RuntimeError: NetCDF file `model.nc` does not contain variable `flux_prec`.
>>> from hydpy.core.netcdftools import create_variable
>>> create_variable(file_, "flux_prec", "f8", ())
>>> isinstance(query_variable(file_, "flux_prec"), netcdf4.Variable)
True
>>> file_.close()
- hydpy.core.netcdftools.query_timegrid(ncfile: Dataset, sequence: IOSequence) Timegrid [source]¶
Return the Timegrid defined by the given NetCDF file.
query_timegrid() relies on the timereference attribute of the given NetCDF file, if available, and falls back to the global timestampleft option when necessary. The NetCDF files of the LahnH example project (and all other NetCDF files written by HydPy) include such information:
>>> from hydpy.examples import prepare_full_example_2
>>> hp, pub, TestIO = prepare_full_example_2()
>>> from netCDF4 import Dataset
>>> filepath = "LahnH/series/default/hland_v1_input_p.nc"
>>> with TestIO(), Dataset(filepath) as ncfile:
...     ncfile.timereference
'left interval boundary'
We start our examples considering the input sequence P, which handles precipitation sums. query_timegrid() requires an instance of P to determine that each value of the time series of the NetCDF file references a time interval and not a time point:
>>> p = hp.elements.land_dill.model.sequences.inputs.p
If the file-specific setting does not collide with the current value of timestampleft, query_timegrid() works silently:
>>> from hydpy.core.netcdftools import query_timegrid
>>> with TestIO(), Dataset(filepath) as ncfile:
...     query_timegrid(ncfile, p)
Timegrid("1996-01-01 00:00:00", "2007-01-01 00:00:00", "1d")
If a file-specific setting is missing, query_timegrid() applies the current timestampleft value:
>>> with TestIO(), Dataset(filepath, "r+") as ncfile:
...     del ncfile.timereference
>>> from hydpy.core.testtools import warn_later
>>> with TestIO(), Dataset(filepath) as ncfile:
...     query_timegrid(ncfile, p)
Timegrid("1996-01-01 00:00:00", "2007-01-01 00:00:00", "1d")
>>> with TestIO(), Dataset(filepath) as ncfile, pub.options.timestampleft(False):
...     query_timegrid(ncfile, p)
Timegrid("1995-12-31 00:00:00", "2006-12-31 00:00:00", "1d")
If the file-specific setting and timestampleft conflict, query_timegrid() favours the file attribute and warns about this assumption:
>>> with TestIO(), Dataset(filepath, "r+") as ncfile:
...     ncfile.timereference = "right interval boundary"
>>> with TestIO(), warn_later(), Dataset(filepath) as ncfile:
...     query_timegrid(ncfile, p)
Timegrid("1995-12-31 00:00:00", "2006-12-31 00:00:00", "1d")
UserWarning: The `timereference` attribute (`right interval boundary`) of the NetCDF file `...hland_v1_input_p.nc` conflicts with the current value of the global `timestampleft` option (`True`). The file-specific information is prioritised.
State sequences like SM handle data for specific time points instead of time intervals. Their series vector contains the calculated values for the end of each simulation step. Hence, without file-specific information, query_timegrid() ignores the timestampleft option and follows the right interval boundary convention:
>>> sm = hp.elements.land_dill.model.sequences.states.sm
>>> with TestIO(), Dataset(filepath, "r+") as ncfile:
...     del ncfile.timereference
>>> with TestIO(), Dataset(filepath) as ncfile:
...     query_timegrid(ncfile, sm)
Timegrid("1995-12-31 00:00:00", "2006-12-31 00:00:00", "1d")
Add a timereference attribute with the value current time to explicitly include this information in a NetCDF file:
>>> with TestIO(), Dataset(filepath, "r+") as ncfile: ... ncfile.timereference = "current time" >>> with TestIO(), Dataset(filepath) as ncfile: ... query_timegrid(ncfile, sm) Timegrid("1995-12-31 00:00:00", "2006-12-31 00:00:00", "1d")
query_timegrid() raises special warnings when a NetCDF file’s timereference attribute conflicts with its judgement whether the contained data addresses time intervals or time points:
>>> with TestIO(), warn_later(), Dataset(filepath) as ncfile:
...     query_timegrid(ncfile, p)
Timegrid("1995-12-31 00:00:00", "2006-12-31 00:00:00", "1d")
UserWarning: The `timereference` attribute (`current time`) of the NetCDF file `...hland_v1_input_p.nc` conflicts with the type of the relevant sequence (`P`). The file-specific information is prioritised.
>>> with TestIO(), Dataset(filepath, "r+") as ncfile: ... ncfile.timereference = "left interval boundary" >>> with TestIO(), warn_later(), Dataset(filepath) as ncfile: ... query_timegrid(ncfile, sm) Timegrid("1996-01-01 00:00:00", "2007-01-01 00:00:00", "1d") UserWarning: The `timereference` attribute (`left interval boundary`) of the NetCDF file `...hland_v1_input_p.nc` conflicts with the type of the relevant sequence (`SM`). The file-specific information is prioritised.
query_timegrid() also raises specific warnings for misstated timereference attributes, describing the different fallbacks for data related to time intervals and time points:
>>> with TestIO(), Dataset(filepath, "r+") as ncfile:
...     ncfile.timereference = "wrong"
>>> with TestIO(), warn_later(), Dataset(filepath) as ncfile:
...     query_timegrid(ncfile, p)
Timegrid("1996-01-01 00:00:00", "2007-01-01 00:00:00", "1d")
UserWarning: The value of the `timereference` attribute (`wrong`) of the NetCDF file `...hland_v1_input_p.nc` is not among the accepted values (`left...`, `right...`, `current...`). Assuming `left interval boundary` according to the current value of the global `timestampleft` option.
>>> with TestIO(), warn_later(), Dataset(filepath) as ncfile:
...     query_timegrid(ncfile, sm)
Timegrid("1995-12-31 00:00:00", "2006-12-31 00:00:00", "1d")
UserWarning: The value of the `timereference` attribute (`wrong`) of the NetCDF file `...hland_v1_input_p.nc` is not among the accepted values (`left...`, `right...`, `current...`). Assuming `current time` according to the type of the relevant sequence (`SM`).
- hydpy.core.netcdftools.query_array(ncfile: Dataset, name: str) ndarray[Any, dtype[float64]] [source]¶
Return the data of the variable with the given name from the given NetCDF file.
The following example shows that query_array() returns nan entries for representing missing values even when the respective NetCDF variable defines a different fill value:
>>> from hydpy import TestIO
>>> from hydpy.core import netcdftools
>>> from hydpy.core.netcdftools import netcdf4, create_dimension, create_variable
>>> import numpy
>>> with TestIO():
...     with netcdf4.Dataset("test.nc", "w") as ncfile:
...         create_dimension(ncfile, "time", 2)
...         create_dimension(ncfile, "stations", 3)
...         netcdftools.fillvalue = -999.0
...         create_variable(ncfile, "var", "f8", ("time", "stations"))
...         netcdftools.fillvalue = numpy.nan
...     ncfile = netcdf4.Dataset("test.nc", "r")
>>> from hydpy.core.netcdftools import query_variable, query_array
>>> query_variable(ncfile, "var")[:].data
array([[-999., -999., -999.],
       [-999., -999., -999.]])
>>> query_array(ncfile, "var")
array([[nan, nan, nan],
       [nan, nan, nan]])
>>> ncfile.close()
Usually, HydPy expects all data variables in NetCDF files to be 2-dimensional, with time on the first and location on the second axis. However, query_array() allows for an exception for compatibility with Delft-FEWS. When working with ensembles, Delft-FEWS defines a third dimension called realization and puts it between the first dimension (time) and the last dimension (stations). In our experience, this additional dimension is always of length one, meaning we can safely ignore it:
>>> with TestIO():
...     with netcdf4.Dataset("test.nc", "w") as ncfile:
...         create_dimension(ncfile, "time", 2)
...         create_dimension(ncfile, "realization", 1)
...         create_dimension(ncfile, "stations", 3)
...         var = create_variable(ncfile, "var", "f8",
...                               ("time", "realization", "stations"))
...         ncfile["var"][:] = [[[1.1, 1.2, 1.3]], [[2.1, 2.2, 2.3]]]
...     ncfile = netcdf4.Dataset("test.nc", "r")
>>> var = query_variable(ncfile, "var")[:]
>>> var.shape
(2, 1, 3)
>>> query_array(ncfile, "var").shape
(2, 3)
>>> query_array(ncfile, "var")
array([[1.1, 1.2, 1.3],
       [2.1, 2.2, 2.3]])
>>> ncfile.close()
query_array() raises errors if the dimensionality is smaller than two or larger than three, or if there are three dimensions and the length of the second dimension is not one:
>>> with TestIO():
...     with netcdf4.Dataset("test.nc", "w") as ncfile:
...         create_dimension(ncfile, "time", 2)
...         var = create_variable(ncfile, "var", "f8", ("time",))
...     with netcdf4.Dataset("test.nc", "r") as ncfile:
...         query_array(ncfile, "var")
Traceback (most recent call last):
...
RuntimeError: Variable `var` of NetCDF file `test.nc` must be 2-dimensional (or 3-dimensional with a length of one on the second axis) but has the shape `(2,)`.
>>> with TestIO():
...     with netcdf4.Dataset("test.nc", "w") as ncfile:
...         create_dimension(ncfile, "time", 2)
...         create_dimension(ncfile, "realization", 2)
...         create_dimension(ncfile, "stations", 3)
...         var = create_variable(ncfile, "var", "f8",
...                               ("time", "realization", "stations"))
...     with netcdf4.Dataset("test.nc", "r") as ncfile:
...         query_array(ncfile, "var")
Traceback (most recent call last):
...
RuntimeError: Variable `var` of NetCDF file `test.nc` must be 2-dimensional (or 3-dimensional with a length of one on the second axis) but has the shape `(2, 2, 3)`.
The skipping of the realization axis is very specific to Delft-FEWS. To prevent hiding problems when reading erroneous data from other sources, query_array() emits the following warning if the name of the second dimension is not realization:
>>> from hydpy.core.testtools import warn_later
>>> with TestIO():
...     with netcdf4.Dataset("test.nc", "w") as ncfile:
...         create_dimension(ncfile, "time", 2)
...         create_dimension(ncfile, "realisation", 1)
...         create_dimension(ncfile, "stations", 3)
...         var = create_variable(ncfile, "var", "f8",
...                               ("time", "realisation", "stations"))
...     with netcdf4.Dataset("test.nc", "r") as ncfile, warn_later():
...         query_array(ncfile, "var")
array([[nan, nan, nan],
       [nan, nan, nan]])
UserWarning: Variable `var` of NetCDF file `test.nc` is 3-dimensional and the length of the second dimension is one, but its name is `realisation` instead of `realization`.
- hydpy.core.netcdftools.get_filepath(ncfile: Dataset) str [source]¶
Return the path of the given NetCDF file.
>>> from hydpy import TestIO
>>> from hydpy.core.netcdftools import netcdf4
>>> from hydpy.core.netcdftools import get_filepath
>>> with TestIO():
...     with netcdf4.Dataset("test.nc", "w") as ncfile:
...         get_filepath(ncfile)
'test.nc'
- class hydpy.core.netcdftools.JITAccessInfo(ncvariable: netcdf4.Variable, realisation: bool, timedelta: int, columns: Tuple[int, ...], data: NDArrayFloat)[source]¶
Bases:
NamedTuple
Helper class for structuring reading from or writing to a NetCDF file “just in time” during a simulation run for a specific
NetCDFVariableFlat
object.
- ncvariable: Variable¶
Variable for the direct access to the relevant section of the NetCDF file.
- realisation: bool¶
Flag that indicates if the relevant
ncvariable
comes with an additional realization dimension (explained in the documentation on function query_array()).
- timedelta: int¶
Difference between the relevant row of the NetCDF file and the current simulation index (as defined by
Idx_Sim
).
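The description above implies the following relationship, given here only as an illustrative sketch (idx_sim and info are hypothetical placeholder names for the current simulation index and a JITAccessInfo instance, not attributes defined by this class):
>>> # Hedged sketch: the row of the NetCDF variable corresponding to the current
>>> # simulation step results from adding the timedelta offset to the simulation index.
>>> row = idx_sim + info.timedelta  # doctest: +SKIP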
- class hydpy.core.netcdftools.JITAccessHandler(readers: Tuple[JITAccessInfo, ...], writers: Tuple[JITAccessInfo, ...])[source]¶
Bases:
NamedTuple
Handler used by the
SequenceManager
object available in module pub
for reading data from and/or writing data to NetCDF files at each step of a simulation run.
- readers: Tuple[JITAccessInfo, ...]¶
All
JITAccessInfo
objects responsible for reading data during the simulation run.
- writers: Tuple[JITAccessInfo, ...]¶
All
JITAccessInfo
objects responsible for writing data during the simulation run.
- class hydpy.core.netcdftools.NetCDFInterface[source]¶
Bases:
object
Interface between
SequenceManager
and multiple NetCDF files.The core task of class
NetCDFInterface
is to distribute different IOSequence
objects on multiple instances of class NetCDFVariableBase.
(1) We prepare a
SequenceManager
object and some devices handling different sequences by applying function prepare_io_example_1():
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
(2) We collect all sequences used in the following examples except
NKor
of element element1, which we reserve for special tests:
>>> sequences = []
>>> for node in nodes:
...     sequences.append(node.sequences.sim)
>>> for element in elements:
...     if element.model.name == "hland_v1":
...         sequences.append(element.model.sequences.states.sp)
...     else:
...         sequences.append(element.model.sequences.inputs.nied)
...         if element.name != "element1":
...             sequences.append(element.model.sequences.fluxes.nkor)
(3) We prepare a
NetCDFInterface
object and log and write all test sequences except NKor
of element element1. NetCDFInterface initialises one NetCDFVariableFlat and one NetCDFVariableAgg object for each IOSequence subtype:
>>> from hydpy.core.netcdftools import NetCDFInterface
>>> interface = NetCDFInterface()
>>> len(interface)
0
>>> from hydpy import pub, TestIO
>>> with TestIO():
...     for sequence in sequences:
...         _ = interface.log(sequence, sequence.series)
...         _ = interface.log(sequence, sequence.average_series())
>>> len(interface)
14
We change the relevant directory before logging the reserved sequence.
NetCDFInterface
initialises two new NetCDFVariableBase
objects, despite other NetCDFVariableBase
objects related to the same sequence type being already available:
>>> nkor = elements.element1.model.sequences.fluxes.nkor
>>> with TestIO():
...     pub.sequencemanager.currentdir = "test"
...     _ = interface.log(nkor, nkor.series)
...     _ = interface.log(nkor, nkor.average_series())
>>> len(interface)
16
You can query all relevant folder names, filenames and variable names via properties
foldernames, filenames, and variablenames:
>>> from hydpy import print_values
>>> print_values(interface.foldernames)
default, test
>>> print_values(interface.filenames)
hland_v1_state_sp, hland_v1_state_sp_mean, lland_v1_flux_nkor, lland_v1_flux_nkor_mean, lland_v1_input_nied, lland_v1_input_nied_mean, lland_v2_flux_nkor, lland_v2_flux_nkor_mean, lland_v2_input_nied, lland_v2_input_nied_mean, node_sim_q, node_sim_q_mean, node_sim_t, node_sim_t_mean
>>> interface.variablenames
('flux_nkor', 'input_nied', 'sim_q', 'sim_t', 'state_sp')
NetCDFInterface
provides attribute access to its NetCDFVariableBase
instances, both via their filenames and the combination of its folder names and filenames:>>> interface.node_sim_q is interface.default_node_sim_q True >>> print_values(sorted(set(dir(interface)) - set(object.__dir__(interface)))) default_hland_v1_state_sp, default_hland_v1_state_sp_mean, default_lland_v1_flux_nkor, default_lland_v1_flux_nkor_mean, default_lland_v1_input_nied, default_lland_v1_input_nied_mean, default_lland_v2_flux_nkor, default_lland_v2_flux_nkor_mean, default_lland_v2_input_nied, default_lland_v2_input_nied_mean, default_node_sim_q, default_node_sim_q_mean, default_node_sim_t, default_node_sim_t_mean, hland_v1_state_sp, hland_v1_state_sp_mean, lland_v1_input_nied, lland_v1_input_nied_mean, lland_v2_flux_nkor, lland_v2_flux_nkor_mean, lland_v2_input_nied, lland_v2_input_nied_mean, node_sim_q, node_sim_q_mean, node_sim_t, node_sim_t_mean, test_lland_v1_flux_nkor, test_lland_v1_flux_nkor_mean
If multiple NetCDF files have the same name, you must prefix the relevant folder name:
>>> interface.lland_v1_flux_nkor
Traceback (most recent call last):
...
AttributeError: The current NetCDFInterface object handles multiple NetCDF files named `lland_v1_flux_nkor`. Please be more specific.
>>> hasattr(interface, "default_lland_v1_flux_nkor")
True
NetCDFInterface
raises the following error for completely wrong attribute names:
>>> interface.lland_v1
Traceback (most recent call last):
...
AttributeError: The current NetCDFInterface object neither handles a NetCDF file named `lland_v1` nor does it define a member named `lland_v1`.
(4) We write all NetCDF files into the default folder of the testing directory, defined by
prepare_io_example_1()
>>> from hydpy import TestIO
>>> with TestIO():
...     interface.write()
(5) We define a shorter initialisation period and re-activate the time series of the test sequences:
>>> from hydpy import pub
>>> pub.timegrids = "02.01.2000", "04.01.2000", "1d"
>>> for sequence in sequences:
...     sequence.prepare_series(allocate_ram=False)
...     sequence.prepare_series(allocate_ram=True)
>>> nkor.prepare_series(allocate_ram=False)
>>> nkor.prepare_series(allocate_ram=True)
(6) We again initialise class
NetCDFInterface
, log all test sequences, and read the test data of the defined subperiod:
>>> interface = NetCDFInterface()
>>> with TestIO():
...     _ = interface.log(nkor, nkor.series)
...     pub.sequencemanager.currentdir = "default"
...     for sequence in sequences:
...         _ = interface.log(sequence, None)
...     interface.read()
>>> nodes.node1.sequences.sim.series
InfoArray([61., 62.])
>>> elements.element2.model.sequences.fluxes.nkor.series
InfoArray([[18., 19.],
           [20., 21.]])
>>> elements.element4.model.sequences.states.sp.series
InfoArray([[[74., 75., 76.],
            [77., 78., 79.]],

           [[80., 81., 82.],
            [83., 84., 85.]]])
- log(sequence: IOSequence, infoarray: InfoArray | None = None) NetCDFVariableFlat | NetCDFVariableAgg [source]¶
Prepare a
NetCDFVariableBase
object suitable for the given IOSequence
object, when necessary, and pass the given arguments to its log()
method.
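The following sketch (not executed here) illustrates the two typical calls, assuming an already prepared NetCDFInterface instance named interface and an arbitrary IOSequence object named sequence, as in the class-level examples above:
>>> # Hedged sketch: log a sequence for writing (pass the data to be written) or
>>> # for reading (pass None); both calls return the responsible NetCDF variable object.
>>> variable = interface.log(sequence, sequence.series)  # doctest: +SKIP
>>> variable = interface.log(sequence, None)  # doctest: +SKIP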
- read() None [source]¶
Call method
read()
of all handled NetCDFVariableBase
objects.
- write() None [source]¶
Call method
write()
of all handled NetCDFVariableBase
objects.
- provide_jitaccess(deviceorder: Iterable[Node | Element]) Iterator[JITAccessHandler] [source]¶
Allow method
simulate()
of class HydPy
to read data from or write data to NetCDF files “just in time” during simulation runs.
We consider it unlikely that users will ever need to call the method provide_jitaccess() directly. See the documentation on class HydPy on applying it indirectly. However, the following explanations might give some additional insights into the options and limitations of the related functionalities.
You can only either read from or write to each NetCDF file. We think this should rarely be a limitation for the anticipated workflows. One particular situation where one might try to read and write simultaneously is when trying to overwrite some of the available input data. The following example tries to read the input data for all “headwater” catchments from specific NetCDF files but defines zero input values for all “non-headwater” catchments and tries to write them into the same files:
>>> from hydpy.examples import prepare_full_example_1 >>> prepare_full_example_1() >>> from hydpy import HydPy, print_values, pub, TestIO >>> with TestIO(): ... hp = HydPy("LahnH") ... pub.timegrids = "1996-01-01", "1996-01-05", "1d" ... hp.prepare_network() ... hp.prepare_models() ... hp.load_conditions() ... headwaters = pub.selections["headwaters"].elements ... nonheadwaters = pub.selections["nonheadwaters"].elements ... headwaters.prepare_inputseries(allocate_ram=False, read_jit=True) ... nonheadwaters.prepare_inputseries(allocate_ram=True, write_jit=True) ... for element in nonheadwaters: ... for sequence in element.model.sequences.inputs: ... sequence.series = 0.0 ... hp.simulate() Traceback (most recent call last): ... RuntimeError: While trying to prepare NetCDF files for reading or writing data "just in time" during the current simulation run, the following error occurred: For a specific NetCDF file, you can either read or write data during a simulation run but for file `...hland_v1_input_p.nc` both is requested.
Clearly, each NetCDF file we want to read data from needs to span the current simulation period:
>>> with TestIO(): ... pub.timegrids.init.firstdate = "1990-01-01" ... pub.timegrids.sim.firstdate = "1995-01-01" ... hp.prepare_inputseries(allocate_ram=False, read_jit=True) ... hp.simulate() Traceback (most recent call last): ... RuntimeError: While trying to prepare NetCDF files for reading or writing data "just in time" during the current simulation run, the following error occurred: The data of the NetCDF `...hland_v1_input_p.nc` (Timegrid("1996-01-01 00:00:00", "2007-01-01 00:00:00", "1d")) does not correctly cover the current simulation period (Timegrid("1995-01-01 00:00:00", "1996-01-05 00:00:00", "1d")).
However, each NetCDF file selected for writing must also cover the complete initialisation period. If there is no adequately named NetCDF file,
provide_jitaccess()
creates a new one for the current initialisation period. If an adequately named file exists, provide_jitaccess()
uses it without any attempt to extend it temporally or spatially. The following example shows the insertion of the output data of two subsequent simulation runs into the same NetCDF files:>>> with TestIO(): ... pub.timegrids = "1996-01-01", "1996-01-05", "1d" ... hp.prepare_inputseries(allocate_ram=False, read_jit=True) ... hp.prepare_factorseries(allocate_ram=True, write_jit=True) ... pub.timegrids.sim.lastdate = "1996-01-03" ... hp.simulate() ... pub.timegrids.sim.firstdate = "1996-01-03" ... pub.timegrids.sim.lastdate = "1996-01-05" ... hp.simulate() >>> print_values(hp.elements["land_dill"].model.sequences.factors.tmean.series) -0.572053, -1.084746, -2.767055, -6.242055 >>> from hydpy.core.netcdftools import netcdf4 >>> filepath = "LahnH/series/default/hland_v1_factor_tmean.nc" >>> with TestIO(), netcdf4.Dataset(filepath, "r") as ncfile: ... print_values(ncfile["factor_tmean"][:, 0]) -0.572053, -1.084746, -2.767055, -6.242055
Under particular circumstances, the data variable of a NetCDF file can be 3-dimensional. The documentation on function
query_array()
explains this in detail. The following example demonstrates that reading and writing such 3-dimensional variables “just in time” works correctly. Therefore, we add a realization dimension to the input file hland_v1_input_t.nc (part of the example project data) and the output file hland_v1_factor_tmean.nc (written in the previous example) and use them for redefining their data variables with this additional dimension. As expected, the results are the same as in the previous example:>>> with TestIO(): ... for name in ("input_t", "factor_tmean"): ... filepath = f"LahnH/series/default/hland_v1_{name}.nc" ... with netcdf4.Dataset(filepath, "r+") as ncfile: ... ncfile.renameVariable(name, "old") ... _ = ncfile.createDimension("realization", 1) ... var = ncfile.createVariable( ... name, "f8", dimensions=("time", "realization", "stations")) ... var[:] = ncfile["old"][:] if name == "input_t" else -999.0 ... pub.timegrids = "1996-01-01", "1996-01-05", "1d" ... hp.simulate() >>> with TestIO(), netcdf4.Dataset(filepath, "r") as ncfile: ... print_values(ncfile["factor_tmean"][:, 0, 0]) -0.572053, -1.084746, -2.767055, -6.242055
If we try to write the output of a simulation run beyond the original initialisation period into the same files,
provide_jitaccess()
raises the same error as above:
>>> with TestIO():
...     pub.timegrids = "1996-01-05", "1996-01-10", "1d"
...     hp.prepare_inputseries(allocate_ram=True, read_jit=False)
...     hp.prepare_factorseries(allocate_ram=True, write_jit=True)
...     hp.simulate()
Traceback (most recent call last):
...
RuntimeError: While trying to prepare NetCDF files for reading or writing data "just in time" during the current simulation run, the following error occurred: The data of the NetCDF `...hland_v1_factor_tmean.nc` (Timegrid("1996-01-01 00:00:00", "1996-01-05 00:00:00", "1d")) does not correctly cover the current simulation period (Timegrid("1996-01-05 00:00:00", "1996-01-10 00:00:00", "1d")).
>>> hp.prepare_factorseries(allocate_ram=False, write_jit=False)
Regarding the spatial dimension, things are similar. You can write data for different sequences in subsequent simulation runs, but you need to ensure all required data columns are available right from the start. Hence, relying on the automatic file generation of
provide_jitaccess()
fails in the following example:>>> with TestIO(): ... pub.timegrids = "1996-01-01", "1996-01-05", "1d" ... hp.prepare_inputseries(allocate_ram=False, read_jit=True) ... headwaters.prepare_fluxseries(allocate_ram=True, write_jit=True) ... hp.simulate() ... nonheadwaters.prepare_fluxseries(allocate_ram=True, write_jit=True) ... hp.simulate() Traceback (most recent call last): ... RuntimeError: While trying to prepare NetCDF files for reading or writing data "just in time" during the current simulation run, the following error occurred: No data for sequence `flux_pc` and (sub)device `land_lahn_2_0` in NetCDF file `...hland_v1_flux_pc.nc` available.
One way to prepare complete NetCDF files that are HydPy compatible is to work with an ordinary NetCDF writer object via
netcdfwriting()
:>>> with TestIO(), pub.sequencemanager.filetype("nc"): ... hp.prepare_fluxseries(allocate_ram=False, write_jit=False) ... hp.prepare_fluxseries(allocate_ram=True, write_jit=False) ... with pub.sequencemanager.netcdfwriting(): ... hp.save_fluxseries() ... headwaters.prepare_fluxseries(allocate_ram=True, write_jit=True) ... hp.load_conditions() ... hp.simulate() >>> for element in hp.elements.search_keywords("catchment"): ... print_values(element.model.sequences.fluxes.qt.series) 11.78038, 8.901179, 7.131072, 6.017787 9.647824, 8.517795, 7.781311, 7.344944 20.58932, 8.66144, 7.281198, 6.402232 11.674045, 10.110371, 8.991987, 8.212314 >>> filepath_qt = "LahnH/series/default/hland_v1_flux_qt.nc" >>> with TestIO(), netcdf4.Dataset(filepath_qt, "r") as ncfile: ... for jdx in range(4): ... print_values(ncfile["flux_qt"][:, jdx]) 11.78038, 8.901179, 7.131072, 6.017787 9.647824, 8.517795, 7.781311, 7.344944 0.0, 0.0, 0.0, 0.0 0.0, 0.0, 0.0, 0.0 >>> with TestIO(): ... headwaters.prepare_fluxseries(allocate_ram=True, write_jit=False) ... nonheadwaters.prepare_fluxseries(allocate_ram=True, write_jit=True) ... hp.load_conditions() ... hp.simulate() >>> with TestIO(), netcdf4.Dataset(filepath_qt, "r") as ncfile: # ... for jdx in range(4): ... print_values(ncfile["flux_qt"][:, jdx]) 11.78038, 8.901179, 7.131072, 6.017787 9.647824, 8.517795, 7.781311, 7.344944 20.58932, 8.66144, 7.281198, 6.402232 11.674045, 10.110371, 8.991987, 8.212314
>>> hp.prepare_fluxseries(allocate_ram=False, write_jit=False)
There should be no limitation for reading data “just in time” and using different
deploymode
options. For demonstration, we first calculate the time series of the Sim
sequences of all nodes, assign them to the corresponding Obs
sequences afterwards, and then start another simulation to (again) write both the simulated and the observed values to NetCDF files:>>> with TestIO(): ... hp.prepare_simseries(allocate_ram=True, write_jit=True) ... hp.prepare_obsseries(allocate_ram=True, write_jit=True) ... hp.load_conditions() ... hp.simulate() ... for idx, node in enumerate(hp.nodes): ... node.sequences.obs.series = node.sequences.sim.series ... hp.load_conditions() ... hp.simulate() >>> for node in hp.nodes: ... print_values(node.sequences.sim.series) 11.78038, 8.901179, 7.131072, 6.017787 9.647824, 8.517795, 7.781311, 7.344944 42.3697, 27.210443, 22.930066, 20.20133 54.043745, 37.320814, 31.922053, 28.413644 >>> for node in hp.nodes: ... print_values(node.sequences.obs.series) 11.78038, 8.901179, 7.131072, 6.017787 9.647824, 8.517795, 7.781311, 7.344944 42.3697, 27.210443, 22.930066, 20.20133 54.043745, 37.320814, 31.922053, 28.413644 >>> filepath_sim = "LahnH/series/default/node_sim_q.nc" >>> with TestIO(), netcdf4.Dataset(filepath_sim, "r") as ncfile: ... for jdx in range(4): ... print_values(ncfile["sim_q"][:, jdx]) 11.78038, 8.901179, 7.131072, 6.017787 9.647824, 8.517795, 7.781311, 7.344944 42.3697, 27.210443, 22.930066, 20.20133 54.043745, 37.320814, 31.922053, 28.413644 >>> filepath_obs = "LahnH/series/default/node_obs_q.nc" >>> with TestIO(), netcdf4.Dataset(filepath_obs, "r") as ncfile: ... for jdx in range(4): ... print_values(ncfile["obs_q"][:, jdx]) 11.78038, 8.901179, 7.131072, 6.017787 9.647824, 8.517795, 7.781311, 7.344944 42.3697, 27.210443, 22.930066, 20.20133 54.043745, 37.320814, 31.922053, 28.413644
Now we stop all sequences from writing to NetCDF files, remove the two headwater elements from the currently active selection, and start another simulation run. The time series of both headwater nodes are zero due to the missing inflow from their inlet headwater sub-catchments. The non-headwater nodes only receive inflow from the two non-headwater sub-catchments:
>>> with TestIO(): ... hp.prepare_simseries(allocate_ram=True, write_jit=False) ... hp.prepare_obsseries(allocate_ram=True, write_jit=False) ... hp.update_devices(nodes=hp.nodes, elements=hp.elements - headwaters) ... hp.load_conditions() ... hp.simulate() >>> for node in hp.nodes: ... print_values(node.sequences.sim.series) 0.0, 0.0, 0.0, 0.0 0.0, 0.0, 0.0, 0.0 30.58932, 8.66144, 7.281198, 6.402232 42.263365, 18.771811, 16.273185, 14.614546
Finally, we set the
deploymode
of the headwater nodes dill and lahn_1 to oldsim and obs, respectively, and read their previously written time series “just in time”. As expected, the values of the two non-headwater nodes are identical to those of our initial example:>>> with TestIO(): ... hp.nodes["dill"].prepare_simseries(allocate_ram=True, read_jit=True) ... hp.nodes["dill"].deploymode = "oldsim" ... hp.nodes["lahn_1"].prepare_obsseries(allocate_ram=True, read_jit=True) ... hp.nodes["lahn_1"].deploymode = "obs" ... hp.load_conditions() ... hp.simulate() >>> for node in hp.nodes: ... print_values(node.sequences.sim.series) 11.78038, 8.901179, 7.131072, 6.017787 0.0, 0.0, 0.0, 0.0 42.3697, 27.210443, 22.930066, 20.20133 54.043745, 37.320814, 31.922053, 28.413644
- property foldernames: Tuple[str, ...]¶
The names of all folders the sequences shall be read from or written to.
- property filenames: Tuple[str, ...]¶
The names of all relevant
NetCDFVariableBase
objects.
- property variablenames: Tuple[str, ...]¶
The names of all handled
NetCDFVariableBase
objects.
- class hydpy.core.netcdftools.Subdevice2Index(dict_: Dict[str, int], name_sequence: str, name_ncfile: str)[source]¶
Bases:
object
Return type of method
query_subdevice2index()
.
- class hydpy.core.netcdftools.NetCDFVariableBase(name: str, filepath: str)[source]¶
Bases:
ABC
Base class for
NetCDFVariableAgg
and NetCDFVariableFlat.
- log(sequence: IOSequence, infoarray: InfoArray | None) None [source]¶
Log the given
IOSequence
object either for reading or writing data.
When writing data, the second argument should be an InfoArray. When reading data, this argument is irrelevant. Pass None.
For writing, the infoarray argument allows for passing alternative data that replaces the original series of the IOSequence object, which helps write modified (e.g. spatially averaged) time series.
The logged time-series data is available via attribute access:
>>> from hydpy.core.netcdftools import NetCDFVariableBase
>>> from hydpy import make_abc_testable
>>> NCVar = make_abc_testable(NetCDFVariableBase)
>>> ncvar = NCVar("flux_nkor", "filepath.nc")
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
>>> nkor = elements.element1.model.sequences.fluxes.nkor
>>> ncvar.log(nkor, nkor.series)
>>> "element1" in dir(ncvar)
True
>>> ncvar.element1.sequence is nkor
True
>>> "element2" in dir(ncvar)
False
>>> ncvar.element2
Traceback (most recent call last):
...
AttributeError: The NetCDFVariable object `flux_nkor` does neither handle time series data under the (sub)device name `element2` nor does it define a member named `element2`.
- abstract property array: ndarray[Any, dtype[float64]]¶
A
ndarray
containing the values of all logged sequences.
- insert_subdevices(ncfile: Dataset) None [source]¶
Insert a variable of the names of the (sub)devices of the logged sequences into the given NetCDF file.
We prepare a
NetCDFVariableBase
subclass with fixed (sub)device names:
>>> from hydpy.core.netcdftools import NetCDFVariableBase, chars2str
>>> from hydpy import make_abc_testable, TestIO
>>> from hydpy.core.netcdftools import netcdf4
>>> Var = make_abc_testable(NetCDFVariableBase)
>>> Var.subdevicenames = "element1", "element_2", "element_ß"
The first dimension of the added variable corresponds to the number of (sub)devices, and the second dimension to the number of characters of the longest (sub)device name:
>>> var = Var("var", "filename.nc") >>> with TestIO(): ... ncfile = netcdf4.Dataset("filename.nc", "w") >>> var.insert_subdevices(ncfile) >>> ncfile["station_id"].dimensions ('stations', 'char_leng_name') >>> ncfile["station_id"].shape (3, 10) >>> chars2str(ncfile["station_id"][:].data) ['element1', 'element_2', 'element_ß'] >>> ncfile.close()
- query_subdevices(ncfile: Dataset) List[str] [source]¶
Query the names of the (sub)devices of the logged sequences from the given NetCDF file.
We apply the function
query_subdevices()
on an empty NetCDF file. The error message shows that the method tries to query the (sub)device names:
>>> from hydpy.core.netcdftools import NetCDFVariableBase
>>> from hydpy import make_abc_testable, TestIO
>>> from hydpy.core.netcdftools import netcdf4
>>> Var = make_abc_testable(NetCDFVariableBase)
>>> Var.subdevicenames = "element1", "element_2"
>>> var = Var("flux_prec", "filename.nc")
>>> with TestIO():
...     ncfile = netcdf4.Dataset("filename.nc", "w")
>>> var.query_subdevices(ncfile)
Traceback (most recent call last):
...
RuntimeError: NetCDF file `filename.nc` does neither contain a variable named `flux_prec_station_id` nor `station_id` for defining the coordinate locations of variable `flux_prec`.
After inserting the (sub)device names, they can be queried and returned:
>>> var.insert_subdevices(ncfile)
>>> Var("flux_prec", "filename.nc").query_subdevices(ncfile)
['element1', 'element_2']
>>> ncfile.close()
- query_subdevice2index(ncfile: Dataset) Subdevice2Index [source]¶
Return a
Subdevice2Index
object that maps the (sub)device names to their position within the given NetCDF file.
Method query_subdevice2index() relies on query_subdevices(). The returned Subdevice2Index
object remembers the NetCDF file from which the (sub)device names stem, allowing for clear error messages:
>>> from hydpy.core.netcdftools import NetCDFVariableBase, str2chars
>>> from hydpy import make_abc_testable, TestIO
>>> from hydpy.core.netcdftools import netcdf4
>>> with TestIO():
...     ncfile = netcdf4.Dataset("filename.nc", "w")
>>> Var = make_abc_testable(NetCDFVariableBase)
>>> Var.subdevicenames = ["element3", "element1", "element1_1", "element2"]
>>> var = Var("flux_prec", "filename.nc")
>>> var.insert_subdevices(ncfile)
>>> subdevice2index = var.query_subdevice2index(ncfile)
>>> subdevice2index.get_index("element1_1")
2
>>> subdevice2index.get_index("element3")
0
>>> subdevice2index.get_index("element5")
Traceback (most recent call last):
...
RuntimeError: No data for sequence `flux_prec` and (sub)device `element5` in NetCDF file `filename.nc` available.
Additionally,
query_subdevice2index()
checks for duplicates:
>>> ncfile["station_id"][:] = str2chars(
...     ["element3", "element1", "element1_1", "element1"])
>>> var.query_subdevice2index(ncfile)
Traceback (most recent call last):
...
RuntimeError: The NetCDF file `filename.nc` contains duplicate (sub)device names for variable `flux_prec` (the first found duplicate is `element1`).
>>> ncfile.close()
- abstract read() None [source]¶
Read the data from a NetCDF file.
Raise a
RuntimeError
if the relevant NetCDFVariableBase
subclass does not support reading data.
- write() None [source]¶
Write the data to a new NetCDF file.
See the general documentation on class
NetCDFVariableFlat
for some examples.
- class hydpy.core.netcdftools.NetCDFVariableAgg(name: str, filepath: str)[source]¶
Bases:
NetCDFVariableBase
Relates objects of a specific
IOSequence
subclass with a single NetCDF variable for writing aggregated time series data.
Essentially, class
NetCDFVariableAgg
is very similar to class NetCDFVariableFlat
but a little bit simpler, as it cannot read data from NetCDF files and always writes one column of data for each logged IOSequence
object. The following examples are a selection of the more thoroughly explained examples of the documentation on class NetCDFVariableFlat
:>>> from hydpy.examples import prepare_io_example_1 >>> nodes, (element1, element2, element3, element4) = prepare_io_example_1() >>> from hydpy.core.netcdftools import NetCDFVariableAgg >>> var_nied = NetCDFVariableAgg("input_nied_mean", "nied.nc") >>> var_nkor = NetCDFVariableAgg("flux_nkor_mean", "nkor.nc") >>> var_sp = NetCDFVariableAgg("state_sp_mean", "sp.nc") >>> for element in (element1, element2): ... nied = element.model.sequences.inputs.nied ... var_nied.log(nied, nied.average_series()) ... nkor = element.model.sequences.fluxes.nkor ... var_nkor.log(nkor, nkor.average_series()) >>> sp = element4.model.sequences.states.sp >>> var_sp.log(sp, sp.average_series()) >>> from hydpy import pub, TestIO >>> with TestIO(): ... var_nied.write() ... var_nkor.write() ... var_sp.write()
As
NetCDFVariableAgg
provides no reading functionality, we show that the aggregated values are readily available using the external NetCDF4 library:
>>> import numpy
>>> with TestIO(), netcdf4.Dataset("nied.nc", "r") as ncfile:
...     numpy.array(ncfile["input_nied_mean"][:])
array([[0., 4.],
       [1., 5.],
       [2., 6.],
       [3., 7.]])
>>> with TestIO(), netcdf4.Dataset("nkor.nc", "r") as ncfile: ... numpy.array(ncfile["flux_nkor_mean"][:]) array([[12. , 16.5], [13. , 18.5], [14. , 20.5], [15. , 22.5]])
>>> with TestIO(), netcdf4.Dataset("sp.nc", "r") as ncfile: ... numpy.array(ncfile["state_sp_mean"][:]) array([[70.5], [76.5], [82.5], [88.5]])
- property shape: Tuple[int, int]¶
Required shape of array.
The first axis corresponds to the number of timesteps and the second axis to the number of devices. We show this for the 1-dimensional flux sequence NKor:
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
>>> from hydpy.core.netcdftools import NetCDFVariableAgg
>>> ncvar = NetCDFVariableAgg("flux_nkor", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         ncvar.log(element.model.sequences.fluxes.nkor, None)
>>> ncvar.shape
(4, 3)
There is no difference for 2-dimensional sequences as aggregating their time series also results in 1-dimensional data:
>>> ncvar = NetCDFVariableAgg("state_sp", "filename.nc") >>> ncvar.log(elements.element4.model.sequences.states.sp, None) >>> ncvar.shape (4, 1)
- property array: ndarray[Any, dtype[float64]]¶
The aggregated data of all logged
IOSequence
objects contained in a single ndarray object.
The documentation on shape explains the structure of array. This first example confirms that the first axis corresponds to time while the second corresponds to the location:
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
>>> from hydpy.core.netcdftools import NetCDFVariableAgg
>>> ncvar = NetCDFVariableAgg("flux_nkor", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         nkor = element.model.sequences.fluxes.nkor
...         ncvar.log(nkor, nkor.average_series())
>>> ncvar.array
array([[12. , 16.5, 25. ],
       [13. , 18.5, 28. ],
       [14. , 20.5, 31. ],
       [15. , 22.5, 34. ]])
There is no difference for 2-dimensional sequences as aggregating their time series also results in 1-dimensional data:
>>> ncvar = NetCDFVariableAgg("state_sp", "filename.nc") >>> sp = elements.element4.model.sequences.states.sp >>> ncvar.log(sp, sp.average_series()) >>> ncvar.array array([[70.5], [76.5], [82.5], [88.5]])
- read() None [source]¶
Raise a
RuntimeError
in any case.
This method always raises the following exception to tell users why implementing a reading functionality is not possible:
>>> from hydpy.core.netcdftools import NetCDFVariableAgg
>>> NetCDFVariableAgg("flux_nkor", "filename.nc").read()
Traceback (most recent call last):
...
RuntimeError: The process of aggregating values (of sequence `flux_nkor` and other sequences as well) is not invertible.
- class hydpy.core.netcdftools.NetCDFVariableFlat(name: str, filepath: str)[source]¶
Bases:
NetCDFVariableBase
Relates objects of a specific
IOSequence
subclass with a single NetCDF variable for reading or writing their complete time-series data.
(1) We prepare some devices handling some sequences by applying the function
prepare_io_example_1()
. We limit our attention to the returned elements, which handle the more diverse sequences:
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, (element1, element2, element3, element4) = prepare_io_example_1()
(2) We define three
NetCDFVariableFlat
instances with different array structures and log the Nied and NKor sequences of the first two elements and SP of the fourth element:
>>> from hydpy.core.netcdftools import NetCDFVariableFlat
>>> var_nied = NetCDFVariableFlat("input_nied", "nied.nc")
>>> var_nkor = NetCDFVariableFlat("flux_nkor", "nkor.nc")
>>> var_sp = NetCDFVariableFlat("state_sp", "sp.nc")
>>> for element in (element1, element2):
...     seqs = element.model.sequences
...     var_nied.log(seqs.inputs.nied, seqs.inputs.nied.series)
...     var_nkor.log(seqs.fluxes.nkor, seqs.fluxes.nkor.series)
>>> sp = element4.model.sequences.states.sp
>>> var_sp.log(sp, sp.series)
(3) We write the data of all logged sequences to separate NetCDF files:
>>> from hydpy import TestIO
>>> with TestIO():
...     var_nied.write()
...     var_nkor.write()
...     var_sp.write()
(4) We set all values of the selected sequences to -777 and check that they are different from the original values available via the testarray attribute:
>>> seq1 = element1.model.sequences.inputs.nied
>>> seq2 = element2.model.sequences.fluxes.nkor
>>> seq3 = element4.model.sequences.states.sp
>>> import numpy
>>> for seq in (seq1, seq2, seq3):
...     seq.series = -777.0
...     print(numpy.any(seq.series == seq.testarray))
False
False
False
(5) Again, we prepare three
NetCDFVariableFlat
instances and log the same sequences as above, open the existing NetCDF files for reading, read their data, and confirm that it has been correctly passed to the test sequences:
>>> nied1 = NetCDFVariableFlat("input_nied", "nied.nc")
>>> nkor1 = NetCDFVariableFlat("flux_nkor", "nkor.nc")
>>> sp4 = NetCDFVariableFlat("state_sp", "sp.nc")
>>> for element in (element1, element2):
...     sequences = element.model.sequences
...     nied1.log(sequences.inputs.nied, None)
...     nkor1.log(sequences.fluxes.nkor, None)
>>> sp4.log(sp, None)
>>> with TestIO():
...     nied1.read()
...     nkor1.read()
...     sp4.read()
>>> for seq in (seq1, seq2, seq3):
...     print(numpy.all(seq.series == seq.testarray))
True
True
True
(6) Trying to read data not stored properly results in error messages like the following:
>>> nied1.log(element3.model.sequences.inputs.nied, None)
>>> with TestIO():
...     nied1.read()
Traceback (most recent call last):
...
RuntimeError: While trying to read data from NetCDF file `nied.nc`, the following error occurred: No data for sequence `input_nied` and (sub)device `element3` in NetCDF file `nied.nc` available.
- property shape: Tuple[int, int]¶
Required shape of array.
For 0-dimensional sequences like Nied, the first axis corresponds to the number of timesteps and the second axis to the number of devices:
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
>>> from hydpy.core.netcdftools import NetCDFVariableFlat
>>> ncvar = NetCDFVariableFlat("input_nied", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         ncvar.log(element.model.sequences.inputs.nied, None)
>>> ncvar.shape
(4, 3)
For 1-dimensional sequences like NKor, the second axis corresponds to “subdevices”. Here, these “subdevices” are hydrological response units of different elements. The model instances of the three elements define one, two, and three response units, respectively, making up a sum of six subdevices:
>>> ncvar = NetCDFVariableFlat("flux_nkor", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         ncvar.log(element.model.sequences.fluxes.nkor, None)
>>> ncvar.shape
(4, 6)
The above assertions also hold for 2-dimensional sequences like SP. In this specific case, each “subdevice” corresponds to a single snow class (one element times three zones times two snow classes makes six subdevices):
>>> ncvar = NetCDFVariableFlat("state_sp", "filename.nc")
>>> ncvar.log(elements.element4.model.sequences.states.sp, None)
>>> ncvar.shape
(4, 6)
- property array: ndarray[Any, dtype[float64]]¶
The series data of all logged
IOSequence
objects contained in one single ndarray object.
The documentation on shape explains the structure of array. The first example confirms that the first axis corresponds to time while the second corresponds to the location:
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
>>> from hydpy.core.netcdftools import NetCDFVariableFlat
>>> ncvar = NetCDFVariableFlat("input_nied", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         nied1 = element.model.sequences.inputs.nied
...         ncvar.log(nied1, nied1.series)
>>> ncvar.array
array([[ 0.,  4.,  8.],
       [ 1.,  5.,  9.],
       [ 2.,  6., 10.],
       [ 3.,  7., 11.]])
The flattening of higher dimensional sequences spreads the time-series of individual “subdevices” over the array’s columns. For the 1-dimensional sequence
NKor
, we find the time-series of both zones of the second element in columns two and three:
>>> ncvar = NetCDFVariableFlat("flux_nkor", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         nkor = element.model.sequences.fluxes.nkor
...         ncvar.log(nkor, nkor.series)
>>> ncvar.array[:, 1:3]
array([[16., 17.],
       [18., 19.],
       [20., 21.],
       [22., 23.]])
The above assertions also hold for 2-dimensional sequences like
SP
. In this specific case, each column contains the series of a single snow class:
>>> ncvar = NetCDFVariableFlat("state_sp", "filename.nc")
>>> sp = elements.element4.model.sequences.states.sp
>>> ncvar.log(sp, sp.series)
>>> ncvar.array
array([[68., 69., 70., 71., 72., 73.],
       [74., 75., 76., 77., 78., 79.],
       [80., 81., 82., 83., 84., 85.],
       [86., 87., 88., 89., 90., 91.]])
- property subdevicenames: Tuple[str, ...]¶
The names of the (sub)devices.
Property
subdevicenames
clarifies which column of array contains the series of which (sub)device. For 0-dimensional series like Nied, we require no subdivision. Hence, it returns the original device names:
>>> from hydpy.examples import prepare_io_example_1
>>> nodes, elements = prepare_io_example_1()
>>> from hydpy.core.netcdftools import NetCDFVariableFlat
>>> ncvar = NetCDFVariableFlat("input_nied", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         nied = element.model.sequences.inputs.nied
...         ncvar.log(nied, nied.series)
>>> ncvar.subdevicenames
('element1', 'element2', 'element3')
For 1-dimensional sequences like
NKor
, a suffix defines the index of the respective subdevice. For example, the third column of array
contains the series of the first hydrological response unit of the second element:
>>> ncvar = NetCDFVariableFlat("flux_nkor", "filename.nc")
>>> for element in elements:
...     if element.model.name.startswith("lland"):
...         nkor = element.model.sequences.fluxes.nkor
...         ncvar.log(nkor, nkor.series)
>>> ncvar.subdevicenames
('element1_0', 'element2_0', 'element2_1', 'element3_0', 'element3_1', 'element3_2')
2-dimensional sequences like
SP
require an additional suffix:
>>> ncvar = NetCDFVariableFlat("state_sp", "filename.nc")
>>> sp = elements.element4.model.sequences.states.sp
>>> ncvar.log(sp, sp.series)
>>> ncvar.subdevicenames
('element4_0_0', 'element4_0_1', 'element4_0_2', 'element4_1_0', 'element4_1_1', 'element4_1_2')
- read() None [source]¶
Read the data from the relevant NetCDF file.
See the general documentation on class
NetCDFVariableFlat
for some examples.