Programming style

Python allows for writing concise and easily readable software code that can be maintained and further developed with reasonable effort. However, code quality also depends on the experience and available time of the programmer writing it. In hydrology, much model code is written by PhD students with little programming experience who are under pressure not only to get their model running but also to tackle their scientific questions and publish their results. The source code resulting from such a rush is understandably often a mess. Even relatively good software often proves inadequate when transferred into practical applications or shared with other researchers.

In the long development process of HydPy, which also started as a quick side project during a PhD thesis, we made many misguided design decisions ourselves. However, through much effort spent in periods of refactoring and consolidation, we arrived at a software architecture that, in our opinion, should be easily extensible and applicable in many contexts.

This section defines the steadily growing “HydPy Style Guide”, an attempt to explain the principles in the development of HydPy and make sure the contributions of different developers are as consistent as possible. Please understand the “HydPy Style Guide” as a refinement of PEP 8, the “official” Style Guide for Python Code. PEP 8 gives coding conventions that help to write clear code. If everyone follows these conventions, diving into existing source code becomes much more straightforward, as one spends less effort unravelling the mysteries of overly creative programming solutions.

In some regards, the HydPy Style Guide deviates from PEP 8, primarily due to the following two aims:

* We design the HydPy framework as a Python library usable by
  hydrologists with little or no programming experience.  Ideally, such
  framework users should not even notice that they write valid Python code
  while preparing their configuration files or working interactively in the
  Python shell.
* We try to close the gap between the model code, model documentation and
  model tests as well as possible.  By reading (and testing) the documentation
  of a specific model, one should understand exactly how this model works
  within the corresponding version of the HydPy framework.

When contributing to the code base, be aware that even slight changes can significantly affect the applicability of HydPy, and future developers must cope with your work. So, always make sure to check for possible side effects of your code changes. Structure your code in a clear (mainly object-oriented) design and use Black for automatically formatting your code. Refactor thoroughly enough to avoid code duplicates. Last but not least, create well-thought-out APIs for your objects, allowing everyone to use them smoothly both within doctests and within the Python shell.

Be aware of the usage of Black and Pylint in our Travis CI continuous integration workflow. Black checks that all committed files follow its standards. Pylint is an additional style checker that recognises missing documentation sections, repeated or inconsistent method definitions, and much more. The pylintrc file configures the general behaviour and strictness of Pylint. You are allowed to disable some checks locally, provided you give a good explanation. Ideally, simply add a link to a related issue explaining why Pylint is wrong in your particular code section, using the following pattern:

>>> # pylint: disable=abstract-method
>>> # due to pylint issue https://github.com/PyCQA/pylint/issues/179

This section describes some specific conventions for the development of HydPy but is no guidance on how to write good source code in general. If you have little experience in programming, first make sure to learn the basics of Python through some Python tutorials. Afterwards, improve your knowledge of code quality through reading more advanced literature like this book on object-oriented design.

Project structure

For HydPy, we prefer a flat folder structure with two subpackage levels. The individual modules can be of arbitrary length to cover particular topics completely. For example, module parametertools defines all base classes for creating model-specific parameter classes and related collection classes.

Subpackage core provides the essential features of HydPy, used for implementing hydrological models and workflows. One example is the mentioned parametertools module. Modules defined in subpackage core should never import features provided by modules of other subpackages, except those of subpackage cythons.

Subpackage auxs provides auxiliary features, only necessary for selected HydPy models and applications. One example is the module anntools defining artificial neural network classes usable as complex model parameters, currently relevant for the dam model only. Modules defined in subpackage auxs are allowed to import features from subpackages core and cythons.

Subpackage models contains the implemented hydrological models. Base models such as dam are additional subpackages, providing, for example, different kinds of sequence classes in separate submodules. Application models such as dam_v001, which select valid combinations of base model features, are defined within single modules. Please follow the naming patterns of the modules of the already available models carefully when implementing new ones.

Subpackage exe provides features easing the execution of HydPy. Module commandtools (in combination with script hyd), for example, allows controlling HydPy from the command line.

Subpackage cythons is related to all Cython features of HydPy:

* It implements functionalities for "cythonizing" the Python models defined in
  subpackage models.
* It contains Cython extension files, which mostly correspond to Python
  modules of other subpackages.  For example, the extension file annutils
  provides time-critical implementation details to module anntools.
* It contains the additional subpackage autogen, including all
  automatically generated extension files and Dynamic Link Library files
  (*pyd* files on Windows and *so* files on Linux).

Extension files should not import any features from other subpackages. Python files controlling the automatic generation of extension files can import from the subpackage core.

Note that the names of the modules of subpackages core, auxs, and exe end in almost all cases with “tools” and those of the modules and extension files of subpackage cythons with “utils”, which helps to identify the different module types immediately and to circumvent name conflicts between and within modules.

Subpackage conf contains configuration files (currently XML schema files and coefficients for numerical integration algorithms), which might be generated automatically during HydPy’s build process.

Subpackage data provides example data usable within doctests, currently only the HydPy-H-Lahn example project.

Subpackage docs contains different subpackages. sphinx controls the automatic generation of the HTML documentation. rst contains all reStructuredText files written manually. figs contains all manually generated figures in the png format. After the build process, html contains the plotly plots automatically generated during testing. Note that the actual HTML generation takes place in a folder auto, automatically created and filled with information during the process.

Subpackage tests deals with testing. As explained in section Tests & documentation, the contained unit test modules are deprecated. Its subpackage iotesting is the place designated to store data during testing temporarily.

Imports

As recommended in PEP 8, clarify the sources of your imports. Always use the following pattern at the top of a new module and list the imports of a section in alphabetical order:

>>> # import...
>>> # ...from standard library
>>> import os
>>> import sys
>>> # ...from site-packages
>>> import numpy
>>> # ...from HydPy
>>> from hydpy.core import sequencetools
>>> from hydpy.cythons import pointerutils

Note that each import command stands on a separate line. Always import complete modules from HydPy without changing their names. No wildcard imports!

We lift the wildcard ban for writing configuration files. Using the example of parameter control files, it would not be convenient always to write something like:

>>> from hydpy.models import hland
>>> model = hland.Model()
>>> from hydpy.core import parametertools
>>> model.parameters = parametertools.Parameters({"model": model})
>>> model.parameters.control = hland.ControlParameters(model.parameters.control)
>>> model.parameters.control.nmbzones = 2
>>> model.parameters.control.nmbzones
nmbzones(2)

Here, a wildcard import (and the “magic” of function parameterstep()) allows for a much cleaner syntax:

>>> del model
>>> from hydpy.models.hland import *
>>> parameterstep("1d")
>>> nmbzones(2)
>>> nmbzones
nmbzones(2)

Note that the wildcard import is acceptable here, as there is only one import statement. There is no danger of name conflicts.

Besides the wildcard exception explained above, there is another one related to model imports.

Defensive programming

HydPy is intended to be usable by researchers and practitioners who are not Python experts and may have little programming experience in general. Hence, it is desirable to anticipate errors due to erroneous input as thoroughly as possible and to report them as early as possible. So, in contrast to PEP 8, it is often preferable not to simply expose the names of simple public attributes. Whenever sensible, use protected attributes (defined by property or the more specific property features provided by module propertytools) to ensure that the internal states of objects remain consistent. One example is that it is not allowed to assign an unknown string to the filetype attribute of an instance of class SequenceManager:

>>> from hydpy.core.filetools import SequenceManager
>>> sequencemanager = SequenceManager()
>>> sequencemanager.filetype = "test"
Traceback (most recent call last):
  ...
ValueError: The given sequence file type `test` is not implemented.  Please choose one of the following file types: npy, asc, and nc.

Of course, the extensive usage of protected attributes increases both the length of the source code and the computation time. However, regarding the first point, writing a graphical user interface would require much more source code (and still decrease flexibility). Regarding the second point, one should take into account that the computation times of the general framework functionalities discussed here should, in the majority of cases, be negligible in comparison with the computation times of hydrological simulations.
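The property-based protection described in this subsection can be sketched in plain Python as follows. The class `FileSettings` and its tuple of supported file types are hypothetical and only mirror the behaviour shown above; they are not HydPy's actual implementation and do not use module propertytools.

```python
class FileSettings:
    """Hypothetical settings object guarding its `filetype` attribute."""

    SUPPORTED_FILETYPES = ("npy", "asc", "nc")

    def __init__(self, filetype: str = "npy") -> None:
        # Route the initial value through the property setter as well.
        self.filetype = filetype

    @property
    def filetype(self) -> str:
        """The currently selected file type."""
        return self._filetype

    @filetype.setter
    def filetype(self, value: str) -> None:
        # Reject unknown values immediately instead of failing later on.
        if value not in self.SUPPORTED_FILETYPES:
            raise ValueError(
                f"The given file type `{value}` is not implemented.  "
                f"Please choose one of: {', '.join(self.SUPPORTED_FILETYPES)}."
            )
        self._filetype = value


settings = FileSettings()
settings.filetype = "nc"  # accepted
try:
    settings.filetype = "test"  # rejected, the old value is kept
except ValueError as exc:
    message = str(exc)
```

Because the setter also runs during initialisation, an object of this kind can never hold an unsupported value, which is the consistency guarantee the paragraph above asks for.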

Exceptions

Unmodified Python error messages are often not sufficiently informative for HydPy applications, for two reasons. First, they are probably read by someone who has no experience with Python’s exception handling system. Second, they do not reveal the hydrological context in which a problem occurs. It would be of little help to know only that the value of a parameter object of a particular type has been misspecified but not to know in which sub-catchment. Hence, try to add as much helpful information to error messages as possible. One useful helper function for doing so is elementphrase(), which tries to determine the name of the relevant Element object and adds it to the error message:

>>> from hydpy.models.hland import *
>>> parameterstep("1d")
>>> from hydpy import Element
>>> e1 = Element("e1", outlets="n1")
>>> e1.model = model
>>> k(hq=10.0)
Traceback (most recent call last):
...
ValueError: For the alternative calculation of parameter `k` of element `e1`, at least the keywords arguments `khq` and `hq` must be given.

Another recommended approach is exception chaining, for which we recommend using the function augment_excmessage():

>>> e1.keywords = "correct", "w r o n g"
Traceback (most recent call last):
...
ValueError: While trying to add the keyword `w r o n g` to device e1, the following error occurred: The given name string `w r o n g` does not define a valid variable identifier.  Valid identifiers do not contain characters like `-` or empty spaces, do not start with numbers, cannot be mistaken with Python built-ins like `for`...)
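The general idea of enriching an error message with context can be sketched with Python's built-in exception chaining (`raise ... from ...`). The function `set_threshold` below is hypothetical and deliberately does not use augment_excmessage(); it only illustrates the kind of message such a helper is meant to produce.

```python
def set_threshold(element_name: str, value: object) -> float:
    """Convert `value` to a float and report the context on failure."""
    try:
        return float(value)
    except (TypeError, ValueError) as exc:
        # Re-raise the same exception type with a message that names the
        # hydrological context, and chain the original exception.
        raise type(exc)(
            f"While trying to set the threshold of element "
            f"`{element_name}`, the following error occurred: {exc}"
        ) from exc


try:
    set_threshold("e1", "high")
except ValueError as exc:
    message = str(exc)
    cause = exc.__cause__
```

The original exception remains available via `__cause__`, so developers lose no debugging information while users see in which element the problem occurred.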

Naming conventions

The naming conventions of PEP 8 apply. Additionally, we encourage naming classes and their instances as similarly as possible whenever reasonable, often simply switching from CamelCase to lowercase, as shown in the following examples:

==============  =============  ====
Class Name      Instance Name  Note
==============  =============  ====
Sequences       sequences      each Model instance handles exactly one Sequences instance: model.sequences
InputSequences  inputs         “inputsequences” would be redundant for attribute access: model.sequences.inputs
==============  =============  ====

If reasonable, each instance should define its preferred name via a name attribute:

>>> from hydpy.models.hland import *
>>> InputSequences(None).name
'inputs'

Classes like Element or Node, where names (and not namespaces) are used to differentiate between instances, should implement instance name attributes when reasonable:

>>> from hydpy import Node
>>> Node("gauge1").name
'gauge1'

Group instances of the same type in collection objects with the same name, except for an appended letter “s”. For example, we store different Element objects in an instance of class Elements and different Node objects in an instance of class Nodes.

Collection classes

The subsection above deals with the naming (of the instances) of collection classes. Additionally, consider the following recommendations when implementing new collection classes.

Each collection object must be iterable:

>>> from hydpy import Nodes
>>> nodes = Nodes("gauge1", "gauge2")
>>> for node in nodes:
...     print(repr(node))
Node("gauge1", variable="Q")
Node("gauge2", variable="Q")

For assisting the user when working interactively in the Python shell, collection objects should expose their handled objects as attributes and let function “dir” return the attribute names, being identical with the name attributes of the handled objects:

>>> nodes.gauge1
Node("gauge1", variable="Q")
>>> nodes.gauge2
Node("gauge2", variable="Q")
>>> "gauge1" in dir(nodes)
True

Additionally, provide item access as a more type-safe and possibly more efficient alternative for writing complex scripts:

>>> nodes["gauge1"]
Node("gauge1", variable="Q")

Whenever useful, define convenience functions to simplify the handling of collection objects:

>>> nodes += Node("gauge1")
>>> nodes.gauge1 is Node("gauge1")
True
>>> len(nodes)
2
>>> "gauge1" in nodes
True
>>> nodes.gauge1 in nodes
True
>>> newnodes = nodes.copy()
>>> nodes is newnodes
False
>>> nodes.gauge1 is newnodes.gauge1
True
>>> nodes -= "gauge1"
>>> 'gauge1' in nodes
False
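The recommendations of this subsection map onto a handful of special methods. The following sketch uses the hypothetical classes `Gauge` and `Gauges`, which are not part of HydPy, and assumes that items are identified by their name attribute only.

```python
class Gauge:
    """Hypothetical item type identified by its name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f'Gauge("{self.name}")'


class Gauges:
    """Hypothetical collection of `Gauge` objects."""

    def __init__(self, *names: str) -> None:
        self._gauges = {name: Gauge(name) for name in names}

    def __iter__(self):
        # Iterate in alphabetical order of the names.
        return (self._gauges[name] for name in sorted(self._gauges))

    def __len__(self) -> int:
        return len(self._gauges)

    def __getitem__(self, name: str) -> Gauge:
        return self._gauges[name]

    def __getattr__(self, name: str) -> Gauge:
        # Only called when the normal attribute lookup fails.
        try:
            return self.__dict__["_gauges"][name]
        except KeyError:
            raise AttributeError(f"No gauge named `{name}` available.") from None

    def __dir__(self):
        # Let `dir` (and tab completion) list the handled objects as well.
        return list(super().__dir__()) + list(self._gauges)

    def __contains__(self, item) -> bool:
        # Accept both names and `Gauge` objects; compare by name only.
        name = item.name if isinstance(item, Gauge) else item
        return name in self._gauges


gauges = Gauges("gauge2", "gauge1")
names = [gauge.name for gauge in gauges]
```

Attribute access and item access return the same object here, so users can choose the more convenient form in the shell and the more explicit form in scripts.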

String representations

Be aware of the difference between str and repr(). Often, str is supposed to return strings describing objects in a condensed form for end-users when executing a program, while repr() is supposed to return strings containing all details of an object for developers when debugging a program. Some argue that, due to its limited usage, giving repr() much attention is a waste of time in many cases. For HydPy, we think differently. Defining comprehensive repr() return values simplifies reading the doctests of the online documentation and working interactively within the Python shell, and is thus of high relevance for end-users, too. On the other hand, str is a little less relevant because it mainly serves as an alternative for the generation of exception messages. Hence, focus primarily on repr() and concentrate on str when the return value of repr() is too complicated for exception messages.

A good return value of repr() is one that a non-Python-programmer does not identify as a string. The first ideal case is that copy-pasting the string representation and evaluating it within the Python shell returns a reference to the same object.

A Python example:

>>> repr(None)
'None'
>>> eval("None") is None
True

A HydPy example:

>>> from hydpy import Node
>>> Node("gauge1")
Node("gauge1", variable="Q")
>>> eval('Node("gauge1", variable="Q")') is Node("gauge1")
True

In the second ideal case, evaluating the string representation results in an equal object.

A Python example:

>>> x = 1.5
>>> x
1.5
>>> eval("1.5") is x
False
>>> eval("1.5") == x
True

A HydPy example:

>>> from hydpy import Period
>>> Period("1d")
Period("1d")
>>> eval('Period("1d")') is Period("1d")
False
>>> eval('Period("1d")') == Period("1d")
True

For nested objects, the above goals may be hard to accomplish, but sometimes it’s worth it.

A Python example:

>>> [1., "a"]
[1.0, 'a']
>>> eval("[1.0, 'a']") == [1.0, "a"]
True

A HydPy example:

>>> from hydpy import Timegrid
>>> Timegrid("01.11.1996", "1.11.2006", "1d")
Timegrid("01.11.1996 00:00:00",
         "01.11.2006 00:00:00",
         "1d")
>>> eval('Timegrid("01.11.1996 00:00:00", "01.11.2006 00:00:00", "1d")') == Timegrid("01.11.1996", "1.11.2006", "1d")
True

For deeply nested objects, this strategy becomes infeasible, of course. Then try to find a way to “flatten” the string representation without losing too much information:

>>> from hydpy import Element, Elements
>>> Elements(Element("e_1", outlets="n_1"), Element("e_2", outlets="n_2"))
Elements("e_1", "e_2")

Finally, always consider using functions provided by module objecttools for simplifying the definition of repr() and str return values to keep the string representations of different HydPy objects, at least to a certain degree, consistent. For example, use function repr_ to let the user control the maximum number of decimal places of scalar floating-point values:

>>> from hydpy import pub, repr_
>>> class Number(float):
...     def __repr__(self):
...         return repr_(self)
>>> pub.options.reprdigits = 3
>>> Number(1./3.)
0.333

Introspection

One nice feature of Python is its “introspection” capability, which allows analysing (and, when necessary, modifying) objects at runtime with little effort.

HydPy makes extensive use of these introspection features whenever it serves the purpose of relieving non-programmers from writing code lines that do not deal with hydrological modelling directly. Section Imports discusses the usage of wildcard imports in parameter control files, where the real comfort comes from the “magic” implemented in function parameterstep(). Invoking this function does not only define the time interval length for the following parameter values. It also initialises a new model instance (if such an instance does not already exist) and directly exposes its control parameter objects in the local namespace. For the sake of the user’s comfort, each parameter control file purports to be a simple configuration file that somehow checks its own validity. On the downside, modifying the operating principle of HydPy’s parameter control files requires more thought than a simpler, more direct approach would.

We encourage implementing additional introspection features as long as they improve the intuitive usability for non-programmers and do not harm HydPy’s reliability. However, please be particularly cautious when doing so and thoroughly document why and how. To ensure traceability, one should usually add such code to modules like modelutils, importtools, and autodoctools. Module modelutils deals with all introspection needed to “cythonize” Python models automatically. Module importtools contains the function parameterstep() and related features. Module autodoctools serves to improve the automatic generation of the online documentation.
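To give an impression of the kind of introspection involved, the following sketch shows how a function can insert objects into the namespace of its caller. The names `NmbZones` and `expose_parameters` are hypothetical; the sketch only demonstrates the underlying mechanism via module inspect and makes no claim about how parameterstep() is actually implemented.

```python
import inspect


class NmbZones:
    """Hypothetical control parameter storing a single integer."""

    def __init__(self) -> None:
        self.value = None

    def __call__(self, value: int) -> None:
        self.value = int(value)


def expose_parameters(**parameters: object) -> None:
    """Insert the given objects into the global namespace of the caller."""
    frame = inspect.currentframe()
    try:
        # `f_back` is the frame of the code that called this function.
        caller_globals = frame.f_back.f_globals
        caller_globals.update(parameters)
    finally:
        # Avoid reference cycles involving frame objects.
        del frame


expose_parameters(nmbzones=NmbZones())
nmbzones(2)  # the name exists although this file never assigned it directly
```

Such code is convenient for users but invisible to static analysis tools, which is one reason for confining it to a few well-documented modules.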

Typing

Python is a strongly but dynamically typed programming language, which allows writing very condensed, readable, and flexible (scripting) code. However, missing type information also has its drawbacks. With the HydPy sources reaching a certain size, we began to introduce static typing annotations based on module typing. In our experience, the additional information helps a lot, allowing code inspection and refactoring tools to analyse and modify the code more efficiently. We are going to increase our efforts in this direction, but do not have a “HydPy Typing Style Guide” at hand so far. So please add the typing annotations you find useful. The minimum requirement for Python core modules is to declare the return type (or, when necessary, to declare the Union of possible return types) of each new function or method:

>>> from typing import List
>>> def test(nmb) -> List[int]:
...     return list(range(nmb))
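When a function can return values of different types, the Union mentioned above applies. The function `parse_zones` in the following sketch is hypothetical and serves only to show the annotation.

```python
from typing import List, Union


def parse_zones(text: str) -> Union[int, List[int]]:
    """Return a single zone index or a list of zone indices."""
    values = [int(part) for part in text.split(",")]
    # A single entry is returned as a scalar, multiple entries as a list.
    return values[0] if len(values) == 1 else values


single = parse_zones("3")
multiple = parse_zones("1, 2")
```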

For Cython extension files, adding type information understandable to Python tools is of even greater importance. Hence, accompany each Cython extension file with a stub file, annotating all public (sub)members.

Implementing models

Please inspect the source files of the already available hydrological models in detail to understand how to implement new ones correctly. HydPy provides many standard features, allowing you to write straightforward model source code in many cases. However, you are free to implement any functionalities you find missing (see, for example, the complex “connect” method defined by the exch_branch_hbv96 model). If those functionalities might be of importance to other models as well, consider generalising them and adding them to the suitable subpackage.

The main effort of creating new models is not to write the source code but to document it thoroughly and to prove it is working correctly. Each docstring of a calculation method must contain at least a short description, lists of the required, calculated, and updated variables (linked via substitutions), the basic equation in LaTeX style, and doctests covering all anticipated usages of the method, even the unlikely ones. The docstrings of all Parameter or Sequence subclasses containing “special” source code (for example, modifications of trim()) must contain doctests addressing these code sections. Finally, write integration tests for each application model based on class IntegrationTest, explaining all model functionalities in detail both with text and plotly plots, and preventing future regression by sufficiently complete tabulated calculation results.
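The required docstring elements can be sketched with a plain function. The function `calc_runoff`, its variables, and its equation are hypothetical and much simpler than real HydPy calculation methods, which are methods of model classes and link their variables via substitutions.

```python
import doctest


def calc_runoff(precipitation: float, coefficient: float) -> float:
    r"""Calculate direct runoff from precipitation.

    Requires: precipitation (P, in mm), coefficient (c, dimensionless).
    Calculates: runoff (Q, in mm).

    Basic equation:
      :math:`Q = c \cdot P`

    Examples:

        An ordinary case:

        >>> calc_runoff(10.0, 0.5)
        5.0

        Without precipitation, there is no runoff:

        >>> calc_runoff(0.0, 0.5)
        0.0

        Negative precipitation values are rejected:

        >>> calc_runoff(-1.0, 0.5)
        Traceback (most recent call last):
        ...
        ValueError: Precipitation must not be negative, but `-1.0` is given.
    """
    if precipitation < 0.0:
        raise ValueError(
            f"Precipitation must not be negative, but `{precipitation}` is given."
        )
    return coefficient * precipitation


# Run the doctests of the function and collect the results.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for doctest_case in finder.find(
    calc_runoff, "calc_runoff", module=False, globs={"calc_runoff": calc_runoff}
):
    runner.run(doctest_case)
```

After the loop, `runner.tries` and `runner.failures` report how many examples were executed and how many of them failed.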