Why Python should be used with caution

Python is becoming more and more popular in scientific programming, and even major software projects are now written mostly or entirely in Python. This is a worrying trend, because Python, while being an excellent scripting language, is intrinsically unsuitable for large software projects for a number of reasons. On this page I have collected a few examples from my own experience with Python that illustrate why, in my opinion, Python is a bad choice for scientific software packages.

Speed

Python is an interpreted language and therefore by default substantially slower than compiled languages such as Fortran or C++. Extensive nested loops in particular are often impractically slow in Python. Instead, low-level features from additional libraries such as NumPy are required to process simple, large data arrays at a reasonable speed. As the functionality of such libraries is limited, some problems may be intrinsically difficult to solve in Python due to speed issues. In addition, there is no guarantee that third-party libraries will remain available, backward compatible, and supported in the future.
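The difference between the two approaches can be sketched as follows. This is only a minimal illustration, assuming NumPy is installed; the function names are hypothetical and the array sizes are arbitrary. Both functions compute the same quantity, but the first runs every iteration in the interpreter, while the second hands the work to NumPy's compiled routines.

```python
import numpy as np


def sum_of_products_loop(a, b):
    """Sum a[i] * b[j] over all pairs (i, j) with explicit nested loops."""
    total = 0.0
    for x in a:
        for y in b:
            total += x * y  # executed len(a) * len(b) times by the interpreter
    return total


def sum_of_products_numpy(a, b):
    """Same quantity, computed by NumPy without a Python-level loop."""
    return float(np.sum(np.outer(a, b)))


# Arbitrary example data (assumption: small enough to run quickly)
a = np.arange(200, dtype=float)
b = np.arange(300, dtype=float)

loop_result = sum_of_products_loop(a, b)
numpy_result = sum_of_products_numpy(a, b)
```

Timing the two functions with the standard `timeit` module on larger arrays should make the gap visible, although the exact factor depends on the machine and the array sizes.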

Lack of block delimiters

Python lacks the possibility to mark the end of a programming block. Curiously, the beginning of a block must be marked with a colon in Python, but unfortunately not the end. Instead, indentation with whitespace characters is used by the interpreter to identify which block a particular line of code belongs to. This makes automatic re-indentation of existing code, as offered by many modern text editors, impossible, because an editor cannot infer where a block is supposed to end. Instead, the programmer has to take care of correct indentation to ensure that the structure of the code remains intact. This is cumbersome and simply a waste of time.

What is even worse is that the lack of semantic information on the extent of blocks means that the structure of any piece of Python code is completely destroyed if whitespace information is lost for some reason, e.g. during copy-and-paste operations. Only conceptual understanding by a human would be able to recover the correct code structure in this case, but some code may remain ambiguous without knowing the intentions of its original author.
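The ambiguity can be illustrated with a small, hypothetical example. The two functions below consist of exactly the same statements and differ only in the indentation of a single line, yet both are valid Python and return different results. If the indentation were lost, nothing in the remaining text would indicate which version was intended.

```python
def total_inside(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
        total *= 2  # indented into the loop: runs once per element
    return total


def total_outside(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
    total *= 2  # one level less: runs once, after the loop has finished
    return total


values = [1, -2, 3]
inside = total_inside(values)    # 14
outside = total_outside(values)  # 8
```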

Lastly, the characters used to mark blocks (spaces and tabs) are invisible, which is why they are normally used only for indentation and as separators. This makes it particularly hard to spot critical indentation errors and to determine the indentation level of a particular line of code (e.g. the first line on a new page of printed Python code). In addition, whitespace characters are often treated as insignificant by software applications such as web browsers or text editors, thereby increasing the risk of accidentally losing significant whitespace information.

In summary, whitespace characters are a particularly unfortunate choice for marking the extent of blocks, and some people (including myself) consider this to be the most critical flaw of the Python programming language and essentially an error in the design of the language (e.g., [1], [2]).

Lack of variable declarations

Python does not provide a mechanism for declaring variables before they are used. This leads to several severe problems.

Firstly, a variable might accidentally be re-defined further down in the code, resulting in unexpected runtime behaviour. Without the requirement (or even possibility) to declare variable names, such an error could go unnoticed unless unusual behaviour is observed during execution of the code.
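A minimal sketch of such an accidental re-definition is shown below. The variable names and values are hypothetical. The name `scale` is first used for a calibration factor and later reused as a loop variable; the interpreter accepts this silently, and the calibration factor is overwritten.

```python
scale = 2.5  # hypothetical calibration factor defined near the top of a script

readings = [1.0, 2.0, 3.0]

# Much further down, the same name is reused for something unrelated.
for scale in (10, 100, 1000):
    pass  # placeholder for unrelated work inside the loop

# The original value 2.5 is gone; "scale" now holds the last loop value.
calibrated = [r * scale for r in readings]
```

The script runs without any warning from the interpreter, and the error only shows up if someone notices that the calibrated values are wrong.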

Secondly, Python code is prone to typos in variable names, e.g. writing couter instead of counter. While the Python interpreter would detect this error at runtime in certain circumstances (e.g. when a variable is accessed before any assignment), the problem would go unnoticed if both couter and counter had previously been assigned values. This can lead to hard-to-detect errors in Python source code that would have been avoided entirely by requiring variable names to be declared prior to first assignment.
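The typo described above can be reproduced in a few lines. The function below is a hypothetical sketch: the misspelt name appears on the left-hand side of an assignment, so Python simply creates a second variable instead of reporting an error.

```python
def count_positive(values):
    counter = 0
    for v in values:
        if v > 0:
            couter = counter + 1  # typo: assigns to a new variable "couter"
    return counter  # never updated, so the function always returns 0


result = count_positive([1, 2, -3, 4])  # 0 instead of the intended 3
```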

Dynamic typing

Values in Python are strongly typed, but a variable acquires its type dynamically at runtime whenever a value is assigned to it. This behaviour can result in a number of issues and bugs that are very difficult to find. For example, the interpreter does not check the code for type correctness prior to execution, and such a check is not possible in general, because the type of a variable may either change during programme execution or be entirely undefined in the case of arbitrary user input. This can lead to runtime errors whenever a type mismatch is encountered.
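A short, hypothetical sketch of such a type mismatch follows. The same name is bound first to a number and then to a string, as might happen with unconverted user input, and the problem only surfaces when the value is actually used in arithmetic.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


measurement = 3.5    # bound to a float
measurement = "3.5"  # re-bound to a str, e.g. raw text typed by a user

ok = mean([1.0, 2.0, 3.5])  # works as expected

try:
    mean([1.0, 2.0, measurement])  # fails only now, at runtime
    error = None
except TypeError as exc:
    error = exc  # adding a float and a str is not allowed
```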

Lack of compile time checks

As Python code is not generally compiled prior to execution, there is no general mechanism in place to check the code for certain types of errors before executing the programme. This means that such errors are only detectable at runtime, requiring sophisticated and extensive testing strategies before publishing code. However, it may well be impossible to test every single path through the code under all circumstances, in particular if user input is involved, potentially leaving an arbitrary number of undetected errors in the code. While this is true to some degree for compiled languages as well, a significant number of errors would already be detected at compile time, whereas in Python nearly all errors other than syntax errors surface only at runtime.
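The following hypothetical sketch shows an error hiding in a rarely executed code path. The function calls a helper that was never defined, but Python accepts the definition without complaint and the common case works. The mistake is only reported when the rare branch is actually reached.

```python
def describe(value):
    if value >= 0:
        return "non-negative"
    # Rare branch: "format_negative" is deliberately never defined anywhere.
    return "negative: " + format_negative(value)


common = describe(5)  # the usual path works fine

try:
    describe(-1)  # only this call reaches the faulty line
    failure = None
except NameError as exc:
    failure = exc  # the undefined name is reported at runtime
```

A compiler for a language with mandatory declarations would reject the equivalent programme before it could be run at all.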

References

[1] Steven Hazel, Close-block Delimiter Symbols Considered Helpful
[2] Python White Space Discussion


Website of Tobias Westmeier
