Programming guidelines

On this page I have collected a few basic ideas on good programming practice that have evolved over many years as a result of my personal experience as a C and C++ programmer. These simple guidelines have helped me in writing efficient and well-structured code that is both easy to maintain and free of major errors. As such, they are highly subjective and not meant to be the ultimate truth, but rather a collection of general ideas on how to make code more efficient, stable, and easier to read. There may well be circumstances in which some of the rules below are not necessarily sensible.

Performance

No hard-coded limits

It is generally a good idea never to hard-code numerical limits on the number of array elements or objects. Instead, limits should always be determined dynamically based on actual need and available memory. For example, instead of defining

#define MAX_OBJECTS 10000
// ...
double catalog[MAX_OBJECTS];

one should rather use something along the lines of

size_t number = getNumber();  // value determined by user input
double *catalog;
// ...
try
{
    catalog = new double[number];
}
catch(const std::bad_alloc &badAlloc)
{
    std::cerr << "Memory allocation failed: " << badAlloc.what() << '\n';
    return 1;
}
// ...
delete[] catalog;

Dynamically allocating memory allows for maximum flexibility, as the number of elements the array can contain (e.g. measurements read from an input file) is only restricted by the available memory and not by an arbitrary number somehow deemed large enough for all purposes by the programmer.

Alternatively, some of the container classes provided by the C++ standard library, such as std::vector or std::map, can be used to store a variable amount of data. This is safer than manually allocating memory, but note that using container classes could potentially be slower than direct memory access in certain situations.
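
As a minimal sketch of the container-based alternative, the following functions create and extend a std::vector whose size is only known at run time. The names makeCatalog and addMeasurement are hypothetical and chosen purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: creates a catalog with 'number' entries, all set to
// 'initialValue'. The vector releases its memory automatically when it goes
// out of scope, so no delete[] is needed. As with new[], std::bad_alloc is
// thrown if the allocation fails.
std::vector<double> makeCatalog(std::size_t number, double initialValue)
{
    return std::vector<double>(number, initialValue);
}

// Hypothetical helper: appends one measurement; the vector grows as needed.
void addMeasurement(std::vector<double> &catalog, double value)
{
    catalog.push_back(value);
}
```

A call such as makeCatalog(getNumber(), 0.0) would then replace the manual new[] and delete[] pair shown above.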

No floating-point values

Well, the heading of this section is a bit of an overstatement, but generally floating-point variables (float, double) should be avoided whenever possible. The reason is that computers are digital devices and can therefore only handle discrete states usually represented by integer numbers. Floating-point numbers have been introduced for convenience, e.g. for handling and processing the results of scientific measurements, but should never be used in the implementation of a programme’s control structures. For example, never use floating-point numbers in a for loop:

for(float a = 0.5; a <= 10.0; a += 0.5)
{
    // do something with a
}

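In general, there is no guarantee that a will ever count up to exactly 10.0 (see below) because of rounding effects. A step size of 0.5 happens to be exactly representable in binary floating-point format, but with a step size such as 0.1 the rounding errors accumulate, and the loop may execute a different number of times than intended. Instead, integer numbers must be used in the loop and then be converted: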

for(int a = 1; a <= 20; a++)
{
    float b = static_cast<float>(a) / 2.0f;
    // do something with b
}

In the second implementation with integer values, a and b are guaranteed to be 20 and 10.0, respectively, in the last iteration of the loop.
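
The same pattern can be sketched for a step size of 0.1, which cannot be represented exactly in binary floating-point format. The function name sampleTenths is hypothetical and only serves to illustrate the idea.

```cpp
#include <vector>

// Hypothetical helper: returns the values 0.1, 0.2, ... up to steps / 10.
// The loop is controlled by an integer counter, so it runs exactly 'steps'
// times regardless of any floating-point rounding.
std::vector<double> sampleTenths(int steps)
{
    std::vector<double> values;
    for(int a = 1; a <= steps; a++)
    {
        // Derive each value from the counter instead of accumulating it,
        // which also prevents rounding errors from building up.
        values.push_back(static_cast<double>(a) / 10.0);
    }
    return values;
}
```

Calling sampleTenths(100) will always return 100 values, the last of which is 10.0.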

Another important rule is to never check floating-point values for equality. Floating-point numbers will generally not be exact but affected by rounding errors due to the limited amount of memory used to store them. For example,

double a = sqrt(2.0);
double b = sqrt(18.0) / 3.0;
if(a == b) std::cout << "equal";
else       std::cout << "not equal";

will print “not equal” on my computer, even though $\sqrt{18} / 3 = \sqrt{2}$. This is because the two expressions can produce different bit patterns in the floating-point representation due to rounding effects, as both are irrational numbers that cannot be represented exactly by a finite number of binary digits. Hence, floating-point numbers must never be tested for (in)equality. Instead, one should check whether two floating-point values agree within a certain absolute or relative error using the “less than” or “greater than” operators (“<” and “>”).
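
Such a tolerance check might look like the following sketch. The function name nearlyEqual and the default tolerances are assumptions made for illustration; suitable tolerances depend entirely on the application.

```cpp
#include <algorithm>
#include <cmath>

// Returns true if a and b agree within an absolute tolerance (relevant for
// values close to zero) or within a relative tolerance scaled by the larger
// of the two magnitudes. The default tolerances are illustrative only.
bool nearlyEqual(double a, double b,
                 double absoluteTolerance = 1.0e-12,
                 double relativeTolerance = 1.0e-9)
{
    const double difference = std::fabs(a - b);
    if(difference < absoluteTolerance) return true;

    const double scale = std::max(std::fabs(a), std::fabs(b));
    return difference < relativeTolerance * scale;
}
```

With this helper, the comparison above can be written as if(nearlyEqual(a, b)), which treats the two square-root expressions as equal.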

Structured code

No repetition

The concept of “no repetition” means that anything more complex than basic operations should never occur more than once. This concept is also known as “once and only once” (OAOO). There are several good reasons for not repeating code:

Unnecessary repetitions will needlessly inflate the source code, leading to larger files and code that is difficult to read and understand.
Unnecessary repetitions turn into a nightmare when changes are necessary, as those will have to be implemented in all places where the affected lines of code occur. This can be time-consuming and may lead to inconsistent programme behaviour if parts of the code are accidentally left unchanged.

For example, instead of calculating indices of a two-dimensional array inline,

mask[x + dx * y] = 1;

it would be better to write a function to carry out the index calculation,

inline size_t index(size_t x, size_t y)
{
    return x + dx * y;
}
// ...
mask[index(x, y)] = 1;

This leads to code that is much easier to understand and maintain, and any changes, e.g. in the indexing order, will only have to be applied once rather than multiple times at different positions throughout the code. Making use of the preprocessor might also be an option in such situations, as this could potentially reduce the computational overhead created by function calls, although modern compilers will usually inline small functions such as this one anyway, and a macro, unlike a function, is not type-checked.

No numerical constants

There should be no numerical constants in your code with the exception of the neutral elements of addition and multiplication (integer numbers 0 and 1 and floating-point numbers 0.0 and 1.0). In a few cases, other numbers are justified, in particular when applying well-defined physical or mathematical laws, e.g. the mean of two numbers, mean = (a + b) / 2.0, or the volume of a sphere, volume = (4.0 / 3.0) * pi * pow(radius, 3.0). In all other cases, numerical constants are to be avoided and should instead be defined “globally” as a constant or preprocessor directive, e.g.:

const double pi = 3.14159265358979323846;
// ...
area = pi * radius * radius;

As with the “no repetition” rule, the main motivation for avoiding numerical constants is readability and easy maintenance. Numerical constants make the code more difficult to read and don’t carry any information about their purpose, potentially leaving the reader puzzled about the origin and meaning of a particular numerical value. Finally, any changes to a numerical value would have to be repeated across the entire source code, potentially leading to problems if individual values are accidentally left unchanged in the process. It is much safer and more convenient to define all constants in a single location in the code instead.
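
A minimal sketch of this approach is shown below. The constant speedOfLight and the two functions are hypothetical examples that do not come from any particular project.

```cpp
// All constants are defined once, in a single place, with descriptive names.
const double pi = 3.14159265358979323846;
const double speedOfLight = 299792458.0;  // in m/s (exact by definition)

// Hypothetical example: area of a circle of the given radius.
double circleArea(double radius)
{
    return pi * radius * radius;
}

// Hypothetical example: time in seconds that light needs to travel
// the given distance in metres.
double lightTravelTime(double distance)
{
    return distance / speedOfLight;
}
```

The names alone tell the reader what each value means, and a correction to either constant only has to be made in one place.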

No continue, break, or goto

The continue, break, and goto statements offered by C++ will break with the flow of the programme and should therefore be avoided unless their use is justified (e.g., the switch statement requires break). Any nested algorithm can be designed such that continue, break, and goto statements will not be necessary. Making use of the goto statement is particularly questionable, as goto permits jumps to arbitrary positions within the code, thereby potentially compromising the stability of the programme and making the code hard to read and understand.
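
As an illustration, a search loop that would often be written with break can instead carry its termination criterion in the loop condition. This is only a sketch, and the function name findFirstNegative is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the position of the first negative value,
// or values.size() if there is none. Both termination criteria are part of
// the loop condition, so no break statement is required.
std::size_t findFirstNegative(const std::vector<double> &values)
{
    std::size_t position = 0;
    while(position < values.size() && values[position] >= 0.0)
    {
        ++position;
    }
    return position;
}
```

The order of the two conditions matters here: the bounds check comes first, so the element is never accessed once the end of the vector has been reached.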

Naming conventions

Accessibility and maintainability of code can be greatly improved by adopting a consistent and logical naming convention for variables, functions, classes, and objects. Several conventions regarding naming are in use, and it doesn’t really matter which one is adopted as long as it is used consistently. Some people prefer to use an underscore to separate components of a composite name, while others capitalise the initials of subsequent components, e.g.:

parameter_list, flag_success, x_peak, read_entries()

versus

parameterList, flagSuccess, xPeak, readEntries()

Irrespective of which convention is used, names should be concise and easy to understand. It is advisable to avoid single-letter names and abbreviations and instead write, e.g., number instead of n, or catalog_entry instead of cat_ent. It may also be a good idea to use nouns as names representing data or objects and verbs for procedures and functions operating on these, e.g.:

Counter *counter = new Counter();
counter->reset();
counter->increment();

With the choice of names in this example, it is immediately obvious what the purpose of each of the instructions is, even without any additional background knowledge. As shown in the example above, it is also common practice to capitalise class names, while the names of class instances, just like variable names, begin with a lower-case letter.

Indentation

While C++ does not require code blocks to be indented (curly braces are used instead to mark the beginning and end of a block), it is strongly recommended to use indentation, and I cannot think of anyone who would seriously question the important role of indentation in writing structured code that is easy to read and understand. Again, various styles have evolved, and any of these styles will be acceptable provided that it is used consistently throughout the source code. Two of the most commonly used styles are illustrated below:

if(x == 0)          // style 1
{
    doSomething();
    return;
}

if(x == 0) {        // style 2
    doSomething();
    return;
}

I personally use style 1, but many other options are possible, and style 2 in particular appears to be quite popular.

Another question is whether to use spaces or tabulators for indentation. This question is also known as the tabs versus spaces war, as many people are immune to logical arguments when it comes to defending their own choice. Therefore, let me summarise here the main arguments:

First of all, your computer doesn’t care at all. Indentation is solely intended to improve readability of code for humans (unless you use a pathological programming language such as Python, where the level of indentation determines the scope of a line of code due to the lack of a closing block delimiter). Hence, whether spaces or tabulators are used doesn’t matter to the computer.
The next argument to look at is the semantics of tabulators and spaces. Spaces, as their name already suggests, are semantically intended for spacing, i.e. as a separator between fundamental components of a language (such as words in the English language). Tabulators, on the other hand, were originally used in typewriters to move the carriage to a specific column. For that reason, tabulators would seem to be a more logical choice for indentation, as it immediately relates to their original role in typewriters. Therefore, using tabulators solely for indentation and spaces solely for separation would be the most semantically meaningful choice.
Another important point is efficiency. While programmers will always use 1 tabulator to indicate one level of indentation, they will almost certainly use multiple spaces for the same purpose, e.g. 2, 4 or 8 (some people even propose 5). As a result, documents indented with spaces will require more keystrokes and be significantly larger than documents using tabulators for indentation. As an example, many websites take 10–20% longer to load than necessary simply because their creators decided to use spaces instead of tabulators for indenting their HTML code, thereby creating additional data volume devoid of any information.
As an impressive example, as I was editing this web page earlier, it was 18.3 kB in size when using tabulators, 20.3 kB when using 4 spaces per level, and 23.1 kB when using 8 spaces per level. The latter corresponds to a staggering 26% increase in file size and thus loading time!
Another problem with spaces is that there is no consistency in the number of spaces used for one level of indentation. This will lead to inconsistent indentation of code written by different authors and turn into a nightmare if code needs to be merged. On the other hand, there is universal agreement on using a single tabulator per indentation level, leading to more consistent code and making code amalgamation much easier. A related problem is that indentation errors (e.g., the accidental use of 5 instead of 4 spaces) are more likely and harder to detect when spaces are being used, whereas indentation errors with tabulators are immediately obvious due to the usually large horizontal displacement they create.

In summary, tabulators are in many ways a better and more logical choice for indentation, in particular as a result of their consistent and semantically meaningful use in the form of 1 tabulator for 1 level of indentation as well as the resulting reduction in file size.

Classes and objects

C++ provides strong support for object-oriented programming. It is not advisable to mix procedural and object-oriented elements (even though C++ allows this for reasons of C compatibility), as this will lead to code that is difficult to read, understand, and maintain. Hence, it helps to split up any project into a suitable set of classes with a sensible hierarchical structure. Each class should be contained in its own set of two files, with the header file (ClassName.h or ClassName.hpp) containing the declarations of the class members, while the source file (ClassName.cpp) contains their definitions. This allows the header file to be used to extract the structure of a class without being distracted by implementation details. Separating declaration and definition is also advisable for a number of other reasons that shall not be discussed here in detail.

A typical C++ class declaration in the header file might look like this (don’t forget the semicolon at the end):

class ClassName
{
    public:
    ClassName();  // constructor
    ~ClassName(); // destructor
    // + all public member functions

    private:
    // all variables, data, and private member functions
};

Within a class, all data members should be private, because other classes or non-member functions should never have direct access to data members. Access to data should only be provided through public member functions. There are two main reasons for enforcing this encapsulation. Firstly, it will allow control over read and write processes, thereby preventing illegal read and write operations, such as setting a variable to a meaningless value or trying to access non-existing array elements outside the valid size of the array. Secondly, separating the interface from the data will allow changes to be made to the way the data are internally stored and handled, e.g. for speed or memory optimisation, without affecting the functionality of any software that makes use of the class, because that software will only access public member functions whose interface would remain unchanged.
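
The following sketch illustrates both points with a hypothetical Catalog class whose data can only be accessed through bounds-checked member functions. For brevity, the member functions are defined inside the class here rather than in a separate source file, and the choice of std::vector for internal storage is merely an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical class: the data are private and can only be read or
// modified through public member functions that validate each access.
class Catalog
{
    public:
    explicit Catalog(std::size_t numberOfEntries) : entries(numberOfEntries, 0.0) {}

    std::size_t size() const { return entries.size(); }

    // Read access; rejects positions outside the valid range.
    double get(std::size_t position) const
    {
        if(position >= entries.size()) throw std::out_of_range("Catalog::get");
        return entries[position];
    }

    // Write access; rejects positions outside the valid range.
    void set(std::size_t position, double value)
    {
        if(position >= entries.size()) throw std::out_of_range("Catalog::set");
        entries[position] = value;
    }

    private:
    std::vector<double> entries;  // internal storage; can change without affecting callers
};
```

Because callers only ever use size(), get(), and set(), the internal storage could later be replaced by a different data structure without any changes to the calling code.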

The definition of member functions will be placed in the source file, e.g.:

int ClassName::functionName()
{
    // do something sensible
    return 0;
}

Note that the constructor and destructor have no return type (not even “void”) and must not return anything under any circumstances. An empty “return;” statement can be used instead.

Website of Tobias Westmeier
