Global Variables – Still a Major Source of Problems in IT Projects

Martin F. Johansen, 2020-01-21

Global variables are one of the troublemakers most developers know about, and it is widely considered best practice to avoid them. However, global variables are still quite popular, only in forms that most developers do not think of as global variables. Even when developers do not think of what they are doing as using globals, these forms still cause the well-known problems of globals.

Before going into how developers are using globals today, let us look at what global variables are and why they are problematic.

Recap of the Problem with Global Variables

Global variables are variables used in a function but that are not passed to it. (The term function is used here to mean function, method, procedure or subroutine.) A variable sent to a function indirectly -- for example, with a reference, a pointer, in an array or in a data structure -- is local, not global. For a variable to be global, it must be accessed in a way not involving the inputs to a function. There are, however, some ways to cheat. If you have a data structure called "globals" that you pass to all functions, this must be considered using globals even though you pass it as an input.

You can write entire programs using only global variables. No function takes input, and no function returns anything. Each function computes with the global variables and writes its results into global variables. This will work fine for very small projects.

A more common situation, however, is to pass some data as input to a function, return data sometimes, but often read and write globals.

If this is done for anything but a small project, one will quickly run into situations where it is difficult to understand how a global variable got its value, or which value it is supposed to have. You can also run into situations where a global variable has an unexpected or meaningless value in a function. All of this is because some other function has used the variable. This problem is due to an implicit coupling between the two functions using the global -- they depend on each other because of their common use of a global variable. The coupling is implicit because it just happened: the same variable was used in both functions. The coupling might also be unintended, something that is difficult to avoid if there are no restrictions on where or how the variable may be used.
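The implicit coupling described above can be illustrated with a minimal sketch. The names discount, final_price and start_sale are hypothetical and chosen only for illustration; nothing in the signature of final_price reveals that it depends on start_sale.

```python
# Hypothetical example: two functions coupled only through a shared global.
discount = 0.0  # global state; the name is illustrative


def final_price(price):
    # Reads the global: the result depends on whoever wrote `discount` last.
    return price * (1 - discount)


def start_sale():
    # Writes the global as a side effect, silently changing final_price.
    global discount
    discount = 0.5


before = final_price(100)  # computed with discount == 0.0
start_sale()
after = final_price(100)   # same call, same argument, different result
```

The two calls to final_price look identical, yet they return different values because another function touched the global in between.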

When many functions depend on each other this way, an entire program can become monolithic. Executing one function means you have to know which series of other functions to execute first.

Executing a function does not make sense before having set up the global state a certain way. This adversely affects understandability and testability. The function can be difficult to understand because somewhere else the globals were set up a certain way. This can often be anywhere in the program, at any point in the past.

In order to test a function, one needs to know the internals of the function. Which variables must be put in which state in order for the function to work correctly? If a function is changed, then the global state set up in the tests must also be changed. This causes a tight coupling between tests and the implementation detail of functions.
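As a rough sketch of this coupling between tests and internals, compare a test for a function that reads globals with a test for the same logic written with explicit inputs. The names inventory, threshold and items_to_reorder are assumptions made up for this example.

```python
# Version A: reads hypothetical module-level globals.
inventory = {}
threshold = 0


def items_to_reorder_global():
    return sorted(name for name, count in inventory.items() if count < threshold)


# Version B: the same logic, with everything it needs passed in.
def items_to_reorder(inventory, threshold):
    return sorted(name for name, count in inventory.items() if count < threshold)


def test_global_version():
    # The test must know which globals the function reads, set them up,
    # and restore them afterwards so that other tests are not affected.
    global inventory, threshold
    saved = (inventory, threshold)
    inventory, threshold = {"bolts": 2, "nuts": 9}, 5
    try:
        assert items_to_reorder_global() == ["bolts"]
    finally:
        inventory, threshold = saved


def test_pure_version():
    # The test only needs to know the inputs and the expected output.
    assert items_to_reorder({"bolts": 2, "nuts": 9}, 5) == ["bolts"]
```

If Version A starts reading a third global, its test must change even though its signature did not.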

Globals also adversely affect reusability. In order to reuse a function, one must have the required global state set up. This might be in conflict with existing global state. It also causes a tight coupling to the internals of a function. Calling the function will also be cumbersome, as the required data must be copied into globals before calling the function and extracted from globals after the function returns.

Another issue with globals is memory allocation. Which parts of a software system are responsible for cleaning up the global memory used? For languages with manual memory management, leaks are a risk, as it is difficult to keep track of which functions are using, allocating or freeing memory. Memory and globals are a problem even for garbage-collected languages. A global cache or log data structure can fill up memory, and since the variables are global, it remains forever possible that they will be used, so they cannot be automatically freed.

Another issue with globals is concurrency. If two or more concurrently executing functions access the same global memory, one must avoid race conditions on the one hand and blocking on the other. Blocking can cause slow execution, as concurrent parts of the system wait for each other. Blocking can also cause deadlocks, when concurrently executing parts of the program wait for each other indefinitely.
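A small sketch of this trade-off, under the assumption of four worker threads and hypothetical names (counter, add_many, count_items): a lock makes updates to a shared global counter safe but forces the threads to take turns, whereas workers that only use their inputs and return a result need no lock at all.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared global: every update must be guarded, so the threads take turns.
counter = 0
lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        with lock:  # without the lock, concurrent updates could be lost
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()


# No shared state: each worker computes from its input and returns a value.
def count_items(n):
    total = 0
    for _ in range(n):
        total += 1
    return total


with ThreadPoolExecutor(max_workers=4) as pool:
    combined = sum(pool.map(count_items, [10_000] * 4))
```

Both variants arrive at the same total, but only the first one needs a lock and can block.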

Another issue with globals is that they have a tendency to grow in complexity. As more and more global state is required to hold all the global data, more and more complex data structures are created to hold it.

In summary, software with considerable use of globals will be difficult to test, difficult to reuse and difficult to understand, and its development will scale badly as more and more problems occur. The software will be monolithic, with implicit couplings between functions and between tests and functions.

Use of Globals is Still Prevalent

The recap just given is familiar to most developers. What is less familiar is how large a problem this still is with current development practices.

The reason globals are still a problem is that developers still use them extensively, but in forms that most do not consider global variables. Since they still are global variables, the problems discussed above are still plaguing software projects. Let's now look at several such cases. After that, a solution is suggested that largely avoids the problem of global variables.

Databases

One major case of global variables is databases. Databases are not considered global variables because they have a neat interface with some protection mechanisms such as ACID (atomicity, consistency, isolation, durability), authorization and transaction management. These neat interfaces and protection mechanisms do indeed protect against some of the problems of using globals, but not all. It is still the case that the database can be read from and written to in the middle of a function.

In order to test such a function, the database must be set up correctly for the database accesses to be right, just as with global variables. This creates a coupling between the tests and the internals of the functions.
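The following sketch illustrates the point with an in-memory SQLite database standing in for a shared application database. The table invoices, its columns and the function names are assumptions made for this example only.

```python
import sqlite3

# Stands in for a shared application database (hypothetical schema below).
conn = sqlite3.connect(":memory:")


def unpaid_total_db(customer_id):
    # Reads the database in the middle of the function, like a global.
    rows = conn.execute(
        "SELECT amount FROM invoices WHERE customer_id = ? AND paid = 0",
        (customer_id,),
    ).fetchall()
    return sum(amount for (amount,) in rows)


def unpaid_total(invoices):
    # Same computation on data passed in as (amount, paid) pairs.
    return sum(amount for amount, paid in invoices if not paid)


# Testing the first version requires a schema and matching rows ...
conn.execute(
    "CREATE TABLE invoices (customer_id INTEGER, amount INTEGER, paid INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, 100, 0), (1, 50, 1), (2, 70, 0)],
)
db_result = unpaid_total_db(1)

# ... while the second version only needs a list.
pure_result = unpaid_total([(100, False), (50, True)])
```

The test setup for unpaid_total_db has to mirror the table and column names used inside the function, which is the coupling between tests and internals described above.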

There is also coupling between different functions using the same part of the database. This makes the software monolithic. Many functions have a coupling with the database and there is coupling between internals of different functions as they depend on the global state being a particular way. A major factor causing this is the database schema, the way data is laid out in the database. This is true whether the database has a schema or not: Data is laid out a certain way, and code expecting something else can fail.

Parts of the software are difficult to reuse, because a database of the same kind and with a similar schema must be available for the functions to run correctly. This might not be the same as the reusing system's database setup.

A database has a tendency to grow in complexity as more and more data is added to it and more and more people are using it. The database must have a structure that fits most of the use cases. It must have authorization that prevents the wrong people from accessing some of the data.

Most databases also have a complex interface, for example SQL or other query languages. Queries are often written directly as text in the code of the program accessing the global data. If not, then complex libraries such as object-relational mappers are used.

As data from the database is so readily available, and since it is often acceptable to use it anywhere, it is easy to read and write data often. This can dramatically affect the speed of an application. Databases rely on disks, which can be slow. Databases are often accessed over a network connection that can also be slow. Also, as data is read and written all over the place, transactions are used to prevent inconsistencies. These transactions can cause complex behavior that is very difficult to understand. They can slow a program, if many transactions are blocking. They can cause a process to fail, if the transaction cannot be completed.

As we can see, databases cause most of the same problems as global variables. As a program grows, these problems amplify, negatively affecting scalability.

Disks

Another major case of global variables is disks. Again, disks are not considered global variables, as they also have a neat interface and authorization. Again, it is common to read and write files in the middle of functions.

In order to test a function using a disk, the correct files must be set up on the disk before executing this function. This again causes coupling between internals of functions, be it functions for tests or business logic, making software more monolithic.
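A minimal sketch of the difference, with hypothetical names (count_errors, count_errors_in_file) and a made-up log format: the file-reading function can only be exercised once a file exists on disk, while the function that takes text can be exercised with a plain string.

```python
import os
import tempfile
from pathlib import Path


def count_errors(text):
    # Works only on its input; no file needs to exist to call it.
    return sum(1 for line in text.splitlines() if line.startswith("ERROR"))


def count_errors_in_file(path):
    # Touches the disk; kept thin so the logic above stays testable.
    return count_errors(Path(path).read_text(encoding="utf-8"))


# Exercising the file version requires setting up a file first.
with tempfile.TemporaryDirectory() as directory:
    log_path = os.path.join(directory, "app.log")
    Path(log_path).write_text("ERROR disk full\nINFO ok\n", encoding="utf-8")
    from_file = count_errors_in_file(log_path)

# Exercising the text version requires only a string.
from_text = count_errors("ERROR a\nINFO b\nERROR c")
```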

Reusing a function reading and writing files is more difficult, as the right setup of files must be available. This might not be the same as the reusing software's file setup.

Disks have file systems. These file systems have grown quite complex and have a complex interface. As an entire disk is available to any program, it needs access controls. Also, disks are shared between many different programs, and conflicts can occur even between programs.

Security is also a huge issue. As the whole disk is basically available to all programs, it is a major source of security problems: a program that should not have had access to some part of the disk somehow reads from or writes to it.

Disks are quite a bit slower than memory. Thus, reading and writing files often can slow down a program. Reading and writing a file can also fail if the file is not there or if another program is currently using it.

As with databases, disks cause most of the same problems as global variables. As a program grows, these problems amplify, negatively affecting scalability.

Stateful Network Services

Another major case of global variables is stateful network services. Indeed, both disks and databases are often network services. For a network service to inherit the problems of global variables, it must contain global state. A network service that only performs a computation based on its input does not inherit the problems. Often, however, network services are the source and destination of data, and that is the kind of network service that can cause problems similar to those of global variables.

To test a function that depends on network services, one must either use mocks set up with the required global state or set up test versions of the network services. These test versions must have their contents carefully set up and maintained in order to test the functions using them.

Another problem for network services is concurrency. The sequence of calling network services is important: Race conditions, slowness or deadlocks due to blocking are considerable risks.

As network services are globally available, they need authorization. This increases the amount of global state that clients need to take into account and tests need to set up.

Functions that use network services internally get coupled with the network services. If you want to reuse functionality that depends on a network service, you either have to get access to the network service or set it up yourself. In addition, the required state must be set up within the network service to accommodate the requirements of the functions using it. This affects reusability.

So-called serverless architectures with stateless services can be a boon to scalability. But note that a network service is not stateless if it e.g. writes to a database, even though the database is used over a further network connection.

Other CPUs

Another major case of global variables is other CPUs. Today it is more relevant than ever to execute a program over many CPUs. If data is modified by a function on one CPU and then accessed in the middle of a function on another CPU, it must be considered a global variable, as the contents of the variable were changed after it was passed to the function.

This creates a coupling between the internals of the functions. A function only works if the functions on another CPU are executed at a certain time or at a certain speed. To test the function, functions must be run on other CPUs or mocked. This causes coupling between the tests and the internals of a function. It is also more difficult to reuse a function, as the same parallel setup must be used when reusing the function, and this might not be the way the reusing system is set up.

There are ways to deal with concurrency issues between CPUs, such as locks. Locks can cause deadlocks and slowness, and are famous for their complexity, variety and low scalability.

Memory management is difficult when memory is handled by many CPUs. When data is read, written, allocated and deallocated by many different running functions, it is difficult to keep track of which function is doing what.

Clocks and True Random Sources

Another case of global variables is clocks. Functions often fetch the current time in the middle of their execution. This impacts testability, as the clock must be mocked or the input set up to match the current time. The use of a clock might actually make a function untestable, as the input has dependencies on a particular time that is not the current time.
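As a hedged illustration, the sketch below contrasts a function that reads the clock internally with one that receives the time as an input. The function names and the deadline value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_expired_now(expires_at):
    # Fetches the current time in the middle of the function:
    # the result changes depending on when the test is run.
    return datetime.now(timezone.utc) >= expires_at


def is_expired(expires_at, now):
    # The time is an ordinary input, so any moment can be tested.
    return now >= expires_at


deadline = datetime(2020, 1, 21, 12, 0, tzinfo=timezone.utc)
just_before = deadline - timedelta(seconds=1)
```

With is_expired, a test can check the moment just before the deadline and the deadline itself without mocking anything. With is_expired_now, the same checks are only possible if the clock is replaced.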

A similar example is true random sources. A function can be difficult to test if it fetches a true random number during execution.
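The same idea can be sketched for randomness. Here Python's secrets module, which draws from the operating system's random source, stands in for a true random source. The function names are hypothetical.

```python
import secrets


def pick_winner_internal(entrants):
    # Fetches a random number during execution: the outcome of a single
    # call cannot be predicted by a test.
    return entrants[secrets.randbelow(len(entrants))]


def pick_winner(entrants, draw):
    # The random draw is passed in, so a test can choose it freely.
    return entrants[draw % len(entrants)]


entrants = ["ada", "bob", "cyd"]
chosen = pick_winner(entrants, draw=4)  # 4 % 3 == 1, so "bob"
```

In production code the caller would obtain the draw from the random source once and pass it on; only that caller touches the global.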

The way clocks and true random sources are implemented in a computer is through a specialized CPU instruction that takes no parameters and returns a value based on the value in a hardware component. This component can be a clock or a source of random noise.

Class and Object Fields, Local Static Variables

Finally, some languages have global variables with some limitations. Examples include class and object fields and local static variables.

In object oriented languages, it is common to set up object and class fields (taken to mean the same as properties and attributes). Class fields are less common, but object fields are very common. The object fields are accessible either directly in the object methods or through a pointer such as "this"; they do not need to be sent to the object methods. Although this is more restricted than globals, it still causes some of the problems of globals.

Before calling an object method, the object must be set up correctly. This is known to cause a cascade of other objects that also need to be set up, a phenomenon similar to global state. This negatively affects testability.
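A small sketch of this, using a hypothetical ReportPrinter class: the method render takes no arguments, so everything it needs must have been placed in the object's fields beforehand, and calling it too early fails.

```python
class ReportPrinter:
    """Hypothetical class whose method depends on fields set up earlier."""

    def __init__(self):
        self.title = None
        self.rows = []

    def render(self):
        # Uses fields that are not passed in; fails unless they were set first.
        return self.title + "\n" + "\n".join(self.rows)


def render_report(title, rows):
    # Same logic with explicit inputs; nothing has to be set up beforehand.
    return title + "\n" + "\n".join(rows)


printer = ReportPrinter()
printer.title = "Sales"           # required setup, invisible in render()'s signature
printer.rows = ["Q1: 10", "Q2: 12"]
from_object = printer.render()
from_function = render_report("Sales", ["Q1: 10", "Q2: 12"])
```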

The reuse of an object method is more difficult as it is coupled with the class it is a part of. Of course, the class itself might be reusable as a whole.

In many languages, a local variable of a function can be set up as static, meaning it will keep its value between calls to a function. This is similar to class fields and causes similar problems as with global variables.

Captured Variables in Inner Functions

Captured variables in inner functions are another example of globals: a captured variable is not passed to the function but is accessible to it.

Having captured variables binds an inner function to its outer function.
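A minimal sketch of a captured variable, with hypothetical names: the inner function increment takes no parameters, yet it reads and writes count, which belongs to the enclosing call of make_counter.

```python
def make_counter():
    count = 0  # local to make_counter, captured by the inner function

    def increment():
        # `count` is not a parameter; it is reached through the closure.
        nonlocal count
        count += 1
        return count

    return increment


def next_count(count):
    # Alternative without a captured variable: state goes in and comes out.
    return count + 1


counter_a = make_counter()
first = counter_a()   # 1
second = counter_a()  # 2, the same call gives a different result
```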

A Solution

One might object to the above discussion by saying that we need to use databases, files, network services, other CPUs and clocks to build actual working programs. This is true. However, it is possible to avoid using globals in ways that are not immediately obvious.

The first important thing is to accept that the above cases are cases of global variables and that if one uses them in the middle of functions, one will suffer the problems of global variables.

One solution is to develop most of a program without using global variables and restrict the use of globals to some parts of the program. The functions not using global variables will not suffer the problems of global variables. They can then be developed, tested and analyzed separately.

To make a function not use global variables, pass to it the data it needs and return the results. These functions will then be more reusable, more testable, and there will be less coupling to their internals from tests and other functions. This makes the functions units. These units can be developed, tested and analyzed in isolation. One does not need the full global state to work with them. A program made from isolatable units is less monolithic.
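One possible shape of such a program is sketched below, under the assumption of a trivial greeting program with hypothetical names. The function greeting receives everything it needs and returns its result. Only main touches the clock and standard output.

```python
from datetime import datetime


def greeting(name, hour):
    # A unit: depends only on its inputs and returns its result.
    if hour < 12:
        return f"Good morning, {name}"
    if hour < 18:
        return f"Good afternoon, {name}"
    return f"Good evening, {name}"


def main():
    # The use of globals (clock, standard output) is restricted to this function.
    print(greeting("Ada", datetime.now().hour))


if __name__ == "__main__":
    main()
```

The unit greeting can be tested for any hour of the day without mocking the clock, and it can be reused in a program that gets its time from somewhere else entirely.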

A surprisingly large part of software can be made without using globals. Developers are so used to using globals today that they are unaware of this. When a software system is built using current best practices, global variables will be used almost everywhere. Thus, their use seems necessary. The problems caused by using global variables also seem unavoidable and are taken for granted.

If a program is built with a high use of globals, it is not trivial to refactor it to not use globals. Doing so will likely result in long lists of variables passed to functions. This might be considered an argument against avoiding globals, but it is actually a manifestation of how many globals end up being used when their use is unrestricted. A program built mostly without globals from the start will have short parameter lists, because giving functions more focussed input is part of the design of the software from the start. Ordinary global variables illustrate this: when they were in favor, refactoring code to remove them was difficult, but nowadays, when ordinary globals have fallen out of favor, writing code without them is not a problem.

If one builds most of a software system without using globals, one will not suffer their problems either. One will create software that is composed of testable, reusable units that can be built, tested and analyzed in separation. This can improve the scalability of a software project, as smaller teams or single developers can focus on parts of the software separately.