Notes on the C Preprocessor: Introduction

My graduate work on SuperC made me far more familiar with the C preprocessor’s ins and outs than I ever could have imagined (or wanted). SuperC’s novel preprocessing and parsing algorithms let you parse a program without having to run the preprocessor first. Solving this challenge exposed me to interesting quirks of the preprocessor and strange usage patterns that appear in the wild. I’d like to share these and bring attention to this underrated aspect of compilers, hopefully providing insight for future language development and software tools.

Because it lurks between the lexer and parser, the preprocessor can be hard to distinguish from the C language itself. For instance, #include is not part of the C language proper, but a preprocessing feature that basically just copies in a given file before compilation. This and the rest of the preprocessor constructs, macros (#define) and conditional compilation (#ifdef), are completely distinct from the C language, sharing only its lexical specification. This makes for a powerful tool that is used to augment the diminutive C language, even enabling what resemble generics, iterators, modules, and more.
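As a rough illustration of those three constructs side by side, here is a minimal sketch. The names BUFFER_SIZE, MAX, USE_WIDE_IDS, user_id, and largest_request are made up for this example and do not come from any real codebase.

```c
#include <stddef.h>  /* #include: pastes in the header that declares size_t */

/* #define, object-like: every later BUFFER_SIZE token is replaced by 16. */
#define BUFFER_SIZE 16

/* #define, function-like: arguments are substituted as tokens, not values. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* #ifdef: only one branch survives preprocessing.
   USE_WIDE_IDS is a hypothetical configuration flag. */
#ifdef USE_WIDE_IDS
typedef unsigned long long user_id;
#else
typedef unsigned int user_id;
#endif

/* The compiler proper never sees MAX or BUFFER_SIZE. After expansion the
   body reads: return ((n) > (16) ? (n) : (16)); */
size_t largest_request(size_t n) {
  return MAX(n, BUFFER_SIZE);
}
```

None of the directive lines follow C's own grammar. They only share its tokens, which is why a C parser cannot make sense of them without help.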

The preprocessor is not without its flaws. For one, it makes program analysis harder. Software tools for C are easily tripped up by unpreprocessed code. Yet the preprocessor is used a surprising amount. Ernst et al., looking at a variety of software packages, found that preprocessor directives make up over 8% of source lines and that macros are used on 25% of source lines on average [1]. One easy way for tools to avoid preprocessor directives and macros is to preprocess first and work on pure C instead. Preprocessing first, however, loses much of the original program. Library headers are no longer separate, macros are expanded and lose their original meaning, and conditional compilation eliminates whole chunks of the source code. A preprocessed C file might be unrecognizable to the original developer.
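To make that loss concrete, consider this small sketch. KIB, ENABLE_CACHE, cache_bytes, and total_bytes are hypothetical names chosen for illustration, and the comment describes roughly what a tool would see after running the preprocessor (for example with cc -E on GCC or Clang).

```c
#define KIB(n) ((n) * 1024)

#ifdef ENABLE_CACHE /* hypothetical configuration flag */
static int cache_bytes(void) { return KIB(64); }
#else
static int cache_bytes(void) { return 0; }
#endif

/* With ENABLE_CACHE undefined, preprocessing leaves a tool only this:
 *
 *   static int cache_bytes(void) { return 0; }
 *   int total_bytes(void) { return cache_bytes() + ((1) * 1024); }
 *
 * The KIB macro, the ENABLE_CACHE variant of cache_bytes, and the fact
 * that a configuration choice existed at all are gone. */
int total_bytes(void) { return cache_bytes() + KIB(1); }
```

A refactoring tool that renames cache_bytes on the preprocessed text would silently miss the other variant, which is one reason tools want to work on the unpreprocessed source.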

Stroustrup recognized the problems with the preprocessor and designed C++ to minimize the need for it [2]:

Macros are very important in C but have far fewer uses in C++. The first rule about macros is: don’t use them unless you have to. Almost every macro demonstrates a flaw in the programming language, in the program, or in the programmer. Because they rearrange the program text before the compiler proper sees it, macros are also a major problem for many programming support tools. So when you use macros, you should expect inferior service from tools such as debuggers, cross-reference tools, and profilers.

The preprocessor should probably be replaced, but doing so is not easy. For one, there is an enormous number of existing systems with sprawling C codebases that make extensive use of the preprocessor. A more subtle challenge is that it’s hard to match the utility of the preprocessor. While some usages are already subsumed by newer C programming constructs, such as const in place of certain macros, others, such as poor-man’s generics, are not trivial to replace without substantial additions to the programming language itself. Still, there are plenty of things you can do with the preprocessor that are basically absurd and don’t need to be supported in a preprocessor replacement. Take the following program that uses a macro to redefine the right curly brace:

#include <stdio.h>
#define CURLY }
int main(int argc, char **argv) {
  printf("hello, world!\n");
CURLY

This code is completely legal and will compile and run, but would anyone ever need to do this? Probably not.
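By contrast, the more defensible usages mentioned earlier look something like the following sketch. It is only an illustration under assumed names (DEFINE_MAX, sample_count, largest_sample): a typed const stands in for what would once have been a #define, while a macro stamps out one function per element type as a poor-man’s generic.

```c
/* Poor-man's generics: the macro generates one function per type.
   T is pasted into the function name with ##, giving max_int, max_double. */
#define DEFINE_MAX(T)                \
  T max_##T(const T *xs, int n) {    \
    T best = xs[0];                  \
    for (int i = 1; i < n; i++)      \
      if (xs[i] > best) best = xs[i];\
    return best;                     \
  }

DEFINE_MAX(int)
DEFINE_MAX(double)

/* Already subsumed by the language: a typed constant
   instead of #define SAMPLE_COUNT 3. */
static const int sample_count = 3;

int largest_sample(void) {
  const int samples[] = {3, 9, 4};
  return max_int(samples, sample_count);
}
```

The const is easy to migrate. The generated functions are not, because plain C has no construct that takes a type as a parameter and produces a function.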

The trick is to give developers the preprocessor features they want and need, while leaving out those that are not useful or that make language tools choke. Replacing it entirely will likely require a combination of efforts across language design, tool building, software engineering, and development practices. This is a testament to the sheer usefulness of the preprocessor.

In this series of posts, we will dig into the preprocessor, how it’s used, and what makes it so hard to analyze. To get started, the first post will cover a classic preprocessor pitfall and illustrate some preprocessor internals. From there we’ll look at dynamic includes, highly configured arrays and structs, free macros in includes, poor-man’s iterators and generics, and more.

  1. Michael D. Ernst, Greg J. Badros, and David Notkin. An Empirical Analysis of C Preprocessor Use. IEEE Transactions on Software Engineering, 2002.

  2. Bjarne Stroustrup. The C++ Programming Language, Fourth Edition, 2013. Section 12.6 Macros.