Draft 2002-10-10

Chapter 2

Source Files

Officially, the unit of compilation is called a translation unit. Colloquially, a translation unit is a source file, although the source does not have to be a file. Nonetheless, this book uses the phrase source file because it is familiar to most readers. The term translation encompasses compilation and interpretation, although most C++ translators are compilers. This chapter discusses how C++ reads and compiles (translates) source files (translation units).

Source File Conventions and Rules

A C++ program can have many source files, and each file can be compiled separately. The compilation process conceptually has several phases (although a compiler can merge phases if it can do so without affecting the observable results):

Preprocessor

Step 4 is the preprocessing step, which is separate from the main compilation step. Historically, the preprocessor has been a separate program, but compilers are not required to implement the preprocessor that way. Because of its history, though, the preprocessor has syntax and semantics that are quite different from the rest of C++.

The preprocessor handles preprocessing directives, which start with # as the first non-space character on a line. After the # is optional white space (no newlines are permitted) followed by the directive name. Each directive extends to the end of the line. Preprocessing directives can define and undefine macros, establish regions of conditional compilation, include other source files, and control the compilation process somewhat. Following are the preprocessing directives, with a brief description of each and a pointer to the section of this chapter where the directive is explained in depth.

#
Null directive has no effect.
#define
Defines a macro. (Macros section)
#elif
Else-if for conditional compilation. (Conditional compilation section)
#else
Else for conditional compilation. (Conditional compilation section)
#endif
End of conditional compilation. (Conditional compilation section)
#error
Issue an error message. (Control directives section)
#if
Test a condition. (Conditional compilation section)
#ifdef
Test whether a macro is defined. (Conditional compilation section)
#ifndef
Test whether a macro is not defined. (Conditional compilation section)
#include
Include another source file. (Include files section)
#line
Change the compiler's idea of the line number for error messages. (Control directives section)
#pragma
Control the compiler. (Control directives section)
#undef
Undefine a macro. (Macros section)

Any other preprocessing token after the initial # character is an error.
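The layout rules for directives can be sketched as follows. This is a minimal illustration, and the names BUFFER_SIZE and buffer_size are hypothetical, chosen only for the example:

```cpp
// A lone # on a line is the null directive; it has no effect.
#

// White space may precede the # and may separate it from the
// directive name, as long as everything stays on one line.
  #   define BUFFER_SIZE 64

// BUFFER_SIZE (a hypothetical macro) is replaced by 64 here.
int buffer_size() { return BUFFER_SIZE; }
```

Calling buffer_size() returns 64, which shows that the indented directive was processed like any other #define.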

Macros

A macro is a name that represents other text, called the macro expansion. When the macro name is seen in the source file, the preprocessor replaces the name with the expansion. A macro can have formal parameters, and actual arguments are substituted in the expansion.

C programmers are accustomed to using macros to declare constants and simple inline functions, but C++ offers const declarations and true inline functions. Macros are therefore used much less often in C++ than in C.
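One reason to prefer the C++ features is that a macro performs textual replacement and knows nothing about types or operator precedence. The following sketch contrasts a carelessly written macro with a constant and an inline function. All the names (BAD_SQUARE, limit, square, via_macro, via_function) are hypothetical:

```cpp
// Textual replacement: the parameter is pasted in without grouping.
#define BAD_SQUARE(x) x * x

// A typed constant and a true inline function, as C++ allows.
const int limit = 3;
inline int square(int x) { return x * x; }

// Expands to: limit + 1 * limit + 1, which is 3 + 3 + 1 == 7.
int via_macro() { return BAD_SQUARE(limit + 1); }

// The argument is evaluated first, so the result is 4 * 4 == 16.
int via_function() { return square(limit + 1); }
```

Parenthesizing the macro's parameter and replacement would fix this particular problem, but the inline function avoids it without any special care.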

Simple Macros

A macro is defined with the #define directive. A macro's scope is from the point of definition to the end of the source file, or until you undefine the macro with the #undef directive. The simple form of a macro definition is just a macro name and its replacement:

#define name replacement

where the name can be any valid identifier, and replacement is the rest of the line, not including initial or trailing white space. Macro names must be unique in a single source file. You can repeat a macro definition for the same macro name, but only if the replacements are also the same.
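The scope and redefinition rules might look like this in practice. MAX_USERS, first_limit, and second_limit are hypothetical names used only for illustration:

```cpp
#define MAX_USERS 100
// Repeating the definition is allowed because the replacement is identical.
#define MAX_USERS 100

int first_limit() { return MAX_USERS; }   // uses the replacement 100

// After #undef, MAX_USERS is no longer a macro, so it can be
// defined again with a different replacement.
#undef MAX_USERS
#define MAX_USERS 250

int second_limit() { return MAX_USERS; }  // uses the replacement 250
```

Each function picks up whichever definition is in effect at the point where the macro name appears.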

Note that a macro name cannot be the same as a reserved keyword. Many compilers accept keywords as macro names, but your program would still be wrong and would not be portable to a compiler that is more strict about detecting this particular error.

Wherever a macro name appears as a distinct token after its definition, it is replaced with the replacement text. Macro names are not replaced inside strings and character literals, though. The replacement text is rescanned for macro names, which are recursively expanded until no more replacements take place. During replacement, the original macro name is not expanded if it appears in any replacement text.
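A small sketch of rescanning, and of the rule that string literals are left alone, follows. The macro and function names are hypothetical:

```cpp
#include <string>

#define GREETING "hello"
#define MESSAGE  GREETING

// MESSAGE expands to GREETING, and the rescan expands GREETING to "hello".
std::string expanded() { return MESSAGE; }

// The macro name inside a string literal is not a distinct token,
// so it is not replaced.
std::string in_literal() { return "MESSAGE"; }
```

Here expanded() returns the text hello, and in_literal() returns the text MESSAGE unchanged.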

The replacement text is never interpreted as a preprocessing directive. This means, for example, you cannot #define a macro within a macro's replacement text. Also, directive names are not subject to macro replacement (although directive arguments are). Thus, the following are not allowed:

#define def  #define x y
def           // error: macro cannot expand into a directive

#define u undef
#u name       // error: unknown directive: u

Macros with Arguments

You can also declare a macro with a parameter list, which is sometimes called a function-like macro: follow the macro name with a parameter list enclosed in parentheses. There must not be a space between the macro name and the left parenthesis or else the preprocessor will interpret the parameter list as part of the replacement text for a simple macro. The parameter list is a comma-separated list of identifiers, for example,

#include <iostream>

#define declare(x, y, z)  x y = z
#define print(a)  (::std::cout << (a) << '\n')

int main()
{
  declare(int, x, 42);
  print(x);
}

To use a function-like macro, the macro name must be followed by a comma-separated argument list in parentheses. Commas within parentheses are not argument separators, and parentheses must be balanced within a single argument. The macro invocation must have the same number of arguments as the macro definition has parameters. Newlines are permitted as ordinary white space characters in a macro invocation.

In the macro replacement text, each occurrence of a parameter name is replaced by the corresponding argument. If the parameter is not an operand to the # or ## operator, the argument is scanned for macros and expanded. After the arguments have been expanded, the # and ## operators are evaluated, and the resulting text is rescanned for macros. The macro name is expanded only once, so rescanning does not expand the name of the macro being expanded.
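The argument rules can be illustrated with a short sketch. TWICE, SEVEN, and the function names are hypothetical and exist only for this example:

```cpp
#include <algorithm>

#define SEVEN 7
#define TWICE(expr) ((expr) + (expr))

// The comma in std::max(3, SEVEN) is inside parentheses, so the whole
// call is a single macro argument. The argument is expanded before
// substitution, so SEVEN becomes 7.
int nested_comma() { return TWICE(std::max(3, SEVEN)); }

// Newlines inside the invocation are ordinary white space.
int multi_line()
{
  return TWICE(
    1 + 2
  );
}
```

With these definitions, nested_comma() yields 14 and multi_line() yields 6. Note that only parentheses protect a comma: a comma between template angle brackets would still separate arguments.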

Macro Text Operators

Within macro replacement text, you can use the # and ## operators, as described in this section. The text operators can be used only in macro replacement text.

The # operator is a unary operator sometimes called the stringify operator because it turns its operand into a string. It must be followed by a macro parameter name. The # operator and the parameter name are replaced by a character string literal whose contents are the text of the macro argument that corresponds to the macro parameter. White space in the argument is condensed to a single space character between tokens; leading and trailing white space are removed.

The ## operator is a binary operator sometimes called the concatenation operator because it concatenates preprocessor tokens. It must not appear at the start or end of the macro replacement text. If an operand of ## is a macro parameter, the corresponding argument is substituted without first being expanded. The tokens immediately adjacent to the ## operator are then concatenated to form a single token. If the result is not a valid token, the behavior is undefined. The resulting token is subject to normal macro expansion when the replacement text is rescanned.

The order of evaluation of the # and ## operators is unspecified. If the order is important, you can control it by using multiple macros.
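As a minimal sketch of the two text operators, consider the following. STRINGIFY, JOIN, item_count, and the function names are hypothetical:

```cpp
#include <string>

#define STRINGIFY(x)  #x
#define JOIN(a, b)    a ## b

// JOIN pastes item_ and count into the single identifier item_count.
int JOIN(item_, count) = 3;

// Runs of white space between the argument's tokens are condensed
// to one space each, so the literal is "a + b".
std::string text() { return STRINGIFY(a   +   b); }

// The pasted name is an ordinary identifier from here on.
int value() { return item_count; }
```

Each macro here uses only one of the operators, so the order of evaluation does not come into play.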

The need for macros in C++ is much less than in C. C++ has inline functions, constants, reference parameters, templates, and other language features that replace historic uses for macros. Nonetheless, there are circumstances where macros can be valuable tools.

For example, when writing a container class template (see Chapter 10), it is important to detect when a template parameter is an integral type. There are several ways to do this, and one way is to use type traits (Chapter 9). A template declares a special tag for all integral types and a different tag for all other types. The traits template is then specialized for the integral types, which is more easily done using a macro, as shown in Example 2-1.

Example 2-1: Defining type traits with a macro.

// Type trait to test whether a type is an integer.
struct is_integer_tag {};
struct is_not_integer_tag {};

// The default is that a type is not an integral type.
template<typename T>
struct is_integer {
  enum { value = 0 };
  typedef is_not_integer_tag tag;
};

// Override the default explicitly for all integral types.
#define decl_is_integer(T)     \
template<>                     \
struct is_integer<T> {         \
  enum { value = 1 };          \
  typedef is_integer_tag tag;  \
}

decl_is_integer(char);
decl_is_integer(signed char);
decl_is_integer(unsigned char);
decl_is_integer(int);
decl_is_integer(unsigned int);
decl_is_integer(short);
decl_is_integer(unsigned short);
decl_is_integer(long);
decl_is_integer(unsigned long);

#undef decl_is_integer

Example 2-2 shows another use for macros when testing the string class. Notice the use of the # operator to print the function and arguments being tested.

Example 2-2: Testing functions.

#include <iostream>
#include <string>

int main()
{
  using namespace std;

  string s("hello, world");

#define test(func) cout << #func "=" << s.func << '\n'

  test(erase(9, 1));
  test(erase(5));
  test(find_first_not_of("aeiou"));
  ...
}

Most compilers have an option where the compiler runs only the preprocessor, and you can examine the results after all macros have been expanded and all preprocessor directives have been evaluated. This mode can be helpful when debugging an incorrect macro expansion.

Example 2-3 shows contrived examples that illustrate how macros are expanded. Try running the example through your compiler to see if the results are correct. (Other than white space, the results should be the same as what is shown in the comments.)

Example 2-3: Expanding macros.

#define x             x.y
#define str(x)        #x
#define xstr(x)       str(x)
#define concat(x, y)  x ## y
#define parens(x)     (x)
#define apply(x,y)    x(y)
#define hello         HI

x                                       //  x.y
concat(st, r)(hello)                    //  "hello"
concat(x,str)(hello)                    //  "HI"
concat(s, tr)parens(hello)              //  str(HI)
concat(s, tr)(parens(hello))            //  "parens(hello)"
apply(concat(s, tr), hello)             //  "HI"

The first macro expansion shows how the macro name x is not expanded in the replacement text.

The second example shows how the concat macro forms a new token str from its arguments. After the concat macro is evaluated, the text is rescanned. The str macro is then invoked with the hello argument. Because the x parameter is an operand of #, the argument is not expanded. Instead, # is applied to hello to produce the string literal "hello".

The third example is like the second, except it invokes xstr instead of str. The difference is that xstr expands its argument because the replacement text does not use the # or ## operators. The expanded argument is passed to str. Thus, hello is expanded to HI, and str(HI) results in "HI".

The fourth example also invokes concat to produce str, but str is followed by the token parens, not by a left parenthesis, so it is not expanded as a macro. Instead, the parens macro is invoked; because its parameter x is not an operand of # or ##, the argument hello is expanded to HI.

The fifth example is just like the second, but emphasizes how the argument to str is not expanded.

The final example shows how to expand hello as an argument to str, even when str is the result of the concat macro. The parameters of apply are expanded, resulting in the text str(HI), which expands to "HI".

Predefined Macros

The following macros are predefined. Do not undefine or redefine any of the predefined macros.

__cplusplus
Has the value 199711L. Future versions of the C++ standard will use larger values. Non-conforming compilers should use a different value.
__DATE__
Expands to the date of compilation, as a string literal in the form "Mmm dd yyyy", where dd begins with a space for days less than 10. An implementation is free to substitute a different date, but the form is always the same and the date is always valid.
__FILE__
Expands to the name, as a string literal, of the source file being compiled.
__LINE__
Expands to the line number, as a decimal constant, of the source file being compiled.
__STDC__
Is implementation-defined. C++ compilers might or might not define this macro, and if it is defined, the value is implementation-defined. Note that C compilers are required to define __STDC__ as 1, and in some implementations, the same preprocessor might be used for C and C++.
__TIME__
Expands to the compilation time in the form "hh:mm:ss". An implementation is free to substitute a different time, but the form is always the same and the time is always valid.

An implementation is free to predefine other macros that use any of the reserved names, such as names that contain two adjacent underscores or a leading underscore followed by an uppercase letter. For example, compilers often define macros to indicate the host or target platform, e.g., __linux__.
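The standard predefined macros can be inspected with a few small functions, as in this sketch. The function names are hypothetical, and the actual values depend on the compiler and on when and where the file is compiled:

```cpp
#include <cstring>

// The value identifies the version of the standard the compiler claims.
long standard_version() { return __cplusplus; }

// __LINE__ and __FILE__ describe the point in the source where they appear.
int current_line() { return __LINE__; }
const char* current_file() { return __FILE__; }

// The form "Mmm dd yyyy" always has 11 characters.
std::size_t date_length() { return std::strlen(__DATE__); }

// The form "hh:mm:ss" always has 8 characters.
std::size_t time_length() { return std::strlen(__TIME__); }
```

A typical use is to include __FILE__ and __LINE__ in diagnostic messages so that a log entry points back to the source that produced it.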

Include Files

The #include directive includes a standard header or source file. It has two forms. The first form:

#include <header>

searches for header and replaces the directive with the entire contents of the header. The second form:

#include "filename"

searches for filename and replaces the directive with the entire contents of the source file.

You can use other preprocessor tokens, provided they expand to one of the two standard forms. Combining tokens to form a single <header> or "filename" is implementation-defined. To preserve portability, use macros only for the entire sequence, e.g.,

#define HEADER "this.h"
#include HEADER

Further details are implementation-defined. In particular, the implementation defines how and where the preprocessor searches for header or filename, how filename maps to an external file name, whether file names heed or ignore case distinctions, and whether different character sequences for filename represent distinct external files.

The only guarantee is that if filename consists of a sequence of letters and underscore characters (including universal character names for letters in other languages), followed by a period and a single letter or underscore, then the implementation must provide a unique mapping of filename to a source file (optionally ignoring case distinctions).

The most common convention is that <header> refers only to standard headers and to vendor-supplied extensions to the standard. Compilers typically have a way for you to supply your own additional libraries and use the associated headers as <header> includes. The second form is used for all header files that are part of the application, which are typically located in the same directory or folder as the application's source files. The most common file name convention is to end the name with .h (for header). For example, suppose you wrote a class to represent an employee. Put the class declaration in employee.h and the definitions of the members in employee.cpp. Any other file that needs to use the employee class can #include "employee.h" and use the class declaration. For example,

#include <set>
#include "employee.h"

class business_group {
private:
  std::set<employee> employees_;
  ...
};

Conditional Compilation

The preprocessor has several directives for conditional compilation, that is, establishing regions within a source file where the directives determine whether the code in the region is compiled. A conditional region starts with #ifdef, #ifndef, or #if and ends with #endif. Each region can have any number of #elif directives and an optional #else directive after all the #elif directives.

The #if and #elif directives take a single parameter, a constant expression. The expression differs slightly from non-preprocessor constant expressions:

The #ifdef and #ifndef directives take a single identifier as a parameter and are equivalent to #if defined identifier and #if not defined identifier, respectively.

Macros are expanded, except for the operands of the defined operator. The constant expression is evaluated and if the result is non-zero, the #if condition is true, and the code in the region that immediately follows is compiled. The region ends with #else, #elif, or #endif. If the #if expression is false, the condition for the next #elif is evaluated, and if that expression is true, its region is compiled, and so on. If all #elif expressions are false, and #else is present, its region is compiled. Conditional processing ends with the corresponding #endif directive.
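The evaluation order described above can be sketched with a hypothetical configuration macro. LOG_LEVEL, log_mode, and logging_configured are names invented for this illustration:

```cpp
#include <string>

// A hypothetical setting that would normally come from a build option.
#define LOG_LEVEL 2

#if !defined(LOG_LEVEL)
  const std::string log_mode = "off";      // skipped: LOG_LEVEL is defined
#elif LOG_LEVEL >= 2
  const std::string log_mode = "verbose";  // compiled: 2 >= 2 is true
#elif LOG_LEVEL == 1
  const std::string log_mode = "brief";    // skipped: an earlier region was chosen
#else
  const std::string log_mode = "off";      // skipped
#endif

// #ifdef LOG_LEVEL means the same as #if defined LOG_LEVEL.
#ifdef LOG_LEVEL
  const bool logging_configured = true;
#else
  const bool logging_configured = false;
#endif
```

Exactly one region of each conditional is compiled, so log_mode and logging_configured are each defined once.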

Conditionals can be nested. Within an inner region, the preprocessor keeps track of conditional directives even if the region is not being compiled, so conditional directives can be properly matched. Example 2-4 shows one way to nest conditional directives.

Example 2-4: Nesting conditional directives.

#define zero zero  // identifiers are converted to 0
#define one  true  // bool expressions are promoted to int

#if one
// This region is compiled
  #if zero
  This region can contain erroneous C++ code. The code is not
  compiled, so the errors don't matter.
  #else // This #else matches the inner #if
    // This region is compiled.
    const int zero = 0;
  #endif // This #endif matches the inner #if
  int x = zero;
#else
  This #else matches the outer #if. Because the #if
  condition was true, the #else region is not compiled.
#endif

The most common use for conditional directives is to guard header files from multiple inclusion. All the standard headers are guarded so including them more than once has no harmful effects. This is important because an implementation might include one header in another header. For example, <map> might include <utility> to get the declaration for the pair<> template. If you explicitly #include <map> and #include <utility>, you might end up including <utility> more than once.

You can guard your own headers by using conditional directives to define a guard macro and using the guard macro to ensure the file's contents are compiled only when the macro is not defined, as shown in Example 2-5.

Example 2-5: Guarding a header against multiple inclusion.

// In the header file employee.h:
#ifndef EMPLOYEE_H
#define EMPLOYEE_H
// Thus, the entire contents of the file are compiled only
// when EMPLOYEE_H is not defined. The first time the file
// is #included, the macro is not defined, in which case
// it is immediately defined. The second and subsequent
// times the same header is included in the same source
// file, the macro and conditional directives ensure
// that the entire file is skipped.

class employee { ... };

#endif  // end of employee.h

Control Directives

The control directives are #error, #line, and #pragma.

The #error directive tells the preprocessor to issue an error message and mark the source file as ill-formed, just as though the programmer made a programming error that the compiler detected. You can supply any preprocessor tokens as parameters, and those tokens are echoed in the error message, for example,

#if !defined(__cplusplus) || (__cplusplus < 199711L)
#error Not a conforming C++ compiler.
#endif

The #line directive changes the compiler's notion of the current file name and line number. The directive has two forms:

#line digits
#line digits string

The first form changes the line number (as expressed by the __LINE__ macro, and as used in error messages) to digits. The second form changes the line number to digits and the file name to the contents of string. The new file name is used as the value of the __FILE__ macro.
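The effect of the second form can be sketched as follows. The file name generated.y and the function names are hypothetical. They stand in for the kind of name a code generator might emit so that error messages refer to its input file:

```cpp
#include <string>

// The line that follows this directive is treated as line 100
// of a file named "generated.y".
#line 100 "generated.y"
int reported_line() { return __LINE__; }
std::string reported_file() { return __FILE__; }
```

With this directive in place, reported_line() returns 100 and reported_file() returns the text generated.y, regardless of the real name and position of the source.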

The #pragma directive is implementation-defined. An implementation can define certain pragma parameters to control the compiler. The preprocessor ignores any pragma that it does not recognize.

Because pragmas are highly compiler-dependent, you should avoid their use as much as possible. Most compilers let you control the compiler by providing command line options, configuration files, or project files. Do your best to keep compiler-specific information out of your source files.