Preprocessor
The C preprocessor provides many additional features not found in the
language itself. You can use these to create constants, to include data
from other files, and to shoot yourself in the foot.
Problems with the preprocessor are difficult to spot because they are not
obvious. Even the compiler may misreport preprocessor errors. For
example, the following program generates an error on Line 5 when the
problem is really a bad #define statement on Line 1.
Good style is the best defense against preprocessor errors. It is
extremely important. By religiously following the rules discussed here,
you can catch errors before they happen.
Simple Define Statements
The SYMBOL is any valid C symbol name (by convention, #define names are
all uppercase). The value can be a simple number or an expression.
Like variable declarations, a constant declaration needs a comment that
explains it. This comment helps create a dictionary of constants.
Constant names are all upper-case.
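As a minimal sketch of this style, the constants below each carry a comment that says what the value means and in what units. The names (MAX_USERS, LINE_LENGTH, TIMEOUT_SECS) and the helper buffer_size are hypothetical and chosen only for illustration.

```c
/* Hypothetical constants, each documented where it is defined */
#define MAX_USERS     50    /* Maximum number of users at one time */
#define LINE_LENGTH   80    /* Width of an output line (characters) */
#define TIMEOUT_SECS  30    /* Seconds to wait before giving up */

/* Number of characters needed to hold one output line per user */
int buffer_size(void)
{
    return MAX_USERS * LINE_LENGTH;
}
```

Read together, the comments form the dictionary of constants described above.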
Constant expressions
If the value of a #define statement is a compound expression, you can run
into problems. The following code looks correct, but it hides a fatal flaw.
/* Length of the object (inches) (part1=10, part2=20) */
#define LENGTH 10 + 20 /* Bad practice */
#define WIDTH 30 /* Width of table (in inches) */
/*..... */
/* Prints out an incorrect area */
printf("The area is %d\n", LENGTH * WIDTH);
Expanding the printf line, you get:
printf("The area is %d\n", LENGTH * WIDTH);
printf("The area is %d\n", 10 + 20 * WIDTH);
printf("The area is %d\n", 10 + 20 * 30);
This is another example of how the C preprocessor can hide problems. Clearly
LENGTH is 10 + 20, which is 30. So LENGTH is 30, right? Wrong. LENGTH is
literally 10 + 20, and:
10 + 20 * 30
is vastly different from:
30 * 30
To avoid problems like this, always surround all #define expressions with
parentheses (()). Thus, LENGTH should be defined as (10 + 20) rather than
as a bare 10 + 20.
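The effect of the parentheses can be checked with a short sketch. LENGTH and WIDTH come from the example above; the name BAD_LENGTH and the two helper functions are hypothetical, added here only so that both forms can sit side by side.

```c
#define BAD_LENGTH  10 + 20     /* Bad practice: no parentheses */
#define LENGTH      (10 + 20)   /* Length of the object (inches) */
#define WIDTH       30          /* Width of table (in inches) */

/* Expands to 10 + 20 * 30, which is 610 */
int bad_area(void)
{
    return BAD_LENGTH * WIDTH;
}

/* Expands to (10 + 20) * 30, which is 900 */
int good_area(void)
{
    return LENGTH * WIDTH;
}
```

Only the parenthesized definition yields the intended area.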
#define constants vs. consts
In ANSI C, constants can be defined two ways: through the #define
statement and through use of the const modifier. For example, the
following two statements are equivalent:
Which statement should you use? The const declaration is better because it
is in the main part of the C language and provides more protection against
mistakes.
As you've already seen, the #define statement is a problem. SIZE is a macro
and always expands to 10 + 20. The const int size is an integer. It has the
value 30. So the statement:
area = size * size; /* Works */
generates the right number. The const declaration is therefore less
error-prone. Also, if you make a mistake in defining a const, the compiler
generates an error message that points at the correct line. With a #define,
the error appears when the symbol is used, not when it is defined.
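The difference can be seen in a small sketch. It assumes SIZE is defined as the unparenthesized 10 + 20 and size as a const int with the same initializer, as the discussion above suggests; the two helper functions are hypothetical.

```c
#define SIZE 10 + 20                /* Macro: replaced by the text 10 + 20 */
static const int size = 10 + 20;    /* Integer constant with the value 30 */

/* Expands to 10 + 20 * 10 + 20, which is 230 */
int macro_area(void)
{
    return SIZE * SIZE;
}

/* An ordinary multiplication: 30 * 30 is 900 */
int const_area(void)
{
    return size * size;
}
```

The const version behaves like any other integer, so operator precedence cannot reach inside it.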
#define vs. typedef
The #define directive can be used to define types, such as:
#define INT32 long int /* 32 bit signed integer type */
The typedef is preferred over the #define because it is better integrated
into the C language, and it can create more kinds of variable types than a
mere define.
INT_PTR ptr1, ptr2; /* This contains a subtle problem */
What's the problem with the line INT_PTR ptr1, ptr2? The problem is that
ptr2 is of type integer, not a pointer to integer. If you expand this line,
the problem becomes apparent:
int * ptr1, ptr2; /* ptr1 is a pointer, but ptr2 is a plain integer */
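A short sketch makes the contrast concrete. It assumes INT_PTR was defined as the text int *; the typedef name int_ptr and both functions are hypothetical.

```c
#define INT_PTR int *       /* Assumed macro definition (bad practice) */
typedef int *int_ptr;       /* Hypothetical typedef equivalent */

int sum_with_define(void)
{
    int value = 5;
    INT_PTR ptr1, ptr2;     /* Expands to: int * ptr1, ptr2; */

    ptr1 = &value;
    ptr2 = 7;               /* Legal only because ptr2 is a plain int */
    return *ptr1 + ptr2;
}

int sum_with_typedef(void)
{
    int first = 5;
    int second = 7;
    int_ptr ptr1, ptr2;     /* Both variables are pointers to int */

    ptr1 = &first;
    ptr2 = &second;
    return *ptr1 + *ptr2;
}
```

With the typedef, the pointer is part of the type itself, so every variable in the declaration gets it.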
Abuse of #define directives
The problem with this approach is that you are obscuring the C language
itself. The maintenance programmer who comes after you will know C, not a
half-Pascal half-C mongrel.
Even the simple FOR_EACH_ITEM macro hides vital C code. Someone else reading
the program would have to go back to the definition of FOR_EACH_ITEM to
figure out what the code does. By using the code instead of a macro, no
lookup is necessary.
Keywords and standard functions
Defining new language elements is one problem. A far more difficult
problem occurs when a programmer redefines existing keywords or standard
routines. For example, in one program, the author decided to create a
safer version of the string copy routine:
The programmer performing the port was baffled. There was nothing wrong
with the parameters to strcpy. And of course, because strcpy is a
standard function, there shouldn't be a problem with it.
But in this case, strcpy is not a standard function. It's a non-standard
macro that results in a great deal of confusion.
Think about how difficult it would be to find your way if someone gave
you directions like these: "When I say north I mean west, and when I say
west I mean north. Now, go north three blocks, turn west for one (when I
say one I mean four), and then east two. You can't miss it."
Parameterized Macros
/* Double a number */
Enclosing the entire macro in parentheses avoids a lot of trouble similar to
the problems with simple #defines.
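As an illustration, the sketch below compares a macro whose body is not enclosed in parentheses with one that is. Both macro definitions are assumptions made for this example, not the original listing.

```c
/* Double a number (hypothetical definitions for illustration) */
#define BAD_DOUBLE(x)  (x) + (x)        /* Bad practice: no outer () */
#define DOUBLE(x)      ((x) + (x))      /* Entire macro parenthesized */

/* Expands to 3 * (2) + (2), which is 8 */
int bad_triple_double(void)
{
    return 3 * BAD_DOUBLE(2);
}

/* Expands to 3 * ((2) + (2)), which is 12 */
int good_triple_double(void)
{
    return 3 * DOUBLE(2);
}
```

Because each use of x appears twice in the body, an argument with side effects (such as i++) would still misbehave; parentheses only protect against precedence problems.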
Multi-line Macros
This is fine as long as the target of the #define is a single C statement.
Problems occur when multiple statements are defined. The following example
defines a macro ABORT that will print a message and exit the system. But it
doesn't work when put inside an if statement.
This is obviously not what the programmer intended. A solution is to enclose multiple statements in braces.
A do/while statement whose condition is always false executes the body of
the loop once and exits. C treats the entire do/while as a single
statement, so it's legal inside an if/else set.
If a macro contains more than one statement, use a do/while structure to
enclose the macro. (Remember to leave the trailing semicolon off the
do/while; the code that uses the macro supplies it.)
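The rule can be sketched with a hypothetical two-statement macro, RECORD_ERROR, which stands in for ABORT so that the example can run to completion instead of exiting. The variables and the function check_value are also assumptions made for illustration.

```c
#include <stddef.h>

static int error_count = 0;             /* Number of errors recorded */
static const char *last_error = NULL;   /* Most recent error message */

/* Record an error: two statements wrapped so they act as one */
#define RECORD_ERROR(msg)       \
    do {                        \
        ++error_count;          \
        last_error = (msg);     \
    } while (0)

/* Returns 1 if the value is acceptable, 0 if an error was recorded */
int check_value(int value)
{
    if (value < 0)
        RECORD_ERROR("negative value");
    else
        return 1;
    return 0;
}
```

The semicolon after RECORD_ERROR(...) completes the do/while statement, so the else that follows still pairs with the if. With plain braces, that same semicolon would end the if statement early and leave the else stranded.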
When macros grow too long, they can be split up into many lines. The
preprocessor uses the backslash (\) to indicate "continue on next line."
The latest ABORT macro also uses this feature.
Always stack the backslashes in a column. Try to spot the missing backslash in the following two examples:
Macros and Subroutines
Complex macros can easily resemble subroutines. It is entirely possible
to create a macro that looks and codes exactly like a subroutine. The
standard routines getc and getchar are often implemented as macros rather
than as true functions. These types of macros frequently use lower-case
names, copying the function-naming convention.
If a macro mimics a subroutine, it should be documented as a function.
That involves putting a function-type comment block at the head of the
macro:
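One possible layout is sketched here. The macro square, the function area_of_square, and the exact shape of the comment block are assumptions; any function-comment convention already used in the program would serve as well.

```c
/********************************************************
 * square -- Compute the square of a number.            *
 *                                                      *
 * Parameters                                           *
 *      x -- The number to square.                      *
 *                                                      *
 * Returns                                              *
 *      The square of x.                                *
 *                                                      *
 * Note: This is a macro. The argument is evaluated     *
 *      twice, so avoid arguments with side effects.    *
 ********************************************************/
#define square(x) ((x) * (x))

/* Area of a square whose sides have the given length */
int area_of_square(int side)
{
    return square(side);
}
```

The note about double evaluation matters most, because a caller who sees a lower-case name will assume ordinary function semantics.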
The #include Directive
Include files are used to define data structures, constants, and
function prototypes for items used by multiple modules. It is possible
to put code in an include file, but this is rarely done.
Style for #Includes
Most programs put the #include directives in a group just after the
heading comments. That way they are all together in a known place. System
includes (enclosed in <>) come first, followed by any local includes
(enclosed in "").
#include directives come just after the heading comments. Put system
includes first, followed by local includes.
#include directives that use absolute file names, that is, ones that specify
both path and name, such as /user/sam/program/data.h and
Y:\DEVELOP\PROGRAM\DEFS.H, make your program non-portable. If the program is
moved to another machine, even one using the same operating system, the
source will have to be changed.
Protecting against double #Includes
Include files can contain #include directives. This means that you can
easily include the same file twice. For example, suppose database.h and
symbol.h both need the file defs.h. Then a program that includes both
database.h and symbol.h ends up including defs.h twice.
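A common defense is to wrap the contents of the include file in a conditional, sketched below. To keep the example in one file, the body of a hypothetical defs.h is simply written out twice, which is what the compiler sees after a double include. The guard symbol DEFS_H_INCLUDED, the struct, and the constant are all assumptions.

```c
/* ---- First inclusion of the hypothetical defs.h ---- */
#ifndef DEFS_H_INCLUDED
#define DEFS_H_INCLUDED
struct symbol {
    int id;                 /* Identification number of the symbol */
};
#define MAX_SYMBOLS 100     /* Maximum number of symbols in the table */
#endif /* DEFS_H_INCLUDED */

/* ---- Second inclusion: skipped because the guard is defined ---- */
#ifndef DEFS_H_INCLUDED
#define DEFS_H_INCLUDED
struct symbol {
    int id;                 /* Identification number of the symbol */
};
#define MAX_SYMBOLS 100     /* Maximum number of symbols in the table */
#endif /* DEFS_H_INCLUDED */

/* Store an id in the first table entry and read it back */
int first_id(void)
{
    struct symbol table[MAX_SYMBOLS];

    table[0].id = 1;
    return table[0].id;
}
```

Without the guard, the second definition of struct symbol would be rejected by the compiler as a redefinition.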
Conditional Compilation
The preprocessor allows you to conditionally compile sections of code
through the use of #ifdef, #else, and #endif directives.
Actually, the #else and #endif directives take no arguments. The symbol
following them is entirely a comment, but a necessary one. It serves to
match the #else and #endif directives with the initial #ifdef.
Note: Some strict ANSI compilers don't allow symbols after #else or #endif
directives. In these cases, the comment DOS must be formally written as
/* DOS */.
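A minimal sketch of the pattern, using the formal comment style from the note above. The control symbol DOS appears in the discussion; the function path_separator and what it returns are assumptions made for illustration.

```c
#define DOS     /* Build the DOS version of the program */

/* Returns the character that separates directories in a file name */
const char *path_separator(void)
{
#ifdef DOS
    return "\\";
#else /* DOS */
    return "/";
#endif /* DOS */
}
```

The comments on #else and #endif become more valuable as the conditional section grows and the matching #ifdef scrolls off the screen.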
Where to define the control symbols
The control symbols for conditional compilation can be defined through #define statements in the code or the -D compiler option.
If the compiler option is used, the programmer must know how the program
was compiled in order to understand its function. If the control symbol
is defined in the code, the programmer needs no outside help.
Therefore, avoid the compiler option as much as possible.
Define (or undefine) conditional compilation control symbols in the code
rather than using the -D option to the compiler.
Put the #define statements for control symbols at the very front of the
file. After all, they control how the rest of the program is produced.
Use the #undef statement for symbols that are not defined. This serves
several functions. It tells the programmer that this symbol is used for
conditional compilation. Also, the #undef carries a comment that describes
the symbol. Finally, to put the symbol in, all the programmer needs to do
is change the #undef to #define.
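The front of such a file might look like the sketch below. The control symbols DEBUG_LOG and RANGE_CHECK and the two functions are hypothetical.

```c
/* Conditional compilation control symbols (hypothetical names) */
#define DEBUG_LOG       /* Include extra logging code */
#undef  RANGE_CHECK     /* Check array indexes (currently turned off) */

/* Returns 1 if the logging code was compiled in, 0 otherwise */
int debug_log_enabled(void)
{
#ifdef DEBUG_LOG
    return 1;
#else /* DEBUG_LOG */
    return 0;
#endif /* DEBUG_LOG */
}

/* Returns 1 if the index checks were compiled in, 0 otherwise */
int range_check_enabled(void)
{
#ifdef RANGE_CHECK
    return 1;
#else /* RANGE_CHECK */
    return 0;
#endif /* RANGE_CHECK */
}
```

Turning the range checks on is a one-word change: replace #undef with #define on the second line.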
Commenting out code
Sometimes a programmer wants to get rid of a section of code. This may
be because of an unimplemented feature, or some other reason. One trick
is to comment it out, but this can lead to problems:
Unless your compiler has been extended for nested comments, this code
will not compile. The commented-out section ends at the line /* Add our
new symbols */, not at the bottom of the example.
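An alternative that tolerates comments inside the disabled section is to surround it with a conditional on a symbol that is never defined. In this sketch the symbol name UNDEF, the function, and the disabled statement are assumptions.

```c
/* Returns the number of symbols added to the table */
int symbol_count(void)
{
    int count = 0;

#ifdef UNDEF    /* UNDEF is never defined, so this section is skipped */
    /* Add our new symbols */
    count += 10;
#endif /* UNDEF */

    return count;
}
```

The preprocessor discards everything between the #ifdef and the #endif, so the comment inside the section causes no trouble.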
Note: This will not work if the programmer defines the symbol. (However, any programmer who defines this symbol should be shot.)