Chapter 2 - Utility Routines

UNIX Systems Programming for SVR4
David A. Curry
 Copyright © 1996 O'Reilly & Associates, Inc.

Manipulating Character Classes
Particularly when parsing strings, it is often necessary to test characters for membership in particular sets, or character classes. The functions described in this section are provided for this purpose.
Testing Character Class Membership
The three functions isalpha, isupper, and islower test for three classes of letters:
   #include <ctype.h>
   int isalpha(int c);
   int isupper(int c);
   int islower(int c);
isupper tests for any character that is an uppercase letter and returns non-zero if it is, or zero if it is not. islower tests for any character that is a lowercase letter and returns non-zero if it is, or zero if it is not. isalpha returns non-zero for any character for which either isupper or islower is true, and zero otherwise.
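As a minimal sketch of how these three tests combine, the following tallies the uppercase and lowercase letters in a string. The names letter_counts and count_letters are hypothetical, introduced only for illustration. The cast to unsigned char is an assumption about the caller's data: the ctype functions expect either EOF or a value that fits in an unsigned char, so a plain char that may be negative should be converted first.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type: how many letters of each case were seen. */
struct letter_counts {
    size_t upper;
    size_t lower;
};

/* Hypothetical helper: count the uppercase and lowercase letters in s. */
struct letter_counts
count_letters(const char *s)
{
    struct letter_counts n = { 0, 0 };

    for (; *s != '\0'; s++) {
        /* Convert to unsigned char before calling a ctype function. */
        unsigned char c = (unsigned char) *s;

        if (isupper(c))
            n.upper++;
        else if (islower(c))
            n.lower++;
        /* Anything else fails isalpha in the default "C" locale. */
    }
    return n;
}
```

For the string "Hello, World 42" this should report two uppercase and eight lowercase letters; the comma, spaces, and digits are counted in neither field.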
The two functions isdigit and isxdigit test for two classes of numbers:
   #include <ctype.h>
   int isdigit(int c);
   int isxdigit(int c);
isdigit returns non-zero for any character that is a decimal digit (0 through 9). isxdigit returns non-zero for any character that is a hexadecimal digit (0 through 9, A through F, and a through f).
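One common use of these two tests is converting a digit character to its numeric value. The sketch below is illustrative only: hex_value is a hypothetical name, and the arithmetic on letters assumes a character set such as ASCII in which A through F and a through f are contiguous.

```c
#include <ctype.h>

/*
 * Hypothetical helper: return the value (0 through 15) of the
 * hexadecimal digit c, or -1 if c is not a hexadecimal digit.
 * c must be EOF or a value representable as an unsigned char.
 */
int
hex_value(int c)
{
    if (!isxdigit(c))
        return -1;

    /* Decimal digits are guaranteed to be contiguous in C. */
    if (isdigit(c))
        return c - '0';

    /* Assumption: the letters A-F and a-f are contiguous (true for ASCII). */
    return c - (isupper(c) ? 'A' : 'a') + 10;
}
```

Checking isxdigit first means the function never performs arithmetic on a character outside the expected ranges.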
The isalnum function tests for letters or digits:
   #include <ctype.h>
   int isalnum(int c);
It returns non-zero for any character that satisfies either isalpha or isdigit.
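A typical use of isalnum is validating names. The following sketch checks whether a string looks like a C-style identifier: a letter or underscore followed by letters, digits, or underscores. The function name is_identifier and the identifier rule itself are assumptions chosen for illustration, not something the functions above define.

```c
#include <ctype.h>

/*
 * Hypothetical helper: return 1 if s is a non-empty string made up of
 * a letter or underscore followed by letters, digits, or underscores,
 * and 0 otherwise.
 */
int
is_identifier(const char *s)
{
    /* Walk the string as unsigned char so ctype calls are well defined. */
    const unsigned char *p = (const unsigned char *) s;

    /* The first character may not be a digit; this also rejects "". */
    if (!isalpha(*p) && *p != '_')
        return 0;

    for (p++; *p != '\0'; p++) {
        if (!isalnum(*p) && *p != '_')
            return 0;
    }
    return 1;
}
```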
The functions isspace, ispunct, and iscntrl test for non-alphanumeric characters:
   #include <ctype.h>
   int isspace(int c);
   int ispunct(int c);
   int iscntrl(int c);
isspace returns non-zero for any space, tab, carriage return, newline, vertical tab, or form feed, and zero for anything else. ispunct returns non-zero for any printable character for which neither isspace nor isalnum is true. This generally equates to the set of punctuation and other special symbols. iscntrl tests for any control character, as defined by the character set. For ASCII, these are the characters with decimal values 0 through 31, inclusive, plus 127 (DEL).
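When parsing, isspace is most often used to skip leading whitespace, and ispunct to pick out separators. The two helpers below are a minimal sketch of both uses; skip_space and count_punct are hypothetical names introduced here.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the first character of s
 * that is not whitespace (possibly the terminating null character).
 */
const char *
skip_space(const char *s)
{
    /* isspace is false for '\0', so the loop stops at end of string. */
    while (isspace((unsigned char) *s))
        s++;
    return s;
}

/* Hypothetical helper: count the punctuation characters in s. */
size_t
count_punct(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (ispunct((unsigned char) *s))
            n++;
    }
    return n;
}
```

Note that the classes overlap: a tab, for example, satisfies both isspace and iscntrl.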
The last three functions test for membership in broader character classes:
   #include <ctype.h>
   int isprint(int c);
   int isgraph(int c);
   int isascii(int c);
isprint returns non-zero for any printable character (generally, this means any non-control character) including space. isgraph returns non-zero for any printable character not including space. isascii returns non-zero for any ASCII character; these are the characters with decimal values 0 through 127, inclusive.
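isprint is useful when displaying data that may contain control characters. The sketch below copies a string into a buffer, replacing each non-printable byte with a three-digit octal escape. The name make_printable, the escape format, and the truncation behavior are all assumptions made for this illustration.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy src into dst, writing each non-printable
 * byte as a backslash followed by three octal digits.  At most
 * dstsize bytes are written, including the terminating null; output
 * is truncated at a character boundary if dst is too small.
 * Returns the number of characters stored, not counting the null.
 */
size_t
make_printable(char *dst, size_t dstsize, const char *src)
{
    size_t used = 0;

    if (dstsize == 0)
        return 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char) *src;

        if (isprint(c)) {
            /* Need room for this character plus the null. */
            if (used + 1 >= dstsize)
                break;
            dst[used++] = (char) c;
        } else {
            /* Need room for four escape characters plus the null. */
            if (used + 4 >= dstsize)
                break;
            snprintf(dst + used, dstsize - used, "\\%03o", (unsigned int) c);
            used += 4;
        }
    }
    dst[used] = '\0';
    return used;
}
```

Given the three-character string consisting of a, a tab, and b, the buffer should end up holding the six characters a\011b. Substituting isgraph for isprint would also escape the space character.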
Changing Character Class Membership
Three functions move characters from one character class to another:
   #include <ctype.h>
   int toupper(int c);
   int tolower(int c);
   int toascii(int c);
toupper, when given a lowercase letter as an argument, returns the corresponding uppercase letter. If the argument is not a lowercase letter, it is returned unchanged. tolower, when given an uppercase letter as an argument, returns the corresponding lowercase letter. If the argument is not an uppercase letter, it is returned unchanged. toascii strips the eighth bit off any character it is passed, thus coercing the character into the ASCII character set. Example 2-9 shows a program that uses toupper and tolower to invert the case of all the letters it is given.
Example 2-9:  caseconv
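#include <ctype.h>
#include <stdlib.h>

/*
 * inputChar and outputChar are declared here but not defined in this
 * listing.  The loop below relies on inputChar returning a negative
 * value at end of input.
 */
void    outputChar(char);
int     inputChar(void);

int
main(int argc, char **argv)
{
    int c;

    /* Swap the case of each letter; pass everything else through. */
    while ((c = inputChar()) >= 0) {
        if (isupper(c))
            outputChar(tolower(c));
        else if (islower(c))
            outputChar(toupper(c));
        else
            outputChar(c);
    }

    exit(0);
}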
       % cat input
       One
       Two
       Three
       Four
       Five
       Six
       Seven
       Eight
       Nine
       Ten
       % caseconv < input
       oNE
       tWO
       tHREE
       fOUR
       fIVE
       sIX
       sEVEN
       eIGHT
       nINE
       tEN
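The same case inversion can be applied to a string in memory rather than to an input stream. The following is a sketch of that variant, not part of Example 2-9; invert_case is a hypothetical name.

```c
#include <ctype.h>

/* Hypothetical helper: swap the case of every letter in s, in place. */
void
invert_case(char *s)
{
    for (; *s != '\0'; s++) {
        /* Convert to unsigned char before calling a ctype function. */
        unsigned char c = (unsigned char) *s;

        if (isupper(c))
            *s = (char) tolower(c);
        else if (islower(c))
            *s = (char) toupper(c);
        /* Non-letters are left untouched. */
    }
}
```

Applied to a writable buffer holding "One Two 3", this should leave "oNE tWO 3". The string must be modifiable; passing a string literal directly is undefined behavior.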
Porting Notes
All of the functions described in this section, except for isascii and toascii, are specified by the ANSI C standard. They exist in all versions of UNIX, even those that predate ANSI C.
On newer systems such as SVR4 that understand international character sets, isalpha, isupper, and islower return the proper values even for non-ASCII values such as letters with umlauts and other diacritical marks. isspace and ispunct also work properly for non-ASCII values such as the British pound symbol. On older UNIX systems, these functions work properly only on the ASCII character set.
On older versions of UNIX, toupper and tolower do not check their inputs before attempting to convert them to upper- or lowercase; this is the responsibility of the programmer. The ANSI C standard rectified this by prescribing that toupper and tolower should simply return their inputs if the conversion makes no sense. However, for portability, it is a good idea to always check the input yourself:
   /* to convert to lowercase */
   if (isupper(c))
       c = tolower(c);

   /* to convert to uppercase */
   if (islower(c))
       c = toupper(c);
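If the check is needed in many places, it can be wrapped once. The two functions below are a sketch under that assumption; safe_tolower and safe_toupper are hypothetical names, not library routines.

```c
#include <ctype.h>

/*
 * Hypothetical wrappers: convert only when the conversion makes
 * sense, so the result is the same whether or not the underlying
 * tolower and toupper check their arguments.  c must be EOF or a
 * value representable as an unsigned char.
 */
int
safe_tolower(int c)
{
    return isupper(c) ? tolower(c) : c;
}

int
safe_toupper(int c)
{
    return islower(c) ? toupper(c) : c;
}
```

On a system that conforms to ANSI C the extra test is redundant but harmless.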
On some older versions of UNIX, the isprint function returns false for the space character.
