
Chapter 2 - Utility Routines

UNIX Systems Programming for SVR4
David A. Curry
 Copyright © 1996 O'Reilly & Associates, Inc.

Manipulating Character Strings
Probably the most often used utility routines are those that manipulate character strings. Because the C language does not provide any character string primitive operators, you must perform all operations with library routines.
All of the routines described in this section operate on character strings, which are arrays of one or more non-zero bytes terminated by a null (zero) byte. Passing so-called binary data to these routines, in which null bytes are legal values rather than terminators, does not produce the desired results.
All of the examples in this chapter assume the existence of two functions that are not part of the standard C library:
void outputLine(char *line);
char *inputLine(void);
outputLine prints the contents of the character array line on the standard output (the screen). inputLine reads one line of characters from the standard input (the keyboard) and returns a pointer to a character array containing the line. These two functions exist so that we can do input and output without explaining the use of the UNIX I/O functions, which the following two chapters describe.
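The text does not show how these two functions are implemented. For readers who want to compile the examples now, one possible sketch follows. Everything in it is an assumption rather than the author's code: the use of the standard I/O library, the buffer size LINE_BUF_SIZE, and the helper stripNewline are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define LINE_BUF_SIZE 1024      /* assumed limit; not from the text */

/* Hypothetical helper: cut the string at its first newline, if any. */
void stripNewline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}

/* Print one line, followed by a newline, on the standard output. */
void outputLine(char *line)
{
    puts(line);
}

/*
 * Read one line from the standard input.  Returns a pointer to a
 * static buffer (overwritten by the next call), or NULL at end of file.
 */
char *inputLine(void)
{
    static char buf[LINE_BUF_SIZE];

    if (fgets(buf, sizeof(buf), stdin) == NULL)
        return NULL;
    stripNewline(buf);
    return buf;
}
```

Because this version of inputLine reuses a single static buffer, a caller that needs to keep a line must copy it before reading the next one.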
Computing the Length of a String
The simplest function for computing the length of a string is strlen:
   #include <string.h>
   size_t strlen(const char *s);
The single argument s is the null-terminated string whose length is to be computed; the length of the string in bytes, not including the null character, is returned.
Two other functions, strspn and strcspn, are provided to compute the length of substrings:
   #include <string.h>
   size_t strspn(const char *s1, const char *s2);
   size_t strcspn(const char *s1, const char *s2);
strspn returns the length of the initial segment of s1 that consists entirely of characters from the set contained in s2. In some sense, strcspn does the opposite, returning the length of the initial segment of s1 that consists entirely of characters not in the set contained in s2.
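As a small illustration of the two functions, the sketch below measures a leading run of digits with strspn and the length of a colon-delimited field with strcspn. The function names leadingDigits and fieldLength are made up for this example.

```c
#include <string.h>

/* Length of the run of decimal digits at the start of s. */
size_t leadingDigits(const char *s)
{
    return strspn(s, "0123456789");
}

/* Length of the part of s before the first ':' (all of s if none). */
size_t fieldLength(const char *s)
{
    return strcspn(s, ":");
}
```

With these definitions, leadingDigits("1996abc") is 4 and fieldLength("root:x:0") is 4. When s contains no colon, fieldLength returns the same value as strlen.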
To demonstrate the use of strlen, Example 2-1 shows a program that implements a bubble sort. Bubble sort is a simple (but not very efficient) sorting algorithm that works by making several passes through the objects to be sorted, comparing items in adjacent locations and interchanging them if they are out of order. If no items are interchanged on any pass through the data, the data is completely sorted and the algorithm can stop.
Example 2-1:  bsort-length
#include <stdlib.h>
#include <string.h>
#define NSTRINGS    16          /* max. number of strings       */
#define MAXLENGTH   1024        /* max. length of one string    */
void    bubbleSort(char **, int);
void    outputLine(char *);
char    *inputLine(void);
int
main(int argc, char **argv)
{
    int n, nstrings;
    char *p, *q, *line;
    char *strptrs[NSTRINGS];
    char strings[NSTRINGS][MAXLENGTH];
    /*
     * Read in NSTRINGS strings from the standard input.
     */
    for (nstrings = 0; nstrings < NSTRINGS; nstrings++) {
        /*
         * Get a line from the input.
         */
        if ((line = inputLine()) == NULL)
            break;
        /*
         * Copy the line, truncating it if it does not fit.
         */
        for (p = line, q = strings[nstrings];
             *p != '\0' && q < strings[nstrings] + MAXLENGTH - 1; p++, q++)
            *q = *p;
        *q = '\0';
        /*
         * Save a pointer to the line.
         */
        strptrs[nstrings] = strings[nstrings];
    }
    /*
     * Sort the strings.
     */
    bubbleSort(strptrs, nstrings);
    /*
     * Print the strings.
     */
    for (n = 0; n < nstrings; n++)
        outputLine(strptrs[n]);
    exit(0);
}
/*
 * bubbleSort - implementation of the basic bubble sort algorithm.
 */
void
bubbleSort(char **strings, int nstrings)
{
    int i, j;
    char *tmp;
    int notdone;
    j = nstrings;
    notdone = 1;
    while (notdone) {
        notdone = 0;
        j = j - 1;
        for (i = 0; i < j; i++) {
            /*
             * Use strlen() to compare the strings
             * by length.
             */
            if (strlen(strings[i]) > strlen(strings[i+1])) {
                tmp = strings[i+1];
                strings[i+1] = strings[i];
                strings[i] = tmp;
                notdone = 1;
            }
        }
    }
}
    % cat input
    xxxxxx
    xxxxx
    xxxxxxx
    xx
    x
    xxxxxxxxx
    xxxx
    xxxxxxxx
    xxx
    xxxxxxxxxx
    % bsort-length < input
    x
    xx
    xxx
    xxxx
    xxxxx
    xxxxxx
    xxxxxxx
    xxxxxxxx
    xxxxxxxxx
    xxxxxxxxxx
bsort-length begins by using inputLine to read in up to NSTRINGS lines of data and storing them in the strings array. The strptrs array points to the strings, so that by rearranging the pointers, you can achieve the sort. After reading in the strings, the bubbleSort function is called. bubbleSort makes several passes through the strings, comparing the lengths of adjacent strings with strlen. When the first string is longer than the second, the pointers to those two strings are exchanged. Finally, when the sort finishes, the strings are printed with outputLine.
Comparing Character Strings
To compare two character strings, use the strcmp and strncmp functions:
   #include <string.h>
   int strcmp(const char *s1, const char *s2);
   int strncmp(const char *s1, const char *s2, size_t n);
strcmp compares s1 and s2 and returns an integer less than, equal to, or greater than zero, based on whether s1 is lexicographically less than, equal to, or greater than s2. strncmp makes the same comparison, but looks at only the first n characters of each string. Characters following the null terminator of either string are not compared.
On systems that use the ASCII character set, “lexicographically less than” and “lexicographically greater than” correspond to “alphabetically before” and “alphabetically after,” provided the strings being compared use the same case. In ASCII, every uppercase letter sorts before every lowercase letter. On systems that use character sets that do not preserve alphabetical order (such as EBCDIC), the relationship does not hold at all.
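Because only the sign of the return value is specified, code should test it against zero rather than against particular values such as 1 or -1. The following sketch shows two common idioms; the names compareSign and hasPrefix are hypothetical.

```c
#include <string.h>

/* Reduce the result of strcmp to exactly -1, 0, or 1. */
int compareSign(const char *s1, const char *s2)
{
    int r = strcmp(s1, s2);

    return (r > 0) - (r < 0);
}

/* Return 1 if s begins with prefix, 0 otherwise. */
int hasPrefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

On an ASCII system, compareSign("Zebra", "apple") is -1, because uppercase letters have smaller codes than lowercase ones.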
Example 2-2 shows another version of the bubble sort program; this one sorts the strings into alphabetical order.
Example 2-2:  bsort-alpha
#include <stdlib.h>
#include <string.h>
#define NSTRINGS    16          /* max. number of strings       */
#define MAXLENGTH   1024        /* max. length of one string    */
void    bubbleSort(char **, int);
void    outputLine(char *);
char    *inputLine(void);
int
main(int argc, char **argv)
{
    int n, nstrings;
    char *p, *q, *line;
    char *strptrs[NSTRINGS];
    char strings[NSTRINGS][MAXLENGTH];
    /*
     * Read in NSTRINGS strings from the standard input.
     */
    for (nstrings = 0; nstrings < NSTRINGS; nstrings++) {
        /*
         * Get a line from the input.
         */
        if ((line = inputLine()) == NULL)
            break;
        /*
         * Copy the line, truncating it if it does not fit.
         */
        for (p = line, q = strings[nstrings];
             *p != '\0' && q < strings[nstrings] + MAXLENGTH - 1; p++, q++)
            *q = *p;
        *q = '\0';
        /*
         * Save a pointer to the line.
         */
        strptrs[nstrings] = strings[nstrings];
    }
    /*
     * Sort the strings.
     */
    bubbleSort(strptrs, nstrings);
    /*
     * Print the strings.
     */
    for (n = 0; n < nstrings; n++)
        outputLine(strptrs[n]);
    exit(0);
}
/*
 * bubbleSort - implementation of the basic bubble sort algorithm.
 */
void
bubbleSort(char **strings, int nstrings)
{
    int i, j;
    char *tmp;
    int notdone;
    j = nstrings;
    notdone = 1;
    while (notdone) {
        notdone = 0;
        j = j - 1;
        for (i = 0; i < j; i++) {
            /*
             * Use strcmp() to compare the strings
             * alphabetically.
             */
            if (strcmp(strings[i], strings[i+1]) > 0) {
                tmp = strings[i+1];
                strings[i+1] = strings[i];
                strings[i] = tmp;
                notdone = 1;
            }
        }
    }
}
       % cat input
       one
       two
       three
       four
       five
       six
       seven
       eight
       nine
       ten
       % bsort-alpha < input
       eight
       five
       four
       nine
       one
       seven
       six
       ten
       three
       two
This program is identical to bsort-length, except that the strlen comparison is replaced with a call to strcmp.
Solaris 2.x, HP-UX 10.x, and IRIX 5.x provide two additional functions for comparing strings, strcasecmp and strncasecmp:
   #include <string.h>
   int strcasecmp(const char *s1, const char *s2);
   int strncasecmp(const char *s1, const char *s2, int n);
These functions are similar to strcmp and strncmp, except that they ignore the case of letters in the strings. Unfortunately, these two functions are not very portable—systems that use the Domain Name System (DNS) probably have them, because they are used for comparing host names (in which case is not significant), but systems that do not use the DNS probably do not.
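Where these functions are missing, a replacement can be written with tolower from <ctype.h>. The version below is a sketch, not the code any vendor ships; the name caseCompare is hypothetical, and it handles only the single-byte characters that tolower understands in the current locale.

```c
#include <ctype.h>

/*
 * Compare s1 and s2, ignoring the case of letters.  Returns a value
 * less than, equal to, or greater than zero, in the manner of strcmp.
 */
int caseCompare(const char *s1, const char *s2)
{
    unsigned char c1, c2;

    do {
        /* Convert through unsigned char so tolower gets a valid value. */
        c1 = (unsigned char) tolower((unsigned char) *s1++);
        c2 = (unsigned char) tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != '\0');

    return (int) c1 - (int) c2;
}
```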
Copying Character Strings
To copy one character string to another, the strcpy and strncpy functions are used:
   #include <string.h>
   char *strcpy(char *dst, const char *src);
   char *strncpy(char *dst, const char *src, size_t n);
In both cases, the string pointed to by src is copied into the array pointed to by dst, and dst is returned. The first function, strcpy, copies characters until it encounters the null byte terminating src. The second function, strncpy, copies characters until it either encounters the null byte in src or until n characters are copied, whichever comes first.
The string returned by strcpy is always null terminated. However, the string returned by strncpy may not be. If the number of characters in src is less than n, the rest of dst is filled with null bytes until a total of n bytes have been written. However, if src contains n or more characters, dst is not null terminated. For this reason, it is customary to always explicitly place a null byte at the end of dst immediately following a call to strncpy, as shown below:
   char dst[SIZE];
   strncpy(dst, src, SIZE-1);
   dst[SIZE-1] = '\0';
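This idiom can be wrapped in a small function that also reports whether the source had to be cut short. This is only a sketch; the name copyString and its return convention are assumptions made for the example.

```c
#include <string.h>

/*
 * Copy src into dst, which holds dstsize bytes, and always leave dst
 * null terminated.  Returns 1 if src was truncated, 0 if it fit.
 */
int copyString(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 1;               /* no room even for the null byte */

    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';

    return strlen(src) >= dstsize;
}
```

Given char buf[4], the call copyString(buf, sizeof(buf), "abcdef") leaves "abc" in buf and returns 1.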
To append one string to another, the strcat and strncat functions are used:
   #include <string.h>
   char *strcat(char *dst, const char *src);
   char *strncat(char *dst, const char *src, size_t n);
Both of these functions traverse dst until a null byte is found, copy src onto the end, and then return dst. strcat copies characters until it encounters a null byte in src, while strncat copies characters until it either encounters a null byte in src or until n characters have been copied, whichever comes first. Both strcat and strncat always null-terminate dst.
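Note that the n given to strncat limits the number of characters taken from src, not the total size of dst, and that the terminating null byte is written in addition to those n characters. A bounded append therefore has to compute the remaining room itself. The sketch below assumes dst already holds a null-terminated string shorter than dstsize; the name appendString is hypothetical.

```c
#include <string.h>

/*
 * Append src to the string in dst without writing more than dstsize
 * bytes in total.  Returns 1 if all of src fit, 0 if it was cut short.
 */
int appendString(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    size_t room = dstsize - used - 1;   /* space left, excluding the null */

    strncat(dst, src, room);

    return strlen(src) <= room;
}
```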
Example 2-3 shows a program that uses strcpy and strcat to make lists of strings.
Example 2-3:  make-a-list
#include <stdlib.h>
#include <string.h>
void    outputLine(char *);
char    *inputLine(void);
int
main(int argc, char **argv)
{
    int len;
    char *line;
    char list[1024];
    len = sizeof(list) - 2;
    list[0] = '\0';
    /*
     * For each line in the input...
     */
    while ((line = inputLine()) != NULL) {
        /*
         * Compute its length, plus room for a comma and a space.
         */
        len += strlen(line) + 2;
        /*
         * If we don't have room in the buffer, output
         * the buffer and start a new one.  Otherwise,
         * add a comma and this line.
         */
        if (len >= sizeof(list)) {
            if (list[0] != '\0')
                outputLine(list);
            strcpy(list, line);
            len = strlen(line);
        }
        else {
            strcat(list, ", ");
            strcat(list, line);
        }
    }
    /*
     * Output the last part of the list.
     */
    if (list[0] != '\0')
        outputLine(list);
    exit(0);
}
       % cat input
       one
       two
       three
       four
       five
       six
       seven
       eight
       nine
       ten
       % make-a-list < input
       one, two, three, four, five, six, seven, eight, nine, ten
The program reads lines until it encounters the end-of-file marker. It computes the length of each line using strlen, and then determines whether the current input fits into the array holding the current list or not. If not, it outputs the current list, and then uses strcpy to begin a new list. If the line will fit in the current list, strcat is used to append a comma and a space to the list, and then to append the current line as well.
All four of the functions described in this section assume that dst is large enough to hold the results of their work; no bounds checking is performed. If dst is not large enough, the functions write past the end of the array. The result is undefined: adjacent data may be silently corrupted, or a memory access violation may occur, resulting in abnormal program termination and a core dump.
Searching Character Strings
A number of routines are provided to search a character string for either a single character or a substring. The two simplest functions are strchr and strrchr:
   #include <string.h>
   char *strchr(const char *s, int c);
   char *strrchr(const char *s, int c);
Both functions traverse the string s and return a pointer to an occurrence of the character c, or the predefined constant NULL if the character is not found. strchr starts at the beginning of the string and searches toward the end, so it returns the first occurrence; strrchr starts at the end of the string and searches toward the beginning, so it returns the last occurrence. Example 2-4 shows a program that reads lines from its standard input and searches each line for the character given as the program's first argument.
Example 2-4:  search-char
#include <stdlib.h>
#include <string.h>
void    markLine(char *, char *, char *);
void    outputLine(char *);
char    *inputLine(void);
int
main(int argc, char **argv)
{
    char c;
    char *p, *line;
    if (argc != 2) {
        outputLine("Usage: search-char character");
        exit(1);
    }
    c = argv[1][0];
    while ((line = inputLine()) != NULL) {
        if ((p = strchr(line, c)) != NULL) {
            outputLine(line);
            markLine(line, p, p);
            outputLine(line);
        }
    }
    exit(0);
}
       % cat input
       one
       two
       three
       four
       five
       six
       seven
       eight
       nine
       ten
       % search-char e < input
       one
         ^
       three
          ^
       five
          ^
       seven
        ^  
       eight
       ^   
       nine
          ^
       ten
        ^
In the example shown, the program searches for the letter e on each line. When it finds one, the program prints the line, and then uses the markLine function to mark the position in which the letter was found. The markLine function is defined as follows:
#include <stdio.h>
void
markLine(char *line, char *start, char *stop)
{
    char *p;
    for (p = line; p < start; p++)
        *p = ' ';
    for (p = start; p <= stop; p++)
        *p = '^';
    for (p = stop+1; *p != '\0'; p++)
        *p = ' ';
}
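search-char uses only strchr. To illustrate how strrchr differs, the following sketch applies both functions to a path name: strrchr finds the last slash, which isolates the final component, while strchr finds the first. The function names are invented for this example.

```c
#include <string.h>

/* Return the part of path after its last '/', or all of path if none. */
const char *lastComponent(const char *path)
{
    const char *slash = strrchr(path, '/');

    return (slash != NULL) ? slash + 1 : path;
}

/* Return the part of path after its first '/', or all of path if none. */
const char *afterFirstSlash(const char *path)
{
    const char *slash = strchr(path, '/');

    return (slash != NULL) ? slash + 1 : path;
}
```

For the path "/usr/include/string.h", lastComponent returns a pointer to "string.h", and afterFirstSlash returns a pointer to "usr/include/string.h".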
If instead of a single character you need to search a string for the first occurrence of any of several characters, you can use strpbrk:
   #include <string.h>
   char *strpbrk(const char *s1, const char *s2);
strpbrk searches the string s1, starting at the beginning, for the first occurrence of any character in the string s2. It returns a pointer to the character, or the predefined constant NULL if none of the characters are found. Example 2-5 shows another version of our searching program using strpbrk.
Example 2-5:  search-charset
#include <stdlib.h>
#include <string.h>
void    markLine(char *, char *, char *);
void    outputLine(char *);
char    *inputLine(void);
int
main(int argc, char **argv)
{
    char *p, *line, *charset;
    if (argc != 2) {
        outputLine("Usage: search-charset character-set");
        exit(1);
    }
    charset = argv[1];
    while ((line = inputLine()) != NULL) {
        if ((p = strpbrk(line, charset)) != NULL) {
            outputLine(line);
            markLine(line, p, p);
            outputLine(line);
        }
    }
    exit(0);
}
       % cat input
       one
       two
       three
       four
       five
       six
       seven
       eight
       nine
       ten
       % search-charset onx < input
       one
       ^ 
       two
         ^
       four
        ^ 
       six
         ^
       seven
           ^
       nine
       ^  
       ten
         ^
To locate the first occurrence of a substring instead of a single character, use the strstr function:
   #include <string.h>
   char *strstr(const char *s1, const char *s2);
strstr traverses the string s1 from the beginning, and returns a pointer to the start of the first occurrence of the substring s2, or the predefined constant NULL if no substring is found. Example 2-6 shows a third version of our searching program; this one uses strstr to find the substring given as the program's first argument.
Example 2-6:  search-string
#include <stdlib.h>
#include <string.h>
void    markLine(char *, char *, char *);
void    outputLine(char *);
char    *inputLine(void);
int
main(int argc, char **argv)
{
    char *p, *line, *string;
    if (argc != 2) {
        outputLine("Usage: search-string string");
        exit(1);
    }
    string = argv[1];
    while ((line = inputLine()) != NULL) {
        if ((p = strstr(line, string)) != NULL) {
            outputLine(line);
            markLine(line, p, p + strlen(string) - 1);
            outputLine(line);
        }
    }
    exit(0);
}
       % cat input
       john smith
       sally jones
       bob johnson
       bill davis
       mary upjohn
       % search-string john < input
       john smith
       ^^^^     
       bob johnson
           ^^^^  
       mary upjohn
              ^^^^
This example also shows another use of the strlen function, to compute the end of the matched sequence as an argument to the markLine function.
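strstr reports only the first match, but calling it repeatedly, each time starting just past the previous match, finds the rest. The sketch below counts non-overlapping occurrences this way. The name countOccurrences is hypothetical, and treating an empty substring as matching nothing is a choice made for this example.

```c
#include <string.h>

/* Count the non-overlapping occurrences of sub in s. */
int countOccurrences(const char *s, const char *sub)
{
    int count = 0;
    size_t sublen = strlen(sub);
    const char *p = s;

    if (sublen == 0)
        return 0;       /* avoid looping forever on an empty substring */

    while ((p = strstr(p, sub)) != NULL) {
        count++;
        p += sublen;    /* resume the search just past this match */
    }

    return count;
}
```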
Our last string-searching function is really intended for breaking a string into tokens, each separated from the others by one or more field-separator characters such as spaces, tabs, colons, or periods. The function is called strtok:
   #include <string.h>
   char *strtok(char *s1, const char *s2);
The string s1 is considered to be a sequence of zero or more text tokens separated by spans of one or more characters from the set contained in s2. The first call to strtok places a null character into s1 immediately following the first token, and returns a pointer to the token.
strtok keeps track of its position in s1, and subsequent calls, made with the predefined constant NULL as the first argument (to tell strtok to continue using the same input string), work through s1, extracting each token in turn. When no more tokens remain, strtok returns NULL. Example 2-7 gives a sample usage of strtok.
Example 2-7:  search-token
#include <stdlib.h>
#include <string.h>
void    markLine(char *, char *, char *);
void    outputLine(char *);
char    *inputLine(void);
int
main(int argc, char **argv)
{
    char copyline[1024];
    char *p, *line, *token, *fieldsep;
    if (argc != 3) {
        outputLine("Usage: search-token token fieldsep");
        exit(1);
    }
    token = argv[1];
    fieldsep = argv[2];
    /*
     * For each line in the input...
     */
    while ((line = inputLine()) != NULL) {
        /*
         * Save a copy of the line.
         */
        strcpy(copyline, line);
        /*
         * Find the first token.
         */
        if ((p = strtok(line, fieldsep)) == NULL)
            continue;
        /*
         * Search through all the tokens.
         */
        do {
            if (strcmp(p, token) == 0) {
                outputLine(copyline);
                markLine(copyline, copyline + (p - line),
                         copyline + (p - line) + strlen(token) - 1);
                outputLine(copyline);
                p = NULL;
            }
            else {
                p = strtok(NULL, fieldsep);
            }
        } while (p != NULL);
    }
    exit(0);
}
       % cat input
       one,two:three,four:five,six
       ten:eight:six:four:two
       two,four:six,eight,ten
       one,two,three,four:five
       % search-token two , < input
       two,four:six,eight,ten
       ^^^                  
       one,two,three,four:five
           ^^^               
       % search-token two : < input
       ten:eight:six:four:two
                          ^^^
       % search-token two ,: < input
       one,two:three,four:five,six
           ^^^                   
       ten:eight:six:four:two
                          ^^^
       two,four:six,eight,ten
       ^^^                  
       one,two,three,four:five
           ^^^               
This example shows the different results obtained on the same input file when you use different field separator characters. Note that when you use both characters together, search-token makes another match that was not possible when using each character individually. Although not shown in this example, you can change the contents of the s2 string in between calls to strtok; for example, to extract a specific field from a line and then extract a subfield from the field.
This example also shows the use of the strcpy function discussed earlier. Because strtok destroys the string contained in s1 (by placing nulls into it), search-token makes a copy of the string before searching it, so that it can print it out later. search-token also uses the strcmp function to match the tokens, and the strlen function to tell markLine how to highlight the match.
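The technique of changing s2 between calls can be sketched as follows. The function skips the first colon-separated field of a line, then continues with a different separator set to pick out the first comma-separated item of the second field. The function name and the line layout are assumptions made for this example.

```c
#include <string.h>

/*
 * For a line such as "login:Full Name,Office:group", return the first
 * comma-separated item of the second colon-separated field, or NULL if
 * there is no second field.  The line is modified in place.
 */
char *firstItemOfSecondField(char *line)
{
    /* First call: skip over the first field, delimited by ':'. */
    if (strtok(line, ":") == NULL)
        return NULL;

    /* Continue in the same string, but stop at ',' as well as ':'. */
    return strtok(NULL, ",:");
}
```

Because strtok treats a run of separators as a single separator, empty fields are skipped rather than returned as empty tokens.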
Non-Standard Character String Functions
All of the functions described up until this point (except strcasecmp and strncasecmp) are specified in the ANSI C standard, and should be present on most modern UNIX systems. However, SVR4 provides a number of additional functions for manipulating character strings that are not part of the ANSI C or POSIX standards. Do not use these functions if portability is an issue, but they may be useful to you otherwise.
You can include all of the functions described in this section in your program by linking with the -lgen library on Solaris 2.x and IRIX 5.x. Hewlett-Packard elected not to include most of these functions in their version of the system.
Searching character strings
The strfind function is similar to strstr, described earlier:
     #include <libgen.h>
     int strfind(const char *s1, const char *s2);
As with strstr, strfind searches the string s1 for the first occurrence of the string s2. However, instead of returning a pointer to the substring, strfind returns the integer offset of the beginning of the substring from the beginning of s1. If it cannot find the substring, strfind returns –1.
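On systems without -lgen, the same result can be obtained from strstr. The function below is a sketch of that idea, not the library's implementation. The name findOffset is hypothetical, and its behavior for an empty s2 (it returns 0, following strstr) is an assumption that may not match strfind.

```c
#include <string.h>

/*
 * Return the offset of the first occurrence of s2 within s1,
 * or -1 if s2 does not occur in s1.
 */
int findOffset(const char *s1, const char *s2)
{
    const char *p = strstr(s1, s2);

    return (p != NULL) ? (int) (p - s1) : -1;
}
```

For example, findOffset("mary upjohn", "john") is 7, and findOffset("bill davis", "john") is -1.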