Add Book to My BookshelfPurchase This Book Online

Chapter 16 - Miscellaneous Routines

UNIX Systems Programming for SVR4
David A. Curry
 Copyright © 1996 O'Reilly & Associates, Inc.

Searching
SVR4 provides a number of useful routines for performing standard types of searches in memory, including linear search, binary search, and hash tables. These tasks are performed frequently, and a set of library routines that provide good algorithmic implementations of them is a valuable addition to the UNIX programming library. Unfortunately, most other implementations do not provide these functions.
Linear Search
A linear search is the most inefficient of searches, but it is useful for small lists. The search begins at the front of the list, and compares each item in turn until the desired item is found. On average, n/2 comparisons are performed in each search, where n is the size of the list.
The linear search algorithm is implemented by the lsearch and lfind functions:
    #include <search.h>
    void *lsearch(const void *key, void *base, size_t *nelp,
            size_t width, int (*compar)(const void *, const void *));
    void *lfind(const void *key, const void *base, size_t *nelp,
            size_t width, int (*compar)(const void *, const void *));
These functions implement Algorithm S from Donald Knuth's The Art of Computer Programming, Volume 3, Section 6.1, published by Addison-Wesley.
In both cases, key is the datum to be found in the table, base points to the first element in the table, nelp points to an integer containing the number of elements currently in the table, and width is the size of a table element in bytes. The compar parameter is a pointer to a function (e.g., strcmp) used to compare two elements of the table. The function must return 0 if the elements are equal, and non-zero otherwise.
The lsearch function searches for the key in the table, and returns a pointer to it. If the key is not found, it is added to the end of the table, nelp is incremented, and a pointer to the new entry is returned.
The lfind function searches for the key in the table, and returns a pointer to it. If the key is not found, a null pointer is returned instead.
The pointers to the key and the element at the base of the table may be of any type. The comparison function need not compare every byte of its arguments; this allows arbitrary data types (strings, integers, structures) to be searched. A side effect of using lsearch to create the table is to remove duplicates from a list, since it only adds an element to the list if it is not already present.
Example 16-2 shows a small program that demonstrates the use of lsearch and lfind. The program prompts the user for several strings, and adds them to a table. Since it uses lsearch to add them to the table, duplicates won't be added. The program then prints the resulting table, and lets the user search for strings. The searches are done with lfind, so that strings not in the table do not get added.
Example 16-2:  lsearch
#include <search.h>
#include <string.h>
#include <stdio.h>
#define TABLESIZE   10      /* max. size of the table       */
#define ELEMENTSIZE 16      /* max. size of a table element */
int compare(const void *, const void *);
int
main(void)
{
    int i;
    char *p;
    size_t nel;
    char line[ELEMENTSIZE];
    char table[TABLESIZE][ELEMENTSIZE];
    /*
     * Tell the user what to do.
     */
    printf("Enter %d strings, not all unique.\n\n", TABLESIZE);
    /*
     * Read in some strings.
     */
    nel = 0;
    for (i = 0; i < TABLESIZE; i++) {
        /*
         * Prompt for each string.
         */
        printf("%2d> ", i + 1);
        /*
         * Read the string.
         */
        if (fgets(line, sizeof(line), stdin) == NULL)
            exit(0);
        /*
         * Strip the newline.
         */
        line[strlen(line) - 1] = '\0';
        /*
         * Search for the string.  If it's not in the table,
         * lsearch will add it for us.
         */
        (void) lsearch(line, table, &nel, ELEMENTSIZE, compare);
    }
    /*
     * Print the contents of the table.
     */
    printf("\nContents of the table:\n");
    for (i = 0; i < nel; i++)
        printf("\t%s\n", table[i]);
    /*
     * Let the user search for things.
     */
    for (;;) {
        /*
         * Prompt for a search string.
         */
        printf("\nSearch for: ");
        /*
         * Read the search string.
         */
        if (fgets(line, sizeof(line), stdin) == NULL) {
            putchar('\n');
            exit(0);
        }
        /*
         * Strip the newline.
         */
        line[strlen(line) - 1] = '\0';
        /*
         * Search for the string.  lfind will return null
         * if it's not there.
         */
        p = (char *) lfind(line, table, &nel, ELEMENTSIZE, compare);
        /*
         * Print the search results.
         */
        if (p == NULL) {
            printf("String not found.\n");
        }
        else {
            printf("Found at location %d.\n",
                   ((int) p - (int) table) / ELEMENTSIZE + 1);
        }
    }
}
/*
* compare - compare two strings, return 0 if equal, non-zero if not.
*/
int
compare(const void *a, const void *b)
{
    return(strcmp((char *) a, (char *) b));
}
    % lsearch
    Enter 10 strings, not all unique.
   
     1> abcdef
     2> ghijkl
     3> mnopqr
     4> stuvwx
     5> yz
     6> abcdef
     7> ghijkl
     8> mnopqr
     9> stuvwx
    10> yz
   
    Contents of the table:
            abcdef
            ghijkl
            mnopqr
            stuvwx
            yz
   
    Search for: abc
    String not found.
   
    Search for: abcdef
    Found at location 1.
   
    Search for: ghijkl
    Found at location 2.
   
    Search for: mn
    String not found.
   
    Search for: yz
    Found at location 5.
   
    Search for: ^D
Binary Search
The binary search is one of the most efficient methods for searching large tables. Given a table of n entries, a binary search compares the item to be found against item n/2 in the table. If the item to be found is “less” than the item in the middle of the table, it then looks at the item halfway between the start of the table and the middle of the table. If the item to be found is “more” than the item in the middle of the table, it then looks at the item halfway between the middle of the table and the end of the table. This process continues, dividing the search space in half each time, until the item is found or not. In order for a binary search to work, the table must be sorted into increasing order. On average, log2 n comparisons are performed to find any item in the table. Even for large tables, this is very efficient—a table of one million entries only requires 20 comparisons to find any item in the table.
The binary search algorithm is implemented by the bsearch function:
    #include <stdlib.h>
    void *bsearch(const void *key, const void *base, size_t nel,
            size_t size, int (*compar)(const void *, const void *));
This function implements Algorithm B from Volume 3, Section 6.2.1 of The Art of Computer Programming.
The key parameter is the item to be found; base points to the beginning of the table to search. The table must be sorted into increasing order. The nel parameter gives the number of elements in the table, each of which is size bytes in size. The compar parameter must point to a function that compares two table entries and returns less than, equal to, or greater than 0 depending on whether the first item is to be considered less than, equal to, or greater than the second item. If the item is found, bsearch returns a pointer to it; if the item is not in the table, NULL is returned.
Example 16-3 shows a program that reads in the system spelling dictionary, /usr/dict/words, and then searches it. The file is already sorted, but the sort is case-independent. For this reason, we use strcasecmp in our comparison function.
Example 16-3:  bsearch
#include <search.h>
#include <string.h>
#include <stdio.h>
#define TABLESIZE   32768       /* max. size of the table       */
#define ELEMENTSIZE 32          /* max. size of a table element */
int compare(const void *, const void *);
int
main(void)
{
    char *p;
    FILE *fp;
    size_t nel;
    char line[ELEMENTSIZE];
    char table[TABLESIZE][ELEMENTSIZE];
    /*
     * Open the file.
     */
    if ((fp = fopen("/usr/dict/words", "r")) == NULL) {
        perror("/usr/dict/words");
        exit(1);
    }
    printf("Reading the table... ");
    fflush(stdout);
    /*
     * Read in the file.
     */
    for (nel = 0; nel < TABLESIZE; nel++) {
        /*
         * Read a line.
         */
        if (fgets(table[nel], ELEMENTSIZE, fp) == NULL)
            break;
        /*
         * Strip the newline.
         */
        table[nel][strlen(table[nel]) - 1] = '\0';
    }
   
    printf("done.\n");
    fclose(fp);
   
    /*
     * Let the user search for things.
     */
    for (;;) {
        /*
         * Prompt for a search string.
         */
        printf("\nSearch for: ");
        /*
         * Read the search string.
         */
        if (fgets(line, sizeof(line), stdin) == NULL) {
            putchar('\n');
            exit(0);
        }
        /*
         * Strip the newline.
         */
        line[strlen(line) - 1] = '\0';
        /*
         * Do a binary search for the string.
         */
        p = (char *) bsearch(line, table, nel, ELEMENTSIZE, compare);
        /*
         * Print the search results.
         */
        if (p == NULL) {
            printf("String not found.\n");
        }
        else {
            printf("Found at location %d.\n",
                   ((int) p - (int) table) / ELEMENTSIZE);
        }
    }
}
/*
* compare - compare two strings, return 0 if equal, non-zero if not.
*/
int
compare(const void *a, const void *b)
{
    return(strcasecmp((char *) a, (char *) b));
}
    % bsearch
    Reading the table... done.
   
    Search for: mambo
    Found at location 14113.
   
    Search for: zip
    Found at location 25121.
   
    Search for: alpha
    Found at location 722.
   
    Search for: xyzzy
    String not found.
   
    Search for: ^D
Hash Tables
Hash tables are frequently used to manage symbol tables in compilers and other similar programs. They store items in a series of buckets (for example, one bucket for each letter of the alphabet) where they can be found with a minimum of searching. The advantage to using a hash table as opposed to a linear or binary search is that items can be inserted into the table in any order (unlike binary search), yet they can be found quickly (unlike linear search). The disadvantage is that without a good estimate of how large your table needs to be, hashing can be very inefficient.
Hash tables are implemented with the hsearch, hcreate, and hdestroy functions:
    #include <search.h>
    typedef struct {
        char    *key;
        char    *data;
    } ENTRY;
    typedef enum { FIND, ENTER } ACTION;
    ENTRY *hsearch(ENTRY item, ACTION action);
    int hcreate(size_t nel);
    void hdestroy(void);
These functions implement Algorithm D from Volume 3, Section 6.4 of The Art of Computer Programming.
A hash table is created with the hcreate function; the nel parameter is an estimate of the maximum number of entries the table will contain. A hash table is destroyed with the hdestroy function. Only one hash table may be in use at a time.
The hsearch function searches for item in the hash table, using strcmp to compare the item.key fields. The item.data field points to arbitrary data associated with the key. If action is FIND, hsearch returns a pointer to the item, or NULL if it is not in the table. If action is ENTER, hsearch searches for the item, and if it is found, returns a pointer to the item already in the table. If it is not found, the item is added to the table, and a pointer to its location returned. The hsearch function uses malloc to allocate space for the table entries.
Example 16-4 is a sample program that uses hsearch to manage a list of people and some personal data about them. The program prompts for some input data, stores that in the hash table, and then lets the user search the table.
Example 16-4:  hsearch
#include <search.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
struct data {
    int age;
    int height;
    int weight;
};
int
main(void)
{
    char *p;
    ENTRY item;
    ENTRY *result;
    struct data *d;
    char buf[BUFSIZ];
   
    /*
     * Create the hash table.
     */
    hcreate(100);
    printf("Enter Name/age/height/weight; terminate with blank line.\n\n");
    /*
     * Read information until a blank line.
     */
    while (fgets(buf, sizeof(buf), stdin) != NULL) {
        /*
         * Blank line, all done.
         */
        if (*buf == '\n')
            break;
       
        /*
         * Allocate a data structure (we should check for
         * errors here).
         */
        d = (struct data *) malloc(sizeof(struct data));
        item.data = (char *) d;
        /*
         * Split up the data (we should check for errors
         * here).
         */
        p = strtok(buf, "/");
        item.key = strdup(p);
        p = strtok(NULL, "/");
        d->age = atoi(p);
        p = strtok(NULL, "/");
        d->height = atoi(p);
        p = strtok(NULL, "/");
        d->weight = atoi(p);
        /*
         * Add the item to the table.
         */
        (void) hsearch(item, ENTER);
    }
    /*
     * Let the user search for things.
     */
    for (;;) {
        /*
         * Prompt for a search string.
         */
        printf("\nSearch for: ");
        /*
         * Read the search string.
         */
        if (fgets(buf, sizeof(buf), stdin) == NULL) {
            putchar('\n');
            hdestroy();
            exit(0);
        }
        /*
         * Strip the newline.
         */
        buf[strlen(buf) - 1] = '\0';
        /*
         * Look in the table for the item.
         */
        item.key = buf;
        result = hsearch(item, FIND);
        /*
         * Print the search results.
         */
        if (result == NULL) {
            printf("Entry not found.\n");
        }
        else {
            d = (struct data *) result->data;
            printf("Name: %s\nAge: %d\nHeight: %d\nWeight: %d\n",
                   result->key, d->age, d->height, d->weight);
        }
    }
}
    % hsearch
    Enter Name/age/height/weight; terminate with blank line.
   
    Dave/32/73/220
    Cathy/34/64/120
    Trevor/8/48/85
    Sean/3/32/31
   
    Search for: Cathy
    Name: Cathy
    Age: 34
    Height: 64
    Weight: 120
   
    Search for: Trevor
    Name: Trevor
    Age: 8
    Height: 48
    Weight: 85
   
    Search for: Fred
    Entry not found.
   
    Search for: ^D
Binary Trees
Binary trees are an efficient way to maintain a list of items in sorted order. At any given node in the tree, all of the items below and to the left of that node are “less” than that node, and all of the items below and to the right of that node are “greater” than that node. For a tree with n nodes, searches of the tree can be performed in log2 n comparisons.
The binary tree algorithms are implemented with the tsearch, tfind, tdelete, and twalk functions:
    #include <search.h>
    typedef enum { preorder, postorder, endorder, leaf } VISIT;
    void *tsearch(const void *key, void **rootp,
            int (*compar)(const void *, const void *));
    void *tfind(const void *key, const void **rootp,
            int (*compar)(const void *, const void *));
    void *tdelete(const void *key, void **rootp,
            int (*compar)(const void *, const void *));
    void twalk(void *rootp, void(*action)(void **, VISIT, int));
These functions implement Algorithm D and Algorithm T from Volume 3, Section 6.2.2 of The Art of Computer Programming.
The compar parameter to the first three functions is a pointer to a function that compares two items and returns less than, equal to, or greater than 0 depending on whether the first key should be considered less than, equal to, or greater than the second key.
The tsearch function is used to build and search the tree. It searches the tree for key and, if found, returns a pointer to it. If not found, tsearch adds it to the tree and returns a pointer to it. Only pointers are copied into the tree; the calling program is responsible for saving the data. The rootp function returns a pointer to a variable that points to the root of the tree; if the pointer is NULL, a new tree will be created.
The tfind function is almost identical to tsearch, except that instead of adding an item to the tree if it is not already there, tfind returns NULL in that case. Note that there is one level less redirection in rootp when used with tfind.
The tdelete function removes an item from the tree. It returns a pointer to the item's parent node, or NULL if the item was not in the tree.
The twalk function traverses the tree rooted at rootp (any node may be used as the root of the tree for a walk below that node). The action parameter is a pointer to a function that is called at each node. The function takes three arguments: a pointer to the node being visited, the number of times the node has been visited, and the level at which the node resides in the tree, with the root being level zero. The second argument is given as an enumerated type with the following values:
preorder
The node has been visited for the first time, before any of its children.
postorder
The node has been visited for the second time, after its left child but before its right child.
endorder
The node has been visited for the third time, after both of its children.
leaf
The node is a leaf; it has no children (and hence is only visited once).
An alternative notation for trees uses the terms “preorder,” “inorder,” and “postorder” for the same three node visits; this may cause some confusion with the different meanings of “postorder.”
Example 16-5 shows a program that reads a number of strings from the standard input, storing them in a binary tree. It then prints the tree in alphabetical order.
Example 16-5:  tsearch
#include <search.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
struct node {
    char    *string;
    int     length;
};
int     compareNode(const void *, const void *);
void    printNode(void **, VISIT, int);
int
main(void)
{
    void *root;
    struct node *n;
    char buf[BUFSIZ];
    root = NULL;
    /*
     * Read strings until end of file.
     */
    while (fgets(buf, sizeof(buf), stdin) != NULL) {
        /*
         * Strip the newline.
         */
        buf[strlen(buf) - 1] = '\0';
        /*
         * Allocate a node structure.
         */
        n = (struct node *) malloc(sizeof(struct node));
        if (n == NULL) {
            fprintf(stderr, "out of memory.\n");
            exit(1);
        }
        /*
         * Save the information in the node.
         */
        n->string = strdup(buf);
        n->length = strlen(buf);
        /*
         * Add the item to the tree.
         */
        (void) tsearch((void *) n, &root, compareNode);
    }
    /*
     * Print out the tree in alphabetical order.
     */
    twalk(root, printNode);
    exit(0);
}
/*
* compareNode - compare the strings in two nodes.
*/
int
compareNode(const void *a, const void *b)
{
    struct node *aa, *bb;
    aa = (struct node *) a;
    bb = (struct node *) b;
    return(strcmp(aa->string, bb->string));
}
/*
* printNode - print a node - we only print if this is the postorder
*             (inorder) visit or a leaf; this results in
*             alphabetical order.
*/
void
printNode(void **node, VISIT order, int level)
{
    struct node *n;
    n = *(struct node **) node;
    if (order == postorder || order == leaf)
        printf("level=%d, length=%d, string=%s\n", level, n->length,
               n->string);
}
    % tsearch
   
    one
    two
    three
    four
    five
    six
    seven
    eight
    nine
    ten
    ^D
   
    level=3, length=5, string=eight
    level=2, length=4, string=five
    level=1, length=4, string=four
    level=2, length=4, string=nine
    level=0, length=3, string=one
    level=4, length=5, string=seven
    level=3, length=3, string=six
    level=4, length=3, string=ten
    level=2, length=5, string=three
    level=1, length=3, string=two
Queues
Two functions are provided to manipulate queues built from doubly-linked lists:
    #include <search.h>
    void insque(struct qelem *elem, struct qelem *pred);
    void remque(struct qelem *elem);
Each element in the list must be of type struct qelem:
    struct qelem {
        struct qelem    *q_forw;
        struct qelem    *q_back;
        char            *q_data;
    };
The insque function inserts the element pointed to by elem into the queue immediately after the element pointed to by pred. The remque function removes the element pointed to by elem from the queue.
HP-UX 10.x does not provide the struct qelem data type; instead the arguments to insque and remque are of type void *.

Previous SectionNext Section
Books24x7.com, Inc © 2000 –  Feedback