
Chapter 14 - Networking with Sockets

UNIX Systems Programming for SVR4
David A. Curry
 Copyright © 1996 O'Reilly & Associates, Inc.

Networking Concepts
Before discussing how network programs are written, we define a number of the underlying concepts.
Host Names and Addresses
Humans use host names to communicate with a host. Programs use host addresses.
Host names
Each host on the network has a unique host name. On a private network, host names can be simple, such as “fred” or “wilma.” On the Internet, however, a host name must be a fully qualified domain name, such as “fred.some.college.edu” or “wilma.company.com.”
The Internet Domain Name System allows the host name space to be subdivided into a number of logical areas, or domains. This allows the administration of the host name space to be spread out such that in general, each organization on the Internet can administer its own name space. In olden days, the entire host name space was controlled by the Network Information Center, and any time a new host was added to the network, it had to be registered with them. With over nine million hosts on the Internet as of January 1996, this is obviously no longer workable. Another reason for subdividing the name space is to allow host names to be re-used in different areas of the name space. Before the domain name system, there could be one and only one host named “fred” on the entire Internet. Again, with over nine million hosts, this rapidly becomes unworkable unless we all use host names such as “aaaaaaa,” “aaaaaab,” and so forth. The domain name system allows the “fred” host name to be used in each logical area. There can still be one and only one “fred” within a logical area, but two different logical areas can each have a “fred.”
At the top level of the system are the largest domains; each country has a two-letter domain. For example, “us” is the United States, “se” is Sweden, and “mx” is Mexico. In the United States, there are four other top-level domains: “edu” is educational institutions (mostly colleges and universities), “mil” is military organizations, “gov” is non-military government organizations, and “com” is commercial organizations. These domains should really be under the “us” domain, since they are specific to the United States, but historical reasons make it otherwise.
Each top-level domain is subdivided into other domains. For example, the “edu” domain is divided into domains for each college or university: “mit.edu,” “purdue.edu,” “berkeley.edu,” and so on. These domains can then be subdivided even further, for example, “cs.purdue.edu” for the Computer Science department, “cc.purdue.edu” for the Computer Center, and “physics.purdue.edu” for the Physics department. There is, generally speaking, no practical limit to how many times a domain may be subdivided, although most are not broken up beyond three or four levels.
The last subdivision of a domain is the host name, for example, “fred.cs.berkeley.edu” and “wilma.cs.berkeley.edu.” On hosts within the “cs.berkeley.edu” domain, these hosts can be referred to as simply “fred” and “wilma.” However, from a host not in the “cs.berkeley.edu” domain, the fully-qualified domain name (“fred.cs.berkeley.edu” or “wilma.cs.berkeley.edu”) must be used. Note that because the domain name is part of the host name, “fred.cc.purdue.edu,” “fred.mit.edu,” “fred.army.mil,” “fred.se,” and “fred.co.ac.uk” all refer to different hosts.
To get the local host's name, you can use the uname function, described in Chapter 9, System Configuration and Resource Limits. For portability reasons, though, when using the Berkeley socket interface, it is more common to obtain the host name using the gethostname function:
    int gethostname(char *name, int len);
This function places the local host's name into the character array pointed to by name, which is len bytes in size. It returns 0 on success; on failure it returns -1 and stores the reason for failure in errno. Depending on the particular configuration of your host, gethostname may or may not return the fully-qualified domain name for the host.
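As a minimal sketch of calling gethostname, the helper below wraps the call and forces the result to be null-terminated. The helper name local_host_name is hypothetical, and the sketch assumes a system that declares gethostname in <unistd.h>; older systems may declare it elsewhere or not at all.

```c
#include <stddef.h>
#include <unistd.h>     /* gethostname() on most current systems (assumption) */

/*
 * Hypothetical helper: copy the local host name into buf, which is
 * len bytes long.  Returns 0 on success and -1 on failure; when
 * gethostname itself fails, errno holds the reason.
 */
int local_host_name(char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;

    if (gethostname(buf, len) < 0)
        return -1;

    /* A name that had to be truncated may not be null-terminated. */
    buf[len - 1] = '\0';
    return 0;
}
```

A caller would typically declare a buffer such as char name[256], call local_host_name(name, sizeof name), and print name if the call returns 0.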
Host addresses
Host names are a useful way to identify hosts to other human beings, but they do not provide enough information in and of themselves to allow the networking software to make much use of them. For this reason, each host also has a host address. A host address is a unique 32-bit number; each host on the network has a different address.
Host addresses, also called network addresses or Internet addresses, are usually written in “dotted-quad” notation, in which each byte of the address is converted to an unsigned decimal number and separated from the next by a period (dot). For example, the hexadecimal network address 0x7b2d4359 would be written as 123.45.67.89.
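The conversion between a 32-bit address and dotted-quad notation can be sketched as follows. The function names format_dotted_quad and parse_dotted_quad are hypothetical, and the address is treated as a plain integer value whose most significant byte is the first number of the quad.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write addr in dotted-quad form into buf (len bytes).
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_dotted_quad(uint32_t addr, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%u.%u.%u.%u",
                     (unsigned)((addr >> 24) & 0xff),
                     (unsigned)((addr >> 16) & 0xff),
                     (unsigned)((addr >> 8) & 0xff),
                     (unsigned)(addr & 0xff));

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Parse a dotted-quad string into *addr.
 * Returns 0 on success, -1 if the string is not four numbers in 0..255.
 */
int parse_dotted_quad(const char *s, uint32_t *addr)
{
    unsigned b0, b1, b2, b3;
    char extra;

    /* The trailing %c catches junk after the fourth number. */
    if (sscanf(s, "%u.%u.%u.%u%c", &b0, &b1, &b2, &b3, &extra) != 4)
        return -1;
    if (b0 > 255 || b1 > 255 || b2 > 255 || b3 > 255)
        return -1;

    *addr = ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) |
            ((uint32_t)b2 << 8) | (uint32_t)b3;
    return 0;
}
```

With these helpers, format_dotted_quad(0x7b2d4359, buf, sizeof buf) fills buf with the string 123.45.67.89 from the example above.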
Each network address consists of two parts: a network number and a host number. There are different types of addresses: Class A network addresses use one byte for the network number and three bytes for the host number; Class B network addresses use two bytes for the network number and two bytes for the host number; Class C addresses use three bytes for the network number and one byte for the host number. It is also possible to divide the host number part of an address further; part of it can be used to represent a subnetwork number, and the rest of it can be used to represent the host number on that subnetwork.
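The split into network number and host number can be illustrated with a short sketch. The text above does not say how the class of an address is recognized; the sketch assumes the conventional classful rule, in which the first byte is 0 through 127 for Class A, 128 through 191 for Class B, and 192 through 223 for Class C. The struct and function names are hypothetical, and subnetting is not handled.

```c
#include <stdint.h>

/* Hypothetical result type for the sketch. */
struct addr_parts {
    char     addr_class;   /* 'A', 'B', or 'C' */
    uint32_t network;      /* network number   */
    uint32_t host;         /* host number      */
};

/*
 * Split a 32-bit address (plain integer value, most significant byte
 * first) into its classful network and host numbers.
 * Returns 0 on success, -1 if the address is not Class A, B, or C.
 */
int split_classful(uint32_t addr, struct addr_parts *out)
{
    unsigned first = (unsigned)((addr >> 24) & 0xff);

    if (first < 128) {            /* Class A: 1 byte network, 3 bytes host */
        out->addr_class = 'A';
        out->network = addr >> 24;
        out->host = addr & 0x00ffffffu;
    } else if (first < 192) {     /* Class B: 2 bytes network, 2 bytes host */
        out->addr_class = 'B';
        out->network = addr >> 16;
        out->host = addr & 0x0000ffffu;
    } else if (first < 224) {     /* Class C: 3 bytes network, 1 byte host */
        out->addr_class = 'C';
        out->network = addr >> 8;
        out->host = addr & 0x000000ffu;
    } else {
        return -1;                /* other ranges are outside this sketch */
    }
    return 0;
}
```

Under that assumption, 123.45.67.89 is a Class A address with network number 123 (0x7b) and host number 0x2d4359.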
The network number part of an address is used by the network routing software to decide how to deliver data from one network (say, the one at Berkeley) to another (say, the one at Harvard). It corresponds in some ways to the area code part of a telephone number that tells the telephone switches how to route the call from one area of the country to another. The subnetwork number tells the network routing software within a given network what part of the network to deliver the data to. For example, within Berkeley, the subnetwork number would indicate whether the data should go to the Computer Science department or the English department. It corresponds in some ways to the exchange part of a telephone number in the United States, which tells the telephone system which central office should receive the data. Finally, the host number part of an address indicates the specific host that is to receive the data, just as the last part of a telephone number identifies the specific telephone to ring.
To translate between host names and host addresses, several functions are provided:
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <netinet/in.h>
    struct hostent *gethostent(void);
    struct hostent *gethostbyname(const char *name);
    struct hostent *gethostbyaddr(const char *addr, int len, int type);
    int sethostent(int stayopen);
    int endhostent(void);
These functions look up host names and host addresses in one of several different databases, depending on how your system is configured. The /etc/hosts file lists host name and address pairs, and is usually used only for local area addresses. The Network Information Service (Yellow Pages) provides a different interface to the /etc/hosts file. Finally, the name server provides a distributed (by domain) database of host name and address information. On SVR4, the file /etc/nsswitch.conf controls which databases are used, and the order in which they are searched.
The sethostent function opens the database and sets the “current entry” pointer to the beginning of the file. The stayopen parameter, if non-zero, indicates that the database should remain open across calls to the other functions; this cuts down on the number of system calls used to open the database. The endhostent function closes the database. The gethostent function reads the next host name and address from the database, and returns it. The gethostbyname function searches the database for the host with name name, and returns its entry. The gethostbyaddr function searches the database for the host whose address is addr, where len gives the length of the address and type gives its address type, and returns its entry. All three of these functions return NULL if the entry cannot be found or end of file is encountered. On success, they return a pointer to a structure of type struct hostent:
    struct hostent {
        char     *h_name;
        char    **h_aliases;
        int       h_addrtype;
        int       h_length;
        char    **h_addr_list;
    };
The h_name field will contain the official host name of the host (usually this is the fully-qualified domain name). The h_aliases element will contain pointers to any other names the host is known by. The h_addrtype field indicates the type of addresses these are. The h_length element indicates how long (in bytes) an address is. And finally, h_addr_list will contain a list of the addresses for that host.
Note: Older systems use an h_addr field in the structure instead of h_addr_list; this was changed when it was realized that systems may have more than one address. On newer systems, h_addr is usually defined to refer to h_addr_list[0], for backward compatibility.
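A name-to-address lookup using the hostent fields described above might look like the following sketch. The helper name first_ipv4_address is hypothetical; it assumes four-byte Internet (AF_INET) addresses and uses only the first entry of h_addr_list. Because the address bytes are stored in network byte order, they can be printed in sequence to produce the dotted-quad form.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/*
 * Hypothetical helper: look up name and write its first address, in
 * dotted-quad form, into buf (len bytes).
 * Returns 0 on success, -1 if the lookup fails, the entry is not a
 * four-byte AF_INET address, or buf is too small.
 */
int first_ipv4_address(const char *name, char *buf, size_t len)
{
    struct hostent *hp = gethostbyname(name);
    const unsigned char *a;
    int n;

    if (hp == NULL || hp->h_addrtype != AF_INET || hp->h_length != 4)
        return -1;
    if (hp->h_addr_list[0] == NULL)
        return -1;

    /* The address bytes are already in network (big-endian) order. */
    a = (const unsigned char *)hp->h_addr_list[0];
    n = snprintf(buf, len, "%u.%u.%u.%u",
                 (unsigned)a[0], (unsigned)a[1],
                 (unsigned)a[2], (unsigned)a[3]);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

To report every address of a multi-homed host, a caller could instead loop over h_addr_list until it reaches the terminating NULL pointer.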
Services and Port Numbers
On any given host on the network, a number of network services may be provided. For example, a single host may offer remote login, file transfer, and electronic mail delivery. To distinguish data sent to the file transfer service from data sent to, say, the electronic mail service, each service is assigned a port number. The port number is a small integer used to identify the service to which data is to be delivered.
In order for two hosts to communicate using some service, they must agree on the port number to be used for that service. If two hosts used different port numbers for the same service, they would not be able to communicate. All standard Internet protocols use well-known ports for this purpose. For example, if host “fred” wants to transfer a file to host “wilma” using the File Transfer Protocol (FTP), it knows that it should use port number 21. If “fred” tries to use some other port number for this purpose, things won't work, because “wilma” is expecting FTP traffic on port 21. Likewise, if “fred” sends some other type of traffic (say, remote login) to port 21 on “wilma,” things won't work, because “wilma” is expecting file transfer traffic on that port.
Most versions of UNIX, SVR4 included, use the file /etc/services to store the list of well-known port numbers. This file lists the name of the service and the port number and protocol (TCP or UDP; see below) to be used for communicating with that service. The /etc/services file is read using the following functions:
    #include <netdb.h>
    struct servent *getservent(void);
    struct servent *getservbyname(const char *name, char *proto);
    struct servent *getservbyport(int port, char *proto);
    int setservent(int stayopen);
    int endservent(void);
The setservent function opens the services file and sets the “current entry” pointer to the start of the file. The stayopen parameter, if non-zero, indicates that the file should remain open across calls to the other functions. The endservent function closes the services file.
The getservent function reads the next entry in the file and returns it. The getservbyname function searches for the service with name name and returns the entry for it. The getservbyport function searches for the service with port number port and returns the entry for it. The proto argument to these two functions is either “tcp” or “udp.” There are actually two sets of port numbers, one for TCP (streams-based) services and one for UDP (datagram-based) services; it is therefore necessary to indicate which port number is of interest. All three of these functions return NULL if the entry cannot be found or end-of-file is encountered. If they succeed, they return a pointer to a structure of type struct servent:
    struct servent {
        char     *s_name;
        char    **s_aliases;
        int       s_port;
        char     *s_proto;
    };
The s_name field indicates the official name of the service; the s_aliases field indicates any alternate names for the service. The s_port field provides the port number, and the s_proto field indicates the protocol to use when communicating with the service.
Network Byte Order
When implementing integer storage on a computer, manufacturers have two choices. They can place the most significant byte in the lowest memory address, with less significant bytes stored in higher addresses; this is called “big endian” notation. Or they can place the most significant byte in the highest memory address, with less significant bytes stored in lower addresses; this is called “little endian” notation. Intel chips (80x86, Pentium, etc.) and Digital Equipment Corporation VAX computers are well-known little-endian architectures; Motorola 680x0 chips and Sun SPARC systems are two well-known big-endian architectures.
A 32-bit integer value as stored on a big-endian machine looks different from one stored on a little-endian machine. To copy data from one type of host to the other, it is necessary to transform the data into the proper format. However, without knowing the notation used by both machines, it is impossible to do this. Since there is no way to tell which format a remote machine on the network uses, a network byte order has been defined. The network byte order (which happens to be big-endian) ensures that all traffic arriving at a host from the network will be in the same format. The host can then convert from this standard format to whatever format it uses internally. Similarly, all traffic sent by the host is converted to network byte order before it leaves, ensuring that whatever host receives it will know what format it is in.
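The difference in storage layout can be observed directly. The sketch below, with the hypothetical function name host_is_big_endian, stores a known 32-bit value and inspects the byte at the lowest memory address.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if this host stores the most significant byte of an
 * integer at the lowest address (big endian), 0 otherwise.
 */
int host_is_big_endian(void)
{
    uint32_t value = 0x01020304u;
    unsigned char bytes[4];

    /* Copy the in-memory representation so it can be examined bytewise. */
    memcpy(bytes, &value, sizeof bytes);

    /* Big endian stores 01 02 03 04; little endian stores 04 03 02 01. */
    return bytes[0] == 0x01;
}
```

A test like this is useful for illustration only; network programs normally do not need it, because they convert to and from network byte order regardless of the host's own layout.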
The Berkeley networking paradigm specifies that each network program must perform these byte order conversions itself. (It would be difficult to do it anywhere else, since only the program knows the structure of the data it is transferring, and what parts need to be converted.) Four functions are provided to make these translations:
    #include <sys/types.h>
    #include <netinet/in.h>
    u_long htonl(u_long hostlong);
    u_short htons(u_short hostshort);
    u_long ntohl(u_long netlong);
    u_short ntohs(u_short netshort);
The htonl function converts the 32-bit hostlong value from host byte order to network byte order. The htons function converts the 16-bit hostshort value from host byte order to network byte order. The ntohl function converts the 32-bit netlong value from network byte order to host byte order. And the ntohs function converts the 16-bit netshort value from network byte order to host byte order. These functions are usually implemented as C preprocessor macros, and may be “no-ops,” depending on the host architecture.
Remember to use these functions whenever integer data is exchanged across the network. Character strings do not need to be converted, since they are arrays of one-byte values. There is no network floating point format; floating point numbers should generally be exchanged only by converting them to integers or by printing them as character strings and then sending the strings to the remote side, where they are converted back into floating point numbers.
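One way to apply these conversions when building or reading a message buffer is sketched below. The helper names put_u32, get_u32, put_u16, and get_u16 are hypothetical, and the fixed-width types from <stdint.h> stand in for u_long and u_short on the assumption that they are 32 and 16 bits wide.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>  /* htonl() and friends on some systems */

/* Store a 32-bit host-order value into buf in network byte order. */
void put_u32(unsigned char *buf, uint32_t value)
{
    uint32_t net = htonl(value);
    memcpy(buf, &net, sizeof net);
}

/* Read a 32-bit network-order value from buf, returning host order. */
uint32_t get_u32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}

/* Store a 16-bit host-order value into buf in network byte order. */
void put_u16(unsigned char *buf, uint16_t value)
{
    uint16_t net = htons(value);
    memcpy(buf, &net, sizeof net);
}

/* Read a 16-bit network-order value from buf, returning host order. */
uint16_t get_u16(const unsigned char *buf)
{
    uint16_t net;
    memcpy(&net, buf, sizeof net);
    return ntohs(net);
}
```

Copying through memcpy rather than casting the buffer pointer avoids alignment problems when a field does not start on a natural boundary within the message.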
The gethostby* and getservby* functions return integer values in network byte order.
