
Chapter 6 - Practical Considerations

Pthreads Programming
Bradford Nichols, Dick Buttlar and Jacqueline Proulx Farrell
 Copyright © 1996 O'Reilly & Associates, Inc.

Understanding Pthreads Implementation
Pthreads implementations fall into three basic categories: 
 Based on pure user space.
 Based on pure kernel threads.
 Based somewhere between the two. These hybrid implementations are referred to variously as two-level schedulers, lightweight processes (LWPs), or activations.
All implementations in these categories conform to the Pthreads standard and provide concurrency (the basic goal of threads). However, your platform's choice of implementation has a radical effect on the scheduling and performance of the threads in your program. Just look for a moment at the extremes! Pure user-space thread implementations don't provide global scheduling scope and don't actually allow multiple threads from the same process to execute in parallel on different CPUs. At the other extreme, pure kernel-thread implementations don't scale well when a process has 10, 20, 30, or more threads. 
Because Pthreads implementations are varied and complex and because implementations are evolving and improving at a swift rate, we can't do justice to them in the brief space we have in this book. The goal of this section is to introduce you to those differences in architectures that impact the way your program performs on various implementations. 
We'll set the stage for later discussions by reviewing some basic vocabulary. 
Two Worlds
User mode commonly refers to the times when a process (or, by extension, a thread) is executing the instructions in its program or a library (to which the program is linked). The program or library knows about the various objects upon which it operates (such as code, data, and other abstractions) because they are defined in user space and not in the underlying operating system kernel. 
Kernel mode refers to a process's (or a thread's) operational mode when it's executing within the operating system's kernel—usually as a result of a system call or an exception. In kernel mode, a process runs the instructions of the core operating system to access resources and services on a program's behalf. While it's running in kernel mode, the process can access objects that are defined in kernel space and, thereby, known only to the kernel. 
Two Kinds of Threads
The threads we've discussed in this book are user threads. They are programming abstractions that exist to be accessed by calls from within your program. In fact, the Pthreads standard doesn't require the operating system kernel to know anything at all about them. Whether a Pthread has any meaning inside the kernel or within kernel mode is up to the implementation. 
A kernel thread* can be something quite different. It's an abstraction for an operating system execution point within a process. To support the Pthreads standard, an implementor doesn't need to use kernel threads. As we'll see, the standard allows for great flexibility in the underlying implementation. 
 *The various UNIX operating systems use different terms for kernel thread. Digital UNIX, which was derived from Mach 2.5, uses the term kernel thread; Sun's Solaris uses the term lightweight process (LWP); others use the term activation or two-level scheduler.
Some platforms have native, nonstandard user-space thread implementations that predate the Pthreads standard. (The proliferation of these nonstandard interfaces was actually the motivating force behind the effort to define the Pthreads standard.) These native thread interfaces often have very similar semantics to those of the Pthreads interfaces, but they don't fully comply with the syntax and functionality the standard requires. On these platforms, an additional layer—sometimes only an include file—exists to turn the native user-space threads into Pthreads that conform to the portable Pthread interface. 
Who's Providing the Thread?
A Pthreads implementation supports user threads by a Pthreads-compliant library and, optionally, by changes to the operating system kernel. So, when we issue a pthread_create call on a given implementation, what is involved in creating the thread—the Pthreads library alone, the kernel itself, or some combination of the two? We'll look at the various possibilities. 
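For reference, here is the call in question, stripped to a minimal sketch. Nothing in the program itself reveals which machinery sits underneath: on a pure user-space implementation the call completes entirely within the library, while on a kernel thread-based one it results in a system call.

    #include <pthread.h>
    #include <stdio.h>

    /* The start routine runs in the new thread. Whether that thread is
       backed by a kernel thread is invisible at this level of the API. */
    void *worker(void *arg)
    {
        printf("worker running: %s\n", (char *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        int err = pthread_create(&tid, NULL, worker, "hello");
        if (err != 0) {
            fprintf(stderr, "pthread_create failed: error %d\n", err);
            return 1;
        }
        pthread_join(tid, NULL);
        return 0;
    }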
User-space Pthreads implementations
In pure user-space implementations, the kernel isn't involved at all in providing a user thread. As shown in Figure 6-1, the Pthreads library itself schedules threads, multiplexing all of a process's threads onto its single execution context. The kernel has no notion of threads; it continues to schedule processes as it usually does. 
This design is known as an all-to-one mapping. Out of all of a process's threads that are able to run at a given time, the Pthreads library selects just one to run in its process's context when that process is next scheduled by the kernel. 
Figure 6-1: User-space thread implementations
A pure user-space implementation can be based quite simply on tools that UNIX programmers have traditionally used to manage multiple contexts within a single process: namely, setjmp, longjmp, and signals. The Pthreads library may define a user thread as a data structure that stores an execution point in the form of a jmp_buf structure saved by a setjmp call. When the library switches threads, it resumes whichever thread has been selected to run next by performing a longjmp to that thread's stored jmp_buf execution point. 
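The toy program below illustrates the trick. It's a deliberately simplified sketch, not the way a production library is built: a real library allocates a separate stack for each thread and typically forces switches from a timer signal, whereas these two contexts share main's stack and yield voluntarily, which works only because neither keeps live stack data across a switch.

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf sched_ctx, thread_ctx;
    static int switches = 0;    /* global, so it survives the longjmps */

    /* Plays the role of a user thread: run a little, save the current
       execution point with setjmp, then longjmp back to the "scheduler". */
    static void user_thread(void)
    {
        for (;;) {
            printf("  user thread runs (switch %d)\n", switches);
            if (setjmp(thread_ctx) == 0)
                longjmp(sched_ctx, 1);       /* yield to the scheduler */
        }
    }

    int main(void)
    {
        while (switches < 3) {
            if (setjmp(sched_ctx) == 0) {    /* save the scheduler's context */
                if (switches == 0)
                    user_thread();           /* first dispatch enters the thread */
                else
                    longjmp(thread_ctx, 1);  /* later dispatches resume it */
            }
            switches++;
            printf("scheduler regains control\n");
        }
        return 0;
    }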
There are several advantages to a pure user-space implementation: 
 Because it doesn't require changes to the operating system itself, it allows many UNIX vendors, and vendors of other operating systems, to quickly provide a Pthreads-compliant library without having to invent kernel threads. For instance, Digital implemented its Pthreads library in this way on versions of OpenVMS prior to Version 7.0. (Version 7.0 uses kernel threads.) Additionally, DCE includes a user-space implementation, thus encouraging vendors to provide support for DCE threads by relieving them of wholesale changes to their operating systems. 
 Because a user-space implementation doesn't use expensive system calls to create threads and doesn't require the operating system to perform a context switch between threads, certain types of multithreaded applications can run faster than they would in a kernel-thread implementation. Among these applications are those that run exclusively on uniprocessor systems and those that don't have enough CPU-bound work to use multiple CPUs effectively. 
 Because user-space threads aren't known to the operating system, they can be created quickly and without impact to the kernel. This scales well: you can create more and more threads without overloading the system. Each thread is just another timeslice from the set of resources originally assigned to your process. 
There are also two considerable disadvantages: 
 The Pthreads library manages the scheduling of user threads using an all-to-one mapping of threads to a single process's execution context. As a result, threads within the same process compete against each other for CPU cycles. The operating system never sees an individual thread, only the process. If you raise the priority of a thread, it'll run more often and longer than other threads of lower priority in the same process. If it were your intention to give it a scheduling advantage over threads from other processes on the system, you'll be disappointed. To get the responsiveness you expect for a real-time thread from this type of implementation, you must either throw everybody else off the system or always run your entire process and all of its threads at a higher priority than everyone else. Either approach is likely to bring a system administrator to your office. (The sketch following this list shows how to ask for system-wide scheduling scope on implementations that can provide it.) 
 Because the Pthreads library's thread-scheduling ability is limited to threads within a process, it prevents your multithreaded program from taking advantage of multiple CPUs. Because the operating system is utterly unaware that many streams of processing lie beneath a given process, it allocates available CPUs to processes, not threads. All threads in a process must share the CPU on which the process was scheduled (and do so in the timeslice given to the process). The threads can never run in parallel across the available CPUs, even if another CPU happens to be idle! 
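Where an implementation can provide system-wide scheduling, the standard's optional contention-scope attribute is the way to request it. Here is a minimal sketch: it asks for system scope and falls back gracefully, since on a pure user-space implementation the request may fail with ENOTSUP, which is itself a quick way to find out what kind of implementation you're running on.

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    void *work(void *arg)
    {
        /* ...CPU-bound work... */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_attr_t attr;
        int err;

        pthread_attr_init(&attr);

        /* Ask that the new thread compete for CPUs system-wide, not just
           against its siblings. PTHREAD_SCOPE_SYSTEM support is optional;
           a pure user-space implementation may refuse. */
        err = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
        if (err != 0)
            fprintf(stderr, "system scope unavailable: %s\n", strerror(err));

        err = pthread_create(&tid, &attr, work, NULL);
        if (err != 0) {
            fprintf(stderr, "pthread_create failed: %s\n", strerror(err));
            return 1;
        }
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }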
Kernel thread-based Pthreads implementations
In pure kernel thread-based implementations, the Pthreads library creates a kernel thread for each user thread. Because each kernel thread represents the execution context of a single user thread, this design is known as a one-to-one mapping. As we show in Figure 6-2, when a CPU becomes available, the kernel chooses a kernel thread to run from among all the kernel threads available on the system, regardless of which processes they represent. 
Figure 6-2: Kernel thread-based implementations
A pure kernel thread-based implementation depends upon the operating system to define, store, and reload the execution states of individual threads. The operating system must now manage, on a per-thread basis, some of the information it has traditionally maintained for an individual process. For instance, each thread must have its own scheduling priority, its own set of saved registers, and its own CPU assignment. Other types of information, such as the file table, remain associated with the process. 
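The split is easy to see from a program. In the sketch below, scheduling parameters are per-thread state, set for one thread with pthread_setschedparam, while the file table is process state: a descriptor opened by the main thread is readable by any other thread. The priority-setting call assumes the standard's optional priority scheduling support and may fail without privilege, so the sketch only reports the error.

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int shared_fd;    /* process state: one file table for all threads */

    void *reader(void *arg)
    {
        char buf[64];
        /* Any thread may read from the descriptor main opened. */
        ssize_t n = read(shared_fd, buf, sizeof(buf));
        printf("reader thread got %zd bytes\n", n);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        struct sched_param param;
        int err;

        shared_fd = open("/etc/hosts", O_RDONLY);   /* any readable file */

        /* Per-thread state: adjust this thread's own scheduling priority.
           This may fail without privilege or without priority scheduling
           support; we just report it. */
        param.sched_priority = sched_get_priority_min(SCHED_RR);
        err = pthread_setschedparam(pthread_self(), SCHED_RR, &param);
        if (err != 0)
            fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));

        err = pthread_create(&tid, NULL, reader, NULL);
        if (err != 0)
            return 1;
        pthread_join(tid, NULL);
        close(shared_fd);
        return 0;
    }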
A good example of a pure kernel thread-based implementation is the pre-Version 4.0 Digital UNIX, which was known as DEC OSF/1 at the time. Digital UNIX, based in part on the Mach operating system developed at Carnegie-Mellon University (CMU), adopted Mach's kernel thread design. Mach threads operate at a much lower level than Pthreads and provide minimal functionality. Prior to Version 4.0, the Digital UNIX Pthreads library requested a new Mach kernel thread from the system for each pthread_create call. Because the Mach kernel thread design provides few synchronization primitives, it's the role of the Pthreads library to implement such features as mutexes and thread joins atop the Mach kernel thread functionality. 
The advantages of a pure kernel thread-based implementation remedy the disadvantages of the pure user-space implementation: 
 The Pthreads library schedules user threads on a one-to-one basis to kernel threads. As a result, threads compete against all other threads on the system for CPU cycles, not just against other threads in the same process. The kernel is aware of threads. If you raise the priority of a thread, it'll run more often and longer than other threads of lower priority throughout the system. 
 Because the kernel schedules threads globally across the entire system, multiple threads in your program can run on different CPUs simultaneously, as long as their relative priorities are higher than those of other threads on the system. Unlike a pure user-space implementation, a pure kernel thread-based implementation doesn't limit your program to a single executing thread. 
The disadvantages of a kernel thread-based implementation are as follows: 
 Although less expensive than creating a new process, the creation of a new kernel thread does require some kernel overhead—the processing of a system call and the maintenance of kernel data structures. If your application will never run on a multiprocessor, or if its threads are not CPU bound, this overhead is unnecessary. A user-space implementation would probably provide better performance. 
 Because some cost is associated with creating and maintaining kernel threads, applications that use a lot of threads ("a lot" meaning 10 or more on some systems, hundreds on others) can significantly load a system and degrade its overall performance, thus affecting all running applications. 
Two-level scheduler Pthreads implementations: the best of both worlds
In a two-level scheduler implementation, the Pthreads library and the operating system kernel cooperate to schedule user threads. Like a pure kernel thread-based implementation, a two-level scheduler implementation maps user threads to kernel threads, but instead of mapping each user thread to a kernel thread, it may map many user threads to any of a pool of kernel threads (see Figure 6-3). This is known as a some-to-one mapping. A user thread may not have a unique relationship to a specific kernel thread; rather, it may be mapped to different kernel threads at different times. 
Figure 6-3: Two-level scheduler implementations
Both the Pthreads library and the kernel maintain data structures that represent threads (user threads and kernel threads, respectively). The Pthreads library assigns user threads to run in a process's available kernel threads;* the kernel schedules kernel threads from the collection of all processes' runnable kernel threads. The two levels of scheduling allow a better, customized fit of actual execution contexts (kernel threads) to user-specified concurrency (user threads). 
 *The Solaris Pthreads library maps user threads to LWPs. Digital UNIX Version 4.0 and OpenVMS Version 7.0 map user threads to kernel threads.
For example, if a program's user threads frequently sleep on timers, events, or I/O completion, it makes little sense to dedicate a kernel thread to each of them. The kernel threads will see little CPU activity. It's much more efficient to allow the Pthreads library in a two-level scheduler implementation to accommodate some of its user threads' sparse and sporadic execution behavior by allotting them a single shared kernel thread. For this type of program, the two-level scheduler effectively provides the benefits of a pure user-space implementation: less kernel overhead and better performance. 
At the other extreme, another program's user threads might be completely CPU-bound and runnable. Here, the two-level scheduler might assign a kernel thread to each user thread up to the number of CPUs on the system, acting like a pure kernel thread-based implementation. Whenever a CPU becomes available, the kernel may thereby select any of these user threads for scheduling.*
 *The policies by which two-level scheduler implementations apportion their kernel threads to deserving user-space threads vary considerably. Some sophisticated implementations, such as Digital UNIX, may actually detect a change in a thread's execution behavior (for instance, as it becomes more or less CPU-bound) and adjust their kernel-thread assignments accordingly. Discussing the full range of implementation possibilities is beyond the scope of this book. However, if you are interested in reading more about two-level scheduler designs, we encourage you to look at the following publications:
 UNIX Internals: The New Frontiers by Uresh Vahalia, Prentice Hall, 1996. Discusses recent technological developments in UNIX operating systems, including Solaris, SVR4, Digital UNIX, and Mach.
 "Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism" by Anderson, Bershad, Lazowaska, and Levy, Department of Computer Science and Engineering, University of Washington, Seattle. Describes activations in their research operating system. This technology bears some resemblance to the Digital UNIX Version 4.0 two-level scheduler implementation.
 "SunOS Multi-Threaded Architecture," USENIX Winter Conference Proceedings, Dallas, Texas, 1991. Describes Solaris's lightweight process implementation.
Of course, most multithreaded programs are at neither extreme. In fact, a single program may encounter periods of high I/O activity and intense CPU use over time as it executes. Its resource demands may change based on the input assigned to it on a given run. It may be subject to different constraints, such as processor speed and I/O responsiveness, depending upon the platform on which it's run. The ability to tailor its kernel thread allocation policies to an individual program is the greatest advantage of a two-level scheduler implementation. It can adapt automatically (or respond to customizations) to be more responsive to different programs, or maintain an optimal execution environment as a program's execution behavior changes. 
Unlike a pure user-space implementation, a two-level scheduler implementation doesn't bind all user threads in a process to a single kernel execution context. Instead, it allows multiple threads in a process to run in parallel on multiple CPUs. Unlike a pure kernel thread-based implementation, a two-level scheduler implementation doesn't create a kernel thread for every user thread. By not doing so, it avoids needless overhead if a kernel thread is not used enough to justify its creation. All in all, when you design tasks for a multithreaded program that will run under a two-level scheduler, you can be less finicky about segregating CPU-bound from I/O-bound work. The two-level scheduler will adopt user-to-kernel thread mappings that are suitable to the program's actual execution behavior. 
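Some two-level implementations also let the program hint at the mapping directly. The sketch below uses pthread_setconcurrency, an X/Open extension that postdates the base Pthreads standard (Solaris exposes the same idea natively as thr_setconcurrency), to suggest how many kernel execution contexts the library should keep available. Where the call exists, the value is advice only; the scheduler remains free to ignore it.

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Hint that roughly four kernel execution contexts would suit
           this program. The library may ignore the hint; a value of 0
           restores the implementation's default policy. */
        int err = pthread_setconcurrency(4);
        if (err != 0)
            fprintf(stderr, "pthread_setconcurrency: %s\n", strerror(err));

        printf("concurrency hint now: %d\n", pthread_getconcurrency());
        return 0;
    }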
Perhaps the only disadvantage of a two-level scheduler is in its level of internal complexity and the effort a system developer must muster to implement one. That, fortunately, is not a problem for you, the application developer. Nevertheless, you may share in some of this complexity when you attempt to debug a multithreaded program on a two-level scheduler implementation and discover that it's difficult to keep track of how your user threads relate to the kernel threads that get placed into execution. 
What a great way to get into our next topic! 
