=head1 NAME

perlthrtut - Tutorial on threads in Perl

=head1 DESCRIPTION

This tutorial describes the use of Perl interpreter threads (sometimes
referred to as I<ithreads>), which were first introduced in Perl 5.6.0. In
this model, each thread runs in its own Perl interpreter, and any data sharing
between threads must be explicit. The user-level interface for I<ithreads>
uses the L<threads> class.

B<NOTE>: There was another older Perl threading flavor called the 5.005 model
that used the L<Threads> class. This old model was known to have problems, is
deprecated, and was removed for release 5.10. You are
strongly encouraged to migrate any existing 5.005 threads code to the new
model as soon as possible.

You can see which (or neither) threading flavor you have by
running C<perl -V> and looking at the C<Platform> section.
If you have C<useithreads=define> you have ithreads, if you
have C<use5005threads=define> you have 5.005 threads.
If you have neither, you don't have any thread support built in.
If you have both, you are in trouble.

The L<threads> and L<threads::shared> modules are included in the core Perl
distribution. Additionally, they are maintained as separate modules on
CPAN, so you can check there for any updates.

=head1 What Is A Thread Anyway?

A thread is a flow of control through a program with a single
execution point.

Sounds an awful lot like a process, doesn't it? Well, it should.
Threads are one of the pieces of a process. Every process has at least
one thread and, up until now, every process running Perl had only one
thread. With 5.8, though, you can create extra threads. We're going
to show you how, when, and why.

=head1 Threaded Program Models

There are three basic ways that you can structure a threaded
program. Which model you choose depends on what you need your program
to do.
For many non-trivial threaded programs, you'll need to choose
different models for different pieces of your program.

=head2 Boss/Worker

The boss/worker model usually has one I<boss> thread and one or more
I<worker> threads. The boss thread gathers or generates tasks that need
to be done, then parcels those tasks out to the appropriate worker
thread.

This model is common in GUI and server programs, where a main thread
waits for some event and then passes that event to the appropriate
worker threads for processing. Once the event has been passed on, the
boss thread goes back to waiting for another event.

The boss thread does relatively little work. While tasks aren't
necessarily performed faster than with any other method, it tends to
have the best user-response times.

=head2 Work Crew

In the work crew model, several threads are created that do
essentially the same thing to different pieces of data. It closely
mirrors classical parallel processing and vector processors, where a
large array of processors do the exact same thing to many pieces of
data.

This model is particularly useful if the system running the program
will distribute multiple threads across different processors. It can
also be useful in ray tracing or rendering engines, where the
individual threads can pass on interim results to give the user visual
feedback.

=head2 Pipeline

The pipeline model divides up a task into a series of steps, and
passes the results of one step on to the thread processing the
next. Each thread does one thing to each piece of data and passes the
results to the next thread in line.

This model makes the most sense if you have multiple processors so two
or more threads will be executing in parallel, though it can often
make sense in other contexts as well.
It tends to keep the individual
tasks small and simple, as well as allowing some parts of the pipeline
to block (on I/O or system calls, for example) while other parts keep
going. If you're running different parts of the pipeline on different
processors you may also take advantage of the caches on each
processor.

This model is also handy for a form of recursive programming where,
rather than having a subroutine call itself, it instead creates
another thread. Prime and Fibonacci generators both map well to this
form of the pipeline model. (A version of a prime number generator is
presented later on.)

=head1 What kind of threads are Perl threads?

If you have experience with other thread implementations, you might
find that things aren't quite what you expect. It's very important to
remember when dealing with Perl threads that I<Perl Threads Are Not X
Threads> for all values of X. They aren't POSIX threads, or
DecThreads, or Java's Green threads, or Win32 threads. There are
similarities, and the broad concepts are the same, but if you start
looking for implementation details you're going to be either
disappointed or confused. Possibly both.

This is not to say that Perl threads are completely different from
everything that's ever come before. They're not. Perl's threading
model owes a lot to other thread models, especially POSIX. Just as
Perl is not C, though, Perl threads are not POSIX threads. So if you
find yourself looking for mutexes, or thread priorities, it's time to
step back a bit and think about what you want to do and how Perl can
do it.

However, it is important to remember that Perl threads cannot magically
do things unless your operating system's threads allow it. So if your
system blocks the entire process on C<sleep()>, Perl usually will, as well.

B<Perl Threads Are Different.>

=head1 Thread-Safe Modules

The addition of threads has changed Perl's internals
substantially. There are implications for people who write
modules with XS code or external libraries. However, since Perl data is
not shared among threads by default, Perl modules stand a high chance of
being thread-safe or can be made thread-safe easily. Modules that are not
tagged as thread-safe should be tested or code reviewed before being used
in production code.

Not all modules that you might use are thread-safe, and you should
always assume a module is unsafe unless the documentation says
otherwise. This includes modules that are distributed as part of the
core. Threads are a relatively new feature, and even some of the standard
modules aren't thread-safe.

Even if a module is thread-safe, it doesn't mean that the module is optimized
to work well with threads. A module could possibly be rewritten to utilize
the new features in threaded Perl to increase performance in a threaded
environment.

If you're using a module that's not thread-safe for some reason, you
can protect yourself by using it from one, and only one, thread.
If you need multiple threads to access such a module, you can use semaphores and
lots of programming discipline to control access to it. Semaphores
are covered in L</"Basic semaphores">.

See also L</"Thread-Safety of System Libraries">.

=head1 Thread Basics

The L<threads> module provides the basic functions you need to write
threaded programs. In the following sections, we'll cover the basics,
showing you what you need to do to create a threaded program. After
that, we'll go over some of the features of the L<threads> module that
make threaded programming easier.

=head2 Basic Thread Support

Thread support is a Perl compile-time option.
It's something that's
turned on or off when Perl is built at your site, rather than when
your programs are compiled. If your Perl wasn't compiled with thread
support enabled, then any attempt to use threads will fail.

Your programs can use the Config module to check whether threads are
enabled. If your program can't run without them, you can say something
like:

    use Config;
    $Config{useithreads} or die('Recompile Perl with threads to run this program.');

A possibly-threaded program using a possibly-threaded module might
have code like this:

    use Config;
    use MyMod;

    BEGIN {
        if ($Config{useithreads}) {
            # We have threads
            require MyMod_threaded;
            import MyMod_threaded;
        } else {
            require MyMod_unthreaded;
            import MyMod_unthreaded;
        }
    }

Since code that runs both with and without threads is usually pretty
messy, it's best to isolate the thread-specific code in its own
module. In our example above, that's what C<MyMod_threaded> is, and it's
only imported if we're running on a threaded Perl.

=head2 A Note about the Examples

In a real situation, care should be taken that all threads are finished
executing before the program exits. That care has B<not> been taken in these
examples in the interest of simplicity. Running these examples I<as is> will
produce error messages, usually caused by the fact that there are still
threads running when the program exits. You should not be alarmed by this.

=head2 Creating Threads

The L<threads> module provides the tools you need to create new
threads. Like any other module, you need to tell Perl that you want to use
it; C<use threads;> imports all the pieces you need to create basic
threads.

The simplest, most straightforward way to create a thread is with C<create()>:

    use threads;

    my $thr = threads->create(\&sub1);

    sub sub1 {
        print("In the thread\n");
    }

The C<create()> method takes a reference to a subroutine and creates a new
thread that starts executing in the referenced subroutine. Control
then passes to both the subroutine and the caller.

If you need to, your program can pass parameters to the subroutine as
part of the thread startup. Just include the list of parameters as
part of the C<threads-E<gt>create()> call, like this:

    use threads;

    my $Param3 = 'foo';
    my $thr1 = threads->create(\&sub1, 'Param 1', 'Param 2', $Param3);
    my @ParamList = (42, 'Hello', 3.14);
    my $thr2 = threads->create(\&sub1, @ParamList);
    my $thr3 = threads->create(\&sub1, qw(Param1 Param2 Param3));

    sub sub1 {
        my @InboundParameters = @_;
        print("In the thread\n");
        print('Got parameters >', join('<>', @InboundParameters), "<\n");
    }

The last example illustrates another feature of threads. You can spawn
off several threads using the same subroutine. Each thread executes
the same subroutine, but in a separate thread with a separate
environment and potentially separate arguments.

C<new()> is a synonym for C<create()>.

=head2 Waiting For A Thread To Exit

Since threads are also subroutines, they can return values. To wait
for a thread to exit and extract any values it might return, you can
use the C<join()> method:

    use threads;

    my ($thr) = threads->create(\&sub1);

    my @ReturnData = $thr->join();
    print('Thread returned ', join(', ', @ReturnData), "\n");

    sub sub1 { return ('Fifty-six', 'foo', 2); }

In the example above, the C<join()> method returns as soon as the thread
ends.
In addition to waiting for a thread to finish and gathering up
any values that the thread might have returned, C<join()> also performs
any OS cleanup necessary for the thread. That cleanup might be
important, especially for long-running programs that spawn lots of
threads. If you don't want the return values and don't want to wait
for the thread to finish, you should call the C<detach()> method
instead, as described next.

NOTE: In the example above, the thread returns a list, thus necessitating
that the thread creation call be made in list context (i.e., C<my ($thr)>).
See L<< threads/"$thr->join()" >> and L<threads/"THREAD CONTEXT"> for more
details on thread context and return values.

=head2 Ignoring A Thread

C<join()> does three things: it waits for a thread to exit, cleans up
after it, and returns any data the thread may have produced. But what
if you're not interested in the thread's return values, and you don't
really care when the thread finishes? All you want is for the thread
to be cleaned up after it's done.

In this case, you use the C<detach()> method. Once a thread is detached,
it'll run until it's finished; then Perl will clean up after it
automatically.

    use threads;

    my $thr = threads->create(\&sub1);   # Spawn the thread

    $thr->detach();                      # Now we officially don't care any more

    sleep(15);                           # Let thread run for awhile

    sub sub1 {
        $a = 0;
        while (1) {
            $a++;
            print("\$a is $a\n");
            sleep(1);
        }
    }

Once a thread is detached, it may not be joined, and any return data
that it might have produced (if it was done and waiting for a join) is
lost.
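
The difference between a joined and a detached thread can be seen side by
side in a small sketch. It assumes a C<threads> module recent enough to
provide the C<is_detached()> method; the C<worker()> subroutine and its
labels are made up for this illustration.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker used only for this illustration.
sub worker {
    my ($label) = @_;
    return "done: $label";
}

my $kept  = threads->create(\&worker, 'kept');
my $loose = threads->create(\&worker, 'loose');

# From here on, whatever $loose returns is thrown away.
$loose->detach();
print("loose is detached\n") if $loose->is_detached();

# join() still works for the thread we kept, and hands back its value.
my $result = $kept->join();
print("$result\n");

# Give the detached thread a moment to finish before the program exits.
sleep(1);
```

The final C<sleep()> is only a crude way to let the detached thread finish;
a real program would use one of the synchronization tools described later.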

C<detach()> can also be called as a class method to allow a thread to
detach itself:

    use threads;

    my $thr = threads->create(\&sub1);

    sub sub1 {
        threads->detach();
        # Do more work
    }

=head2 Process and Thread Termination

With threads one must be careful to make sure they all have a chance to
run to completion, assuming that is what you want.

An action that terminates a process will terminate I<all> running
threads. C<die()> and C<exit()> have this property,
and perl does an exit when the main thread exits,
perhaps implicitly by falling off the end of your code,
even if that's not what you want.

As an example of this case, this code prints the message
"Perl exited with active threads: 2 running and unjoined":

    use threads;
    my $thr1 = threads->new(\&thrsub, "test1");
    my $thr2 = threads->new(\&thrsub, "test2");
    sub thrsub {
        my ($message) = @_;
        sleep 1;
        print "thread $message\n";
    }

But when the following lines are added at the end:

    $thr1->join();
    $thr2->join();

it prints two lines of output, a perhaps more useful outcome.

=head1 Threads And Data

Now that we've covered the basics of threads, it's time for our next
topic: Data. Threading introduces a couple of complications to data
access that non-threaded programs never need to worry about.

=head2 Shared And Unshared Data

The biggest difference between Perl I<ithreads> and the old 5.005 style
threading, or for that matter, most other threading systems out there,
is that by default, no data is shared. When a new Perl thread is created,
all the data associated with the current thread is copied to the new
thread, and is subsequently private to that new thread!
This is similar in feel to what happens when a Unix process forks,
except that in this case, the data is just copied to a different part of
memory within the same process rather than a real fork taking place.

To make use of threading, however, one usually wants the threads to share
at least some data between themselves. This is done with the
L<threads::shared> module and the C<:shared> attribute:

    use threads;
    use threads::shared;

    my $foo :shared = 1;
    my $bar = 1;
    threads->create(sub { $foo++; $bar++; })->join();

    print("$foo\n");  # Prints 2 since $foo is shared
    print("$bar\n");  # Prints 1 since $bar is not shared

In the case of a shared array, all the array's elements are shared, and for
a shared hash, all the keys and values are shared. This places
restrictions on what may be assigned to shared array and hash elements: only
simple values or references to shared variables are allowed - this is
so that a private variable can't accidentally become shared. A bad
assignment will cause the thread to die. For example:

    use threads;
    use threads::shared;

    my $var          = 1;
    my $svar :shared = 2;
    my %hash :shared;

    # ... create some threads ...

    $hash{a} = 1;       # All threads see exists($hash{a}) and $hash{a} == 1
    $hash{a} = $var;    # okay - copy-by-value: same effect as previous
    $hash{a} = $svar;   # okay - copy-by-value: same effect as previous
    $hash{a} = \$svar;  # okay - a reference to a shared variable
    $hash{a} = \$var;   # This will die
    delete($hash{a});   # okay - all threads will see !exists($hash{a})

Note that a shared variable guarantees that if two or more threads try to
modify it at the same time, the internal state of the variable will not
become corrupted. However, there are no guarantees beyond this, as
explained in the next section.
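
The rule that only simple values or references to shared variables may be
stored in a shared hash matters most for nested data. The following sketch
shows one way to work within it, assuming a L<threads::shared> recent
enough to provide C<shared_clone()>, which returns a shared copy of a
private structure. The C<record_squares()> subroutine and the C<%results>
hash are invented for this illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %results :shared;

# Hypothetical worker: store a list of squares under its own key.
sub record_squares {
    my ($n) = @_;
    my @squares = map { $_ * $_ } 1 .. $n;   # private to this thread

    # Storing \@squares directly would die, because @squares is not
    # shared. shared_clone() makes a shared copy that can be stored.
    $results{$n} = shared_clone(\@squares);
    return;
}

my @workers = map { threads->create(\&record_squares, $_) } 1 .. 3;
$_->join() for @workers;

for my $n (sort keys %results) {
    print("$n: @{ $results{$n} }\n");
}
```

Each thread writes to a different key here, so no further coordination is
needed; threads that update the same element need the techniques from the
following sections.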

=head2 Thread Pitfalls: Races

While threads bring a new set of useful tools, they also bring a
number of pitfalls. One pitfall is the race condition:

    use threads;
    use threads::shared;

    my $a :shared = 1;
    my $thr1 = threads->create(\&sub1);
    my $thr2 = threads->create(\&sub2);

    $thr1->join();
    $thr2->join();
    print("$a\n");

    sub sub1 { my $foo = $a; $a = $foo + 1; }
    sub sub2 { my $bar = $a; $a = $bar + 1; }

What do you think C<$a> will be? The answer, unfortunately, is I<it
depends>. Both C<sub1()> and C<sub2()> access the shared variable C<$a>, once
to read and once to write. Depending on factors ranging from your
thread implementation's scheduling algorithm to the phase of the moon,
C<$a> can be 2 or 3.

Race conditions are caused by unsynchronized access to shared
data. Without explicit synchronization, there's no way to be sure that
nothing has happened to the shared data between the time you access it
and the time you update it. Even this simple code fragment has the
possibility of error:

    use threads;
    use threads::shared;

    my $a :shared = 2;
    my $b :shared;
    my $c :shared;
    my $thr1 = threads->create(sub { $b = $a; $a = $b + 1; });
    my $thr2 = threads->create(sub { $c = $a; $a = $c + 1; });
    $thr1->join();
    $thr2->join();

Two threads both access C<$a>. Each thread can potentially be interrupted
at any point, or be executed in any order. At the end, C<$a> could be 3
or 4, and both C<$b> and C<$c> could be 2 or 3.

Even C<$a += 5> or C<$a++> are not guaranteed to be atomic.

Whenever your program accesses data or resources that can be accessed
by other threads, you must take steps to coordinate access or risk
data inconsistency and race conditions. Note that Perl will protect its
internals from your race conditions, but it won't protect you from you.
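
As a sketch of what "taking steps to coordinate access" can look like, the
fragment below guards a read-modify-write with C<lock()>, which is covered
in L</"Controlling access: lock()">. The C<$counter> variable, the
C<bump()> subroutine, and the thread and iteration counts are arbitrary
choices made for this illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $counter :shared = 0;

# Hypothetical worker: increment the shared counter 1000 times.
sub bump {
    for (1 .. 1000) {
        lock($counter);   # held until the end of this loop iteration
        $counter++;       # the read and the write now happen as one unit
    }
    return;
}

my @workers = map { threads->create(\&bump) } 1 .. 4;
$_->join() for @workers;

print("$counter\n");      # 4000: no increments are lost
```

Without the C<lock()> call the final value could be anything up to 4000,
for exactly the reason described above.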

=head1 Synchronization and control

Perl provides a number of mechanisms to coordinate the interactions
between threads and their data, to avoid race conditions and the like.
Some of these are designed to resemble the common techniques used in thread
libraries such as C<pthreads>; others are Perl-specific. Often, the
standard techniques are clumsy and difficult to get right (such as
condition waits). Where possible, it is usually easier to use Perlish
techniques such as queues, which remove some of the hard work involved.

=head2 Controlling access: lock()

The C<lock()> function takes a shared variable and puts a lock on it.
No other thread may lock the variable until the variable is unlocked
by the thread holding the lock. Unlocking happens automatically
when the locking thread exits the block that contains the call to the
C<lock()> function. Using C<lock()> is straightforward: this example has
several threads doing some calculations in parallel, and occasionally
updating a running total:

    use threads;
    use threads::shared;

    my $total :shared = 0;

    sub calc {
        while (1) {
            my $result;
            # (... do some calculations and set $result ...)
            {
                lock($total);  # Block until we obtain the lock
                $total += $result;
            } # Lock implicitly released at end of scope
            last if $result == 0;
        }
    }

    my $thr1 = threads->create(\&calc);
    my $thr2 = threads->create(\&calc);
    my $thr3 = threads->create(\&calc);
    $thr1->join();
    $thr2->join();
    $thr3->join();
    print("total=$total\n");

C<lock()> blocks the thread until the variable being locked is
available. When C<lock()> returns, your thread can be sure that no other
thread can lock that variable until the block containing the
lock exits.

It's important to note that locks don't prevent access to the variable
in question, only lock attempts.
This is in keeping with Perl's
longstanding tradition of courteous programming, and the advisory file
locking that C<flock()> gives you.

You may lock arrays and hashes as well as scalars. Locking an array,
though, will not block subsequent locks on array elements, just lock
attempts on the array itself.

Locks are recursive, which means it's okay for a thread to
lock a variable more than once. The lock will last until the outermost
C<lock()> on the variable goes out of scope. For example:

    my $x :shared;
    doit();

    sub doit {
        {
            {
                lock($x); # Wait for lock
                lock($x); # NOOP - we already have the lock
                {
                    lock($x); # NOOP
                    {
                        lock($x); # NOOP
                        lockit_some_more();
                    }
                }
            } # *** Implicit unlock here ***
        }
    }

    sub lockit_some_more {
        lock($x); # NOOP
    } # Nothing happens here

Note that there is no C<unlock()> function - the only way to unlock a
variable is to let the lock go out of scope.

A lock can either be used to guard the data contained within the variable
being locked, or it can be used to guard something else, like a section
of code. In this latter case, the variable in question does not hold any
useful data, and exists only for the purpose of being locked. In this
respect, the variable behaves like the mutexes and basic semaphores of
traditional thread libraries.

=head2 A Thread Pitfall: Deadlocks

Locks are a handy tool to synchronize access to data, and using them
properly is the key to safe shared data. Unfortunately, locks aren't
without their dangers, especially when multiple locks are involved.
Consider the following code:

    use threads;
    use threads::shared;

    my $a :shared = 4;
    my $b :shared = 'foo';
    my $thr1 = threads->create(sub {
        lock($a);
        sleep(20);
        lock($b);
    });
    my $thr2 = threads->create(sub {
        lock($b);
        sleep(20);
        lock($a);
    });

This program will probably hang until you kill it. The only way it
won't hang is if one of the two threads acquires both locks
first. A guaranteed-to-hang version is more complicated, but the
principle is the same.

The first thread will grab a lock on C<$a>, then, after a pause during which
the second thread has probably had time to do some work, try to grab a
lock on C<$b>. Meanwhile, the second thread grabs a lock on C<$b>, then later
tries to grab a lock on C<$a>. The second lock attempt for both threads will
block, each waiting for the other to release its lock.

This condition is called a deadlock, and it occurs whenever two or
more threads are trying to get locks on resources that the others
own. Each thread will block, waiting for the other to release a lock
on a resource. That never happens, though, since the thread with the
resource is itself waiting for a lock to be released.

There are a number of ways to handle this sort of problem. The best
way is to always have all threads acquire locks in the exact same
order. If, for example, you lock variables C<$a>, C<$b>, and C<$c>, always lock
C<$a> before C<$b>, and C<$b> before C<$c>. It's also best to hold on to locks for
as short a period of time as possible to minimize the risks of deadlock.

The other synchronization primitives described below can suffer from
similar problems.

=head2 Queues: Passing Data Around

A queue is a special thread-safe object that lets you put data in one
end and take it out the other without having to worry about
synchronization issues.
They're pretty straightforward, and look like
this:

    use threads;
    use Thread::Queue;

    my $DataQueue = Thread::Queue->new();
    my $thr = threads->create(sub {
        while (my $DataElement = $DataQueue->dequeue()) {
            print("Popped $DataElement off the queue\n");
        }
    });

    $DataQueue->enqueue(12);
    $DataQueue->enqueue("A", "B", "C");
    sleep(10);
    $DataQueue->enqueue(undef);
    $thr->join();

You create the queue with C<Thread::Queue-E<gt>new()>. Then you can
add lists of scalars onto the end with C<enqueue()>, and pop scalars off
the front of it with C<dequeue()>. A queue has no fixed size, and can grow
as needed to hold everything pushed on to it.

If a queue is empty, C<dequeue()> blocks until another thread enqueues
something. This makes queues ideal for event loops and other
communications between threads.

=head2 Semaphores: Synchronizing Data Access

Semaphores are a kind of generic locking mechanism. In their most basic
form, they behave very much like lockable scalars, except that they
can't hold data, and that they must be explicitly unlocked. In their
advanced form, they act like a kind of counter, and can allow multiple
threads to have the I<lock> at any one time.

=head2 Basic semaphores

Semaphores have two methods, C<down()> and C<up()>: C<down()> decrements the resource
count, while C<up()> increments it. Calls to C<down()> will block if the
semaphore's current count would decrement below zero.
This program
gives a quick demonstration:

    use threads;
    use Thread::Semaphore;

    my $semaphore = Thread::Semaphore->new();
    my $GlobalVariable :shared = 0;

    my $thr1 = threads->create(\&sample_sub, 1);
    my $thr2 = threads->create(\&sample_sub, 2);
    my $thr3 = threads->create(\&sample_sub, 3);

    sub sample_sub {
        my $SubNumber = shift(@_);
        my $TryCount = 10;
        my $LocalCopy;
        sleep(1);
        while ($TryCount--) {
            $semaphore->down();
            $LocalCopy = $GlobalVariable;
            print("$TryCount tries left for sub $SubNumber (\$GlobalVariable is $GlobalVariable)\n");
            sleep(2);
            $LocalCopy++;
            $GlobalVariable = $LocalCopy;
            $semaphore->up();
        }
    }

    $thr1->join();
    $thr2->join();
    $thr3->join();

The three invocations of the subroutine all operate in sync. The
semaphore, though, makes sure that only one thread is accessing the
global variable at once.

=head2 Advanced Semaphores

By default, semaphores behave like locks, letting only one thread
C<down()> them at a time. However, there are other uses for semaphores.

Each semaphore has a counter attached to it. By default, semaphores are
created with the counter set to one, C<down()> decrements the counter by
one, and C<up()> increments by one. However, we can override any or all
of these defaults simply by passing in different values:

    use threads;
    use Thread::Semaphore;

    my $semaphore = Thread::Semaphore->new(5);
                    # Creates a semaphore with the counter set to five

    my $thr1 = threads->create(\&sub1);
    my $thr2 = threads->create(\&sub1);

    sub sub1 {
        $semaphore->down(5); # Decrements the counter by five
        # Do stuff here
        $semaphore->up(5);   # Increment the counter by five
    }

    $thr1->detach();
    $thr2->detach();

If C<down()> attempts to decrement the counter below zero, it blocks until
the counter is large enough.
Note that while a semaphore can be created
with a starting count of zero, any C<up()> or C<down()> always changes the
counter by at least one, and so C<< $semaphore->down(0) >> is the same as
C<< $semaphore->down(1) >>.

The question, of course, is why would you do something like this? Why
create a semaphore with a starting count that's not one, or why
decrement or increment it by more than one? The answer is resource
availability. Many resources that you want to manage access for can be
safely used by more than one thread at once.

For example, let's take a GUI-driven program. It has a semaphore that
it uses to synchronize access to the display, so only one thread is
ever drawing at once. Handy, but of course you don't want any thread
to start drawing until things are properly set up. In this case, you
can create a semaphore with a counter set to zero, and up it when
things are ready for drawing.

Semaphores with counters greater than one are also useful for
establishing quotas. Say, for example, that you have a number of
threads that can do I/O at once. You don't want all the threads
reading or writing at once though, since that can potentially swamp
your I/O channels, or deplete your process's quota of filehandles. You
can use a semaphore initialized to the number of concurrent I/O
requests (or open files) that you want at any one time, and have your
threads quietly block and unblock themselves.

Larger increments or decrements are handy in those cases where a
thread needs to check out or return a number of resources at once.

=head2 Waiting for a Condition

The functions C<cond_wait()> and C<cond_signal()>
can be used in conjunction with locks to notify
co-operating threads that a resource has become available. They are
very similar in use to the functions found in C<pthreads>. However,
for most purposes, queues are simpler to use and more intuitive.
See
L<threads::shared> for more details.

=head2 Giving up control

There are times when you may find it useful to have a thread
explicitly give up the CPU to another thread. You may be doing something
processor-intensive and want to make sure that the user-interface thread
gets called frequently. Regardless, there are times that you might want
a thread to give up the processor.

Perl's threading package provides the C<yield()> function that does
this. C<yield()> is pretty straightforward, and works like this:

    use threads;

    sub loop {
        my $thread = shift;
        my $foo = 50;
        while($foo--) { print("In thread $thread\n"); }
        threads->yield();
        $foo = 50;
        while($foo--) { print("In thread $thread\n"); }
    }

    my $thr1 = threads->create(\&loop, 'first');
    my $thr2 = threads->create(\&loop, 'second');
    my $thr3 = threads->create(\&loop, 'third');

It is important to remember that C<yield()> is only a hint to give up the CPU;
what actually happens depends on your hardware, OS and threading libraries.
B<On many operating systems, yield() is a no-op.> Therefore it is important
to note that one should not build the scheduling of the threads around
C<yield()> calls. It might work on your platform but it won't work on another
platform.

=head1 General Thread Utility Routines

We've covered the workhorse parts of Perl's threading package, and
with these tools you should be well on your way to writing threaded
code and packages. There are a few useful little pieces that didn't
really fit in anyplace else.

=head2 What Thread Am I In?

The C<threads-E<gt>self()> class method provides your program with a way to
get an object representing the thread it's currently in. You can use this
object in the same way as the ones returned from thread creation.
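
As a small sketch of C<threads-E<gt>self()> in use, the program below calls
the same subroutine from the main thread and from a new thread, and asks
the resulting object for its thread ID with C<tid()> (described in
L</"Thread IDs">). The C<whoami()> subroutine is a made-up helper for this
illustration.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: report the ID of whichever thread calls it.
sub whoami {
    my $self = threads->self();
    return $self->tid();
}

my $main_tid = whoami();                   # runs in the main thread

my $thr     = threads->create(\&whoami);   # runs in a new thread
my $kid_tid = $thr->join();

print("main thread is $main_tid, worker was $kid_tid\n");

# The object the worker got from self() names the same thread as $thr.
print("IDs match the thread object\n") if $kid_tid == $thr->tid();
```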

=head2 Thread IDs

C<tid()> is a thread object method that returns the thread ID of the
thread the object represents. Thread IDs are integers, with the main
thread in a program being 0. Currently Perl assigns a unique TID to
every thread ever created in your program, assigning the first thread
to be created a TID of 1, and increasing the TID by 1 for each new
thread that's created. When used as a class method, C<threads-E<gt>tid()>
can be used by a thread to get its own TID.

=head2 Are These Threads The Same?

The C<equal()> method takes two thread objects and returns true
if the objects represent the same thread, and false if they don't.

Thread objects also have an overloaded C<==> comparison so that you can do
comparison on them as you would with normal objects.

=head2 What Threads Are Running?

C<threads-E<gt>list()> returns a list of thread objects, one for each thread
that's currently running and not detached. Handy for a number of things,
including cleaning up at the end of your program (from the main Perl thread,
of course):

    # Loop through all the threads
    foreach my $thr (threads->list()) {
        $thr->join();
    }

If some threads have not finished running when the main Perl thread
ends, Perl will warn you about it and die, since it is impossible for Perl
to clean up after itself while other threads are running.

NOTE: The main Perl thread (thread 0) is in a I<detached> state, and so
does not appear in the list returned by C<threads-E<gt>list()>.

=head1 A Complete Example

Confused yet? It's time for an example program to show some of the
things we've covered. This program finds prime numbers using threads.

     1 #!/usr/bin/perl
     2 # prime-pthread, courtesy of Tom Christiansen
     3
     4 use strict;
     5 use warnings;
     6
     7 use threads;
     8 use Thread::Queue;
     9
    10 sub check_num {
    11     my ($upstream, $cur_prime) = @_;
    12     my $kid;
    13     my $downstream = Thread::Queue->new();
    14     while (my $num = $upstream->dequeue()) {
    15         next unless ($num % $cur_prime);
    16         if ($kid) {
    17             $downstream->enqueue($num);
    18         } else {
    19             print("Found prime: $num\n");
    20             $kid = threads->create(\&check_num, $downstream, $num);
    21             if (! $kid) {
    22                 warn("Sorry. Ran out of threads.\n");
    23                 last;
    24             }
    25         }
    26     }
    27     if ($kid) {
    28         $downstream->enqueue(undef);
    29         $kid->join();
    30     }
    31 }
    32
    33 my $stream = Thread::Queue->new(3..1000, undef);
    34 check_num($stream, 2);

This program uses the pipeline model to generate prime numbers. Each
thread in the pipeline has an input queue that feeds numbers to be
checked, a prime number that it's responsible for, and an output queue
into which it funnels numbers that have failed the check. If the thread
has a number that's failed its check and there's no child thread, then
the thread must have found a new prime number. In that case, a new
child thread is created for that prime and stuck on the end of the
pipeline.

This probably sounds a bit more confusing than it really is, so let's
go through this program piece by piece and see what it does. (For
those of you who might be trying to remember exactly what a prime
number is, it's a number that's only evenly divisible by itself and 1.)

The bulk of the work is done by the C<check_num()> subroutine, which
takes a reference to its input queue and a prime number that it's
responsible for.
After pulling in the input queue and the prime that
the subroutine is checking (line 11), we create a new queue (line 13)
and reserve a scalar for the thread that we're likely to create later
(line 12).

The while loop from line 14 to line 26 grabs a scalar off the input
queue and checks it against the prime this thread is responsible
for. Line 15 checks to see if there's a remainder when we divide the
number to be checked by our prime. If there is one, the number
must not be evenly divisible by our prime, so we need to either pass
it on to the next thread if we've created one (line 17) or create a
new thread if we haven't.

The new thread creation is line 20. We pass on to it a reference to
the queue we've created, and the prime number we've found. In lines 21
through 24, we check to make sure that our new thread got created, and
if not, we stop checking any remaining numbers in the queue.

Finally, once the loop terminates (because we got a 0 or C<undef> in the
queue, which serves as a note to terminate), we pass on the notice to our
child, and wait for it to exit if we've created a child (lines 27 through
30).

Meanwhile, back in the main thread, we first create a queue (line 33) and
queue up all the numbers from 3 to 1000 for checking, plus a termination
notice. Then all we have to do to get the ball rolling is pass the queue
and the first prime to the C<check_num()> subroutine (line 34).

That's how it works. It's pretty simple; as with many Perl programs,
the explanation is much longer than the program.

=head1 Different implementations of threads

Here is some background on thread implementations from the operating
system viewpoint. There are three basic categories of threads: user-mode
threads, kernel threads, and multiprocessor kernel threads.

User-mode threads are threads that live entirely within a program and
its libraries.
In this model, the OS knows nothing about threads. As
far as it's concerned, your process is just a process.

This is the easiest way to implement threads, and the way most OSes
start. The big disadvantage is that, since the OS knows nothing about
threads, if one thread blocks they all do. Typical blocking activities
include most system calls, most I/O, and things like C<sleep()>.

Kernel threads are the next step in thread evolution. The OS knows
about kernel threads, and makes allowances for them. The main
difference between a kernel thread and a user-mode thread is
blocking. With kernel threads, things that block a single thread don't
block other threads. This is not the case with user-mode threads,
where the kernel blocks at the process level and not the thread level.

This is a big step forward, and can give a threaded program quite a
performance boost over non-threaded programs. Threads that block
performing I/O, for example, won't block threads that are doing other
things. Each process still has only one thread running at once,
though, regardless of how many CPUs a system might have.

Since kernel threading can interrupt a thread at any time, it will
uncover some of the implicit locking assumptions you may make in your
program. For example, something as simple as C<$a = $a + 2> can behave
unpredictably with kernel threads if C<$a> is visible to other
threads, as another thread may have changed C<$a> between the time it
was fetched on the right hand side and the time the new value is
stored.

Multiprocessor kernel threads are the final step in thread
support. With multiprocessor kernel threads on a machine with multiple
CPUs, the OS may schedule two or more threads to run simultaneously on
different CPUs.

This can give a serious performance boost to your threaded program,
since more than one thread will be executing at the same time.
As a
tradeoff, though, any of those nagging synchronization issues that
might not have shown with basic kernel threads will appear with a
vengeance.

In addition to the different levels of OS involvement in threads,
different OSes (and different thread implementations for a particular
OS) allocate CPU cycles to threads in different ways.

Cooperative multitasking systems have running threads give up control
if one of two things happens. If a thread calls a yield function, it
gives up control. It also gives up control if the thread does
something that would cause it to block, such as perform I/O. In a
cooperative multitasking implementation, one thread can starve all the
others for CPU time if it so chooses.

Preemptive multitasking systems interrupt threads at regular intervals
while the system decides which thread should run next. In a preemptive
multitasking system, one thread usually won't monopolize the CPU.

On some systems, there can be cooperative and preemptive threads
running simultaneously. (Threads running with realtime priorities
often behave cooperatively, for example, while threads running at
normal priorities behave preemptively.)

Most modern operating systems support preemptive multitasking nowadays.

=head1 Performance considerations

The main thing to bear in mind when comparing Perl's I<ithreads> to other threading
models is the fact that for each new thread created, a complete copy of
all the variables and data of the parent thread has to be taken. Thus,
thread creation can be quite expensive, both in terms of memory usage and
time spent in creation. The ideal way to reduce these costs is to have a
relatively small number of long-lived threads, all created fairly early
on (before the base thread has accumulated too much data). Of course, this
may not always be possible, so compromises have to be made.
However, after
a thread has been created, its performance and extra memory usage should
be little different from ordinary code.

Also note that under the current implementation, shared variables
use a little more memory and are a little slower than ordinary variables.

=head1 Process-scope Changes

Note that while threads themselves are separate execution threads and
Perl data is thread-private unless explicitly shared, the threads can
affect process-scope state, affecting all the threads.

The most common example of this is changing the current working
directory using C<chdir()>. One thread calls C<chdir()>, and the working
directory of all the threads changes.

An even more drastic example of a process-scope change is C<chroot()>:
the root directory of all the threads changes, and no thread can
undo it (as opposed to C<chdir()>).

Further examples of process-scope changes include C<umask()> and
changing uids and gids.

Thinking of mixing C<fork()> and threads? Please lie down and wait
until the feeling passes. Be aware that the semantics of C<fork()> vary
between platforms. For example, some Unix systems copy all the current
threads into the child process, while others only copy the thread that
called C<fork()>. You have been warned!

Similarly, mixing signals and threads may be problematic.
Implementations are platform-dependent, and even the POSIX
semantics may not be what you expect (and Perl doesn't even
give you the full POSIX API). For example, there is no way to
guarantee that a signal sent to a multi-threaded Perl application
will get intercepted by any particular thread. (However, a recently
added feature does provide the capability to send signals between
threads. See L<threads/"THREAD SIGNALLING"> for more details.)
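
The C<chdir()> case described above can be observed with a short sketch
like the following. It assumes a platform where the working directory
really is process-wide, as the text describes; results may differ where
that is not so. The C<cwd_seen_by_caller()> helper is a hypothetical name
used only for this illustration.

```perl
use strict;
use warnings;
use threads;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper: have a child thread chdir() to $target, then
# report the working directory the calling thread sees before and after.
sub cwd_seen_by_caller {
    my ($target) = @_;
    my $before = getcwd();

    # Only the child thread calls chdir(); the caller never does.
    threads->create(sub {
        chdir($target) or warn("chdir failed: $!\n");
    })->join();

    my $after = getcwd();

    # Put things back so the rest of the program is unaffected.
    chdir($before) or warn("Could not restore directory: $!\n");
    return ($before, $after);
}

my ($before, $after) = cwd_seen_by_caller(File::Spec->rootdir());
print("Caller started in:  $before\n");
print("Caller ended up in: $after\n");
```

Because the change leaks into every thread, code that depends on the
working directory is usually safer using absolute paths than relying on
a C<chdir()> made elsewhere.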

=head1 Thread-Safety of System Libraries

Whether various library calls are thread-safe is outside the control
of Perl. Calls that often suffer from not being thread-safe include
C<localtime()>, C<gmtime()>, functions fetching user, group and
network information (such as C<getgrent()>, C<gethostent()>,
C<getnetent()> and so on), C<readdir()>, C<rand()>, and C<srand()>. In
general, this applies to calls that depend on some global external state.

If the system on which Perl was compiled has thread-safe variants of such
calls, they will be used. Beyond that, Perl is at the mercy of
the thread-safety or -unsafety of the calls. Please consult your
C library call documentation.

On some platforms the thread-safe library interfaces may fail if the
result buffer is too small (for example the user group databases may
be rather large, and the reentrant interfaces may have to carry around
a full snapshot of those databases). Perl will start with a small
buffer, but keep retrying and growing the result buffer
until the result fits. If this limitless growing sounds bad for
security or memory consumption reasons, you can recompile Perl with
C<PERL_REENTRANT_MAXSIZE> defined to the maximum number of bytes you will
allow.

=head1 Conclusion

A complete thread tutorial could fill a book (and has, many times),
but with what we've covered in this introduction, you should be well
on your way to becoming a threaded Perl expert.

=head1 SEE ALSO

Annotated POD for L<threads>:
L<http://annocpan.org/?mode=search&field=Module&name=threads>

Latest version of L<threads> on CPAN:
L<http://search.cpan.org/search?module=threads>

Annotated POD for L<threads::shared>:
L<http://annocpan.org/?mode=search&field=Module&name=threads%3A%3Ashared>

Latest version of L<threads::shared> on CPAN:
L<http://search.cpan.org/search?module=threads%3A%3Ashared>

Perl threads mailing list:
L<http://lists.cpan.org/showlist.cgi?name=iThreads>

=head1 Bibliography

Here's a short bibliography courtesy of Jürgen Christoffel:

=head2 Introductory Texts

Birrell, Andrew D. An Introduction to Programming with
Threads. Digital Equipment Corporation, 1989, DEC-SRC Research Report
#35 online as
ftp://ftp.dec.com/pub/DEC/SRC/research-reports/SRC-035.pdf
(highly recommended)

Robbins, Kay A., and Steven Robbins. Practical Unix Programming: A
Guide to Concurrency, Communication, and
Multithreading. Prentice-Hall, 1996.

Lewis, Bill, and Daniel J. Berg. Multithreaded Programming with
Pthreads. Prentice Hall, 1997, ISBN 0-13-443698-9 (a well-written
introduction to threads).

Nelson, Greg (editor). Systems Programming with Modula-3. Prentice
Hall, 1991, ISBN 0-13-590464-1.

Nichols, Bradford, Dick Buttlar, and Jacqueline Proulx Farrell.
Pthreads Programming. O'Reilly & Associates, 1996, ISBN 156592-115-1
(covers POSIX threads).

=head2 OS-Related References

Boykin, Joseph, David Kirschen, Alan Langerman, and Susan
LoVerso. Programming under Mach. Addison-Wesley, 1994, ISBN
0-201-52739-1.

Tanenbaum, Andrew S. Distributed Operating Systems. Prentice Hall,
1995, ISBN 0-13-219908-4 (great textbook).

Silberschatz, Abraham, and Peter B. Galvin. Operating System Concepts,
4th ed.
Addison-Wesley, 1995, ISBN 0-201-59292-4

=head2 Other References

Arnold, Ken and James Gosling. The Java Programming Language, 2nd
ed. Addison-Wesley, 1998, ISBN 0-201-31006-6.

comp.programming.threads FAQ,
L<http://www.serpentine.com/~bos/threads-faq/>

Le Sergent, T. and B. Berthomieu. "Incremental MultiThreaded Garbage
Collection on Virtually Shared Memory Architectures" in Memory
Management: Proc. of the International Workshop IWMM 92, St. Malo,
France, September 1992, Yves Bekkers and Jacques Cohen, eds. Springer,
1992, ISBN 3540-55940-X (real-life thread applications).

Artur Bergman, "Where Wizards Fear To Tread", June 11, 2002,
L<http://www.perl.com/pub/a/2002/06/11/threads.html>

=head1 Acknowledgements

Thanks (in no particular order) to Chaim Frenkel, Steve Fink, Gurusamy
Sarathy, Ilya Zakharevich, Benjamin Sugars, Jürgen Christoffel, Joshua
Pritikin, and Alan Burlison, for their help in reality-checking and
polishing this article. Big thanks to Tom Christiansen for his rewrite
of the prime number generator.

=head1 AUTHOR

Dan Sugalski E<lt>dan@sidhe.orgE<gt>

Slightly modified by Artur Bergman to fit the new thread model/module.

Reworked slightly by Jörg Walter E<lt>jwalt@cpan.orgE<gt> to be more concise
about thread-safety of Perl code.

Rearranged slightly by Elizabeth Mattijsen E<lt>liz@dijkmat.nlE<gt> to put
less emphasis on yield().

=head1 Copyrights

The original version of this article appeared in The Perl
Journal #10, and is copyright 1998 The Perl Journal. It appears courtesy
of Jon Orwant and The Perl Journal. This document may be distributed
under the same terms as Perl itself.

=cut