=head1 NAME

perlsec - Perl security

=head1 DESCRIPTION

Perl is designed to make it easy to program securely even when running
with extra privileges, like setuid or setgid programs. Unlike most
command line shells, which are based on multiple substitution passes on
each line of the script, Perl uses a more conventional evaluation scheme
with fewer hidden snags. Additionally, because the language has more
builtin functionality, it can rely less upon external (and possibly
untrustworthy) programs to accomplish its purposes.

=head1 SECURITY VULNERABILITY CONTACT INFORMATION

If you believe you have found a security vulnerability in Perl, please email
perl5-security-report@perl.org with details. This points to a closed
subscription, unarchived mailing list. Please only use this address for
security issues in the Perl core, not for modules independently distributed on
CPAN.

=head1 SECURITY MECHANISMS AND CONCERNS

=head2 Taint mode

Perl automatically enables a set of special security checks, called I<taint
mode>, when it detects its program running with differing real and effective
user or group IDs. The setuid bit in Unix permissions is mode 04000, the
setgid bit mode 02000; either or both may be set. You can also enable taint
mode explicitly by using the B<-T> command line flag. This flag is
I<strongly> suggested for server programs and any program run on behalf of
someone else, such as a CGI script. Once taint mode is on, it's on for
the remainder of your script.

While in this mode, Perl takes special precautions called I<taint
checks> to prevent both obvious and subtle traps. Some of these checks
are reasonably simple, such as verifying that path directories aren't
writable by others; careful programmers have always used checks like
these.
Other checks, however, are best supported by the language itself,
and it is these checks especially that contribute to making a set-id Perl
program more secure than the corresponding C program.

You may not use data derived from outside your program to affect
something else outside your program--at least, not by accident. All
command line arguments, environment variables, locale information (see
L<perllocale>), results of certain system calls (C<readdir()>,
C<readlink()>, the variable of C<shmread()>, the messages returned by
C<msgrcv()>, the password, gcos and shell fields returned by the
C<getpwxxx()> calls), and all file input are marked as "tainted".
Tainted data may not be used directly or indirectly in any command
that invokes a sub-shell, nor in any command that modifies files,
directories, or processes, B<with the following exceptions>:

=over 4

=item *

Arguments to C<print> and C<syswrite> are B<not> checked for taintedness.

=item *

Symbolic methods

    $obj->$method(@args);

and symbolic sub references

    &{$foo}(@args);
    $foo->(@args);

are not checked for taintedness. This requires extra carefulness
unless you want external data to affect your control flow. Unless
you carefully limit what these symbolic values are, people are able
to call functions B<outside> your Perl code, such as POSIX::system,
in which case they are able to run arbitrary external code.

=item *

Hash keys are B<never> tainted.

=back

For efficiency reasons, Perl takes a conservative view of
whether data is tainted. If an expression contains tainted data,
any subexpression may be considered tainted, even if the value
of the subexpression is not itself affected by the tainted data.

Because taintedness is associated with each scalar value, some
elements of an array or hash can be tainted and others not.
The keys of a hash are B<never> tainted.
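
The per-scalar rule above can be sketched as follows. This is a
hypothetical script, not part of the Perl distribution: it assumes it is
started as C<perl -T script.pl some-argument>, it uses C<tainted()> from
Scalar::Util to inspect individual values, and the sub name
C<taint_report> is invented for the illustration. Without B<-T>, the
variable C<${^TAINT}> is 0 and every value is reported as clean.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: report which values carry the taint flag.
sub taint_report {
    my ($arg) = @_;

    my @list;
    push @list, 'literal';    # a constant from the program text
    push @list, $arg;         # a copy that keeps whatever taint $arg has

    my %seen;
    $seen{$arg} = 1;          # the value is used as a hash key
    my ($key) = keys %seen;   # keys come back untainted

    my $literal_tainted = tainted($list[0]) ? 1 : 0;
    my $element_tainted = tainted($list[1]) ? 1 : 0;
    my $key_tainted     = tainted($key)     ? 1 : 0;

    return {
        literal => $literal_tainted,
        element => $element_tainted,
        key     => $key_tainted,
        key_val => $key,
    };
}

# Under -T, a value taken from @ARGV is tainted; the fallback string is not.
my $arg    = @ARGV ? $ARGV[0] : 'example';
my $report = taint_report($arg);

print "taint mode: ", (${^TAINT} ? "on" : "off"), "\n";
printf "%-8s %s\n", $_, $report->{$_} ? "tainted" : "clean"
    for qw(literal element key);
```

When run with B<-T> and an argument, the array element copied from the
argument should be reported as tainted, while the literal element and
the hash key should be reported as clean.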

For example:

    $arg = shift;               # $arg is tainted
    $hid = $arg . 'bar';        # $hid is also tainted
    $line = <>;                 # Tainted
    $line = <STDIN>;            # Also tainted
    open FOO, "/home/me/bar" or die $!;
    $line = <FOO>;              # Still tainted
    $path = $ENV{'PATH'};       # Tainted, but see below
    $data = 'abc';              # Not tainted

    system "echo $arg";         # Insecure
    system "/bin/echo", $arg;   # Considered insecure
                                # (Perl doesn't know about /bin/echo)
    system "echo $hid";         # Insecure
    system "echo $data";        # Insecure until PATH set

    $path = $ENV{'PATH'};       # $path now tainted

    $ENV{'PATH'} = '/bin:/usr/bin';
    delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

    $path = $ENV{'PATH'};       # $path now NOT tainted
    system "echo $data";        # Is secure now!

    open(FOO, "< $arg");        # OK - read-only file
    open(FOO, "> $arg");        # Not OK - trying to write

    open(FOO,"echo $arg|");     # Not OK
    open(FOO,"-|")
        or exec 'echo', $arg;   # Also not OK

    $shout = `echo $arg`;       # Insecure, $shout now tainted

    unlink $data, $arg;         # Insecure
    umask $arg;                 # Insecure

    exec "echo $arg";           # Insecure
    exec "echo", $arg;          # Insecure
    exec "sh", '-c', $arg;      # Very insecure!

    @files = <*.c>;             # insecure (uses readdir() or similar)
    @files = glob('*.c');       # insecure (uses readdir() or similar)

    # In either case, the results of glob are tainted, since the list of
    # filenames comes from outside of the program.

    $bad = ($arg, 23);          # $bad will be tainted
    $arg, `true`;               # Insecure (although it isn't really)

If you try to do something insecure, you will get a fatal error saying
something like "Insecure dependency" or "Insecure $ENV{PATH}".

The exception to the principle of "one tainted value taints the whole
expression" is with the ternary conditional operator C<?:>. Since code
with a ternary conditional

    $result = $tainted_value ? "Untainted" : "Also untainted";

is effectively

    if ( $tainted_value ) {
        $result = "Untainted";
    } else {
        $result = "Also untainted";
    }

it doesn't make sense for C<$result> to be tainted.

=head2 Laundering and Detecting Tainted Data

To test whether a variable contains tainted data, and whose use would
thus trigger an "Insecure dependency" message, you can use the
C<tainted()> function of the Scalar::Util module, available in your
nearby CPAN mirror, and included in Perl starting from the release 5.8.0.
Or you may be able to use the following C<is_tainted()> function.

    sub is_tainted {
        local $@;   # Don't pollute caller's value.
        return ! eval { eval("#" . substr(join("", @_), 0, 0)); 1 };
    }

This function makes use of the fact that the presence of tainted data
anywhere within an expression renders the entire expression tainted. It
would be inefficient for every operator to test every argument for
taintedness. Instead, the slightly more efficient and conservative
approach is used that if any tainted value has been accessed within the
same expression, the whole expression is considered tainted.

But testing for taintedness gets you only so far. Sometimes you just
have to clear your data's taintedness. Values may be untainted by using them
as keys in a hash; otherwise the only way to bypass the tainting
mechanism is by referencing subpatterns from a regular expression match.
Perl presumes that if you reference a substring using $1, $2, etc., that
you knew what you were doing when you wrote the pattern. That means using
a bit of thought--don't just blindly untaint anything, or you defeat the
entire mechanism. It's better to verify that the variable has only good
characters (for certain values of "good") rather than checking whether it
has any bad characters.
That's because it's far too easy to miss bad
characters that you never thought of.

Here's a test to make sure that the data contains nothing but "word"
characters (alphabetics, numerics, and underscores), a hyphen, an at sign,
or a dot.

    if ($data =~ /^([-\@\w.]+)$/) {
        $data = $1;                     # $data now untainted
    } else {
        die "Bad data in '$data'";      # log this somewhere
    }

This is fairly secure because C</\w+/> doesn't normally match shell
metacharacters, nor are dot, dash, or at going to mean something special
to the shell. Use of C</.+/> would have been insecure in theory because
it lets everything through, but Perl doesn't check for that. The lesson
is that when untainting, you must be exceedingly careful with your patterns.
Laundering data using a regular expression is the I<only> mechanism for
untainting dirty data, unless you use the strategy detailed below to fork
a child of lesser privilege.

The example does not untaint C<$data> if C<use locale> is in effect,
because the characters matched by C<\w> are determined by the locale.
Perl considers that locale definitions are untrustworthy because they
contain data from outside the program. If you are writing a
locale-aware program, and want to launder data with a regular expression
containing C<\w>, put C<no locale> ahead of the expression in the same
block. See L<perllocale/SECURITY> for further discussion and examples.

=head2 Switches On the "#!" Line

When you make a script executable, in order to make it usable as a
command, the system will pass switches to perl from the script's #!
line. Perl checks that any command line switches given to a setuid
(or setgid) script actually match the ones set on the #! line. Some
Unix and Unix-like environments impose a one-switch limit on the #!
line, so you may need to use something like C<-wU> instead of C<-w -U>
under such systems.
(This issue should arise only in Unix or
Unix-like environments that support #! and setuid or setgid scripts.)

=head2 Taint mode and @INC

When the taint mode (C<-T>) is in effect, the "." directory is removed
from C<@INC>, and the environment variables C<PERL5LIB> and C<PERLLIB>
are ignored by Perl. You can still adjust C<@INC> from outside the
program by using the C<-I> command line option as explained in
L<perlrun>. The two environment variables are ignored because
they are obscured, and a user running a program could be unaware that
they are set, whereas the C<-I> option is clearly visible and
therefore permitted.

Another way to modify C<@INC> without modifying the program is to use
the C<lib> pragma, e.g.:

    perl -Mlib=/foo program

The benefit of using C<-Mlib=/foo> over C<-I/foo> is that the former
will automagically remove any duplicated directories, while the latter
will not.

Note that if a tainted string is added to C<@INC>, the following
problem will be reported:

    Insecure dependency in require while running with -T switch

=head2 Cleaning Up Your Path

For "Insecure C<$ENV{PATH}>" messages, you need to set C<$ENV{'PATH'}> to
a known value, and each directory in the path must be absolute and
non-writable by others than its owner and group. You may be surprised to
get this message even if the pathname to your executable is fully
qualified. This is I<not> generated because you didn't supply a full path
to the program; instead, it's generated because you never set your PATH
environment variable, or you didn't set it to something that was safe.
Because Perl can't guarantee that the executable in question isn't itself
going to turn around and execute some other program that is dependent on
your PATH, it makes sure you set the PATH.

The PATH isn't the only environment variable which can cause problems.
Because some shells may use the variables IFS, CDPATH, ENV, and
BASH_ENV, Perl checks that those are either empty or untainted when
starting subprocesses. You may wish to add something like this to your
set-id and taint-checking scripts.

    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer

It's also possible to get into trouble with other operations that don't
care whether they use tainted values. Make judicious use of the file
tests in dealing with any user-supplied filenames. When possible, do
opens and such B<after> properly dropping any special user (or group!)
privileges. Perl doesn't prevent you from opening tainted filenames for reading,
so be careful what you print out. The tainting mechanism is intended to
prevent stupid mistakes, not to remove the need for thought.

Perl does not call the shell to expand wild cards when you pass C<system>
and C<exec> explicit parameter lists instead of strings with possible shell
wildcards in them. Unfortunately, the C<open>, C<glob>, and
backtick functions provide no such alternate calling convention, so more
subterfuge will be required.

Perl provides a reasonably safe way to open a file or pipe from a setuid
or setgid program: just create a child process with reduced privilege that
does the dirty work for you. First, fork a child using the special
C<open> syntax that connects the parent and child by a pipe. Now the
child resets its ID set and any other per-process attributes, like
environment variables, umasks, current working directories, back to the
originals or known safe values. Then the child process, which no longer
has any special permissions, does the C<open> or other system call.
Finally, the child passes the data it managed to access back to the
parent.
Because the file or pipe was opened in the child while running
under less privilege than the parent, it's not apt to be tricked into
doing something it shouldn't.

Here's a way to do backticks reasonably safely. Notice how the C<exec> is
not called with a string that the shell could expand. This is by far the
best way to call something that might be subjected to shell escapes: just
never call the shell at all.

    use English '-no_match_vars';
    die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
    if ($pid) {           # parent
        while (<KID>) {
            # do something
        }
        close KID;
    } else {
        my @temp     = ($EUID, $EGID);
        my $orig_uid = $UID;
        my $orig_gid = $GID;
        $EUID = $UID;
        $EGID = $GID;
        # Drop privileges
        $UID  = $orig_uid;
        $GID  = $orig_gid;
        # Make sure privs are really gone
        ($EUID, $EGID) = @temp;
        die "Can't drop privileges"
            unless $UID == $EUID  && $GID eq $EGID;
        $ENV{PATH} = "/bin:/usr/bin"; # Minimal PATH.
        # Consider sanitizing the environment even more.
        exec 'myprog', 'arg1', 'arg2'
            or die "can't exec myprog: $!";
    }

A similar strategy would work for wildcard expansion via C<glob>, although
you can use C<readdir> instead.

Taint checking is most useful when, although you trust yourself not to have
written a program that gives away the farm, you don't necessarily trust those
who end up using it not to try to trick it into doing something bad. This
is the kind of security checking that's useful for set-id programs and
programs launched on someone else's behalf, like CGI programs.

This is quite different, however, from not even trusting the writer of the
code not to try to do something evil. That's the kind of trust needed
when someone hands you a program you've never seen before and says, "Here,
run this." For that kind of safety, you might want to check out the Safe
module, included standard in the Perl distribution.
This module allows the
programmer to set up special compartments in which all system operations
are trapped and namespace access is carefully controlled. Safe should
not be considered bullet-proof, though: it will not prevent the foreign
code from setting up infinite loops, allocating gigabytes of memory, or even
abusing perl bugs to make the host interpreter crash or behave in
unpredictable ways. In any case it's better avoided completely if you're
really concerned about security.

=head2 Security Bugs

Beyond the obvious problems that stem from giving special privileges to
systems as flexible as scripts, on many versions of Unix, set-id scripts
are inherently insecure right from the start. The problem is a race
condition in the kernel. Between the time the kernel opens the file to
see which interpreter to run and when the (now-set-id) interpreter turns
around and reopens the file to interpret it, the file in question may have
changed, especially if you have symbolic links on your system.

Fortunately, sometimes this kernel "feature" can be disabled.
Unfortunately, there are two ways to disable it. The system can simply
outlaw scripts with any set-id bit set, which doesn't help much.
Alternatively, it can simply ignore the set-id bits on scripts.

However, if the kernel set-id script feature isn't disabled, Perl will
complain loudly that your set-id script is insecure. You'll need to
either disable the kernel set-id script feature, or put a C wrapper around
the script. A C wrapper is just a compiled program that does nothing
except call your Perl program. Compiled programs are not subject to the
kernel bug that plagues set-id scripts.
Here's a simple wrapper, written
in C:

    #include <unistd.h>

    #define REAL_PATH "/path/to/script"

    int main(int ac, char **av)
    {
        execv(REAL_PATH, av);
        return 1;    /* reached only if execv() fails */
    }

Compile this wrapper into a binary executable and then make I<it> rather
than your script setuid or setgid.

In recent years, vendors have begun to supply systems free of this
inherent security bug. On such systems, when the kernel passes the name
of the set-id script to open to the interpreter, rather than using a
pathname subject to meddling, it instead passes I</dev/fd/3>. This is a
special file already opened on the script, so that there can be no race
condition for evil scripts to exploit. On these systems, Perl should be
compiled with C<-DSETUID_SCRIPTS_ARE_SECURE_NOW>. The F<Configure>
program that builds Perl tries to figure this out for itself, so you
should never have to specify this yourself. Most modern releases of
SysVr4 and BSD 4.4 use this approach to avoid the kernel race condition.

=head2 Protecting Your Programs

There are a number of ways to hide the source to your Perl programs,
with varying levels of "security".

First of all, however, you I<can't> take away read permission, because
the source code has to be readable in order to be compiled and
interpreted. (That doesn't mean that a CGI script's source is
readable by people on the web, though.) So you have to leave the
permissions at the socially friendly 0755 level. This lets
only people on your local system see your source.

Some people mistakenly regard this as a security problem. If your program does
insecure things, and relies on people not knowing how to exploit those
insecurities, it is not secure. It is often possible for someone to
determine the insecure things and exploit them without viewing the
source. Security through obscurity, the name for hiding your bugs
instead of fixing them, is little security indeed.

You can try using encryption via source filters (Filter::* from CPAN,
or Filter::Util::Call and Filter::Simple since Perl 5.8).
But crackers might be able to decrypt it. You can try using the byte
code compiler and interpreter described below, but crackers might be
able to de-compile it. You can try using the native-code compiler
described below, but crackers might be able to disassemble it. These
pose varying degrees of difficulty to people wanting to get at your
code, but none can definitively conceal it (this is true of every
language, not just Perl).

If you're concerned about people profiting from your code, then the
bottom line is that nothing but a restrictive license will give you
legal security. License your software and pepper it with threatening
statements like "This is unpublished proprietary software of XYZ Corp.
Your access to it does not give you permission to use it blah blah
blah." You should see a lawyer to be sure your license's wording will
stand up in court.

=head2 Unicode

Unicode is a new and complex technology and one may easily overlook
certain security pitfalls. See L<perluniintro> for an overview and
L<perlunicode> for details, and L<perlunicode/"Security Implications
of Unicode"> for security implications in particular.

=head2 Algorithmic Complexity Attacks

Certain internal algorithms used in the implementation of Perl can
be attacked by choosing the input carefully to consume large amounts
of either time or space or both. This can lead to so-called
I<Denial of Service> (DoS) attacks.

=over 4

=item *

Hash Algorithm - Hash algorithms like the one used in Perl are well
known to be vulnerable to collision attacks on their hash function.
Such attacks involve constructing a set of keys which collide into
the same bucket producing inefficient behavior.
Such attacks often
depend on discovering the seed of the hash function used to map the
keys to buckets. That seed is then used to brute-force a key set which
can be used to mount a denial of service attack. In Perl 5.8.1 changes
were introduced to harden Perl against such attacks, and then later in
Perl 5.18.0 these features were enhanced and additional protections
added.

At the time of this writing, Perl 5.18.0 is considered to be
well-hardened against algorithmic complexity attacks on its hash
implementation. This is largely owing to the following measures,
which mitigate attacks:

=over 4

=item Hash Seed Randomization

In order to make it impossible to know what seed to generate an attack
key set for, this seed is randomly initialized at process start. This
may be overridden by using the PERL_HASH_SEED environment variable, see
L<perlrun/PERL_HASH_SEED>. This environment variable controls how
items are actually stored, not how they are presented via
C<keys>, C<values> and C<each>.

=item Hash Traversal Randomization

Independent of which seed is used in the hash function, C<keys>,
C<values>, and C<each> return items in a per-hash randomized order.
Modifying a hash by insertion will change the iteration order of that hash.
This behavior can be overridden by using C<hash_traversal_mask()> from
L<Hash::Util> or by using the PERL_PERTURB_KEYS environment variable,
see L<perlrun/PERL_PERTURB_KEYS>. Note that this feature controls the
"visible" order of the keys, and not the actual order they are stored in.

=item Bucket Order Perturbance

When items collide into a given hash bucket the order they are stored in
the chain is no longer predictable in Perl 5.18. This is intended
to make it harder to observe collisions. This behavior can be overridden by using
the PERL_PERTURB_KEYS environment variable, see L<perlrun/PERL_PERTURB_KEYS>.

=item New Default Hash Function

The default hash function has been modified with the intention of making
it harder to infer the hash seed.

=item Alternative Hash Functions

The source code includes multiple hash algorithms to choose from. While we
believe that the default perl hash is robust to attack, we have included the
hash function Siphash as a fall-back option. At the time of release of
Perl 5.18.0 Siphash is believed to be of cryptographic strength. This is
not the default as it is much slower than the default hash.

=back

Without compiling a special Perl, there is no way to get the exact same
behavior of any versions prior to Perl 5.18.0. The closest one can get
is by setting PERL_PERTURB_KEYS to 0 and setting the PERL_HASH_SEED
to a known value. We do not advise those settings for production use
due to the above security considerations.

B<Perl has never guaranteed any ordering of the hash keys>, and
the ordering has already changed several times during the lifetime of
Perl 5. Also, the ordering of hash keys has always been, and continues
to be, affected by the insertion order and the history of changes made
to the hash over its lifetime.

Also note that while the order of the hash elements might be
randomized, this "pseudo-ordering" should B<not> be used for
applications like shuffling a list randomly (use C<List::Util::shuffle()>
for that, see L<List::Util>, a standard core module since Perl 5.8.0;
or the CPAN module C<Algorithm::Numerical::Shuffle>), or for generating
permutations (use e.g. the CPAN modules C<Algorithm::Permute> or
C<Algorithm::FastPermute>), or for any cryptographic applications.
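
To illustrate the advice above, here is a small sketch (the variable
names and data are made up for the example). It shuffles a list with
C<List::Util::shuffle()> instead of leaning on hash iteration order, and
sorts the keys wherever a stable order is needed. It is assumed here
that C<shuffle()> draws on Perl's ordinary pseudo-random number
generator, so the result is not suitable for cryptographic purposes
either.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Shuffle explicitly rather than relying on hash ordering.
my @deck     = (1 .. 10);
my @shuffled = shuffle(@deck);   # same elements, random order
print "shuffled: @shuffled\n";

# Hypothetical data; the order returned by keys() is unspecified,
# so sort whenever the output has to be reproducible.
my %stock   = (apple => 3, banana => 5, cherry => 7);
my @ordered = sort keys %stock;
print "ordered:  @ordered\n";
```

Sorting costs a little time, but it keeps tests and reports independent
of the hash seed and of the hash's insertion history.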

=item *

Regular expressions - Perl's regular expression engine is a so-called NFA
(Non-deterministic Finite Automaton), which among other things means that
it can rather easily consume large amounts of both time and space if the
regular expression may match in several ways. Careful crafting of the
regular expressions can help but quite often there really isn't much
one can do (the book "Mastering Regular Expressions" is required
reading, see L<perlfaq2>). Running out of space manifests itself by
Perl running out of memory.

=item *

Sorting - the quicksort algorithm used in Perls before 5.8.0 to
implement the sort() function is very easy to trick into misbehaving
so that it consumes a lot of time. Starting from Perl 5.8.0 a different
sorting algorithm, mergesort, is used by default. Mergesort cannot
misbehave on any input.

=back

See L<http://www.cs.rice.edu/~scrosby/hash/> for more information,
and any computer science textbook on algorithmic complexity.

=head1 SEE ALSO

L<perlrun> for its description of cleaning up environment variables.