=head1 NAME

perlreftut - Mark's very short tutorial about references

=head1 DESCRIPTION

One of the most important new features in Perl 5 was the capability to
manage complicated data structures like multidimensional arrays and
nested hashes.  To enable these, Perl 5 introduced a feature called
`references', and using references is the key to managing complicated,
structured data in Perl.  Unfortunately, there's a lot of funny syntax
to learn, and the main manual page can be hard to follow.  The manual
is quite complete, and sometimes people find that a problem, because
it can be hard to tell what is important and what isn't.

Fortunately, you only need to know 10% of what's in the main page to get
90% of the benefit.  This page will show you that 10%.

=head1 Who Needs Complicated Data Structures?

One problem that came up all the time in Perl 4 was how to represent a
hash whose values were lists.  Perl 4 had hashes, of course, but the
values had to be scalars; they couldn't be lists.

Why would you want a hash of lists?  Let's take a simple example: You
have a file of city and country names, like this:

    Chicago, USA
    Frankfurt, Germany
    Berlin, Germany
    Washington, USA
    Helsinki, Finland
    New York, USA

and you want to produce an output like this, with each country mentioned
once, and then an alphabetical list of the cities in that country:

    Finland: Helsinki.
    Germany: Berlin, Frankfurt.
    USA: Chicago, New York, Washington.

The natural way to do this is to have a hash whose keys are country
names.  Associated with each country name key is a list of the cities in
that country.  Each time you read a line of input, split it into a country
and a city, look up the list of cities already known to be in that
country, and append the new city to the list.  When you're done reading
the input, iterate over the hash as usual, sorting each list of cities
before you print it out.

If hash values can't be lists, you lose.  In Perl 4, hash values can't
be lists; they can only be strings.  You lose.  You'd probably have to
combine all the cities into a single string somehow, and then when
time came to write the output, you'd have to break the string into a
list, sort the list, and turn it back into a string.  This is messy
and error-prone.  And it's frustrating, because Perl already has
perfectly good lists that would solve the problem if only you could
use them.

=head1 The Solution

By the time Perl 5 rolled around, we were already stuck with this
design: Hash values must be scalars.  The solution to this is
references.

A reference is a scalar value that I<refers to> an entire array or an
entire hash (or to just about anything else).  Names are one kind of
reference that you're already familiar with.  Think of the President:
a messy, inconvenient bag of blood and bones.  But to talk about him,
or to represent him in a computer program, all you need is the easy,
convenient scalar string "Bill Clinton".

References in Perl are like names for arrays and hashes.  They're
Perl's private, internal names, so you can be sure they're
unambiguous.  Unlike "Bill Clinton", a reference only refers to one
thing, and you always know what it refers to.  If you have a reference
to an array, you can recover the entire array from it.  If you have a
reference to a hash, you can recover the entire hash.  But the
reference is still an easy, compact scalar value.

You can't have a hash whose values are arrays; hash values can only be
scalars.  We're stuck with that.  But a single reference can refer to
an entire array, and references are scalars, so you can have a hash of
references to arrays, and it'll act a lot like a hash of arrays, and
it'll be just as useful as a hash of arrays.

We'll come back to this city-country problem later, after we've seen
some syntax for managing references.
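
As a rough preview of the idea, here is a minimal sketch of a hash
whose single value is a reference to an array.  It relies on syntax
that the next section explains, and the names C<@german_cities> and
C<%table> are invented purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @german_cities = ('Frankfurt', 'Berlin');

# A hash value must be a scalar.  A reference is a scalar, so a
# reference to the array can be stored as the hash value.
my %table = (Germany => \@german_cities);

# The entire array can be recovered from the reference.
my @recovered = @{ $table{Germany} };

print "Germany: ", join(', ', sort @recovered), ".\n";
```

The hash value itself is still just one scalar; the list of cities is
reached through it.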

=head1 Syntax

There are just two ways to make a reference, and just two ways to use
it once you have it.

=head2 Making References

B<Make Rule 1>

If you put a C<\> in front of a variable, you get a
reference to that variable.

    $aref = \@array;         # $aref now holds a reference to @array
    $href = \%hash;          # $href now holds a reference to %hash

Once the reference is stored in a variable like $aref or $href, you
can copy it or store it just the same as any other scalar value:

    $xy = $aref;             # $xy now holds a reference to @array
    $p[3] = $href;           # $p[3] now holds a reference to %hash
    $z = $p[3];              # $z now holds a reference to %hash

These examples show how to make references to variables with names.
Sometimes you want to make an array or a hash that doesn't have a
name.  This is analogous to the way you like to be able to use the
string C<"\n"> or the number 80 without having to store it in a named
variable first.

B<Make Rule 2>

C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
that array.  C<{ ITEMS }> makes a new, anonymous hash, and returns a
reference to that hash.

    $aref = [ 1, "foo", undef, 13 ];
    # $aref now holds a reference to an array

    $href = { APR => 4, AUG => 8 };
    # $href now holds a reference to a hash

The references you get from rule 2 are the same kind of
references that you get from rule 1:

    # This:
    $aref = [ 1, 2, 3 ];

    # Does the same as this:
    @array = (1, 2, 3);
    $aref = \@array;

The first line is an abbreviation for the following two lines, except
that it doesn't create the superfluous array variable C<@array>.

=head2 Using References

What can you do with a reference once you have it?  It's a scalar
value, and we've seen that you can store it as a scalar and get it back
again just like any scalar.
There are just two more ways to use it:

B<Use Rule 1>

If C<$aref> contains a reference to an array, then you
can put C<{$aref}> anywhere you would normally put the name of an
array.  For example, C<@{$aref}> instead of C<@array>.

Here are some examples of that:

Arrays:

    @a              @{$aref}                An array
    reverse @a      reverse @{$aref}        Reverse the array
    $a[3]           ${$aref}[3]             An element of the array
    $a[3] = 17;     ${$aref}[3] = 17        Assigning an element

On each line are two expressions that do the same thing.  The
left-hand versions operate on the array C<@a>, and the right-hand
versions operate on the array that is referred to by C<$aref>, but
once they find the array they're operating on, they do the same things
to the arrays.

Using a hash reference is I<exactly> the same:

    %h              %{$href}                A hash
    keys %h         keys %{$href}           Get the keys from the hash
    $h{'red'}       ${$href}{'red'}         An element of the hash
    $h{'red'} = 17  ${$href}{'red'} = 17    Assigning an element

B<Use Rule 2>

C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
instead.

C<${$href}{red}> is too hard to read, so you can write
C<< $href->{red} >> instead.

Most often, when you have an array or a hash, you want to get or set a
single element from it.  C<${$aref}[3]> and C<${$href}{'red'}> have
too much punctuation, and Perl lets you abbreviate.

If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
the fourth element of the array.  Don't confuse this with C<$aref[3]>,
which is the fourth element of a totally different array, one
deceptively named C<@aref>.  C<$aref> and C<@aref> are unrelated the
same way that C<$item> and C<@item> are.

Similarly, C<< $href->{'red'} >> is part of the hash referred to by
the scalar variable C<$href>, perhaps even one with no name.
C<$href{'red'}> is part of the deceptively named C<%href> hash.
It's easy to leave out the C<< -> >>, and if you do, you'll get
bizarre results when your program gets array and hash elements out of
totally unexpected hashes and arrays that weren't the ones you wanted
to use.

=head1 An Example

Let's see a quick example of how all this is useful.

First, remember that C<[1, 2, 3]> makes an anonymous array containing
C<(1, 2, 3)>, and gives you a reference to that array.

Now think about

    @a = ( [1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]
         );

C<@a> is an array with three elements, and each one is a reference to
another array.

C<$a[1]> is one of these references.  It refers to an array, the array
containing C<(4, 5, 6)>, and because it is a reference to an array,
B<USE RULE 2> says that we can write C<< $a[1]->[2] >> to get the
third element from that array.  C<< $a[1]->[2] >> is the 6.
Similarly, C<< $a[0]->[1] >> is the 2.  What we have here is like a
two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get
or set the element in any row and any column of the array.

The notation still looks a little cumbersome, so there's one more
abbreviation:

=head1 Arrow Rule

In between two B<subscripts>, the arrow is optional.

Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
same thing.  Instead of C<< $a[0]->[1] >>, we can write C<$a[0][1]>;
it means the same thing.

Now it really looks like two-dimensional arrays!

You can see why the arrows are important.  Without them, we would have
had to write C<${$a[1]}[2]> instead of C<$a[1][2]>.  For
three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
the unreadable C<${${$x[2]}[3]}[5]>.

=head1 Solution

Here's the answer to the problem I posed earlier, of reformatting a
file of city and country names.

    1   while (<>) {
    2     chomp;
    3     my ($city, $country) = split /, /;
    4     push @{$table{$country}}, $city;
    5   }
    6
    7   foreach $country (sort keys %table) {
    8     print "$country: ";
    9     my @cities = @{$table{$country}};
   10     print join ', ', sort @cities;
   11     print ".\n";
   12   }

The program has two pieces: Lines 1--5 read the input and build a
data structure, and lines 7--12 analyze the data and print out the
report.

In the first part, line 4 is the important one.  We're going to have a
hash, C<%table>, whose keys are country names, and whose values are
(references to) arrays of city names.  After acquiring a city and
country name, the program looks up C<$table{$country}>, which holds (a
reference to) the list of cities seen in that country so far.  Line 4 is
totally analogous to

    push @array, $city;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<push> adds a city name to the end of the
referred-to array.

In the second part, line 9 is the important one.  Again,
C<$table{$country}> is (a reference to) the list of cities in the country, so
we can recover the original list, and copy it into the array C<@cities>,
by using C<@{$table{$country}}>.  Line 9 is totally analogous to

    @cities = @array;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>.  The C<@> tells Perl to get the entire array.

The rest of the program is just familiar uses of C<chomp>, C<split>,
C<sort>, and C<print>, and doesn't involve references at all.

There's one fine point I skipped.  Suppose the program has just read
the first line in its input that happens to mention Greece.
Control is at line 4, C<$country> is C<'Greece'>, and C<$city> is
C<'Athens'>.
Since this is the first city in Greece,
C<$table{$country}> is undefined; in fact there isn't a C<'Greece'> key
in C<%table> at all.  What does line 4 do here?

    4     push @{$table{$country}}, $city;

This is Perl, so it does the exact right thing.  It sees that you want
to push C<Athens> onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it in the table,
and then pushes C<Athens> onto it.  This is called `autovivification'.

=head1 The Rest

I promised to give you 90% of the benefit with 10% of the details, and
that means I left out 90% of the details.  Now that you have an
overview of the important parts, it should be easier to read the
L<perlref> manual page, which discusses 100% of the details.

Some of the highlights of L<perlref>:

=over 4

=item *

You can make references to anything, including scalars, functions, and
other references.

=item *

In B<USE RULE 1>, you can omit the curly brackets whenever the thing
inside them is an atomic scalar variable like C<$aref>.  For example,
C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
C<${$aref}[1]>.  If you're just starting out, you may want to adopt
the habit of always including the curly brackets.

=item *

To see if a variable contains a reference, use the C<ref> function.
It returns true if its argument is a reference.  Actually it's a
little better than that: It returns C<HASH> for hash references and
C<ARRAY> for array references.

=item *

If you try to use a reference like a string, you get strings like

    ARRAY(0x80f5dec)   or    HASH(0x826afc0)

If you ever see a string that looks like this, you'll know you
printed out a reference by mistake.

A side effect of this representation is that you can use C<eq> to see
if two references refer to the same thing.
(But you should usually use
C<==> instead because it's much faster.)

=item *

You can use a string as if it were a reference.  If you use the string
C<"foo"> as an array reference, it's taken to be a reference to the
array C<@foo>.  This is called a I<soft reference> or I<symbolic reference>.

=back

You might prefer to go on to L<perllol> instead of L<perlref>; it
discusses lists of lists and multidimensional arrays in detail.  After
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
that shows recipes for using and printing out arrays of hashes, hashes
of arrays, and other kinds of data.

=head1 Summary

Everyone needs compound data structures, and in Perl the way you get
them is with references.  There are four important rules for managing
references: Two for making references and two for using them.  Once
you know these rules you can do most of the important things you need
to do with references.

=head1 Credits

Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>)

This article originally appeared in I<The Perl Journal>
(http://tpj.com) volume 3, #2.  Reprinted with permission.

The original title was I<Understand References Today>.

=head2 Distribution Conditions

Copyright 1998 The Perl Journal.

When included as part of the Standard Version of Perl, or as part of
its complete documentation whether printed or otherwise, this work may
be distributed only under the terms of Perl's Artistic License.  Any
distribution of this file or derivatives thereof outside of that
package require that special arrangements be made with copyright
holder.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain.  You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit.
A simple comment in the code giving credit would be
courteous but is not required.

=cut