=head1 NAME

perlreftut - Mark's very short tutorial about references

=head1 DESCRIPTION

One of the most important new features in Perl 5 was the capability to
manage complicated data structures like multidimensional arrays and
nested hashes. To enable these, Perl 5 introduced a feature called
`references', and using references is the key to managing complicated,
structured data in Perl. Unfortunately, there's a lot of funny syntax
to learn, and the main manual page can be hard to follow. The manual
is quite complete, and sometimes people find that a problem, because
it can be hard to tell what is important and what isn't.

Fortunately, you only need to know 10% of what's in the main page to get
90% of the benefit. This page will show you that 10%.

=head1 Who Needs Complicated Data Structures?

One problem that came up all the time in Perl 4 was how to represent a
hash whose values were lists. Perl 4 had hashes, of course, but the
values had to be scalars; they couldn't be lists.

Why would you want a hash of lists? Let's take a simple example: You
have a file of city and country names, like this:

    Chicago, USA
    Frankfurt, Germany
    Berlin, Germany
    Washington, USA
    Helsinki, Finland
    New York, USA

and you want to produce an output like this, with each country mentioned
once, and then an alphabetical list of the cities in that country:

    Finland: Helsinki.
    Germany: Berlin, Frankfurt.
    USA: Chicago, New York, Washington.

The natural way to do this is to have a hash whose keys are country
names. Associated with each country name key is a list of the cities in
that country. Each time you read a line of input, split it into a country
and a city, look up the list of cities already known to be in that
country, and append the new city to the list. When you're done reading
the input, iterate over the hash as usual, sorting each list of cities
before you print it out.

If hash values can't be lists, you lose. In Perl 4, hash values can't
be lists; they can only be strings. You lose. You'd probably have to
combine all the cities into a single string somehow, and then when
time came to write the output, you'd have to break the string into a
list, sort the list, and turn it back into a string. This is messy
and error-prone. And it's frustrating, because Perl already has
perfectly good lists that would solve the problem if only you could
use them.

=head1 The Solution

By the time Perl 5 rolled around, we were already stuck with this
design: Hash values must be scalars. The solution to this is
references.

A reference is a scalar value that I<refers to> an entire array or an
entire hash (or to just about anything else). Names are one kind of
reference that you're already familiar with. Think of the President
of the United States: a messy, inconvenient bag of blood and bones.
But to talk about him, or to represent him in a computer program, all
you need is the easy, convenient scalar string "George Bush".

References in Perl are like names for arrays and hashes. They're
Perl's private, internal names, so you can be sure they're
unambiguous. Unlike "George Bush", a reference only refers to one
thing, and you always know what it refers to. If you have a reference
to an array, you can recover the entire array from it. If you have a
reference to a hash, you can recover the entire hash. But the
reference is still an easy, compact scalar value.

You can't have a hash whose values are arrays; hash values can only be
scalars. We're stuck with that. But a single reference can refer to
an entire array, and references are scalars, so you can have a hash of
references to arrays, and it'll act a lot like a hash of arrays, and
it'll be just as useful as a hash of arrays.
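
As a quick preview of that idea, here is a minimal sketch. It uses the
C<\> and C<@{...}> notation that the Syntax section explains, and the
variable names C<$cities_ref> and C<@recovered> are made up for
illustration only.

```perl
use strict;
use warnings;

# An ordinary named array of cities.
my @cities = ('Frankfurt', 'Berlin');

# A reference to @cities is a single scalar value ...
my $cities_ref = \@cities;

# ... so it is allowed as a hash value.
my %table = (Germany => $cities_ref);

# The entire array can be recovered from the reference.
my @recovered = @{ $table{Germany} };
print join(', ', sort @recovered), ".\n";   # prints "Berlin, Frankfurt."
```

The hash still holds only one scalar per key, but that scalar leads
back to the whole array.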

We'll come back to this city-country problem later, after we've seen
some syntax for managing references.


=head1 Syntax

There are just two ways to make a reference, and just two ways to use
it once you have it.

=head2 Making References

=head3 B<Make Rule 1>

If you put a C<\> in front of a variable, you get a
reference to that variable.

    $aref = \@array;         # $aref now holds a reference to @array
    $href = \%hash;          # $href now holds a reference to %hash

Once the reference is stored in a variable like $aref or $href, you
can copy it or store it just the same as any other scalar value:

    $xy = $aref;             # $xy now holds a reference to @array
    $p[3] = $href;           # $p[3] now holds a reference to %hash
    $z = $p[3];              # $z now holds a reference to %hash


These examples show how to make references to variables with names.
Sometimes you want to make an array or a hash that doesn't have a
name. This is analogous to the way you like to be able to use the
string C<"\n"> or the number 80 without having to store it in a named
variable first.

=head3 B<Make Rule 2>

C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to
that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a
reference to that hash.

    $aref = [ 1, "foo", undef, 13 ];
    # $aref now holds a reference to an array

    $href = { APR => 4, AUG => 8 };
    # $href now holds a reference to a hash


The references you get from rule 2 are the same kind of
references that you get from rule 1:

    # This:
    $aref = [ 1, 2, 3 ];

    # Does the same as this:
    @array = (1, 2, 3);
    $aref = \@array;


The first line is an abbreviation for the following two lines, except
that it doesn't create the superfluous array variable C<@array>.

If you write just C<[]>, you get a new, empty anonymous array.
If you write just C<{}>, you get a new, empty anonymous hash.


=head2 Using References

What can you do with a reference once you have it? It's a scalar
value, and we've seen that you can store it as a scalar and get it back
again just like any scalar. There are just two more ways to use it:

=head3 B<Use Rule 1>

You can always use an array reference, in curly braces, in place of
the name of an array. For example, C<@{$aref}> instead of C<@array>.

Here are some examples of that:

Arrays:


    @a                @{$aref}                An array
    reverse @a        reverse @{$aref}        Reverse the array
    $a[3]             ${$aref}[3]             An element of the array
    $a[3] = 17        ${$aref}[3] = 17        Assigning an element


On each line are two expressions that do the same thing. The
left-hand versions operate on the array C<@a>. The right-hand
versions operate on the array that is referred to by C<$aref>. Once
they find the array they're operating on, both versions do the same
things to the arrays.

Using a hash reference is I<exactly> the same:

    %h                %{$href}                A hash
    keys %h           keys %{$href}           Get the keys from the hash
    $h{'red'}         ${$href}{'red'}         An element of the hash
    $h{'red'} = 17    ${$href}{'red'} = 17    Assigning an element

Whatever you want to do with a reference, B<Use Rule 1> tells you how
to do it. You just write the Perl code that you would have written
for doing the same thing to a regular array or hash, and then replace
the array or hash name with C<{$reference}>. "How do I loop over an
array when all I have is a reference?" Well, to loop over an array, you
would write

    for my $element (@array) {
      ...
    }

so replace the array name, C<@array>, with the reference:

    for my $element (@{$aref}) {
      ...
    }

"How do I print out the contents of a hash when all I have is a
reference?"
First write the code for printing out a hash:

    for my $key (keys %hash) {
      print "$key => $hash{$key}\n";
    }

And then replace the hash name with the reference:

    for my $key (keys %{$href}) {
      print "$key => ${$href}{$key}\n";
    }

=head3 B<Use Rule 2>

B<Use Rule 1> is all you really need, because it tells you how to do
absolutely everything you ever need to do with references. But the
most common thing to do with an array or a hash is to extract a single
element, and the B<Use Rule 1> notation is cumbersome. So there is an
abbreviation.

C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >>
instead.

C<${$href}{red}> is too hard to read, so you can write
C<< $href->{red} >> instead.

If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is
the fourth element of the array. Don't confuse this with C<$aref[3]>,
which is the fourth element of a totally different array, one
deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the
same way that C<$item> and C<@item> are.

Similarly, C<< $href->{'red'} >> is part of the hash referred to by
the scalar variable C<$href>, perhaps even one with no name.
C<$href{'red'}> is part of the deceptively named C<%href> hash. It's
easy to leave out the C<< -> >> by mistake, and if you do, you'll get
bizarre results when your program gets array and hash elements out of
totally unexpected hashes and arrays that weren't the ones you wanted
to use.


=head2 An Example

Let's see a quick example of how all this is useful.

First, remember that C<[1, 2, 3]> makes an anonymous array containing
C<(1, 2, 3)>, and gives you a reference to that array.

Now think about

    @a = ( [1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]
         );

@a is an array with three elements, and each one is a reference to
another array.

C<$a[1]> is one of these references.
It refers to an array, the array
containing C<(4, 5, 6)>, and because it is a reference to an array,
B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the
third element from that array. C<< $a[1]->[2] >> is the 6.
Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a
two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get
or set the element in any row and any column of the array.

The notation still looks a little cumbersome, so there's one more
abbreviation:

=head2 Arrow Rule

In between two B<subscripts>, the arrow is optional.

Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the
same thing. Instead of C<< $a[0]->[1] = 23 >>, we can write
C<$a[0][1] = 23>; it means the same thing.

Now it really looks like two-dimensional arrays!

You can see why the arrows are important. Without them, we would have
had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For
three-dimensional arrays, they let us write C<$x[2][3][5]> instead of
the unreadable C<${${$x[2]}[3]}[5]>.

=head1 Solution

Here's the answer to the problem I posed earlier, of reformatting a
file of city and country names.

    1   my %table;

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

    8   foreach my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }


The program has two pieces: Lines 2-7 read the input and build a data
structure, and lines 8-13 analyze the data and print out the report.
We're going to have a hash, C<%table>, whose keys are country names,
and whose values are references to arrays of city names.
The data
structure will look like this:


           %table
        +-------+---+
        |       |   |   +-----------+--------+
        |Germany| *---->| Frankfurt | Berlin |
        |       |   |   +-----------+--------+
        +-------+---+
        |       |   |   +----------+
        |Finland| *---->| Helsinki |
        |       |   |   +----------+
        +-------+---+
        |       |   |   +---------+------------+----------+
        |  USA  | *---->| Chicago | Washington | New York |
        |       |   |   +---------+------------+----------+
        +-------+---+

We'll look at output first. Supposing we already have this structure,
how do we print it out?

    8   foreach my $country (sort keys %table) {
    9     print "$country: ";
   10     my @cities = @{$table{$country}};
   11     print join ', ', sort @cities;
   12     print ".\n";
   13   }

C<%table> is an ordinary hash, and we get a list of keys from it, sort
the keys, and loop over the keys as usual. The only use of references
is in line 10. C<$table{$country}> looks up the key C<$country> in the
hash and gets the value, which is a reference to an array of cities in
that country. B<Use Rule 1> says that we can recover the array by
saying C<@{$table{$country}}>. Line 10 is just like

    @cities = @array;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>. The C<@> tells Perl to get the entire array.
Having gotten the list of cities, we sort it, join it, and print it
out as usual.

Lines 2-7 are responsible for building the structure in the first
place. Here they are again:

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5     $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

Lines 2-4 acquire a city and country name. Line 5 looks to see if the
country is already present as a key in the hash.
If it's not, the
program uses the C<[]> notation (B<Make Rule 2>) to manufacture a new,
empty anonymous array of cities, and installs a reference to it into
the hash under the appropriate key.

Line 6 installs the city name into the appropriate array.
C<$table{$country}> now holds a reference to the array of cities seen
in that country so far. Line 6 is exactly like

    push @array, $city;

except that the name C<array> has been replaced by the reference
C<{$table{$country}}>. The C<push> adds a city name to the end of the
referred-to array.

There's one fine point I skipped. Line 5 is unnecessary, and we can
get rid of it.

    2   while (<>) {
    3     chomp;
    4     my ($city, $country) = split /, /;
    5   ####  $table{$country} = [] unless exists $table{$country};
    6     push @{$table{$country}}, $city;
    7   }

If there's already an entry in C<%table> for the current C<$country>,
then nothing is different. Line 6 will locate the value in
C<$table{$country}>, which is a reference to an array, and push
C<$city> into the array. But what does it do when C<$country> holds a
key, say C<Greece>, that is not yet in C<%table>?

This is Perl, so it does the exact right thing. It sees that you want
to push C<Athens> onto an array that doesn't exist, so it helpfully
makes a new, empty, anonymous array for you, installs it into
C<%table>, and then pushes C<Athens> onto it. This is called
`autovivification'--bringing things to life automatically. Perl saw
that the key wasn't in the hash, so it created a new hash entry
automatically. Perl saw that you wanted to use the hash value as an
array, so it created a new empty array and installed a reference to it
in the hash automatically. And as usual, Perl made the array one
element longer to hold the new city name.
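
If you'd like to watch autovivification happen, the following short
sketch may help. It is not part of the city-country program; it reuses
the C<%table> name and the C<Greece> and C<Athens> values from the
discussion above, and C<$before> is a made-up variable for
illustration.

```perl
use strict;
use warnings;

my %table;

# Before the push, there is no entry for Greece at all.
my $before = exists $table{Greece} ? 'present' : 'absent';
print "before: $before\n";                   # prints "before: absent"

# Using the missing hash value as an array makes Perl create the
# hash entry and a new, empty anonymous array behind it.
push @{ $table{Greece} }, 'Athens';

print "kind: ", ref($table{Greece}), "\n";   # prints "kind: ARRAY"
print "first city: $table{Greece}[0]\n";     # prints "first city: Athens"
```

Note that this works even under C<use strict>, because no symbolic
reference is involved; Perl is only filling in an undefined value.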

=head1 The Rest

I promised to give you 90% of the benefit with 10% of the details, and
that means I left out 90% of the details. Now that you have an
overview of the important parts, it should be easier to read the
L<perlref> manual page, which discusses 100% of the details.

Some of the highlights of L<perlref>:

=over 4

=item *

You can make references to anything, including scalars, functions, and
other references.

=item *

In B<Use Rule 1>, you can omit the curly brackets whenever the thing
inside them is an atomic scalar variable like C<$aref>. For example,
C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as
C<${$aref}[1]>. If you're just starting out, you may want to adopt
the habit of always including the curly brackets.

=item *

This doesn't copy the underlying array:

    $aref2 = $aref1;

You get two references to the same array. If you modify
C<< $aref1->[23] >> and then look at
C<< $aref2->[23] >> you'll see the change.

To copy the array, use

    $aref2 = [@{$aref1}];

This uses C<[...]> notation to create a new anonymous array, and
C<$aref2> is assigned a reference to the new array. The new array is
initialized with the contents of the array referred to by C<$aref1>.

Similarly, to copy an anonymous hash, you can use

    $href2 = {%{$href1}};

=item *

To see if a variable contains a reference, use the C<ref> function. It
returns true if its argument is a reference. Actually it's a little
better than that: It returns C<HASH> for hash references and C<ARRAY>
for array references.

=item *

If you try to use a reference like a string, you get strings like

    ARRAY(0x80f5dec)   or   HASH(0x826afc0)

If you ever see a string that looks like this, you'll know you
printed out a reference by mistake.

A side effect of this representation is that you can use C<eq> to see
if two references refer to the same thing. (But you should usually use
C<==> instead because it's much faster.)

=item *

You can use a string as if it were a reference. If you use the string
C<"foo"> as an array reference, it's taken to be a reference to the
array C<@foo>. This is called a I<soft reference> or I<symbolic
reference>. The declaration C<use strict 'refs'> disables this
feature, which can cause all sorts of trouble if you use it by accident.

=back

You might prefer to go on to L<perllol> instead of L<perlref>; it
discusses lists of lists and multidimensional arrays in detail. After
that, you should move on to L<perldsc>; it's a Data Structure Cookbook
that shows recipes for using and printing out arrays of hashes, hashes
of arrays, and other kinds of data.

=head1 Summary

Everyone needs compound data structures, and in Perl the way you get
them is with references. There are four important rules for managing
references: Two for making references and two for using them. Once
you know these rules you can do most of the important things you need
to do with references.

=head1 Credits

Author: Mark Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>)

This article originally appeared in I<The Perl Journal>
( http://www.tpj.com/ ) volume 3, #2. Reprinted with permission.

The original title was I<Understand References Today>.

=head2 Distribution Conditions

Copyright 1998 The Perl Journal.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in these files are
hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun or for profit
as you see fit.
A simple comment in the code giving credit would be
courteous but is not required.

=cut