=head1 NAME

perlpacktut - tutorial on C<pack> and C<unpack>

=head1 DESCRIPTION

C<pack> and C<unpack> are two functions for transforming data according
to a user-defined template, between the guarded way Perl stores values
and some well-defined representation as might be required in the
environment of a Perl program. Unfortunately, they're also two of
the most misunderstood and most often overlooked functions that Perl
provides. This tutorial will demystify them for you.


=head1 The Basic Principle

Most programming languages don't shelter the memory where variables are
stored. In C, for instance, you can take the address of some variable,
and the C<sizeof> operator tells you how many bytes are allocated to
the variable. Using the address and the size, you may access the storage
to your heart's content.

In Perl, you just can't access memory at random, but the structural and
representational conversion provided by C<pack> and C<unpack> is an
excellent alternative. The C<pack> function converts values to a byte
sequence containing representations according to a given specification,
the so-called "template" argument. C<unpack> is the reverse process,
deriving some values from the contents of a string of bytes.
(Be cautioned,
however, that not all that has been packed together can be neatly unpacked -
a very common experience as seasoned travellers are likely to confirm.)

Why, you may ask, would you need a chunk of memory containing some values
in binary representation? One good reason is input and output accessing
some file, a device, or a network connection, whereby this binary
representation is either forced on you or will give you some benefit
in processing. Another cause is passing data to some system call that
is not available as a Perl function: C<syscall> requires you to provide
parameters stored in the way it happens in a C program. Even text processing
(as shown in the next section) may be simplified with judicious usage
of these two functions.

To see how (un)packing works, we'll start with a simple template
code where the conversion is in low gear: between the contents of a byte
sequence and a string of hexadecimal digits. Let's use C<unpack>, since
this is likely to remind you of a dump program, or some desperate last
message unfortunate programs are wont to throw at you before they expire
into the wild blue yonder.
Assuming that the variable C<$mem> holds a
sequence of bytes that we'd like to inspect without assuming anything
about its meaning, we can write

   my( $hex ) = unpack( 'H*', $mem );
   print "$hex\n";

whereupon we might see something like this, with each pair of hex digits
corresponding to a byte:

   41204d414e204120504c414e20412043414e414c2050414e414d41

What was in this chunk of memory? Numbers, characters, or a mixture of
both? Assuming that we're on a computer where ASCII (or some similar)
encoding is used: hexadecimal values in the range C<0x41> - C<0x5A>
indicate an uppercase letter, and C<0x20> encodes a space. So we might
assume it is a piece of text, which some are able to read like a tabloid;
but others will have to get hold of an ASCII table and relive that
first-grader feeling. Not caring too much about which way to read this,
we note that C<unpack> with the template code C<H> converts the contents
of a sequence of bytes into the customary hexadecimal notation. Since
"a sequence of" is a pretty vague indication of quantity, C<H> has been
defined to convert just a single hexadecimal digit unless it is followed
by a repeat count. An asterisk for the repeat count means to use whatever
remains.

The inverse operation - packing byte contents from a string of hexadecimal
digits - is just as easily written.
For instance:

   my $s = pack( 'H2' x 10, map { "3$_" } ( 0..9 ) );
   print "$s\n";

Since we feed a list of ten 2-digit hexadecimal strings to C<pack>, the
pack template should contain ten pack codes. If this is run on a computer
with ASCII character coding, it will print C<0123456789>.


=head1 Packing Text

Let's suppose you've got to read in a data file like this:

    Date      |Description                | Income|Expenditure
    01/24/2001 Ahmed's Camel Emporium                  1147.99
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00

How do we do it? You might think first to use C<split>; however, since
C<split> on whitespace treats any run of blanks as a single separator
(and the descriptions contain blanks themselves), you'll never know
whether a record was income or expenditure. Oops. Well, you could always
use C<substr>:

    while (<>) {
        my $date   = substr($_,  0, 11);
        my $desc   = substr($_, 12, 27);
        my $income = substr($_, 40,  7);
        my $expend = substr($_, 52,  7);
        ...
    }

It's not really a barrel of laughs, is it? In fact, it's worse than it
may seem; the eagle-eyed may notice that the first field should only be
10 characters wide, and the error has propagated right through the other
numbers - which we've had to count by hand. So it's error-prone as well
as horribly unfriendly.
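
The hand-counted offsets are the weak point of the C<substr> approach.
As a minimal sketch, the offsets can instead be derived from the field
widths. The widths in C<@widths>, the single separator column after each
field, and the helper name C<split_fixed> are assumptions made for this
illustration, read off the sample ledger above:

    use strict;
    use warnings;

    # Assumed layout of the sample ledger: four fields, each followed
    # by a single separator column.
    my @widths = (10, 27, 7, 11);

    # Hypothetical helper: cut a line into fields with substr, deriving
    # every offset from the widths instead of counting by hand.
    sub split_fixed {
        my ($line) = @_;
        my ($offset, @fields) = (0);
        for my $width (@widths) {
            # Lines may end early (income-only records), so guard substr.
            push @fields,
                $offset < length($line) ? substr($line, $offset, $width) : '';
            $offset += $width + 1;    # step over the separator column
        }
        return @fields;
    }

    my $line = sprintf("%-10s %-27s %7s %11s",
                       '01/28/2001', 'Flea spray', '', '24.99');
    my ($date, $desc, $income, $expend) = split_fixed($line);
    print "[$date] [$desc] [$income] [$expend]\n";

With computed offsets a wrong width has to be corrected in one place
only, but the fields still come back padded with blanks, and the code is
hardly shorter.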

Or maybe we could use regular expressions:

    while (<>) {
        my($date, $desc, $income, $expend) =
            m|(\d\d/\d\d/\d{4}) (.{27}) (.{7})(.*)|;
        ...
    }

Urgh. Well, it's a bit better, but - well, would you want to maintain
that?

Hey, isn't Perl supposed to make this sort of thing easy? Well, it does,
if you use the right tools. C<pack> and C<unpack> are designed to help
you out when dealing with fixed-width data like the above. Let's have a
look at a solution with C<unpack>:

    while (<>) {
        my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
        ...
    }

That looks a bit nicer; but we've got to take apart that weird template.
Where did I pull that from?

OK, let's have a look at some of our data again; in fact, we'll include
the headers, and a handy ruler so we can keep track of where we are.

             1         2         3         4         5
    1234567890123456789012345678901234567890123456789012345678
    Date      |Description                | Income|Expenditure
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00

From this, we can see that the date column stretches from column 1 to
column 10 - ten characters wide.
The C<pack>-ese for "character" is
C<A>, and ten of them are C<A10>. So if we just wanted to extract the
dates, we could say this:

    my($date) = unpack("A10", $_);

OK, what's next? Between the date and the description is a blank column;
we want to skip over that. The C<x> template means "skip forward", so we
want one of those. Next, we have another batch of characters, from 12 to
38. That's 27 more characters, hence C<A27>. (Don't make the fencepost
error - there are 27 characters between 12 and 38, not 26. Count 'em!)

Now we skip another character and pick up the next 7 characters:

    my($date,$description,$income) = unpack("A10xA27xA7", $_);

Now comes the clever bit. Lines in our ledger which are just income and
not expenditure might end at column 46. Hence, we don't want to tell our
C<unpack> pattern that we B<need> to find another 12 characters; we'll
just say "if there's anything left, take it". As you might guess from
regular expressions, that's what the C<*> means: "use everything
remaining".

=over 3

=item *

Be warned, though, that unlike regular expressions, if the C<unpack>
template doesn't match the incoming data, Perl will scream and die.

=back


Hence, putting it all together:

    my($date,$description,$income,$expend) = unpack("A10xA27xA7xA*", $_);

Now, that's our data parsed. I suppose what we might want to do now is
total up our income and expenditure, and add another line to the end of
our ledger - in the same format - saying how much we've brought in and
how much we've spent:

    use POSIX ();    # load the module that provides POSIX::strftime

    while (<>) {
        my($date, $desc, $income, $expend) = unpack("A10xA27xA7xA*", $_);
        $tot_income += $income;
        $tot_expend += $expend;
    }

    $tot_income = sprintf("%.2f", $tot_income); # Get them into
    $tot_expend = sprintf("%.2f", $tot_expend); # "financial" format

    $date = POSIX::strftime("%m/%d/%Y", localtime);

    # OK, let's go:

    print pack("A10xA27xA7xA*", $date, "Totals", $tot_income, $tot_expend);

Oh, hmm. That didn't quite work. Let's see what happened:

    01/24/2001 Ahmed's Camel Emporium                  1147.99
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001Totals                     1235.001172.98

OK, it's a start, but what happened to the spaces? We put C<x>, didn't
we? Shouldn't it skip forward?
Let's look at what L<perlfunc/pack> says:

    x   A null byte.

Urgh. No wonder. There's a big difference between "a null byte",
character zero, and "a space", character 32. Perl's put something
between the date and the description - but unfortunately, we can't see
it!

What we actually need to do is expand the width of the fields. The C<A>
format pads any non-existent characters with spaces, so we can use the
additional spaces to line up our fields, like this:

    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

(Note that you can put spaces in the template to make it more readable,
but they don't translate to spaces in the output.) Here's what we got
this time:

    01/24/2001 Ahmed's Camel Emporium                  1147.99
    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001 Totals                      1235.00 1172.98

That's a bit better, but we still have that last column which needs to
be moved further over.
There's an easy way to fix this up.
Unfortunately, we can't get C<pack> to right-justify our fields, but we
can get C<sprintf> to do it; the expenditure column is 11 characters
wide, so that is the width we ask for:

    $tot_income = sprintf("%.2f", $tot_income);
    $tot_expend = sprintf("%11.2f", $tot_expend);
    $date = POSIX::strftime("%m/%d/%Y", localtime);
    print pack("A11 A28 A8 A*", $date, "Totals", $tot_income, $tot_expend);

This time we get the right answer:

    01/28/2001 Flea spray                                24.99
    01/29/2001 Camel rides to tourists     1235.00
    03/23/2001 Totals                      1235.00     1172.98

So that's how we consume and produce fixed-width data. Let's recap what
we've seen of C<pack> and C<unpack> so far:

=over 3

=item *

Use C<pack> to go from several pieces of data to one fixed-width
version; use C<unpack> to turn a fixed-width-format string into several
pieces of data.

=item *

The pack format C<A> means "any character"; if you're C<pack>ing and
you've run out of things to pack, C<pack> will fill the rest up with
spaces.

=item *

C<x> means "skip a byte" when C<unpack>ing; when C<pack>ing, it means
"introduce a null byte" - that's probably not what you mean if you're
dealing with plain text.

=item *

You can follow the formats with numbers to say how many characters
should be affected by that format: C<A12> means "take 12 characters";
C<x6> means "skip 6 bytes" or "character 0, 6 times".

=item *

Instead of a number, you can use C<*> to mean "consume everything else
left".

B<Warning>: when packing multiple pieces of data, C<*> only means
"consume all of the current piece of data". That's to say

    pack("A*A*", $one, $two)

packs all of C<$one> into the first C<A*> and then all of C<$two> into
the second. This is a general principle: each format character
corresponds to one piece of data to be C<pack>ed.

=back



=head1 Packing Numbers

So much for textual data. Let's get onto the meaty stuff that C<pack>
and C<unpack> are best at: handling binary formats for numbers. There is,
of course, not just one binary format - life would be too simple - but
Perl will do all the finicky labor for you.


=head2 Integers

Packing and unpacking numbers implies conversion to and from some
I<specific> binary representation.
Leaving floating point numbers
aside for the moment, the salient properties of any such representation
are:

=over 4

=item *

the number of bytes used for storing the integer,

=item *

whether the contents are interpreted as a signed or unsigned number,

=item *

the byte ordering: whether the first byte is the least or most
significant byte (or: little-endian or big-endian, respectively).

=back

So, for instance, to pack 20302 to a signed 16 bit integer in your
computer's representation you write

   my $ps = pack( 's', 20302 );

Again, the result is a string, now containing 2 bytes. If you print
this string (which is, generally, not recommended) you might see
C<ON> or C<NO> (depending on your system's byte ordering) - or something
entirely different if your computer doesn't use ASCII character encoding.
Unpacking C<$ps> with the same template returns the original integer value:

   my( $s ) = unpack( 's', $ps );

This is true for all numeric template codes.
But don't expect miracles:
if the packed value exceeds the allotted byte capacity, high order bits
are silently discarded, and C<unpack> certainly won't be able to pull them
back out of some magic hat. And, when you pack using a signed template
code such as C<s>, an excess value may result in the sign bit
getting set, and unpacking this will smartly return a negative value.

16 bits won't get you too far with integers, but there are C<l> and C<L>
for signed and unsigned 32-bit integers. And if this is not enough and
your system supports 64 bit integers you can push the limits much closer
to infinity with pack codes C<q> and C<Q>. A notable exception is provided
by pack codes C<i> and C<I> for signed and unsigned integers of the
"local custom" variety: Such an integer will take up as many bytes as
a local C compiler returns for C<sizeof(int)>, but it'll use I<at least>
32 bits.

Each of the integer pack codes C<sSlLqQ> results in a fixed number of bytes,
no matter where you execute your program. This may be useful for some
applications, but it does not provide for a portable way to pass data
structures between Perl and C programs (bound to happen when you call
XS extensions or the Perl function C<syscall>), or when you read or
write binary files. What you'll need in this case are template codes that
depend on what your local C compiler compiles when you code C<short> or
C<unsigned long>, for instance.
These codes and their corresponding
byte lengths are shown in the table below. Since the C standard leaves
much leeway with respect to the relative sizes of these data types, actual
values may vary, and that's why the values are given as expressions in
C and Perl. (If you'd like to use values from C<%Config> in your program
you have to import it with C<use Config>.)

   signed unsigned  byte length in C   byte length in Perl
     s!     S!      sizeof(short)      $Config{shortsize}
     i!     I!      sizeof(int)        $Config{intsize}
     l!     L!      sizeof(long)       $Config{longsize}
     q!     Q!      sizeof(long long)  $Config{longlongsize}

The C<i!> and C<I!> codes aren't different from C<i> and C<I>; they are
tolerated for completeness' sake.


=head2 Unpacking a Stack Frame

Requesting a particular byte ordering may be necessary when you work with
binary data coming from some specific architecture whereas your program could
run on a totally different system.
As an example, assume you have 24 bytes
containing a stack frame as it happens on an Intel 8086:

         +---------+        +----+----+               +---------+
    TOS: |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
         +---------+        +----+----+               +---------+
         |   CS    |        | AL | AH | AX            |   DI    |
         +---------+        +----+----+               +---------+
                            | BL | BH | BX            |   BP    |
                            +----+----+               +---------+
                            | CL | CH | CX            |   DS    |
                            +----+----+               +---------+
                            | DL | DH | DX            |   ES    |
                            +----+----+               +---------+

First, we note that this time-honored 16-bit CPU uses little-endian order,
and that's why the low order byte is stored at the lower address. To
unpack such an (unsigned) short we'll have to use code C<v>. A repeat
count unpacks all 12 shorts:

   my( $ip, $cs, $flags, $ax, $bx, $cx, $dx, $si, $di, $bp, $ds, $es ) =
     unpack( 'v12', $frame );

Alternatively, we could have used C<C> to unpack the individually
accessible byte registers FL, FH, AL, AH, etc.:

   my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
     unpack( 'C10', substr( $frame, 4, 10 ) );

It would be nice if we could do this in one fell swoop: unpack a short,
back up a little, and then unpack 2 bytes. Since Perl I<is> nice, it
proffers the template code C<X> to back up one byte.
Putting this all
together, we may now write:

   my( $ip, $cs,
       $flags,$fl,$fh,
       $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
       $si, $di, $bp, $ds, $es ) =
   unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

(The clumsy construction of the template can be avoided - just read on!)

We've taken some pains to construct the template so that it matches
the contents of our frame buffer. Otherwise we'd either get undefined values,
or C<unpack> could not unpack everything. If C<pack> runs out of items, it
will supply null strings (which are coerced into zeroes whenever the pack
code says so).


=head2 How to Eat an Egg on a Net

The pack code for big-endian (high order byte at the lowest address) is
C<n> for 16 bit and C<N> for 32 bit integers. You use these codes
if you know that your data comes from a compliant architecture, but,
surprisingly enough, you should also use these pack codes if you
exchange binary data, across the network, with some system that you
know next to nothing about. The simple reason is that this
order has been chosen as the I<network order>, and all standard-fearing
programs ought to follow this convention. (This is, of course, a stern
backing for one of the Lilliputian parties and may well influence the
political development there.)
So, if the protocol expects you to send
a message by sending the length first, followed by just so many bytes,
you could write:

   my $buf = pack( 'N', length( $msg ) ) . $msg;

or even:

   my $buf = pack( 'NA*', length( $msg ), $msg );

and pass C<$buf> to your send routine. Some protocols demand that the
count should include the length of the count itself: then just add 4
to the data length. (But make sure to read L<"Lengths and Widths"> before
you really code this!)



=head2 Floating point Numbers

For packing floating point numbers you have the choice between the
pack codes C<f> and C<d> which pack into (or unpack from) single-precision or
double-precision representation as it is provided by your system. (There
is no such thing as a network representation for reals, so if you want
to send your real numbers across computer boundaries, you'd better stick
to ASCII representation, unless you're absolutely sure what's on the other
end of the line.)



=head1 Exotic Templates


=head2 Bit Strings

Bits are the atoms in the memory world.
Access to individual bits may
have to be used either as a last resort or because it is the most
convenient way to handle your data. Bit string (un)packing converts
between strings containing a series of C<0> and C<1> characters and
a sequence of bytes each containing a group of 8 bits. This is almost
as simple as it sounds, except that there are two ways the contents of
a byte may be written as a bit string. Let's have a look at an annotated
byte:

     7 6 5 4 3 2 1 0
   +-----------------+
   | 1 0 0 0 1 1 0 0 |
   +-----------------+
    MSB           LSB

It's egg-eating all over again: Some think that as a bit string this should
be written "10001100" i.e. beginning with the most significant bit, others
insist on "00110001". Well, Perl isn't biased, so that's why we have two bit
string codes:

   $byte = pack( 'B8', '10001100' ); # start with MSB
   $byte = pack( 'b8', '00110001' ); # start with LSB

It is not possible to pack or unpack bit fields - just integral bytes.
C<pack> always starts at the next byte boundary and "rounds up" to the
next multiple of 8 by adding zero bits as required. (If you do want bit
fields, there is L<perlfunc/vec>. Or you could implement bit field
handling at the character string level, using split, substr, and
concatenation on unpacked bit strings.)
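
The string-level approach can be sketched as follows. The field layout
(a 3-bit field occupying bits 4 to 2 of the byte shown above) and the
variable names are assumptions chosen purely for illustration:

    use strict;
    use warnings;

    my $byte = pack( 'B8', '10001100' );

    # Expand the byte into '0'/'1' characters, MSB first.
    my $bits = unpack( 'B8', $byte );

    # Bits 4..2 are the characters at string offsets 3..5.
    my $field = oct( '0b' . substr( $bits, 3, 3 ) );

    # Store a new 3-bit value and collapse the string back into a byte.
    substr( $bits, 3, 3 ) = sprintf( '%03b', 5 );
    my $newbyte = pack( 'B8', $bits );

    printf "field=%d new=%s\n", $field, unpack( 'B8', $newbyte );

This trades speed for readability; for fields whose width is a power of
two and which are aligned accordingly, C<vec> is usually the more direct
tool.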

To illustrate unpacking for bit strings, we'll decompose a simple
status register (a "-" stands for a "reserved" bit):

   +-----------------+-----------------+
   | S Z - A - P - C | - - - - O D I T |
   +-----------------+-----------------+
    MSB           LSB MSB           LSB

Converting these two bytes to a string can be done with the unpack
template C<'b16'>. To obtain the individual bit values from the bit
string we use C<split> with the "empty" separator pattern which dissects
the string into individual characters. Bit values from the "reserved"
positions are simply assigned to C<undef>, a convenient notation for
"I don't care where this goes".

   ($carry, undef, $parity, undef, $auxcarry, undef, $zero, $sign,
    $trace, $interrupt, $direction, $overflow) =
      split( //, unpack( 'b16', $status ) );

We could have used an unpack template C<'b12'> just as well, since the
last 4 bits can be ignored anyway.


=head2 Uuencoding

Another odd-man-out in the template alphabet is C<u>, which packs an
"uuencoded string". ("uu" is short for Unix-to-Unix.) Chances are that
you won't ever need this encoding technique which was invented to overcome
the shortcomings of old-fashioned transmission media that do not support
other than simple ASCII data.
The essential recipe is simple: Take three
bytes, or 24 bits. Split them into 4 six-packs, adding a space (0x20) to
each. Repeat until all of the data is blended. Fold groups of 4 bytes into
lines no longer than 60 and garnish them in front with the original byte count
(incremented by 0x20) and a C<"\n"> at the end. - The C<pack> chef will
prepare this for you, a la minute, when you select pack code C<u> on the menu:

   my $uubuf = pack( 'u', $bindat );

A repeat count after C<u> sets the number of bytes to put into an
uuencoded line, which is the maximum of 45 by default, but could be
set to some (smaller) integer multiple of three. C<unpack> simply ignores
the repeat count.


=head2 Doing Sums

An even stranger template code is C<%>E<lt>I<number>E<gt>. First, because
it's used as a prefix to some other template code. Second, because it
cannot be used in C<pack> at all, and third, in C<unpack>, doesn't return the
data as defined by the template code it precedes. Instead it'll give you an
integer of I<number> bits that is computed from the data value by
doing sums.
For numeric unpack codes, no big feat is achieved:

   my $buf = pack( 'iii', 100, 20, 3 );
   print unpack( '%32i3', $buf ), "\n";  # prints 123

For string values, C<%> returns the sum of the byte values saving
you the trouble of a sum loop with C<substr> and C<ord>:

   print unpack( '%32A*', "\x01\x10" ), "\n";  # prints 17

Although the C<%> code is documented as returning a "checksum":
don't put your trust in such values! Even when applied to a small number
of bytes, they won't guarantee a noticeable Hamming distance.

In connection with C<b> or C<B>, C<%> simply adds bits, and this can be put
to good use to count set bits efficiently:

   my $bitcount = unpack( '%32b*', $mask );

And an even parity bit can be determined like this:

   my $evenparity = unpack( '%1b*', $mask );


=head2 Unicode

Unicode is a character set that can represent most characters in most of
the world's languages, providing room for over one million different
characters. Unicode 3.1 specifies 94,140 characters: The Basic Latin
characters are assigned to the numbers 0 - 127. The Latin-1 Supplement with
characters that are used in several European languages is in the next
range, up to 255.
After some more Latin extensions we find the character
sets from languages using non-Roman alphabets, interspersed with a
variety of symbol sets such as currency symbols, Zapf Dingbats or Braille.
(You might want to visit L<www.unicode.org> for a look at some of
them - my personal favourites are Telugu and Kannada.)

The Unicode character set associates characters with integers. Encoding
these numbers in an equal number of bytes would more than double the
requirements for storing texts written in Latin alphabets.
The UTF-8 encoding avoids this by storing the most common (from a western
point of view) characters in a single byte while encoding the rarer
ones in two or more bytes.

So what has this got to do with C<pack>? Well, if you want to convert
between a Unicode number and its UTF-8 representation you can do so by
using template code C<U>. As an example, let's produce the UTF-8
representation of the Euro currency symbol (code number 0x20AC):

   $UTF8{Euro} = pack( 'U', 0x20AC );

Inspecting C<$UTF8{Euro}> shows that it contains 3 bytes: "\xe2\x82\xac".
The round trip can be completed with C<unpack>:

   $Unicode{Euro} = unpack( 'U', $UTF8{Euro} );

Usually you'll want to pack or unpack UTF-8 strings:

   # pack and unpack the Hebrew alphabet
   my $alefbet = pack( 'U*', 0x05d0..0x05ea );
   my @hebrew = unpack( 'U*', $alefbet );


=head2 Another Portable Binary Encoding

The pack code C<w> has been added to support a portable binary data
encoding scheme that goes way beyond simple integers. (Details can
be found at L<Casbah.org>, the Scarab project.) A BER (Binary Encoded
Representation) compressed unsigned integer stores base 128
digits, most significant digit first, with as few digits as possible.
Bit eight (the high bit) is set on each byte except the last. There
is no size limit to BER encoding, but Perl won't go to extremes.

   my $berbuf = pack( 'w*', 1, 128, 128+1, 128*128+127 );

A hex dump of C<$berbuf>, with spaces inserted at the right places,
shows 01 8100 8101 81807F. Since the last byte is always less than
128, C<unpack> knows where to stop.


=head1 Template Grouping

Prior to Perl 5.8, repetitions of templates had to be made by
C<x>-multiplication of template strings. Now there is a better way as
we may use the pack codes C<(> and C<)> combined with a repeat count.
The C<unpack> template from the Stack Frame example can simply
be written like this:

   unpack( 'v2 (vXXCC)5 v5', $frame )

Let's explore this feature a little more. We'll begin with the equivalent of

   join( '', map( substr( $_, 0, 1 ), @str ) )

which returns a string consisting of the first character from each string.
Using pack, we can write

   pack( '(A)'.@str, @str )

or, because a repeat count C<*> means "repeat as often as required",
simply

   pack( '(A)*', @str )

(Note that the template C<A*> would only have packed C<$str[0]> in full
length.)

To pack dates stored as triplets ( day, month, year ) in an array C<@dates>
into a sequence of byte, byte, short integer we can write

   $pd = pack( '(CCS)*', map( @$_, @dates ) );

To swap pairs of characters in a string (with even length) one could use
several techniques.
First, let's use C<x> and C<X> to skip forward and back:

   $s = pack( '(A)*', unpack( '(xAXXAx)*', $s ) );

We can also use C<@> to jump to an offset, with 0 being the position where
we were when the last C<(> was encountered:

   $s = pack( '(A)*', unpack( '(@1A @0A @2)*', $s ) );

Finally, there is also an entirely different approach by unpacking big
endian shorts and packing them in the reverse byte order:

   $s = pack( '(v)*', unpack( '(n)*', $s ) );


=head1 Lengths and Widths

=head2 String Lengths

In the previous section we've seen a network message that was constructed
by prefixing the binary message length to the actual message. You'll find
that packing a length followed by so many bytes of data is a
frequently used recipe since appending a null byte won't work
if a null byte may be part of the data.
Here is an example where both
techniques are used: after two null terminated strings with source and
destination address, a Short Message (to a mobile phone) is sent after
a length byte:

   my $msg = pack( 'Z*Z*CA*', $src, $dst, length( $sm ), $sm );

Unpacking this message can be done with the same template:

   ( $src, $dst, $len, $sm ) = unpack( 'Z*Z*CA*', $msg );

There's a subtle trap lurking in the offing: Adding another field after
the Short Message (in variable C<$sm>) is all right when packing, but this
cannot be unpacked naively:

   # pack a message
   my $msg = pack( 'Z*Z*CA*C', $src, $dst, length( $sm ), $sm, $prio );

   # unpack fails - $prio remains undefined!
   ( $src, $dst, $len, $sm, $prio ) = unpack( 'Z*Z*CA*C', $msg );

The pack code C<A*> gobbles up all remaining bytes, and C<$prio> remains
undefined! Before we let disappointment dampen the morale: Perl's got
the trump card to make this trick too, just a little further up the sleeve.
Watch this:

   # pack a message: ASCIIZ, ASCIIZ, length/string, byte
   my $msg = pack( 'Z* Z* C/A* C', $src, $dst, $sm, $prio );

   # unpack
   ( $src, $dst, $sm, $prio ) = unpack( 'Z* Z* C/A* C', $msg );

Combining two pack codes with a slash (C</>) associates them with a single
value from the argument list. In C<pack>, the length of the argument is
taken and packed according to the first code while the argument itself
is added after being converted with the template code after the slash.
This saves us the trouble of inserting the C<length> call, but it is
in C<unpack> where we really score: The value of the length byte marks the
end of the string to be taken from the buffer. Since this combination
doesn't make sense except when the second pack code is C<a*>, C<A*>
or C<Z*>, Perl won't let you.
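
To see the length byte at work, here is a small self-contained sketch of the
round trip. The sample values are hypothetical and were chosen only so that
the byte counts are easy to follow:

   use strict;
   use warnings;

   # Hypothetical sample values, chosen only for this illustration.
   my ( $src, $dst, $sm, $prio ) = ( 'alice', 'bob', 'See you at 8', 3 );

   my $msg = pack( 'Z* Z* C/A* C', $src, $dst, $sm, $prio );

   # "alice\0" (6) + "bob\0" (4) + length byte (1) + 12 characters + $prio (1)
   print length( $msg ), "\n";          # prints 24

   my @fields = unpack( 'Z* Z* C/A* C', $msg );
   print join( '|', @fields ), "\n";    # prints "alice|bob|See you at 8|3"

Because C<unpack> reads the length byte first, the field packed after the
Short Message is recovered as well.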

The pack code preceding C</> may be anything that's fit to represent a
number: All the numeric binary pack codes, and even text codes such as
C<A4> or C<Z*>:

   # pack/unpack a string preceded by its length in ASCII
   my $buf = pack( 'A4/A*', "Humpty-Dumpty" );
   # unpack $buf: '13  Humpty-Dumpty'
   my $txt = unpack( 'A4/A*', $buf );

C</> is not implemented in Perls before 5.6, so if your code is required to
work on older Perls you'll need to C<unpack( 'Z* Z* C')> to get the length,
then use it to make a new unpack string. For example

   # pack a message: ASCIIZ, ASCIIZ, length, string, byte (5.005 compatible)
   my $msg = pack( 'Z* Z* C A* C', $src, $dst, length $sm, $sm, $prio );

   # unpack
   ( undef, undef, $len) = unpack( 'Z* Z* C', $msg );
   ($src, $dst, $sm, $prio) = unpack ( "Z* Z* x A$len C", $msg );

But that second C<unpack> is rushing ahead. It isn't using a simple literal
string for the template. So maybe we should introduce...

=head2 Dynamic Templates

So far, we've seen literals used as templates. If the list of pack
items doesn't have fixed length, an expression constructing the
template is required (whenever, for some reason, C<()*> cannot be used).
Here's an example: To store named string values in a way that can be
conveniently parsed by a C program, we create a sequence of names and
null terminated ASCII strings, with C<=> between the name and the value,
followed by an additional delimiting null byte. Here's how:

   my $env = pack( '(A*A*Z*)' . keys( %Env ) . 'C',
                   map( { ( $_, '=', $Env{$_} ) } keys( %Env ) ), 0 );

Let's examine the cogs of this byte mill, one by one. There's the C<map>
call, creating the items we intend to stuff into the C<$env> buffer:
to each key (in C<$_>) it adds the C<=> separator and the hash entry value.
Each triplet is packed with the template code sequence C<A*A*Z*> that
is repeated according to the number of keys. (Yes, that's what the C<keys>
function returns in scalar context.) To get the very last null byte,
we add a C<0> at the end of the C<pack> list, to be packed with C<C>.
(Attentive readers may have noticed that we could have omitted the 0.)

For the reverse operation, we'll have to determine the number of items
in the buffer before we can let C<unpack> rip it apart:

   my $n = $env =~ tr/\0// - 1;
   my %env = map( split( /=/, $_ ), unpack( "(Z*)$n", $env ) );

The C<tr> counts the null bytes. The C<unpack> call returns a list of
name-value pairs each of which is taken apart in the C<map> block.
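
Putting both directions together, the following sketch runs the round trip
on a small hash. The contents of C<%Env> are hypothetical sample data; note
the assumption that no value contains C<=> or a null byte, since either
would confuse the C<split> or the null byte count:

   use strict;
   use warnings;

   # Hypothetical sample data; values must not contain "=" or null bytes.
   my %Env = ( HOME => '/home/me', SHELL => '/bin/sh' );

   my $env = pack( '(A*A*Z*)' . keys( %Env ) . 'C',
                   map( { ( $_, '=', $Env{$_} ) } keys( %Env ) ), 0 );

   # One null byte per entry plus the final delimiter.
   my $n = $env =~ tr/\0// - 1;
   my %env = map( split( /=/, $_ ), unpack( "(Z*)$n", $env ) );

   # prints "HOME=/home/me" and "SHELL=/bin/sh", one per line
   print "$_=$env{$_}\n" for sort keys %env;

The order in which the entries are packed depends on the hash, which is why
the result is sorted before printing.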


=head2 Counting Repetitions

Rather than storing a sentinel at the end of a data item (or a list of items),
we could precede the data with a count. Again, we pack keys and values of
a hash, preceding each with an unsigned short length count, and up front
we store the number of pairs:

   my $env = pack( 'S(S/A* S/A*)*', scalar keys( %Env ), %Env );

This simplifies the reverse operation as the number of repetitions can be
unpacked with the C</> code:

   my %env = unpack( 'S/(S/A* S/A*)', $env );

Note that this is one of the rare cases where you cannot use the same
template for C<pack> and C<unpack> because C<pack> can't determine
a repeat count for a C<()>-group.


=head1 Packing and Unpacking C Structures

In previous sections we have seen how to pack numbers and character
strings. If it were not for a couple of snags we could conclude this
section right away with the terse remark that C structures don't
contain anything else, and therefore you already know all there is to it.
Sorry, no: read on, please.

=head2 The Alignment Pit

In the consideration of speed against memory requirements the balance
has been tilted in favor of faster execution.
This has influenced the
way C compilers allocate memory for structures: On architectures
where a 16-bit or 32-bit operand can be moved faster between places in
memory, or to or from a CPU register, if it is aligned at an even or
multiple-of-four or even at a multiple-of-eight address, a C compiler
will give you this speed benefit by stuffing extra bytes into structures.
If you don't cross the C shoreline this is not likely to cause you any
grief (although you should care when you design large data structures,
or you want your code to be portable between architectures (you do want
that, don't you?)).

To see how this affects C<pack> and C<unpack>, we'll compare these two
C structures:

   typedef struct {
     char     c1;
     short    s;
     char     c2;
     long     l;
   } gappy_t;

   typedef struct {
     long     l;
     short    s;
     char     c1;
     char     c2;
   } dense_t;

Typically, a C compiler allocates 12 bytes to a C<gappy_t> variable, but
requires only 8 bytes for a C<dense_t>.
After investigating this further,
we can draw memory maps, showing where the extra 4 bytes are hidden:

   0           +4          +8          +12
   +--+--+--+--+--+--+--+--+--+--+--+--+
   |c1|xx|  s  |c2|xx|xx|xx|     l     |    xx = fill byte
   +--+--+--+--+--+--+--+--+--+--+--+--+
   gappy_t

   0           +4          +8
   +--+--+--+--+--+--+--+--+
   |     l     |  s  |c1|c2|
   +--+--+--+--+--+--+--+--+
   dense_t

And that's where the first quirk strikes: C<pack> and C<unpack>
templates have to be stuffed with C<x> codes to get those extra fill bytes.

The natural question: "Why can't Perl compensate for the gaps?" warrants
an answer. One good reason is that C compilers might provide (non-ANSI)
extensions permitting all sorts of fancy control over the way structures
are aligned, even at the level of an individual structure field. And, if
this were not enough, there is an insidious thing called C<union> where
the amount of fill bytes cannot be derived from the alignment of the next
item alone.

OK, so let's bite the bullet.
Here's one way to get the alignment right
by inserting template codes C<x>, which don't take a corresponding item
from the list:

   my $gappy = pack( 'cxs cxxx l!', $c1, $s, $c2, $l );

Note the C<!> after C<l>: We want to make sure that we pack a long
integer as it is compiled by our C compiler. And even now, it will only
work for the platforms where the compiler aligns things as above.
And somebody somewhere has a platform where it doesn't.
[Probably a Cray, where C<short>s, C<int>s and C<long>s are all 8 bytes. :-)]

Counting bytes and watching alignments in lengthy structures is bound to
be a drag. Isn't there a way we can create the template with a simple
program? Here's a C program that does the trick:

   #include <stdio.h>
   #include <stddef.h>

   typedef struct {
     char     fc1;
     short    fs;
     char     fc2;
     long     fl;
   } gappy_t;

   /* offsetof yields a size_t, so cast it to match the %d conversion */
   #define Pt(struct,field,tchar) \
     printf( "@%d%s ", (int)offsetof(struct,field), # tchar );

   int main() {
     Pt( gappy_t, fc1, c  );
     Pt( gappy_t, fs,  s! );
     Pt( gappy_t, fc2, c  );
     Pt( gappy_t, fl,  l! );
     printf( "\n" );
     return 0;
   }

The output line can be used as a template in a C<pack> or C<unpack> call:

   my $gappy = pack( '@0c @2s! @4c @8l!', $c1, $s, $c2, $l );

Gee, yet another template code - as if we hadn't plenty. But
C<@> saves our day by enabling us to specify the offset from the beginning
of the pack buffer to the next item: This is just the value
the C<offsetof> macro (defined in C<E<lt>stddef.hE<gt>>) returns when
given a C<struct> type and one of its field names ("member-designator" in
C standardese).

Neither using offsets nor adding C<x>'s to bridge the gaps is satisfactory.
(Just imagine what happens if the structure changes.) What we really need
is a way of saying "skip as many bytes as required to the next multiple of N".
In fluent Templatese, you say this with C<x!N> where N is replaced by the
appropriate value. Here's the next version of our struct packaging:

   my $gappy = pack( 'c x!2 s c x!4 l!', $c1, $s, $c2, $l );

That's certainly better, but we still have to know how long all the
integers are, and portability is far away. Rather than C<2>,
for instance, we want to say "however long a short is". But this can be
done by enclosing the appropriate pack code in brackets: C<[s]>. So, here's
the very best we can do:

   my $gappy = pack( 'c x![s] s c x![l!] l!', $c1, $s, $c2, $l );


=head2 Alignment, Take 2

I'm afraid that we're not quite through with the alignment catch yet. The
hydra raises another ugly head when you pack arrays of structures:

   typedef struct {
     short    count;
     char     glyph;
   } cell_t;

   typedef cell_t buffer_t[BUFLEN];

Where's the catch? Padding is neither required before the first field C<count>,
nor between this and the next field C<glyph>, so why can't we simply pack
like this:

   # something goes wrong here:
   pack( 's!a' x @buffer,
         map{ ( $_->{count}, $_->{glyph} ) } @buffer );

This packs C<3*@buffer> bytes, but it turns out that the size of
C<buffer_t> is four times C<BUFLEN>! The moral of the story is that
the required alignment of a structure or array is propagated to the
next higher level where we have to consider padding I<at the end>
of each component as well.
Thus the correct template is:

   pack( 's!ax' x @buffer,
         map{ ( $_->{count}, $_->{glyph} ) } @buffer );

=head2 Alignment, Take 3

And even if you take all the above into account, ANSI still lets this:

   typedef struct {
     char     foo[2];
   } foo_t;

vary in size. The alignment constraint of the structure can be greater than
any of its elements. [And if you think that this doesn't affect anything
common, dismember the next cellphone that you see. Many have ARM cores, and
the ARM structure rules make C<sizeof (foo_t)> == 4]

=head2 Pointers for How to Use Them

The title of this section indicates the second problem you may run into
sooner or later when you pack C structures. If the function you intend
to call expects a, say, C<void *> value, you I<cannot> simply take
a reference to a Perl variable. (Although that value certainly is a
memory address, it's not the address where the variable's contents are
stored.)

Template code C<P> promises to pack a "pointer to a fixed length string".
Isn't this what we want? Let's try:

   # allocate some storage and pack a pointer to it
   my $memory = "\x00" x $size;
   my $memptr = pack( 'P', $memory );

But wait: doesn't C<pack> just return a sequence of bytes?
How can we pass this
string of bytes to some C code expecting a pointer which is, after all,
nothing but a number? The answer is simple: We have to obtain the numeric
address from the bytes returned by C<pack>.

   my $ptr = unpack( 'L!', $memptr );

Obviously this assumes that it is possible to typecast a pointer
to an unsigned long and vice versa, which frequently works but should not
be taken as a universal law. - Now that we have this pointer the next question
is: How can we put it to good use? We need a call to some C function
where a pointer is expected. The read(2) system call comes to mind:

   ssize_t read(int fd, void *buf, size_t count);

After reading L<perlfunc> explaining how to use C<syscall> we can write
this Perl function copying a file to standard output:

   require 'syscall.ph';
   sub cat($){
       my $path = shift();
       my $size = -s $path;
       my $memory = "\x00" x $size;  # allocate some memory
       my $ptr = unpack( 'L!', pack( 'P', $memory ) );
       open( F, $path ) || die( "$path: cannot open ($!)\n" );
       my $fd = fileno(F);
       my $res = syscall( &SYS_read, $fd, $ptr, $size );
       print $memory;
       close( F );
   }

This is neither a specimen of simplicity nor a paragon of portability but
it illustrates the point: We
are able to sneak behind the scenes and
access Perl's otherwise well-guarded memory! (Important note: Perl's
C<syscall> does I<not> require you to construct pointers in this roundabout
way. You simply pass a string variable, and Perl forwards the address.)

How does C<unpack> with C<P> work? Imagine some pointer in the buffer
about to be unpacked: If it isn't the null pointer (which will smartly
produce the C<undef> value) we have a start address - but then what?
Perl has no way of knowing how long this "fixed length string" is, so
it's up to you to specify the actual size as an explicit length after C<P>.

   my $mem = "abcdefghijklmn";
   print unpack( 'P5', pack( 'P', $mem ) ); # prints "abcde"

As a consequence, C<pack> ignores any number or C<*> after C<P>.


Now that we have seen C<P> at work, we might as well give C<p> a whirl.
Why do we need a second template code for packing pointers at all?
The
answer lies behind the simple fact that an C<unpack> with C<p> promises
a null-terminated string starting at the address taken from the buffer,
and that implies a length for the data item to be returned:

   my $buf = pack( 'p', "abc\x00efhijklmn" );
   print unpack( 'p', $buf );    # prints "abc"


This is apt to be confusing: since the length is implied by the string's
length, a number after pack code C<p> is a repeat count, not a length as
after C<P>.


Using C<pack(..., $x)> with C<P> or C<p> to get the address where C<$x> is
actually stored must be used with circumspection. Perl's internal machinery
considers the relation between a variable and that address as its very own
private matter and doesn't really care that we have obtained a copy. Therefore:

=over 4

=item *

Do not use C<pack> with C<p> or C<P> to obtain the address of a variable
that's bound to go out of scope (thereby freeing its memory) before you
are done with using the memory at that address.

=item *

Be very careful with Perl operations that change the value of the
variable. Appending something to the variable, for instance, might require
reallocation of its storage, leaving you with a pointer into no-man's land.
1085*0Sstevel@tonic-gate 1086*0Sstevel@tonic-gate=item * 1087*0Sstevel@tonic-gate 1088*0Sstevel@tonic-gateDon't think that you can get the address of a Perl variable 1089*0Sstevel@tonic-gatewhen it is stored as an integer or double number! C<pack('P', $x)> will 1090*0Sstevel@tonic-gateforce the variable's internal representation to string, just as if you 1091*0Sstevel@tonic-gatehad written something like C<$x .= ''>. 1092*0Sstevel@tonic-gate 1093*0Sstevel@tonic-gate=back 1094*0Sstevel@tonic-gate 1095*0Sstevel@tonic-gateIt's safe, however, to P- or p-pack a string literal, because Perl simply 1096*0Sstevel@tonic-gateallocates an anonymous variable. 1097*0Sstevel@tonic-gate 1098*0Sstevel@tonic-gate 1099*0Sstevel@tonic-gate 1100*0Sstevel@tonic-gate=head1 Pack Recipes 1101*0Sstevel@tonic-gate 1102*0Sstevel@tonic-gateHere are a collection of (possibly) useful canned recipes for C<pack> 1103*0Sstevel@tonic-gateand C<unpack>: 1104*0Sstevel@tonic-gate 1105*0Sstevel@tonic-gate # Convert IP address for socket functions 1106*0Sstevel@tonic-gate pack( "C4", split /\./, "123.4.5.6" ); 1107*0Sstevel@tonic-gate 1108*0Sstevel@tonic-gate # Count the bits in a chunk of memory (e.g. 
a select vector) 1109*0Sstevel@tonic-gate unpack( '%32b*', $mask ); 1110*0Sstevel@tonic-gate 1111*0Sstevel@tonic-gate # Determine the endianness of your system 1112*0Sstevel@tonic-gate $is_little_endian = unpack( 'c', pack( 's', 1 ) ); 1113*0Sstevel@tonic-gate $is_big_endian = unpack( 'xc', pack( 's', 1 ) ); 1114*0Sstevel@tonic-gate 1115*0Sstevel@tonic-gate # Determine the number of bits in a native integer 1116*0Sstevel@tonic-gate $bits = unpack( '%32I!', ~0 ); 1117*0Sstevel@tonic-gate 1118*0Sstevel@tonic-gate # Prepare argument for the nanosleep system call 1119*0Sstevel@tonic-gate my $timespec = pack( 'L!L!', $secs, $nanosecs ); 1120*0Sstevel@tonic-gate 1121*0Sstevel@tonic-gateFor a simple memory dump we unpack some bytes into just as 1122*0Sstevel@tonic-gatemany pairs of hex digits, and use C<map> to handle the traditional 1123*0Sstevel@tonic-gatespacing - 16 bytes to a line: 1124*0Sstevel@tonic-gate 1125*0Sstevel@tonic-gate my $i; 1126*0Sstevel@tonic-gate print map( ++$i % 16 ? "$_ " : "$_\n", 1127*0Sstevel@tonic-gate unpack( 'H2' x length( $mem ), $mem ) ), 1128*0Sstevel@tonic-gate length( $mem ) % 16 ? "\n" : ''; 1129*0Sstevel@tonic-gate 1130*0Sstevel@tonic-gate 1131*0Sstevel@tonic-gate=head1 Funnies Section 1132*0Sstevel@tonic-gate 1133*0Sstevel@tonic-gate # Pulling digits out of nowhere... 1134*0Sstevel@tonic-gate print unpack( 'C', pack( 'x' ) ), 1135*0Sstevel@tonic-gate unpack( '%B*', pack( 'A' ) ), 1136*0Sstevel@tonic-gate unpack( 'H', pack( 'A' ) ), 1137*0Sstevel@tonic-gate unpack( 'A', unpack( 'C', pack( 'A' ) ) ), "\n"; 1138*0Sstevel@tonic-gate 1139*0Sstevel@tonic-gate # One for the road ;-) 1140*0Sstevel@tonic-gate my $advice = pack( 'all u can in a van' ); 1141*0Sstevel@tonic-gate 1142*0Sstevel@tonic-gate 1143*0Sstevel@tonic-gate=head1 Authors 1144*0Sstevel@tonic-gate 1145*0Sstevel@tonic-gateSimon Cozens and Wolfgang Laun. 1146*0Sstevel@tonic-gate 1147