.\" $NetBSD: magic.5,v 1.11 2013/12/01 19:32:14 christos Exp $
.\"
.\" $File: magic.man,v 1.79 2013/04/22 15:30:10 christos Exp $
.Dd April 22, 2013
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file 1
command, version 5.16.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa /usr/share/misc/magic
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtbT]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
The
.Dq T
flag causes the string to be trimmed, i.e. leading and trailing whitespace
is deleted before the string is printed.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as the
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qwdate
An eight-byte value interpreted as a Windows-style date.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqwdate
An eight-byte value in big-endian byte order,
interpreted as a Windows-style date.
.It Dv bestring16
A two-byte unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqwdate
An eight-byte value in little-endian byte order,
interpreted as a Windows-style date.
.It Dv lestring16
A two-byte unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
.It Dv name
Define a
.Dq named
magic instance that can be called from another
.Dv use
magic entry, like a subroutine call.
Named instance direct magic offsets are relative to the offset of the
previous matched entry, but indirect offsets are relative to the beginning
of the file as usual.
Named magic entries always match.
.It Dv use
Recursively call the named magic starting from the current offset.
If the name of the referenced magic begins with a
.Dv ^
then the endianness of the magic is switched; if the magic mentioned
.Dv leshort
for example,
it is treated as
.Dv beshort
and vice versa.
This is useful to avoid duplicating the rules for different endianness.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The type specification can be optionally followed by
.Dv /[c][s] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The regular expression is tested against line
.Dv N + 1
onwards, where
.Dv N
is the given offset.
Line endings are assumed to be in the machine's native format.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The modifier flags (if any) must be followed by
.Dv /number ,
the range, that is, the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The offset works as for regex.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and it has no type.
It matches when no other test at that continuation level has matched before.
The record of matched tests for a continuation level can be cleared using the
.Dv clear
test.
.It Dv clear
This test is always true and clears the match flag for that continuation level.
It is intended to be used with the
.Dv default
test.
.El
.Pp
For compatibility with the Single
.Ux
Standard, the type specifiers
.Dv dC
and
.Dv d1
are equivalent to
.Dv byte ,
the type specifiers
.Dv uC
and
.Dv u1
are equivalent to
.Dv ubyte ,
the type specifiers
.Dv dS
and
.Dv d2
are equivalent to
.Dv short ,
the type specifiers
.Dv uS
and
.Dv u2
are equivalent to
.Dv ushort ,
the type specifiers
.Dv dI ,
.Dv dL ,
and
.Dv d4
are equivalent to
.Dv long ,
the type specifiers
.Dv uI ,
.Dv uL ,
and
.Dv u4
are equivalent to
.Dv ulong ,
the type specifier
.Dv d8
is equivalent to
.Dv quad ,
the type specifier
.Dv u8
is equivalent to
.Dv uquad ,
and the type specifier
.Dv s
is equivalent to
.Dv string .
In addition, the type specifier
.Dv dQ
is equivalent to
.Dv quad
and the type specifier
.Dv uQ
is equivalent to
.Dv uquad .
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text test when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear at least one of the bits
that are set in the specified value,
.Dv ~ ,
to specify that the value specified after it is negated before it is tested, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
An Apple 4+4 character creator and type can be specified as:
.Bd -literal -offset indent
!:apple CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
line that is neither blank nor a comment after the magic line that
identifies the file type, and has the following format:
.Bd -literal -offset indent
!:mime MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable
\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em (( x [.[bislBISL]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
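.Pp
As a minimal sketch of the indirect offset syntax, assume a hypothetical
format (invented here for illustration; it is not an entry from the shipped
magic database) that begins with the string
.Dq HYPO ,
stores the little-endian four-byte offset of its header at byte 8,
and keeps a two-byte version number 4 bytes into that header:
.Bd -literal -offset indent
# hypothetical format, for illustration only
0 string HYPO hypothetical container
# read a little-endian long at offset 8, add 4 to it, and
# test the little-endian short found at the resulting offset
\*[Gt](8.l+4) leshort x \eb, version %d
.Ed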
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (for example, when
there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
.Ed
.Pp
If you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4 search/0x140 .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
.Ed
.Pp
If you have a list of known values at a particular continuation level,
and you want to provide a switch-like default case:
.Bd -literal -offset indent
# clear that continuation level match
\*[Gt]18 clear
\*[Gt]18 lelong 1 one
\*[Gt]18 lelong 2 two
\*[Gt]18 default x
# print default match
\*[Gt]\*[Gt]18 lelong x unmatched 0x%x
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
and
.Dv leshort
do not depend on the length of the C data types
.Dv short
and
.Dv long
on the platform, even though the Single
.Ux
Specification implies that they do.
However, as OS X Mountain Lion has
passed the Single
.Ux
Specification validation suite, and supplies a version of
.Xr file 1
in which they do not depend on the sizes of the C data types and that is
built for a 64-bit environment in which
.Dv long
is 8 bytes rather than 4 bytes, presumably the validation suite does not
test whether, for example,
.Dv long
refers to an item with the same size as the C data type
.Dv long .
There should probably be
.Dv type
names
.Dv int8 ,
.Dv uint8 ,
.Dv int16 ,
.Dv uint16 ,
.Dv int32 ,
.Dv uint32 ,
.Dv int64 ,
and
.Dv uint64 ,
and specified-byte-order variants of them,
to make it clearer that those types have specified widths.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.