.\" $NetBSD: magic.5,v 1.12 2014/06/13 02:08:06 christos Exp $
.\"
.\" $File: magic.man,v 1.84 2014/06/03 19:01:34 christos Exp $
.Dd June 3, 2014
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file 1
command, version 5.19.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa /usr/share/misc/magic
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtbT]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
The
.Dq T
flag causes the string to be trimmed, i.e. leading and trailing whitespace
is deleted before the string is printed.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as the
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qwdate
An eight-byte value interpreted as a Windows-style date.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqwdate
An eight-byte value in big-endian byte order,
interpreted as a Windows-style date.
.It Dv bestring16
A two-byte Unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqwdate
An eight-byte value in little-endian byte order,
interpreted as a Windows-style date.
.It Dv lestring16
A two-byte Unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
.It Dv name
Define a
.Dq named
magic instance that can be called from another
.Dv use
magic entry, like a subroutine call.
Named instance direct magic offsets are relative to the offset of the
previous matched entry, but indirect offsets are relative to the beginning
of the file as usual.
Named magic entries always match.
.It Dv use
Recursively call the named magic starting from the current offset.
If the name of the referenced magic begins with a
.Dv ^
then the endianness of the magic is switched; if the magic mentioned
.Dv leshort
for example,
it is treated as
.Dv beshort
and vice versa.
This is useful to avoid duplicating the rules for different endianness.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The size of the string to search should also be limited by specifying
.Dv /<length> ,
to avoid performance issues scanning long files.
The type specification can also be optionally followed by
.Dv /[c][s][l] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The
.Dq l
modifier changes the limit of length to mean number of lines instead of a
byte count.
Lines are delimited by the platform's native line delimiter.
When a line count is specified, an implicit byte count is also computed assuming
each line is 80 characters long.
If neither a byte nor a line count is specified, the search is limited automatically
to 8KiB.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The search expression must contain the range in the form
.Dv /number,
that is the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The order of modifier and number is not relevant.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and it has no type.
It matches when no other test at that continuation level has matched before.
Clearing the matched tests for a continuation level can be done using the
.Dv clear
test.
.It Dv clear
This test is always true and clears the match flag for that continuation level.
It is intended to be used with the
.Dv default
test.
.El
.Pp
For compatibility with the Single
.Ux
Standard, the type specifiers
.Dv dC
and
.Dv d1
are equivalent to
.Dv byte ,
the type specifiers
.Dv uC
and
.Dv u1
are equivalent to
.Dv ubyte ,
the type specifiers
.Dv dS
and
.Dv d2
are equivalent to
.Dv short ,
the type specifiers
.Dv uS
and
.Dv u2
are equivalent to
.Dv ushort ,
the type specifiers
.Dv dI ,
.Dv dL ,
and
.Dv d4
are equivalent to
.Dv long ,
the type specifiers
.Dv uI ,
.Dv uL ,
and
.Dv u4
are equivalent to
.Dv ulong ,
the type specifier
.Dv d8
is equivalent to
.Dv quad ,
the type specifier
.Dv u8
is equivalent to
.Dv uquad ,
and the type specifier
.Dv s
is equivalent to
.Dv string .
In addition, the type specifier
.Dv dQ
is equivalent to
.Dv quad
and the type specifier
.Dv uQ
is equivalent to
.Dv uquad .
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear any of the bits
that are set in the specified value,
.Dv ~ ,
to specify that the value specified after it is negated before being tested, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
Numeric operations are not performed on date types; instead the numeric
value is interpreted as an offset.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
Dates are treated as numerical values in the respective internal
representation.
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
A 4+4 character Apple creator and type can be specified as:
.Bd -literal -offset indent
!:apple	CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
non-blank or comment line after the magic line that identifies the
file type, and has the following format:
.Bd -literal -offset indent
!:mime	MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40	MS-DOS executable
\*[Gt]0x18	leshort	\*[Gt]0x3f	extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em \&( x [.[bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
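.Pp
As a minimal sketch of this syntax, assume a hypothetical file format
(invented here purely for illustration) that begins with the string
.Dq DEMO ,
stores a big-endian two-byte header length at offset 4,
and keeps a little-endian two-byte version number 2 bytes past the
position named by that length:
.Bd -literal -offset indent
# hypothetical format; the name and layout are assumptions
0	string	DEMO	demo container
# read a big-endian short at offset 4, add 2, and use the
# result as the offset of the version field
\*[Gt](4.S+2)	leshort	x	\eb, version %d
.Ed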
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40	MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l)	string	LX\e0\e0	LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (for example, when
there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0	string	MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18	leshort	\*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)	leshort	0x014c	COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512)	leshort	!0x014c	MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0	leshort	0x14c	for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0	leshort	0x184	for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)	leshort	!0x014c	MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514)	string	LE	LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	LE\e0\e0	LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26)	string	UPX	\eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	LE\e0\e0	LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)	string	UNACE	\eb, ACE self-extracting archive
.Ed
.Pp
If you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4	search/0x140	.idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4))	string	PK\e3\e4	\eb, ZIP self-extracting archive
.Ed
.Pp
If you have a list of known values at a particular continuation level,
and you want to provide a switch-like default case:
.Bd -literal -offset indent
# clear that continuation level match
\*[Gt]18	clear
\*[Gt]18	lelong	1	one
\*[Gt]18	lelong	2	two
\*[Gt]18	default	x
# print default match
\*[Gt]\*[Gt]18	lelong	x	unmatched 0x%x
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
and
.Dv leshort
do not depend on the length of the C data types
.Dv short
and
.Dv long
on the platform, even though the Single
.Ux
Specification implies that they do.
However, as OS X Mountain Lion has
passed the Single
.Ux
Specification validation suite, and supplies a version of
.Xr file 1
in which they do not depend on the sizes of the C data types and that is
built for a 64-bit environment in which
.Dv long
is 8 bytes rather than 4 bytes, presumably the validation suite does not
test whether, for example,
.Dv long
refers to an item with the same size as the C data type
.Dv long .
There should probably be
.Dv type
names
.Dv int8 ,
.Dv uint8 ,
.Dv int16 ,
.Dv uint16 ,
.Dv int32 ,
.Dv uint32 ,
.Dv int64 ,
and
.Dv uint64 ,
and specified-byte-order variants of them,
to make it clearer that those types have specified widths.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.