.\"	$NetBSD: magic.5,v 1.14 2015/01/02 21:15:32 christos Exp $
.\"
.\" $File: magic.man,v 1.85 2015/01/01 17:07:34 christos Exp $
.Dd January 1, 2015
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file 1
command, version 5.22.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa /usr/share/misc/magic
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtbT]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
The
.Dq T
flag causes the string to be trimmed, i.e. leading and trailing whitespace
is deleted before the string is printed.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as the
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qwdate
An eight-byte value interpreted as a Windows-style date.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqwdate
An eight-byte value in big-endian byte order,
interpreted as a Windows-style date.
.It Dv bestring16
A two-byte Unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqwdate
An eight-byte value in little-endian byte order,
interpreted as a Windows-style date.
.It Dv lestring16
A two-byte Unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
The offset of the
.Dv indirect
magic is by default absolute in the file, but one can specify
.Dv /r
to indicate that the offset is relative from the beginning of the entry.
.It Dv name
Define a
.Dq named
magic instance that can be called from another
.Dv use
magic entry, like a subroutine call.
Named instance direct magic offsets are relative to the offset of the
previous matched entry, but indirect offsets are relative to the beginning
of the file as usual.
Named magic entries always match.
.It Dv use
Recursively call the named magic starting from the current offset.
If the name of the referenced magic begins with a
.Dv ^
then the endianness of the magic is switched; if the magic mentioned
.Dv leshort
for example,
it is treated as
.Dv beshort
and vice versa.
This is useful to avoid duplicating the rules for different endianness.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The size of the string to search should also be limited by specifying
.Dv /<length> ,
to avoid performance issues scanning long files.
The type specification can also be optionally followed by
.Dv /[c][s][l] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The
.Dq l
modifier changes the limit of length to mean number of lines instead of a
byte count.
Lines are delimited by the platform's native line delimiter.
When a line count is specified, an implicit byte count is also computed assuming
each line is 80 characters long.
If neither a byte nor a line count is specified, the search is limited automatically
to 8KiB.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The search expression must contain the range in the form
.Dv /number ,
that is, the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The order of modifier and number is not relevant.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and it has no type.
It matches when no other test at that continuation level has matched before.
Clearing the matched tests for a continuation level can be done using the
.Dv clear
test.
.It Dv clear
This test is always true and clears the match flag for that continuation level.
It is intended to be used with the
.Dv default
test.
.El
.Pp
For compatibility with the Single
.Ux
Standard, the type specifiers
.Dv dC
and
.Dv d1
are equivalent to
.Dv byte ,
the type specifiers
.Dv uC
and
.Dv u1
are equivalent to
.Dv ubyte ,
the type specifiers
.Dv dS
and
.Dv d2
are equivalent to
.Dv short ,
the type specifiers
.Dv uS
and
.Dv u2
are equivalent to
.Dv ushort ,
the type specifiers
.Dv dI ,
.Dv dL ,
and
.Dv d4
are equivalent to
.Dv long ,
the type specifiers
.Dv uI ,
.Dv uL ,
and
.Dv u4
are equivalent to
.Dv ulong ,
the type specifier
.Dv d8
is equivalent to
.Dv quad ,
the type specifier
.Dv u8
is equivalent to
.Dv uquad ,
and the type specifier
.Dv s
is equivalent to
.Dv string .
In addition, the type specifier
.Dv dQ
is equivalent to
.Dv quad
and the type specifier
.Dv uQ
is equivalent to
.Dv uquad .
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
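.Pp
As an illustrative sketch of this classification (the two entries below
are made up for this example and are not quoted from the shipped magic
database), the first entry uses only a
.Dv search
test with printable characters and would be classified as a text pattern,
while the second uses a numeric type and would be classified as a binary
pattern:
.Bd -literal -offset indent
# text pattern: search test, printable characters only (example entry)
0  search/256  BEGIN:VCARD  vCard-like text (example)
# binary pattern: numeric test (example entry)
0  belong      0xcafebabe   example binary format
.Ed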
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear at least one of
the bits that are set in the specified value,
.Dv ~ ,
to specify that the value specified after it is negated before it is tested,
or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
Numeric operations are not performed on date types; instead, the numeric
value is interpreted as an offset.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
Dates are treated as numerical values in the respective internal
representation.
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
An Apple 4+4 character creator and type can be specified as:
.Bd -literal -offset indent
!:apple  CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
non-blank, non-comment line after the magic line that identifies the
file type, and has the following format:
.Bd -literal -offset indent
!:mime  MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
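.Pp
As a hedged sketch of how the message field and these annotations fit
together (the entry below is illustrative; the message text, MIME type
and strength adjustment are assumptions chosen for the example, not
quotations from the shipped magic database):
.Bd -literal -offset indent
# example entry: message, MIME type and strength adjustment
0   string  %PDF-  PDF document
!:mime  application/pdf
!:strength + 10
\*[Gt]5  byte    x      \eb, version %c
\*[Gt]7  byte    x      \eb.%c
.Ed
.Pp
Here the
.Xr printf 3
format
.Dq %c
prints the byte read from the file, and the leading
.Dq \eb
suppresses the separating space, so a file beginning with
.Dq %PDF-1.4
would be expected to be reported as
.Dq PDF document, version 1.4 .
The lines beginning with
.Em \*[Gt]
are continuation tests.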
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0      string   MZ
\*[Gt]0x18  leshort  \*[Lt]0x40  MS-DOS executable
\*[Gt]0x18  leshort  \*[Gt]0x3f  extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em \&( x [.[bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0           string   MZ
\*[Gt]0x18       leshort  \*[Lt]0x40   MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18       leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)  string   PE\e0\e0  PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l)  string   LX\e0\e0  LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (for
instance, when there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0            string   MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18        leshort  \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)  leshort  0x014c   COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512)  leshort  !0x014c  MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0           string   MZ
\*[Gt]0x18       leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)  string   PE\e0\e0  PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort  0x14c   for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort  0x184   for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0              string   MZ
\*[Gt]0x18          leshort  \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)    leshort  !0x014c  MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514)  string   LE       LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0                  string   MZ
\*[Gt]0x18              leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)         string   LE\e0\e0  LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26)  string   UPX     \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0                string   MZ
\*[Gt]0x18            leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)       string   LE\e0\e0  LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)  string   UNACE   \eb, ACE self-extracting archive
.Ed
.Pp
If you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0                  string        MZ
\*[Gt]0x18              leshort       \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)         string        PE\e0\e0  PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4           search/0x140  .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4))  string        PK\e3\e4  \eb, ZIP self-extracting archive
.Ed
.Pp
If you have a list of known values at a particular continuation level,
and you want to provide a switch-like default case:
.Bd -literal -offset indent
# clear that continuation level match
\*[Gt]18   clear
\*[Gt]18   lelong   1  one
\*[Gt]18   lelong   2  two
\*[Gt]18   default  x
# print default match
\*[Gt]\*[Gt]18  lelong   x  unmatched 0x%x
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
and
.Dv leshort
do not depend on the length of the C data types
.Dv short
and
.Dv long
on the platform, even though the Single
.Ux
Specification implies that they do.
However, as OS X Mountain Lion has
passed the Single
.Ux
Specification validation suite, and supplies a version of
.Xr file 1
in which they do not depend on the sizes of the C data types and that is
built for a 64-bit environment in which
.Dv long
is 8 bytes rather than 4 bytes, presumably the validation suite does not
test whether, for example,
.Dv long
refers to an item with the same size as the C data type
.Dv long .
There should probably be
.Dv type
names
.Dv int8 ,
.Dv uint8 ,
.Dv int16 ,
.Dv uint16 ,
.Dv int32 ,
.Dv uint32 ,
.Dv int64 ,
and
.Dv uint64 ,
and specified-byte-order variants of them,
to make it clearer that those types have specified widths.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.