.\" $NetBSD: magic.5,v 1.2 2009/05/08 16:39:46 christos Exp $
.\"
.\" $File: magic.man,v 1.59 2008/11/06 23:22:53 christos Exp $
.Dd August 30, 2008
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file 1
command, version 5.03.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa /usr/share/misc/magic
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[Bbc]*.
The
.Dq B
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq b
flag treats every blank in the target as an optional blank.
Finally, the
.Dq c
flag specifies case-insensitive matching: lowercase
characters in the magic match both lowercase and uppercase characters in the
target, whereas uppercase characters in the magic only match uppercase
characters in the target.
.It Dv pstring
A Pascal-style string where the first byte is interpreted as the
unsigned length.
The string is not NUL terminated.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv bestring16
A string of two-byte Unicode (UCS16) characters in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv lestring16
A string of two-byte Unicode (UCS16) characters in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to
process, and their performance is hard to predict, so their use is
discouraged.
When used in production environments, their performance
should be carefully checked.
The type specification can be optionally
followed by
.Dv /[c][s] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The regular expression is tested against line
.Dv N + 1
onwards, where
.Dv N
is the given offset.
Line endings are assumed to be in the machine's native format.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The modifier flags (if any) must be followed by
.Dv /number ,
the range, that is, the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The offset works as for
.Dv regex .
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and a message that is to be used if there are
no other matches.
.El
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text pattern when all its tests are text
tests; otherwise, it is considered to be a binary pattern.
When matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
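.Pp
The following entries are an illustrative sketch, not excerpts from the
distributed magic database; the message texts are assumptions.
The first three lines form a binary pattern: they match the eight-byte PNG
signature and then read the image width and height, which PNG stores as
big-endian four-byte values at offsets 16 and 20.
Using
.Dv long
instead of
.Dv belong
there would read the fields in the host's byte order and print wrong
values on little-endian machines.
The last line uses a
.Dv search
test with a printable pattern, so it is classified as a text pattern and
is only tried if no binary pattern matches and the file looks like text:
.Bd -literal -offset indent
0 string \ex89PNG\ex0d\ex0a\ex1a\ex0a PNG image data
\*[Gt]16 belong x \eb, %d x
\*[Gt]20 belong x %d
0 search/1 #!/bin/sh POSIX shell script text
.Ed
.Pp
For a 640 by 480 image, the first group prints
.Dq PNG image data, 640 x 480 .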
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for newline).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have at least one of the bits
that are set in the specified value clear,
.Dv ~ ,
to specify that the value given after it is negated before the test is
performed, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
do not work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually so that
the string can then be printed), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
An Apple 4+4 character creator and type can be specified as:
.Bd -literal -offset indent
!:apple CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
non-blank, non-comment line after the magic line that identifies the
file type, and has the following format:
.Bd -literal -offset indent
!:mime MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
.Dq if/then
effect, in the following way:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable
\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em (
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em ( x [.[bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big-endian
value, whereas the lowercase versions interpret the number as a
little-endian value;
the
.Em m
type interprets the number as a middle-endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
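.Pp
As a worked example of the arithmetic, consider the following sketch
(illustrative entries, not copied from the distributed magic database).
The offset of the PE signature is stored as a little-endian long at
offset 0x3c, and the two-byte machine type immediately follows the
four-byte signature:
.Bd -literal -offset indent
0 string MZ
\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l+4) leshort 0x14c for Intel 80386
.Ed
.Pp
If the four bytes at offset 0x3c are 80 00 00 00, that is, the
little-endian value 0x80, the second line compares the data at offset
0x80 with
.Dq PE\e0\e0 ,
and the third line reads a little-endian short at 0x80 + 4 = 0x84.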
.Pp
Indirect offsets make it possible to examine variable-length structures:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (for
example, when there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
.Ed
.Pp
Finally, if you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4 search/0x140 .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
.Dv leshort ,
.Dv date ,
.Dv bedate ,
.Dv medate ,
.Dv ledate ,
.Dv beldate ,
.Dv leldate ,
and
.Dv meldate
are system-dependent; perhaps they should be specified as a number
of bytes (2B, 4B, etc.),
since the files being recognized typically come from
a system on which the lengths are invariant.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.