.\" $NetBSD: magic.5,v 1.4 2009/05/08 20:20:39 wiz Exp $
.\"
.\" $File: magic.man,v 1.59 2008/11/06 23:22:53 christos Exp $
.Dd August 30, 2008
.Dt MAGIC 5
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file 1
command, version 5.03.
The
.Xr file 1
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa /usr/share/misc/magic
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[Bbc]*.
The
.Dq B
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq b
flag treats every blank in the target as an optional blank.
Finally, the
.Dq c
flag specifies case-insensitive matching: lowercase
characters in the magic match both lowercase and uppercase characters in the
target, whereas uppercase characters in the magic only match uppercase
characters in the target.
.It Dv pstring
A Pascal-style string where the first byte is interpreted as an
unsigned length.
The string is not NUL terminated.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv bestring16
A two-byte Unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv lestring16
A two-byte Unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The type specification can be optionally followed by
.Dv /[c][s] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The regular expression is tested against line
.Dv N + 1
onwards, where
.Dv N
is the given offset.
Line endings are assumed to be in the machine's native format.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The modifier flags (if any) must be followed by
.Dv /number ,
the range, that is, the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The offset works as for regex.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and a message that is to be used if there are
no other matches.
.El
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text test when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear at least one of the bits
that are set in the specified value,
.Dv ~ ,
to specify that the value specified after it is negated before it is tested, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
An Apple 4+4 character creator and type can be specified as:
.Bd -literal -offset indent
!:apple	CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
line that is neither blank nor a comment after the magic line that
identifies the file type, and has the following format:
.Bd -literal -offset indent
!:mime	MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40	MS-DOS executable
\*[Gt]0x18	leshort	\*[Gt]0x3f	extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em (
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em \&( x [.[bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big-endian
value, whereas the small letter versions interpret the number as a
little-endian value;
the
.Em m
type interprets the number as a middle-endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
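.Pp
As a small illustration of the indirect offset form, the following sketch
reads a little-endian short at offset 4, adds 2 to it, and tests for a
string at the resulting offset.
For instance, if the short at offset 4 holds 0x10, the string is tested at
offset 0x12.
The file format, the strings
.Dq ABCD
and
.Dq DATA ,
and the messages are hypothetical and chosen only for this example:
.Bd -literal -offset indent
# hypothetical container format, for illustration only
0	string	ABCD	hypothetical container
# (4.s+2): read a little-endian short at offset 4, add 2,
# and use the sum as the offset of this test
\*[Gt](4.s+2)	string	DATA	\eb, with DATA chunk
.Ed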
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40	MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l)	string	LX\e0\e0	LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (as happens
when neither PE\e0\e0 nor LX\e0\e0 is found in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0	string	MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18	leshort	\*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)	leshort	0x014c	COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512)	leshort	!0x014c	MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0	leshort	0x14c	for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0	leshort	0x184	for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)	leshort	!0x014c	MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514)	string	LE	LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	LE\e0\e0	LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26)	string	UPX	\eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	LE\e0\e0	LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)	string	UNACE	\eb, ACE self-extracting archive
.Ed
.Pp
Finally, if you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4	search/0x140	.idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4))	string	PK\e3\e4	\eb, ZIP self-extracting archive
.Ed
.Sh SEE ALSO
.Xr file 1
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
.Dv leshort ,
.Dv date ,
.Dv bedate ,
.Dv medate ,
.Dv ledate ,
.Dv beldate ,
.Dv leldate ,
and
.Dv meldate
are system-dependent; perhaps they should be specified as a number
of bytes (2B, 4B, etc),
since the files being recognized typically come from
a system on which the lengths are invariant.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.