.\" $File: magic.man,v 1.68 2011/04/20 19:08:44 christos Exp $
.Dd April 20, 2011
.Dt MAGIC __FSECTION__
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of the magic file as
used by the
.Xr file __CSECTION__
command, version __VERSION__.
The
.Xr file __CSECTION__
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The file
.Pa __MAGIC__
specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
Each line of the file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtb]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as an
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a Unix date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a Unix date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv bestring16
A two-byte unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv lestring16
A two-byte unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep). Regular expressions can take exponential time to
process, and their performance is hard to predict, so their use is
discouraged. When used in production environments, their performance
should be carefully checked. The type specification can be optionally
followed by
.Dv /[c][s] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The regular expression is tested against line
.Dv N + 1
onwards, where
.Dv N
is the given offset.
Line endings are assumed to be in the machine's native format.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset. The same
modifier flags can be used as for string patterns. The modifier flags
(if any) must be followed by
.Dv /number
the range, that is, the number of positions at which the match will be
attempted, starting from the start offset. This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The offset works as for regex.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and a message that is to be used if there are
no other matches.
.El
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used. Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern. All other tests are classified as binary. A top-level
pattern is considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern. When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear any of the bits
that are set in the specified value,
.Dv ~ ,
to specify that the value given after it is negated before being tested, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
A 4+4 character APPLE creator and type can be specified as:
.Bd -literal -offset indent
!:apple	CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the first
line that is neither blank nor a comment after the magic line that
identifies the file type, and has the following format:
.Bd -literal -offset indent
!:mime	MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
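.Pp
The hierarchy can be sketched with a hypothetical format; the signature
.Dq ABCD ,
the offsets and the messages below are invented for illustration
and do not describe a real file type:
.Bd -literal -offset indent
# level 0: hypothetical four-byte signature
0       string  ABCD    Example container
# level 1: tried only if the level 0 test succeeded
\*[Gt]4      byte    1       \eb, version 1
\*[Gt]4      byte    2       \eb, version 2
# level 2: tried only if the "version 2" test just above succeeded
\*[Gt]\*[Gt]5     byte    x       \eb, flags %d
.Ed
.Pp
Under these assumptions, a file that starts with
.Dq ABCD
and has the value 2 in the byte at offset 4 would have the level 0 message,
the second level 1 message and the level 2 message printed, in that order.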
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0       string   MZ
\*[Gt]0x18   leshort  \*[Lt]0x40   MS-DOS executable
\*[Gt]0x18   leshort  \*[Gt]0x3f   extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em (( x [.[bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
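.Pp
As a hypothetical worked example (the signature, byte values and message
are invented for illustration), suppose the two bytes at offset 4 of a
file are 0x10 0x00.
The indirect offset
.Em (4.s+2)
reads them as a little endian short, giving 0x10, and adds 2,
so the string test below is applied at offset 0x12:
.Bd -literal -offset indent
# hypothetical signature and message, not a real format
0          string  ABCD
\*[Gt](4.s+2)   string  DATA    Example file with data chunk
.Ed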
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0            string   MZ
\*[Gt]0x18        leshort  \*[Lt]0x40      MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18        leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)   string   PE\e0\e0     PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l)   string   LX\e0\e0     LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that
you eventually print something, or users may get empty output (for
example, when there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0             string   MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18         leshort  \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)   leshort  0x014c     COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512)   leshort  !0x014c    MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0            string   MZ
\*[Gt]0x18        leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)   string   PE\e0\e0    PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort  0x14c     for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort  0x184     for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0               string   MZ
\*[Gt]0x18           leshort  \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)     leshort  !0x014c   MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514)  string   LE        LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0                    string   MZ
\*[Gt]0x18                leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)           string   LE\e0\e0    LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26)   string   UPX       \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0                   string   MZ
\*[Gt]0x18               leshort  \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)          string   LE\e0\e0    LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)   string   UNACE     \eb, ACE self-extracting archive
.Ed
.Pp
Finally, if you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0                     string        MZ
\*[Gt]0x18                 leshort       \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)            string        PE\e0\e0    PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4             search/0x140  .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4))   string        PK\e3\e4    \eb, ZIP self-extracting archive
.Ed
.Sh SEE ALSO
.Xr file __CSECTION__
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
.Dv leshort ,
.Dv date ,
.Dv bedate ,
.Dv medate ,
.Dv ledate ,
.Dv beldate ,
.Dv leldate ,
and
.Dv meldate
are system-dependent; perhaps they should be specified as a number
of bytes (2B, 4B, etc),
since the files being recognized typically come from
a system on which the lengths are invariant.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.